The International Complex Trait Consortium

Genomic Analysis of Transcriptional Networks: Combining Microarrays with Complex Trait Analysis

Robert W. Williams1, Siming Shou1, Lu Lu1, Yanhua Qu1, Jintao Wang2, Kenneth Manly2, Elissa Chesler3, Hui-Chen Hsu4, John Mountz4, David Threadgill5

1Center for Neuroscience and Department of Anatomy and Neurobiology, University of Tennessee, Memphis, TN 38163
2Roswell Park Cancer Institute, Department of Molecular and Cellular Biology, Buffalo, NY 14263-0001 USA
3University of Illinois at Urbana-Champaign, Department of Psychology, Neuroscience Program, Champaign IL 61801 USA
4The University of Alabama at Birmingham, Department of Medicine, Division of Clinical Immunology and Rheumatology, Birmingham, AL 35294 USA
5Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

ABSTRACT
 
Variation in mRNA levels measured using microarrays is generated primarily by technical error, environmental differences, and gene variants. We are exploiting recombinant inbred (RI) strains in combination with microarrays to map large sets of cis- and trans-acting modulators of transcriptional activity in brain. The use of isogenic lines allows us to reduce non-genetic variance and to boost the effective heritability of array data by resampling the same genotypes. RI lines and RI intercross (RIX) progeny also make it feasible to analyze changes in the transcriptome during development and in response to environmental change.
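The gain from resampling isogenic lines can be illustrated with the usual formula for the heritability of a strain mean. If V_G is the genetic variance among strains and V_E is the within-strain (technical plus environmental) variance of a single array, averaging n arrays per strain divides V_E by n, so the effective heritability is V_G / (V_G + V_E / n). The Python sketch below assumes that simple variance-components model; the function name and the variance values are hypothetical and are not estimates from this data set.

```python
def effective_heritability(v_genetic: float, v_error: float, n_replicates: int) -> float:
    """Heritability of a strain mean under a simple variance-components model.

    Averaging n isogenic replicates shrinks the non-genetic variance
    (technical + environmental) by a factor of n, while the genetic
    variance among strains is unchanged.
    """
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    if v_genetic < 0 or v_error < 0:
        raise ValueError("variance components must be non-negative")
    total = v_genetic + v_error / n_replicates
    return v_genetic / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical transcript whose single-array heritability is 0.25.
    vg, ve = 1.0, 3.0
    for n in (1, 3, 6, 12):
        print(n, round(effective_heritability(vg, ve, n), 3))
    # prints one pair per line: 1 0.25, 3 0.5, 6 0.667, 12 0.8
```

Under these assumed values, triplicate arrays double the heritability of the strain mean relative to a single array.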

We assessed the reliability of data generated using the Affymetrix U74Av2 GeneChip by processing as many as 12 replicate samples per tissue type: liver, eye, retina, forebrain, olfactory bulb, cerebellum, and brainstem. Triplicate samples provide estimates of expression level with adequate reliability for mapping (mean coefficient of error of ~6%). For most RI lines, forebrain samples from females at three ages were each hybridized to one array (5-6 weeks, 10-12 weeks, and 4-6 months). For purposes of this study, data were pooled without correction for age. Our mapping panel consists of 21 BXD strains, the parental strains, and their F1 hybrids. This data set is being expanded to include new BXD and LXS RI strains (Peirce et al., 2002; Bennett et al., 2002) and RIX progeny described by Threadgill and colleagues (2002).
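Assuming the coefficient of error (CE) is defined, as is common, as the standard error of the mean divided by the mean, it can be computed for each transcript from its replicate arrays and then averaged across transcripts. The Python sketch below is illustrative only: the probe names and replicate values are made up, and it skips array normalization and probe-set summarization, which would precede this step in practice.

```python
import math
import statistics


def coefficient_of_error(replicates: list[float]) -> float:
    """CE = (sample SD / sqrt(n)) / mean for one transcript's replicate values."""
    n = len(replicates)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(replicates)
    if mean <= 0:
        raise ValueError("mean expression must be positive")
    sem = statistics.stdev(replicates) / math.sqrt(n)
    return sem / mean


def mean_coefficient_of_error(by_transcript: dict[str, list[float]]) -> float:
    """Average CE across transcripts, each with its own set of replicates."""
    return statistics.mean(coefficient_of_error(v) for v in by_transcript.values())


if __name__ == "__main__":
    # Hypothetical triplicate signal values for two probe sets.
    data = {
        "probe_a": [950.0, 1000.0, 1050.0],
        "probe_b": [180.0, 200.0, 220.0],
    }
    for name, values in data.items():
        print(name, round(100 * coefficient_of_error(values), 1), "%")
    print("mean CE:", round(100 * mean_coefficient_of_error(data), 1), "%")
```

Because the SEM scales as 1/sqrt(n), going from three to six replicates would lower the CE by a factor of about 1.4 if the per-array SD stayed the same.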

We developed custom programs and procedures (derivatives of Map Manager QTX) to map each of 12,422 transcripts using a genetic data set of approximately 600 MIT microsatellite markers with unique strain distribution patterns (http://www.nervenet.org/papers/bxn.html). Variation in expression level is being mapped in three stages: 1. simple point-wise marker regression analysis; 2. estimation of the probability that variation is controlled by a cis-acting QTL; 3. composite interval mapping that controls for the interval in which the transcript itself maps. Between 1,000 and 1 million permutations were run to estimate genome-wide false positive (type I) error rates.
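Stage 1 and the permutation step can be sketched for RI strains, where each marker genotype is either B or D (coded here as 0 or 1) and the trait is a strain-mean expression value. For a single-marker regression with normally distributed errors, LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker term. Shuffling strain means across strains and recording the peak LOD over all markers gives an empirical genome-wide null distribution. This is a minimal Python illustration under those assumptions, not the Map Manager QTX code; the function names, toy genotypes, and phenotypes are hypothetical.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """LOD for single-marker regression; genotypes coded 0 (B) or 1 (D)."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)  # null model: one mean
    if rss0 == 0.0:
        return 0.0
    rss1 = 0.0  # alternative model: one mean per genotype class
    for allele in (0, 1):
        group = [y for y, g in zip(phenotypes, genotypes) if g == allele]
        if group:
            m = sum(group) / len(group)
            rss1 += sum((y - m) ** 2 for y in group)
    if rss1 == 0.0:
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)


def max_lod(phenotypes, markers):
    """Peak LOD over all markers (a point-wise genome scan)."""
    return max(marker_lod(phenotypes, g) for g in markers)


def permutation_test(phenotypes, markers, n_perm=1000, alpha=0.05, seed=1):
    """Return (observed peak LOD, genome-wide P, genome-wide LOD threshold)."""
    rng = random.Random(seed)
    observed = max_lod(phenotypes, markers)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        maxima.append(max_lod(shuffled, markers))
    exceed = sum(m >= observed for m in maxima)
    p_value = (exceed + 1) / (n_perm + 1)  # add-one estimate, never exactly 0
    maxima.sort()
    threshold = maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]
    return observed, p_value, threshold


if __name__ == "__main__":
    # Hypothetical strain means for 8 RI strains and 3 markers.
    expression = [5.0, 5.2, 4.9, 5.1, 7.0, 7.2, 6.9, 7.1]
    markers = [
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
        [0, 0, 1, 1, 0, 0, 1, 1],
    ]
    lod, p, thr = permutation_test(expression, markers)
    print(f"peak LOD {lod:.2f}, genome-wide P {p:.3f}, 5% threshold {thr:.2f}")
```

Taking the maximum over markers within each permutation is what makes the resulting P a genome-wide rather than a point-wise probability; with only eight strains the permutation distribution is coarse, which is one reason larger panels matter.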

Over 650 cis- and trans-acting loci that modulate expression have now been mapped at a conventional genome-wide significance level (P < 0.05, LOD typically > 3.5, uncorrected for the analysis of multiple traits). Given the large number of semi-independent traits that we have tested (n = 12,422), a sizable fraction of these QTLs are false positives, an issue best dealt with by replication and increased sample size. Even with a conservative Bonferroni correction, we have mapped 40 to 50 new trans-acting QTLs. The relative abundance of cis-acting QTLs provides an independent means to assess statistical error. The prior probability that expression of a transcript is modulated by a linked QTL (for example, a promoter variant) is far higher than the probability that it is modulated by a randomly chosen unlinked interval, whereas false positives should fall at random across the genome and only rarely coincide with the location of the transcript's own gene. The relative abundance of cis- and trans-acting QTLs as a function of LOD score therefore provides an index of the type I error rate. Trans-acting QTLs with LOD scores above 5 are likely to be genuine, whereas a majority of cis-acting QTLs with LOD scores above 1.5 to 3.0 are likely to be correctly identified.
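Two pieces of arithmetic in this paragraph can be made explicit. If none of the 12,422 transcripts had a real QTL, a per-trait genome-wide threshold of P < 0.05 would be expected to yield 0.05 x 12,422, or about 621, transcripts with a false-positive QTL, while a Bonferroni correction lowers the per-trait genome-wide threshold to 0.05 / 12,422, or about 4.0e-6. The cis/trans index can be sketched by calling a QTL cis when its peak lies on the transcript's own chromosome within some window of the gene, and then tracking the cis fraction above increasing LOD cutoffs; a randomly placed peak is cis with probability of roughly (2 x window) / genome length. The Python below assumes a hypothetical 10 cM window and uses invented QTL records; it illustrates the idea and is not the procedure used in this study.

```python
from dataclasses import dataclass


@dataclass
class Qtl:
    transcript_chr: str   # chromosome of the gene encoding the transcript
    transcript_cm: float  # map position of that gene (cM)
    peak_chr: str         # chromosome of the QTL peak
    peak_cm: float        # position of the QTL peak (cM)
    lod: float


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


def is_cis(q: Qtl, window_cm: float = 10.0) -> bool:
    """Hypothetical rule: peak within window_cm of the transcript's own gene."""
    return q.peak_chr == q.transcript_chr and abs(q.peak_cm - q.transcript_cm) <= window_cm


def cis_fraction(qtls: list[Qtl], min_lod: float, window_cm: float = 10.0) -> float:
    """Fraction of QTLs at or above min_lod that are classified as cis."""
    hits = [q for q in qtls if q.lod >= min_lod]
    if not hits:
        return float("nan")
    return sum(is_cis(q, window_cm) for q in hits) / len(hits)


def null_cis_fraction(window_cm: float, genome_cm: float) -> float:
    """Approximate chance that a randomly placed peak falls in the cis window."""
    return min(1.0, 2 * window_cm / genome_cm)


if __name__ == "__main__":
    n_traits = 12422
    # Expected count of transcripts with a false-positive QTL if none is real.
    print("expected false-positive traits:", round(0.05 * n_traits))
    print("Bonferroni per-trait alpha:", bonferroni_alpha(0.05, n_traits))
    toy = [  # invented records, not results from this study
        Qtl("6", 60.0, "6", 62.0, 6.1),
        Qtl("2", 30.0, "2", 35.0, 4.2),
        Qtl("11", 50.0, "6", 55.0, 3.6),
        Qtl("1", 20.0, "9", 40.0, 2.1),
    ]
    for cutoff in (2.0, 3.5, 5.0):
        print(cutoff, cis_fraction(toy, cutoff))
```

If the reasoning in the paragraph holds, one would expect the cis fraction to rise with the LOD cutoff and to sit well above null_cis_fraction wherever most calls are genuine.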

The collection of QTLs has unexpected and intriguing patterns of distribution. For example, an interval on mid-distal Chr 6 harbors just over 80 QTLs, more than 12% of all significant loci. Novel QTLs include a modulator of engrailed 1 expression with a LOD score of 5.4. Seven of 42 Hox transcripts are associated with QTLs. A set of three QTLs modulates the expression of beta catenin and Fos.
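One simple way to flag a cluster such as the Chr 6 interval is to bin QTL peak positions and compare each bin's count with the count expected if peaks were spread uniformly over the genome, using a Poisson tail probability with a Bonferroni correction over bins. The Python sketch below makes that uniform-placement assumption, which ignores variation in gene density, and uses a hypothetical 10 cM bin size, a hypothetical bin count, and invented peak positions; it is not the method used to identify the Chr 6 interval.

```python
import math
from collections import Counter


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed upward from k to avoid cancellation."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while term > 0.0 and term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


def find_hotspots(peaks, n_bins_genome, bin_cm=10.0, alpha=0.05):
    """Return [((chrom, bin_index), count, p)] for bins with excess QTL peaks.

    peaks: iterable of (chromosome, position_cM) for significant QTLs.
    n_bins_genome: total number of bins of width bin_cm across the genome.
    """
    counts = Counter((chrom, int(cm // bin_cm)) for chrom, cm in peaks)
    lam = sum(counts.values()) / n_bins_genome  # expected peaks per bin if uniform
    cutoff = alpha / n_bins_genome  # Bonferroni over bins
    flagged = []
    for key, count in counts.items():
        p = poisson_sf(count, lam)
        if p < cutoff:
            flagged.append((key, count, p))
    return sorted(flagged, key=lambda item: item[2])


if __name__ == "__main__":
    # Invented peaks: 10 clustered on Chr 6 plus 10 scattered elsewhere.
    peaks = [("6", 52.0 + 0.5 * i) for i in range(10)]
    peaks += [("1", 15.0), ("2", 25.0), ("3", 35.0), ("4", 45.0), ("5", 55.0),
              ("7", 65.0), ("8", 5.0), ("9", 15.0), ("10", 25.0), ("11", 35.0)]
    for (chrom, b), count, p in find_hotspots(peaks, n_bins_genome=150):
        print(f"Chr {chrom}, {b * 10}-{(b + 1) * 10} cM: {count} peaks, P = {p:.1e}")
```

A real analysis would likely compare bin counts with local gene or probe-set density rather than a uniform expectation, since gene-rich intervals attract more cis-acting QTLs.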

This massively parallel approach to mapping QTLs that control gene expression poses significant statistical challenges but offers even more significant opportunities. The combined use of microarrays and recombinant inbred strains makes it possible to repeat an analysis using different tissues and developmental stages to explore changes in transcriptional activity and its control. With an enlarged RI panel and judicious use of RIX progeny, it should be feasible to tease apart transcriptional networks in exquisite detail.

[Supported by the Dunavant Chair and by the Informatics Center for Mouse Neurogenetics, a Human Brain Project funded jointly by the National Institute of Mental Health, National Institute on Drug Abuse, and the National Science Foundation (P20-MH 62009). We thank Dr. Divyen Patel (www.genome-explorations.com) for help generating microarray data.]