In Silico mapping of complex traits in inbred mice?
Elissa J. Chesler1, Sandra L. Rodriguez-Zas2, Robert W. Williams3, Jeffrey S. Mogil4
1University of Illinois at Urbana-Champaign, Department of Psychology, Neuroscience Program, Champaign, IL 61801 USA
2University of Illinois at Urbana-Champaign, Department of Animal Sciences, Neuroscience Program, Urbana, IL 61801 USA
3University of Tennessee Health Science Center, Center of Genomics and Bioinformatics, Memphis, TN 38163 USA
4McGill University, Dept. of Psychology, Montreal, QC H3A 1B1 Canada
Employing the genetic variability and polymorphic marker density present among inbred strains, it may be possible to map QTLs in the mouse using only the known marker strain distribution patterns (SDPs) and phenotypic data from inbred strain surveys. This is because most inbred strains are derived from a small number of progenitor strains. When strains have like alleles of polymorphic genetic markers, it is highly probable that these alleles are of common origin (i.e., identical-by-descent). In this case, identical marker alleles are likely to be co-inherited with identical QTL alleles.
The SDP of marker genotypes can be used in models of phenotype-genotype association. Such a technique has the potential to be very high resolution, inexpensive and rapid. Any type of marker or even gene polymorphism can be used, provided that allelic variants in a number of inbred strains have been identified. Such a method would also allow researchers to map complex traits in contexts where phenotyping unique individuals would not be feasible or appropriate, including observation of trait variation in multiple environments. Though the recombinant inbred strain sets already provide a resource by which such trait mapping can be achieved, QTL detection in these strains is limited to those genes polymorphic between progenitor strains, and by the resolution of recombination in these strain sets.
Grupe et al. (2001) have recently published a method of trait mapping based on the use of the inbred strain distribution of SNPs. Briefly, pair-wise strain differences in genotype are calculated for each SNP, and these are summed in 30 cM intervals each shifted by 10 cM. This produces arrays of genotypic differences which are then correlated with pair-wise phenotypic differences.
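The interval procedure just described can be sketched as follows. This is an illustrative reading of the summary above, not the published implementation: the function names, the toy strains and SNPs, the use of allele mismatch counts as the genotypic difference, and the use of absolute phenotypic differences are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def interval_scan(genotypes, positions, phenotypes, chrom_length,
                  width=30.0, step=10.0):
    """Correlate pair-wise genotypic and phenotypic differences per interval.

    genotypes:  strain -> list of allele codes, one per SNP (hypothetical layout)
    positions:  SNP positions in cM, same order as the allele lists
    phenotypes: strain -> strain mean for the trait
    """
    pairs = list(combinations(sorted(phenotypes), 2))
    # Assumption: phenotypic difference is the absolute difference of strain means.
    pheno_diff = [abs(phenotypes[a] - phenotypes[b]) for a, b in pairs]
    results = []
    start = 0.0
    while start < chrom_length:
        in_window = [i for i, p in enumerate(positions) if start <= p < start + width]
        # Assumption: genotypic difference is the count of mismatching SNPs in the window.
        geno_diff = [sum(genotypes[a][i] != genotypes[b][i] for i in in_window)
                     for a, b in pairs]
        results.append((start, pearson(geno_diff, pheno_diff)))
        start += step
    return results

# Toy data: four hypothetical strains, three SNPs on a 60 cM chromosome.
toy_genotypes = {"A": [0, 0, 0], "B": [0, 0, 1], "C": [1, 1, 0], "D": [1, 1, 1]}
toy_positions = [5.0, 15.0, 45.0]
toy_phenotypes = {"A": 1.0, "B": 1.2, "C": 3.0, "D": 3.1}
toy_scan = interval_scan(toy_genotypes, toy_positions, toy_phenotypes, chrom_length=60.0)
for window_start, r in toy_scan:
    print(f"{window_start:5.1f} cM  r = {r:+.3f}")
```

Because each 30 cM window shares 20 cM with its neighbour, adjacent correlations are computed from largely the same SNPs.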
We believe that this method is statistically flawed, makes use of ad hoc peak detection procedures, has been subject to a questionable validation analysis, and is likely not to work well in practice. It has limited resolution and is heavily biased by the presence of linked markers. The creation of overlapping intervals results in the artificial appearance of increasing genotype-phenotype association approaching the putative QTL, by severely inflating the correlation between analyses of adjacent intervals. The method also has a very high error rate. The low statistical power of this method is artificially inflated through the calculation of pair-wise differences, a procedure that uses data redundantly. The majority of SNP polymorphisms are between the CAST/Ei strain and all other strains, and thus the method is not robust to the removal of this single strain. Although the reported validation of the method shows significant agreement with previously published F2 intercross, full-genome scan experiments (Grupe et al., 2001), this analysis is biased by a heavily unbalanced number of true negative results in comparison to the number of false positive, false negative and true positive results, and is further flawed by the determination of a significance threshold without consideration of the pooling of comparisons from many separate studies.
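The redundancy introduced by pair-wise differences and overlapping intervals can be made concrete with a little arithmetic. The sketch below is only illustrative: the strain counts are hypothetical, and the helper names are not taken from any published code.

```python
from math import comb

def pair_count(n_strains):
    """Number of pair-wise differences formed from n independent strain means."""
    return comb(n_strains, 2)

def window_overlap(width=30.0, step=10.0):
    """Fraction of an interval shared with the next interval in the scan."""
    return max(0.0, (width - step) / width)

# Hypothetical strain counts: the number of pairs grows much faster than
# the number of independent observations that generate them.
for n in (8, 15):
    print(n, "strain means ->", pair_count(n), "pair-wise differences")

# With 30 cM intervals shifted by 10 cM, neighbouring intervals share
# two thirds of their length.
print("shared fraction between adjacent intervals:", window_overlap())
```

A correlation computed over the pairs therefore treats many non-independent values as if they were separate observations.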
This mapping method will be compared to an alternative method in which microsatellite polymorphism information is used in a single-marker general linear model with significance thresholds determined empirically through permutation analysis. Using allele as a grouping variable in a linear model is theoretically more appropriate and meaningful in the context of other mapping methods because the assumption that one is testing linear relationships is more likely to be valid. The amount of polymorphism in a region should not be linearly related to the phenotypic difference as assumed in Grupe et al. (2001), unless one predicts multiple trait-related genetic polymorphisms in each interval, each having an equivalent additive effect on the trait. In contrast, using linear models with allele-based grouping, the additive allelic effect can be estimated from the single-marker analyses. Microsatellites currently offer much higher resolution than the SNP database, and with more strains genotyped, more statistical power. However, the microsatellite-based analysis requires the additional assumptions that markers identical-by-state are indeed identical-by-descent and that the QTLs are in a fixed relationship with the markers.
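A minimal sketch of a single-marker analysis with allele as the grouping variable is given below. It assumes a one-way fixed-effects model on strain means. The function names and toy data are hypothetical, and defining the additive effect as half the difference between the two allele-group means is a common convention assumed here, not a detail stated above.

```python
from collections import defaultdict

def marker_f_statistic(alleles, phenotypes):
    """One-way ANOVA F statistic with marker allele as the grouping variable."""
    groups = defaultdict(list)
    for allele, y in zip(alleles, phenotypes):
        groups[allele].append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0  # marker is not polymorphic, or no residual degrees of freedom
    grand_mean = sum(phenotypes) / n
    means = {a: sum(g) / len(g) for a, g in groups.items()}
    ss_between = sum(len(g) * (means[a] - grand_mean) ** 2 for a, g in groups.items())
    ss_within = sum((y - means[a]) ** 2 for a, g in groups.items() for y in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def additive_effect(alleles, phenotypes, allele_1, allele_2):
    """Half the difference between the mean phenotypes of two allele groups."""
    g1 = [y for a, y in zip(alleles, phenotypes) if a == allele_1]
    g2 = [y for a, y in zip(alleles, phenotypes) if a == allele_2]
    return (sum(g2) / len(g2) - sum(g1) / len(g1)) / 2.0

# Toy data: six hypothetical strains typed at one microsatellite
# (allele sizes in bp) with one strain mean each.
toy_sdp = [120, 120, 120, 124, 124, 124]
toy_means = [1.0, 1.2, 1.1, 3.0, 3.1, 2.9]
toy_f = marker_f_statistic(toy_sdp, toy_means)
toy_a = additive_effect(toy_sdp, toy_means, 120, 124)
print("F =", round(toy_f, 1), " additive effect =", round(toy_a, 3))
```

Grouping by allele tests for a difference in means between allele classes, and makes no assumption that the trait difference scales with the number of polymorphisms in a region.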
Presently, neither method has sufficient statistical power for attaining significance thresholds corrected to maintain a genome-wide error rate of 5%. Grupe et al. (2001) considered the top 5 or 10% of obtained results as peaks. This is arbitrary but has the dubious merit of guaranteeing some number of QTLs, whereas other techniques of error control can potentially identify no QTLs. The latter case has virtually no probability of being true for a heritable trait. Permutation analysis, in which, for significance level α, the top α% of possible results are considered, is a more valid means of determining reasonable significance thresholds, and requires no assumptions about the distribution of test statistics that one is likely to obtain.
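One way such an empirical threshold could be obtained is sketched below. The specifics are assumptions rather than the procedure used here: phenotypes are shuffled across strains, the largest single-marker F statistic across all markers is recorded for each shuffle, and the threshold is the value reached by only the top α fraction of those maxima. All names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def f_stat(alleles, phenotypes):
    """One-way ANOVA F statistic with marker allele as the grouping variable."""
    groups = defaultdict(list)
    for allele, y in zip(alleles, phenotypes):
        groups[allele].append(y)
    n, k = len(phenotypes), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand_mean = sum(phenotypes) / n
    means = {a: sum(g) / len(g) for a, g in groups.items()}
    ss_between = sum(len(g) * (means[a] - grand_mean) ** 2 for a, g in groups.items())
    ss_within = sum((y - means[a]) ** 2 for a, g in groups.items() for y in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_threshold(markers, phenotypes, alpha=0.05, n_perm=1000, seed=0):
    """Genome-wide threshold from the permutation distribution of the maximum F.

    markers: list of strain distribution patterns, one allele list per marker,
             each in the same strain order as `phenotypes`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        maxima.append(max(f_stat(sdp, shuffled) for sdp in markers))
    maxima.sort()
    # Keep the value such that only the top alpha fraction of maxima reach it.
    n_top = max(1, round(alpha * n_perm))
    return maxima[-n_top]

# Toy data: eight hypothetical strains, three hypothetical markers.
toy_phenotypes = [1.0, 1.2, 1.1, 0.9, 3.0, 3.1, 2.9, 3.2]
toy_markers = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
toy_threshold = permutation_threshold(toy_markers, toy_phenotypes, alpha=0.05, n_perm=500)
print("empirical genome-wide threshold for F:", toy_threshold)
```

Taking the maximum across markers within each permutation is what makes the threshold genome-wide rather than per-marker, and no parametric distribution for the test statistic is assumed.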
Both methods will be compared to results from full-genome scans using experimental crosses with large sample sizes. Other theoretical concerns with ‘in silico’ mapping will be discussed.
Chesler, E. J., Rodriguez-Zas, S. L., Mogil, J. S., Darvasi, A., Usuka, J., Grupe, A., Germer, S., Aud, D., Belknap, J. K., Klein, R. F., Ahluwalia, M. K., Higuchi, R. and Peltz, G. In Silico Mapping of Mouse Quantitative Trait Loci. Science, 294: 2423, 2001.
Grupe, A., Germer, S., Usuka, J., Aud, D., Belknap, J. K., Klein, R. F., Ahluwalia, M. K., Higuchi, R., and Peltz, G. In silico mapping of complex disease-related traits in mice. Science 292: 1915-1918, 2001.
A freely available implementation of the Grupe et al., 2001 algorithm can be downloaded from: