With the many new tools of molecular genetics described throughout this book, it has become easier and easier to clone genes defined by mutant phenotypes. Often, mutant phenotypes involve alterations in the process of development or physiology. In these cases, simply having a cloned copy of a gene is often not enough to critically examine the full range of effects exerted by that gene on the developmental or physiological process. In particular, normal development and physiology can vary significantly from one strain of mice to the next, and in the analysis of mutants, it is often not possible to distinguish subtle effects due to the mutation itself from effects due to other genes within the background of the mutant strain. To make this distinction, it is essential to be able to compare animals in which differences in the genetic background have been eliminated as a variable in the experiment. This is accomplished through the placement of the mutation into a genome derived from one of the standard inbred strains. It is then possible to perform a direct comparison between mutant and wild-type strains that differ only at the mutant locus. Phenotypic differences that persist between these strains must be a consequence of the mutant allele.
In the best of all possible worlds, the mutation of interest will have occurred spontaneously within a strain of mice that is already inbred. In this case, one can be reasonably confident that the mutant animal differs at only a single locus from non-mutant animals of the same strain. If the mutation allows homozygous viability and fertility, it can be propagated as a strain unto itself by inbreeding offspring from the original mutant animal. 16 If the mutation cannot be propagated in the homozygous state, it will be maintained by continuous backcrossing of heterozygous animals to the original inbred strain. In both cases, the new mutant strain is considered coisogenic because its genome is identical (isogenic) to that of its sister strain except at the mutant locus. In the past, coisogenic strains could only be obtained by luck when a spontaneous mutation happened to occur within an inbred strain. Today, one can initiate the production of coisogenic strains at any cloned locus through the use of the gene targeting technology described in Section 6.4.
Coisogenic strains are named with a compound symbol consisting of two parts separated by a hyphen: the first part is the full or abbreviated symbol for the original inbred strain; the second part is the symbol for the mutation or variant allele. If the mutation is maintained in a homozygous state within the coisogenic strain, the mutant symbol is used alone; if the mutation is maintained in a heterozygous state, the +/m genotype symbol is used (where m is the mutation). For example: if the mutation nude (nu) appeared in the BALB/cJ strain, and the new coisogenic strain was homozygous for this mutation, its complete symbol would be [BALB/cJ-nu]; if the semidominant lethal mutation T appeared in the C57BL/6J strain, and the new coisogenic strain was maintained by backcrossing to the parental strain, its symbol would be [B6-T/+].
A large number of mouse mutations and variants with interesting phenotypic effects have been identified and characterized over the last 90 years. Most of these mutations were not found within strains that were already inbred and, to date, most of the genes that underlie these mutations remain uncloned. Thus, in all of these cases, coisogenicity is not a possibility. However, even when a gene has been cloned, and the generation of a coisogenic mutant through the gene targeting technology is a possibility, this approach is still extremely tedious and, at the time of this writing, there is no guarantee of a successful outcome. There are other reasons why spontaneous mouse mutations are often important even when the gene underlying the mutation has been cloned. The spontaneous mutation may not be a "knockout" but instead may exert a more subtle effect on gene function which could provide special insight into the action of the wild-type allele. Furthermore, the phenotypic effects of many older mutations have been studied in painstaking detail by classical embryologists and other scientists, and it can be advantageous to a contemporary scientist to build upon these classical studies.
The "low-tech" solution to the elimination of genetic background effects in the analysis of an established mutation, or any other genetic variant, is to use breeding protocols, rather than molecular biology, to generate strains of mice that approximate coisogenics to the greatest extent possible. Mice that have been bred to be essentially isogenic with an inbred strain except for a selected differential chromosomal segment are called congenic strains. The conceptual basis for the development of congenic mice was formulated by George Snell at the Jackson Laboratory during the 1940s and it led to the first and only Nobel Prize for work strictly in the field of mouse genetics.
Snell was interested in the problem of tissue transplantation. Long before 1944, it was known that tissues could be readily transplanted between individuals of the same inbred strain without immunological rejection, but that mice of different strains would reject tissue transplants from each other. Although these observations were a clear indication of the fact that genetic differences were responsible for tissue rejection, the number and types of genes involved remained entirely unknown. In absentia, these genes were named histocompatibility (or H) loci. The assumption was that the histocompatibility genes were responsible directly or indirectly for the production of tissue (or "histological") markers that could be distinguished as "self" or "non-self" by an animal's immune system. If transplanted tissue and a host recipient carried identical genotypes at all H loci, there would be no immunological response and the transplant would "take." However, if a single foreign allele at any H locus was present in the tissue, it would be recognized as foreign and attacked.
Although the number of histocompatibility loci was unknown, it was assumed to be large because of the rarity with which unrelated individuals, both mice and humans, accept each other's tissues. The logic behind this assumption was the empirical finding that polymorphic loci are most often di-allelic and not usually associated with more than three common alleles. If H loci showed a similar level of polymorphism, a large number would be required to ensure that there would almost always be at least one allelic difference between any two unrelated individuals. The experimental problem was to identify and characterize each of the histocompatibility loci in isolation from all of the others.
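The arithmetic behind this assumption can be made concrete with a small model. The sketch below is a hypothetical illustration, not a historical calculation: it assumes each H locus has two alleles at frequency p, Hardy-Weinberg genotype frequencies, loci that assort independently, and a graft that is accepted only if the donor carries no allele absent from the host. The function names are invented for this example.

```python
from math import ceil, log


def single_locus_acceptance(p: float) -> float:
    """Probability that a random host accepts a graft from a random unrelated
    donor at one di-allelic H locus (hypothetical model: Hardy-Weinberg
    genotype frequencies; rejection if the donor carries any allele the
    host lacks)."""
    q = 1.0 - p
    # A heterozygous host carries both alleles, so it accepts every donor.
    heterozygous_host = 2 * p * q
    # A homozygous host accepts only a donor with the same homozygous genotype.
    homozygous_hosts = (p * p) ** 2 + (q * q) ** 2
    return heterozygous_host + homozygous_hosts


def loci_needed(p: float, max_acceptance: float) -> int:
    """Smallest number of independent, identical H loci that brings the
    overall acceptance probability down to max_acceptance or below."""
    per_locus = single_locus_acceptance(p)
    return ceil(log(max_acceptance) / log(per_locus))


print(single_locus_acceptance(0.5))  # 0.625
print(loci_needed(0.5, 0.01))        # 10
```

Under these assumptions, a single di-allelic locus lets 62.5% of random pairings "take", and about ten such loci are needed before acceptance between unrelated individuals falls below 1%. This is the kind of reasoning that led to the expectation of many H loci.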
Snell's approach to this problem was to use a novel multi-generation breeding protocol based on repeated backcrossing to trap a single H locus from one mouse strain (the donor) in the genetic background of another (the inbred partner). The basic approach (developed mathematically in the following section) caused the newly forming congenic strain to become increasingly similar to the inbred partner at each generation, but only those offspring who remained histo-incompatible with the inbred partner were selected to participate in the next round of backcrossing. It was assumed that a difference at any one H locus would be sufficient to allow full histo-incompatibility. Thus, at the end of the process, Snell expected to find that each independently derived congenic line would have trapped the donor strain allele at a single random H locus. With random selection, all H loci could be isolated in different congenic strains so long as a large enough number were generated.
With this outcome in mind, Snell began the production of histo-incompatible congenic strains (originally called "congenic resistant" strains) with 125 independent lines of matings (Snell, 1978). Of these, 27 were carried through to the point at which it was possible to determine which H locus had been trapped. Surprisingly, 22 of the 27 lines had trapped the same locus, which was given the name H-2 (by chance, it was the second one identified). Contrary to expectations, the H-2 locus (now called the H2 complex since it is known to be a tightly linked complex of genes) acts, for all practical purposes, as the only strong determinant of histocompatibility. Snell and his predecessors were misled by the false assumption that only a limited number of alleles are possible at any one locus. Instead, a subset of genes within the H2 complex known as the class I genes are the most polymorphic in the genome, with hundreds of alleles at each individual locus. The generic term "major histocompatibility complex" (MHC) is now used to designate this complex locus in mice as well as its homolog in all other mammalian species including humans, where it was historically called HLA (for human leukocyte antigen).
In the past, there were several different breeding schemes used to produce congenic mice depending on whether animals heterozygous for the donor allele at the differential locus were phenotypically distinguishable through a dominant form of expression from those not carrying the donor allele. It was often the case that the heterozygote could not be distinguished and, as a consequence, congenic strains had to be created through complex breeding schemes that allowed the generation of homozygotes for the variant allele in alternating generations. Today, identifying the heterozygote is almost never a problem since one will almost certainly map the locus of interest before undertaking the production of a congenic strain, and with a map position will come closely linked DNA markers. Therefore, the following discussion will be limited to the most direct, simple and efficient method of congenic construction known as the backcross or NX system, which is illustrated in Figure 3.4 (Flaherty, 1981). 17
The backcross system of congenic strain creation is straightforward in both concept and calculation. The first cross is always an outcross between the recipient inbred partner and an animal that carries the donor allele. The donor animals need not be inbred or homozygous at the locus of interest, but the other partner must be both. The second generation cross and all those that follow to complete the protocol are backcrosses to the recipient inbred strain. At each generation, only those offspring who have received the donor allele at the differential locus are selected for the next round of backcrossing.
The genetic consequences of this breeding protocol are easy to calculate. First, one can start with the conservative assumption that the donor (D) and recipient (R) strains are completely distinct with different alleles at every locus in the genome.
Then, all F1 animals will be 100% heterozygous D/R at every locus. According to Mendel's laws, equal segregation and independent assortment will act to produce gametes from these F1 animals that carry R alleles at a random 50% of their loci and D alleles at the remaining 50%. When these gametes combine with gametes produced by the recipient inbred partner (which, by definition, will have only R alleles at all loci), they will produce N2 progeny having genomes in which approximately 50% of all loci will be homozygous R/R and the remaining loci will be heterozygous D/R as illustrated in Figure 3.4. Thus, in a single generation, the level of heterozygosity is reduced by about 50%. Furthermore, it is easy to see that at every subsequent generation, random segregation from the remaining heterozygous alleles will cause a further ~50% overall reduction in heterozygosity.
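The halving described above can be checked with a short Monte Carlo sketch. It follows a single line of carriers through successive backcrosses and assumes, as the text does for this calculation, that the F1 is D/R at every locus and that all loci assort independently (no linkage to each other or to the selected donor allele). The function name and parameters are hypothetical.

```python
import random


def simulate_backcross_heterozygosity(n_loci: int, last_generation: int,
                                      seed: int = 1) -> list[float]:
    """Monte Carlo estimate of the fraction of unlinked loci still
    heterozygous (D/R) at generations N2 through N<last_generation>.

    Assumes the F1 is D/R at every locus and that all loci assort
    independently of each other and of the selected donor allele.
    """
    rng = random.Random(seed)
    heterozygous = [True] * n_loci  # F1: D/R at every locus
    fractions = []
    for _generation in range(2, last_generation + 1):
        # The carrier parent transmits D or R with equal probability at each
        # heterozygous locus; the inbred recipient parent always transmits R,
        # so an offspring stays D/R only if it receives D from the carrier.
        heterozygous = [h and rng.random() < 0.5 for h in heterozygous]
        fractions.append(sum(heterozygous) / n_loci)
    return fractions


estimates = simulate_backcross_heterozygosity(n_loci=100_000, last_generation=5)
print([round(f, 4) for f in estimates])
```

With 100,000 simulated loci, the four estimates (N2 through N5) should land close to 0.5, 0.25, 0.125 and 0.0625, that is, roughly a 50% reduction per generation.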
In mathematical terms, the fraction of loci that are still heterozygous at the Nth generation can be calculated as $\left(\tfrac{1}{2}\right)^{N-1}$, with the remaining fraction $1 - \left(\tfrac{1}{2}\right)^{N-1}$ homozygous for the inbred strain allele. These functions are represented graphically in Figure 3.5. At the fifth generation, after only four backcrosses, the developing congenic line will be identical to the inbred partner across ~94% of the genome. By the tenth generation, identity will increase to ~99.8%. It is at this stage that the new strain is considered to be a certified congenic. As one can see by comparing Figures 3.2 and 3.5, the development of a congenic line will take approximately half the time that it takes to develop a simple inbred line from scratch. The reason for this more rapid pace is the fact that one of the two mates involved at every generation of congenic development is already inbred.
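The two expressions above can be evaluated directly. The sketch below is only a transcription of those formulas into Python (the function names are illustrative). It treats n = 1 as the F1, where every locus is heterozygous.

```python
def heterozygous_fraction(n: int) -> float:
    """Expected fraction of unlinked loci still heterozygous (D/R) at
    generation n of the backcross system (n = 1 is the F1)."""
    if n < 1:
        raise ValueError("generation number must be at least 1")
    return 0.5 ** (n - 1)


def recipient_fraction(n: int) -> float:
    """Expected fraction of unlinked loci homozygous for the recipient allele."""
    return 1.0 - heterozygous_fraction(n)


for n in (2, 5, 10):
    print(f"N{n}: {recipient_fraction(n):.2%} homozygous recipient")
```

The loop reproduces the figures quoted in the text: 0.9375 (~94%) at N5 and about 0.998 (~99.8%) at N10.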
Backcrossing can continue indefinitely after the tenth generation, but if the donor allele does not express a dominant effect that is visible in heterozygous animals, it will be easier to maintain it in a homozygous state. To achieve this state, two carriers of the selected donor allele from the tenth or a later generation are intercrossed, and homozygous donor offspring are selected to continue the line through brother-sister matings in all following generations. The new congenic strain is now effectively inbred, and in conjunction with the original inbred partner, the two strains are considered a "congenic pair".
In some cases, it will be possible to distinguish animals heterozygous for the donor allele from siblings that do not carry it. In some of these cases, and in others as well, a donor allele may have recessive deleterious effects on viability or fertility. In all such instances, it is advisable to maintain the congenic strain by a continuous process of backcrossing and selection for the donor allele at every generation. Congenic strains that are maintained in this manner are considered to be in a state of "forced heterozygosity". There are two major advantages to pursuing this strategy whenever possible. First, the level of background heterozygosity will continue to be reduced by ~50% through each round of breeding. Second, the use of littermates with and without the donor allele as representatives of the two parts of the congenic pair will serve to reduce the effects of extraneous variables on the analysis of the specific phenotypic consequences of the donor allele.
The rapid elimination of heterozygosity occurs only in regions of the genome that are not linked to the donor allele which, of course, is maintained by selection in a state of heterozygosity throughout the breeding protocol. Unfortunately, linkage will also cause the retention of a significant length of chromosome flanking the differential locus, which is called the differential chromosomal segment. Even for congenic lines at the same backcross generation, the length of this segment can vary greatly because of the inherently random distribution of crossover sites. Nevertheless, the expected average length of the differential chromosomal segment in centimorgans can also be calculated as $200\,(1 - 2^{-N})/N$, where N is the generation number. For all values of N greater than 5, this equation can be simplified to $200/N$. This function is represented graphically in Figure 3.6. As one can see, the average size of the differential segment decreases very slowly. At the tenth generation, there will still be, on average, a 20 cM region of chromosome encompassing the differential locus derived from the donor strain.
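A short calculation shows how slowly the differential segment shrinks. The sketch below simply evaluates the expression given in the text alongside its $200/N$ simplification. The function names are hypothetical.

```python
def expected_segment_cm(n: int) -> float:
    """Average length (cM) of the donor-derived differential segment at
    generation n, using the expression given in the text."""
    if n < 1:
        raise ValueError("generation number must be at least 1")
    return 200 * (1 - 2 ** -n) / n


def approx_segment_cm(n: int) -> float:
    """Simplified form 200/N, which the text recommends for N > 5."""
    return 200 / n


for n in (5, 10, 20):
    exact, approx = expected_segment_cm(n), approx_segment_cm(n)
    print(f"N{n}: {exact:.2f} cM (approx. {approx:.2f} cM)")
```

At N10, the full expression gives about 19.98 cM against 20 cM for the approximation. For every N above 5, the two agree to within 2%. Because the length falls as $1/N$, halving the segment requires roughly doubling the number of generations.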
It is possible to reduce the length of the differential chromosomal segment more rapidly by screening backcross offspring for the occurrence of crossovers between the differential locus of interest and nearby DNA markers. As an example of this strategy, one could recover fifty congenic offspring from the tenth backcross generation and test each for the presence of donor alleles at DNA markers known to map at distances of one to five centimorgans on both sides of the locus of interest. It is very likely that at least one member of this backcross generation will show recombination between the differential locus and a nearby marker. The animal with the closest recombination event can be backcrossed again to the recipient strain to produce congenic mice of the eleventh backcross generation. By screening a sufficient number of these N11 animals, it should be possible to identify one or more that show recombination on the opposite side of the differential locus. In this manner, an investigator should be able to obtain a founder for a congenic strain with a defined differential chromosomal segment of five centimorgans or less after just eleven generations of breeding.
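How likely is it that a screen of fifty offspring contains a useful recombinant? The sketch below gives a rough estimate under stated assumptions that go beyond the text. It assumes the screened N10 parent still carries donor alleles at the markers, treats each offspring as an independent meiosis, and converts map distance to recombination fraction with Haldane's map function, which ignores crossover interference. All names are illustrative.

```python
from math import exp


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction for a map distance under Haldane's map
    function (assumes no crossover interference)."""
    return 0.5 * (1 - exp(-2 * distance_cm / 100))


def p_recombinant_found(distance_cm: float, n_offspring: int,
                        sides: int = 1) -> float:
    """Probability that at least one of n_offspring shows recombination
    between the selected locus and a marker distance_cm away.

    Assumes the screened parent carries donor alleles at the marker(s),
    i.e. its differential segment extends past them. With sides=2, markers
    on both flanks are scored and the two intervals are treated as
    independent, which holds under Haldane's no-interference model.
    """
    r = haldane_r(distance_cm)
    return 1 - (1 - r) ** (n_offspring * sides)


print(f"{p_recombinant_found(5, 50):.3f}")           # one flank only
print(f"{p_recombinant_found(5, 50, sides=2):.3f}")  # either flank
```

Under these assumptions, fifty offspring give roughly a 91% chance of at least one recombinant with a marker 5 cM away on a given side. If a crossover on either side is acceptable, the chance rises above 99%.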
As the preceding discussion indicates, congenic strains differ from the previously described coisogenic strains in two important respects which must always be considered in the interpretation of unexpected data. First, congenic strains, especially those that have undergone only a minimum number of backcrosses, will have small random remnants of the donor strain, so-called passenger loci, scattered throughout the genome. In congenic strains maintained by inbreeding, the same passenger genes will be present in all members of the strain. In rare instances, traits attributed to the selected donor allele may actually result from one of these cryptic passenger genes. Such effects can be sorted out by breeding the congenic strain back to its original inbred partner. If a trait is due to a passenger gene, it will assort independently of the donor locus in subsequent backcrosses.
The second difference between a congenic strain and a coisogenic strain is in the chromosomal vicinity of the differential locus. Congenic strains will always differ from their inbred partner along a significant length of chromosome flanking the differential locus; coisogenic strains will only differ at the differential locus itself and nowhere else. Thus, there is always the possibility that phenotypic differences between the two members of a congenic pair are actually caused by a closely linked gene rather than the selected differential locus. This potential problem is much more difficult to resolve by simple breeding protocols.
The nomenclature used for congenic strains is so similar to that used for coisogenic strains that it is sometimes not possible to distinguish between the two by name alone. In such cases, it is necessary to go back to the original source publication for clarification. There are, however, two nomenclature components which are unique to congenic strains. The first is used in those cases where a mutant or variant allele is transferred from one defined genetic background onto another. For example, one might wish to transfer the albino (c) mutation from the BALB/c strain onto a B6 background. In cases of this type, the strain which "donates" the variant allele is symbolized after the recipient strain with the two strain symbols separated by a period. This is followed by a hyphen and the symbol for the variant allele. Thus, in the example just described, the congenic strain would be named B6.BALB-c.
The final nomenclature component is an indication of the number and type of crosses that have occurred subsequent to the original mating between the recipient and donor animals. In the derivation of any new congenic strain, the first cross is always an outcross, and the offspring are considered members of the F1 generation. The second cross is always a backcross, and the offspring are considered members of the N2 generation. (Note that there is no such thing as an N1 generation.) The letter "N" is always used, followed by a subscripted number ($N_i$), to describe a series of backcross events leading to a particular generation of animals. Remember, however, that N10 generation offspring are the result of one outcross followed by an uninterrupted sequence of nine backcrosses to the same parental strain. Once a congenic strain is established, backcrossing to the parental strain is often stopped, and future generations are propagated by a simple inbreeding protocol. The number of generations of inbreeding is indicated, as always, with the filial generation symbol "F". For example, suppose that the albino mutation has been placed onto the B6 background by an outcross followed by 14 generations of backcrosses, after which a brother-sister mating regime is begun and followed for eight more generations. The offspring produced at this stage would be considered to be members of the N15F8 generation. When generational information is incorporated into the name of a congenic strain, the numbers are no longer subscripted. So, in this example, the complete name for the congenic animals at the stage indicated would be B6.BALB-c (N15F8).
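The generation arithmetic in these rules is easy to get wrong by one, because the outcross counts toward the N number. The helper below is a hypothetical sketch that encodes only the conventions stated above: one outcross, a run of backcrosses, optional brother-sister generations, and the Recipient.Donor-allele form of the name.

```python
def generation_symbol(backcrosses: int, sib_generations: int = 0) -> str:
    """Generation symbol for offspring produced by one outcross (F1),
    `backcrosses` consecutive backcrosses to the recipient strain, and then
    `sib_generations` generations of brother-sister mating."""
    if backcrosses < 1:
        raise ValueError("a congenic series includes at least one backcross")
    symbol = f"N{backcrosses + 1}"  # the outcross counts toward N, so no N1 exists
    if sib_generations > 0:
        symbol += f"F{sib_generations}"
    return symbol


def congenic_name(recipient: str, donor: str, allele: str,
                  backcrosses: int, sib_generations: int = 0) -> str:
    """Congenic strain name in the form Recipient.Donor-allele (generation)."""
    generation = generation_symbol(backcrosses, sib_generations)
    return f"{recipient}.{donor}-{allele} ({generation})"


print(generation_symbol(9))                    # N10
print(congenic_name("B6", "BALB", "c", 14, 8))  # B6.BALB-c (N15F8)
```

Both calls reproduce the worked cases in the text: nine backcrosses after the outcross give N10, and the albino example gives B6.BALB-c (N15F8).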
Consomic strains are a variation on congenic strains in which a whole chromosome rather than one local chromosomal region is backcrossed from a donor strain onto a recipient background. In almost all cases, the donor chromosome is the Y. Like congenics, consomics are produced after a minimum of 10 backcross generations. Backcrossing to obtain consomics for the Y chromosome must be carried out in a single direction: males that contain the donor chromosome are always crossed to inbred females of the recipient strain. For example, to obtain a B6 strain consomic for the M. m. castaneus Y chromosome, one would start with an outcross between a B6 female and a castaneus male. F1 males, and those from all subsequent generations, would also be mated with B6 females. After ten generations, the genetic background would be essentially B6, but the Y chromosome would be castaneus. This strain could be symbolized as B6-Y$^{CAS}$.
Conplastic strains are another variation on the congenic theme, except that in this case, the donor genetic material is the whole mitochondrial genome, which is placed into an alternative host. Since the mitochondrial genomes carried by all of the classical inbred strains are indistinguishable, conplasticity makes sense only in the context of interspecific or inter-subspecies crosses. Conplastic lines are generated by sequential backcrossing of females from the donor strain to recipient males; this protocol is reciprocal to the one used for the generation of Y chromosome consomics. For example, to obtain a B6 strain conplastic for the M. m. castaneus mitochondrial genome, one would start with an outcross between a B6 male and a castaneus female. F1 females, and those from all subsequent generations, would also be mated with B6 males. After ten generations, the nuclear genome would be essentially B6 with the same statistics that hold for congenic production (Figure 3.5), but all mitochondria would be derived from castaneus. This strain could be symbolized as B6-mt$^{CAS}$.
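The reciprocal mating directions for consomic and conplastic strains follow from how the two genomes are inherited. The Y chromosome passes only from father to son, and mitochondrial DNA passes only through the mother. The sketch below tracks just those two genomes plus the expected autosomal donor fraction along one line of descent. The `Mouse` record and helper functions are hypothetical, and the X chromosome is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mouse:
    sex: str                # "M" or "F"
    y: Optional[str]        # strain of origin of the Y chromosome (males only)
    mt: str                 # strain of origin of the mitochondrial genome
    donor_fraction: float   # expected autosomal fraction from the donor strain


def offspring(mother: Mouse, father: Mouse, sex: str) -> Mouse:
    return Mouse(
        sex=sex,
        y=father.y if sex == "M" else None,  # Y passes only from father to son
        mt=mother.mt,                        # mtDNA passes only through the mother
        donor_fraction=(mother.donor_fraction + father.donor_fraction) / 2,
    )


def cross_to_recipient(founder: Mouse, recipient: str, generations: int) -> Mouse:
    """Mate the founder's line to the recipient strain for `generations` crosses
    (the outcross plus the backcrosses), each time keeping an offspring of the
    founder's sex so the uniparental genome stays in the line."""
    current = founder
    for _ in range(generations):
        if current.sex == "M":   # consomic direction: carrier males x recipient females
            mate = Mouse("F", None, recipient, 0.0)
            current = offspring(mother=mate, father=current, sex="M")
        else:                    # conplastic direction: carrier females x recipient males
            mate = Mouse("M", recipient, recipient, 0.0)
            current = offspring(mother=current, father=mate, sex="F")
    return current


consomic = cross_to_recipient(Mouse("M", "CAS", "CAS", 1.0), "B6", 10)
conplastic = cross_to_recipient(Mouse("F", None, "CAS", 1.0), "B6", 10)
print(consomic.y, consomic.mt, consomic.donor_fraction)        # CAS B6 0.0009765625
print(conplastic.y, conplastic.mt, conplastic.donor_fraction)  # None CAS 0.0009765625
```

After ten crosses, both lines retain only $(1/2)^{10}$ of the donor autosomal genome on average. The consomic male keeps the castaneus Y with B6 mitochondria, and the conplastic female keeps castaneus mitochondria, which is why the two protocols must run in opposite directions.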
Recombinant inbred (RI) strains are formed from an initial cross between two different inbred strains followed by an F1 intercross and 20 generations of strict brother-sister mating. This breeding protocol allows the production of a family of new inbred strains with special properties relative to each other that are discussed fully in Section 9.2. Different RI strains derived from the same pair of original inbred parents are considered members of a set. Each RI set is named by joining an abbreviation of each parental strain together with an "X". For example, RI strains derived from a C57BL/6J (B6) female and a DBA/2J male are members of the BXD set, and RI strains derived from AKR/J and C57L/J are members of the AKXL set. A complete listing of commonly used RI sets is given in Table 9.3. Each RI strain in a particular set is distinguished by appending a hyphen to the series name followed by a letter or number. Thus, BXD-15 is a particular RI strain that has been formed from an initial cross between a B6 female and a DBA male. At any point in time, it is always possible to add a new strain to a particular set through an outcross between the same two progenitor strains followed by 20 generations of inbreeding. The RI strains represent an important tool in the arsenal available for linkage studies of newly defined DNA loci.
Recombinant congenic strains (abbreviated as RC strains) are a variation on the recombinant inbred concept (Demant and Hart, 1986). As with RI strains, the initial cross is between two distinct inbred strains. However, the next two generations are generated by backcrossing, without selection, to one of the parental strains. This sequence is followed by brother-sister mating for at least 14 generations. Whereas standard RI strains have genomes that are a mosaic of equal parts derived from both parents (as detailed in Section 9.2.2), RC strains will have mosaic genomes that are skewed in the direction of the parent to which the backcrossing occurred such that a random 7/8 fraction of the genome will be derived from this parent, and a random 1/8 fraction will be derived from the other parent. Sets of RC strains have some interesting properties in terms of limiting the amount of the genome that has to be searched for multiple genes involved in quantitative traits. However, with the new PCR-based methods for genotyping highly polymorphic loci discussed in Section 8.3, the advantages of the RC strains appear to have been superseded and they have not been used widely by the mouse genetics community.
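The 7/8 : 1/8 split follows from the same halving arithmetic used for congenic strains. Each unselected backcross halves the expected contribution of the non-recurrent parent. The sketch below assumes, beyond what the text states explicitly, that the later brother-sister inbreeding (without selection) fixes alleles at random and so leaves this expected fraction unchanged. The function name is illustrative.

```python
from fractions import Fraction


def expected_minor_parent_fraction(backcrosses: int) -> Fraction:
    """Expected fraction of the genome from the non-recurrent parent after an
    F1 followed by `backcrosses` unselected backcrosses to the recurrent
    parent. Assumes later brother-sister inbreeding, without selection,
    leaves this expectation unchanged."""
    if backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    return Fraction(1, 2) ** (backcrosses + 1)


print(expected_minor_parent_fraction(0))  # 1/2  (RI strains: no backcross)
print(expected_minor_parent_fraction(2))  # 1/8  (RC strains: two backcrosses)
```

With no backcross, the expectation is the even split of standard RI strains. With the two backcrosses used for RC strains, it is 1/8, leaving 7/8 from the recurrent parent.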