Ruqian Lyu, Vanessa Tsui, Davis J McCarthy, Wayne Crismani.
Abstract
Genetic maps have been fundamental to building our understanding of disease genetics and evolutionary processes. The gametes of an individual contain all of the information required to perform a de novo chromosome-scale assembly of an individual's genome, which historically has been performed with populations and pedigrees. Here, we discuss how single-cell gamete sequencing offers the potential to merge the advantages of short-read sequencing with the ability to build personalized genetic maps and open up an entirely new space in personalized genetics.
Year: 2021 PMID: 33874978 PMCID: PMC8054432 DOI: 10.1186/s13059-021-02327-w
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Meiosis and linkage. a Meiosis involves two rounds of cell division following DNA replication. In the first division, meiosis I, homologous chromosomes pair and form crossovers, which create physical links (chiasmata) and exchange genetic material; the division yields two haploid cells with half the number of chromosomes of the original cell. In meiosis II, the sister chromatids segregate to generate four genetically unique gametes (sperm or egg). b Comparison of genetic, cytological, and physical maps, all of which characterize genetic markers. A genetic map is based on the frequency of co-segregation of linked markers. A cytological map can be constructed by labeling certain DNA markers or by particular staining methods. cM, centiMorgan; Mbp, megabase pairs of DNA. c An iMap with an inversion does not alter the DNA sequence but changes the linear ordering of markers. Translocation as a result of chromosome breakage and fusion affects crossover formation and changes the marker distance.
Fig. 2 Pedigree-based maps and iMaps. Genetic maps can be constructed de novo using pedigree data. Personalized genetic maps can be inferred from gamete-derived sequence data.
Fig. 3 Representation of plate- and droplet-based methods for isolating and sequencing gametes and in silico map construction pipeline. a Schematic representation of plate- and bead-based approaches for profiling gametes. Gametes collected from a donor can be processed through plate-based methods: each gamete is placed in an individual compartment, and DNA amplification is carried out within each chamber so that each gamete can then be genotyped using a SNP array or DNA sequencing. Bead-based methods encapsulate either single gametes or high-molecular-weight (HMW) DNA in a droplet with beads that carry a barcode. The pooled barcode-tagged reads are sequenced in parallel, and the barcodes identify the gamete or HMW DNA of origin. b General pipeline for crossover detection for individuals using gamete-based data. Reads from multiple gametes are aggregated for heterozygous SNP (hetSNP) identification. hetSNPs are phased based on SNP co-appearance in gametes. The genotypes of the gametes and the phased hetSNPs are used to construct gamete haplotypes, which are then used for crossover detection. c Illustration of marker ordering for an individual whose marker order and spacing differ from the reference genetic map.
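The last two steps of the pipeline in panel b (building gamete haplotypes from phased hetSNPs, then locating crossovers) can be illustrated with a minimal sketch. It assumes the donor's two haplotypes are already phased and that genotypes are error-free; the function names, the `N` symbol for a missing call, and the toy positions are all hypothetical and are not taken from any published tool.

```python
from typing import List, Optional, Sequence, Tuple


def assign_haplotypes(gamete: Sequence[str],
                      hap1: Sequence[str],
                      hap2: Sequence[str]) -> List[Optional[int]]:
    """Label each hetSNP 0 (matches haplotype 1), 1 (matches haplotype 2),
    or None (missing or uninformative call)."""
    labels: List[Optional[int]] = []
    for obs, a1, a2 in zip(gamete, hap1, hap2):
        if obs == a1:
            labels.append(0)
        elif obs == a2:
            labels.append(1)
        else:
            labels.append(None)
    return labels


def crossover_intervals(labels: Sequence[Optional[int]],
                        positions: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (left, right) marker positions flanking each haplotype switch.
    Missing markers are skipped, which widens the reported interval."""
    intervals: List[Tuple[int, int]] = []
    prev_label: Optional[int] = None
    prev_pos: Optional[int] = None
    for label, pos in zip(labels, positions):
        if label is None:
            continue
        if prev_label is not None and label != prev_label:
            intervals.append((prev_pos, pos))
        prev_label, prev_pos = label, pos
    return intervals


# Toy data: five phased hetSNPs (hypothetical positions in bp).
positions = [100, 200, 300, 400, 500]
hap1, hap2 = "AACGT", "GCTAC"

complete = assign_haplotypes("AACAC", hap1, hap2)
with_gap = assign_haplotypes("AANAC", hap1, hap2)  # 'N' marks a missing call

print(crossover_intervals(complete, positions))  # [(300, 400)]
print(crossover_intervals(with_gap, positions))  # [(200, 400)]
```

This naive switch-counting treats every observed allele as correct, so a single genotyping error would be reported as two spurious crossovers. That limitation is the motivation for probabilistic approaches such as hidden Markov models.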
A comparison of the strengths and limitations of different methods for assaying crossovers. The table is built on the following assumptions: A male mouse is used to obtain sperm. The mouse genome is approximately 2.5 Gbp, and the generation time is 12 weeks. In bead-based single-cell experiments, 1000 gametes are captured and sequenced. 1× genome coverage is used in all sequence-based experiments. Representative costs are taken from experiments in our laboratories, or from appropriate quotes, and are intended as a guide only. Costs cover reagents and sequencing only. Costs for wet-lab and bioinformatics personnel, animal housing, and equipment are not included.
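To give a sense of scale for these assumptions, the sketch below works out the sequencing output implied by 1000 gametes of a 2.5 Gbp genome at 1× coverage. The 150 bp read length is an assumption added for illustration and is not stated in the source; the function name is hypothetical.

```python
import math


def sequencing_requirements(genome_size_bp: int,
                            n_gametes: int,
                            coverage: float,
                            read_length_bp: int):
    """Return (total bases, number of reads) for a gamete sequencing run."""
    total_bases = genome_size_bp * n_gametes * coverage
    n_reads = math.ceil(total_bases / read_length_bp)
    return total_bases, n_reads


total_bases, n_reads = sequencing_requirements(
    genome_size_bp=2_500_000_000,  # ~2.5 Gbp mouse genome (from the caption)
    n_gametes=1000,                # gametes captured (from the caption)
    coverage=1.0,                  # 1x coverage per gamete (from the caption)
    read_length_bp=150,            # ASSUMPTION: not given in the source
)
print(f"{total_bases / 1e12:.1f} Tbp in total, about {n_reads / 1e9:.1f} billion reads")
```

Under these assumptions the run amounts to 2.5 Tbp of sequence, the same output as roughly 1000× coverage of a single genome.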
Fig. 4 Statistical methods for crossover detection using a hidden Markov model. a The true haplotypes of the markers (h1, h2) are unknown, and the transitions between haplotype states are modeled by a hidden Markov model. The genotypes are observed from data and are controlled via an emission model b. Integrating information from the observed data, the transition model, and the emission model, the most likely true haplotype sequence is inferred. b Example of missing markers in Gamete 2 and Gamete 4. In Gamete 4, missing data for marker m4 creates ambiguity in crossover identification. Statistical inference methods can be used to probabilistically assign crossovers to the subinterval where information is missing.
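The inference described in panel a can be sketched as a two-state Viterbi decoder. This is a minimal illustration and not the method of any specific tool: the fixed per-marker switch probability and the genotyping error rate are assumed values, and a real model would typically scale the transition probability with the distance between markers.

```python
import math
from typing import List, Optional, Sequence


def viterbi_haplotypes(obs: Sequence[Optional[int]],
                       switch_prob: float = 0.01,   # ASSUMED crossover chance per step
                       error_rate: float = 0.05     # ASSUMED genotyping error rate
                       ) -> List[int]:
    """Most likely hidden haplotype (0 or 1) at each marker.
    obs holds 0/1 allele labels, or None where the marker is missing."""
    if not obs:
        return []
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state: int, o: Optional[int]) -> float:
        if o is None:
            return 0.0  # missing data is equally likely under both states
        return math.log(1.0 - error_rate) if o == state else math.log(error_rate)

    # Initialisation with a uniform prior over the two haplotypes.
    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    backpointers: List[List[int]] = []

    for o in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emit(s, o))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# One isolated mismatch (index 3) and one missing marker (index 6).
observed = [0, 0, 0, 1, 0, 0, None, 1, 1, 1, 1]
print(viterbi_haplotypes(observed))
```

With these parameters the isolated mismatch is absorbed as a genotyping error, and a single crossover is placed in the interval that contains the missing marker. Viterbi decoding returns only one best path, so it cannot express the ambiguity about which side of the missing marker the crossover falls on; posterior decoding (forward-backward) would be the usual way to assign probabilities to each subinterval.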
| Bulk sequencing | The sequencing of a pool of nuclear DNA from many cells belonging to an individual. |
| centiMorgan | A map unit for measuring recombination to infer relative distances between linked markers. |
| Crossover | Large reciprocal exchanges of DNA between homologous chromosomes which produce recombinant chromatids. Crossovers are required for the correct segregation of chromosomes during meiosis. |
| Crossover interference | A biological phenomenon where one meiotic crossover reduces the probability of a crossover at an adjacent interval, in the same meiosis, in a distance-dependent manner. |
| DNA double-strand break (DSB) | Programmed DNA double-strand breaks are formed to initiate homologous recombination in meiosis I and are essential to make crossovers. |
| Genetic distance | A measure of the likelihood of a crossover occurring between two genetic markers. The smaller the genetic distance between markers, the more likely they will be inherited together. |
| Genome-wide association study | An approach to link genetic variants with traits. |
| Haplotype | A group of alleles that tend to segregate together and are inherited from one parent. |
| Heterochiasmy | Difference in the frequency and location of crossovers occurring between sexes of the same species. |
| Hidden Markov model | A Markov process models a random system in which the future is independent of the past given the current state. A hidden Markov model applies to systems that have the Markov property and whose state variable is unobservable (hidden). It consists of two layers of stochastic processes: Markovian transitions between hidden states along sequential time steps (the transition model) and the distribution of observable data given the hidden states (the emission model). |
| Individual genetic map (iMap) | The genetic map derived from an individual’s gametes. |
| Mapping functions | Haldane: $d = -0.5 \times \ln(1 - 2r)$; Kosambi: $d = 0.25 \times \ln\left(\frac{1 + 2r}{1 - 2r}\right)$, where $r$ is the recombination fraction and $d$ is the map distance in Morgans (1 Morgan = 100 cM). The Haldane mapping function adds mathematical adjustments to the recombination fraction. It assumes that crossover events are random and independent along the chromosome, and that the number of crossover events between two loci follows a Poisson distribution. Haldane's mapping function corrects the underestimation of the crossover rate in larger intervals, which are likely to contain an unobserved even number of crossovers. Kosambi's mapping function was derived from Haldane's and takes crossover interference into account. |
| Markers | Polymorphic DNA sequences that are located at known positions in the genome and used as genetic features to distinguish sequences between people/populations. |
| Markov chain | A stochastic system which models the transitioning among states. The probability of transitioning to any particular state is dependent solely on the current state and time elapsed. |
| Non-crossover | A type of homologous recombination used in the repair of DNA double-strand breaks, which does not result in a crossover. The repair between two homologs is non-reciprocal. |
| Physical distance | An absolute measure of DNA length in nucleotide base pairs. |
| Quantitative trait locus (QTL) | A genomic region that contributes to a trait of interest. QTL mapping often aims to identify the gene that controls the measurable trait. |
| Single-cell sequencing | The sequencing of nucleic acids from an individual cell using optimized short-read sequencing technology. Sequencing single gametes of an individual overcomes the necessity of recruiting thousands of family trios to generate a reference genetic map that is not a representation of any individual. |
| Single-nucleotide polymorphism (SNP) | Alteration of a single nucleotide at a specific position in the genome that is present in a large fraction of the population. |
| Structural variation (SV) | Large genomic alterations, which can include inversions, duplications, translocations, insertions, and deletions. The minimum size is arbitrary, but in this review, SV refers to events > 50 kb unless specified otherwise. |
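The two mapping functions defined in the glossary can be compared numerically with a short sketch. The function names are hypothetical; the formulas are the standard Haldane and Kosambi forms, scaled by 100 to return centiMorgans.

```python
import math


def _check_fraction(r: float) -> None:
    # A recombination fraction lies in [0, 0.5); 0.5 means unlinked markers.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference)."""
    _check_fraction(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for crossover interference)."""
    _check_fraction(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):6.2f} cM, Kosambi {kosambi_cm(r):6.2f} cM")
```

For small recombination fractions both functions stay close to 100 × r. As r grows, the Haldane distance rises faster than the Kosambi distance, because without interference more double crossovers are expected to go unobserved.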