Elizabeth A Robb, Mary E Delany.
Abstract
Chicken developmental mutants are valuable for discovering sequences and pathways controlling amniote development. Herein we applied two advanced technologies, targeted genomic sequence capture enrichment and next-generation sequencing, to discover the causative elements for three inherited mutations affecting craniofacial, limb and/or organ development. Because the mutations (coloboma, diplopodia-1 and wingless-2) had been bred into a congenic line series and previously mapped to different chromosomes, each mutant's targeted causative region could be compared with the same region in the other two congenic lines, thereby providing internal controls on a single array. Of the ~73 million 50-bp sequence reads, ~76% were specific to the enriched target regions, with an average target coverage of 132-fold. Analysis of the three targeted regions (2.06 Mb combined) identified line-specific single nucleotide polymorphisms (SNPs) and micro-indels (1–3 nt). Sequence was also generated for regions annotated as gaps in the reference genome, contributing to its refinement. Additionally, Mauve alignments were constructed and indicated putative chromosomal rearrangements. This is the first report of targeted capture array technology in an avian species, the chicken, an important vertebrate model; the work highlights the utility of employing advanced technologies in an organism with only a "draft stage" reference genome sequence.
Year: 2012 PMID: 24704915 PMCID: PMC3899949 DOI: 10.3390/genes3020233
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Targeted Sequence Genomic Enrichment Methodology, Part I. Targeted genomic capture enrichment paired with next-generation sequencing was used to sequence, in their entirety, the three chromosomal regions associated with three developmental mutations in the chicken. The approximate time from contracting the service provider to data transfer (steps 1–6) was ~6 months.
Figure 2. Targeted Sequence Genomic Enrichment Methodology, Part II. Analysis of the targeted genomic capture enrichment and next-generation sequencing data allowed identification of variants and chromosomal rearrangements, which were further validated using new mutant samples in order to identify the causative element for each developmental mutation. The approximate time from obtaining the raw SOLiD™ (colorspace) sequence reads to validating the identified sequence variants (steps 7–8) was ~6 months.
Statistics of the three regions captured using the targeted genomic capture enrichment technology.
| Genetic Line | Chr | Size (nt) | No. Genes | No. Gaps (A) | NCBI Mean Quality Score | GC Richness |
|---|---|---|---|---|---|---|
| Diplopodia-1.003 | 1 | 595,821 | 19 | 5 | 87.3% | 39.6% |
| Wingless-2.331 | 12 | 465,570 | 13 | 15 | 94.5% | 41.7% |
| Coloboma.003 | Z | 994,523 | 6 | 0 | 94.5% | 39.4% |
| Total | - | 2,055,914 | 38 | 20 | - | - |
| Average | - | 685,305 | 12.7 | 7 | 92.1% | 40.2% |
Three distinct chromosomal regions were targeted by the genomic capture enrichment array. Descriptive measures include the targeted chromosome and region size, the number of genes and sequence gaps within each targeted region, percent GC richness, and quality score. (A) Gaps were identified by assessing the linked genomic region for each of the three mutations in the UCSC Genome Browser [4]. Gaps in the Dp-1.003 chromosome 1 region were ≤1000 nt, while gaps in the Wg-2.331 region were ≤1500 nt. No gaps were present in the 995 kb chromosome Z region for the Co.003 genetic line. However, given the repetitive nature of the Z chromosome, probes were generated for only 990,270 of the 994,523 nt; the remaining 4,253 nt lie in a genomic region shown to be no longer linked to the Co.003 mutation [1].
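The GC richness and gap counts in this table can be derived from the reference sequence of each region. The Python sketch below is a minimal illustration, not the authors' pipeline: it assumes the region is available as a plain string (for example, extracted from the galGal3 assembly), treats assembly gaps as runs of `N`, and computes GC percent over unambiguous A/C/G/T bases only. The function names `gc_content` and `assembly_gaps` are hypothetical.

```python
import re


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A/C/G/T); N and other codes are ignored.

    Assumption: GC richness is computed over called bases only.
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    gc = s.count("G") + s.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def assembly_gaps(seq: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs of at least min_len."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


if __name__ == "__main__":
    region = "ACGTNNNNGGCCAT"  # toy sequence, not real chicken data
    print(gc_content(region), assembly_gaps(region))
```

Counting only A/C/G/T in the denominator keeps long N runs from deflating the GC percentage; a different convention would shift the values slightly.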
Summary of targeted genomic enrichment sequencing results for three developmental mutant congenic lines.
| Genetic Line | Total Reads Generated (A) | Total Mapped Reads (B) | Average Coverage (C) | Region Sequenced (D) | No. Coverage Gaps (E) | Remaining Size (nt) | Fold Reduction | No. Genes Remaining |
|---|---|---|---|---|---|---|---|---|
| Diplopodia-1.003 | 21.0M | 15.5M | 107.2× | 96.9% | 232 | 261,947 | 2.3× | 12 |
| Wingless-2.331 | 36.0M | 28.3M | 217.1× | 85.3% | 274 | 259,545 | 1.8× | 13 |
| Coloboma.003 | 15.7M | 11.9M | 72.1× | 98.4% | 525 | 306,847 | 1.3× | 5 |
| Total | 72.7M | 55.7M | - | - | 1,031 | 828,339 | - | 30 |
| Average | 24.2M | 18.6M | 132.1× | 93.5% | 344 | 276,113 | 1.8× | 10 |
Targeted genomic capture enrichment results: read statistics and post-analysis assessment. Descriptive measures include total reads generated, total reads mapped to the targeted region (2.06 Mb), average fold coverage, percentage of the region sequenced, and number of sequence/coverage gaps. Assessment of the sequence reads reduced the size of each linked region.
(A) Total number of reads generated; M = million.
(B) Total reads mapped to the 2.06 Mb sequence used in the capture array; M = million. The entire chicken genome (Gallus gallus v2.1 (galGal3) assembly (WASHUC2, May 2006)) was used for mapping (chr 1–28, W, and the 995 kb Z), rather than only the three linked chromosomal segments.
(C) Average fold sequence coverage of the 2.06 Mb targeted region.
(D) Region sequenced refers to the dp-1, wg-2, or co sequence (596, 466, and 995 kb, respectively) from which RNA-bait probes were generated and to which sequence data were aligned. For example, of the 596 kb dp-1 sequence for which RNA-bait probes were generated, 96.9% (~577 kb) had sequence reads mapped to it; thus 3.1% of the region had no mapped reads.
(E) A gap was defined as any fragment of DNA absent in a genetic line relative to the reference genome sequence. A gap could be due to: (1) an incorrectly designed RNA-bait probe; (2) failure of the NGS technology to sequence the DNA fragment because of sequence structure/quality (e.g., repeats, GC-rich sequence); (3) unknown sequence (N) in the reference genome, for which no probe could be generated even though the size of the fragment was known and accounted for in the reference; or (4) absence of that DNA fragment in the genetic line of interest, i.e., a "true deletion".
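Average coverage, percent of region sequenced, and coverage gaps can all be derived from per-base read depth over a target. The sketch below assumes a per-base depth list for one target region (for example, exported from an alignment tool). It approximates gaps as runs of zero depth, which captures the observable symptom but cannot distinguish among the four causes listed in footnote E, and it assumes fold reduction is the ratio of original to remaining region size. `summarize_depth`, `fold_reduction`, and `on_target_pct` are hypothetical helpers, not part of the study's workflow.

```python
from dataclasses import dataclass


@dataclass
class CoverageSummary:
    mean_depth: float            # average fold coverage across the target
    pct_covered: float           # % of target bases with >= 1 mapped read
    gaps: list[tuple[int, int]]  # 0-based, half-open zero-depth intervals


def summarize_depth(depth: list[int]) -> CoverageSummary:
    """Summarize per-base read depth for one targeted region."""
    if not depth:
        raise ValueError("depth must contain at least one base")
    n = len(depth)
    covered = sum(1 for d in depth if d > 0)
    gaps: list[tuple[int, int]] = []
    start = None  # start of the current zero-depth run, if any
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d > 0 and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:  # region ends inside a gap
        gaps.append((start, n))
    return CoverageSummary(sum(depth) / n, 100.0 * covered / n, gaps)


def fold_reduction(original_nt: int, remaining_nt: int) -> float:
    """Assumed definition: original linked-region size / size remaining after analysis."""
    return original_nt / remaining_nt


def on_target_pct(mapped_reads: float, total_reads: float) -> float:
    """Percentage of all reads that mapped to the targeted regions."""
    return 100.0 * mapped_reads / total_reads
```

For example, `fold_reduction(595_821, 261_947)` gives about 2.27, in line with the 2.3× listed for Diplopodia-1.003, and `on_target_pct(55.7, 72.7)` gives about 76.6%, the ~76% on-target fraction reported in the abstract.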
Assessment of SNPs and micro-indels (1–23 nt) within the sequenced (2.06 Mb) region of three congenic lines: Diplopodia-1.003, Wingless-2.331, Coloboma.003.
| Chr | SNPs, Dp-1.003 | SNPs, Wg-2.331 | SNPs, Co.003 | Insertions, Dp-1.003 | Deletions, Dp-1.003 | Insertions, Wg-2.331 | Deletions, Wg-2.331 | Insertions, Co.003 | Deletions, Co.003 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2,434 | 2,478 | | 116 | 130 | 109 | 110 | | |
| 12 | 1,245 | | 1,225 | 79 | 101 | | | 71 | 93 |
| Z | 2,903 | 1,787 | | 128 | 171 | 150 | 185 | | |
Total number of SNPs, insertions, and deletions identified within the targeted regions of interest.
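Counts like those above come from sorting each variant call by type. A minimal Python sketch follows, assuming VCF-style REF/ALT alleles with left-anchored indels (one shared leading base). It is an illustration rather than the study's variant-calling workflow; `classify` and `tally` are hypothetical names, and the optional `max_indel` parameter (also an assumption) restricts the tally to short indels.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Classify one biallelic REF/ALT pair as SNP, insertion, deletion, or other."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # equal-length multi-base substitution (MNP)


def tally(variants, max_indel=None):
    """Count variant types in (chrom, pos, ref, alt) records.

    If max_indel is given, indels longer than max_indel nt are skipped.
    """
    counts = Counter()
    for _chrom, _pos, ref, alt in variants:
        kind = classify(ref, alt)
        indel_len = abs(len(alt) - len(ref))
        if kind in ("insertion", "deletion") and max_indel is not None and indel_len > max_indel:
            continue
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    calls = [  # toy records, not data from this study
        ("1", 100, "A", "G"),
        ("1", 200, "A", "AT"),
        ("1", 300, "ACG", "A"),
        ("Z", 50, "C", "T"),
        ("Z", 90, "T", "TGGGGG"),
    ]
    print(tally(calls), tally(calls, max_indel=3))
```

Classifying by allele length keeps the logic independent of any particular variant caller, but it relies on indels being represented in the left-anchored form.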
Figure 3. Illustration of the comparative genomic analysis strategy used to identify mutant-specific polymorphisms. Comparative genomic analyses (CGA) were conducted to eliminate SNPs and indels that were shared by two or more of the congenic lines or that matched previously identified polymorphisms. Variants not observed in any other genetic line or chicken species were denoted as unique and as potential causative elements. The analysis is illustrated using the UCD-Coloboma.003 linked region on chromosome Z: all three genetic lines were assessed for polymorphisms at the same location, and variants shared between Co.003 and any other line were removed. Only the elements unique to that region in the Coloboma.003 genetic line are of interest and should be further assessed for causation.
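The filtering step in this comparative analysis amounts to a set difference. The Python sketch below assumes each line's calls over the same region are keyed as (chromosome, position, REF, ALT) tuples and that the previously identified polymorphisms are available as another such set. `unique_variants` is a hypothetical name, and exact-key matching assumes indels were normalized to a single representation beforehand.

```python
from collections.abc import Set as AbstractSet

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def unique_variants(
    target_line: str,
    calls: dict[str, set[Variant]],
    known: AbstractSet[Variant] = frozenset(),
) -> set[Variant]:
    """Return variants found only in target_line: not shared with any other
    line in `calls` and not among previously known polymorphisms."""
    others: set[Variant] = set()
    for line, variants in calls.items():
        if line != target_line:
            others |= variants
    return calls[target_line] - others - known


if __name__ == "__main__":
    calls = {  # toy calls for one shared region, not real data
        "Co.003": {("Z", 100, "A", "G"), ("Z", 200, "C", "T"), ("Z", 300, "G", "GA")},
        "Dp-1.003": {("Z", 100, "A", "G")},
        "Wg-2.331": {("Z", 200, "C", "T")},
    }
    print(unique_variants("Co.003", calls))
```

With this keying, a Co.003 variant also called in Dp-1.003 or Wg-2.331 is dropped, leaving only Co.003-specific candidates for validation.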