Rami A Dalloul, Julie A Long, Aleksey V Zimin, Luqman Aslam, Kathryn Beal, Le Ann Blomberg, Pascal Bouffard, David W Burt, Oswald Crasta, Richard P M A Crooijmans, Kristal Cooper, Roger A Coulombe, Supriyo De, Mary E Delany, Jerry B Dodgson, Jennifer J Dong, Clive Evans, Karin M Frederickson, Paul Flicek, Liliana Florea, Otto Folkerts, Martien A M Groenen, Tim T Harkins, Javier Herrero, Steve Hoffmann, Hendrik-Jan Megens, Andrew Jiang, Pieter de Jong, Pete Kaiser, Heebal Kim, Kyu-Won Kim, Sungwon Kim, David Langenberger, Mi-Kyung Lee, Taeheon Lee, Shrinivasrao Mane, Guillaume Marcais, Manja Marz, Audrey P McElroy, Thero Modise, Mikhail Nefedov, Cédric Notredame, Ian R Paton, William S Payne, Geo Pertea, Dennis Prickett, Daniela Puiu, Dan Qioa, Emanuele Raineri, Magali Ruffier, Steven L Salzberg, Michael C Schatz, Chantel Scheuring, Carl J Schmidt, Steven Schroeder, Stephen M J Searle, Edward J Smith, Jacqueline Smith, Tad S Sonstegard, Peter F Stadler, Hakim Tafer, Zhijian Jake Tu, Curtis P Van Tassell, Albert J Vilella, Kelly P Williams, James A Yorke, Liqing Zhang, Hong-Bin Zhang, Xiaojun Zhang, Yang Zhang, Kent M Reed.
Abstract
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high-quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparison of avian with mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in the number and variety of genes of the avian immune system, where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and the genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene-level and chromosome-level assemblies of other species of agricultural, ecological, and evolutionary interest.
Year: 2010 PMID: 20838655 PMCID: PMC2935454 DOI: 10.1371/journal.pbio.1000475
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Summary of the Roche 454 and Illumina GAII data used for assembling the turkey genome sequence.
| Data Type | Number of Reads (Million) | Average Usable Read Length (bp) |
| 454/Roche data: | | |
| Shotgun | 13 | 366 |
| 3 Kbp paired end | 3 | 180 |
| 20 Kbp paired end | 1 | 195 |
| Illumina data: | | |
| Shotgun | 200 | 74 |
| 180 bp paired end | 200 | 74 |
Chromosome sizes in the draft turkey genome assembly.
| Chromosome | Number of Contigs | Number of Bases (Excluding Gaps) |
| 1 | 26,557 | 181,826,552 |
| 2 | 14,384 | 106,718,223 |
| 3 | 12,649 | 91,132,767 |
| 4 | 9,170 | 68,844,569 |
| 5 | 7,553 | 56,965,239 |
| 6 | 6,534 | 48,705,183 |
| 7 | 4,755 | 35,338,084 |
| 8 | 4,751 | 35,279,744 |
| 9 | 2,286 | 18,014,631 |
| 10 | 3,733 | 28,668,829 |
| 11 | 2,720 | 22,659,912 |
| 12 | 2,372 | 18,944,919 |
| 13 | 2,354 | 18,696,996 |
| 14 | 2,367 | 19,181,786 |
| 15 | 2,265 | 16,791,072 |
| 16 | 1,967 | 14,411,805 |
| 17 | 1,635 | 12,015,459 |
| 18 | 51 | 139,801 |
| 19 | 1,399 | 9,478,246 |
| 20 | 1,424 | 9,943,105 |
| 21 | 1,328 | 9,405,728 |
| 22 | 1,865 | 13,252,797 |
| 23 | 937 | 6,420,024 |
| 24 | 569 | 3,613,335 |
| 25 | 834 | 4,963,017 |
| 26 | 1,040 | 5,925,429 |
| 27 | 161 | 687,724 |
| 28 | 717 | 4,244,239 |
| 29 | 803 | 3,649,262 |
| 30 | 693 | 3,524,564 |
| W | 50 | 108,225 |
| Z | 24,970 | 47,735,835 |
| Un | 7,748 | 18,627,908 |
| Total | 152,641 | 935,915,009 |
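As a quick consistency check, the abstract's figure of 917 Mb assigned to specific chromosomes can be recovered from this table by subtracting the unplaced ("Un") row from the total. The sketch below is illustrative only: the constant and function names are invented for this example, and it assumes the "Un" row is the only sequence not assigned to a chromosome.

```python
# Values copied from the chromosome-size table (bases, excluding gaps).
TOTAL_BASES = 935_915_009
UNPLACED_BASES = 18_627_908  # the "Un" row; assumed to be the only unassigned sequence


def assigned_bases(total: int, unplaced: int) -> int:
    """Return the number of bases placed on a named chromosome."""
    return total - unplaced


assigned = assigned_bases(TOTAL_BASES, UNPLACED_BASES)
print(f"{assigned:,} bases assigned, about {assigned / 1e6:.0f} Mb")
```

Under that assumption the difference is 917,287,101 bases, which rounds to the 917 Mb quoted in the abstract.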
Major characteristics of the turkey and chicken genome assemblies.
| Characteristic | Turkey 2.01 | Chicken 2.1 |
| Number of scaffolds >1 Kb | 26,917 | 32,767 |
| Number of contigs >1 Kb | 128,271 | 98,612 |
| Scaffolded sequence (excluding gaps) | 931 Mb | 1,047 Mb |
| Largest scaffold | 9 Mb | 33 Mb |
| N50 scaffold size | 1.5 Mb | 7.1 Mb |
| N50 contig size | 12,594 bp | 36,000 bp |
| Largest contig | 90 Kb | 442 Kb |
| Contig coverage | 17× | 7× |
| Cost of sequencing | <$0.25 M | >$10 M |
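The N50 values in this table summarize assembly contiguity. The following is a minimal sketch of the usual definition, assuming N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembled bases. The `n50` helper and the example lengths are hypothetical and are not taken from the turkey assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy lengths: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
print(example_n50)
```

Because N50 is weighted by length, a few very large scaffolds raise it sharply, which is one reason the scaffold and contig N50 values in the table differ by two orders of magnitude.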
Figure 1. Synteny map of chicken (left) and turkey (right).
Each chicken chromosome is assigned a color, ranging from red (Chr 1) through the spectrum to yellow, green, and blue. Turkey chromosomes are shown using the same colors, so a color mismatch indicates a difference in chromosome numbering; e.g., turkey Chr 8 matches chicken Chr 6. The figure shows that there have been no large-scale chromosomal rearrangements in either species since their divergence.
Figure 2. Venn diagram showing the amount of sequence (in Mbp) aligned among the three avian genomes.
Numbers in brackets refer to the amount of sequence that is part of the alignments, but as species-specific insertions. For instance, out of the 142 Mbp of the turkey genome not aligned to the other two genomes, 105 Mbp are included in the alignments as turkey-specific insertions. The lower panel shows an example alignment. Regions where all three species are aligned are highlighted with a black line, and species-specific sequence is shown with an arrow.
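The turkey example in this caption implies a remainder that is not stated directly. A small sketch of that arithmetic follows, using only the two numbers quoted in the caption; the variable names are made up for illustration.

```python
# Both values are quoted in the Figure 2 caption (Mbp).
TURKEY_UNALIGNED_MBP = 142   # turkey sequence not aligned to chicken or zebra finch
TURKEY_INSERTION_MBP = 105   # portion present in alignments as turkey-specific insertions

# Sequence that is neither aligned nor included as an insertion within an alignment.
outside_alignments_mbp = TURKEY_UNALIGNED_MBP - TURKEY_INSERTION_MBP
print(f"{outside_alignments_mbp} Mbp of turkey sequence falls outside the alignments")
```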
Conservation of repetitive DNA.
| Repeat Group | Number of Repeats | Total Length of Repeats | Total Number of Conserved Bases | As a Percentage |
| Eulor | 1,581 | 214,392 | 130,210 | 60.73% |
| UCONS | 3,281 | 508,818 | 262,553 | 51.60% |
| MER | 1,686 | 225,328 | 127,573 | 56.62% |
| X*-LINE | 876 | 125,896 | 63,185 | 50.19% |
| SINE | 2,900 | 413,703 | 166,890 | 40.34% |
Listed are the numbers of repeats and their conservation for the most conserved repeats.
GERP constrained elements were used to define the set of conserved bases.
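The percentage column appears to be the number of conserved bases divided by the total repeat length. The sketch below recomputes it from the table rows under that assumption; the `REPEAT_ROWS` structure and the `percent_conserved` helper are names chosen for this example.

```python
# (repeat group, total length of repeats, total number of conserved bases), from the table.
REPEAT_ROWS = [
    ("Eulor", 214_392, 130_210),
    ("UCONS", 508_818, 262_553),
    ("MER", 225_328, 127_573),
    ("X*-LINE", 125_896, 63_185),
    ("SINE", 413_703, 166_890),
]


def percent_conserved(total_length: int, conserved_bases: int) -> float:
    """Conserved bases as a percentage of total repeat length."""
    return 100.0 * conserved_bases / total_length


percentages = {
    name: f"{percent_conserved(total, conserved):.2f}%"
    for name, total, conserved in REPEAT_ROWS
}
for name, value in percentages.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the percentages listed in the table to two decimal places.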
Figure 3. Top 20 most expanded and contracted gene families in the turkey genome assembly as compared to the chicken.
The axis is the log ratio of copy number in turkey versus copy number in chicken.
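To illustrate how such an axis value could be derived, here is a minimal sketch of a copy-number log ratio. The caption does not state the logarithm base, so base 2 is an assumption here, and the `log_ratio` helper and the copy numbers passed to it are hypothetical.

```python
import math


def log_ratio(turkey_copies: int, chicken_copies: int) -> float:
    """Log2 of turkey copy number over chicken copy number (base 2 is assumed)."""
    if turkey_copies <= 0 or chicken_copies <= 0:
        raise ValueError("copy numbers must be positive for a log ratio")
    return math.log2(turkey_copies / chicken_copies)


# Hypothetical families: positive values mean expansion in turkey, negative mean contraction.
expanded = log_ratio(4, 1)
contracted = log_ratio(1, 4)
unchanged = log_ratio(3, 3)
print(expanded, contracted, unchanged)
```

A log scale makes expansions and contractions of the same fold change symmetric around zero, which suits a figure that shows both.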
Top 20 avian-specific gene families with known functions.
| Family ID | Turkey | Chicken | Zebra Finch | Non-Avian Species | Description |
| ENSFM00500000278106 | 5 | 5 | 2 | 0 | Cytidine deaminase |
| ENSFM00250000010664 | 1 | 3 | 1 | 0 | C type lectin |
| ENSFM00520000517850 | 1 | 3 | 10 | 0 | Class II histocompatibility antigen b l, beta chain fragment |
| ENSFM00250000011687 | 1 | 2 | 1 | 0 | Early response to neural induction ERNI |
| ENSFM00540000719139 | 1 | 1 | 1 | 0 | 16 kDa beta galactoside binding lectin C, 16 galectin (CG 16) |
| ENSFM00250000030665 | 1 | 1 | 1 | 0 | 2 receptor |
| ENSFM00500000306697 | 1 | 1 | 1 | 0 | 28 s ribosomal S6 mitochondrial S6mt MRP-S6 |
| ENSFM00540000721500 | 1 | 1 | 1 | 0 | Amyloid precursor |
| ENSFM00250000013480 | 1 | 1 | 1 | 0 | B6 BU |
| ENSFM00500000292985 | 1 | 1 | 1 | 0 | CD30 ligand |
| ENSFM00540000719360 | 1 | 1 | 2 | 0 | CD30 precursor |
| ENSFM00500000279114 | 1 | 1 | 2 | 0 | CD47 glycoprotein |
| ENSFM00540000720384 | 1 | 1 | 1 | 0 | CD5 precursor |
| ENSFM00540000719692 | 1 | 1 | 1 | 0 | CD80 |
| ENSFM00500000291092 | 1 | 1 | 1 | 0 | CD86 precursor |
| ENSFM00500000281340 | 1 | 1 | 1 | 0 | CENP-C |
| ENSFM00500000296154 | 1 | 1 | 1 | 0 | Centromere Q [CENP-Q] |
| ENSFM00540000721306 | 1 | 1 | 1 | 0 | Centromere U [CENP-U]; centromere p50 of 50 kDa CENP-50 MLF1 interacting protein |
| ENSFM00500000287565 | 1 | 1 | 1 | 0 | Cholecystokinin precursor CCK [contains cholecystokinin (CCK); CCK-8; CCK-7] |
| ENSFM00560000772828 | 1 | 1 | 1 | 0 | COMM domain-containing protein 6 |
Figure 4. Lineage events in the turkey: variation of (a) synonymous (dS), (b) non-synonymous (dN), and (c) dN/dS ratios based on chromosome sizes.
The chromosome lengths are expressed as log base 2 of the nucleotide length in base pairs.
Figure 5. Rapid evolution of sex-linked genes in birds.
Figure 6. Significant GO terms in the accelerated genes in: (a) turkey compared with chicken and (b) chicken compared with turkey.
The number in parentheses indicates the non-redundant number of genes in each group. The representative term in each group was selected manually.
Figure 7. Comparison of the dN/dS ratios between innate immune related genes and other genes.
Error bars indicate 95% standard error of the mean dN/dS ratios. Significance tests were performed using the Wilcoxon rank sum test because the dN/dS ratios were not normally distributed (Table S9).
Innate immune system genes found in turkey, chicken, zebra finch, mouse, and human genomes.
| | Birds | | | Mammals | |
| Gene Family Name | Turkey | Chicken | Zebra Finch | Human | Mouse |
|
| |||||
| CCL chemokines | 11 | 14 | 11 | 27 | 24 |
| CXCL/CX3CL chemokines | 7 | 9 | 9 | 12 | 13 |
| XCL chemokines | 1 | 2 | 1 | ||
| Chemokine receptors | 14 | 15 | 14 | 20 | 20 |
|
| |||||
| IL-1 | 2 | 4 | 2 | 10 | 9 |
| IL-1 receptor family | 11 | 11 | 11 | 11 | 11 |
| IL10 family | 4 | 4 | 4 | 6 | 5 |
| IL-10 receptor family | 5 | 5 | 5 | 5 | 5 |
| IL-12 receptor family | 2 | 2 | 2 | 4 | 4 |
| IL-16 family | 1 | 1 | 1 | 1 | 1 |
| IL-17 family | 5 | 5 | 5 | 6 | 6 |
| IL-32 | 1 | ||||
| IL-33 | 1 | 1 | |||
| IL-5 family | 1 | 1 | 1 | 1 | 1 |
| IL-6 family | 3 | 3 | 4 | 7 | 7 |
| IL-6 receptor family | 3 | 4 | 5 | 7 | 9 |
| Common gamma chain family | 8 | 8 | 8 | 8 | 8 |
| Common gamma chain receptor family | 10 | 12 | 11 | 12 | 12 |
| Other interleukins receptors | 4 | 4 | 5 | 7 | 7 |
|
| |||||
| Interferons | 4 | 8 | 5 | 21 | 23 |
| Interferon receptors | 6 | 6 | 6 | 6 | 6 |
| CSFs | 4 | 4 | 3 | 4 | 4 |
| CSF1R | 1 | 1 | 1 | 1 | 1 |
| TGFs | 2 | 3 | 3 | 3 | 3 |
|
| |||||
| TNFSF | 9 | 10 | 10 | 18 | 18 |
| TNFRSF | 15 | 17 | 20 | 20 | 19 |
|
| |||||
| Defensins | 18 | 17 | 22 | 39 | 45 |
|
| |||||
| NODL receptor family | 6 | 6 | 6 | 22 | 32 |
| RNA helicases | 2 | 2 | 3 | 3 | 3 |
| TLRs | 10 | 10 | 11 | 10 | 12 |
Major repeat content in the turkey genome (also see Dataset S1).
| Repeat Type | Count | Total bp (% of Genome) |
| | 166,756 | 49,130,504 (4.81) |
| LTR retrotransposon | 16,181 | 5,181,044 (0.51) |
| Mariner (Class II DNA transposon) | 19,527 | 6,640,260 (0.65) |
| Unclassified interspersed repeats | 83,060 | 10,010,105 (0.98) |
| Total interspersed repeats | 285,524 | 70,961,913 (6.95) |
| Low complexity and simple repeats | 200,695 | 7,872,500 (0.77) |
| Grand total | 486,219 | 78,128,846 (7.63) |
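The interspersed-repeat subtotal can be checked against its component rows, and the reported percentage implies the genome size used as the denominator. The sketch below does both, assuming the four rows above the subtotal are its only components and that the percentage is total bp divided by genome size. The first row's label is missing in the source and is left unnamed here; all variable names are invented for this example.

```python
# (repeat type, count, total bp) for the rows above "Total interspersed repeats".
INTERSPERSED_ROWS = [
    ("(label missing in source)", 166_756, 49_130_504),
    ("LTR retrotransposon", 16_181, 5_181_044),
    ("Mariner (Class II DNA transposon)", 19_527, 6_640_260),
    ("Unclassified interspersed repeats", 83_060, 10_010_105),
]
REPORTED_COUNT = 285_524
REPORTED_BP = 70_961_913
REPORTED_PERCENT = 6.95  # percent of genome, as printed in the table

count_sum = sum(count for _, count, _ in INTERSPERSED_ROWS)
bp_sum = sum(bp for _, _, bp in INTERSPERSED_ROWS)

# Genome size implied by the subtotal and its rounded percentage (an estimate only).
implied_genome_bp = REPORTED_BP / (REPORTED_PERCENT / 100)

print(count_sum == REPORTED_COUNT, bp_sum == REPORTED_BP)
print(f"implied genome size: about {implied_genome_bp / 1e9:.2f} Gb")
```

Both component sums match the reported subtotal, and the implied denominator is about 1.02 Gb. Since the percentage is rounded to two decimals, that figure is approximate.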