Ganeshkumar Ganapathy1, Jason T Howard1, James M Ward2, Jianwen Li3, Bo Li3, Yingrui Li3, Yingqi Xiong3, Yong Zhang3, Shiguo Zhou4, David C Schwartz4, Michael Schatz5, Robert Aboukhalil5, Olivier Fedrigo6, Lisa Bukovnik7, Ty Wang2, Greg Wray8, Isabelle Rasolonjatovo9, Roger Winer10, James R Knight10, Sergey Koren11, Wesley C Warren12, Guojie Zhang3, Adam M Phillippy11, Erich D Jarvis1.
Abstract
BACKGROUND: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome.
Keywords: Budgerigar; Hybrid assemblies; Melopsittacus undulatus; Next-generation sequencing; Optical maps; Parakeet; Vocal learning
Year: 2014 PMID: 25061512 PMCID: PMC4109783 DOI: 10.1186/2047-217X-3-11
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Summary of genomic reads
| Library types | Number of reads | Total bases (Mbp) | Coverage |
| Shotgun; 3 kb, 8 kb, 20 kb mate pair | 41,898,557 | 19,736 | 15.4× |
| 220, 230, 500, 400–600, 800 bp; 2 kb, 5 kb, 10 kb, 20 kb, 40 kb paired end | 561,074,047 | 356,597 | 289× |
| 7.5 kb, 13 kb | 4,176,242 | 6,763 | 5.5× |
| Total | 607,148,846 | 383,096 | 309.9× |
Figure 1. The distribution of read lengths in 454, Illumina and PacBio budgerigar sequences. The reads are binned into 5 bp buckets based on their lengths, and the fraction of reads (normalized by the size of the largest bucket) falling into each bucket is shown. Thus, curves shifted towards the right indicate longer read lengths. The reads labeled “20 Kbp”, “8 Kbp”, “3 Kbp”, “FLX Titanium” and “FLX Titanium XL+” are 454 reads. The reads labeled “PacBio pre-release C2” are uncorrected PacBio reads. The Illumina read lengths appear as colored square boxes, since these read lengths are uniform. The “Illumina Duke” reads are of length 76, the “Illumina UK” reads are of length 101, and the “Illumina BGI” reads are of lengths 90 or 150. The longest reads come from PacBio sequencing, followed by 454 FLX+ (i.e., FLX Titanium XL+) sequencing.
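The binning described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes read lengths are already available as a list of integers; the function name `normalized_length_histogram` and the sample lengths are hypothetical and are not taken from the paper's own scripts.

```python
from collections import Counter


def normalized_length_histogram(read_lengths, bin_width=5):
    """Bin read lengths into fixed-width buckets and scale each bucket
    by the size of the largest bucket, so the tallest bucket equals 1.0."""
    # Map every length to the start of its bucket, e.g. 78 -> 75 for 5 bp bins.
    counts = Counter((length // bin_width) * bin_width for length in read_lengths)
    if not counts:
        return {}
    largest = max(counts.values())
    return {start: n / largest for start, n in sorted(counts.items())}


# Illustrative lengths only (not data from the study).
lengths = [76, 76, 78, 101, 101, 101, 101, 150]
hist = normalized_length_histogram(lengths)
for bucket_start, fraction in hist.items():
    print(f"{bucket_start}-{bucket_start + 4} bp: {fraction:.2f}")
```

Normalizing by the largest bucket rather than by the total read count keeps libraries of very different sizes on the same vertical scale, which is presumably why the caption describes it that way.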
Summary of assemblies
| Assembler | Celera CABOG | PBcR assembler | | | SOAPdenovo | PCAP | NA | PCAP | SOAPdenovo | Ray | CLC Genomics Workbench |
| Sequence method | 454 FLX, FLX+, Illumina | PacBio corrected with Illumina, 454 FLX, FLX+ | 454 FLX, FLX+, Illumina, Optical Maps | PacBio corrected with Illumina, 454 FLX, FLX+, Optical Maps | Illumina, 454 FLX+ | Sanger | Sanger, 454 | Sanger v2.1 | Illumina | Illumina | Illumina, 454 FLX+ |
| Coverage | 14× | 17× | | | 137.59× Illumina, 6.85× FLX+ | 6× | 19.1× | 7.1× | 107× | 26.9× | 26× |
| Genome size | 1.2 Gbp | 1.2 Gbp | 1.2 Gbp | 1.2 Gbp | 1.2 Gbp | 1.2 Gbp | 1.2 Gbp | 1.05 Gbp | 1.2 Gbp | 1.58 Gbp | 1.2 Gbp |
| Total bases in scaffolds | 1,117,358,947 | 1,219,132,003 | 1,118,758,630 | 1,241,439,339 | 1,169,860,945 | 1,224,525,252 | 1,046,932,099 | 1,047,124,295 | 1,174,046,505 | 1,164,566,833 | 997,000 |
| Number of scaffolds | 25,212 | 54,668 | 25,163 | 54,138 | 151,393 | 37,698 | 15,932 | 23,776 | 21,224 | 148,255 | 140,453 |
| Avg. scaffold size | 44,319 | 22,300 | 44,460 | 22,931 | 7,727 | 32,482 | 65,713 | 44,041 | 55,317 | 7,855 | Not available |
| N50 scaffold size | 10,614,387 | 1,705,751 | 13,823,040 | 7,280,340 | 13,497,021 | 10,409,499 | 90,216,835 | 11,125,310 | 3,891,469 | 19,470 | 15,968 |
| Largest scaffold size | 39,887,647 | 11,564,683 | 61,483,320 | 33,208,800 | 66,566,439 | 56,620,707 | 195,276,750 | 51,053,708 | 18,327,016 | 206,462 | 177,843 |
| Total gaps in scaffolds | 51,150 | 26,444 | 51,295# | 27,118 | 60,810 | 124,736 | NA | NA | 77,368 | Not available | Not available |
| Number of Contigs | 70,863 | 77,556 | NA | NA | 212,203 | 126,053 | 27,027 | 85,191 | 98,540 | 259,423 | 214,754* |
| Avg. contig size | 15,334 | 15,344 | NA | NA | 4,664 | 9,714 | 38,736 | 12,291 | 11,914 | 4,304 | Not available |
| N50 contig size | 55,633 | 102,885 | NA | NA | 51,034 | 38,549 | 279,750 | 45,280 | 28,599 | 6,983 | 6,366 |
| Largest contig size | 465,633 | 849,044 | NA | NA | 500,974 | 424,635 | NA | 624,663 | 247,807 | 75,003 | 87,225 |
*The Chicken v4 assembly consists of chromosomes and not scaffolds, which explains the very high scaffold length statistics.
#The increased number of gaps in megascaffolds reflects the fact that each megascaffold may be a merger of many original scaffolds with gaps in between them.
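For readers unfamiliar with the contiguity statistics in the table above, the following Python sketch shows how an N50 and an average sequence size are conventionally computed. It uses the standard definition of N50 (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of all bases); the record does not state how the assemblers computed their values, so treat this as an illustration rather than the authors' procedure. The function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Toy example (not data from the study): total = 290, half = 145.
toy = [80, 70, 50, 40, 30, 20]
print(n50(toy))

# Average scaffold size is total bases divided by scaffold count; the
# first column of the table (1,117,358,947 bases, 25,212 scaffolds)
# is consistent with the reported average.
avg_scaffold = round(1_117_358_947 / 25_212)
print(avg_scaffold)
```

Because N50 is weighted by length, a handful of very long scaffolds can raise it sharply even when the scaffold count stays high, which is why the table reports both statistics.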
Figure 2. Number of nucleotide gaps as a measure of relative assembly incompleteness. A) Shows the total number of gaps in genes and the surrounding 10,000 base pair regions upstream and downstream (collectively called gene territories). B) Shows the number of such gene territories with gaps. In both panels, different species assemblies are colored differently, with the budgerigar assemblies shown in dark blue. The budgerigar assemblies with the “-mega” suffix are optical map enhanced versions of the Budgerigar_v6.3 and PBcR assemblies. The budgerigar assemblies have the highest numbers of gapless gene territories (right panel) and the fewest gaps of all assemblies except the recent chicken v4 assembly, which used a similar technology (left panel).
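The two quantities in Figure 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: genes and gaps are given as `(scaffold, start, end)` tuples in the same coordinate system, a gap counts toward a territory if it overlaps it at all, and a gap lying in two overlapping territories is counted once for each. The function name `count_territory_gaps` and the sample coordinates are hypothetical; the authors' actual pipeline is not described in this record.

```python
FLANK = 10_000  # bp added upstream and downstream of each gene, per the caption


def count_territory_gaps(genes, gaps, flank=FLANK):
    """Count gaps that fall in gene territories.

    genes, gaps: iterables of (scaffold, start, end) tuples.
    Returns (total gaps across all territories,
             number of territories containing at least one gap).
    """
    total_gaps = 0
    territories_with_gaps = 0
    for scaffold, gene_start, gene_end in genes:
        terr_start = max(0, gene_start - flank)
        terr_end = gene_end + flank
        # A gap is a hit if it is on the same scaffold and overlaps the territory.
        hits = sum(
            1
            for gap_scaffold, gap_start, gap_end in gaps
            if gap_scaffold == scaffold
            and gap_start <= terr_end
            and gap_end >= terr_start
        )
        total_gaps += hits
        if hits > 0:
            territories_with_gaps += 1
    return total_gaps, territories_with_gaps


# Illustrative coordinates only (not data from the study).
genes = [("scf1", 50_000, 60_000), ("scf1", 200_000, 210_000), ("scf2", 5_000, 8_000)]
gaps = [
    ("scf1", 45_000, 45_100),
    ("scf1", 65_000, 65_500),
    ("scf1", 150_000, 150_200),
    ("scf2", 30_000, 30_100),
]
result = count_territory_gaps(genes, gaps)
print(result)
```

The nested scan is quadratic in the worst case; for genome-scale inputs an interval index per scaffold would be the usual replacement, but the counting logic stays the same.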
Figure 3. The number of genes that are part of a syntenic block between different budgerigar assemblies (A) and between budgerigar and non-budgerigar assemblies (B). The numbers were calculated from CoGE syntenic dotplots (not shown), as the total number of genes represented in syntenic blocks. The y-axis limits have been cut off close to the minimum value in the plot to show a more detailed spread of values.