Jocelyn P Colella, Anna Tigano, Matthew D MacManes.
Abstract
High-throughput sequencing technologies are a proposed solution for accessing the molecular data in historical specimens. However, degraded DNA combined with the computational demands of short-read assemblies has posed significant laboratory and bioinformatics challenges for de novo genome assembly. Linked-read or "synthetic long-read" sequencing technologies, such as 10× Genomics, may provide a cost-effective alternative for assembling higher-quality de novo genomes from degraded tissue samples. Here, we compare assembly quality (e.g., genome contiguity and completeness, presence of orthogroups) between four new deer mouse (Peromyscus spp.) genomes assembled using linked-read technology and four published genomes assembled from a single shotgun library. At a similar price point, these approaches produce vastly different assemblies: the linked-read assemblies have higher overall contiguity and completeness, measured by larger N50 values and a greater number of genes assembled, respectively. As a proof of concept, we used annotated genes from the four Peromyscus linked-read assemblies and eight additional rodent taxa to generate a phylogeny, which reconstructed the expected relationships among species with 100% support. Although not without caveats, our results suggest that linked-read sequencing approaches are a viable option for building de novo genomes from degraded tissues, which may prove particularly valuable for taxa that are extinct, rare or difficult to collect.
Keywords: Peromyscus; 10× Genomics; assembly quality; natural history collections
Year: 2020 PMID: 32153100 PMCID: PMC7496956 DOI: 10.1111/1755-0998.13155
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
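Scaffold N50, the contiguity measure used in the abstract, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition, assuming the scaffold lengths are already available as a list of integers (the function name `n50` and the toy lengths are illustrative, not taken from the study):

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths in bp.

    Scaffolds are sorted longest first and their lengths summed until the
    running total reaches at least half of the total assembly length; the
    length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled sums to avoid fractional halves of odd totals.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: 280 bp in total, so half is 140 bp.
# 100 bp alone falls short; 100 + 80 = 180 bp crosses the threshold.
print(n50([100, 80, 50, 30, 20]))  # 80
```

A larger N50 means more of the genome sits in long scaffolds, which is why the 10× assemblies' scaffold N50 values of roughly 32 to 40 kb indicate higher contiguity than the shotgun assemblies' values of roughly 2 to 10 kb.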
Natural history data for the specimens sequenced using 10× Genomics (Peromyscus spp.) and for the publicly available shotgun assemblies used for comparison
| Common name | Genus | Species | Collection year | Collection locality | Voucher | Publication |
|---|---|---|---|---|---|---|
| Texas deer mouse | | | 1995 | Texas, USA | MSB:Mamm:84733 | This study |
| Aztec deer mouse | | | 1982 | Michoacan, Mexico | MSB:Mamm:48205 | This study |
| Plateau deer mouse | | | 2006 | Coahuila, Mexico | MSB:Mamm:273915 | This study |
| La Carpintera deer mouse | | | 1995 | Guanacaste, Costa Rica | MSB:Mamm:70743 | This study |
| Red vizcacha rat | | | na | Mendoza, Argentina | AO245 | Evans, Upham, Golding, Ojeda, and Ojeda ( |
| Mountain vizcacha rat | | | na | San Juan, Argentina | AO248 | Evans et al. ( |
| Siberian hamster | | | na | Laboratory | Unvouchered | Bao et al. ( |
| Three‐banded armadillo | | | na | na | Voucher not reported | Johnson et al. ( |
Qubit concentrations of the original DNA extractions, DNA quantities before and after size selection with Circulomics (short-read eliminator kit), the percentage of DNA lost, and peak fragment sizes before and after size selection
| Species | Collection year | Qubit concentration (ng/μl) | DNA quantity (μg) pre‐Circulomics | DNA quantity (μg) post‐Circulomics | Percentage lost (%) | Peak size (kb) pre‐Circulomicsᵃ | Peak size (kb) post‐Circulomicsᵃ | Peak size (kb) post‐Circulomicsᵇ |
|---|---|---|---|---|---|---|---|---|
| | 1995 | 390 | 10 | 0.391 | 96 | 24 | 30 | 16 |
| | 1982 | 370 | 10 | 0.755 | 92 | 12 | 18 | 11 |
| | 2006 | 191 | 10 | 4.2 | 58 | — | — | 9 |
| | 1995 | 486 | 10 | 0.115 | 99 | 27 | 27 | 10 |
| | 1997 | 282 | 10 | 2.4 | 76 | 26 | 26 | — |
| | 2017 | 208 | 10 | 7.1 | 29 | >60 | >60 | — |

ᵃ TapeStation measurement.
ᵇ Femto Pulse measurement.
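The percentage-lost column is consistent with computing (pre − post) / pre × 100 from the two DNA-quantity columns. A short Python check, assuming that formula (the source does not state it) and using the quantities from the table; rounding to whole numbers reproduces every reported value:

```python
def percent_lost(pre_ug, post_ug):
    """Percentage of DNA mass lost during size selection.

    Assumes loss = (pre - post) / pre * 100, with both quantities in μg.
    """
    if pre_ug <= 0:
        raise ValueError("pre-selection quantity must be positive")
    return 100.0 * (pre_ug - post_ug) / pre_ug


# (collection year, μg before Circulomics, μg after Circulomics), from the table
samples = [(1995, 10, 0.391), (1982, 10, 0.755), (2006, 10, 4.2),
           (1995, 10, 0.115), (1997, 10, 2.4), (2017, 10, 7.1)]

# Prints 96, 92, 58, 99, 76 and 29, matching the "Percentage lost" column.
for year, pre, post in samples:
    print(year, round(percent_lost(pre, post)))
```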
Sequencing and assembly quality statistics for each examined genome, including library preparation, sequencing platform, assembler, number of reads (PE, paired-end; SE, single-end), scaffold N50, longest scaffold, percentage of complete BUSCOs, number of orthogroups and number of genes annotated
| Species | Library preparation | Sequencing platform | Assembler(s) | Number of reads (million) | Scaffold N50 (bp) | Longest scaffold (bp) | BUSCO (%) | Orthogroups | Number of genes |
|---|---|---|---|---|---|---|---|---|---|
| 10× assemblies | | | | | | | | | |
| | 10× Genomics | Illumina HiSeq X | | 409 PE | 40,046 | 284,598 | 66.4 | 14,035 [9,560] | 19,008 |
| | 10× Genomics | Illumina HiSeq X | | 423 PE | 34,920 | 176,898 | 61.7 | 13,525 [9,122] | 18,061 |
| | 10× Genomics | Illumina HiSeq X | | 405 PE | 32,296 | 386,405 | 55.1 | 13,080 [8,748] | 17,244 |
| | 10× Genomics | Illumina HiSeq X | | 377 PE | 38,243 | 240,034 | 63.1 | 13,533 [9,188] | 17,960 |
| Comparative shotgun assemblies | | | | | | | | | |
| | Shotgun | SL Illumina HiSeq | | 342 PE (7 SE) | 4,698 | 113,146 | 23 | [5,305] | 11,177 |
| | Shotgun | SL Illumina HiSeq | | 168 PE (3 SE) | 4,874 | 75,086 | 16.9 | [4,154] | 9,631 |
| | Shotgun | SL Illumina HiSeq | | na | 2,392 | 34,960 | 12.7 | [2,337] | 3,233 |
| | Shotgun | SL Illumina HiSeq | | na | 10,217 | 251,906 | 42.1 | [7,171] | 19,557 |
| Phylogenetic outgroup taxa | | | | | | | | | |
| | na | Illumina HiSeq | na | na | 115,033,041 | 193,310,054 | 95.3 | 15,593 | 20,060 |
| | na | Illumina | na | na | 117,603,569 | 213,001,178 | 95.4 | 15,667 | 19,768 |
| | Shotgun + SMRTbell | PacBio, Illumina HiSeq, Hi‐C | Platanus | 794 PE | 114,273,790 | 193,658,164 | 94.5 | 15,147 | 19,740 |
| | Shotgun | Illumina HiSeq | | 674 PE | 151,693 | 4,085,094 | 78 | 13,322 | 17,282 |
| | Shotgun | Illumina HiSeq | | na | 20,878 | 372,325 | 63.4 | 13,905 | 18,310 |
| | Shotgun | Illumina HiSeq | | na | 101,373 | 1,032,205 | 86.6 | 14,351 | 24,474 |
| | Shotgun + BAC | Sanger, SOLiD, PacBio | | na | 145,729,302 | 282,763,074 | 91.6 | 14,257 | 27,491 |
| | Shotgun + BAC | na | | >44 | 130,694,993 | 195,471,971 | 95.2 | 13,883 | 21,907 |
SL, single lane. Orthogroups: number of orthogroups identified among the 12 species included in the phylogenetic analysis and, in square brackets, the number identified among the 10× and shotgun assemblies.
Kajitani et al. (2014).
Chin et al. (2016).
Chin et al. (2013).
Walker et al. (2014).
Butler et al. (2008).
Atlas Assembly Suite, Havlak et al. (2004).
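The contrast between the two groups in the table can be summarized with simple group means. The Python sketch below averages the four 10× and the four shotgun assemblies for scaffold N50, BUSCO completeness and gene count; these means and fold differences are plain arithmetic on the tabulated values, not statistics reported by the authors.

```python
from statistics import mean

# Values copied from the sequencing and assembly statistics table.
tenx = {
    "n50_bp": [40_046, 34_920, 32_296, 38_243],
    "busco_pct": [66.4, 61.7, 55.1, 63.1],
    "genes": [19_008, 18_061, 17_244, 17_960],
}
shotgun = {
    "n50_bp": [4_698, 4_874, 2_392, 10_217],
    "busco_pct": [23, 16.9, 12.7, 42.1],
    "genes": [11_177, 9_631, 3_233, 19_557],
}

for metric in tenx:
    a, b = mean(tenx[metric]), mean(shotgun[metric])
    # Fold difference of the 10x group mean over the shotgun group mean.
    print(f"{metric}: 10x mean {a:,.2f}, shotgun mean {b:,.2f}, ratio {a / b:.2f}")
```

For scaffold N50 the 10× mean (36,376.25 bp) is about 6.6 times the shotgun mean (5,545.25 bp). The means hide spread within groups, however: the fourth shotgun assembly annotates 19,557 genes, more than any of the 10× assemblies, even though its N50 (10,217 bp) remains well below theirs.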
FIGURE 1. Maximum‐likelihood phylogeny of the four new linked‐read Peromyscus genome assemblies, three publicly available Peromyscus assemblies and five outgroup assemblies within Rodentia. The phylogeny, generated from consensus orthogroups, demonstrates complete resolution (100% bootstrap support for all nodes). The heat map details the number of shared orthogroups across taxa, with the diagonal indicating the total number of orthogroups identified for each species.
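The heat map in Figure 1 counts orthogroups shared by each pair of taxa, with each taxon's total on the diagonal. A minimal Python sketch of that matrix, assuming orthogroup membership has already been parsed into a set of orthogroup IDs per taxon (the parsing step, the function name, and the taxon and orthogroup IDs below are hypothetical):

```python
def shared_orthogroup_matrix(membership):
    """Build a square matrix of orthogroups shared between taxa.

    membership maps each taxon name to the set of orthogroup IDs in which
    that taxon has at least one gene. Cell [i][j] is the size of the
    intersection for taxa i and j, so the diagonal holds each taxon's total.
    """
    taxa = sorted(membership)
    matrix = [
        [len(membership[a] & membership[b]) for b in taxa]
        for a in taxa
    ]
    return taxa, matrix


# Hypothetical example with three taxa and four orthogroups.
membership = {
    "taxon_A": {"OG1", "OG2", "OG3"},
    "taxon_B": {"OG2", "OG3", "OG4"},
    "taxon_C": {"OG3"},
}
taxa, matrix = shared_orthogroup_matrix(membership)
for name, row in zip(taxa, matrix):
    print(name, row)
# taxon_A [3, 2, 1]
# taxon_B [2, 3, 1]
# taxon_C [1, 1, 1]
```

The matrix is symmetric, so it can be passed directly to any heat-map plotting routine; the choice of plotting tool is left open here.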