Martijn Staats, Roy H J Erkens, Bart van de Vossenberg, Jan J Wieringa, Ken Kraaijeveld, Benjamin Stielow, József Geml, James E Richardson, Freek T Bakker.
Abstract
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both the generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are designed to take short, fragmented DNA molecules as templates. Here we show that, using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly, we retrieved 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contamination was observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses that are becoming increasingly horizontal.
Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.
Year: 2013 PMID: 23922691 PMCID: PMC3726723 DOI: 10.1371/journal.pone.0069189
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Specimen information, tissue type sampled, DNA yield and DDBJ/EMBL/GenBank accession. See Table S2 for further specimen information.
| Species, type of material | Sample/Collection date | Tissue type sampled (Total DNA yield in ng) | DDBJ/EMBL/GenBank study accession |
| | | | |
| | 21 April 1969 | Leaf (2400) | ERP001797 |
| | July 2010 | Leaf (9890) | ERP001798 |
| | 28 June 1897 | Leaf (3500) | ERP001799 |
| | 8 July 2010 | Leaf (9405) | ERP001800 |
| | 17 May 1946 | Leaf (30000) | ERP001801 |
| | 8 July 2010 | Leaf (15000) | ERP001802 |
| | | | |
| | 16 November 1990 | Basidiome (15000) | ERP001803 |
| | 4 October 1931 | Basidiome (8000) | ERP001804 |
| | 7 October 1989 | Basidiome (45000) | ERP001805 |
| | | | |
| | December 1999 | Complete specimen (650) | - |
| | December 2010 | Part of larva stadium (9100) | - |
| | July 1992 | One rear leg (1500) | ERP001808 |
| | December 2010 | Two legs (750), and thorax/head (1000) | ERP001807 |
| | April 1995 | Three legs (800), and thorax/head (1200) | ERP001806 |
NASC = The European Arabidopsis Stock Centre, Nottingham, UK.
Alignments of reads generated for archival plant, fungal and insect specimens, and fresh control tissues. All percentages of reads are relative to the number of trimmed reads.
| Specimen, type of material | Trimmed reads | Mapped reads (%) | Uniquely mapped reads (%)¹ | Read depth³ | % Genome coverage⁴ | Reference genome: Genome | Reference genome: Source |
| | 36,926,748 | 22,021,533 (59.6) | 16,345,196 (44.3) | 12.1 (12.8) | 94.0 (98.4) | | GCF_000001735.3 |
| | 36,926,748 | 3,771,157 (10.2) | 293,443 (0.79) | 167.2 (167.2) | 100 | | NC_000932 |
| | 93,810,738 | 4,169,008 (4.4) | 241,721 (0.26) | 174.4 (174.7) | 99.9 | | fresh tissue, this study² |
| | 54,746,358 | 2,592,231 (4.7) | 311,437 (0.57) | 171.4 (171.5) | 99.9 | | NC_008326 |
| | | | | | | | |
| | 30,898,216 | 19,780,655 (64.0) | 16,344,621 (52.9) | 12.1 (12.1) | 99.8 (100) | | GCF_000001735.3 |
| | 30,898,216 | 3,185,803 (10.3) | 307,989 (0.99) | 175.4 (175.4) | 100 | | NC_000932 |
| | 44,672,406 | 2,467,061 (5.52) | 244,192 (0.55) | 176.2 (176.4) | 99.9 | | fresh tissue, this study² |
| | 52,065,984 | 8,307,499 (15.9) | 316,717 (0.61) | 174.3 (174.3) | 100 | | NC_008326 |
| | | | | | | | |
| | 23,852,078 | 13,330,723 (55.9) | 10,525,133 (44.1) | 28.7 (30.0) | 95.4 (97.9) | | H97 v2.0, MycoCosm |
| | 23,852,078 | 1,808,428 (7.6) | 310,765 (1.30) | 156.3 (156.3) | 99.9 | | H97 v2.0, MycoCosm |
| | 49,124,456 | 22,591,856 (45.9) | 14,969,141 (30.5) | 20.7 (29.1) | 71.2 (81.4) | | v2.0, MycoCosm |
| | 49,124,456 | 1,020,156 (2.1) | 81,921 (0.17) | 166.1 (166.5) | 99.9 | | herbarium, this study² |
| | 50,890,906 | 23,594,103 (46.4) | 15,909,901 (31.3) | 35.8 (45.6) | 78.4 (88.8) | | PC15 v2.0, MycoCosm |
| | 50,890,906 | 238,898 (0.47) | 81,226 (0.16) | 127.3 (127.4) | 99.9 | | herbarium, this study² |
| | | | | | | | |
| | 49,813,018 | 29,789 (0.06) | 8,994 (0.02) | 37.6 (41.4) | 90.9 | | NC_008221 |
| | 29,343,030 | 657 (0.002) | 594 (0.002) | 2.3 (3.5) | 63.5 | | NC_006817 |
| | 29,864,834 | 114,771 (0.38) | 24,577 (0.08) | 135.4 (135.4) | 100 | | NC_000857 |
| | 25,896,990 | 211,743 (0.82) | 26,527 (0.10) | 146.1 (146.1) | 100 | | NC_000857 |
| | | | | | | | |
| | 6,487,061 | 31,223 (0.48) | 12,563 (0.19) | 47.8 (49.8) | 95.9 | | NC_008221 |
| | 41,781,720 | 80,323 (0.19) | 26,059 (0.06) | 143.6 (143.8) | 99.9 | | NC_000857 |
| | 32,447,206 | 118,065 (0.36) | 29,045 (0.09) | 159.9 (159.9) | 100 | | NC_000857 |
1 The number of uniquely mapped reads after filtering for PCR duplicates.
2 Reference sequence was generated using de novo assembly.
3 Average read depth for covered positions, i.e. regions with non-zero coverage only, is given in brackets.
4 Percentage coverage of exonic regions is given in brackets.
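The derived columns in the alignment table follow from simple ratios, and a short sketch may make the footnotes concrete. It assumes per-position read depths are available as a plain list; the function names and the toy depth vector are hypothetical and are not taken from the study.

```python
def percent_of_trimmed(count, trimmed_reads):
    """Express a read count as a percentage of the trimmed reads."""
    return 100.0 * count / trimmed_reads


def depth_summary(depths):
    """Summarise per-position read depths.

    Returns a tuple of:
      - mean depth over all reference positions,
      - mean depth over covered positions only (depth > 0),
      - percentage of reference positions that are covered.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = [d for d in depths if d > 0]
    mean_all = sum(depths) / len(depths)
    mean_covered = sum(covered) / len(covered) if covered else 0.0
    pct_covered = 100.0 * len(covered) / len(depths)
    return mean_all, mean_covered, pct_covered


# Read counts from the first data row of the alignment table.
print(round(percent_of_trimmed(22_021_533, 36_926_748), 1))  # 59.6
print(round(percent_of_trimmed(16_345_196, 36_926_748), 1))  # 44.3

# Toy depth vector (hypothetical): two of five positions are uncovered.
print(depth_summary([0, 4, 8, 0, 12]))
```

In this sketch the mean over all positions equals the covered-only mean multiplied by the covered fraction, which is why the bracketed read depth is never smaller than the unbracketed one.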
De novo assemblies of reads generated for archival plant, fungal and insect specimens, and fresh control tissues.
| Specimen, type of material | Velvet settings¹ | Assembly size (Mb) | N50 (nt) | Contigs | Alignable contigs² | % Genome coverage³ | Reference genome |
| | | | | | | | |
| | 27* | 67.1 | 1,107 | 65,388 | 64,024 | 55.5 (16.2) | |
| | 47, 50, 150* | 0.19 | 15,211 | 34 | 8 | 98.4 | |
| | 47, 50, 2000 | 0.82 | 3,614 | 280 | 3 | 81.2 | |
| | 39, 150* | 0.18 | 22,290 | 34 | 6 | 100 | |
| | | | | | | | |
| | 27* | 78.0 | 1,154 | 73,838 | 65,612 | 60.9 (19.8) | |
| | 47, 50, 150* | 0.14 | 9,114 | 28 | 18 | 97.2 | |
| | 47, 50, 2000 | 1.49 | 12,726 | 223 | 3 | 81.4 | |
| | 39, 150* | 0.21 | 8,296 | 43 | 15 | 99.2 | |
| | | | | | | | |
| | 41, 5, 50 | 27.7 | 20,217 | 1,820 | 1,720 | 81.1 (71.0) | |
| | 41, 50, 2000 | 0.24 | 41,776 | 43 | 2 | 99.9 | |
| | 41, 5, 50 | 38.8 | 19,276 | 2,559 | 2,495 | 62.0 (34.4) | |
| | 51, 50, 1000 | 1.55 | 6,809 | 384 | 5 | 24.5 | |
| | 41, 5, 100 | 34.3 | 85,861 | 858 | 725 | 78.5 (56.4) | |
| | 41, 50, 300 | 2.24 | 9,705 | 405 | 9 | 90.2 | |
| | | | | | | | |
| | xx | xx | xx | xx | xx | xx | |
| | 51, 20* | 0.15 | 2,022 | 77 | 0 | 0 | |
| | 39, 150* | 0.04 | 5,249 | 12 | 1 | 96.8 | |
| | 39, 150* | 0.03 | 3,282 | 11 | 2 | 96.8 | |
| | | | | | | | |
| | 51, 20* | 0.43 | 3,650 | 106 | 5 | 90.3 | |
| | 39, 150* | 0.05 | 3,794 | 17 | 2 | 94.2 | |
| | 39, 150* | 0.05 | 2,778 | 25 | 2 | 96.5 | |
1 Velvet settings: k-mer length, coverage cutoff and expected coverage. * Analysis was run in single-end mode.
2 The number of contigs alignable to the reference sequence.
3 Percentage coverage of coding sequences (CDS) is given in brackets.
4 Not shown: the read library contained extensive contamination with bacteriophage (M14428) and fungal DNA (e.g. most closely related to Aspergillus niger rDNA AM270052).
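For readers unfamiliar with the N50 column, the following minimal sketch uses the common definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. The function name and example lengths are hypothetical, and the figures Velvet reports may be computed differently in detail.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly under the common definition.

    Contigs are sorted from longest to shortest; N50 is the length of
    the contig at which the cumulative length first reaches at least
    half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly (hypothetical): total 20 nt, half is 10 nt.
# Cumulative lengths run 6, 11, ... so the threshold is crossed at the 5 nt contig.
print(n50([2, 3, 4, 5, 6]))  # 5
```

A low N50 together with a high contig count, as in several rows of the de novo assembly table, indicates an assembly fragmented into many short contigs.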