Emily Telfer1, Natalie Graham1, Lucy Macdonald1, Shane Sturrock1,2, Phillip Wilcox3, Lisa Stanbra1.
Abstract
A wide diversity of bioinformatic tools is available for the assembly of next generation sequence data and subsequent variant calling to identify genetic markers at scale. Integration of genomics tools, such as genomic selection, association studies, pedigree analysis and analysis of genetic diversity, into operational breeding is a goal for New Zealand's most widely planted exotic tree species, Pinus radiata. In the absence of full reference genomes for megagenomes such as those of conifers, RNA sequencing across a range of genotypes and tissue types offers a rich source of genetic markers for downstream application. We compared nine different assembler and variant calling software combinations in a single transcriptomic library and found that Single Nucleotide Polymorphism (SNP) discovery could vary by as much as an order of magnitude (8,061 SNPs up to 86,815 SNPs). The assembler with the best realignment of the packages trialled, Trinity, in combination with several variant callers, was then applied to a much larger multi-genotype, multi-tissue transcriptome and identified 683,135 in silico SNPs across a predicted 449,951 exons when mapped to the Pinus taeda ver 1.01e reference.
Year: 2018 PMID: 30395612 PMCID: PMC6218030 DOI: 10.1371/journal.pone.0205835
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Transcriptomes generated from the following tissues.
| Genotype | Tissues | Date | Tree Owner | Tree location |
|---|---|---|---|---|
| Tree 1 | 6 year old | Mar 2008 | Waimangu Forest owned by Kaingaroa Timberlands | LAT -38.258 |
| | 6 year old | Mar 2008 | | |
| Tree 2 | Needles (N) | Nov 2012 | Scion Clonal archive | LAT -38.156 |
| | Spring Buds (SB) | Nov 2012 | | |
| Tree 3 | Needles (N) | Nov 2012 | Scion Clonal archive | LAT -38.156 |
| | Needles (infected) (NI) | Nov 2012 | | |
| | Spring Buds (SB) | Nov 2012 | | |
| Tree 4 | Needles (N) | Nov 2012 | Scion Clonal archive | LAT -38.156 |
| | Spring Buds (SB) | Nov 2012 | | |
| Tree 5 | Needles (N) | Nov 2012 | Scion Clonal archive | LAT -38.156 |
| | Needles (infected) (NI) | Nov 2012 | | |
| Tree 6 | Spring xylem (SPX) 1.4 metres | Nov 2000 | Scion research Trial RO 664/3 | LAT -38.622 |
| | Summer xylem (SUX) 1.4 metres | Mar 2001 | | |
| | Autumn Buds (AB) | Mar 2001 | | |
| | Summer phloem (Ph) 1.4 metres | Mar 2001 | | |
| Tree 7 | 2 year old Seedling xylem (X) | Oct 2012 | Scion Field Trial | LAT -38.155 LON 176.268 |
| | 2 year old Seedling phloem (Ph) | Oct 2012 | | |
| Tree 8 | Summer xylem (SUX) 1.4 metres | Mar 2001 | Scion research Trial RO 664/3 Forest owned by Kaingaroa Timberlands | LAT -38.622 |
Fig 1. Tissues used to isolate RNA.
A) compression (CW) and opposite wood (OW), B) developing xylem and phloem, C) developing buds and D) on-tree needles inoculated and un-inoculated with Phytophthora pluvialis spores.
Software tools used for short read sequence alignment and SNP detection.
| Software | Function | Version | Reference |
|---|---|---|---|
| SOAPdenovo | | | Li |
| SOAPdenovo-Trans | | 1.03 | Xie |
| Trinity | | r2012-01-25 | Grabherr |
| Trinity RNASeq | | r2013-02-25 | Grabherr |
| Velvet | | 1.2.10 | Zerbino et al. (2008) |
| Oases | | 0.2.08 | Schulz |
| BWA | Global alignment | 0.5.9-r16 | Li and Durbin (2009) |
| Bowtie2 | Global alignment | 2.1.0 | Langmead |
| MAQ | Quality score conversion, global alignment, polymorphic site identification | 0.7.1 | Li, Ruan and Durbin (2008) |
| rtg-GA | Global alignment, polymorphic site identification | 2.2.1 | |
| Mosaik | Global alignment | 1.1.0021 | Lee (2010) |
| GATK | Local realignment, polymorphic site identification | 1.0.5777 | McKenna |
| BLAST | Similarity searching (Basic Local Alignment Search Tool) | 2.2.28+ | Altschul |
| PERL | Scripting language for file manipulation | 5.10.1 | Christians |
| SAMtools | Polymorphic site identification | 0.1.14 | Li |
| Freebayes | Polymorphic site identification | 0.6.5 | |
Different workflows applied for short read sequence alignment and SNP detection in Tree 1 pilot assemblies.
| Pipeline | Quality score | Global alignment software | Local realignment software | Polymorphic identification |
|---|---|---|---|---|
| 1 | Solexa | BWA | - | SAMtools |
| 2 | Sanger | BWA | - | SAMtools |
| 3 | Sanger | BWA | GATK | SAMtools |
| 4 | Sanger | BWA | GATK | GATK |
| 5 | Sanger | BWA | GATK | Freebayes |
| 6 | Sanger | MAQ | - | MAQ |
| 7 | Solexa | rtg-GA | - | rtg-GA |
| 8 | Solexa | Mosaik | GATK | GATK |
| 9 | Solexa | Mosaik | GATK | Freebayes |
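
The pipelines above differ in whether base qualities are read as Solexa or Sanger scores. As a minimal sketch of what such a conversion involves, the Python below applies the standard Solexa-to-Phred formula, Q_phred = 10·log10(10^(Q_solexa/10) + 1), and re-encodes a quality string from ASCII offset 64 to offset 33. The function names are hypothetical, and this is not necessarily the exact procedure the authors ran (the software table lists MAQ for quality score conversion).

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_string_to_sanger(qual):
    """Re-encode a Solexa quality string (ASCII offset 64) as Sanger (offset 33)."""
    converted = []
    for ch in qual:
        q_solexa = ord(ch) - 64                    # decode Solexa score
        q_phred = round(solexa_to_phred(q_solexa))  # map to nearest Phred score
        converted.append(chr(q_phred + 33))         # encode with Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Solexa 40, '@' encodes Solexa 0 (illustrative input only)
    print(solexa_string_to_sanger("h@"))
```

The two scales agree closely at high quality and diverge for low scores, which is one reason the same reads can yield different SNP calls when the encoding is interpreted differently.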
Summary of transcriptome assemblies for each genotype using Trinity v2.0.
| Tree ID | Tissues | Total trimmed Contigs | Total length | Min contig (b) | Median contig (b) | Mean contig (b) | Max contig (b) | N50 Contig | N50 Length (b) | N90 Contig | N90 Length |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tree 1 | OW, CW | 240,053 | 189,954,978 | 201 | 384 | 791 | 16,502 | 35,503 | 1,517 | 160,376 | 288 |
| Tree 1 | OW, CW | 137,228 | | 201 | | | 9,175 | | | | |
| Tree 2 | N, SB | 174,382 | 135,676,827 | 201 | 377 | 778 | 11,558 | 26,833 | 1,504 | 116,553 | 281 |
| Tree 3 | N, SB | 144,891 | 128,260,169 | 201 | 417 | 885 | 13,455 | 22,746 | 1,735 | 92,347 | 309 |
| Tree 4 | SB, N, NI | 164,911 | 140,803,864 | 201 | 419 | 853 | 11,048 | 26,095 | 1,625 | 107,140 | 305 |
| Tree 5 | N, NI | 194,849 | 142,994,312 | 201 | 350 | 733 | 11,536 | 28,358 | 1,433 | 132,473 | 267 |
| Tree 6 | SPX, SUX, AB, Ph | 223,427 | 189,323,752 | 201 | 420 | 847 | 16,579 | 34,701 | 1,591 | 145,615 | 727 |
| Tree 7 | SUX, Ph | 122,659 | 114,034,559 | 201 | 505 | 929 | 9,798 | 21,562 | 1,672 | 78,394 | 346 |
| Tree 8 | SUX | 112,461 | 110,811,316 | 201 | 511 | 985 | 12,357 | 19,320 | 1,819 | 70,137 | 359 |
1 See Table 1 for tissue codes
2 N50 contig is the number of large contigs that collectively contain 50% of the nucleotide bases.
3 N50 Length is the length of the shortest N50 contig.
4 N90 contig is the number of large contigs that collectively contain 90% of the nucleotide bases.
5 N90 Length is the length of the shortest N90 contig.
6 SOAPdenovo assembly
7 Trinity assembly
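
The N50 and N90 definitions in the footnotes above can be sketched in a few lines of Python. This is an illustrative implementation of those definitions only; the function name `nx_stats` and the toy contig lengths are assumptions, not values or code from the study.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (contig count, shortest length) for the largest contigs that
    together hold at least `fraction` of all assembled bases.

    fraction=0.5 gives (N50 contig, N50 length); 0.9 gives the N90 pair.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)   # largest contigs first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    return len(ordered), ordered[-1]          # guard against rounding


if __name__ == "__main__":
    toy_contigs = [100, 80, 60, 40, 20]       # hypothetical lengths in bases
    print(nx_stats(toy_contigs, 0.5))
    print(nx_stats(toy_contigs, 0.9))
```

For the toy lengths, the two largest contigs already cover half of the 300 bases, so the N50 contig count is 2 and the N50 length is 80.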
Pair-wise analysis of SNPs predicted among pairs of pipelines.
Diagonal line represents SNPs unique to that combination, with the number of total quality SNPs identified by each method shown in the final row.
| | Pipeline 1 | Pipeline 2 | Pipeline 3 | Pipeline 4 | Pipeline 5 | Pipeline 6 | Pipeline 7 | Pipeline 8 | Pipeline 9 |
|---|---|---|---|---|---|---|---|---|---|
| Pipeline 1 | 2,251 | 4,175 | 4,161 | 800 | 1,080 | 2,730 | 3,873 | 758 | 782 |
| Pipeline 2 | | 46 | 32,048 | 4,670 | 7,623 | 15,470 | 21,184 | 4,117 | 4,677 |
| Pipeline 3 | | | 28 | 4,651 | 7,615 | 15,450 | 21,159 | 4,116 | 4,671 |
| Pipeline 4 | | | | 1,663 | 1,385 | 6,106 | 6,764 | 7,808 | 863 |
| Pipeline 5 | | | | | 6,598 | 8,111 | 11,506 | 1,301 | 2,643 |
| Pipeline 6 | | | | | | 8,684 | 41,058 | 6,325 | 7,153 |
| Pipeline 7 | | | | | | | 16,194 | 7,246 | 7,897 |
| Pipeline 8 | | | | | | | | 21,846 | 2,060 |
| Pipeline 9 | | | | | | | | | 31,154 |
| Total quality SNPs identified | | | | | | | | | |
| Quality scores | Solexa | Sanger | Sanger | Sanger | Sanger | Sanger | Solexa | Solexa | Solexa |
| Global alignment software | BWA | BWA | BWA | BWA | BWA | MAQ | rtg-GA | Mosaik | Mosaik |
| Local alignment software | - | - | GATK | GATK | GATK | - | - | GATK | GATK |
| Polymorphism identification | SAMtools | SAMtools | SAMtools | GATK | Freebayes | MAQ | rtg-GA | GATK | Freebayes |
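
A pair-wise table of this kind can be derived from per-pipeline SNP sets. The sketch below assumes each SNP is keyed by a (contig, position) tuple and reads the diagonal as SNPs reported by that pipeline and no other; both the keying and the toy call sets are assumptions made for illustration, not the authors' data or script.

```python
from itertools import combinations


def pairwise_overlap(calls):
    """Build an upper-triangular overlap table from {pipeline: set of SNPs}.

    Diagonal cells hold SNPs unique to one pipeline; off-diagonal cells
    hold SNPs shared by the two pipelines in the pair.
    """
    names = list(calls)
    table = {}
    for name in names:
        # Union of everything called by the other pipelines
        others = set().union(*(calls[o] for o in names if o != name))
        table[(name, name)] = len(calls[name] - others)
    for a, b in combinations(names, 2):
        table[(a, b)] = len(calls[a] & calls[b])
    return table


if __name__ == "__main__":
    toy_calls = {                                  # hypothetical SNP calls
        "p1": {("c1", 10), ("c1", 50), ("c2", 7)},
        "p2": {("c1", 10), ("c2", 7)},
        "p3": {("c1", 10), ("c3", 99)},
    }
    for cell, count in pairwise_overlap(toy_calls).items():
        print(cell, count)
```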
Frequency of SNP detection across all 9 discovery pipelines.
| Number of pipelines detecting the same SNP | Number of SNPs |
|---|---|
| 9 | 6 |
| 8 | 74 |
| 7 | 626 |
| 6 | 2,991 |
| 5 | 7,208 |
| 4 | 11,021 |
| 3 | 15,888 |
| 2 | 37,867 |
| 1 | 88,464 |
| Total SNPs | 164,145 |
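
The detection-frequency tally above amounts to counting, for every distinct SNP, how many pipelines reported it. A minimal Python sketch follows, assuming SNPs are keyed by (contig, position) tuples; the function name and toy call sets are hypothetical.

```python
from collections import Counter


def detection_frequency(calls_by_pipeline):
    """Map 'number of pipelines detecting a SNP' to 'number of such SNPs'."""
    per_snp = Counter()
    for snps in calls_by_pipeline.values():
        per_snp.update(set(snps))        # each pipeline counts once per SNP
    return Counter(per_snp.values())


if __name__ == "__main__":
    toy_calls = {                        # hypothetical SNP calls
        "p1": {("c1", 10), ("c1", 50), ("c2", 7)},
        "p2": {("c1", 10), ("c2", 7)},
        "p3": {("c1", 10), ("c3", 99)},
    }
    freq = detection_frequency(toy_calls)
    for n_pipelines in sorted(freq, reverse=True):
        print(n_pipelines, freq[n_pipelines])
    print("Total SNPs", sum(freq.values()))
```

Summing the per-frequency counts gives the number of distinct SNPs, which is how the nine rows of the table add up to the 164,145 total.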
Summary of SNP discovery within individual genotypes.
| Genotype | Tissues | rtg-GA | GATK | SAMtools | Total SNPs | Unique SNPs |
|---|---|---|---|---|---|---|
| Tree 1 | OW, CW | 59,744 | 27,627 | 65,554 | 152,925 | 108,319 |
| Tree 2 | N, SB | 58,320 | 23,192 | 53,715 | 135,227 | 92,232 |
| Tree 3 | N, SB | 48,786 | 20,587 | 44,897 | 114,270 | 76,912 |
| Tree 4 | SB, N, NI | 57,550 | 21,303 | 41,184 | 120,037 | 87,291 |
| Tree 5 | N, NI | 63,650 | 15,023 | 39,853 | 118,526 | 89,516 |
| Tree 6 | SPX, SUX, AB, Ph | 63,716 | 33,965 | 58,707 | 156,388 | 107,290 |
| Tree 7 | X, Ph | 39,171 | 19,300 | 37,053 | 95,524 | 65,695 |
| Tree 8 | SUX | 35,761 | 14,433 | 33,938 | 84,132 | 55,880 |
1 See Table 1 for tissue codes
2 Total SNPs is the cumulative total for a genotype across the three algorithms.
3 Unique SNPs is the cumulative total for a genotype, with redundant detections across multiple algorithms removed.
4 Cumulative total for all genotypes; SNPs may be counted multiple times if they appear in multiple genotypes.
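
The distinction between the cumulative total and the de-duplicated count in the footnotes above can be illustrated with set operations. The sketch below is an assumption about how such counts could be derived, using hypothetical SNP keys; only the Tree 1 per-caller counts in the final check come from the table.

```python
def summarise_genotype(calls_by_caller):
    """Return (cumulative total, de-duplicated total) for one genotype.

    calls_by_caller maps a variant caller name to its set of SNP keys.
    """
    total = sum(len(snps) for snps in calls_by_caller.values())
    unique = len(set().union(*calls_by_caller.values()))
    return total, unique


if __name__ == "__main__":
    toy = {                                   # hypothetical SNP keys
        "rtg-GA": {"a", "b", "c"},
        "GATK": {"b"},
        "SAMtools": {"b", "c", "d"},
    }
    print(summarise_genotype(toy))
    # Tree 1 per-caller counts from the table sum to its Total SNPs value
    print(59_744 + 27_627 + 65_554 == 152_925)
```

The cumulative total counts a SNP once per caller that reports it, so it can only equal the de-duplicated count when no two callers agree on any site.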