Todd G B McLay, Joanne L Birch, Bee F Gunn, Weixuan Ning, Jennifer A Tate, Lars Nauheimer, Elizabeth M Joyce, Lalita Simpson, Alexander N Schmidt-Lebuhn, William J Baker, Félix Forest, Chris J Jackson.
Abstract
PREMISE: Universal target enrichment kits maximize utility across wide evolutionary breadth while minimizing the number of baits required to create a cost-efficient kit. The Angiosperms353 kit has been successfully used to capture loci throughout the angiosperms, but the default target reference file includes sequence information from only 6-18 taxa per locus. Consequently, reads sequenced from on-target DNA molecules may fail to map to references, resulting in fewer on-target reads for assembly and reducing locus recovery.
Keywords: Angiosperms353; HybPiper; locus recovery; target capture; target file
Year: 2021 PMID: 34336399 PMCID: PMC8312740 DOI: 10.1002/aps3.11420
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
FIGURE 1. Overview of the steps involved in creating the mega353 target file. First, loci in the default353 file are aligned and hidden Markov model (HMM) profiles are created for each locus. The HMM profiles are used to identify these loci in the 1KP transcriptomes, and transcript hits are added to the alignment. The alignment of each locus is then trimmed and, if necessary, grafted; a frameshift correction is performed; and all loci are combined in the mega353 target file. The gray boxes indicate steps the user can take to modify the mega353 target file. The mega353 target file can be filtered (based on sample identifiers in the filtering_options.csv file) to select the samples included in the target file. The BYO_transcriptome.py script can be used to add GenBank or personal transcriptomes to the filtered mega353 target file.
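
The filtering step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' filtering script: it assumes FASTA headers of the form `>sampleID-locusID` (the header convention HybPiper uses for target files), and the function names `parse_fasta` and `filter_target_records`, the sample identifiers, and the sequences are all hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_target_records(fasta_text, keep_samples):
    """Keep only records whose sample identifier is in keep_samples.

    Assumption: headers look like 'sampleID-locusID', so the sample
    identifier is everything before the last hyphen.
    """
    keep = set(keep_samples)
    kept = []
    for header, seq in parse_fasta(fasta_text):
        sample, _, _locus = header.rpartition("-")
        if sample in keep:
            kept.append((header, seq))
    return kept


# Toy target file with invented sample identifiers and sequences.
example = """>SAMP1-4471
ATGGCTAAA
>SAMP2-4471
ATGGCGAAA
>SAMP1-4527
ATGTTTCCC
"""
filtered = filter_target_records(example, ["SAMP1"])
for header, seq in filtered:
    print(f">{header}\n{seq}")
```

In practice the list of identifiers to keep would come from a selection file such as the filtering_options.csv mentioned in the caption; its exact layout is not described here, so reading it is left out of the sketch.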
Summary of recovery statistics produced by HybPiper comparing the default353 target set to the mega353 target set (filtered by family or order). Values represent averages of each data set for each target file.
| Data set (no. of samples) | Target file | % reads on target (average) | No. of loci with sequences (average) | No. of loci at ≥75% of target length (average) | Length of concatenated loci (bp, average) |
|---|---|---|---|---|---|
| Angiosperms353 exemplar data (41) | default353 | 22.3% | 275.5 | 117.7 | 144,283.5 |
| | mega353 | 32.3% | 287.9 | 132.2 | 165,867.4 |
| | mega353 vs. default353 % improvement | 44.9% | 4.5% | 12.3% | 15% |
| Asparagales (8) | default353 | 1.2% | 146.9 | 22.9 | 55,484.3 |
| | Order (Asparagales) | 1.7% | 159.5 | 27.8 | 65,637.4 |
| | Order vs. default353 % improvement | 37.1% | 8.6% | 21.3% | 18.3% |
| Azorella | default353 | 15.7% | 292.8 | 89 | 131,292.6 |
| | Family (Apiaceae) | 16.3% | 299.2 | 107.6 | 144,951 |
| | Order (Apiales) | 19.4% | 309 | 119.8 | 158,014.8 |
| | Family vs. default353 % improvement | 3.7% | 2.2% | 20.9% | 10.4% |
| | Order vs. default353 % improvement | 23.1% | 5.5% | 34.6% | 20.4% |
| Bulbophyllum | default353 | 12.3% | 238.8 | 46 | 93,043 |
| | Family (Orchidaceae) | 14.6% | 268.2 | 75.5 | 122,451.8 |
| | Family + genus (Orchidaceae+ | 15% | 273.1 | 83.8 | 131,549.8 |
| | Family vs. default353 % improvement | 19% | 12.3% | 64.1% | 31.6% |
| | Family + genus vs. default353 % improvement | 22.2% | 14.3% | 82.2% | 41.4% |
| Cyperaceae (6) | default353 | 9.4% | 201.1667 | 68 | 91,865.5 |
| | Family (Cyperaceae) | 11.1% | 249 | 103.8333 | 129,220 |
| | Order (Poales) | 12.1% | 251.3333 | 100.3333 | 131,571 |
| | Family vs. default353 % improvement | 18.1% | 23.8% | 52.7% | 40.7% |
| | Order vs. default353 % improvement | 28.3% | 24.9% | 47.5% | 43.2% |
| Ericaceae (12) | default353 | 7.5% | 307 | 97.2 | 145,031.8 |
| | Family (Ericaceae) | 11.9% | 335.8 | 185.6 | 198,784.3 |
| | Order (Ericales) | 12.9% | 338.6 | 189.2 | 205,629 |
| | Family vs. default353 % improvement | 60% | 9.4% | 91% | 37.1% |
| | Order vs. default353 % improvement | 73.3% | 10.3% | 94.7% | 41.8% |
| Nepenthes | default353 | 8.8% | 306.6 | 105.5 | 145,598.6 |
| | Order (Caryophyllales) | 12% | 322.9 | 147.5 | 182,845.1 |
| | Order vs. default353 % improvement | 36% | 5.3% | 39.78% | 25.6% |
| Sapindales (6) | default353 | 26.6% | 335.6 | 188.6 | 193,205.6 |
| | Order (Sapindales) | 31.3% | 341.4 | 248.7 | 229,415.1 |
| | Order vs. default353 % improvement | 17.4% | 1.7% | 31.9% | 18.7% |
| Average percentage improvement | | 31.9% | 10.2% | 49.4% | 28.7% |
| Minimum percentage improvement | | 3.7% | 1.7% | 12.3% | 10.4% |
| Maximum percentage improvement | | 73.3% | 24.9% | 94.7% | 43.2% |
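
The "% improvement" rows appear to be relative changes with respect to the default353 value, that is, (new - default) / default × 100. The short sketch below checks this reading against two entries for the Angiosperms353 exemplar data set. The formula is an assumption inferred from the table rather than a stated method, and the function name `percent_improvement` is hypothetical.

```python
def percent_improvement(baseline, new):
    """Relative change of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Averages for the Angiosperms353 exemplar data set, copied from the table.
loci_default, loci_mega = 275.5, 287.9            # no. of loci with sequences
length_default, length_mega = 144283.5, 165867.4  # concatenated length (bp)

loci_gain = percent_improvement(loci_default, loci_mega)
length_gain = percent_improvement(length_default, length_mega)
print(f"{loci_gain:.1f}% {length_gain:.1f}%")  # prints "4.5% 15.0%"
```

Both results match the reported 4.5% and 15%. Values recomputed from the rounded averages in other rows can differ from the reported percentages by a fraction of a point, presumably because the reported figures were calculated from unrounded data.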
FIGURE 2. The number of loci represented for each order in the default353 (red) compared to the mega353 (blue) target files.
FIGURE 3. Summary of recovery statistics produced by HybPiper comparing the default353 target set to the mega353 target set (filtered by family or order), for (A) the number of reads mapped to reference sequences, (B) the number of loci with sequences, and (C) the number of loci recovered at ≥75% of reference length. Data set abbreviations for boxplot headings: og353 (Angiosperms353 exemplar), asp (Asparagales), azo (Azorella), bul (Bulbophyllum), cyp (Cyperaceae), eri (Ericaceae), nep (Nepenthes), sap (Sapindales).
FIGURE 4. Heatmap of locus length changes for each locus, averaged across all samples for each data set, where the default353 locus lengths are subtracted from the mega353 locus lengths. Increases in length are shown in blue; decreases in length are shown in red.
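
The quantity plotted in the Figure 4 heatmap (mega353 locus length minus default353 locus length, averaged across samples) can be illustrated with a small sketch. The data layout, the function name `mean_length_change`, and all numbers below are invented for illustration. Comparing only samples that have a length under both target files is an assumption, since the caption does not say how missing loci were treated.

```python
from statistics import mean


def mean_length_change(default_lengths, mega_lengths):
    """Per-locus mean of (mega length - default length) across shared samples.

    Both arguments map locus -> {sample: recovered length in bp}.
    """
    changes = {}
    for locus, per_sample_default in default_lengths.items():
        per_sample_mega = mega_lengths.get(locus, {})
        # Assumption: only samples recovered with both target files count.
        shared = per_sample_default.keys() & per_sample_mega.keys()
        if shared:
            changes[locus] = mean(
                per_sample_mega[s] - per_sample_default[s] for s in shared
            )
    return changes


# Toy values, invented for illustration only.
default_lengths = {
    "locusA": {"s1": 300, "s2": 500},
    "locusB": {"s1": 900, "s2": 700},
}
mega_lengths = {
    "locusA": {"s1": 450, "s2": 650},
    "locusB": {"s1": 850, "s2": 650},
}
changes = mean_length_change(default_lengths, mega_lengths)
```

A positive value (here for `locusA`) corresponds to a blue cell in the heatmap and a negative value (here for `locusB`) to a red cell.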