Matthew G Johnson, Elliot M Gardner, Yang Liu, Rafael Medina, Bernard Goffinet, A Jonathan Shaw, Nyree J C Zerega, Norman J Wickett.
Abstract
PREMISE OF THE STUDY: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). METHODS AND ...
Keywords: Hyb-Seq; bioinformatics; phylogenomics; sequence assembly
Year: 2016 PMID: 27437175 PMCID: PMC4948903 DOI: 10.3732/apps.1600016
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Sample information, sequencing run and hybridization pool, summary of sequencing, and target enrichment results for the Artocarpus/Morus bait set.
| Sample ID | Species | Run | Pool | Paired reads | Paired reads surviving QC | Reads on target (%) | Genes recovered | Subgenus/Tribe |
| NZ866 | | 1 | 4 | 237,638 | 212,318 | 70.1 | 456 | |
| NZ728 | | 1 | 1 | 657,701 | 592,305 | 75.2 | 458 | |
| NZ739 | | 1 | 2 | 410,273 | 343,182 | 73.8 | 456 | |
| NZ606 | | 1 | 3 | 507,744 | 456,512 | 65.3 | 457 | |
| NZ814 | | 1 | 1 | 757,804 | 697,873 | 76.7 | 458 | |
| NZ612 | | 1 | 3 | 590,801 | 502,324 | 68.1 | 458 | |
| EG92 | | 1 | 3 | 422,739 | 383,077 | 68.4 | 458 | |
| EG87 | | 1 | 4 | 508,620 | 456,063 | 72.6 | 458 | |
| NZ771 | | 1 | 4 | 437,596 | 368,357 | 71.7 | 458 | |
| NZ946 | | 1 | 3 | 409,715 | 379,410 | 64.7 | 457 | |
| MW_lowii-2 | | 1 | 4 | 417,260 | 350,643 | 72.4 | 458 | |
| NZ780 | | 1 | 3 | 328,567 | 291,565 | 64.7 | 458 | |
| NZ918 | | 1 | 3 | 296,053 | 273,231 | 64.4 | 457 | |
| EG98 | | 1 | 2 | 634,153 | 523,372 | 75.5 | 458 | |
| NZ694 | | 1 | 4 | 444,734 | 369,717 | 72.4 | 458 | |
| NZ687 | | 1 | 2 | 353,376 | 316,194 | 75.7 | 458 | |
| NZ420 | | 1 | 2 | 425,368 | 386,539 | 77.9 | 457 | |
| NZ911 | | 1 | 4 | 403,279 | 340,166 | 72.3 | 457 | |
| NZ402 | | 1 | 1 | 208,369 | 188,696 | 77.5 | 458 | |
| NZ929 | | 1 | 2 | 385,183 | 345,495 | 76.0 | 458 | |
| GW1701 | | 1 | 2 | 146,460 | 129,755 | 74.4 | 458 | [ |
| NZ609 | | 1 | 1 | 520,398 | 478,292 | 72.8 | 457 | |
| NZ281 | | 2 | 1 | 1,122,018 | 866,706 | 5.0 | 380 | Castilleae |
| EG139 | | 2 | 4 | 441,498 | 236,834 | 56.7 | 417 | Maclureae |
| EG78 | | 2 | 1 | 484,680 | 297,401 | 23.9 | 392 | Moreae |
| EG30 | | 2 | 4 | 1,522,294 | 1,047,142 | 71.6 | 423 | Ficeae |
| NZ311 | | 1 | 1 | 91,831 | 83,447 | 16.3 | 294 | Dorstenieae |
| NZ874 | | 1 | 1 | 54,069 | 45,534 | 31.4 | 378 | [unnamed tribe-level clade] |
Brackets indicate subgenera or tribes with uncertain taxonomic designation.
Fig. 1. Heat map showing recovery efficiency for 458 genes enriched in Artocarpus and other Moraceae and recovered by HybPiper using the BWA method. Each column is a gene, and each row is one sample. The shade of gray in the cell is determined by the length of sequence recovered by the pipeline, divided by the length of the reference gene (maximum of 1.0). Three types of genes were enriched: phylogenes (left), MADS-box genes (center), and volatiles (right) for 22 Artocarpus samples (top) and six outgroup species (bottom). Full data for this chart can be found in Appendices S2 and S3.
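The cell value described in the Fig. 1 caption (recovered length divided by reference length, capped at 1.0) can be sketched as follows. This is a minimal illustration, not HybPiper's own code: the function names, the dictionary layout, and the example lengths are all hypothetical, and treating a gene with no recovered sequence as length 0 is an assumption.

```python
def recovery_fraction(recovered_len, reference_len):
    """Recovered length divided by reference length, capped at 1.0."""
    if reference_len <= 0:
        raise ValueError("reference length must be positive")
    return min(recovered_len / reference_len, 1.0)


def recovery_matrix(recovered, reference):
    """Build one row per sample and one column per gene (genes in sorted order).

    recovered: dict mapping sample -> dict mapping gene -> recovered length
    reference: dict mapping gene -> reference gene length
    A gene missing for a sample is treated as length 0 (an assumption).
    """
    genes = sorted(reference)
    return {
        sample: [recovery_fraction(lengths.get(g, 0), reference[g]) for g in genes]
        for sample, lengths in recovered.items()
    }


# Hypothetical lengths for illustration only; these are not data from the study.
reference = {"geneA": 1200, "geneB": 900}
recovered = {
    "sample1": {"geneA": 1260, "geneB": 450},  # geneA exceeds the reference, so it is capped
    "sample2": {"geneA": 600},                 # geneB not recovered
}
matrix = recovery_matrix(recovered, reference)
```

Each row of `matrix` corresponds to one heat-map row; a plotting library could then map the values in the range 0 to 1 onto a gray scale.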
Recovery efficiency of HybPiper for 22 Artocarpus and six other Moraceae, using two methods for assigning reads to loci: BLASTX (mapping to protein sequences) and BWA (mapping to nucleotide sequences).
| Taxon | N | BLASTX: Phylogenetic loci | BLASTX: MADS box | BLASTX: Volatiles | BWA: Phylogenetic loci | BWA: MADS box | BWA: Volatiles |
| Total genes in array | | 333 | 98 | 27 | 333 | 98 | 27 |
| | 12 | 329.6 | 96.4 | 26.4 | 331 | 94 | 26.5 |
| | 2 | 328.5 | 96.5 | 26.5 | 330.5 | 93 | 26.5 |
| | 6 | 326.7 | 95.3 | 26.2 | 329.8 | 89.3 | 26.2 |
| | 2 | 326 | 95 | 26 | 327 | 87.5 | 25.5 |
| | 1 | 257 | 41 | 13 | 259 | 8 | 10 |
| | 1 | 307 | 53 | 21 | 299 | 10 | 19 |
| | 1 | 318 | 30 | 17 | 315 | 3 | 17 |
| | 1 | 315 | 56 | 25 | 311 | 17 | 20 |
| | 1 | 135 | 15 | 2 | 118 | 0 | 1 |
| | 1 | 127 | 21 | 4 | 120 | 0 | 3 |
Note: N = number of individuals sampled.
Fig. 2. Depth-of-coverage plots for four exemplar loci based on reads aligned to the Artocarpus camansi draft genome. Each gray line represents a rolling average depth across 50 bp for one of 22 Artocarpus species. The dark line represents the average depth of coverage. Red bars indicate the location of exon boundaries predicted in the Artocarpus camansi draft genome.
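The two quantities plotted in Fig. 2 (a 50 bp rolling average per sample, and an average across samples) can be sketched as below. The caption does not say how the window is aligned or how the locus ends are handled, so this sketch assumes full windows only, and it assumes the dark line is a position-wise mean over samples. The function names are hypothetical and the input is a plain list of per-base depths.

```python
def rolling_mean(depths, window=50):
    """Mean depth over each full run of `window` consecutive positions.

    Only complete windows are reported (an assumption), so the result has
    len(depths) - window + 1 values, or none if the input is too short.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(depths) < window:
        return []
    total = sum(depths[:window])
    means = [total / window]
    for i in range(window, len(depths)):
        # Slide the window: add the new position, drop the oldest one.
        total += depths[i] - depths[i - window]
        means.append(total / window)
    return means


def mean_across_samples(profiles):
    """Position-wise mean of equally long per-sample depth profiles."""
    return [sum(column) / len(column) for column in zip(*profiles)]


# Tiny hypothetical depth profiles with a window of 2 for readability.
smoothed = [rolling_mean(p, window=2) for p in ([1, 2, 3, 4], [3, 4, 5, 6])]
average_line = mean_across_samples(smoothed)
```

With real data, each entry of `smoothed` would be one gray line and `average_line` the dark line, computed with `window=50`.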