Madison Caballero, Edwin Lauer, Jeremy Bennett, Sumaira Zaman, Susan McEvoy, Juan Acosta, Colin Jackson, Laura Townsend, Andrew Eckert, Ross W Whetten, Carol Loopstra, Jason Holliday, Mihir Mandal, Jill L Wegrzyn, Fikret Isik.
Abstract
PREMISE: An informatics approach was used for the construction of an Axiom genotyping array from heterogeneous, high-throughput sequence data to assess the complex genome of loblolly pine (Pinus taeda).
Keywords: Pinus taeda; exome capture; genomic selection; genotype array; genotyping‐by‐sequencing (GBS); loblolly pine; variant detection
Year: 2021 PMID: 34268018 PMCID: PMC8272584 DOI: 10.1002/aps3.11439
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 1.936
Summary of populations and variant filters.
| Information category / Cohort | Exome capture of 375 trees | Exome capture of 24 trees | ddRAD of 1536 trees | ddRAD of 753 trees | WGS of 10 trees | Illumina Infinium array |
|---|---|---|---|---|---|---|
| Tissue and ploidy | Needle (diploid) | Megagametophyte (haploid) | Phloem (diploid) | Phloem (diploid) | Megagametophyte (haploid) | Megagametophyte (haploid) |
| Sequencing platform (estimated coverage) | Illumina HiSeq 2500 (30×) | Illumina HiSeq 2000 (30×) | Illumina HiSeq 2000 (15×) | Illumina HiSeq 2500 (22×) | Illumina HiSeq 3000 (>10×) | Probes designed from Sanger resequenced ESTs |
| Total reads aligned to reference (%) | 91% | 98% | 35% | 75% | >99% | NA |
| Total strict quality variants | 7,702,804 | 1,516,877 | 261,768 | 1,105,218 | 1,546,311 | 1181 aligned, 1840 unaligned |
| Total pre‐screening variants | 109,602 | 86,200 | 268,154 | 34,388 | 156,456 | 1178 aligned, 1656 unaligned |
| Total post‐screening variants | 28,518 | 7642 | 27,657 | 6009 | 17,973 | 108 aligned, 1209 unaligned |
| Total variants on final array (Pita50K) | 13,962 | 4432 | 15,635 | 3398 | 10,854 | 36 aligned, 919 unaligned |
ddRAD = double‐digest RAD sequencing; NA = not applicable; WGS = whole genome sequencing.
Total number of strict quality variants: 8,272,630.
Total number of pre‐screening variants: 642,275.
Total number of post‐screening variants: 84,845.
Total number of variants on the final array: 46,439.
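The per-cohort counts in the table do not add up to the stated totals: at every stage the column sum is larger. The following minimal sketch makes that comparison explicit. The counts are copied from the table (aligned and unaligned Infinium probes are combined), and the function and variable names are hypothetical. Reading the surplus as variants shared between cohorts is an assumption; the record itself does not say how the totals were deduplicated.

```python
# Per-cohort counts copied from the table. Cohort order: exome capture (375 trees),
# exome capture (24), ddRAD (1536), ddRAD (753), WGS (10), Illumina Infinium array
# (aligned + unaligned probes combined).
COUNTS = {
    "strict": [7_702_804, 1_516_877, 261_768, 1_105_218, 1_546_311, 1181 + 1840],
    "pre_screening": [109_602, 86_200, 268_154, 34_388, 156_456, 1178 + 1656],
    "post_screening": [28_518, 7_642, 27_657, 6_009, 17_973, 108 + 1209],
    "final_array": [13_962, 4_432, 15_635, 3_398, 10_854, 36 + 919],
}

# Totals as stated in the table footnotes.
STATED_TOTALS = {
    "strict": 8_272_630,
    "pre_screening": 642_275,
    "post_screening": 84_845,
    "final_array": 46_439,
}


def summarize():
    """Return {stage: (column_sum, stated_total, surplus)} for each filter stage."""
    rows = {}
    for stage, counts in COUNTS.items():
        column_sum = sum(counts)
        stated = STATED_TOTALS[stage]
        rows[stage] = (column_sum, stated, column_sum - stated)
    return rows


def retention(from_stage, to_stage):
    """Fraction of the stated total at from_stage that remains at to_stage."""
    return STATED_TOTALS[to_stage] / STATED_TOTALS[from_stage]


if __name__ == "__main__":
    for stage, (column_sum, stated, surplus) in summarize().items():
        print(f"{stage}: column sum {column_sum:,}, stated {stated:,}, surplus {surplus:,}")
    print(f"post-screening to final array: {retention('post_screening', 'final_array'):.1%}")
```

Under the stated totals, a little over half of the post-screening variants were placed on the final array.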
FIGURE 1. Informatic workflow describing the Pita50K array design. Four types of genomic data across six data sets were used in array design. Genomic reads from exome capture, ddRADseq, and whole genome sequencing (WGS) studies were aligned to the Pinus taeda reference genome, and variants were called with two thresholds. Probes designed around strict quality variants of exome capture and ddRADseq studies were assessed for potential off‐target hybridization through k‐mer to genome alignment scores. Variants from the WGS study (and later all variants from haploid megagametophyte tissue) that were heterozygous were removed. Previously successful Illumina Infinium array probes that align to the genome were assessed alongside the candidate variants for polymorphisms within flanking regions of probes. Passing probes were scored by Thermo Fisher Scientific, and recommended probes were further filtered via a screening array to create the final Pita50K Thermo Fisher Axiom array. This array contains 919 probes from the Illumina Infinium array that did not align to the reference genome and 36 that did align.
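One step in this workflow, the removal of heterozygous variants called from haploid megagametophyte tissue, can be sketched as follows. This is an illustration only, not the authors' pipeline. It rests on two assumptions: genotypes are VCF-style strings such as `0/1` or `1|1`, and a site is dropped if any sample shows a heterozygous call. All function names and the input layout are hypothetical.

```python
import re


def is_heterozygous(genotype: str) -> bool:
    """True if a VCF-style genotype string carries two different called alleles."""
    # Split on unphased (/) or phased (|) separators; ignore missing alleles (".").
    alleles = [a for a in re.split(r"[/|]", genotype) if a != "."]
    return len(set(alleles)) > 1


def drop_heterozygous_sites(calls):
    """Keep only sites with no heterozygous call in any haploid-tissue sample.

    calls: dict mapping a variant ID to a list of genotype strings, one per sample
    (hypothetical layout chosen for this sketch).
    """
    return {
        variant_id: genotypes
        for variant_id, genotypes in calls.items()
        if not any(is_heterozygous(g) for g in genotypes)
    }


if __name__ == "__main__":
    example = {
        "v1": ["0/0", "1/1"],   # homozygous in every sample: kept
        "v2": ["0/1", "0/0"],   # one heterozygous call: dropped
        "v3": ["./.", "1|1"],   # missing plus homozygous: kept
    }
    print(sorted(drop_heterozygous_sites(example)))
```

A heterozygous call in haploid tissue cannot reflect true allelic variation at a single locus, so it is more plausibly explained by paralogous alignment or genotyping error; this is why such a filter is a natural quality check here.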
FIGURE 2. Proximity and function of pre‐screening candidate variants and final array variants to genes. (A) Annotation of 642,275 candidate probes prior to scoring by Thermo Fisher Scientific and screening array selection. Results do not include 1656 Illumina Infinium array probes that did not have reference gene coordinates. Rare functions were grouped as “other.” All annotation categories and effects are available in Appendices S3 and S7. (B) Annotation of the final array probes. Results do not include 919 Illumina Infinium array probes that do not have reference genome coordinates.