Marco Dorfner, Tankred Ott, Philipp Ott, Christoph Oberprieler.
Abstract
Premise: Most phylogenomic library preparation methods and bioinformatic analysis tools in restriction site-associated DNA sequencing (RADseq)/genotyping-by-sequencing (GBS) studies are designed for use with Illumina data. The lack of alternative bioinformatic pipelines hinders the exploration of long-read multi-locus data from other sequencing platforms. The Simple Long-read loci Assembly of Nanopore data for Genotyping (SLANG) pipeline enables locus assembly, orthology estimation, and single-nucleotide polymorphism (SNP) calling using Nanopore-sequenced multi-locus data.
Keywords: AFLP; Leucanthemum; Nanopore; Senecio; genotyping
Year: 2022 PMID: 35774992 PMCID: PMC9215276 DOI: 10.1002/aps3.11484
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 2.511
Figure 1. The SLANG workflow. During within‐sample clustering, quality‐ and length‐filtered reads are clustered according to their locus identity. Reads are then mapped to their cluster‐consensus sequence. Unmapped reads are filtered out under the assumption that they do not belong to the locus concerned. Only clusters meeting the mapped‐read depth threshold are eligible for the among‐samples clustering analysis, where consensus sequences of the passing clusters are clustered to estimate locus orthology across samples. Clusters with only one consensus sequence per sample and enough samples per locus pass the filters. Finally, sequences of the among‐samples clusters are mapped to their consensus sequence for reference‐based SNP calling.
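The two filtering rules named in the caption can be sketched in a few lines of Python. This is only an illustration of the logic and not SLANG's own code: the function names, the dictionary layouts, and the threshold values are hypothetical.

```python
from collections import Counter


def filter_within_sample(cluster_depths, min_depth):
    """Keep clusters whose mapped-read depth meets the threshold.

    cluster_depths: dict mapping a cluster ID to its number of mapped reads
    (hypothetical layout).
    """
    return {cid: depth for cid, depth in cluster_depths.items() if depth >= min_depth}


def filter_among_samples(locus_members, min_samples):
    """Keep loci with exactly one consensus per sample and enough samples.

    locus_members: dict mapping a locus ID to a list of sample names, one
    entry per consensus sequence assigned to that locus (hypothetical layout).
    """
    passing = {}
    for locus, samples in locus_members.items():
        counts = Counter(samples)
        # A sample contributing two or more consensus sequences suggests paralogy.
        single_copy = all(n == 1 for n in counts.values())
        if single_copy and len(counts) >= min_samples:
            passing[locus] = sorted(counts)
    return passing
```

For example, with a hypothetical depth threshold of 5, a cluster with 3 mapped reads would be dropped before the among-samples step, and a locus in which one sample contributes two consensus sequences would be rejected regardless of how many samples it contains.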
Sample information. Senecio reads were filtered to lengths between 50 and 1000 bp, and Leucanthemum reads to lengths between 200 and 1000 bp. A total of 310,336,638 bp of Leucanthemum sequences and 244,902,300 bp of Senecio sequences passed the Q7 quality filter; these totals match the summed "Raw bases (after qcat)" values of the first four and the remaining nine samples, respectively.
| Sample | Sample ID | Latitude | Longitude | Raw reads (after qcat) | Raw bases (after qcat) | Reads after filtering | Bases after filtering |
|---|---|---|---|---|---|---|---|
| | 120‐02 | 43.8925 | 3.2477222 | 156,281 | 65,540,162 | 139,944 | 58,525,941 |
| | 131‐01 | 44.141167 | 3.7316389 | 207,747 | 87,943,849 | 181,131 | 76,063,161 |
| | 276‐01 | 46.860333 | 13.817233 | 197,957 | 80,059,219 | 169,044 | 68,019,760 |
| | 495‐02 | 45.404022 | 22.885686 | 180,232 | 76,793,408 | 155,960 | 65,887,521 |
| | 01‐02 | 47.699850 | 10.183917 | 79,547 | 27,804,961 | 62,028 | 21,381,063 |
| | 01‐03 | 47.699850 | 10.183917 | 69,116 | 22,901,337 | 54,964 | 17,649,906 |
| | 01‐04 | 47.699850 | 10.183917 | 69,087 | 24,695,263 | 53,836 | 18,869,793 |
| | 02‐02 | 49.049767 | 12.257717 | 85,092 | 30,998,678 | 66,178 | 20,887,198 |
| | 02‐01 | 49.049767 | 12.257717 | 75,204 | 26,453,432 | 60,319 | 20,887,198 |
| | 02‐05 | 49.049767 | 12.257717 | 80,383 | 29,748,871 | 63,851 | 23,136,493 |
| | 03‐03 | 49.052850 | 11.973900 | 74,672 | 27,267,933 | 59,789 | 21,595,843 |
| | 03‐04 | 49.052850 | 11.973900 | 76,213 | 27,754,247 | 59,332 | 21,004,957 |
| | 03‐05 | 49.052850 | 11.973900 | 72,262 | 27,277,578 | 57,889 | 21,225,204 |
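The length and quality filters described in the table caption could look roughly like the following sketch. It rests on several assumptions that the source does not confirm: that the length bounds are inclusive, that the Q7 threshold applies to a per-read mean quality, and that this mean is computed by averaging error probabilities (a common convention for Nanopore reads). All function and variable names are hypothetical.

```python
import math

# Length windows taken from the caption; treating them as inclusive is an assumption.
LENGTH_RANGES = {"Senecio": (50, 1000), "Leucanthemum": (200, 1000)}


def mean_phred(quals):
    """Mean read quality: average the error probabilities, then convert back to Phred."""
    if not quals:
        return 0.0
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def passes_filter(seq, quals, min_len, max_len, min_q=7.0):
    """Return True if the read lies in the length window and reaches the mean quality."""
    return min_len <= len(seq) <= max_len and mean_phred(quals) >= min_q
```

Under these assumptions, a 100 bp read with uniform Q10 scores would pass the Senecio window but fail the Leucanthemum window, because it is shorter than 200 bp.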
Figure 2. Phylogenetic network reconstructions of the Leucanthemum and Senecio data sets. (A) Phylogenetic network reconstruction of the Leucanthemum data set based on GBS high‐quality short Illumina reads assembled using ipyrad (left) and AFLP‐based Nanopore reads assembled using SLANG (right). (B) Phylogenetic network reconstruction of the Senecio nemorensis group data set produced using SLANG. Nei–Li distances were calculated based on the base frequencies in the VCF file and used as inputs in SplitsTree version 4.16.1 (Huson and Bryant, 2006).
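The caption does not give the exact formula used for the Nei–Li distances. As a rough illustration only, the sketch below assumes a Dice-style generalization of the Nei–Li similarity, S = 2·n_xy / (n_x + n_y), in which the shared component at each SNP site is the per-base minimum of the two samples' base frequencies, and the distance is 1 − S. The input layout (one dictionary of base frequencies per site) and the function name are hypothetical, and the published analysis may differ.

```python
def nei_li_distance(freqs_x, freqs_y):
    """Dice/Nei-Li-style distance between two samples (assumed generalization).

    freqs_x, freqs_y: lists with one dict per SNP site, mapping a base
    ("A", "C", "G", "T") to its frequency in that sample.
    """
    shared = 0.0  # plays the role of n_xy
    total = 0.0   # plays the role of n_x + n_y
    for fx, fy in zip(freqs_x, freqs_y):
        bases = set(fx) | set(fy)
        shared += sum(min(fx.get(b, 0.0), fy.get(b, 0.0)) for b in bases)
        total += sum(fx.values()) + sum(fy.values())
    if total == 0:
        raise ValueError("no base frequencies to compare")
    return 1.0 - 2.0 * shared / total
```

With this definition, two samples with identical base frequencies at every site have distance 0, and two samples that share no bases at any site have distance 1. A pairwise matrix of such values is the kind of input a distance-based network method expects.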