Liang Guo1,2, Hong Yao1, Brian Shepherd3, Osvaldo J Sepulveda-Villet3,4, Dian-Chang Zhang2, Han-Ping Wang1.
Keywords: RAD-seq; aquaculture; conservation; genotyping; germplasm collection; polymorphic SSR; yellow perch
Year: 2019 PMID: 31681426 PMCID: PMC6802114 DOI: 10.3389/fgene.2019.00992
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1 Characteristics of yellow perch RAD-Seq. (A) Removal of adaptor sequences at the 3′ end of a read. The red arrows show the forward reads, and the blue arrows show the reverse reads. Usually, each read contains an adaptor sequence at the 5′ end, which is removed routinely. An adaptor sequence also appears at the 3′ end when the read length is longer than the library fragment size. Because indels are rare and the paired reads overlap on the same genome fragment, a probe taken from the 5′ end of the forward read (green) was used to scan the reverse read (dark blue) from its 3′ end toward its 5′ end with a step size of 1 bp. Once the distance between the probe and the scanned window fell below a threshold value, the sequence on the 3′ side of the matched fragment was treated as adaptor and removed. The same scan was applied to the forward read. (B) The number of reads clustered for each RAD-tag. The blue line shows the expected Gaussian distribution. (C) The distribution of final contig length. (D) The distribution of SNPs along the contig (red points: SNPs in the first dataset; blue points: SNPs in the second dataset). (E) For each individual in the first dataset, the percentage of the total SNPs that were genotyped (blue points, left axis) and the number of read pairs after duplicate removal (red points, right axis). The nine individuals with less than 50% of the total SNPs were excluded from the population genetic analyses. (F) The distribution of the 18 individuals from six strains along the first and second principal components. (G) The phylogenetic tree for the same individuals as in panel F, with the same color scheme. (H) The electrophoretogram of the 40 randomly chosen SSRs. The second-to-last lane in the lower-right panel is a negative control without primers.
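The 3′ adaptor scan described for panel A can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the caption does not give the probe length, the distance measure, or the threshold, so the 12-bp probe, the Hamming distance, the cutoff of one mismatch, and the function names below are all assumptions. The sketch also assumes that the probe is reverse-complemented before scanning, since the two reads of a pair come from opposite strands.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_3prime_adaptor(read: str, mate: str,
                        probe_len: int = 12, max_dist: int = 1) -> str:
    """Trim read-through adaptor from the 3' end of `read`.

    probe_len and max_dist are hypothetical values chosen for illustration.
    The probe is the reverse complement of the first `probe_len` bases of
    `mate`; on a short fragment it marks where the genomic insert ends in
    `read`.
    """
    if len(read) < probe_len or len(mate) < probe_len:
        return read  # too short to scan, leave the read untouched
    probe = reverse_complement(mate[:probe_len])
    # Slide a window from the 3' end toward the 5' end in 1-bp steps.
    for start in range(len(read) - probe_len, -1, -1):
        window = read[start:start + probe_len]
        if hamming(probe, window) <= max_dist:
            # Everything 3' of the matched window is treated as adaptor.
            return read[:start + probe_len]
    return read  # no match: the fragment is longer than the read
```

Calling `trim_3prime_adaptor(reverse_read, forward_read)` trims the reverse read, and swapping the arguments applies the same scan to the forward read. Hamming distance is used here because the caption notes that indels are rare, so a gap-free comparison should usually suffice; an edit distance could be substituted if that assumption does not hold.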
Description of samples and read statistics.
| Sample code | Index | Strain | Platform | Read pairs (M) | Read pairs after duplicate removal (M)¹ | Second dataset² | Distribution |
|---|---|---|---|---|---|---|---|
| 1 | CCAAC | NC1 | HiSeq | 7.1 | 1.39 | Y | South Atlantic coast |
| 2 | GAGAT | NC1 | HiSeq | 8.1 | 1.61 | Y | |
| 3 | CGACGATACTTG | NC1 | HiSeq | 18.8 | 6.02 | Y | |
| 4 | TCTGAGCGTACA | NE | HiSeq | 16.0 | 5.85 | Y | Northwest Lake Plains |
| 5 | GATCG | NE | HiSeq | 8.4 | 1.76 | Y | |
| 6 | GCATT | NE | HiSeq | 6.3 | 1.31 | Y | |
| 7 | ATGTGTCGCCAA | NY | HiSeq | 25.6 | 7.93 | Y | Lake Ontario |
| 8 | AAGGG | NY | HiSeq | 4.4 | 0.96 | Y | |
| 9 | ACACG | NY | HiSeq | 6.1 | 1.20 | Y | |
| 10 | CACAG | OH | HiSeq | 7.1 | 1.33 | Y | Lake Erie West |
| 11 | CAGTC | OH | HiSeq | 6.0 | 1.11 | Y | |
| 12 | CATGA | OH | HiSeq | 6.2 | 1.31 | Y | |
| 13 | TAGCA | PA | HiSeq | 6.0 | 1.33 | Y | Lake Erie East |
| 14 | TATAC | PA | HiSeq | 9.2 | 1.95 | Y | |
| 15 | TCAGA | PA | HiSeq | 5.6 | 1.38 | Y | |
| 16 | GACTA | WI | HiSeq | 5.4 | 1.20 | Y | Lake Michigan |
| 17 | AAAAA | WI | HiSeq | 5.0 | 1.19 | Y | |
| 18 | AACCC | WI | HiSeq | 2.1 | 1.06 | Y | |
| 19 | TATAC | MD | HiSeq | 4.9 | 0.26 | N | North Atlantic coast |
| 20 | TCAGA | MD | HiSeq | 5.3 | 0.37 | N | |
| 21 | CTTCCGG | MD | HiSeq | 1.9 | 0.02 | N | |
| 22 | TGGTATG | MD | HiSeq | 1.0 | 0.01 | N | |
| 23 | ATGTGTCGCCAA | NC2 | HiSeq | 2.8 | 0.24 | N | South Atlantic coast |
| 24 | TCTGAGCGTACA | NC2 | HiSeq | 5.7 | 0.49 | N | |
| 25 | TAGCA | NC2 | HiSeq | 3.3 | 0.17 | N | |
| 26 | CGCACTC | NC2 | HiSeq | 1.6 | 0.02 | N | |
| 27 | ATGTGTCGCCAA | NY | MiSeq | 0.9 | 0.77 | N | Lake Ontario |
| 28 | TCTGAGCGTACA | NE | MiSeq | 0.7 | 0.58 | N | Northwest Lake Plains |
| 29 | CGACGATACTTG | NC1 | MiSeq | 0.7 | 0.57 | N | South Atlantic coast |
¹ This column shows the number of read pairs after duplicate removal.
² Y indicates that the individual was included in the second dataset; N indicates that it was not.