Chao Feng1, Meizhen Xu1,2, Chen Feng1,2, Eric J B von Wettberg3, Ming Kang4.
Abstract
BACKGROUND: Primulina Hance is an emerging model for studying evolutionary divergence, adaptation and speciation of the karst flora. However, phylogenetic relationships within the genus have not been resolved due to low variation detected in the cpDNA regions. Chloroplast genomes can provide important information for phylogenetic and population genetic studies. Recent advances in next-generation sequencing (NGS) techniques greatly facilitate sequencing whole chloroplast genomes for multiple individuals. Consequently, novel strategies for development of highly polymorphic loci for population genetic and phylogenetic studies based on NGS data are needed.
Keywords: Chloroplast assembly; High-variation regions; Next-generation sequencing; Primulina; RAD-Seq; Sub-super-marker
Year: 2017 PMID: 29115917 PMCID: PMC5678776 DOI: 10.1186/s12862-017-1067-z
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fig. 1 Schematic of sub-assembly of the chloroplast genome from PE RAD-seq. a Illustration of the RAD-seq method. b Pipeline for sub-assembly of cpDNA from PE RAD-seq. This pipeline, SACRing, is divided into 4 steps; steps 3 and 4 are highlighted in the top and bottom boxes with purple dashed lines, respectively. Details of the assembly are enlarged in the boxes with grey dashed lines. c Illustration of cluster analysis based on the result of SACRing
The basic information of genome survey data of the three Primulina species related to chloroplast genomes
| Species | Population code | Reads No. (M)a | Throughput (G)b | Quality Q30c | Cp size (bp) | Coverage |
|---|---|---|---|---|---|---|
| P. eburnea | WHY01 | 12.5 | 3.14 | 97.2; 96.9 | 152,373 | 20,585 |
| | GDHJ02 | 18.5 | 4.63 | 97.8; 97.5 | 153,401 | 30,191 |
| | GXNN01 | 21.1 | 4.22 | 95.4; 95.4 | 153,493 | 27,517 |
a Reads No. counts only the reads used in the assembly of the cp genomes, not the whole genome survey sequencing data
b Throughput = Reads No. × read length
c Quality Q30 was calculated for Read1 and Read2 separately
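
Footnote b can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the read lengths it prints are implied by the tabulated values, not stated in the source.

```python
# Hypothetical helpers; the numbers are taken from the genome survey table.

def throughput_gb(reads_millions: float, read_length_bp: float) -> float:
    """Throughput (G) = Reads No. x read length (footnote b)."""
    return reads_millions * 1e6 * read_length_bp / 1e9


def implied_read_length(reads_millions: float, throughput_g: float) -> float:
    """Invert footnote b to estimate the read length (bp) behind a table row."""
    return throughput_g * 1e9 / (reads_millions * 1e6)


SURVEY = {  # population code -> (Reads No. in millions, Throughput in G)
    "WHY01": (12.5, 3.14),
    "GDHJ02": (18.5, 4.63),
    "GXNN01": (21.1, 4.22),
}

for code, (reads_m, gbases) in SURVEY.items():
    # Print the rounded read length implied by each row.
    print(code, round(implied_read_length(reads_m, gbases)))
```

Under this reading, the three rows imply read lengths of roughly 251, 250 and 200 bp; these are back-calculated from rounded table values, so they are approximate.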
Fig. 2 Circular maps of the chloroplast genomes of three Primulina species and Boea hygrometrica. Genes shown inside and outside the outer circle are transcribed clockwise and counterclockwise, respectively. Genes belonging to different groups are marked with different colors. The distribution of GC content is shown in the inner circle. Cp genomes of P. eburnea (a), P. huaijiensis (b), P. linearifolia (c) and Boea hygrometrica (d)
Fig. 3 Circular map of the cross-genus consensus chloroplast genome sequences of three Primulina species and Boea hygrometrica. The outermost circle shows positions (in kb) along the consensus cp genome sequences. a Annotation of the consensus cp genome sequences. Genes shown inside and outside the circle are transcribed clockwise and counterclockwise, respectively, and their names are marked in black and red, respectively. Genes belonging to different groups are marked with different colors, with the color key shown in the center. b Distribution of conserved regions of the four cp genomes. The grey columns represent the Con_Islands, defined as regions containing over 50 continuous conserved sites in the cross-genus consensus cp genome sequences. c Distribution of intrageneric and intergeneric polymorphism. The red and blue lines represent the average intrageneric and intergeneric PICs, respectively, in 100 bp windows with a step of 25 bp. PICs were counted as the sum of SNPs and indels between two cp genomes. d Distribution of EcoRI restriction sites. The circles from outer to inner show the distribution of EcoRI sites in P. eburnea, P. huaijiensis, P. linearifolia and B. hygrometrica, in turn
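
The sliding-window averaging described for panel c of Fig. 3 (100 bp windows, 25 bp step) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, each input is assumed to be a 0/1 list over aligned sites for one pair of cp genomes (1 marks a SNP or indel), coordinates are treated as linear, and trailing partial windows are dropped. How indels are counted per site is not specified in the source.

```python
def window_counts(variant_flags, window=100, step=25):
    """Count variable sites in each full window of one pairwise comparison."""
    last_start = len(variant_flags) - window
    return [
        sum(variant_flags[start:start + window])
        for start in range(0, last_start + 1, step)
    ]


def mean_window_counts(pairwise_flags, window=100, step=25):
    """Average the per-window counts over several pairwise comparisons."""
    per_pair = [window_counts(flags, window, step) for flags in pairwise_flags]
    # zip(*per_pair) groups the counts for the same window across pairs.
    return [sum(column) / len(column) for column in zip(*per_pair)]


# Toy data: two 200-site comparisons with a few variable sites each.
pair_a = [0] * 200
for pos in (10, 30, 110):
    pair_a[pos] = 1
pair_b = [0] * 200
pair_b[110] = 1

print(window_counts(pair_a))
print(mean_window_counts([pair_a, pair_b]))
```

Intrageneric and intergeneric curves would then come from calling `mean_window_counts` on the corresponding sets of genome pairs.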
Fig. 4 Intrageneric and intergeneric polymorphism of chloroplast variation regions of three Primulina species and Boea hygrometrica. Intrageneric (a) and intergeneric (b) polymorphism (PICs) of noncoding regions of the cp genomes. Intrageneric (c) and intergeneric (d) polymorphism (PICs) of Con_Sea regions of the cp genomes. PICs were counted as the sum of SNPs and indels between two cp genomes. A Con_Sea is the region between two adjacent Con_Islands; Con_Islands are defined as regions containing over 50 continuous conserved sites in the cross-genus consensus genome sequences. The eight regions used for experimental evaluation are marked as filled circles, the others as empty circles. The names of regions with high variation or used for experimental evaluation are marked along the circles. Note that in panels c and d these regions are labeled with the names of the noncoding regions that overlap the corresponding Con_Seas, instead of the original Con_Sea IDs. The yellow lines link the same regions, while the red lines link corresponding regions between noncoding regions and Con_Seas
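
The Con_Island and Con_Sea definitions given in the captions of Fig. 3 and Fig. 4 can be sketched as below. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, the input is assumed to be a per-site list of booleans (True for a conserved site in the consensus alignment), "over 50" is read as strictly more than 50 sites, and the circular genome is treated as linear. Intervals are 0-based and half-open.

```python
def find_con_islands(conserved, min_run=50):
    """Return (start, end) for runs of more than min_run conserved sites."""
    islands = []
    run_start = None
    for i, is_conserved in enumerate(conserved):
        if is_conserved and run_start is None:
            run_start = i  # a new conserved run begins here
        elif not is_conserved and run_start is not None:
            if i - run_start > min_run:
                islands.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(conserved) - run_start > min_run:
        islands.append((run_start, len(conserved)))
    return islands


def find_con_seas(islands):
    """Return the regions between each pair of adjacent Con_Islands."""
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(islands, islands[1:])
    ]


# Toy example: runs of 60, 50 and 70 conserved sites separated by short gaps.
toy = [True] * 60 + [False] * 5 + [True] * 50 + [False] * 3 + [True] * 70
toy_islands = find_con_islands(toy)
toy_seas = find_con_seas(toy_islands)
print(toy_islands, toy_seas)
```

In the toy example the 50-site run does not qualify as an island, so it is absorbed into the single Con_Sea between the two qualifying islands.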
The basic information of 21 Primulina RAD-Seq data related to chloroplast genomes
| Species | ID | Reads No.a | Quality Q30b | RAD tags No.c | Paired tags No. - ratio (%)d | Overlap tags No. - ratio (%)e | Paired & Overlap tags No. - ratio (%)f | Cpcontigs No.g | Cpcontigs length (bp) | Cpcontigs ratio (%)h |
|---|---|---|---|---|---|---|---|---|---|---|
| | CZYX01–1 | 320,451 | 96.0; 97.2 | 164 | 73 (×2) - 89 | 132 - 80 | 50 (×2) - 61 | 62 | 69,862 | 55.1 |
| | CZYX01–2 | 239,167 | 96.3; 95.7 | 151 | 64 (×2) - 85 | 130 - 86 | 52 (×2) - 69 | 62 | 65,591 | 51.7 |
| | CZYX01–3 | 426,610 | 94.7; 96.1 | 151 | 61 (×2) - 81 | 121 - 80 | 44 (×2) - 58 | 75 | 63,584 | 50.1 |
| | CZYX01–4 | 474,623 | 95.4; 96.4 | 153 | 63 (×2) - 82 | 119 - 78 | 43 (×2) - 56 | 76 | 62,588 | 49.3 |
| | CZYX01–5 | 797,450 | 95.4; 96.6 | 148 | 59 (×2) - 80 | 115 - 78 | 40 (×2) - 54 | 79 | 65,103 | 51.3 |
| | CZYX01–6 | 427,701 | 96.4; 95.6 | 154 | 64 (×2) - 83 | 126 - 82 | 48 (×2) - 62 | 70 | 67,315 | 53.0 |
| | CZYX01–7 | 404,182 | 96.4; 96.3 | 156 | 67 (×2) - 86 | 129 - 83 | 50 (×2) - 64 | 65 | 66,895 | 52.7 |
| | CZYX01i | 441,455 | 95.8; 96.3 | 154 | 64 (×2) - 84 | 125 - 81 | 47 (×2) - 61 | 70 | 65,848 | 51.9 |
| | CZYX02–1 | 278,229 | 96.2; 97.4 | 145 | 58 (×2) - 80 | 117 - 81 | 42 (×2) - 58 | 68 | 62,883 | 49.6 |
| | CZYX02–2 | 343,741 | 96.0; 96.1 | 145 | 61 (×2) - 84 | 111 - 77 | 44 (×2) - 61 | 69 | 65,103 | 51.3 |
| | CZYX02–3 | 228,868 | 96.2; 95.5 | 137 | 52 (×2) - 76 | 109 - 80 | 38 (×2) - 55 | 72 | 58,660 | 46.2 |
| | CZYX02–4 | 230,025 | 96.2; 95.4 | 138 | 53 (×2) - 77 | 110 - 80 | 38 (×2) - 55 | 72 | 58,696 | 46.3 |
| | CZYX02–5 | 211,979 | 96.3; 95.6 | 140 | 56 (×2) - 80 | 115 - 82 | 46 (×2) - 66 | 71 | 61,728 | 48.6 |
| | CZYX02–6 | 227,541 | 96.3; 96.5 | 145 | 59 (×2) - 81 | 117 - 81 | 44 (×2) - 61 | 69 | 61,609 | 48.6 |
| | CZYX02–7 | 277,044 | 96.3; 96.5 | 143 | 57 (×2) - 80 | 111 - 78 | 42 (×2) - 59 | 75 | 63,284 | 49.9 |
| | CZYX02j | 256,775 | 96.2; 96.1 | 142 | 57 (×2) - 80 | 113 - 80 | 43 (×2) - 59 | 71 | 61,709 | 48.6 |
| | CZYX03–1 | 208,356 | 95.9; 96.0 | 141 | 60 (×2) - 85 | 114 - 81 | 42 (×2) - 60 | 61 | 63,457 | 50.0 |
| | CZYX03–2 | 315,085 | 95.9; 95.9 | 142 | 60 (×2) - 85 | 109 - 77 | 40 (×2) - 56 | 65 | 64,130 | 50.5 |
| | CZYX03–3 | 172,588 | 96.1; 96.7 | 139 | 54 (×2) - 78 | 110 - 79 | 39 (×2) - 56 | 71 | 65,415 | 51.6 |
| | CZYX03–4 | 145,876 | 96.3; 96.1 | 139 | 54 (×2) - 78 | 116 - 83 | 45 (×2) - 65 | 68 | 61,485 | 48.5 |
| | CZYX03–5 | 694,697 | 96.3; 97.4 | 155 | 65 (×2) - 84 | 126 - 81 | 40 (×2) - 52 | 69 | 65,392 | 51.5 |
| | CZYX03–6 | 333,471 | 96.1; 97.1 | 154 | 63 (×2) - 82 | 129 - 84 | 52 (×2) - 68 | 67 | 69,166 | 54.5 |
| | CZYX03–7 | 381,048 | 95.9; 96.0 | 140 | 58 (×2) - 83 | 112 - 80 | 41 (×2) - 59 | 61 | 63,551 | 50.1 |
| | CZYX03k | 321,589 | 96.1; 96.5 | 144 | 59 (×2) - 82 | 117 - 81 | 43 (×2) - 59 | 65 | 64,647 | 51.0 |
a Reads No. counts only the reads that mapped to the reference cp genome of P. eburnea (WHY01), not the whole sequencing data.
b Quality Q30 was calculated for Read1 and Read2 separately.
c RAD tags No.: the number of Tag1s, clustered from read1 of RAD-Seq.
d Paired tags No. and ratio: the number of pTs (paired tags) and their share of the RAD tags, where a pT is merged from the two Tag1s in the forward and reverse directions at the same RE (restriction enzyme site). Because each pT comprises two Tag1s (the "×2" in the table), the tabulated ratio corresponds to 2 × pTs No. / RAD tags No.
e Overlap tags No. and ratio: the number of oTs (overlap tags) and the value of oTs No. / RAD tags No., where an oT is merged from a Tag1 and its paired Tag2 (assembled from read2) according to their overlap.
f Paired & Overlap tags No. and ratio: the number of poTs (paired & overlap tags) and their share of the RAD tags, where a poT is merged from a paired tag (two Tag1s) and the two paired Tag2s, and both Tag1s overlap with their paired Tag2s. As for pTs, the tabulated ratio corresponds to 2 × poTs No. / RAD tags No.
g Cpcontigs: longer sequences without unknown nucleotides, assembled from all kinds of tags, including pTs, oTs, poTs and other types, according to their position and overlap information.
h Ratio: the length of the Cpcontigs divided by the length of the reference cp genome of P. eburnea (WHY01); the cp genome length used here is 126,890 bp, which excludes IRa.
i CZYX01, j CZYX02, k CZYX03: species-level values, calculated as the average over the 7 individuals from the same species
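
The ratio columns and the species-level rows can be reproduced from the other columns. The sketch below is a hedged check, not part of the original analysis: the function names are hypothetical, and the factor of 2 for paired and paired & overlap tags is inferred from the "(×2)" notation and the tabulated percentages.

```python
from statistics import mean

REF_CP_LENGTH = 126_890  # bp, reference cp genome length excluding IRa (footnote h)


def tag_ratios(rad_tags, paired, overlap, paired_overlap):
    """Return (pT, oT, poT) ratios in percent, rounded to whole numbers.

    Each paired tag and each paired & overlap tag contains two Tag1s,
    so those counts are doubled before dividing by the RAD tag count.
    """
    return (
        round(100 * 2 * paired / rad_tags),
        round(100 * overlap / rad_tags),
        round(100 * 2 * paired_overlap / rad_tags),
    )


def cpcontig_ratio(length_bp, ref_length=REF_CP_LENGTH):
    """Cpcontigs length as a percentage of the reference length (footnote h)."""
    return round(100 * length_bp / ref_length, 1)


# Row CZYX01-1: 164 RAD tags, 73 pTs, 132 oTs, 50 poTs, 69,862 bp of Cpcontigs.
row_ratios = tag_ratios(164, 73, 132, 50)
row_cp_ratio = cpcontig_ratio(69_862)

# Species-level Reads No. for CZYX01: average over its 7 individuals (footnote i).
czyx01_reads = [320_451, 239_167, 426_610, 474_623, 797_450, 427_701, 404_182]
czyx01_mean_reads = round(mean(czyx01_reads))

print(row_ratios, row_cp_ratio, czyx01_mean_reads)
```

The same functions can be applied row by row to spot transcription errors in the table.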
Fig. 5 Heatmap of poTs among 21 individuals from three Primulina populations. A poT (paired & overlap tag) is a contig assembled from the Tag1s (clustered from read1 of RAD-Seq) in the forward and reverse directions at the same restriction enzyme site (RE) and their paired Tag2s (assembled from read2 of RAD-Seq). CpContigs are longer contigs further assembled from all kinds of RAD tags, including poTs. The X axis of the heatmap shows the names of the poTs, each composed of "poT-" and a number giving the position of the RE, which lies near the middle of the poT. The color of each grid cell represents the length of the poT at a specific site in a specific individual, with the color key shown in the bottom right corner. Black cells mean that the poT was not assembled. The length distributions across the 65 poTs and the 21 individuals are shown at the top and the right of the heatmap, respectively, with a shared key shown in the top right corner
Fig. 6 Distribution and polymorphism indexes of poTs and CpContigs in 21 individuals from three Primulina populations. A poT (paired & overlap tag) is a contig assembled from the Tag1s (clustered from read1 of RAD-Seq) in the forward and reverse directions at the same restriction enzyme site (RE) and their paired Tag2s (assembled from read2 of RAD-Seq). CpContigs are longer contigs further assembled from all kinds of RAD tags, including poTs. a, b Distribution of cp fragments (poTs) and CpContigs. Circle (a) shows the gene annotation of WHY01 (Primulina eburnea). Genes shown inside and outside the circle are transcribed clockwise and counterclockwise, respectively, and their names are marked in black and red, respectively. Genes belonging to different groups are marked with different colors, with the color key shown in the center. The outside and inside of circle (b) show the distribution of cp fragments (poTs) and CpContigs, respectively. The brown lines on the outside of this circle represent the positions of REs. c, d Bayesian / Maximum Likelihood phylogenetic trees and population genetic indexes based on concatenated sequences of poTs (c) and CpContigs (d). Posterior probabilities >0.5 in the BI analysis and bootstrap values >50% in the ML analysis are indicated on the left and right of the slash, respectively