Ming-Ju Amy Lyu, Udo Gowik, Steve Kelly, Sarah Covshoff, Julia Mallmann, Peter Westhoff, Julian M Hibberd, Matt Stata, Rowan F Sage, Haorong Lu, Xiaofeng Wei, Gane Ka-Shu Wong, Xin-Guang Zhu.
Abstract
BACKGROUND: The genus Flaveria has been used extensively as a model to study the evolution of C4 photosynthesis, as it contains C3 and C4 species as well as a number of species that exhibit intermediate types of photosynthesis. The current phylogenetic tree of the genus Flaveria contains 21 of the 23 known Flaveria species and was previously constructed using a combination of morphological data and three non-coding DNA sequences (nuclear encoded ETS, ITS and chloroplast encoded trnL-F).
Year: 2015 PMID: 26084484 PMCID: PMC4472175 DOI: 10.1186/s12862-015-0399-9
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
RNA-Seq data and cross mapping result
| Sample | PS type | # reads | Average length | # mapping reads | % mapping reads | # target CDS | % mapping target |
|---|---|---|---|---|---|---|---|
| Paired-end RNA-seq from Illumina, read length: 75–90 bp (from 1KP) | | | | | | | |
| | a | 16,809,686 | 90 | 7,902,191 | 47.01 % | 25,692 | 72.60 % |
| | a | 15,622,832 | 90 | 8,114,254 | 51.94 % | 25,337 | 71.60 % |
| | a | 20,064,474 | 90 | 9,372,051 | 46.71 % | 26,209 | 74.07 % |
| | a | 16,219,108 | 90 | 9,150,289 | 56.42 % | 25,308 | 71.52 % |
| | b | 16,668,010 | 90 | 7,983,526 | 47.90 % | 25,599 | 72.34 % |
| | b | 18,085,350 | 90 | 9,059,885 | 50.10 % | 24,877 | 70.30 % |
| | b | 18,897,990 | 90 | 11,118,929 | 58.84 % | 25,933 | 73.29 % |
| | b | 20,703,102 | 90 | 8,502,290 | 41.07 % | 25,438 | 71.89 % |
| | b | 22,424,194 | 90 | 11,170,838 | 49.82 % | 25,959 | 73.36 % |
| | c | 19,329,884 | 90 | 10,666,300 | 55.18 % | 25,669 | 72.54 % |
| | c | 18,876,338 | 90 | 8,830,209 | 46.78 % | 25,814 | 72.95 % |
| | d | 25,424,874 | 90 | 13,020,754 | 51.21 % | 26,103 | 73.77 % |
| | d | 23,089,000 | 90 | 12,163,057 | 52.68 % | 25,582 | 72.29 % |
| | d | 19,220,058 | 90 | 10,881,823 | 56.62 % | 25,359 | 71.66 % |
| | d | 23,726,482 | 90 | 11,295,399 | 47.61 % | 26,442 | 74.72 % |
| | d | 27,345,748 | 90 | 13,030,756 | 47.65 % | 25,665 | 72.53 % |
| | a | 25,213,280 | 90 | 7,916,909 | 31.40 % | 26,039 | 73.59 % |
| | a | 19,828,848 | 75 | 5,069,369 | 25.57 % | 26,041 | 73.59 % |
| | a | 23,106,402 | 90 | 9,485,525 | 41.05 % | 26,235 | 74.14 % |
| Average | | 20,560,824.20 | | | 47.67 % | | 73.78 % |
| Single-end RNA-seq from Illumina, read length: 100 bp (from HHU) | | | | | | | |
| | a | 38,529,805 | 90.1 | 20,920,082 | 54.30 % | 28,605 | 80.84 % |
| | a | 33,113,842 | 90.1 | 9,625,516 | 29.07 % | 29,033 | 82.05 % |
| | b | 31,408,476 | 85.1 | 14,328,304 | 45.62 % | 28,533 | 80.63 % |
| | b | 31,056,596 | 91.1 | 15,457,407 | 49.77 % | 26,676 | 75.39 % |
| | b | 39,911,614 | 89.9 | 18,468,621 | 46.27 % | 28,375 | 80.19 % |
| | b | 38,236,849 | 84.9 | 18,685,391 | 48.87 % | 28,465 | 80.44 % |
| | b | 29,940,352 | 91.4 | 15,965,038 | 53.32 % | 28,957 | 81.83 % |
| | b | 35,283,647 | 90.4 | 20,060,016 | 56.85 % | 29,010 | 81.98 % |
| | c | 43,802,834 | 91.6 | 20,986,495 | 47.91 % | 28,180 | 79.64 % |
| | c | 27,804,586 | 84 | 12,421,541 | 44.67 % | 28,926 | 81.74 % |
| | c | 35,000,281 | 84 | 16,077,619 | 45.94 % | 28,772 | 81.31 % |
| | d | 25,312,995 | 84.1 | 10,357,274 | 40.92 % | 27,387 | 77.40 % |
| | d | 34,333,242 | 90.9 | 16,600,362 | 48.35 % | 27,731 | 78.37 % |
| | d | 33,540,674 | 91.2 | 19,511,743 | 58.17 % | 29,059 | 82.12 % |
| Average | | 34,091,128 | | | 47.86 % | | 0.815 |
| | | 34,491,406 | 91.8 | 16,020,332 | 46.45 % | 28,180 | 79.68 % |
| | | 36,588,034 | 91.3 | 17,261,488 | 47.18 % | 28,465 | 80.48 % |
| | | 38,514,685 | 91.8 | 17,220,911 | 44.71 % | 28,772 | 81.35 % |
| | | 23,089,000 | 90 | 12,163,057 | 52.68 % | 25,582 | 72.33 % |
| Average | | 20,440,345 | | | 47.64 % | | 78.46 % |
Note: Abbreviations: F: Flaveria, H: Helenium, Ta: Tanacetum, Tr: Tragopogon, −j/m: juvenile/mature leaf sample from 1KP, #: leaf sample from HHU. PS (photosynthetic) type: a: C3, b: C3-C4, c: C4-like, d: C4.
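As a quick sanity check on the table, the percentage of mapping reads appears to be the number of mapping reads divided by the total number of reads. The short Python sketch below recomputes it for the first two 1KP rows; the function name `percent` is hypothetical and the interpretation of the column is an assumption drawn from the numbers, not from the paper's Methods.

```python
def percent(part, whole):
    """Return part/whole as a percentage (assumed meaning of the '% mapping' column)."""
    return 100.0 * part / whole

# (total reads, mapping reads) copied from the first two 1KP rows of the table
rows = [(16_809_686, 7_902_191), (15_622_832, 8_114_254)]

for reads, mapped in rows:
    print(f"{percent(mapped, reads):.2f} %")
# prints 47.01 % and 51.94 %, matching the tabulated values
```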
Fig. 1 Workflow for data matrix construction (steps a–e). a: The coding sequences (CDS) of A. thaliana were used as the template for mapping. RNA-Seq reads were translated into amino acid sequences and mapped to the template using BLAT in protein space. b: Continuously mapped reads that passed the minimal BLAT mapping score (see Methods) were retained, and the exact read-mapped regions on the template were then extracted. c: UCS, CS and AS were determined by calculating the nucleotide frequency at each site from the mapping result (see Methods). d: Codons were extracted from CS using sliding windows. e: Retained codons were linked for each CDS, and the CS data matrix was then built by concatenating the retained codons from all CDS for the ML method. (Abbreviations: UCS: uncovered site, CS: consensus site, AS: ambiguous site.)
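Steps c–e of the workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: the depth and frequency thresholds (`MIN_DEPTH`, `MIN_MAJOR_FREQ`), the function names, and the choice of a non-overlapping in-frame 3-site window are all assumptions made for illustration, since the actual criteria are given only in the paper's Methods.

```python
from collections import Counter

MIN_DEPTH = 1         # assumed: sites with fewer mapped bases count as uncovered (UCS)
MIN_MAJOR_FREQ = 0.8  # assumed: hypothetical cutoff for calling a consensus site (CS)

def classify_site(counts):
    """Classify one template site from its nucleotide counts.

    Returns ('UCS' | 'CS' | 'AS', consensus base or None).
    """
    depth = sum(counts.values())
    if depth < MIN_DEPTH:
        return "UCS", None
    base, n = counts.most_common(1)[0]
    if n / depth >= MIN_MAJOR_FREQ:
        return "CS", base
    return "AS", None

def retained_codons(site_counts):
    """Walk a CDS in frame with a 3-site window; keep codons whose sites are all CS."""
    calls = [classify_site(c) for c in site_counts]
    codons = []
    for i in range(0, len(calls) - 2, 3):
        window = calls[i:i + 3]
        if all(kind == "CS" for kind, _ in window):
            codons.append("".join(base for _, base in window))
    return codons

def concatenate(cds_list):
    """Link retained codons within each CDS, then concatenate across all CDS."""
    return "".join("".join(retained_codons(cds)) for cds in cds_list)

# Toy CDS with four codons: the second has an uncovered site, the third an ambiguous one.
toy_cds = [
    Counter(A=10), Counter(T=9, C=1), Counter(G=5),
    Counter(), Counter(A=3), Counter(C=3),
    Counter(A=5, G=5), Counter(T=4), Counter(T=4),
    Counter(G=2), Counter(C=9, T=1), Counter(A=1),
]
print(retained_codons(toy_cds))
```

In a multi-sample matrix, a codon position would presumably need to pass this filter in every sample before being kept; the sketch handles one sample only.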
Fig. 2 Phylogeny of ten mosquito species. a: Phylogeny of 10 mosquito species constructed using our strategy. Both the Bayesian inference (BI) tree and the maximum likelihood (ML) tree were inferred from 1,678 genes with 251,184 sites under the GTR + GAMMA + I model of sequence substitution and variation. The numbers beside each node are the posterior probability inferred from 1,000,000 generations / the bootstrap score from 100 bootstrap samplings. b: Phylogeny of 10 mosquito species using the ML method in Hittinger et al. (2010).
The percentage of sites with a C3 origin, or C3-C4 origin in F. pri × F. ang, F. angustifolia and F. sonorensis
| Category | # sites | Proportion (%) | # sites | Proportion (%) | # sites | Proportion (%) |
|---|---|---|---|---|---|---|
| Expressed from C3 allele | 731 | 9.04 | 764 | 9.76 | 0 | 0 |
| Expressed from C3-C4 allele | 1633 | 20.2 | 6668 | 85.17 | 6609 | 97.62 |
| Expressed from both alleles | 3075 | 38.03 | 18 | 0.23 | 0 | 0 |
| Uncertain | 2573 | 31.83 | 305 | 4.84 | 161 | 3.38 |
a: RNA-Seq data sets from HHU and 1KP were pooled to interpret the pooled F. pri × F. ang.
Fig. 3 Phylogenetic tree of individual Flaveria samples based on m-CDS. To remove the effect of F. pri × F. ang on the phylogenetic relationships among other species, the phylogenetic tree was constructed without F. pri × F. ang. The m-CDS of A. thaliana was used as the mapping reference to construct the consensus sequence (CS) matrix according to Fig. 1. A CS matrix with 343,590 sites from 2,190 genes was used to infer phylogenetic relationships by both Bayesian inference (BI) and maximum likelihood (ML) under the GTR + GAMMA + I model of sequence substitution and variation. The BI tree and the ML tree showed a consistent topology. The numbers beside each node are the posterior probability inferred from 1,000,000 generations (up) and the bootstrap score (down) from 500 bootstrap samplings. (#/shoot#/root#/: leaf/shoot/root sample from HHU, j/m: juvenile/mature leaf sample from 1KP. m-CDS: reference containing the longest gene for each paralog family)
Fig. 4 Phylogenetic tree of 16 Flaveria species using m-CDS. Pooled RNA-Seq reads of 16 Flaveria species were mapped to the m-CDS of A. thaliana, and the consensus sequence matrix was then built according to the method shown in Fig. 1. Both the Bayesian inference (BI) tree and the maximum likelihood (ML) tree were inferred from 2,462 genes with 539,391 sites under the GTR + GAMMA + I model of sequence substitution and variation. The numbers beside each node are the posterior probability (up) inferred from 1,000,000 generations and the bootstrap score (down) from 500 bootstrap samplings. The numbers in brackets are relative branch lengths estimated from Bayesian inference. (m-CDS: reference containing the longest gene for each paralog family)