Rajat S Roy, Dana C Price, Alexander Schliep, Guohong Cai, Anton Korobeynikov, Hwan Su Yoon, Eun Chan Yang, Debashish Bhattacharya.
Abstract
A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative methods. One promising approach to study these lineages is single cell genomics (SCG), whereby an individual cell is captured from nature and genome data are produced from the amplified total DNA. Here we tested the efficacy of SCG to generate a draft genome assembly from a single sample, in this case a cell belonging to the broadly distributed MAST-4 uncultured marine stramenopiles. Using de novo gene prediction, we identified 6,996 protein-encoding genes in the MAST-4 genome. This genetic inventory was sufficient to place the cell within the tree of life (ToL) using multigene phylogenetics and provided preliminary insights into the complex evolutionary history of horizontal gene transfer (HGT) in the MAST-4 lineage.
Year: 2014 PMID: 24759094 PMCID: PMC3998028 DOI: 10.1038/srep04780
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Analysis of protist SCG data.
Phylogenetic position of the Rhode Island MAST-4 single cell isolate used for genome sequencing. NCBI “gi” numbers are shown for each rDNA sequence. Known members of the MAST-4 clade (ref. 24) are shown in red text. Bootstrap values shown above and below the branches are from RAxML and PhyML (in italic text) analyses, respectively, using 1,000 iterations and the GTRGAMMA model of sequence evolution.
Assembly statistics for the MDA samples from the MAST-4 cell and for the three (A, B, C) T. pseudonana samples
| Dataset | Total data (Gbp) | Assembly size (Mbp) | Scaffold length (Mbp) with ≥90% alignment to reference genome | No. of scaffolds | N50 (Kbp) | Maximum scaffold length (Kbp) |
|---|---|---|---|---|---|---|
| MAST-4 | 6.61 | 17 | N/A | 4611 | 14 | 111 |
| A | 1.30 | 45 | 33.80 | 38245 | 7 | 101 |
| B | 0.98 | 41 | 21.43 | 22838 | 25 | 204 |
| C | 1.18 | 44 | 25.79 | 23466 | 17 | 168 |
| Combined | 3.46 | 48 | 28.48 | 27397 | 16 | 201 |
The scaffold length alignment was not done for the MAST-4 cell because of the lack of a reference genome.
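The N50 column in the table above follows the usual assembly convention: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the total assembly size. The following is a minimal sketch of that calculation, assuming the standard definition. The function name `n50` and the toy scaffold lengths are hypothetical illustrations and are not taken from the study's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumes the common definition: sort scaffolds from longest to
    shortest and report the length of the scaffold at which the
    cumulative sum first reaches half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, an assembly with many short scaffolds can have a low N50 even when its total size is large, so the statistic is best read alongside the scaffold count and the alignment to a reference.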
Protein prediction results for the three (A, B, C) diatom MDA samples
| Dataset | Number of predicted proteins | Reference proteins with ≥70% alignment | Number of complete core proteins found | Proteins with ≥60% alignment to |
|---|---|---|---|---|
| A | 13523 | 7500 | 373 | 398 |
| B | 13022 | 7658 | 397 | 421 |
| C | 14933 | 8060 | 397 | 432 |
| Combined | 16439 | 8341 | 398 | 421 |
| Reference | 9413 | 8644 | 396 | 413 |
Figure 2. Analysis of proteins derived from MAST-4 SCG data.
Phylogenetic tree inferred from the concatenated alignment of the 458 core CEGMA proteins, with the results of 100 bootstrap replicates (when ≥50%) shown at the branches. The numbers in italics below the branches derive from a RAxML bootstrap analysis using a subset of 159 CEGMA proteins that were full-length in the MAST-4 SCG assembly. The complete tree is shown in Supplementary Fig. S7.
Figure 3. Phylogeny of MAST-4 proteins.
(a) RAxML tree of violaxanthin de-epoxidase (VDE). The results of 100 RAxML and PhyML bootstrap replicates (when ≥50%) are above and below the branches, respectively, and “gi” numbers are shown for taxa. (b) Coverage map and gene predictions for contig 104 in the MAST-4 SCG assembly that encodes the alga-derived VDE shown above.