Sandeep K Kushwaha, Lokeshwaran Manoharan, Tejashwari Meerupati, Katarina Hedlund, Dag Ahrén.
Abstract
BACKGROUND: Massive sequencing of genes from different environments has made metagenomics central to understanding the wide diversity of micro-organisms and their roles in driving ecological processes. Reduced cost and high-throughput sequencing have made large-scale projects achievable for a wider group of researchers, though complete metagenome sequencing is still a daunting task in terms of both the sequencing itself and the downstream bioinformatics analyses. Alternative approaches such as targeted amplicon sequencing require custom PCR primer generation and are not scalable to thousands of genes or gene families.
Year: 2015 PMID: 25880302 PMCID: PMC4355349 DOI: 10.1186/s12859-015-0501-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of the MetCap probe designing pipeline.
List of MetCap generated files and their descriptions
| No. | File | Description |
|---|---|---|
| 1. | probe_run1_summary | Summary of initial probe generation |
| 2. | probe_run2_summary | Summary of final probe generation |
| 3. | probe_file1 | Contains the initially (first) generated probes |
| 4. | probe_file2 | Contains all probes from both generations |
| 5. | core_cluster.txt | Clustering file of submitted sequences |
| 6. | probe_non_redundant.txt | Non-redundant set of generated probes |
Quality control results
| Metric | Value |
|---|---|
| Number of sequenced reads | 138,970 |
| Number of sequenced reads after filtering | 129,198 |
| Number of reads that failed quality control | 9,772 |
| Base count among filtered reads (bp) | 44,281,009 |
| Mean sequence length (bp) | 342 ± 212 |
| Mean GC percent | 63 ± 5% |
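The quality control figures can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the assumption that the mean length is the filtered base count divided by the filtered read count is ours, not stated in the source.

```python
# Values copied from the quality control table.
total_reads = 138_970
filtered_reads = 129_198
failed_reads = 9_772
filtered_bases = 44_281_009

# Reads that passed plus reads that failed should equal the total.
reads_consistent = (total_reads - failed_reads == filtered_reads)

# Assumption: mean read length = filtered bases / filtered reads.
mean_length = filtered_bases / filtered_reads

# Share of sequenced reads that passed quality control, in percent.
pass_rate = 100.0 * filtered_reads / total_reads

print(f"counts consistent: {reads_consistent}")
print(f"mean length: {mean_length:.2f} bp")
print(f"pass rate: {pass_rate:.2f}%")
```

Under that assumption the computed mean length falls between 342 and 343 bp, in line with the reported 342 bp.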
Comparison of BLAST hits and capture efficiency against different databases at an e-value cutoff of 1e-10
| No. | Database | BLAST program | Number of hits | Capture efficiency (%) |
|---|---|---|---|---|
| 1. | Targeted Nucleotide Database | BLASTN | 27,131 | 20.99 |
| 2. | Targeted Protein Database | BLASTX | 37,329 | 28.89 |
| 3. | NCBI-NR | BLASTX | 66,822 | 51.72 |
| 4. | Uniprot | BLASTX | 65,903 | 51.00 |
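The percentages in this table appear consistent with the number of hits divided by the number of reads remaining after quality filtering (129,198). The sketch below reproduces them under that assumption; the denominator choice and the function name `capture_efficiency` are ours and are not confirmed by the source.

```python
# Assumption: the denominator is the number of reads after quality filtering.
FILTERED_READS = 129_198

# Hit counts copied from the table (database, BLAST program) -> hits.
hits = {
    ("Targeted Nucleotide Database", "BLASTN"): 27_131,
    ("Targeted Protein Database", "BLASTX"): 37_329,
    ("NCBI-NR", "BLASTX"): 66_822,
    ("Uniprot", "BLASTX"): 65_903,
}


def capture_efficiency(n_hits: int, n_reads: int) -> float:
    """Return the percentage of reads with at least one hit."""
    return 100.0 * n_hits / n_reads


efficiencies = {
    key: capture_efficiency(n, FILTERED_READS) for key, n in hits.items()
}

for (database, program), pct in efficiencies.items():
    print(f"{database} ({program}): {pct:.2f}%")
```

The computed values agree with the reported ones to within about 0.01 percentage points; the small differences suggest the published figures were truncated rather than rounded.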
Figure 2. BLASTX hits against targeted databases at different e-values.
Figure 3. Functional profile of sequenced reads through MEGAN.
Figure 4. Mapping of sequenced reads over 10 bacterial genomes, with the full description of each abbreviation and its mapping percentage: Bradyrhizobium (Bradyrhizobium japonicum USDA 6, 17.8%), Kribbella (Kribbella flavida DSM 17836, 13.9%), Streptomyces (Streptomyces coelicolor A3(2), 13.4%), Nocardioides (Nocardioides sp. JS614, 10.5%), Sorangium (Sorangium cellulosum So0157-2, 10.0%), Mycobacterium (Mycobacterium smegmatis JS623, 8.8%), Frankia (Frankia sp. EAN1, 7.7%), Myxococcus (Myxococcus xanthus DK 1622, 5.4%), Conexibacter (Conexibacter woesei DSM 14684, 5.3%), Candidatus (Candidatus Solibacter usitatus Ellin6076, 3.2%).
Figure 5. Distribution of kingdom for CAZy families in collected sequences.
Figure 6. Venn diagram of reads assigned by three methods.
Figure 7. Mapping of reads for CAZy families through read assignment.