Mária Džunková, Giuseppe D'Auria, David Pérez-Villarroya, Andrés Moya.
Abstract
Natural environments represent an incredible source of microbial genetic diversity. Discovery of novel biomolecules involves biotechnological methods that often require the design and implementation of biochemical assays to screen clone libraries. However, when an assay is applied to thousands of clones, one may end up with very few positive clones which, in most cases, have to be "domesticated" for downstream characterization and application, and this makes screening both laborious and expensive. The negative clones, which the selected assay does not detect, may also have biotechnological potential, but they remain unexplored. Knowledge of the clone sequences provides important clues about potential biotechnological applications of the clones in the library; however, sequencing clones one by one would be very time-consuming and expensive. In this study, we characterized the first metagenomic clone library from the feces of a healthy human volunteer, using a method based on 454 pyrosequencing coupled with clone-by-clone Sanger end-sequencing. Instead of sequencing each clone individually and in full, we sequenced 358 clones in a pool. The medium-large insert (7-15 kb) cloning strategy allowed us to assemble these clones correctly and to assign the clone ends to contigs, which maintains the link between the position of a living clone in the library and the annotated contig from the 454 assembly. Finally, we found several open reading frames (ORFs) with previously described potential medical applications. The proposed approach allows planning ad hoc biochemical assays for the clones of interest, and the appropriate sub-cloning strategy for gene expression in suitable vectors/hosts.
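The link between a clone's position in the library and its assembled contig can be illustrated with a minimal sketch. It assumes that a clone is assigned to a contig when both of its Sanger end reads are found in that contig on either strand. The exact-substring matching, the clone and contig names, and the sequences are all hypothetical simplifications; the abstract does not state which alignment tool was used, and real reads would need an error-tolerant aligner.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_end(contig, read):
    """Return the 0-based position of a read in a contig (either strand), or -1."""
    contig = contig.upper()
    pos = contig.find(read.upper())
    if pos == -1:
        pos = contig.find(reverse_complement(read))
    return pos


def assign_clones(contigs, clone_ends):
    """Map clone ID -> contig ID when both end reads fall in the same contig."""
    assigned = {}
    for clone_id, (fwd, rev) in clone_ends.items():
        for contig_id, seq in contigs.items():
            if find_end(seq, fwd) != -1 and find_end(seq, rev) != -1:
                assigned[clone_id] = contig_id
                break
    return assigned


# Hypothetical toy data: two short "contigs" and the end reads of two clones,
# keyed by an invented plate position.
contigs = {
    "contig_1": "ATGGCGTACGTTAGCCGATAGGCTTAACGGATCCA",
    "contig_2": "TTGACCGGTAAACCCGGGTTTAAACTGCAGGTCGA",
}
clone_ends = {
    "plate1_A1": ("ATGGCGTACG", "TGGATCCGTT"),
    "plate1_A2": ("TTGACCGGTA", "TCGACCTGCA"),
}

links = assign_clones(contigs, clone_ends)
for clone_id, contig_id in sorted(links.items()):
    print(clone_id, "->", contig_id)
```

Requiring both ends in the same contig is a deliberately strict choice: a clone whose ends land in different contigs is left unassigned rather than guessed.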
Year: 2012 PMID: 23082187 PMCID: PMC3474745 DOI: 10.1371/journal.pone.0047654
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
InterProScan annotation overview.
| Annotation tool | Total number of matches | Total number of unique protein names |
| BlastProDom | 26 | 18 |
| Coil | 190 | 1 |
| FPrintScan | 732 | 107 |
| Gene3D | 1312 | 226 |
| HAMAP | 112 | 95 |
| HMMPanther | 1017 | 165 |
| HMMPfam | 1526 | 742 |
| HMMPIR | 93 | 74 |
| HMMSmart | 318 | 109 |
| HMMTigr | 341 | 257 |
| PatternScan | 257 | 140 |
| ProfileScan | 384 | 129 |
| Seg | 1859 | 1 |
| SignalPHMM | 394 | 1 |
| superfamily | 1188 | 226 |
| TMHMM | 1535 | 1 |
Total number of matches and total number of unique protein names assigned by different annotation tools provided by InterProScan. This table summarizes Table S1, which contains the whole list of protein matches in our assembly. The number of matches is higher than the number of unique protein names because one type of protein could be found in several contigs or one ORF could contain several matches to the same protein.
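The difference between the two columns can be made concrete with a small sketch. It assumes the per-match list is available as rows of (annotation tool, contig ID, ORF number, protein name); this row layout and the example values are hypothetical and are not taken from Table S1.

```python
from collections import Counter

# Hypothetical rows: (annotation tool, contig ID, ORF number, protein name).
matches = [
    ("HMMPfam", "contig_A", 1, "ABC transporter"),
    ("HMMPfam", "contig_B", 3, "ABC transporter"),  # same protein, another contig
    ("HMMPfam", "contig_B", 3, "ABC transporter"),  # same protein, same ORF, second match
    ("HMMPfam", "contig_C", 2, "Glycosyl hydrolase"),
    ("Seg", "contig_A", 1, "seg"),
    ("Seg", "contig_C", 2, "seg"),
]


def summarize(rows):
    """Return {tool: (total matches, unique protein names)}."""
    totals = Counter(tool for tool, _, _, _ in rows)
    names = {}
    for tool, _, _, name in rows:
        names.setdefault(tool, set()).add(name)
    return {tool: (totals[tool], len(names[tool])) for tool in totals}


summary = summarize(matches)
for tool, (total, unique) in sorted(summary.items()):
    print(f"{tool}\t{total}\t{unique}")
```

In this toy input, a tool that always reports the same feature name (here "seg") yields a single unique name regardless of how many matches it has, which mirrors the rows with a unique-name count of 1 in the table.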
Figure 1. KEGG categories distribution.
Distribution of KEGG categories identified among ORFs.
Figure 2. Annotated ORFs with reported industrial applications.
The figure shows the ORF annotations of selected clones of interest. Annotation colors indicate the kind of annotation (see legend). Each panel shows a different clone (see the Results section for detailed descriptions).
ORFs with potential industrial or medical applications.
| Protein name | Contig ID | ORF number | Contig length |
| Arginine deiminase | 2H2 | 19 | 14,867 bp |
| Uracil phosphoribosyl transferase/Uridine kinase | 2C1 | 1 | 7,390 bp |
| Choloylglycine hydrolase | 2B3 | 4 | 6,393 bp |
| Alginate lyase | HK3UA | 4 | 8,204 bp |
| Spermine synthase | 7H8 | 17 | 16,889 bp |
| Cystathionine synthase | GZINT | 16 and 19 | 12,265 bp |
Columns list the protein annotation of each ORF, the contig identifier, the ORF number within the contig, and the contig length.