Petra ten Hoopen, Clara Amid, Pier Luigi Buttigieg, Evangelos Pafilis, Panos Bravakos, Ana M Cerdeño-Tárraga, Richard Gibson, Tim Kahlke, Aglaia Legaki, Kada Narayana Murthy, Gabriella Papastefanou, Emiliano Pereira, Marc Rossello, Ana Luisa Toribio, Guy Cochrane.
Abstract
Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third-party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach. Database URL: http://www.ebi.ac.uk/ena
Year: 2016 PMID: 26861660 PMCID: PMC4747322 DOI: 10.1093/database/bav126
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. An example of a sample record improvement by the Sample Record Annotation Workshop. The Attributes tab of the ENA sample record SAMEA1573721 is shown here with the originally submitted contextual data (a) and expanded annotation as a result of the Workshop (b).
Table 1. An overview of ontology classes (with their unique class identifiers), the number of ENA sample records annotated with these classes and the ENA study accessions associated with the annotated sample records.
| Ontology class | Ontology class unique ID | Ontology class frequency | ENA study accession |
|---|---|---|---|
| village biome | ENVO:01000246 | 1277 | PRJEB2989 |
| anthropogenic terrestrial biome | ENVO:01000219 | 395 | PRJEB638, PRJEB1391, PRJEB1720, PRJEB7248, PRJEB7112, PRJEB5976 |
| dense settlement biome | ENVO:01000248 | 207 | PRJEB4413, PRJEB4562, PRJEB3374 |
| rangeland biome | ENVO:01000247 | 48 | PRJEB5982 |
| urban biome | ENVO:01000249 | 7 | PRJEB4512 |
| polar desert biome | ENVO:01000186 | 4 | PRJEB3228 |
| garden | ENVO:00000011 | 723 | PRJEB2989 |
| field | ENVO:00000114 | 723 | PRJEB2989 |
| cultivated habitat | ENVO:00000113 | 550 | PRJEB2989 |
| digestive tract | UBERON:0001555 | 388 | PRJEB1391, PRJEB4413, PRJEB1720, PRJEB7112, PRJEB4562, PRJEB3374 |
| intestine | UBERON:0000160 | 315 | PRJEB4413, PRJEB1720, PRJEB7112, PRJEB4562, PRJEB3374 |
| animal house | ENVO:00003040 | 200 | PRJEB638 |
| rumen | UBERON:0007365 | 48 | PRJEB5982 |
| lung | UBERON:0002048 | 8 | PRJEB7248 |
| infection | EFO:0000544 | 8 | PRJEB7248 |
| bacterial disease | EFO:0000771 | 8 | PRJEB7248 |
| brewery | ENVO:00003885 | 7 | PRJEB4512 |
| anaerobic sludge | ENVO:00002129 | 7 | PRJEB4512 |
| breast | UBERON:0000310 | 6 | PRJEB5976 |
| coastal plain | ENVO:00000090 | 4 | PRJEB3228 |
| plant tissue culture | ENVO:02000009 | 4 | PRJEB2989 |
| seedling | TAIR:0000027 | 4 | PRJEB2989 |
| rhizosphere | ENVO:00005801 | 621 | PRJEB2989 |
| root matter | ENVO:01000349 | 536 | PRJEB2989 |
| feces | UBERON:0001988 | 332 | PRJEB1391, PRJEB4413, PRJEB7112, PRJEB4562, PRJEB3374 |
| gastrointestinal contents | UBERON:0035118 | 136 | PRJEB638, PRJEB1720 |
| soil | ENVO:00001998 | 124 | PRJEB3228, PRJEB2989 |
| cecum mucosa | UBERON:0000314 | 120 | PRJEB638 |
| autoclaved sand | ENVO:01000350 | 120 | PRJEB2989 |
| cud | UBERON:0012114 | 48 | PRJEB5982 |
| gastric juice | UBERON:0001971 | 48 | PRJEB5982 |
| sputum | UBERON:0007311 | 8 | PRJEB7248 |
| waste water | ENVO:00002001 | 7 | PRJEB4512 |
| milk | UBERON:0001913 | 6 | PRJEB5976 |
| North Carolina Area | GAZ:00082924 | 1277 | PRJEB2989 |
| France | GAZ:00002940 | 147 | PRJEB4413 |
| South Korea | GAZ:00002802 | 73 | PRJEB1391 |
| China | GAZ:00002845 | 60 | PRJEB1720, PRJEB3374 |
| Israel | GAZ:00002476 | 52 | PRJEB7112 |
| India | GAZ:00002840 | 48 | PRJEB5982 |
| Commune of Espelette | GAZ:00321111 | 36 | PRJEB638 |
| Commune of Severac le Chateau | GAZ:00372953 | 26 | PRJEB638 |
| Commune of Bouchemaine | GAZ:00377283 | 25 | PRJEB638 |
| Commune of Louan-Villegruis | GAZ:00365581 | 24 | PRJEB638 |
| Calw district | GAZ:00020488 | 24 | PRJEB638 |
| Arrondissement du Nancy | GAZ:00008488 | 22 | PRJEB638 |
| Divonne les Bains | GAZ:00059221 | 22 | PRJEB638 |
| Cologne | GAZ:00396037 | 21 | PRJEB638 |
| Gambia | GAZ:00000907 | 8 | PRJEB7248 |
| Baldwinsville | GAZ:00223041 | 7 | PRJEB4512 |
| Garwood Valley | GAZ:00139908 | 4 | PRJEB3228 |
| Homo sapiens | NCBI:9606 | 293 | PRJEB4413, PRJEB7112, PRJEB4562, PRJEB1720, PRJEB7248, PRJEB5976 |
| Mus musculus | NCBI:10090 | 236 | PRJEB638, PRJEB7112, PRJEB3374 |
| Bos taurus | NCBI:9913 | 48 | PRJEB5982 |
The NCBI Taxonomy hierarchy has been included here for an overview of the records' taxonomic coverage.
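The class identifiers in the table above follow a `PREFIX:LOCAL` pattern, where the prefix names the source vocabulary (for example ENVO, UBERON, GAZ, EFO). As an illustration only, the sketch below splits such identifiers and adds up the class frequencies per prefix for a handful of rows copied from the table. The function names `split_class_id` and `annotations_per_prefix` are hypothetical and are not part of any ENA tooling. Because one sample record can carry several classes, the per-prefix totals count annotations rather than distinct records.

```python
from collections import Counter

# A few (class label, class ID, frequency) rows copied from the table above.
ROWS = [
    ("village biome", "ENVO:01000246", 1277),
    ("digestive tract", "UBERON:0001555", 388),
    ("rumen", "UBERON:0007365", 48),
    ("infection", "EFO:0000544", 8),
    ("France", "GAZ:00002940", 147),
    ("Homo sapiens", "NCBI:9606", 293),
]


def split_class_id(class_id):
    """Split a 'PREFIX:LOCAL' identifier into (prefix, local_id)."""
    prefix, sep, local_id = class_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:LOCAL identifier: {class_id!r}")
    return prefix, local_id


def annotations_per_prefix(rows):
    """Sum the class frequencies for each identifier prefix."""
    totals = Counter()
    for _label, class_id, frequency in rows:
        prefix, _local_id = split_class_id(class_id)
        totals[prefix] += frequency
    return totals


if __name__ == "__main__":
    for prefix, total in sorted(annotations_per_prefix(ROWS).items()):
        print(prefix, total)
```

Extending `ROWS` to the full table would give a per-vocabulary breakdown of the Workshop's annotations.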
Figure 2. Word cloud of ontology classes annotated in the sample records as a result of the environmental Sample Record Annotation Workshop. The word cloud illustrates the frequency of ontology class usage summarised in Table 1.
Table 2. A list of ENA studies and the number of associated sample records updated with annotation results of the Sample Record Annotation Workshop.
| ENA study | Number of sample records |
|---|---|
| PRJEB2989 | 1277 |
| PRJEB638 | 201 |
| PRJEB4413 | 147 |
| PRJEB1391 | 73 |
| PRJEB1720 | 56 |
| PRJEB4562 | 56 |
| PRJEB7112 | 52 |
| PRJEB5982 | 48 |
| PRJEB7248 | 8 |
| PRJEB4512 | 7 |
| PRJEB5976 | 6 |
| PRJEB3228 | 4 |
| PRJEB3374 | 4 |
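The per-study counts above can be tallied to see how the updated sample records are distributed across studies. The following is a minimal sketch that copies the table values into a dictionary; the variable and function names are illustrative assumptions, and nothing here queries ENA itself.

```python
# Number of updated sample records per ENA study, copied from the table above.
RECORDS_PER_STUDY = {
    "PRJEB2989": 1277,
    "PRJEB638": 201,
    "PRJEB4413": 147,
    "PRJEB1391": 73,
    "PRJEB1720": 56,
    "PRJEB4562": 56,
    "PRJEB7112": 52,
    "PRJEB5982": 48,
    "PRJEB7248": 8,
    "PRJEB4512": 7,
    "PRJEB5976": 6,
    "PRJEB3228": 4,
    "PRJEB3374": 4,
}


def total_records(counts):
    """Total number of updated sample records across all studies."""
    return sum(counts.values())


def study_shares(counts):
    """Fraction of all updated records contributed by each study."""
    total = total_records(counts)
    return {study: n / total for study, n in counts.items()}


if __name__ == "__main__":
    print("total updated records:", total_records(RECORDS_PER_STUDY))
    for study, share in study_shares(RECORDS_PER_STUDY).items():
        print(f"{study}: {share:.1%}")
```

Running it shows that a single study, PRJEB2989, accounts for well over half of the updated records.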