Nicole Vasilevsky, Tenille Johnson, Karen Corday, Carlo Torniai, Matthew Brush, Erik Segerdell, Melanie Wilson, Chris Shaffer, David Robinson, Melissa Haendel.
Abstract
Development of biocuration processes and guidelines for new data types or projects is a challenging task. Each project finds its way toward defining annotation standards and ensuring data consistency with varying degrees of planning and different tools to support and/or report on consistency. Further, this process may be data type specific even within the context of a single project. This article describes our experiences with eagle-i, a 2-year pilot project to develop a federated network of data repositories in which unpublished, unshared or otherwise 'invisible' scientific resources could be inventoried and made accessible to the scientific community. During the course of eagle-i development, the main challenges we experienced related to the difficulty of collecting and curating data while the system and the data model were simultaneously built, and a deficiency and diversity of data management strategies in the laboratories from which the source data was obtained. We discuss our approach to biocuration and the importance of improving information management strategies to the research process, specifically with regard to the inventorying and usage of research resources. Finally, we highlight the commonalities and differences between eagle-i and similar efforts with the hope that our lessons learned will assist other biocuration endeavors.
Database URL: www.eagle-i.net
Year: 2012 PMID: 22434835 PMCID: PMC3308157 DOI: 10.1093/database/bar067
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Participating institutions in the eagle-i Consortium
| Institutions |
|---|
| Harvard University, Cambridge, MA |
| Oregon Health and Science University, Portland, OR |
| University of Hawaii at Manoa, Honolulu, HI |
| Montana State University, Bozeman, MT |
| Dartmouth College, Hanover, NH |
| Morehouse School of Medicine, Atlanta, GA |
| Jackson State University, Jackson, MS |
| University of Puerto Rico, San Juan, PR |
| University of Alaska Fairbanks, Fairbanks, AK |
Summary of the number of resources collected at each site
| Resource type | Alaska | Dartmouth | Harvard | Hawaii | Jackson State | Montana State | Morehouse | OHSU | Puerto Rico | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Organisms and viruses | 15 | 14 262 | 1186 | 12 816 | 3 | 87 | 17 | 151 | 46 | 28 583 |
| Instruments | 225 | 114 | 1171 | 688 | 85 | 181 | 66 | 216 | 612 | 3358 |
| Reagents | 0 | 125 | 5773 | 184 | 4 | 173 | 65 | 233 | 161 | 6718 |
| Services | 66 | 101 | 984 | 347 | 42 | 71 | 52 | 465 | 110 | 2238 |
| Software | 38 | 47 | 222 | 65 | 50 | 43 | 6 | 150 | 66 | 687 |
| Protocols | 67 | 34 | 137 | 47 | 12 | 73 | 7 | 122 | 86 | 585 |
| Core laboratories | 8 | 20 | 196 | 36 | 14 | 15 | 12 | 37 | 33 | 371 |
| Research opportunities | 0 | 2 | 18 | 0 | 3 | 1 | 1 | 7 | 0 | 32 |
| Biological specimens | 0 | 0 | 0 | 2844 | 2 | 0 | 0 | 43 | 25 | 2914 |
| Human studies | 13 | 0 | 0 | 134 | 42 | 0 | 0 | 2 | 0 | 191 |
| Total | 432 | 14 705 | 9687 | 17 161 | 257 | 644 | 226 | 1426 | 1139 | 45 677 |
Notes to the laboratory inventory table below: electronic systems include spreadsheets, text files, MacVector and MAG-ML; non-electronic systems include lab notebooks and paper files; LIMS include Quartzy and Epic.
Note: some labs used more than one type of inventory system.
Figure 1. Workflow of the eagle-i team. The role of the Resource Navigators is to collect and add data to the system, such as organizations or resources. All users (Curators, Lab Users and Resource Navigators) can enter data into the Data Collection tool in draft state. To edit a record, it must be ‘claimed’ by the user and then ‘shared’ after editing. Curators and Resource Navigators can send resources to curation. Data ‘in curation’ is managed by the Curation team and subsequently published, where it is visible in the Search interface. After a record is published, a Curator can withdraw, duplicate or delete the record, or return the record to draft for further editing.
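The record lifecycle described in the Figure 1 caption can be sketched as a small state machine. This is a hypothetical illustration in Python, not the actual eagle-i code: the class and function names are ours, and the ‘claimed’/‘shared’ editing sub-states of a draft record are omitted for brevity.

```python
from enum import Enum

class State(Enum):
    DRAFT = "draft"
    IN_CURATION = "in curation"
    PUBLISHED = "published"
    WITHDRAWN = "withdrawn"

# Allowed transitions, following the workflow caption: any user drafts a
# record; Curators and Resource Navigators send it to curation; the Curation
# team publishes it (or returns it to draft); a published record can be
# withdrawn or returned to draft for further editing.
TRANSITIONS = {
    State.DRAFT: {State.IN_CURATION},
    State.IN_CURATION: {State.PUBLISHED, State.DRAFT},
    State.PUBLISHED: {State.WITHDRAWN, State.DRAFT},
    State.WITHDRAWN: set(),
}

def advance(current: State, target: State) -> State:
    """Move a record to `target`, enforcing the allowed transitions."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move a record from {current.value!r} to {target.value!r}")
    return target

record = State.DRAFT
record = advance(record, State.IN_CURATION)
record = advance(record, State.PUBLISHED)
print(record.value)  # -> published
```

One consequence of this shape is that only published records are ever visible in the Search interface; everything in draft or in curation stays internal to the Data Collection tool.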
Summary of laboratories that use a lab inventory system and type of system used at each institution in the eagle-i Consortium

| Institution | Electronic | Non-electronic | LIMS | Database | Unspecified | Total number of labs | Labs with inventory systems (%) |
|---|---|---|---|---|---|---|---|
| Alaska | 6 | 1 | 1 | | | 15 | 47 |
| Dartmouth | 1 | | | | | 57 | 2 |
| Harvard | 3 | 1 | 1 | 2 | | 206 | 3 |
| Hawaii | 23 | 2 | 2 | | | 105 | 26 |
| JSU | 6 | 1 | 2 | | | 20 | 40 |
| Montana | 1 | 1 | 1 | | | 77 | 4 |
| MSM | 3 | 1 | 1 | 1 | | 19 | 26 |
| OHSU | 9 | 2 | 1 | 2 | | 75 | 15 |
| UPR | | | | | | 103 | 0 |
| Total | 51 | 2 | 5 | 6 | 11 | 677 | 10 |
Figure 2. Example of an annotation form in the Data Collection Tool for the plasmid reagent type. (A) The Data Collection Tool contains annotation fields that are auto-populated from the ontology (red box) and free-text fields (yellow box). Fields in the Data Collection Tool can also link records to other records in the repository, such as related publications or documentation (blue box). Users can request that new terms be added to the ontology using the Term Request field. Inset: Construct insert is an embedded class in the plasmid form and contains information that corresponds to other databases, such as the Entrez Gene ID. (B) The search result for this plasmid. Only the fields that are filled out in the Data Collection Tool are displayed in the search interface, and the record can be retrieved by searching on any of its annotated fields. Text colored blue links to other records in the search interface. Hovering over the ‘i’ icons displays the ontological definition of a term, as in the example of the technique in situ hybridization.
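The mix of field kinds visible in the annotation form (ontology-backed values, free text, cross-references to other databases, and links to other repository records) can be modeled as a simple typed record. This is a sketch only; the field names below are illustrative and do not reproduce the actual eagle-i data model, and the ontology identifier is a placeholder rather than a real term ID.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OntologyTerm:
    """A controlled-vocabulary value, auto-populated from the ontology."""
    label: str
    term_id: str  # placeholder for an ontology class identifier

@dataclass
class PlasmidRecord:
    # A hypothetical mix of the field kinds shown in Figure 2:
    name: str                                   # free text
    resource_type: OntologyTerm                 # ontology-backed field
    construct_insert_gene: Optional[str] = None # cross-reference, e.g. an Entrez Gene ID
    related_records: List[str] = field(default_factory=list)  # links to other repository records

rec = PlasmidRecord(
    name="pExample-GFP",  # hypothetical plasmid name
    resource_type=OntologyTerm(label="plasmid", term_id="<ontology CURIE>"),
)
```

Separating ontology-backed fields from free text in this way is what lets the search interface show a definition on hover: the term ID, not the display label, is the stable key back into the ontology.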
Figure 3. Decision trees were used to assist with data collection and curation. (A) Decision tree legend. (B) The decision tree for biological specimens. Required and highly desired fields are indicated in green and blue, respectively. Each resource type had 2–3 required fields and 4–8 highly desired fields; the remaining fields were optional or applied only to specific subtypes of resources.
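The tiered field scheme in the decision trees (required vs. highly desired vs. optional) lends itself to a simple completeness check during curation. A minimal sketch follows; the field names and tier assignments are hypothetical, not taken from the actual eagle-i decision trees.

```python
# Tiers of fields for one hypothetical resource type, mirroring the
# required / highly desired / optional scheme of Figure 3.
REQUIRED = {"name", "resource_type"}
HIGHLY_DESIRED = {"description", "contact", "location", "availability"}

def completeness(record: dict) -> tuple:
    """Return (meets the required tier, fraction of highly desired fields filled)."""
    filled = {k for k, v in record.items() if v}
    meets_required = REQUIRED <= filled
    desired_frac = len(HIGHLY_DESIRED & filled) / len(HIGHLY_DESIRED)
    return meets_required, desired_frac

ok, frac = completeness({"name": "confocal microscope",
                         "resource_type": "instrument",
                         "description": "inverted confocal"})
print(ok, frac)  # -> True 0.25
```

A check like this makes the curation rule operational: a record failing the required tier cannot be published, while the highly-desired fraction gives curators a quality score to prioritize follow-up.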
Figure 4. Average number of fields recorded for instruments before and after a QA effort. There was a 1.1–4.3-fold increase in the average number of filled fields after the QA effort. Error bars indicate standard deviation.
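The fold increase reported in Figure 4 is simply the ratio of the mean filled-field counts after and before the QA pass, with the error bars given by the standard deviation. A minimal sketch of that computation, using made-up counts rather than the actual eagle-i data:

```python
from statistics import mean, stdev

# Hypothetical filled-field counts for one instrument type,
# before and after the QA pass (illustrative numbers only).
before = [3, 4, 2, 5, 3]
after = [9, 11, 8, 12, 10]

fold = mean(after) / mean(before)
print(f"mean before={mean(before):.1f} (sd {stdev(before):.2f}), "
      f"after={mean(after):.1f} (sd {stdev(after):.2f}), fold={fold:.2f}")
```

Note that `stdev` computes the sample standard deviation; for per-bar error bars over a full population of records, `pstdev` would be the alternative choice.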