Jamie Alnasir, Hugh P Shanahan.
Abstract
BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. A series of protocol steps must be followed when preparing samples for next-generation sequencing. The bias introduced by a number of these steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, has yet to be quantified.
Keywords: Annotation; Enrichment; Experiment; Fragmentation; Ligation; Metadata; Next-generation sequencing; Protocol
Year: 2015 PMID: 25960871 PMCID: PMC4425880 DOI: 10.1186/s13742-015-0064-7
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. A typical next-generation sequencing workflow. The sequencing workflow is shown by the black arrows; red arrows depict the metadata that should be captured from these sequencing workflow steps. We have focused on the first three major steps.
Figure 2. Schema diagram of the SRA relational SQLite database based on the SQL metadata. Field names in emphasis have been probed for protocol step annotation (Table 2), together with the submission table date-stamp. Diamonds represent one-to-many relationships. Fields in bold emphasis are those with relevant experimental metadata.
SRA developer documentation (col_desc table found in SQLite DB)
| Table | Field | Description |
|---|---|---|
| Study | | Briefly describes the goals, purpose and scope of the Study. This need not be listed if it can be inherited from a referenced publication. |
| Study | | More extensive free-form description of the study. |
| Sample | | Free-form text describing the sample, its origin and its method of isolation. |
| Experiment | | More details about the set-up and goals of the experiment as supplied by the Investigator. |
| Experiment | | Whether any method was used to select and/or enrich the material being sequenced. |
| Experiment | | Free-form text describing the protocol by which the sequencing library was constructed. |
| Experiment | | Properties and attributes of the experiment. These can be entered as free-form tag-value pairs. |
Table depicting the structured word list for given protocol steps (fragmentation, adapter-ligation, enrichment)
Protocol step structured word lists
| Fragmentation | Adapter-ligation | Enrichment |
|---|---|---|
The ‘%’ symbol denotes fuzzy-match logic; for instance, %amplif matches both “amplify” and “amplified”.
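The fuzzy-match convention can be illustrated with the SQL `LIKE` operator. The following is a minimal sketch, not the authors' query: the `experiment` table, its `accession` and `protocol` columns, the sample rows and the `like_pattern` helper are all hypothetical, and it assumes that a stem such as %amplif may occur anywhere in a free-text field.

```python
import sqlite3


def like_pattern(stem: str) -> str:
    """Turn a word-list stem such as '%amplif' into a SQL LIKE pattern.

    Assumption: the stem may appear anywhere in the text, so it is
    wrapped in '%' wildcards on both sides.
    """
    return "%" + stem.strip("%") + "%"


# Hypothetical in-memory table standing in for a free-text metadata field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (accession TEXT, protocol TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [
        ("EXP1", "Libraries were amplified by PCR."),
        ("EXP2", "We amplify the ligated fragments."),
        ("EXP3", "DNA was sheared by sonication."),
    ],
)

# SQLite's LIKE is case-insensitive for ASCII characters by default.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT accession FROM experiment WHERE protocol LIKE ? "
        "ORDER BY accession",
        (like_pattern("%amplif"),),
    )
]
print(matches)  # ['EXP1', 'EXP2']
```

Passing the pattern as a bound parameter keeps the wildcard handling separate from the SQL text itself.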
Metadata SQL query results
| Table | Field | Total records (in table) | Fragmentation | Adapter ligation | Enrichment | All steps |
|---|---|---|---|---|---|---|
| Study | | | | | | |
| Study | | | | | | |
| Sample | | | | | | |
| Experiment | | | | | | |
| Experiment | | | | | | |
| Experiment | | | | | | |
| Experiment | | | | | | |
Each column (on the right side of the table dividing line) represents a sequencing step for which a word list is used to filter records where this step is annotated. Counts are the number of experiment records exhibiting this particular annotation. “All steps” indicates the number of fields containing all three types of protocol step annotation, i.e. they all have keywords from each of the keyword lists.
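The counting logic described in this note can be sketched as follows. This is an illustration under stated assumptions rather than the query used in the study: the `WORD_LISTS` stems are placeholders (only the `amplif` stem is taken from the text), and the `experiment` table, `protocol` column and helper functions are hypothetical. A record counts towards a step when its field matches any stem in that step's list, and towards "All steps" when it matches at least one stem from every list.

```python
import sqlite3

# Placeholder stems for illustration only; not the published word lists.
WORD_LISTS = {
    "fragmentation": ["shear", "sonic"],
    "adapter_ligation": ["ligat", "adapter"],
    "enrichment": ["amplif", "enrich"],
}


def step_clause(column, stems):
    """Build '(col LIKE ? OR col LIKE ? ...)' plus its bound parameters."""
    clause = " OR ".join(f"{column} LIKE ?" for _ in stems)
    return f"({clause})", [f"%{stem}%" for stem in stems]


def count_annotated(conn, table, column, steps):
    """Count rows whose field matches at least one stem for EVERY step given.

    `table` and `column` are trusted constants here, not user input.
    """
    clauses, params = [], []
    for step in steps:
        clause, step_params = step_clause(column, WORD_LISTS[step])
        clauses.append(clause)
        params.extend(step_params)
    sql = f"SELECT COUNT(*) FROM {table} WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (accession TEXT, protocol TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [
        ("EXP1", "Sheared by sonication, adapters ligated, PCR amplified."),
        ("EXP2", "DNA was sheared."),
        ("EXP3", "Adapter ligation followed by enrichment."),
        ("EXP4", None),  # unannotated record; LIKE on NULL never matches
    ],
)

counts = {
    step: count_annotated(conn, "experiment", "protocol", [step])
    for step in WORD_LISTS
}
counts["all_steps"] = count_annotated(
    conn, "experiment", "protocol", list(WORD_LISTS)
)
print(counts)
```

Combining the per-step clauses with `AND` is what distinguishes the "All steps" column from the individual step columns.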
Figure 3. Ratio of fully annotated experiment records to total experiment records, plotted against the total number of experiment records per study. Only study records where at least one experiment record is fully annotated are included. Points where the ratio is 1 represent study records where all of the experiment records in a given study are fully annotated. The green line is a plot of 1/(total number of experiment records in a given study). Points lying along this line are those studies where only one experiment record is fully annotated (presumably to represent the annotation of all the other experiment records). Points between these two curves represent studies where an intermediate number of the experiment records (neither one nor all) are annotated.
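The quantities plotted in this figure can be computed with a short routine. The sketch below assumes a hypothetical input of `(study_id, is_fully_annotated)` pairs, one per experiment record; the function name and the category labels are illustrative and do not come from the paper.

```python
from collections import defaultdict


def annotation_ratios(records):
    """Map each study to (total records, annotated ratio, category).

    records: iterable of (study_id, is_fully_annotated) pairs, one pair
    per experiment record. Studies with no fully annotated record are
    left out, mirroring the inclusion rule stated in the caption.
    """
    totals = defaultdict(int)
    annotated = defaultdict(int)
    for study, flag in records:
        totals[study] += 1
        annotated[study] += int(bool(flag))

    result = {}
    for study, n in totals.items():
        k = annotated[study]
        if k == 0:
            continue
        if k == n:
            category = "all"           # ratio == 1
        elif k == 1:
            category = "single"        # ratio == 1/n, on the green line
        else:
            category = "intermediate"  # strictly between 1/n and 1
        result[study] = (n, k / n, category)
    return result


# Hypothetical demo data: four studies with different annotation patterns.
demo = (
    [("S1", True)] * 4
    + [("S2", True)] + [("S2", False)] * 3
    + [("S3", True)] * 2 + [("S3", False)] * 2
    + [("S4", False)] * 3
)
ratios = annotation_ratios(demo)
print(ratios)
```

A study with a single experiment record that is annotated sits on both curves at once; this sketch labels it "all".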