Gabriel Gonzalez1, Michihito Sasaki2, Lucy Burkitt-Gray3, Tomonori Kamiya4, Noriko M Tsuji4, Hirofumi Sawa2,5,6, Kimihito Ito1.
Abstract
Advances in next-generation sequencing technologies have enabled the generation of millions of sequences from microorganisms. However, distinguishing the sequence of a novel species from sequencing errors remains a technical challenge when the novel species is highly divergent from the closest known species. To solve this problem, we developed a new method called Optimistic Protein Assembly from Reads (OPAR). This method is based on the assumption that protein sequences could be more conserved than the nucleotide sequences encoding them. By combining metagenomics, bioinformatics and conventional Sanger sequencing, our method successfully identified all coding regions of the mouse picobirnavirus for the first time. The salvaged sequences indicated that segment 1 of this virus was more divergent from its homologues in other Picobirnaviridae species than segment 2. For this reason, only segment 2 of mouse picobirnavirus had been detected in previous studies. The OPAR web tool is available at http://bioinformatics.czc.hokudai.ac.jp/opar/.
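The assumption that proteins can be more conserved than their coding nucleotides can be illustrated with a minimal Python sketch. It is not the OPAR implementation: the function names are hypothetical, and only the standard genetic code is assumed. The sketch translates a read in all six reading frames, which is the usual first step before comparing reads against a protein reference.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Map (strand, offset) to the translation of the read in that frame."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


# Two hypothetical reads that differ at two of nine nucleotides
# (synonymous third-codon positions) but encode the same peptide.
read_a = "ATGGCTAAA"
read_b = "ATGGCAAAG"
same_protein = translate(read_a) == translate(read_b)  # both give "MAK"
```

In this toy case the nucleotide sequences differ while the translated peptides are identical, which is the situation in which a protein-level comparison can still recognize a divergent read.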
Year: 2017 PMID: 28071766 PMCID: PMC5223137 DOI: 10.1038/srep40447
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Diagram of the OPAR approach to deduce nucleotide sequences.
(a) Flowchart of the steps for aligning N reads from a virome to a specific amino acid or nucleotide sequence used as a reference. Aligned and unaligned segments of the K aligned reads are the input for building a consensus sequence used for designing primers. (b) Example OPAR usage for designing primers from consensus sequences on proteins. Primers A and B designed from consensus sequences respectively located in proteins A and B allow us to analyze the nucleotide sequence between these regions. (c) Example OPAR usage for designing primers from consensus sequences located outside of characterized sequences. Primers C and F designed from consensus sequences respectively located upstream and downstream of an already characterized sequence allow us to analyze the uncharacterized novel sequences. Primers D and E are designed on the characterized section of the sequence.
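The consensus-building step in panel (a) can be sketched as a per-column majority vote over reads that have already been placed on a common coordinate system. This is an illustrative assumption, not the published algorithm: the `consensus` function, its `min_depth` parameter, and the `(start, sequence)` input format are all hypothetical.

```python
from collections import Counter


def consensus(placed_reads, min_depth=1):
    """Majority-vote consensus from (start, sequence) pairs.

    Positions covered by fewer than min_depth reads are reported as 'N'.
    Starts may be negative, so read overhangs beyond the reference are kept.
    """
    columns = {}
    for start, seq in placed_reads:
        for i, base in enumerate(seq):
            columns.setdefault(start + i, Counter())[base] += 1
    if not columns:
        return ""
    out = []
    for pos in range(min(columns), max(columns) + 1):
        counts = columns.get(pos)
        if counts is None or sum(counts.values()) < min_depth:
            out.append("N")
        else:
            out.append(counts.most_common(1)[0][0])
    return "".join(out)


# Three hypothetical overlapping reads; the third disagrees at position 5.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ATGGTT")]
full = consensus(reads)                    # "ACGTACGGTT"
confident = consensus(reads, min_depth=2)  # "NNGTACGGNN"
```

Raising `min_depth` restricts the consensus to well-supported columns, which is one plausible way to choose regions for primer design.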
Figure 2. Coverage of reads aligned to PBV/mouse/JPN/2015 segment 1 proteins, as output by OPAR.
The ordinate axes represent the count of reads aligned to each section of the protein sequence. The abscissae represent the relative nucleotide position inferred from the aligned amino acid sequences of the respective proteins. Horizontal red lines represent the threshold, set to the minimum expected average with 90% confidence: 4,254 and 20,167 reads in (a) and (b), respectively. The panels show reads aligned to the human picobirnavirus homologues of the segment 1 (a) non-structural protein and (b) capsid protein.
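A coverage profile of this kind can be computed from alignment intervals as in the following sketch. The function names and the half-open `(start, end)` interval format are assumptions made for illustration, and the threshold is simply passed in as a number; the sketch does not reproduce how the 90% confidence threshold was derived.

```python
def coverage(intervals, length):
    """Per-position read depth from half-open (start, end) intervals."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1   # a read begins covering here
            diff[end] -= 1     # and stops covering here
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


def regions_at_or_above(depth, threshold):
    """Half-open (start, end) runs where depth >= threshold."""
    regions, begin = [], None
    for pos, d in enumerate(depth):
        if d >= threshold and begin is None:
            begin = pos
        elif d < threshold and begin is not None:
            regions.append((begin, pos))
            begin = None
    if begin is not None:
        regions.append((begin, len(depth)))
    return regions


# Three hypothetical alignments on a reference of length 10.
depth = coverage([(0, 5), (2, 8), (4, 10)], 10)
supported = regions_at_or_above(depth, 2)
```

The difference-array approach keeps the cost linear in the number of reads plus the reference length, which matters when millions of reads are aligned.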
Primers designed to sequence PBV/mouse/JPN/2015.
| No. | Sequence | Direction | Segment | Start-End |
|---|---|---|---|---|
| 1 | TCGAACCGACACAATCTGACA | + | S1 | 617–637 |
| 2 | TCAGCGTTATACACTGGACC | − | S1 | 1732–1751 |
| 3 | ACAACACCAAAGTAAGGGAAAGT | + | S1 | 208–226 |
| 4 | AGTCGGTGTTTTAGATGGTGTAG | − | S1 | 2054–2076 |
| 5 | GTTTATGACACGAAACCAAATAGCGT | + | S1 | 1–26 |
| 6 | TTAGGCGTTCAACTTCATCCTG | − | S1 | 392–413 |
| 7 | ACATTCCGACACAGCCTACACCTGAA | + | S1 | 1940–1965 |
| 8 | ACCCAACCAAGGTTTACGCT | + | S2 | 1–20 |
| 9 | ATGTGGGATATCTAAACCAAGTCT | − | S2 | 1386–1409 |
*Primer sequence was not completely identical to the target region of PBV/mouse/JPN/2015 virus genome.
Pairwise amino acid identity of PBV/mouse/JPN/2015 to other diverged PBVs.
| Diverged PBVs | Accession no. | s1nsp length (aa) | s1nsp identity (%) | s1cp length (aa) | s1cp identity (%) |
|---|---|---|---|---|---|
| PBV/mouse/JPN/2015 | LC110352 | 241 | 100 | 577 | 100 |
| PBV/pig/ITA/2004 | KF861770 | 199 | 15.1 | 545 | 18.5 |
| PBV/horse/USA/2012 | KR902504 | 151 | 15.6 | 536 | 16.2 |
| PBV/human/THA | NC_007026 | 224 | 18.7 | 552 | 18.8 |
| PBV/human/USA/2013 | KJ663813 | 116 | 15.2 | 552 | 15.2 |
| PBV/human/NLD/2007 | GU968923 | 213 | 21.6 | 243 | 8.3 |
| PBV/turkey/USA/2011 | KJ495689 | 252 | 21.8 | 550 | 20 |
| PBV/otarine/HKG/2008 | JQ776551 | 162 | 19.1 | 575 | 20.7 |
| PBV/pig/ITA/2004 | KF861768 | 178 | 16.1 | 615 | 19 |
| PBV/fox/NLD/2012 | KC692367 | 201 | 20.6 | 506 | 16.9 |
| PBV/horse/USA/2012 | KR902506 | 222 | 20.4 | 527 | 22.4 |
| PBV/horse/USA/2012 | KR902508 | 251 | 10.2 | 557 | 21 |
| PBV/human/NLD/2008 | KJ206568 | 129 | 8.4 | 514 | 19.2 |
| PBV/dromedary/ARE/2013 | AIY31265 | N.D. | N.D. | 516 | 20.2 |
| PBV/dromedary/ARE/2013 | AIY31272 | N.D. | N.D. | 496 | 20.3 |
| PBV/rabbit/GBR/ | CAB65394 | N.D. | N.D. | 590 | 17.4 |
| PBV/dromedary/ARE/2013 | AIY31283 | N.D. | N.D. | 465 | 15.6 |
*PBV/mouse/JPN/2015 is compared with itself to provide context for protein lengths.
N.D., not determined.
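Percent identity depends on how the two sequences are aligned and on which columns are counted. As a hedged illustration only, the sketch below computes identity from two sequences that are already aligned with `-` gap characters, using all columns except double gaps as the denominator. Both that convention and the `percent_identity` name are assumptions; the definition used for the table above is not stated in this excerpt.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Columns where both sequences have a gap are ignored; columns with a
    single gap count as mismatches (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Hypothetical aligned peptides: 4 identical residues over 7 columns.
identity = percent_identity("MKT-LLV", "MRTALLI")
```

Other conventions, such as dividing by the length of the shorter sequence, would give different values for the same alignment.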
Figure 3. Phylogenetic trees of putative PBV proteins.
The percentage of trees inferred by Maximum Likelihood in which the associated taxa clustered together is shown next to the branches. Accession numbers of sequences are displayed in parentheses. The taxa of the newly characterized PBV/mouse/JPN/2015 proteins are written in red. The trees correspond to (a) segment 1 non-structural protein, (b) segment 1 capsid protein and (c) segment 2 RNA-dependent RNA polymerase.