
Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets.

T M Porter, M Hajibabaei.

Abstract

BACKGROUND: Pseudogenes are non-functional copies of protein-coding genes that typically follow a different molecular evolutionary path from functional genes. Including pseudogene sequences in DNA barcoding and metabarcoding analyses can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high-throughput sequencing data specifically accounts for the presence of pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for nuclear mitochondrial DNA segments (nuMTs) in large COI datasets. We do this by: (1) describing gene and nuMT characteristics from an artificial COI barcode dataset, (2) showing the impact of two different pseudogene removal methods on perturbed community datasets with simulated nuMTs, and (3) incorporating a pseudogene filtering step in a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from hidden Markov model (HMM) profile analysis were used to detect pseudogenes.
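As an illustration of the open reading frame (ORF) length criterion, the sketch below measures the longest stop-codon-free stretch in a read. It assumes reads are already oriented on the coding strand, uses the stop codons of NCBI translation table 5 (invertebrate mitochondrial code: TAA, TAG), and does not require a start codon because amplicons are internal COI fragments. The helper `short_orf_ids` and its `min_fraction` threshold are hypothetical, not the study's parameters.

```python
from __future__ import annotations

# Stop codons in NCBI translation table 5 (invertebrate mitochondrial code).
# Assumption: reads are oriented 5'->3' on the coding strand, so only the
# three forward reading frames are scanned.
STOP_CODONS = {"TAA", "TAG"}


def longest_orf_nt(seq: str) -> int:
    """Return the length (nt) of the longest run of consecutive non-stop
    codons in any of the three forward reading frames.

    No start codon is required, since metabarcode amplicons are internal
    fragments of COI and usually lack the gene's start.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon breaks the current ORF
            else:
                run += 1
                best = max(best, run)
    return best * 3


def short_orf_ids(seqs: dict[str, str], min_fraction: float = 0.9) -> list[str]:
    """Hypothetical rule: flag reads whose longest ORF covers less than
    `min_fraction` of the read length (e.g. an internal stop or frameshift)."""
    return [sid for sid, s in seqs.items()
            if longest_orf_nt(s) < min_fraction * len(s)]


reads = {"ok": "AAAAAAAAAAAA", "bad": "ATGAAACCCTAAGGGTTT"}
print(short_orf_ids(reads))
```

Here "bad" is flagged because its longest stop-free stretch (15 nt) is shorter than 90% of its 18 nt length; note that TGA is not a stop in table 5, so frames that contain it still count as open.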
RESULTS: Our simulations showed that it was more difficult to identify nuMTs from shorter amplicon sequences, such as those typically used in metabarcoding, than from the full-length DNA barcodes used to build barcode libraries. It was also more difficult to identify nuMTs in datasets with a high percentage of nuMTs. Existing bioinformatic pipelines used to process metabarcode sequences already remove some nuMTs, especially in the rare sequence removal step, but adding a pseudogene filtering step can remove up to 5% of sequences even when other filtering steps are in place.
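A back-of-envelope calculation suggests one reason shorter reads make ORF-based detection harder: a frameshift is only visible if a stop codon appears in the out-of-frame stretch, and shorter stretches are less likely to contain one. The sketch below assumes every codon is an independent, uniformly random draw from the 64 codons with 2 stops (table 5), which real COI sequences do not satisfy; the 200 bp amplicon length and the midpoint frameshift are hypothetical choices.

```python
def p_frameshift_exposed(shifted_nt: int, stops_per_64: int = 2) -> float:
    """Probability that at least one stop codon appears in `shifted_nt`
    nucleotides read in the wrong frame, assuming independent, uniformly
    random codons (a strong simplification of real sequence composition)."""
    codons = shifted_nt // 3
    p_no_stop = (1 - stops_per_64 / 64) ** codons
    return 1 - p_no_stop


# 658 bp is the standard full-length COI barcode; 200 bp is a hypothetical
# short metabarcode amplicon. Assume the frameshift sits at the midpoint,
# so half of each read is out of frame.
for label, length in [("full-length barcode", 658), ("short amplicon", 200)]:
    print(f"{label}: {p_frameshift_exposed(length // 2):.2f}")
```

Under these assumptions the script prints about 0.97 for the full-length barcode and about 0.65 for the short amplicon, so roughly a third of frameshifted short reads would carry no stop codon at all.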
CONCLUSIONS: Open reading frame length filtering, alone or combined with hidden Markov model profile analysis, can be used to effectively screen out apparent pseudogenes from large datasets. There is more to learn about COI nuMTs, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI nuMTs to public databases to facilitate future studies.
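The two options in the conclusions (ORF length alone, or ORF length plus HMM profile scores) can be sketched as outlier filters on precomputed values. This assumes each sequence already has an ORF length and a bit score against a COI profile HMM (for example from HMMER), and it uses Tukey lower fences (Q1 minus 1.5 times the IQR) as cut-offs; the fence rule, function names, and toy records are illustrative assumptions, not the study's thresholds.

```python
from statistics import quantiles


def lower_fence(values, k=1.5):
    """Tukey lower fence Q1 - k * IQR (assumed cut-off rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1 - k * (q3 - q1)


def filter_putative_pseudogenes(records, use_hmm=True):
    """records: dict of sequence id -> (orf_length_nt, hmm_bit_score).

    use_hmm=False corresponds to ORF length filtering alone;
    use_hmm=True additionally removes low-scoring sequences.
    """
    orf_cut = lower_fence([orf for orf, _ in records.values()])
    score_cut = lower_fence([score for _, score in records.values()])
    kept, removed = {}, {}
    for sid, (orf, score) in records.items():
        is_outlier = orf < orf_cut or (use_hmm and score < score_cut)
        (removed if is_outlier else kept)[sid] = (orf, score)
    return kept, removed


# Hypothetical toy values: eight typical reads plus two putative nuMTs.
records = {
    "s1": (312, 250.0), "s2": (312, 248.0), "s3": (312, 245.0),
    "s4": (312, 251.0), "s5": (312, 249.0), "s6": (312, 247.0),
    "s7": (312, 252.0), "s8": (312, 246.0),
    "numt_a": (120, 90.0),   # internal stop codon truncates the ORF
    "numt_b": (312, 100.0),  # full-length ORF but poor profile-HMM fit
}

for use_hmm in (False, True):
    _, removed = filter_putative_pseudogenes(records, use_hmm=use_hmm)
    print(f"use_hmm={use_hmm}: removed {sorted(removed)}")
```

With ORF length alone only `numt_a` is removed; adding the bit-score fence also catches `numt_b`. Because the fences are estimated from the same data being filtered, a dataset in which nuMTs make up a large share pulls Q1 down and loosens the cut-off in this sketch.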


Keywords:  Bioinformatics; COI mtDNA; DNA barcode; Hidden Markov model; Metabarcode; NuMT; Nuclear encoded mitochondrial sequences; Pseudogene

Year:  2021        PMID: 34011275     DOI: 10.1186/s12859-021-04180-x

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  3 in total

1.  The Value of Whole-Genome Sequencing for Mitochondrial DNA Population Studies: Strategies and Criteria for Extracting High-Quality Mitogenome Haplotypes.

Authors:  Kimberly Sturk-Andreaggi; Joseph D Ring; Adam Ameur; Ulf Gyllensten; Martin Bodner; Walther Parson; Charla Marshall; Marie Allen
Journal:  Int J Mol Sci       Date:  2022-02-17       Impact factor: 5.923

2.  Mitochondrial cytochrome c oxidase subunit I (COI) metabarcoding of Foraminifera communities using taxon-specific primers.

Authors:  Jan-Niklas Macher; Dimitra Maria Bloska; Maria Holzmann; Elsa B Girard; Jan Pawlowski; Willem Renema
Journal:  PeerJ       Date:  2022-09-05       Impact factor: 3.061

3.  MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments.

Authors:  Teresita M Porter; Mehrdad Hajibabaei
Journal:  PLoS One       Date:  2022-09-29       Impact factor: 3.752

