Reza Zolfaghari Emameh, Marianne Kuuslahti, Hassan Nosrati, Hannes Lohi, Seppo Parkkila.
Abstract
BACKGROUND: The inaccuracy of DNA sequence data is becoming a serious problem as the amount of molecular data grows rapidly and expectations are high that big data will revolutionize the life sciences and health care. In this study, we investigated the accuracy of DNA sequence data in commonly used databases, using carbonic anhydrase (CA) gene sequences as generic targets. CAs are ancient metalloenzymes present in all unicellular and multicellular living organisms. Of the eight distinct CA families (α, β, γ, δ, ζ, η, θ, and ι), only α-CAs have been reported in vertebrates.
Keywords: Carbonic anhydrase; Contamination; Curation; DNA; Database; Sequencing
Year: 2020 PMID: 32393172 PMCID: PMC7216627 DOI: 10.1186/s12864-020-6762-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Predicted genomic location of (a) a β-CA gene in Mus musculus, strain NOD/ShiLtJ (scaffold LVXS01065484.1: 870–1430) and (b) a γ-CA gene in Xenopus tropicalis (scaffold GL180697.1: 4765–5075)
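Fig. 1 and the vertebrate CA table that follows give predicted gene locations in the form `scaffold: start–end`. A minimal Python sketch for parsing such strings and computing the length of each region, assuming 1-based, inclusive coordinates as used by the Ensembl browser; the helper names `parse_location` and `span_length` are hypothetical:

```python
import re

# Assumption: "<scaffold>: <start>-<end>" with 1-based, inclusive coordinates
# (Ensembl convention); both hyphen and en dash are accepted as separators.
LOCATION = re.compile(
    r"^\s*(?P<scaffold>[\w.]+):\s*(?P<start>\d+)\s*[-–]\s*(?P<end>\d+)\s*$"
)


def parse_location(text: str) -> tuple[str, int, int]:
    """Split a scaffold location string into (scaffold, start, end)."""
    match = LOCATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is after end: {text!r}")
    return match["scaffold"], start, end


def span_length(text: str) -> int:
    """Number of bases covered by an inclusive, 1-based interval."""
    _, start, end = parse_location(text)
    return end - start + 1


for loc in ("LVXS01065484.1: 870–1430", "GL180697.1: 4765-5075"):
    print(loc, "->", span_length(loc), "bp")
```

With inclusive coordinates the length is `end - start + 1`, so the two Fig. 1 regions cover 561 bp and 311 bp; a half-open convention would give one base less.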
Identified β- and γ-CAs from vertebrates
| Type of CA | NCBI ID | Vertebrate species | Status in database, 2017–2018 | Status in database, 2019–2020 | Species with 73–100% identity | Exon count |
|---|---|---|---|---|---|---|
|  | XP_007454654.1 |  | A | A |  | 1 |
|  | XP_007466906.1 |  |  |  |  |  |
|  | XP_005974256.1 |  | A | D |  | 1 |
|  | XP_005956696.1 |  |  |  |  |  |
|  | XP_005973271.1 |  |  |  |  |  |
|  | XP_005979975.1 |  |  |  |  |  |
|  | XP_005954808.1 |  |  |  |  |  |
| β-CA | LVXS01065484.1: 870–1430ᵃ | Mus musculus (strain NOD/ShiLtJ) | A | A |  | ND |
|  | SJM31717.1 |  | A | D |  | 1 |
|  | LVHJ01039623: 18–230ᵃ |  | U | A |  | ND |
|  | QNTS01034426: 189–644ᵃ |  | U | A |  | ND |
|  | XP_024266887.1 |  | U | A |  | 1 |
|  | XP_007452618.1 |  | A | A |  | 1 |
|  | XP_007465530.1 |  |  |  |  |  |
|  | XP_005974442.1 |  | A | D |  | 1 |
|  | XP_005977566.1 |  |  |  |  |  |
|  | XP_005974267.1 |  |  |  |  |  |
| γ-CA | GL180697.1: 4765–5075ᵃ | Xenopus tropicalis | A | D | Comamonadaceae bacterium | ND |
|  | SJM34589.1 |  | A | D |  | 1 |
|  | XP_004001159.1 |  | A | D |  | 1 |
|  | XP_019578089.1 |  | A | D |  | 1 |
|  | LVHJ01047219: 4–240ᵃ |  | U | A | Bacteroidetes bacterium (93.7%) | ND |
Abbreviations: ND, not defined; A, available; D, discontinued; U, unavailable (see Supplementary file 1)
a: Genomic location in the Ensembl genome browser 95
b: The sequence contains only the first highly conserved motif (CXDXR)
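The two status columns lend themselves to a simple tally of how records changed between the 2017–2018 and 2019–2020 database snapshots. A minimal Python sketch, hand-transcribing only the first row of each merged group in the table above (continuation rows carry no status cells of their own); the dictionary and the `transitions` helper are hypothetical:

```python
from collections import Counter

# (2017-2018 status, 2019-2020 status) per record, copied from the table:
# A = available, D = discontinued, U = unavailable.
STATUS = {
    "XP_007454654.1": ("A", "A"),
    "XP_005974256.1": ("A", "D"),
    "LVXS01065484.1": ("A", "A"),
    "SJM31717.1": ("A", "D"),
    "LVHJ01039623": ("U", "A"),
    "QNTS01034426": ("U", "A"),
    "XP_024266887.1": ("U", "A"),
    "XP_007452618.1": ("A", "A"),
    "XP_005974442.1": ("A", "D"),
    "GL180697.1": ("A", "D"),
    "SJM34589.1": ("A", "D"),
    "XP_004001159.1": ("A", "D"),
    "XP_019578089.1": ("A", "D"),
    "LVHJ01047219": ("U", "A"),
}


def transitions(status: dict[str, tuple[str, str]]) -> Counter:
    """Count status changes between the two snapshots, e.g. 'A->D'."""
    return Counter(f"{old}->{new}" for old, new in status.values())


counts = transitions(STATUS)
for change, n in sorted(counts.items()):
    print(change, n)
```

For these 14 rows it prints `A->A 3`, `A->D 7` and `U->A 4`, i.e. half of the listed records were discontinued between the two snapshots.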
Fig. 2 Multiple sequence alignment (MSA) of β-CA protein sequences from vertebrates. Highly conserved amino acids are shown as highlighted vertical bands
Fig. 3 Multiple sequence alignment (MSA) of γ-CA protein sequences from vertebrates. Highly conserved amino acids are shown as highlighted vertical bands
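Figs. 2 and 3 highlight alignment columns with highly conserved amino acids. A minimal Python sketch of how such columns, pairwise percent identity, and the CXDXR motif named in the table footnote can be located in an alignment. The three sequences are hypothetical toy data, not the vertebrate β- or γ-CA sequences; "conserved" is taken strictly as identical in every sequence, and percent identity ignores columns with a gap in either sequence, which is only one of several conventions:

```python
import re
from itertools import combinations

# Hypothetical toy alignment, NOT the sequences shown in Figs. 2 and 3.
ALIGNMENT = {
    "seq1": "MKCSDHRVHA",
    "seq2": "MKCADYRIHA",
    "seq3": "MRCSDHR-HG",
}

# CXDXR: cysteine, any residue, aspartate, any residue, arginine.
MOTIF = re.compile(r"C.D.R")


def conserved_columns(aln: dict[str, str]) -> list[int]:
    """1-based columns where every sequence has the same residue (no gaps)."""
    rows = list(aln.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [
        i + 1
        for i, column in enumerate(zip(*rows))
        if column[0] != "-" and all(res == column[0] for res in column)
    ]


def percent_identity(a: str, b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def motif_hits(aln: dict[str, str]) -> dict[str, list[int]]:
    """1-based start positions of CXDXR in each ungapped sequence."""
    return {
        name: [m.start() + 1 for m in MOTIF.finditer(seq.replace("-", ""))]
        for name, seq in aln.items()
    }


print(conserved_columns(ALIGNMENT))
for a, b in combinations(ALIGNMENT, 2):
    print(a, b, round(percent_identity(ALIGNMENT[a], ALIGNMENT[b]), 1))
print(motif_hits(ALIGNMENT))
```

Real MSA viewers usually also shade columns that are conserved by residue class rather than strict identity, so the highlighted bands in the figures may cover more columns than this strict test would find.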
Fig. 4 PCR analysis of the γ-CA gene from F. catus and the β-CA gene from M. musculus. Samples from two animals of each species were included in the analysis, and primer pairs P1, P3, P5, and P8 were selected based on preliminary experiments. Panel a shows the results of the first round of PCR. The bands nearest to the estimated correct size (red arrows) are marked with red circles (1–9). These bands were isolated, and the purified DNA was used as template for the second round of PCR, whose results are shown in panel b. The amplified products from samples 3, 4, 8, and 9 were subsequently sequenced.
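Fig. 4 describes picking, in each lane, the band nearest to the estimated correct size. A minimal Python sketch of that selection rule, assuming band sizes have already been estimated against a DNA ladder; the expected lengths come from the primer table that follows, while the band sizes and the `nearest_band` helper are hypothetical and not read from Fig. 4:

```python
# Expected product lengths (bp) for the primer pairs used in Fig. 4,
# taken from the primer table.
EXPECTED_BP = {"P1": 1089, "P3": 217, "P5": 1023, "P8": 191}


def nearest_band(primer_pair: str, band_sizes_bp: list[int]) -> int:
    """Return the observed band size closest to the expected product length."""
    if not band_sizes_bp:
        raise ValueError("no bands observed")
    expected = EXPECTED_BP[primer_pair]
    return min(band_sizes_bp, key=lambda size: abs(size - expected))


# Hypothetical band sizes estimated from a gel (not data from Fig. 4).
print(nearest_band("P3", [150, 230, 480]))
print(nearest_band("P5", [700, 980, 1500]))
```

Nearest-size selection only nominates candidates; as in the figure, the identity of a selected product still has to be confirmed by re-amplification and sequencing.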
Designed primers for the β- and γ-CA genes
| CA family | Vertebrate species | Primer pair | Forward primer (5′→3′) | Reverse primer (5′→3′) | Product length (bp) |
|---|---|---|---|---|---|
| γ-CA | Felis catus | P1 | AGATAACTACTTCACATCTGACA | ATACAGGGCTGGGTGCCT | 1089 |
| γ-CA | Felis catus | P2 | GGTGATTGGCGACTACGTGA | CTCAGTCGGTTAGGTGGCTG | 625 |
| γ-CA | Felis catus | P3 | GCGCGTGAAGAACAACTACC | GTGTTCAGTTGCGTCATCGG | 217 |
| γ-CA | Felis catus | P4 | AAGCGGCAACCTCTACATCG | CGTGAGGTAGGCAGTAGACG | 341 |
| β-CA | Mus musculus | P5 | TGATAATGCCGATGGTCGTG | AGTAGCCATGGCCTTGCGAT | 1023 |
| β-CA | Mus musculus | P6 | TGGATTTTCCGGCACCGTTA | CGGGTCTTCCTTGCTGATGT | 441 |
| β-CA | Mus musculus | P7 | ACATCAGCAAGGAAGACCCG | CACAATACGTCAAGGCGCTG | 391 |
| β-CA | Mus musculus | P8 | GCTGCACATCCGTGATCTCT | GGATCCCATACACCCAACCG | 191 |
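Primer tables like this one are easy to sanity-check programmatically. A minimal Python sketch that computes GC content and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, for each listed primer, assuming every sequence is read 5′→3′ (the usual convention for primer listings). The Wallace rule is only a crude estimate for short oligonucleotides and is not necessarily how these primers were designed; the helper names are hypothetical. The final check shows that the string listed as the P6 reverse primer is the reverse complement of the P7 forward primer, which is consistent with the reverse primers being written 5′→3′ and with the P6 and P7 amplicons sharing that 20-nt primer site.

```python
# Primer pairs transcribed from the table; all sequences treated as 5'->3'
# (an assumption based on the usual convention for primer listings).
PRIMERS = {
    "P1": ("AGATAACTACTTCACATCTGACA", "ATACAGGGCTGGGTGCCT"),
    "P2": ("GGTGATTGGCGACTACGTGA", "CTCAGTCGGTTAGGTGGCTG"),
    "P3": ("GCGCGTGAAGAACAACTACC", "GTGTTCAGTTGCGTCATCGG"),
    "P4": ("AAGCGGCAACCTCTACATCG", "CGTGAGGTAGGCAGTAGACG"),
    "P5": ("TGATAATGCCGATGGTCGTG", "AGTAGCCATGGCCTTGCGAT"),
    "P6": ("TGGATTTTCCGGCACCGTTA", "CGGGTCTTCCTTGCTGATGT"),
    "P7": ("ACATCAGCAAGGAAGACCCG", "CACAATACGTCAAGGCGCTG"),
    "P8": ("GCTGCACATCCGTGATCTCT", "GGATCCCATACACCCAACCG"),
}

# Complement map for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for pair, (fwd, rev) in PRIMERS.items():
    print(
        pair,
        f"F: {gc_percent(fwd):.0f}% GC, Tm ~{wallace_tm(fwd)} C;",
        f"R: {gc_percent(rev):.0f}% GC, Tm ~{wallace_tm(rev)} C",
    )

# The P6 reverse primer string is the reverse complement of the P7 forward primer.
print(reverse_complement(PRIMERS["P7"][0]) == PRIMERS["P6"][1])  # True
```

For example, the P3 forward primer has 11 G/C bases out of 20, giving 55% GC and a Wallace Tm of 62 °C. Nearest-neighbor thermodynamic models give more realistic Tm values and would be the better choice for actual primer design.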