Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum.

Wei Wang¹, Minxuan Xia¹, Jie Chen¹, Fenni Deng¹, Rui Yuan¹, Xiaopei Zhang¹, Fafu Shen¹.

Abstract

The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present: a phylogenetic tree showing a dichotomy with two different clusters of SODs, inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2]; Ramachandran plots of G. raimondii and G. arboreum SODs; the protein sequences used to generate the 3D structures of the proteins and the template accessions obtained via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information" [3]; and the motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].

Keywords: Cotton; Phylogenetic tree; RAMPAGE Ramachandran plot analysis; SOD

Year: 2016 PMID: 27672674 PMCID: PMC5030311 DOI: 10.1016/j.dib.2016.05.025

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the data

Data on phylogenies separately estimated using the Bayesian method of MrBayes enable researchers to examine how the topologies differ from each other.

Data on phylogenies of Gossypium SOD proteins enable researchers to infer the possible time frames of the divergence events of Gossypium SOD genes and their molecular evolution in general.

Data on RAMPAGE Ramachandran plot analysis of Gossypium SOD proteins enable researchers to evaluate the accuracy of the predicted models.

Data

The phylogenetic tree obtained using the Maximum-Likelihood (ML) method of PhyML (version 20120412) [5] and the 3D structures of SODs generated using the SWISS-MODEL server (http://swissmodel.expasy.org/) [3] and the online COACH server (http://zhanglab.ccmb.med.umich.edu/COACH/) [6] were presented in [1]. The data shown here comprise a phylogenetic tree showing a dichotomy with two different clusters of SODs (I: Cu/Zn-SODs; II: Mn/Fe-SODs), inferred by the Bayesian method of MrBayes (version 3.2.4) [2]. The Cu/Zn-SOD cluster had three subgroups (Ia–Ic), whereas the Mn/Fe-SOD cluster had two subgroups (IId and IIe) (Fig. 1). We evaluated the accuracy of the predicted models by Ramachandran plot analysis using the RAMPAGE server (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) [10]. The refined SOD models showed good proportions of residues in the favoured, allowed and outlier regions (Appendix A). In-depth analyses of the data are presented in the associated research article [1].

Experimental design, materials and methods

Information access

The latest versions of the G. raimondii (V1.0) and G. arboreum (V2.0) genomes and annotation files were downloaded from CottonGen (https://www.cottongen.org/data/genome). The latest version of the Arabidopsis (TAIR10) genome and annotation files were downloaded from the Joint Genome Institute (JGI) (http://www.phytozome.net).

Data filtering

We then filtered the gene annotation results based on the following criteria [7]: (1) the longest transcript at each gene locus was chosen to represent that locus; (2) coding sequences (CDS) shorter than 150 base pairs (bp) were filtered out; (3) CDS in which ambiguous nucleotides (‘N’) made up >50% of the sequence were filtered out; (4) CDS with an internal termination codon were filtered out; and (5) CDS with hits (Basic Local Alignment Search Tool (BLAST) identity ≥80%) to RepBase sequences (http://www.girinst.org/repbase/index.html) were filtered out.
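
Criteria (1) to (4) above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the tuple layout of the input and the use of the standard stop codons TAA, TAG and TGA are assumptions made for illustration, and criterion (5), the BLAST search against RepBase, is not covered.

```python
# Illustrative sketch only; names and input layout are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard genetic code


def has_internal_stop(cds):
    """Return True if an in-frame stop codon occurs before the final codon."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def passes_filters(cds, min_length=150, max_n_fraction=0.5):
    """Apply criteria (2)-(4): minimum length, ambiguous bases, internal stop."""
    seq = cds.upper()
    if not seq or len(seq) < min_length:
        return False
    if seq.count("N") / len(seq) > max_n_fraction:
        return False
    return not has_internal_stop(seq)


def longest_transcript_per_locus(transcripts):
    """Criterion (1): keep the longest CDS for each locus.

    `transcripts` is an iterable of (locus_id, transcript_id, cds) tuples.
    """
    best = {}
    for locus_id, transcript_id, cds in transcripts:
        if locus_id not in best or len(cds) > len(best[locus_id][1]):
            best[locus_id] = (transcript_id, cds)
    return best


def filter_annotations(transcripts):
    """Pick one representative per locus, then drop those failing (2)-(4)."""
    representatives = longest_transcript_per_locus(transcripts)
    return {
        locus: (tid, cds)
        for locus, (tid, cds) in representatives.items()
        if passes_filters(cds)
    }
```

In this sketch the representative transcript is chosen before the sequence filters are applied, so a locus whose longest transcript fails a filter is dropped entirely; the source does not state which order was used.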

Identification of SOD protein

To identify members of the SOD protein family in G. raimondii and G. arboreum, we retrieved SOD protein sequences from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/). These protein sequences from six species, namely Arabidopsis (accession nos. NP_172360.1, NP_565666.1, NP_197311.1, NP_199923.1, NP_197722.1 and NP_187703.1), Theobroma cacao (XP_007030135.1 and XP_007038205.1), G. hirsutum (ABA00453.1, ACC93639.1, ABA00454.1, ABA00456.1 and ABA00455.1), Populus trichocarpa (XP_002319589.1 and XP_002325843.1), Zea mays (NP_001105704.1, BAI50563.1, ACG41865.1, ACG32380.1 and NP_001105742.1) and Oryza sativa (AAA33917.1, BAD09607.1, BAA37131.1 and NP_001055195.1), were used as query sequences to perform multiple database searches using BLAST for Proteins (BLASTP) [8]. After removing alignments with identity <50%, the resultant candidate SOD proteins were aligned to each other to ensure that no gene was represented multiple times. InterProScan (version 4.8) [9] was then used with the Pfam database to confirm the presence of the SOD domain in each candidate sequence. Finally, we gathered the SOD protein sequences, the template accessions and the motif sequences.
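
The identity cut-off can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes the BLASTP results were saved in the tab-separated tabular layout whose first three columns are query ID, subject ID and percent identity, and it stands in for the all-against-all alignment step by simply keeping one record per subject ID. The function name and the subject IDs in the usage example are hypothetical.

```python
# Illustrative sketch only; the output layout and names are assumptions.
def candidate_subjects(blast_lines, min_identity=50.0):
    """Collect subjects with at least one hit at or above `min_identity`.

    Returns a dict mapping subject ID to (query ID, percent identity) of
    its best-scoring retained hit, so each subject appears only once.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, subject_id, identity = fields[0], fields[1], float(fields[2])
        if identity < min_identity:
            continue  # alignments with identity <50% are removed
        if subject_id not in best or identity > best[subject_id][1]:
            best[subject_id] = (query_id, identity)
    return best


# Usage with made-up hits; "cand_A" and "cand_B" are hypothetical subject IDs.
example = [
    "NP_172360.1\tcand_A\t82.50\t152\t26\t0\t1\t152\t1\t152\t1e-90\t270",
    "NP_565666.1\tcand_A\t91.00\t152\t14\t0\t1\t152\t1\t152\t1e-99\t295",
    "NP_172360.1\tcand_B\t43.10\t150\t85\t2\t1\t150\t3\t152\t1e-30\t110",
]
candidates = candidate_subjects(example)
```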

Construction of phylogenetic trees

Phylogenetic trees were constructed by Bayesian inference using MrBayes (version 3.2.4) [2] with the GTR+I+gamma substitution model. The Markov chain Monte Carlo process ran for 5,000,000 iterations with sampling every 500 iterations, resulting in 10,000 samples, of which the first 25% were discarded as burn-in. All other parameters were left at their default settings.
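
The sampling arithmetic stated above can be checked with a short Python sketch. The function name is hypothetical, and the calculation follows the counts given in the text; it makes no claim about how MrBayes itself counts the starting state of the chain.

```python
# Illustrative arithmetic only; the function name is hypothetical.
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (total samples, samples discarded as burn-in, samples retained)."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


total, discarded, retained = mcmc_sample_counts(5_000_000, 500, 0.25)
print(total, discarded, retained)  # 10000 2500 7500
```

Under these settings, 2,500 of the 10,000 samples are discarded and 7,500 remain for summarising the tree.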

Structural evaluation and stereochemical analysis

Structural evaluation and stereochemical analysis of the predicted models were performed using RAMPAGE Ramachandran plot analysis (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) [10].
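
A Ramachandran plot places each residue at its backbone dihedral angles: phi, defined by the atoms C(i-1), N(i), CA(i), C(i), and psi, defined by N(i), CA(i), C(i), N(i+1). The Python sketch below shows one common way to compute such a dihedral angle from four atomic coordinates. It is not RAMPAGE's code, the function names are hypothetical, and the assignment of residues to favoured, allowed and outlier regions relies on empirical reference distributions that are not reproduced here.

```python
# Illustrative sketch only; not the RAMPAGE implementation.
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Passing the coordinates of C(i-1), N(i), CA(i) and C(i) gives phi for residue i; passing N(i), CA(i), C(i) and N(i+1) gives psi.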

Specifications Table

Subject area	Biology
More specific subject area	Genetics and Molecular Biology
Type of data	Figure
How data was acquired	Database analysis
Data format	Analyzed
Experimental factors	Amino acid sequences were retrieved from NCBI, TAIR10, Joint Genome Institute (JGI) and/or CottonGen.
Experimental features	Sequences were aligned using BLAST for Proteins (BLASTP); structural evaluation and stereochemical analyses were performed using RAMPAGE Ramachandran plot analysis.
Data accessibility	With this article

9 in total

1. Structure validation by Calpha geometry: phi,psi and Cbeta deviation.

Authors: Simon C Lovell; Ian W Davis; W Bryan Arendall; Paul I W de Bakker; J Michael Word; Michael G Prisant; Jane S Richardson; David C Richardson
Journal: Proteins Date: 2003-02-15

2. MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors: Fredrik Ronquist; John P Huelsenbeck
Journal: Bioinformatics Date: 2003-08-12 Impact factor: 6.937

3. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Authors: Stéphane Guindon; Jean-François Dufayard; Vincent Lefort; Maria Anisimova; Wim Hordijk; Olivier Gascuel
Journal: Syst Biol Date: 2010-03-29 Impact factor: 15.683

4. Genomic insights into salt adaptation in a desert poplar.

Authors: Tao Ma; Junyi Wang; Gongke Zhou; Zhen Yue; Quanjun Hu; Yan Chen; Bingbing Liu; Qiang Qiu; Zhuo Wang; Jian Zhang; Kun Wang; Dechun Jiang; Caiyun Gou; Lili Yu; Dongliang Zhan; Ran Zhou; Wenchun Luo; Hui Ma; Yongzhi Yang; Shengkai Pan; Dongming Fang; Yadan Luo; Xia Wang; Gaini Wang; Juan Wang; Qian Wang; Xu Lu; Zhe Chen; Jinchao Liu; Yao Lu; Ye Yin; Huanming Yang; Richard J Abbott; Yuxia Wu; Dongshi Wan; Jia Li; Tongming Yin; Martin Lascoux; Stephen P Difazio; Gerald A Tuskan; Jun Wang; Jianquan Liu; Liu Jianquan
Journal: Nat Commun Date: 2013 Impact factor: 14.919

5. BLAST+: architecture and applications.

Authors: Christiam Camacho; George Coulouris; Vahram Avagyan; Ning Ma; Jason Papadopoulos; Kevin Bealer; Thomas L Madden
Journal: BMC Bioinformatics Date: 2009-12-15 Impact factor: 3.169

6. InterProScan: protein domains identifier.

Authors: E Quevillon; V Silventoinen; S Pillai; N Harte; N Mulder; R Apweiler; R Lopez
Journal: Nucleic Acids Res Date: 2005-07-01 Impact factor: 16.971

7. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information.

Authors: Marco Biasini; Stefan Bienert; Andrew Waterhouse; Konstantin Arnold; Gabriel Studer; Tobias Schmidt; Florian Kiefer; Tiziano Gallo Cassarino; Martino Bertoni; Lorenza Bordoli; Torsten Schwede
Journal: Nucleic Acids Res Date: 2014-04-29 Impact factor: 16.971

8. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions.

Authors: Jianyi Yang; Ambrish Roy; Yang Zhang
Journal: Nucleic Acids Res Date: 2012-10-18 Impact factor: 16.971

9. Pfam: the protein families database.

Authors: Robert D Finn; Alex Bateman; Jody Clements; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Andreas Heger; Kirstie Hetherington; Liisa Holm; Jaina Mistry; Erik L L Sonnhammer; John Tate; Marco Punta
Journal: Nucleic Acids Res Date: 2013-11-27 Impact factor: 16.971
