
Data on the evolutionary history of the V(D)J recombination-activating protein 1 - RAG1 coupled with sequence and variant analyses.

Abhishek Kumar, Anita Bhandari, Sandeep J Sarde, Sekhar Muppavarapu, Ravi Tandon.

Abstract

The RAG1 protein is one of the key components of the RAG complex regulating V(D)J recombination. Only a few studies have addressed the evolutionary history, detailed sequence features and mutational hotspots of RAG1. Herein, we present the datasets used for our recent comprehensive study of RAG1 based on sequence, phylogenetic and genetic variant analyses (Kumar et al., 2015) [1]. Protein sequence alignment helped in characterizing the conserved domains and regions of RAG1. It also aided in unraveling an ancestral RAG1 in the sea urchin. Human genetic variant analyses revealed 751 mutational hotspots, located in both the coding and the non-coding regions. For further analysis and discussion, see (Kumar et al., 2015) [1].


Year:  2016        PMID: 27284568      PMCID: PMC4887553          DOI: 10.1016/j.dib.2016.05.021

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

Protein sequence analysis data reveal that SpRAG1L shares only 19–20% identity with vertebrate RAG1, which helped us in deriving an ancestral RAG1 protein in the sea urchin. This approach can be used to detect the origins of other proteins.

Protein sequence alignment locates two major domains and several regions of RAG1, suggesting that these fragments are conserved from sea urchin to human. This hints at evolutionary conservation of protein domains in the protein of interest and its ancestors.

Data on the genetic variant analysis suggest that the human RAG1 gene has 751 variants. Of these, 267 are missense variants that change amino acids, including 140 deleterious mutations. These variant data serve as the mutational hotspots within the coding region of human RAG1. Assessing mutational hotspots for any protein is critically important for understanding its function and roles in disease.

Additionally, 284 non-coding variants were identified, 94% of which are regulatory in nature; such variants are often called regulatory SNPs (rSNPs). These data are a source of regulatory implications for the regions flanking any given gene.
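
The variant counts quoted above can be put in proportion with a few lines of arithmetic. The following is only an illustrative sketch: the constants are the figures stated in the text, while the helper `share` and the derived names are hypothetical and not part of the original analysis.

```python
# Counts quoted in the text; everything derived below is illustrative only.
TOTAL_VARIANTS = 751          # all human RAG1 variants
MISSENSE = 267                # missense variants
DELETERIOUS_MISSENSE = 140    # missense variants reported as deleterious
NON_CODING = 284              # non-coding variants
REGULATORY_FRACTION = 0.94    # "94% regulatory in nature"


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Proportions implied by the stated counts.
missense_share = share(MISSENSE, TOTAL_VARIANTS)
deleterious_share = share(DELETERIOUS_MISSENSE, MISSENSE)
non_coding_share = share(NON_CODING, TOTAL_VARIANTS)

# Approximate number of regulatory SNPs implied by the 94% figure
# (an estimate from rounding, not a count reported in the text).
approx_rsnps = round(NON_CODING * REGULATORY_FRACTION)

print(f"missense: {missense_share}% of all variants")
print(f"deleterious: {deleterious_share}% of missense variants")
print(f"non-coding: {non_coding_share}% of all variants")
print(f"regulatory non-coding variants (approx.): {approx_rsnps}")
```

Under these figures, roughly a third of the variants are missense and about half of those are predicted to be deleterious.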

Data

Table 1 lists all RAG1 sequences used in Kumar et al. [1]; these sequences were used to construct the protein sequence alignment of RAG1 (Fig. S1). This protein alignment is the basis for Figs. 2 and 3 and Tables 1–5 of Kumar et al. [1]. Details of human RAG1 variants are summarized in Table S1 and regulatory SNPs in Table S2. These two supplementary tables are the primary data for the variant analyses described in Fig. 4 and Tables 2–5 of Kumar et al. [1].
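
A pairwise identity figure such as the 19–20% reported for SpRAG1L against vertebrate RAG1 can in principle be derived from two rows of a protein alignment like the one in Fig. S1. The sketch below is a minimal illustration, assuming identity is counted over columns where neither sequence has a gap; other denominators are in common use, and the source does not state which convention was applied. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not real RAG1 sequence).
identity = percent_identity("MKT-AILV", "MRTGA-LV")
print(f"{identity:.1f}% identity")
```

For real data, the two strings would be rows taken from the MUSCLE alignment rather than hand-typed fragments.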
Table 1

Summary of RAG1 genes from selected animal genomes. These data were collected from Ensembl database release 77. Where indicated, data were gathered from additional databases.

| Name | Organism | Species | Accession id | Chromosomal localization |
|---|---|---|---|---|
| HsapRAG1 | Human | Homo sapiens | ENSG00000166349 | Chromosome 11: 36,532,259-36,614,706 |
| MmusRAG1 | Mouse | Mus musculus | ENSMUSG00000061311 | Chromosome 2: 101,638,282-101,649,501 |
| RnorRAG1 | Rat | Rattus norvegicus | ENSRNOG00000004630 | Chromosome 3: 97,866,048-97,877,145 |
| TgutRAG1 | Zebrafinch | Taeniopygia guttata | ENSTGUG00000010147 | Chromosome 5: 17,596,747-17,599,869 |
| MgalRAG1 | Turkey | Meleagris gallopavo | ENSMGAG00000015794 | Chromosome 5: 19,778,620-19,781,748 |
| PsinRAG1 | Turtle | Pelodiscus sinensis | ENSPSIG00000001811 | Scaffold JH209124.1: 1,890,899-1,894,018 |
| DrerRAG1 | Zebrafish | Danio rerio | ENSDARG00000052122 | Chromosome 25: 9,231,637-9,238,142 |
| TrubRAG1 | Fugu | Takifugu rubripes | ENSTRUG00000001340 | scaffold_302: 189,544-193,510 |
| TnigRAG1 | Tetraodon | Tetraodon nigroviridis | ENSTNIG00000012168 | Chromosome 13: 5,598,243-5,602,176 |
| OnilRAG1 | Tilapia | Oreochromis niloticus | ENSONIG00000014593 | Scaffold GL831142.1: 1,924,501-1,931,477 |
| GmorRAG1a | Cod | Gadus morhua | ENSGMOG00000003395 | GeneScaffold_2196: 249,630-253,939 |
| XmacRAG1 | Platyfish | Xiphophorus maculatus | ENSXMAG00000000820 | Scaffold JH556735.1: 897,221-901,222 |
| GacuRAG1 | Stickleback | Gasterosteus aculeatus | ENSGACG00000011465 | groupXIX: 14,493,756-14,497,787 |
| OlatRAG1 | Medaka | Oryzias latipes | ENSORLG00000011969 | Chromosome 6: 17,343,305-17,347,405 |
| LchaRAG1 | Coelacanth | Latimeria chalumnae | ENSLACG00000004406 | Scaffold JH126568.1: 121,275-124,451 |
| VpaRAG1 | Alpaca | Vicugna pacos | ENSVPAG00000008826 | GeneScaffold_2429: 269,595-273,365 |
| AcarRAG1 | Anole lizard | Anolis carolinensis | ENSACAG00000005106 | Chromosome 1: 53,518,235-53,521,375 |
| DnoRAG1 | Armadillo | Dasypus novemcinctus | ENSDNOG00000006294 | Scaffold JH582431.1: 4,276,543-4,279,674 |
| OgarRAG1 | Bushbaby | Otolemur garnettii | ENSOGAG00000027339 | Scaffold GL873520.1: 63,167,240-63,168,052 |
| FcatRAG1 | Cat | Felis catus | ENSFCAG00000002908 | Chromosome D1: 92,125,946-92,129,077 |
| AmeRAG1 | Cave fish | Astyanax mexicanus | ENSAMXG00000017587 | Scaffold KB871579.1: 5,211,103-5,217,994 |
| PtroRAG1 | Chimpanzee | Pan troglodytes | ENSPTRG00000003512 | Chromosome 11: 36,559,562-36,571,320 |
| BtauRAG1 | Cow | Bos taurus | ENSBTAG00000040293 | Chromosome 15: 67,827,233-67,830,364 |
| CfamRAG1 | Dog | Canis lupus familiaris | ENSCAFG00000006808 | Chromosome 18: 31,631,533-31,634,664 |
| TtruRAG1 | Dolphin | Tursiops truncatus | ENSTTRG00000014075 | scaffold_110171: 196,070-199,540 |
| AplaRAG1 | Duck | Anas platyrhynchos | ENSAPLG00000011756 | Scaffold KB742537.1: 887,774-890,899 |
| LafrRAG1 | Elephant | Loxodonta africana | ENSLAFG00000023175 | SuperContig scaffold_21: 12,902,400-12,905,531 |
| MfurRAG1 | Ferret | Mustela putorius furo | ENSMPUG00000019963 | Scaffold GL896949.1: 10,184,818-10,187,949 |
| FalbRAG1 | Flycatcher | Ficedula albicollis | ENSFALG00000014372 | Scaffold JH603235.1: 3,494,497-3,497,619 |
| NleuRAG1 | Gibbon | Nomascus leucogenys | ENSNLEG00000017951 | SuperContig GL397264.1: 51,275,048-51,286,754 |
| GgorRAG1 | Gorilla | Gorilla gorilla gorilla | ENSGGOG00000013132 | Chromosome 11: 37,187,229-37,198,984 |
| CporRAG1 | Guinea Pig | Cavia porcellus | ENSCPOG00000004516 | scaffold_92: 2,485,274-2,488,405 |
| EcabRAG1 | Horse | Equus caballus | ENSECAG00000021936 | Chromosome 12: 3,025,356-3,033,251 |
| PcapRAG1 | Hyrax | Procavia capensis | ENSPCAG00000001732 | GeneScaffold_5497: 13,553-16,990 |
| MmulRAG1 | Macaque | Macaca mulatta | ENSMMUG00000018267 | Scaffold 1099214286323: 4,563-7,694 |
| CjacRAG1 | Marmoset | Callithrix jacchus | ENSCJAG00000011082 | Chromosome 11: 99,857,897-99,869,593 |
| MlucRAG1 | Microbat | Myotis lucifugus | ENSMLUG00000000544 | Scaffold GL430055: 356,167-359,298 |
| MmurRAG1 | Mouse Lemur | Microcebus murinus | ENSMICG00000008611 | GeneScaffold_3288: 841,983-845,201 |
| MdomRAG1 | Opossum | Monodelphis domestica | ENSMODG00000024470 | Chromosome 5: 272,756,599-272,759,742 |
| AmelRAG1 | Panda | Ailuropoda melanoleuca | ENSAMEG00000019378 | Scaffold GL193442.1: 461,741-464,872 |
| SscrRAG1 | Pig | Sus scrofa | ENSSSCG00000026145 | Chromosome 2: 26,730,010-26,738,216 |
| OanaRAG1 | Platypus | Ornithorhynchus anatinus | ENSOANG00000011770 | Chromosome 3: 11,364,602-11,365,783 |
| OcunRAG1 | Rabbit | Oryctolagus cuniculus | ENSOCUG00000006989 | Chromosome 1: 175,828,096-175,831,224 |
| OarRAG1 | Sheep | Ovis aries | ENSOARG00000010441 | Chromosome 15: 65,210,839-65,213,970 |
| SaraRAG1 | Shrew | Sorex araneus | ENSSARG00000010950 | GeneScaffold_5915: 66,723-69,956 |
| LocRAG1 | Spotted gar | Lepisosteus oculatus | ENSLOCG00000001283 | Chromosome LG27: 1,403,519-1,420,074 |
| ItriRAG1 | Squirrel | Ictidomys tridecemlineatus | ENSSTOG00000025584 | Scaffold JH393343.1: 1,817,576-1,820,707 |
| TsyrRAG1 | Tarsier | Tarsius syrichta | ENSTSYG00000007158 | scaffold_7240: 21,771-24,902 |
| SharRAG1 | Tasmanian devil | Sarcophilus harrisii | ENSSHAG00000014085 | Scaffold GL864890.1: 1,391,366-1,394,509 |
| TbeRAG1 | Tree Shrew | Tupaia belangeri | ENSTBEG00000003010 | GeneScaffold_4067: 865-4,810 |
| MeuRAG1 | Wallaby | Macropus eugenii | ENSMEUG00000003165 | Scaffold77145: 3,005-5,812 |
| XtroRAG1 | Xenopus | Xenopus tropicalis | ENSXETP00000016443/XP_002937338 (a) | Scaffold GL172917.1: 903,952-910,208 |
| SpuRAG1L | Sea urchin | Strongylocentrotus purpuratus | AAZ23546.1 (a) | NA |

(a) From NCBI.
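
The chromosomal localization strings in Table 1 follow a regular "region: start-end" pattern, so they can be parsed into numeric coordinates for further processing. The sketch below is illustrative only: the names `Locus` and `parse_locus` are hypothetical, and the span length assumes inclusive 1-based coordinates as used by Ensembl.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """A genomic interval parsed from a Table 1 localization string."""
    region: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes inclusive, 1-based coordinates.
        return self.end - self.start + 1


# "region: start-end", with commas as thousands separators in the numbers.
_LOCUS_RE = re.compile(r"^(?P<region>.+):\s*(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(text: str) -> Locus:
    """Parse e.g. 'Chromosome 11: 36,532,259-36,614,706' into a Locus."""
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised localization: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    return Locus(match["region"].strip(), start, end)


human = parse_locus("Chromosome 11: 36,532,259-36,614,706")
print(human.region, human.start, human.end, human.length)
```

Entries without coordinates, such as the "NA" given for the sea urchin sequence, raise a `ValueError` and would need to be handled separately.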

Experimental design, materials and methods

Using the BLAST homology detection tool [2], we extracted RAG1 genes from vertebrate genomes listed in either Ensembl release 77 [3] or NCBI. To ensure the accuracy of gene structures, we combined the gene predictions of Ensembl [3] and the AUGUSTUS tool [4]. We used human RAG1 as the standard sequence for mapping and numbering intron positions, followed by suffixes a–c for their location, as reported previously [5]. We aligned selected RAG1 protein sequences using the MUSCLE tool [6] and manually adjusted the alignment with the GENEDOC tool [7]. We reconstructed a phylogenetic tree with the maximum likelihood method, based on the JTT matrix-based model [8], with 1000 bootstrap replicates. We imported all consensus trees into MEGA 6 software [9], where we edited and visualized them as required. To detect orthologs of the RAG1 gene, we analyzed micro-synteny across different genomes using two genome browsers, namely the NCBI map viewer [10] and the ENSEMBL genome browser [11], [12]. Furthermore, we obtained human RAG1 variants from 1092 human genomes from 14 different populations available in the 1000 Genomes Project [13]. We assessed the impact of missense variants on the human RAG1 protein using the SIFT [14] and PolyPhen V2 [15] tools, as described previously [16], [17], [18], [19]. We detected the regulatory nature of non-coding variants using rSNPBase, a database that provides reliable and comprehensive regulatory annotations [20]; such variants are called regulatory SNPs or rSNPs.
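
The missense impact assessment described above relies on two predictors, and one simple way to combine their scores is to flag a variant only when both agree. The following is a minimal sketch under stated assumptions, not the authors' procedure. The cutoffs (SIFT below 0.05, PolyPhen-2 above 0.85) are commonly quoted conventions used here as assumptions, and the record type, function name and example scores are all hypothetical.

```python
from dataclasses import dataclass

# Assumed cutoffs; the source does not state which thresholds were applied.
SIFT_CUTOFF = 0.05      # SIFT scores below this are treated as deleterious
POLYPHEN_CUTOFF = 0.85  # PolyPhen-2 scores above this are treated as damaging


@dataclass(frozen=True)
class MissenseVariant:
    """Hypothetical record holding the two predictor scores for one variant."""
    label: str
    sift: float
    polyphen: float


def is_deleterious(
    variant: MissenseVariant,
    sift_cutoff: float = SIFT_CUTOFF,
    polyphen_cutoff: float = POLYPHEN_CUTOFF,
) -> bool:
    """Flag a variant only when both predictors agree it is harmful."""
    return variant.sift < sift_cutoff and variant.polyphen > polyphen_cutoff


# Made-up scores for illustration; these are not RAG1 data.
variants = [
    MissenseVariant("example_1", sift=0.01, polyphen=0.99),
    MissenseVariant("example_2", sift=0.30, polyphen=0.95),
    MissenseVariant("example_3", sift=0.02, polyphen=0.10),
]
flagged = [v.label for v in variants if is_deleterious(v)]
print(flagged)
```

Requiring agreement between the two tools is a conservative choice; a union of the two calls would flag more variants at the cost of more false positives.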

Specifications Table

Subject area: Biology
More specific subject area: Molecular evolution and bioinformatics
Type of data: Tables, figures
How data was acquired: Retrieved from public databases
Data format: Analyzed data
Experimental factors: RAG1 sequences were retrieved from ENSEMBL and/or NCBI database.
Experimental features: RAG1 protein alignment using Muscle tool and edited in the GeneDoc; RAG1 Variants were analyzed with SIFT, Polyphen & rSNPbase
Data source location: Germany
Data accessibility: Data is with this article
  18 in total

1.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

2.  The rapid generation of mutation data matrices from protein sequences.

Authors:  D T Jones; W R Taylor; J M Thornton
Journal:  Comput Appl Biosci       Date:  1992-06

3.  MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Authors:  Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2013-10-16       Impact factor: 16.240

4.  Sequence, phylogenetic and variant analyses of antithrombin III.

Authors:  Abhishek Kumar; Anita Bhandari; Sandeep J Sarde; Chandan Goswami
Journal:  Biochem Biophys Res Commun       Date:  2013-10-09       Impact factor: 3.575

5.  A method and server for predicting damaging missense mutations.

Authors:  Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal:  Nat Methods       Date:  2010-04       Impact factor: 28.547

6.  AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints.

Authors:  Mario Stanke; Burkhard Morgenstern
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

7.  Ensembl's 10th year.

Authors:  Paul Flicek; Bronwen L Aken; Benoit Ballester; Kathryn Beal; Eugene Bragin; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Julio Fernandez-Banet; Leo Gordon; Stefan Gräf; Syed Haider; Martin Hammond; Kerstin Howe; Andrew Jenkinson; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Gautier Koscielny; Eugene Kulesha; Daniel Lawson; Ian Longden; Tim Massingham; William McLaren; Karine Megy; Bert Overduin; Bethan Pritchard; Daniel Rios; Magali Ruffier; Michael Schuster; Guy Slater; Damian Smedley; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Albert Vilella; Jan Vogel; Simon White; Steven P Wilder; Amonida Zadissa; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; James Smith; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2009-11-11       Impact factor: 16.971

8.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

9.  Ancestry and evolution of a secretory pathway serpin.

Authors:  Abhishek Kumar; Hermann Ragg
Journal:  BMC Evol Biol       Date:  2008-09-15       Impact factor: 3.260

10.  rSNPBase: a database for curated regulatory SNPs.

Authors:  Liyuan Guo; Yang Du; Suhua Chang; Kunlin Zhang; Jing Wang
Journal:  Nucleic Acids Res       Date:  2013-11-26       Impact factor: 16.971
