Evolutionary history of the human multigene families reveals widespread gene duplications throughout the history of animals.

Nashaiman Pervaiz1, Nazia Shakeel1, Ayesha Qasim1, Rabail Zehra1, Saneela Anwar1, Neenish Rana1, Yongbiao Xue2, Zhang Zhang2, Yiming Bao3, Amir Ali Abbasi4.   

Abstract

BACKGROUND: The hypothesis that vertebrates have experienced two ancient whole genome duplications (WGDs) is of central interest to evolutionary biology and has been implicated in the evolution of developmental complexity. Three-way and four-way paralogy regions in human and other vertebrate genomes are considered vital evidence in support of this hypothesis. Alternatively, it has been proposed that such paralogy regions were created by small-scale duplications that occurred at different intervals over the evolution of life.
RESULTS: To address this debate, the present study investigates the evolutionary history of multigene families with at least threefold representation on human chromosomes 1, 2, 8 and 20. Phylogenetic analysis and tree topology comparisons classified the members of 36 multigene families into four distinct co-duplicated groups. Gene families falling within the same co-duplicated group might have duplicated together, whereas genes belonging to different co-duplicated groups might have distinct evolutionary origins.
CONCLUSION: Taken together with previous investigations, the current study yielded no evidence in favor of the WGD hypothesis. Rather, it appears that the vertebrate genome evolved as a result of small-scale duplication events that cover the entire span of animal history.

Keywords:  Human; Multigene families; Paralogons; Paralogy regions; Phylogenetic analysis; Segmental duplications; Vertebrate; Whole genome duplications

Year:  2019        PMID: 31221090      PMCID: PMC6585022          DOI: 10.1186/s12862-019-1441-0

Source DB:  PubMed          Journal:  BMC Evol Biol        ISSN: 1471-2148            Impact factor:   3.260


Background

To explain the genetic underpinnings of major changes in organismal makeup and the origin of many new traits during vertebrate evolution, Susumu Ohno proposed in 1970 that two rounds of whole genome duplication (WGD) occurred early in vertebrate evolution. This proposal is popularly termed the “2R hypothesis” (two rounds of WGD) and is widely regarded as the most rational explanation for the complexity of the modern vertebrate genome [1]. The 2R hypothesis has been under intense scrutiny over the past couple of decades [2-9]. The occurrence of intra-genomic conserved syntenic blocks (paralogy groups/paralogons) in vertebrate genomes is presented as the most credible evidence for the ancient WGDs [10, 11]. Notably, the presence of four potentially quadruplicated regions on Homo sapiens autosomes (Hsa) 1/6/9/19 (MHC-bearing paralogon), Hsa 4/5/8/10 (FGFR-bearing chromosomes), Hsa 1/2/8/20 and Hsa 2/7/12/17 (HOX-cluster-bearing chromosomes) is considered an outcome of two consecutive rounds of WGD [12]. Alternatively, it has been hypothesized that the excess of paralogy regions in human and other vertebrate genomes is due to a higher incidence of local duplications, translocations and chromosomal restructuring that occurred extensively at different intervals during early vertebrate history, which would contradict Ohno’s postulation [13]. To evaluate the mechanisms behind the formation of vertebrate paralogy regions, our research group has been assembling and dating the gene duplications that occurred during animal evolutionary history [3, 4, 7, 14–17]. Previously, we investigated the evolutionary histories of 11 multigene families (40 human genes) with triplicated or quadruplicated presence on Hsa 1/2/8/20. The results contrasted with the 2R hypothesis, suggesting that the paralogy fragments on human chromosomes 1, 2, 8 and 20 are an outcome of small-scale duplication events scattered across the history of metazoans [3, 4, 14, 17, 18].
In this study, we extended our efforts [14] to analyze the evolutionary history of 25 human multigene families with threefold or fourfold distribution on Hsa 1/2/8/20. A robust and detailed phylogenomic analysis was carried out using recently available, well-annotated and high-quality genome sequence data from a wide range of metazoans [19-21]. The topology comparison approach was applied to the phylogenetic data of all 36 families (25 from the present data and 11 from previous data) to classify the genes that might have duplicated together early in vertebrate history [3, 14]. In addition, a relative timing approach was employed to estimate the timing of gene duplication events. In agreement with the previous results [14], it appeared that the triplicated or quadruplicated gene families residing on Hsa 1/2/8/20 have not arisen simultaneously through 2R. Rather, the phylogenetic data indicate that the tetra-paralogy blocks on the human genome have resulted from independent duplications, segmental duplications and genomic restructuring events that occurred at widely different time points during the course of animal evolution.

Results

To test the whole genome duplication (WGD) hypothesis, which holds that the fourfold paralogons in the human genome were formed by polyploidization events, we undertook phylogenetic analyses of 25 gene families (see Methods for details). Each of these multigene families has at least threefold representation on the human paralogy region that comprises segments of human chromosomes 1, 2, 8 and 20 (Fig. 1; Table 1). Orthologous sequence data were gathered from the wide range of currently available sequenced vertebrate and invertebrate genomes (Additional file 1). This broad taxonomic representation enabled a robust phylogenetic examination based on NJ and ML methods (Additional files 2, 3 and 4). Given the phylogenetic data, we next determined the co-duplication events by employing the topology comparison approach [3, 17, 22] (Fig. 2). This approach treats uniformity among the branching patterns of distinct but physically linked gene families as evidence of their joint origin, thereby identifying co-duplicated groups [13, 23]. In contrast, non-uniform tree topologies of physically linked distinct families suggest incongruent duplication histories for the genes concerned [16]. For this purpose, only those sections of the 36 phylogenies were chosen for which there is strong bootstrap support for at least two gene duplication events within the time frame bounded by the vertebrate-invertebrate and teleost-tetrapod splits (the proposed timing of the WGDs) (Additional file 5: Table S1). Of these 36 families, 11 were published previously by our research group [14].
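The grouping step of the topology comparison approach can be illustrated with a minimal sketch. It assumes each family's relevant subtree has already been reduced to a nested tuple whose leaves are chromosome labels; the family names `FamA`, `FamB` and `FamC` and the helper functions are hypothetical and are not taken from the study. Bootstrap support and translocation assumptions, which the actual analysis considers, are ignored here.

```python
from collections import defaultdict


def canonical(tree):
    """Return an order-independent form of a nested-tuple topology.

    Leaves are strings (chromosome labels); internal nodes are tuples.
    Children are sorted so that ((A, B), C) and (C, (B, A)) compare equal.
    """
    if isinstance(tree, str):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))


def group_by_topology(family_trees):
    """Place families with identical canonical topologies in the same group."""
    groups = defaultdict(list)
    for family, tree in family_trees.items():
        groups[canonical(tree)].append(family)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical input: three families with chromosome-labelled topologies.
family_trees = {
    "FamA": (("Hsa20", "Hsa2"), "Hsa8"),
    "FamB": ("Hsa8", ("Hsa2", "Hsa20")),   # same branching order as FamA
    "FamC": (("Hsa1", "Hsa8"), "Hsa20"),   # different branching order
}

groups = group_by_topology(family_trees)
print(groups)
```

In this toy input, `FamA` and `FamB` share a branching pattern and so fall into one candidate co-duplicated group, while `FamC` forms its own group.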
Fig. 1

Evolutionary history of human tetra-paralogon Hsa 1/2/8/20. A circular view of human chromosomes shows the paralogons detected among human chromosomes 1/2/8/20, including the synteny relationship among 36 distinct multigene families: the 11 families from previously published data are labeled in black [14], whereas the 25 families analyzed in the present study are labeled in green. Blue lines connect positions on ideograms for gene families with threefold representation, while yellow lines connect families with fourfold representation on these chromosomes. Detailed information about each family is given in Table 1.

Table 1

List of human gene families used in the phylogenetic analysis

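| Gene family | Members | Chr location | Human protein accession No. | Number of included taxa | Number of sequences included |
| --- | --- | --- | --- | --- | --- |
| Antizyme Inhibitor | AZIN2 | 1p35.1 | Q96A70 | 25 | 54 |
| | ODC1 | 2p25 | P11926 | | |
| | AZIN1 | 8q22.3 | O14977 | | |
| Cholinergic Receptors Nicotinic subunits | CHRNB2 | 1q21.3 | P17787 | 32 | 123 |
| | CHRNG | 2q37.1 | P07510 | | |
| | CHRND | 2q37.1 | Q07001 | | |
| | CHRNA1 | 2q31.1 | P02708 | | |
| | CHRNA2 | 8p21 | Q15822 | | |
| | CHRNA6 | 8p11.21 | Q15825 | | |
| | CHRNB3 | 8p11.2 | Q05901 | | |
| | CHRNA4 | 20q13.33 | P43681 | | |
| | CHRNA3 | 15q24 | P32297 | | |
| | CHRNB4 | 15q24 | P30926 | | |
| | CHRNB1 | 17p13.1 | P11230 | | |
| | CHRNE | 17p13.2 | Q04844 | | |
| | CHRNA5 | 15q24 | P30532 | | |
| Ciliary Rootlet Coiled-Coil Protein | CROCC | 1p36.13 | Q5TZA2 | 28 | 42 |
| | CROCC2 | 2q37.3 | H7BZ55 | | |
| | CEP250 | 20q11.22 | Q9BV73 | | |
| Discs, large (Drosophila) Homolog-associated Protein | DLGAP3 | 1p35.3-p34.1 | O95886 | 25 | 85 |
| | DLGAP1 | 18p11.31 | O14490 | | |
| | DLGAP5 | 14q22.3 | Q15398 | | |
| | DLGAP2 | 8p23 | Q9P1A6 | | |
| | DLGAP4 | 20q11.23 | Q9Y2H0 | | |
| E2F Transcription Factor | E2F2 | 1p36 | Q14209 | 31 | 84 |
| | E2F6 | 2p25.1 | O75461 | | |
| | E2F5 | 8q21.2 | Q15329 | | |
| | E2F1 | 20q11.2 | Q01094 | | |
| | E2F3 | 6p22 | O00716 | | |
| | E2F4 | 16q22.1 | Q16254 | | |
| Family with Sequence Similarity 110 | FAM110D | 1p36.11 | Q8TAY7 | 25 | 56 |
| | FAM110C | 2p25.3 | Q1W6H9 | | |
| | FAM110B | 8q12.1 | Q8TC76 | | |
| | FAM110A | 20p13 | Q9BQ89 | | |
| Grainyhead like Transcription factor | GRHL3 | 1p36.11 | Q8TE85 | 26 | 57 |
| | TFCP2L1 | 2q14 | Q9NZI6 | | |
| | GRHL1 | 2p25.1 | Q9NZI5 | | |
| | GRHL2 | 8q22.3 | Q6ISB3 | | |
| | TFCP2 | 12q13 | Q12800 | | |
| | UBP1 | 3p22.3 | Q9NZI7 | | |
| Inhibitor of DNA Binding protein | ID3 | 1p36.13-p36.12 | Q02535 | 35 | 65 |
| | ID2 | 2p25 | Q02363 | | |
| | ID1 | 20q11 | P41134 | | |
| | ID4 | 6p22.3 | P47928 | | |
| Maestro Heat-like Repeat-containing Protein Family | MROH9 | 1q24.3 | Q5TGP6 | 22 | 46 |
| | MROH7 | 1p32.3 | Q68CQ1 | | |
| | MROH6 | 8q24.3 | A6NGR9 | | |
| | MROH5 | 8q24.3 | Q6ZUA9 | | |
| | MROH8 | 20q11.22 | Q9H579 | | |
| Myelin Transcription Factor | MYT1L | 2p25.3 | Q9UL68 | 22 | 48 |
| | ST18 | 8q11.23 | O60284 | | |
| | MYT1 | 20q13.33 | Q01538 | | |
| Nuclear Receptor Coactivator | NCOA1 | 2p23 | Q15788 | 22 | 54 |
| | NCOA2 | 8q13.3 | Q15596 | | |
| | NCOA3 | 20q12 | Q9Y6Q9 | | |
| Na+/K+ Transporting ATPase Interacting Protein | NKAIN1 | 1p35.2 | Q4KMZ8 | 24 | 46 |
| | NKAIN3 | 8q12.3 | Q8N8D7 | | |
| | NKAIN4 | 20q13.33 | Q8IVV8 | | |
| | NKAIN2 | 6q21 | Q5VXU1 | | |
| Potassium Voltage-Gated Channel subfamily Q | KCNQ4 | 1p34 | P56696 | 28 | 67 |
| | KCNQ3 | 8q24 | O43525 | | |
| | KCNQ2 | 20q13.3 | O43526 | | |
| | KCNQ5 | 6q14 | Q9NR82 | | |
| | KCNQ1 | 11p15.5 | P51787 | | |
| Regulator of G-protein Signalling | RGS13 | 1q31.2 | O14921 | 31 | 101 |
| | RGS8 | 1q25 | P57771 | | |
| | RGS1 | 1q31 | Q08116 | | |
| | RGS18 | 1q31.2 | Q9NS28 | | |
| | RGS16 | 1q25-q31 | O15492 | | |
| | RGS21 | 1q31.2 | Q2M5E4 | | |
| | RGS4 | 1q23.3 | P49798 | | |
| | RGS2 | 1q31 | P41220 | | |
| | RGS20 | 8q11.23 | O76081 | | |
| | RGS19 | 20q13.33 | P49795 | | |
| | RGS17 | 6q25.3 | Q9UGC6 | | |
| | RGS3 | 9q32 | P49796 | | |
| | RGS5 | 1q23.1 | O15539 | | |
| Regulating Synaptic Membrane Exocytosis Protein | RIMS3 | 1p34.2 | Q9UJD0 | 27 | 49 |
| | RIMS2 | 8q22.3 | Q9UQ26 | | |
| | RIMS4 | 20q13.12 | Q9H426 | | |
| | RIMS1 | 6q12-q13 | Q86UR5 | | |
| R-Spondin Homolog | RSPO1 | 1p34.3 | Q2MKA7 | 31 | 60 |
| | RSPO2 | 8q23.1 | Q6UXX9 | | |
| | RSPO4 | 20p13 | Q2I0M5 | | |
| | RSPO3 | 6q22.33 | Q9BXY4 | | |
| Solute Carrier Family | SLC30A2 | 1p35.3 | Q9BRI3 | 23 | 74 |
| | SLC30A10 | 1q41 | Q6XR72 | | |
| | SLC30A1 | 1q32.3 | Q9Y6M5 | | |
| | SLC30A3 | 2p23.3 | Q99726 | | |
| | SLC30A8 | 8q24.11 | Q8IWU4 | | |
| | SLC30A4 | 15q21.1 | O14863 | | |
| Syntrophin, Gamma | SNTG2 | 2p25.3 | Q9NY99 | 28 | 81 |
| | SNTG1 | 8q11.21 | Q9NSN8 | | |
| | SNTB1 | 8q23-q24 | Q13884 | | |
| | SNTA1 | 20q11.2 | Q13424 | | |
| | SNTB2 | 16q22.1 | P49711 | | |
| | GOPC | 6q21 | Q9HD26 | | |
| Sorting Nexin Family | SNX27 | 1q21.3 | Q96L92 | 29 | 43 |
| | SNX17 | 2p23.3 | Q15036 | | |
| | SNX31 | 8q22.3 | Q8N9S9 | | |
| Stathmin | STMN1 | 1p36.11 | P16949 | 22 | 63 |
| | STMN2 | 8q21.13 | Q93045 | | |
| | STMN4 | 8p21.2 | Q9H169 | | |
| | STMN3 | 20q13.3 | Q9NZ72 | | |
| Serine/Threonine-Protein Kinase | STK25 | 2q37.3 | O00506 | 25 | 72 |
| | STK3 | 8q22.2 | Q13188 | | |
| | STK4 | 20q11.2-q13.2 | Q13043 | | |
| | STK24 | 13q31.2-q32.3 | Q9Y6E0 | | |
| | STK26 | Xq26.2 | Q9P289 | | |
| Transcription Elongation factor A (SII) Protein | TCEA3 | 1p36.12 | O75764 | 22 | 51 |
| | TCEA1 | 8q11.2 | P23193 | | |
| | TCEA2 | 20q13.33 | Q15560 | | |
| | TCEANC | Xp22.2 | Q8N8B7 | | |
| UBX Domain-Containing Protein | UBXN2A | 2p23.3 | P68543 | 22 | 32 |
| | UBXN2B | 8q12.1 | Q14CS0 | | |
| | NSFL1C | 20p13 | Q9UNZ2 | | |
| X Kell Blood Group Precursor-related Family | XKR8 | 1p35.3 | Q9H6D3 | 24 | 101 |
| | XKR9 | 8q13.3 | Q5GH70 | | |
| | XKR6 | 8p23.1 | Q5GH73 | | |
| | XKR4 | 8q12.1 | Q5GH76 | | |
| | XKR5 | 8p23.1 | Q6UX68 | | |
| | XKR7 | 20q11.21 | Q5GH72 | | |
| YTH Domain-Containing Family Protein | YTHDF2 | 1p35 | Q9Y5A9 | 24 | 50 |
| | YTHDF3 | 8q12.3 | Q7Z739 | | |
| | YTHDF1 | 20q13.33 | Q9BYJ9 | | |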
Fig. 2

The human genes duplicated in parallel lie in respective co-duplicated groups. Consistencies in phylogenetic tree topologies of families (analyzed in this and our previous study) with at least threefold representation on human tetra-paralogon Hsa 1/2/8/20. a Schematic topology of MROH and STK families; b schematic topology of E2F, EYA and STMN families; c schematic topology of HCK, DLGAP, NKAIN, KCNQ and MATN gene families; d schematic topology of FAM110, NCOA, KCNS, YTHDF, XKR and MYT gene families. For each case, the percentage bootstrap values of internal branches are provided in parentheses, except for gene families exhibiting slightly lower bootstrap values (≤50%). The connecting bars on the left portray the close physical associations of the relevant genes. The asterisk symbol (*) designates the relevant chromosomes.

MROH and STK gene family members have threefold representation on the Hsa 1/2/8/20 paralogon and diversified by at least two vertebrate-specific duplication events (Additional file 2). Assuming three independent gene translocation events in the STK gene family, congruent but asymmetrical topologies of the type ((Hsa20/2 Hsa1/13) Hsa8/X) are recovered for these two gene families (Fig. 2a). 
This pattern indicates that the subset members of the MROH and STK families might have duplicated en bloc through segmental duplication (SD) events. The E2F family has fourfold representation, whereas the EYA and STMN families have threefold representation on tetra-paralogon Hsa 1/2/8/20. Assuming two independent gene translocation events, congruent and asymmetrical topologies of the type ((Hsa1/6 Hsa8/6) Hsa20) are recovered for the E2F, EYA and STMN families (Fig. 2b; Additional file 2). The MATN family has fourfold presence, whereas the HCK, DLGAP, NKAIN and KCNQ families have threefold representation on the tetra-paralogy regions residing on Hsa 1/2/8/20. Assuming five gene translocation events, a congruent and symmetrical topology of the type ((A, B) (C, D)), i.e. ((Hsa20-Hsa8/18) (Hsa1-Hsa8/6/2)), is recovered for the HCK, DLGAP, NKAIN, KCNQ and MATN families (Fig. 2c; Additional file 2). The FAM110 family has fourfold representation, whereas the NCOA, KCNS, YTHDF, XKR and MYT families have threefold distribution on Hsa 1/2/8/20. Each of these five families experienced at least two vertebrate-specific duplication events (Additional file 2). Assuming four independent gene translocation events, members of these five families constitute the fourth co-duplicated group, with an asymmetrical tree topology of the type ((Hsa20-Hsa8/2) Hsa2/1/8) (Fig. 2d). Phylogenetic trees of eight gene families (CHRN, RGS, GRHL, RIMS, RSPO, ID, TCEA and SNT) involve complex histories, with the majority of duplications having occurred anciently, prior to the vertebrate–invertebrate split. The CHRN family appears to have diversified by twelve duplications in total, six of which predate the vertebrate-invertebrate split (Additional file 2). The RGS family tree indicates 10 duplication events, five of which occurred earlier than the vertebrate-invertebrate split (Additional file 2). The tree topology of GRHL indicates six duplications in total, two of which occurred at least prior to the protostome–deuterostome split (Additional file 2). 
The tree topology of the RIMS family reveals three duplication events, one of which occurred earlier than the bilaterian-nonbilaterian divergence (Additional file 2). RSPO arose by three independent gene duplication events, one of which happened prior to the divergence of echinoderms from vertebrates (Additional file 2). The vertebrate ID family tree revealed three independent gene duplication events, two of which occurred prior to the hemichordate-vertebrate split (Additional file 2). Members of the TCEA family arose by four duplications, three of which occurred earlier than the vertebrate-cephalochordate split (Additional file 2). SNT paralogs experienced five duplications, four of which occurred prior to the protostome-deuterostome split (Additional file 2). Phylogenetic tree topologies of five families (AZIN, CRO, SLC, SNX and UBXN) reveal no evidence of vertebrate-specific gene duplications. All of these families diversified by duplications that predate the vertebrate-invertebrate split (Additional file 2). Estimating gene duplication events with respect to the relative timing of speciation events provides a bird’s eye view of all the duplications that occurred in a particular time window [24]. Taken together, the phylogenetic histories of the 36 families (25 from the present data and 11 previously analyzed) recover 172 duplication events in total (Fig. 3). It appears that 52 of these duplication events occurred earlier than the invertebrate-vertebrate split, whereas 74 duplications are placed at the root of vertebrate history, prior to the tetrapod-teleost divergence. Furthermore, 42 teleost-specific and only 4 tetrapod-specific duplication events are detected (Fig. 3).
Fig. 3

The relative timings of gene duplication events. For the 36 multigene families analyzed in this study, 52 gene duplications are detected before the invertebrate-vertebrate divide and 74 duplications are detected after invertebrate-vertebrate and before tetrapod-bony fish divergence. Only four tetrapod specific duplication events are detected. The numbers enclosed in the parentheses following gene family names represent the count of duplications experienced by family. Gene families are ordered alphabetically
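The relative timing idea can be sketched in a few lines. The classifier below is an illustrative simplification, not the procedure used in the study: it assumes a duplication node has already been identified and assigns it to a time window purely from the taxon groups found among its descendant sequences. The group labels and the function name `duplication_window` are hypothetical. The tally that follows uses only the counts reported in the text and checks that they add up to the stated total.

```python
def duplication_window(groups_below_node):
    """Assign a duplication node to a time window.

    `groups_below_node` lists the taxon groups ("invertebrate", "teleost",
    "tetrapod") represented among the sequences descending from the node.
    Assumed rule: the duplication is at least as old as the last common
    ancestor of all groups found below it.
    """
    present = set(groups_below_node)
    vertebrates = present & {"teleost", "tetrapod"}
    if "invertebrate" in present and vertebrates:
        return "before invertebrate-vertebrate split"
    if vertebrates == {"teleost", "tetrapod"}:
        return "before tetrapod-teleost divergence"
    if present == {"teleost"}:
        return "teleost specific"
    if present == {"tetrapod"}:
        return "tetrapod specific"
    return "unresolved"


# Counts per time window as reported in the text for the 36 families.
reported = {
    "before invertebrate-vertebrate split": 52,
    "before tetrapod-teleost divergence": 74,
    "teleost specific": 42,
    "tetrapod specific": 4,
}

total = sum(reported.values())
shares = {window: round(100 * n / total, 1) for window, n in reported.items()}

print(total)
for window, share in shares.items():
    print(f"{window}: {share}%")
```

A full analysis would reconcile each gene tree with the species tree and would account for gene loss, which this rule ignores.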

Discussion

Various post-genomic methods, such as genome-wide pairwise comparisons and genome self-comparisons, have been widely used to analyze the evolutionary basis for the origin of paralogy blocks in vertebrate genomes [11]. These approaches have successfully highlighted evolutionary events in recent vertebrate history, because the identity of recently duplicated intra-genomic and inter-genomic conserved syntenic segments, and thus the patterns of evolution preceding their origin, are not obscured by evolutionary divergence or by genomic changes such as chromosomal breakage and rearrangement [25]. For instance, a complex pattern of segmental duplications (SDs) has been revealed by inter-genomic and intra-genomic comparisons in primates [26-29]. These large duplicated segments range in size from 300 kb to 1 Mb, occupy at least two different genomic locations and share more than 90% sequence identity [30]. Comparative data have implicated these SDs in numerous roles, such as creating new genes, expanding gene families and catalyzing large-scale hominoid-specific chromosomal reorganization [31]. In contrast, inter-genomic and intra-genomic map comparisons have not proven useful for inferring evolutionary processes that took place in early vertebrate history [32]. The reason is that anciently duplicated genomic blocks have undergone sequence divergence, multiple chromosomal breakages, gene rearrangements and karyotype modification [32]. Phylogenetic investigation of multigene families is considered the most reliable approach to assess the existence of ancient intra-genomic synteny blocks or paralogons [16]. This approach captures the evolutionary mechanisms behind the origin of anciently duplicated regions more adequately in two ways. Firstly, it estimates the relative timing of gene duplication events. 
This strategy can provide a bird’s eye view of all the duplications that happened in a specific time frame. For example, if the phylogenies indicate that the bulk of the paralogy regions arose after the vertebrate-invertebrate divergence and before the teleost-tetrapod split, this suggests that large-scale gene duplications occurred between these speciation events [24]. Secondly, the creation of paralogy regions can be scrutinized by combining information on the global physical organization of the gene families that make up the paralogons with their phylogenetic tree topologies [13]. Distinct but physically linked multigene families (bearing human paralogons) that show coherence among their topologies would suggest that these families might have arisen jointly through segmental duplication events. This approach is elaborated and applied in previous studies [7, 16, 23]. In earlier studies, various human tetra-paralogons, e.g. Hsa 4/5/8/10 (FGFR-paralogon), Hsa 2/7/12/17 (HOX-paralogon) and Hsa 1/6/9/19 (MHC-paralogon), have been examined to test the validity of the 2R hypothesis [4, 7, 14, 17, 23]. In this study, we assess the history of one of the most extensively cited paralogy regions, which involves segments of human chromosomes 1, 2, 8 and 20 [14] (Additional files 2, 3 and 4). Taken together with our previous findings, this study estimated the history of 36 multigene families (25 from the present study and 11 from previous work) with at least threefold distribution on Hsa 1/2/8/20 [14] (Fig. 1; Table 1). In total, our data for this particular human paralogon involve 165 human genes and 2240 protein sequences (Additional file 1) [14]. The topology comparison approach is applied to test the WGD hypothesis (Fig. 2). 
This analysis resulted in the categorization of the 36 phylogenies into four distinct co-duplicated groups, in which the component gene families expanded through duplications that could have happened within the time frame between the invertebrate-vertebrate and bony fish-tetrapod divergences (Additional file 5: Table S1). Distinct gene families within a co-duplicated group could have diversified concurrently by segmental duplications, whereas distinct co-duplicated groups might have been created through discrete duplication events [13]. The recovery of large co-duplicated groups in this study shows that ancient segmental duplications (aSDs) and rearrangement events played an essential role in shaping the paralogy segments belonging to human chromosomes 1/2/8/20 (Fig. 2). Interestingly, congruent and symmetrical topologies of the type ((AB) (CD)) are obtained for the HCK, DLGAP, NKAIN, KCNQ and MATN gene families (co-duplicated group 3) (Fig. 2c). This pattern is usually interpreted as an outcome of WGD events [12]. However, here we argue that sub-chromosomal duplications might be a more balanced explanation for such symmetrical topology trends [6, 7, 14]. For example, tandem duplications occurring in two rounds and embracing several unrelated genes would result in a genomic segment with specific paralogous gene quartets organized in a tandem pattern. Breakage of such larger segments into smaller subsegments through chromosomal deterioration and restructuring could result in the paralogy blocks seen in human and other vertebrate genomes [14].
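The distinction between symmetrical ((AB) (CD)) and asymmetrical topologies discussed above can be made concrete with a small sketch. It assumes a rooted, strictly binary tree with four leaves encoded as nested tuples; the function names are hypothetical, and the labels A to D are placeholders rather than genes from the study.

```python
def leaf_count(tree):
    """Count the leaves of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return 1
    return sum(leaf_count(child) for child in tree)


def topology_shape(tree):
    """Classify a rooted, binary, four-leaf topology by its root split.

    A 2 + 2 split at the root is the symmetrical ((A, B), (C, D)) shape;
    a 1 + 3 split is the asymmetrical (((A, B), C), D) shape.
    """
    if isinstance(tree, str) or len(tree) != 2 or leaf_count(tree) != 4:
        raise ValueError("expected a rooted binary tree with four leaves")
    sizes = sorted(leaf_count(child) for child in tree)
    return "symmetrical" if sizes == [2, 2] else "asymmetrical"


balanced = (("A", "B"), ("C", "D"))
ladder = ((("A", "B"), "C"), "D")

print(topology_shape(balanced))
print(topology_shape(ladder))
```

The shape alone does not identify the mechanism: as argued above, two rounds of tandem duplication followed by rearrangement could also produce the symmetrical form.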

Conclusion

The present study examined the vertebrate polyploidy proposal by scrutinizing the phylogenomic history of human tetra-paralogon Hsa 1/2/8/20. Estimation of the number of gene duplications with respect to speciation events, together with the topology comparison approach, revealed no evidence in favor of Ohno’s 2R model. Instead, taken together with previous results from the HOX paralogon [16] (63 gene families), the FGFR paralogon [4] (80 gene families) and the MHC paralogon [23] (40 gene families), the present data (36 families from Hsa 1/2/8/20) suggest that the vertebrate genome in its early history was shaped by small-scale events, such as duplications of independent genes and chromosomal segments, and rearrangements.

Methods

Data collection

Gene families with triplicated or quadruplicated presence on Hsa 1/2/8/20 were identified by scanning the maps of the human genome sequence in the Ensembl genome browser [33-35]. A total of 25 gene families (125 known protein-coding genes in total) were identified. Among these gene families, 3 have quadruplicated representation while 22 have triplicated presence on Hsa 1/2/8/20 (Fig. 1; Table 1). The closest putative orthologs of the human protein sequences in other animal species were acquired using BLASTP [36] in the Ensembl genome browser [33]. To obtain sequence data from organisms not yet available at Ensembl, a BLASTP search was carried out against the protein databases available at the National Center for Biotechnology Information [37] and the Joint Genome Institute [http://www.jgi.doe.gov/]. In total, 1605 amino acid sequences from 46 metazoan species were selected for phylogenomic investigation (Additional file 1). The common ancestry of the putative orthologs was further confirmed by clustering homologous proteins within phylogenetic trees. The phylogenetic tree topology of each gene family was validated by detailed comparison against a well-established metazoan species tree [38, 39]. Protein sequences whose placement within a tree disagreed with the conventional animal history were removed from the analysis. The list of sequences used in the analysis (from 46 species including 25 tetrapods, 5 teleost fish and 16 invertebrates) is provided in Additional file 1. 
The species selected for analysis were Homo sapiens (Human), Mus musculus (Mouse), Pan troglodytes (Chimpanzee), Gorilla gorilla (Gorilla), Callithrix jacchus (Marmoset), Pongo abelii (Orangutan), Macaca mulatta (Macaque), Rattus norvegicus (Rat), Oryctolagus cuniculus (Rabbit), Taeniopygia guttata (Zebra finch), Gallus gallus (Chicken), Canis familiaris (Dog), Felis catus (Cat), Bos taurus (Cow), Loxodonta africana (Elephant), Equus caballus (Horse), Myotis lucifugus (Microbat), Dasypus novemcinctus (Armadillo), Pteropus vampyrus (Megabat), Ornithorhynchus anatinus (Platypus), Monodelphis domestica (Opossum), Pelodiscus sinensis (Chinese softshell turtle), Anolis carolinensis (Lizard), Erinaceus europaeus (Hedgehog), Xenopus tropicalis (Frog), Danio rerio (Zebrafish), Takifugu rubripes (Fugu), Tetraodon nigroviridis (Tetraodon), Gasterosteus aculeatus (Stickleback), Oryzias latipes (Medaka), Branchiostoma floridae (Amphioxus), Ciona intestinalis (Ascidian), Ciona savignyi (Ascidian), Saccoglossus kowalevskii, Ptychodera flava, Strongylocentrotus purpuratus (Sea urchin), Caenorhabditis elegans (Nematode), Anopheles gambiae (Mosquito), Drosophila melanogaster (Fruit fly), Apis mellifera (Honey bee), Capitella teleta (Capitella), Octopus bimaculoides (Octopus), Hydra magnipapillata (Hydra), Nematostella vectensis (Sea anemone), Trichoplax adhaerens (Trichoplax) and Amphimedon queenslandica (Sponge).

Alignment and phylogenetic analysis

Phylogenetic analysis for each gene family was performed using MEGA version 5 [40]. The multiple sequence alignment program CLUSTALW [41] was used to align the protein sequences. Alignment quality has a strong impact on the accurate inference of phylogeny. Homologous protein sequences often evolve under different evolutionary pressures in some regions of the protein in different species [42-44]. Furthermore, regional rate heterogeneity affects the whole alignment and ultimately the phylogenetic reconstruction [44, 45]. Therefore, the multiple sequence alignment of each gene family was trimmed to eliminate all positions containing gaps and missing data. Only unambiguous portions of the sequence alignments were used for phylogenetic analyses. Phylogenetic analyses were performed using the Neighbor-Joining (NJ) approach [46-48]. The JTT (Jones-Taylor-Thornton) matrix-based method and the uncorrected proportion (p) of amino acid differences were employed as amino acid substitution models. Results obtained with both methods are given in Additional files 2 and 3. The reliability of the clustering patterns in the resulting trees was evaluated by the bootstrap method (1000 pseudo-replicates) [49], which produced bootstrap probability values for each interior branch in the phylogenetic tree. Each phylogenetic tree reconstruction method has its own limitations; therefore, to systematically check and validate the NJ-based trees, Maximum Likelihood (ML) based phylogenies were also constructed using the Whelan and Goldman (WAG) model of amino acid replacement [50]. The phylogenetic trees with the highest log likelihood scores were selected as final trees. Initial tree(s) for ML were generated automatically by applying the NJ and BioNJ methods to a matrix of pairwise distances calculated using the JTT model, and then selecting the topology with the superior log likelihood value [47, 51]. Heuristic searches starting with the initial trees were conducted with Nearest Neighbor Interchange (NNI) [40]. 
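Two of the steps described above, removing every alignment column that contains a gap or missing data and computing the uncorrected proportion (p) of amino acid differences, can be sketched as follows. This is a minimal illustration rather than the MEGA implementation; the function names, the choice of `-` and `?` as gap and missing-data symbols, and the toy sequences are all assumptions made for the example.

```python
def complete_deletion(alignment, missing="-?"):
    """Drop every column in which any sequence has a gap or missing symbol.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    keep = [
        i for i in range(length)
        if all(seq[i] not in missing for seq in sequences)
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "seq1": "MKT-LLA",
    "seq2": "MKTALLA",
    "seq3": "MRTAL?A",
}

trimmed = complete_deletion(toy)
print(trimmed)
print(p_distance(trimmed["seq1"], trimmed["seq3"]))
```

A matrix of such pairwise distances is the input that a distance method such as NJ would then use to build a tree.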
The topological reliability of each ML tree was evaluated by the bootstrap method on the basis of 1000 pseudoreplicates [49]. The ML-based trees are provided in Additional file 4. The gene duplications relative to the divergence of major animal taxa were estimated by investigating the branching order of the phylogenetic trees [4, 13, 18]. The phylogenetic topology of each family was compared with that of all other families to assess the consistencies in gene duplication events [16]. Gene families with consistent tree topologies were placed in respective co-duplicated groups [13]. Among the tree topologies of the 25 gene families, the phylogenies of five families (MYT, NCOA, STMN, NKAIN and YTHDF) were rooted with invertebrate sequences, whereas the CRO, ID, MROH, RSPO, FAM110, TCEA, RIMS, KCNQ and CHRN families were rooted with both invertebrate and vertebrate sequences. In the case of the UBXN and E2F families, vertebrate sequences served as the outgroup. The phylogenies of the SNX, RGS, GRHL, AZIN, DLGAP, STK, SLC, SNT and XKR families contained two subfamilies, each of which served to root the other.
Complete list of protein sequences used in this study. (PDF 1724 kb)
Neighbor Joining Trees of gene families (residing on human chromosomes 1/2/8/20) using p-distance method. (PDF 4993 kb)
Neighbor Joining Trees of gene families (residing on human chromosomes 1/2/8/20) using JTT method. (PDF 3402 kb)
Maximum likelihood Trees of gene families (residing on human chromosomes 1/2/8/20) based on WAG model. (PDF 3836 kb)
Table S1. Summary of the phylogenetic analysis of gene families with three or more members residing on human chromosomes 1/2/8/20. (PDF 73 kb)
  50 in total

Review 1.  The new animal phylogeny: reliability and implications.

Authors:  A Adoutte; G Balavoine; N Lartillot; O Lespinet; B Prud'homme; R de Rosa
Journal:  Proc Natl Acad Sci U S A       Date:  2000-04-25       Impact factor: 11.205

2.  A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.

Authors:  S Whelan; N Goldman
Journal:  Mol Biol Evol       Date:  2001-05       Impact factor: 16.240

Review 3.  Were vertebrates octoploid?

Authors:  Rebecca F Furlong; Peter W H Holland
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2002-04-29       Impact factor: 6.237

Review 4.  Numerous groups of chromosomal regional paralogies strongly indicate two genome doublings at the root of the vertebrates.

Authors:  Lars-Gustav Lundin; Dan Larhammar; Finn Hallböök
Journal:  J Struct Funct Genomics       Date:  2003

5.  Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons.

Authors:  Amir Ali Abbasi
Journal:  Mol Phylogenet Evol       Date:  2010-08-07       Impact factor: 4.286

6.  Phylogenetic investigation of human FGFR-bearing paralogons favors piecemeal duplication theory of vertebrate genome evolution.

Authors:  Wajya Ajmal; Hiba Khan; Amir Ali Abbasi
Journal:  Mol Phylogenet Evol       Date:  2014-09-22       Impact factor: 4.286

7.  Extensive genomic duplication during early chordate evolution.

Authors:  Aoife McLysaght; Karsten Hokamp; Kenneth H Wolfe
Journal:  Nat Genet       Date:  2002-05-28       Impact factor: 38.330

8.  Ancient genome duplications did not structure the human Hox-bearing chromosomes.

Authors:  A L Hughes; J da Silva; R Friedman
Journal:  Genome Res       Date:  2001-05       Impact factor: 9.043

9.  NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors:  Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2006-11-27       Impact factor: 16.971

10.  An insight into the phylogenetic history of HOX linked gene families in vertebrates.

Authors:  Amir Ali Abbasi; Karl-Heinz Grzeschik
Journal:  BMC Evol Biol       Date:  2007-11-30       Impact factor: 3.260
