Literature DB >> 27560837

Evolutionary analysis of apolipoprotein E by Maximum Likelihood and complex network methods.

Leandro de Jesus Benevides¹, Daniel Santana de Carvalho², Roberto Fernandes Silva Andrade², Gilberto Cafezeiro Bomfim², Flora Maria de Campos Fernandes².

Abstract

Apolipoprotein E (apo E) is a human glycoprotein with 299 amino acids, and it is a major component of very low density lipoproteins (VLDL) and a group of high-density lipoproteins (HDL). Phylogenetic studies are important to clarify how various apo E proteins are related in groups of organisms and whether they evolved from a common ancestor. Here, we aimed at performing a phylogenetic study on apo E carrying organisms. We employed a classical and robust method, such as Maximum Likelihood (ML), and compared the results using a more recent approach based on complex networks. Thirty-two apo E amino acid sequences were downloaded from NCBI. A clear separation could be observed among three major groups: mammals, fish and amphibians. The results obtained from ML method, as well as from the constructed networks showed two different groups: one with mammals only (C1) and another with fish (C2), and a single node with the single sequence available for an amphibian. The accordance in results from the different methods shows that the complex networks approach is effective in phylogenetic studies. Furthermore, our results revealed the conservation of apo E among animal groups.

Entities: Chemical Disease Gene Species

Year: 2016 PMID： 27560837 PMCID： PMC5127143 DOI： 10.1590/1678-4685-GMB-2015-0164

Source DB: PubMed Journal: Genet Mol Biol ISSN： 1415-4757 Impact factor: 1.771

Introduction

Apolipoprotein E (apo E) is a human glycoprotein composed of 299 aminoacids (Wernette-Hammonds ). It is one of the main proteins in plasma, to where it is exported after being synthesized, mainly in the liver (Elshourbagy and Walker, 1986; Lin ). In the cytoplasm, apo E is the major component of very low density lipoproteins (VLDL). It is also part of a high-density lipoproteins group (HDL) (Mensenkamp ) that play a key role in triglyceride-rich component catabolism in humans (Kasap ). Apo E deficiencies cause diseases related to the increase in cholesterol and triglycerides level in the circulation, such as hyperlipidemia, coronary heart disease, Alzheimer's disease, and other neurodegenerative diseases (Lohse ; Corder ; Mahley and Huang 1999). There are other apolipoproteins related to triglyceride and cholesterol redistribution in animals, and depending on the organism, different lipoprotein transport systems can exist using similar mechanisms (Davis, 1997). For instance, apo E is not found in birds, but a homologous apolipoprotein (apo A-I) performs functions that are similar to those of apo E in humans and other vertebrates (Rajavashisth ; Lamon-Fava ). Hence, to understand the role of apo E within groups of organisms, phylogenetic studies may contribute to elucidate questions about how apo E proteins are related in different vertebrates, and whether they may have evolved from a common ancestor (Kasap ). Currently, phylogenetic results are continuously improved due to the increasing availability of a large amount of biological data and to new approaches to analyze them. Regarding the number of organisms included in a previous study (Kasap ), it is now possible to obtain a phylogeny based on more than twice as many organisms as before and with different methodologies. Here, we first conducted a phylogenetic reconstruction based on the Maximum Likelihood (ML) method, which is one of the most reliable ones among the available methods. ML is a phylogenetic inference method that uses probabilistic models for representing the tree with the highest probability or likelihood of correctly reproducing the observed data. The ML criterion allows the analysis of several variables, which makes the method robust, but with increasing computational cost. Another method for phylogenetic reconstruction that is also robust and demands less computational cost, is Bayesian Inference (BI). In practice, ML and BI should present similar and accurate results, but we decided not to use BI because we did not have a problem with the computational cost for our data and because there is some controversy about this method. For example, some studies suggest that BI can be more prone to long-branch attraction biases than ML techniques, and that there could be some problems with Markov chain Monte Carlo (MCMC) implementations or prior distributions (Kolaczkowski and Thornton, 2009). In addition to the ML method, several complex network-based approaches have been proposed, which allow for the analysis of larger datasets with relatively low computational cost (Andrade ). Such methods have been widely applied to biological evolutionary studies (Junker and Schreiber, 2008; Boccaletti, 2009). Therefore, a second set of results for a complex network-based phylogenetic classification is presented in this study, which uses the network modularity concept. It uses the apo E similarity scores evaluated for each pair of organisms in the considered set to identify network communities (or modules). Here, the nodes of each community have a larger number of connections with other nodes within the same modules, in comparison to the connections they have with nodes outside of it. The phylogenetic tree then results from the identification of modules. Note that when the nodes have connections within their own modules, the respective communities become isolated. Summarizing, this study aimed at performing a phylogenetic analysis based on a set of 32 apo E amino acid sequences using both the ML method and a reliable complex network based approach (Andrade ). The latter does not require further biological assumptions for its application, except for the ones used by BLAST to calculate similarity scores. As it will become clear in the next sections, the results obtained using the two different approaches are in conformity and helped to clarify how various apo E proteins are related among different groups of animals.

Material and Methods

Sequences

The apo E sequences were downloaded from NCBI (National Center for Biotechnology Information, http://www.ncbi.nih.gov) on November 17, 2014 (Table S1). Based on homologous sequences deposited in the HomoloGene database (http://www.ncbi.nlm.nih.gov/homologene), two BLAST (Basic Local Alignment Search Tool) analyses (Altschul ) have been performed: the first one used the Homo sapiens apo E sequence as query, whereas in the second one the Danio rerio apo E sequence was the query used. Only sequences with an E-value lower than e10−6 and without the RefSeq status PREDICTED were considered. The only representative amphibian sequence (Xenopus (Silurana) tropicalis) found in NCBI was the same as the one used by Kasap . Once collected, the sequences were aligned computationally using ClustalW (Thompson ) default parameters. The alignment was subsequently revised and manually edited in BioEdit 7.1.3.0 (Hall 1999), following the instructions in Hall (2011).

Maximum Likelihood

After alignment and editing, the dataset was first subjected to the ML phylogenetic analysis. The MEGA6 software (Tamura ) was used to select the best evolutionary substitution model of amino acid residues for this dataset, whereas the Tree-Puzzle 5.2 software (Schmidt ) was used for likelihood mapping analysis. To perform the phylogenetic reconstruction, MEGA6 was used under the ML criterion with 1000 bootstrap replicates.

Complex Networks

In the first step of the complex network analysis, a local pairwise alignment was performed using BLAST version 2.2.21 (StandAlone), which resulted in a similarity S matrix with elements 0≤s ≤100 consisting of the mean value of the pairwise similarity scores between sequences (i,j) and (j,i). The similarity values were considered valid for those pairs of sequences with an E-value equal to or smaller than 1.0. Next, the community identification started with the construction of the one-parameter network family, as proposed by Andrade . The symmetric S matrix allows the construction of a one-parameter family of undirected networks, which is generated according to the following scheme: i) the apo E sequences were considered as nodes; ii) the edges, representing interactions between nodes, depend on the similarity degree of the sequences; iii) a control parameter, represented by a threshold similarity S=σ, allows generating a network family of adjacency matrices M(σ), such that the matrix elements M can only assume the values 0 or 1; iv) these values indicate, respectively, the absence or presence of an edge between nodes i and according to the following prescription: M=1 if s3 σ; M=0 if s < σ. Finally, the modularity is obtained by two properties of the network family. The first one is the network distance δ between two networks evaluated at nearby σ values, denoted by δ(σ,σ+Δσ). The graph of δ as function of σ allows identifying the optimal σ values at which the network undergoes significant structural changes that can be associated with the emergence of communities separated from the rest of the network. The critical networks graphical representation was done through GePhi software (Bastian ). The phylogenetic classification is completed by the construction of a dendrogram with the σ value representing the horizontal axis, in which communities or isolated nodes give rise to subtrees or individual branches at the σ-values where they become disconnected from the other modules or completely isolated from any other node. It is worthy of note that the construction of the dendrogram in the current study relied only on community separation by changing the σ value. This methodological change could be made because the data set in the current work was much smaller than those data sets used in previous studies (Góes-Neto ; Andrade ), in which the dendrograms at fixed σ values were obtained by successive link elimination based on Newman-Girvan's algorithm. Nonetheless, we also obtained the same results reported in the next section by applying of the previously two-step procedure.

Results

Phylogeny

The obtained dataset was composed of 32 apo E amino acid sequences. Data mining of the NCBI records did not detect apo E sequences of birds and reptiles. The multiple alignments of these sequences, after manual editing, contained 250 residues. The analysis based on the Likelihood Ratio Test indicated that the evolutionary substitution model of amino acid residues that better explains the data is the Jones-Taylor-Thornton one (JTT) (Jones ), with gamma distribution (α = 3.8665). The likelihood mapping result indicated the presence of a phylogenetic signal with 88.8% of quartets located at the vertices and 7.7% in the center-dimensional simplex. The unrooted phylogenetic tree is shown in Figure 1, as a phylogram.

Figure 1

Phylogenetic tree of apo E sequences constructed by the ML method. The bootstrap analysis was performed with 1000 pseudoreplicates. Bootstrap values lower than 70% are not shown. The evolutionary distances were computed using the Jones-Taylor-Thornton (JTT) model.

Complex Network

After the construction of networks at different σ values, it was possible to obtain the dynamics of edges elimination represented by the dendrogram shown in Figure 2. In the horizontal axis, it is possible to identify the σ−values in which the sequences of two or more organisms have lost their connection with the other sequences.

Figure 2

Complex networks analysis dendrogram showing the edge elimination dynamics according to the similarity threshold σ (in %).

These branching events are clearly revealed by the δ-distance graph shown in Figure 3A. It shows that the apo E network family undergoes the most important structural change at σ=30%, switching from a completely connected graph to a modular structure consisting of two communities (C) and an isolated node (Figure 3B). They correspond, respectively, to two communities of organisms in the databases [mammalian (C1) and fish (C2)], while the sequence corresponding to frog remained isolated (Figure 3B).

Figure 3

Complex networks analysis. (A) Graph showing the δ-distance between the networks, emphasizing the indication of the branching event at σ=30%. Smaller peaks related to branching events in primate and fish sub-communities are highlighted in Figures 5A and 6A. (B) Representation of apo E network at σ=30%.

If one considers a σ > 30%, the C1 community can be regarded as an independent network composed of mammals only. If a similar structure identification based on δ-distance is carried out, a new critical threshold value of σ=77% is obtained (Figure 4A). The network obtained with this σ-value separated two communities, one composed of primates (C1.1) and the other of four rodents (C1.2). Prior to this, seven further sequences were separated: T. glis, O. cuniculus, S. scrofa, P. alecto, B. taurus, C. porcellus and Z. californianus (Figure 4B). As C1 revealed a subcommunity with 11 primates (C1.1), an analysis was performed for this particular subcommunity. It was possible to detected a new branching event at σ=86% (Figure 5A). Among the species within C1.1, five sequences (G. gorilla, H. sapiens, P. pygmaeus, P. troglodytes, and H. lar) were the most interconnected ones (Figure 5B).

Figure 4

Complex networks analysis. (A) Graph showing the δ-distance between networks formed by the apo E sequences from mammals with σ=77%. (B) Network representation of apo E mammal sequences at σ=77%.

Figure 5

Complex networks analysis. (A) Graph showing the δ-distance between networks formed by the apo E sequences from primates with σ=86%. (B) Network representation of apo E primate sequences at σ=86%.

The analysis of community C2, which contains apo E sequences from fish, led to the identification of three further significant σ-values, at 46%, 67% and 88%, as shown in Figure 6A. A fish sub-network constructed for σ=46% showed that only the N. guentheri sequence separates from the others, whereas in the network obtained for σ=67%, the O. mykiss apo E sequence was also separated from two other subcommunities: C2.1 (T. rubripes, O. fasciatus and P. flavescens) and C2.2 (Danio rerio, H. mylodon, C. auratus and C. carpio) (Figure 6B). Finally, at σ=88% in C2.2, the D. rerio sequence was separated from the others.

Figure 6

Complex networks analysis. (A) Graph showing the δ-distance between networks formed by the apo E sequences from fishes with maximum δ values at σ= 67. (B) Network representation of apo E fish sequences at σ=67%.

Discussion

This study considered 32 apo E amino acid sequences deposited in the NCBI database. Apo E sequences from birds and reptiles were not found in the search, which is in accordance with previous studies suggesting that apo E does not exist in some groups, and that, probably, other apolipoprotein homologues perform a corresponding function (Lamon-Fava ; Duggan and Callard, 2001). The construction of a complex network family was based on the method proposed by Andrade . It allows a comparison between two networks obtained at close similarity thresholds, using the concept of δ-distance between complex networks (Andrade ). Thus, changes in the network structure can be detected just by drawing a graph of the δ-distance between networks as function of the similarity threshold value of σ. This procedure allows identifying a few critical similarity threshold values of σ, at which an optimal network can be generated with clear-cut modular properties. The phylogenetic tree obtained by ML is not rooted because there was no sequence included that could be used as outgroup. In this tree, a clear separation can be observed between three classes: Mammalia, Amphibia and Actinopterygii (Figure 1). The branch representing the class Mammalia is supported by bootstrap score of 100, whereas the one representing the Actinopterygii shows bootstrap score of 99. The branches having bootstrap scores above 70 can be considered as well supported in phylogenetic trees constructed from amino acid sequence data. Hence, the branches representing Mammalia and Actinopterygii in this study are well supported. These results were corroborated by the use of a complex network method, which indicated a first splitting of the original full connected organism group into two different communities and an isolated node. The first community contained only mammals (C1), the second one was composed of fish (C2), while the amphibian sequence (X. tropicalis) became detached from all other nodes (Figure 3B). Due to the low σ-value necessary to separate these communities (30%) (Figure 3A), it is possible to infer that the conservation of apo E sequences among the different metazoan classes is very low. Moreover, a much higher apo E conservation could be observed within the Mammalia class. This result is in accordance with the phylogenetic tree shown in Figure 1, in which the class Mammalia was supported with a bootstrap value of 100, and in the network shown in Figure 4, in which it was possible only at σ = 77% to observe a significant new separation of the sequences. Other well-supported clusters shown in Figure 1 are the orders Rodentia (M. musculus and R. norvegicus) (bootstrap 99) and Primates (bootstrap 99). The topology of the tree shown in Figure 1 is in accordance with the proposed phylogenetic tree for vertebrates in the Tree of Life Web Project (Janvier, 1997). Within the group of primates a polytomy was found, which encompasses four taxa of the Hominidae family (Figure 1). It was likely not resolved by the ML method due to the high degree of apo E sequence conservation in these primates. This high degree of conservation was also seen in the complex networks analysis (Figure 2). At σ=84%, the last grouped sequences (H. sapiens, P. troglodytes, H. lar, P. pygmaeus and G. gorilla) are split from the other primates. Considering the Hominidae clade, P. pygmaeus separated from the other four sequences at σ =85%, while for Pan troglodytes and Hylobates lar the separation occurred at σ =86%. Finally, H. sapiens and G. gorilla separated from each other at σ =88% (Figure 2). Although the joint separation of taxa may occur at the same threshold, it does not necessarily indicate that these taxa belong to the same clade. It probably occurs due to the high degree of apo E conservation, as pointed out above. In addition, it is interesting to note that H. sapiens and G. gorilla remain connected until σ = 88%, confirming the proximity between Pan, Homo and Gorilla, which is in accordance with the literature (see, e.g., Perelman ). Thus, the complex network method used proved to be reliable and sensitive to small changes in sequence similarity, since it was able to detect the expected communities based only on the apo E sequences from these primates. The network composed only by primate sequences (Figure 5A) for σ=86% supports the conservation of apo E. The Hominidae apoE sequences were placed in the same branch, and they are connected in the network analysis (Figure 5B). Hence, it confirms high protein conservation, and the challenge of obtaining a fully resolved phylogeny, even with assistance of the complex network analyses. Among fish, represented in this study by the class Actinopterygii, the tree is fully resolved and apo E conservation within the orders is once again evident. This is denoted by the well supported branch (bootstrap 100) that represents the family Cyprinidae. For σ=67%, the network analysis performed only with fish apo E sequences indicated the presence of two groups: the first one formed by the family Cyprinidae and the other by two species from the order Perciformes (Oplegnathus fasciatus and Perca flavescens), and one species from the order Tetraodontiformes (Takifugu rubripes). Notably, the results from the two methods used in this study were similar. Given the huge amount of data available, our results also indicated that the development of new approaches, like those based on complex network theory, can be important to support and advance phylogenetic studies, as already suggested by Goes-Neto and Andrade . From an evolutionary point of view it is interesting to note that the similarity threshold between fish, amphibians and mammals is low (30%, Figure 3A). Noting that lipoproteins can play an important role in dietary function, this result was expected and as seen in other studies (Finch and Stanford, 2004; Kasap ; McIntosh ), the diet from different animal groups is very divergent, shaping Apo E differently in these animals. Despite the clear divergence between mammals, fishes and amphibians, our methods were able to detect significant divergence between apo E within these groups, as seen in primates and rodents, for example. Furthermore, our results suggest that this lipoprotein presents an adaptive flexibility with regard to the separation of the vertebrate groups studied here. Thus, from the low similarity percentage needed to separate different organism classes (Amphibia, Chondrichthyes and Mammalia), we infer that apo E was under strong selective pressure in each of the classes, possibly because of dietary divergence. Dietary divergence may have lead to significant changes in apo E so that it was possible to see a clear distance in protein similarity between orders (e.g. Rodentia, Primates).

23 in total

Review 1. Apolipoprotein E: from atherosclerosis to Alzheimer's disease and beyond.

Authors: R W Mahley; Y Huang
Journal: Curr Opin Lipidol Date: 1999-06 Impact factor: 4.776

2. The rapid generation of mutation data matrices from protein sequences.

Authors: D T Jones; W R Taylor; J M Thornton
Journal: Comput Appl Biosci Date: 1992-06

3. Apolipoprotein E phylogeny and evolution.

Authors: Murat Kasap; Ali Sazci; Gurler Akpinar; Emel Ergul
Journal: Cell Biochem Funct Date: 2008 Jan-Feb Impact factor: 3.685

4. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0.

Authors: Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar
Journal: Mol Biol Evol Date: 2013-10-16 Impact factor: 16.240

5. The nucleotide and derived amino acid sequence of human apolipoprotein A-IV mRNA and the close linkage of its gene to the genes of apolipoproteins A-I and C-III.

Authors: N A Elshourbagy; D W Walker; M S Boguski; J I Gordon; J M Taylor
Journal: J Biol Chem Date: 1986-02-15 Impact factor: 5.157

6. Apolipoprotein E participates in the regulation of very low density lipoprotein-triglyceride secretion by the liver.

Authors: A R Mensenkamp; M C Jong; H van Goor; M J van Luyn; V Bloks; R Havinga; P J Voshol; M H Hofker; K W van Dijk; L M Havekes; F Kuipers
Journal: J Biol Chem Date: 1999-12-10 Impact factor: 5.157

7. Comparative protein analysis of the chitin metabolic pathway in extant organisms: a complex network approach.

Authors: Aristóteles Góes-Neto; Marcelo V C Diniz; Leonardo B L Santos; Suani T R Pinho; José G V Miranda; Thierry Petit Lobao; Ernesto P Borges; Charbel Niño El-Hani; Roberto F S Andrade
Journal: Biosystems Date: 2010-04-24 Impact factor: 1.973

Review 8. Meat-adaptive genes and the evolution of slower aging in humans.

Authors: Caleb E Finch; Craig B Stanford
Journal: Q Rev Biol Date: 2004-03 Impact factor: 4.875

9. Glycosylation of human apolipoprotein E. The carbohydrate attachment site is threonine 194.

Authors: M E Wernette-Hammond; S J Lauer; A Corsini; D Walker; J M Taylor; S C Rall
Journal: J Biol Chem Date: 1989-05-25 Impact factor: 5.157

10. Detecting network communities: an application to phylogenetic analysis.

Authors: Roberto F S Andrade; Ivan C Rocha-Neto; Leonardo B L Santos; Charles N de Santana; Marcelo V C Diniz; Thierry Petit Lobão; Aristóteles Goés-Neto; Suani T R Pinho; Charbel N El-Hani
Journal: PLoS Comput Biol Date: 2011-05-05 Impact factor: 4.475

2 in total

Review 1. Apolipoprotein E and Atherosclerosis.

Authors: A D Marais
Journal: Curr Atheroscler Rep Date: 2021-05-10 Impact factor: 5.113

2. Comparison of complex networks and tree-based methods of phylogenetic analysis and proposal of a bootstrap method.

Authors: Aristóteles Góes-Neto; Marcelo V C Diniz; Daniel S Carvalho; Gilberto C Bomfim; Angelo A Duarte; Jerzy A Brzozowski; Thierry C Petit Lobão; Suani T R Pinho; Charbel N El-Hani; Roberto F S Andrade
Journal: PeerJ Date: 2018-02-09 Impact factor: 2.984

2 in total