Liang Jiang, Yiqian Lu, Lin Zheng, Gaopeng Li, Lianchang Chen, Maona Zhang, Jiazuan Ni, Qiong Liu, Yan Zhang.
Abstract
BACKGROUND: Selenium is an essential trace element, and selenocysteine (Sec, U) is its predominant form in vivo. Proteins that contain Sec are called selenoproteins. Their distinguishing structural features include the TGA codon that encodes Sec, the SECIS element in the mRNA, and the conservation of the Sec-flanking region. These features have led to the development of a series of bioinformatics methods to predict and study selenoprotein genes. The evolution and distribution of selenoprotein genes have been reported for prokaryotes and multicellular eukaryotes, but systematic analysis of single-cell eukaryotes, especially algae, has been very limited.
Keywords: Algae; Evolution; Genomics; Selenium; Selenoprotein
Year: 2020 PMID: 33028229 PMCID: PMC7539508 DOI: 10.1186/s12864-020-07101-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Distribution of algal selenoproteomes. Selenoprotein families were predicted from the genomic sequences of 36 algal species. The taxonomic tree of these organisms is shown on the left (based on refs [29, 30]). In the tree, a green branch indicates a high-level selenium-containing organism (>= 20 selenoproteins in a single species), and a red branch indicates a low-level organism (<= 2 selenoproteins in a single species). On the right, the taxonomic classification of the different groups of algae is shown in different colors. The presence or absence of a selenoprotein and/or its homologs in each organism is shown in the pie graphs: green, orange, and gray represent selenoproteins, Cys-containing homologs, and homologs containing other residues, respectively. The sizes of the whole pie and of each sector represent the number of genes in the corresponding groups. The first bar on the right shows the number of selenoprotein families in each alga. Its colors match those in the pie chart matrix (green: Sec, orange: Cys, gray: others). The length of the bar represents the total number of protein families in each species, and a two-color bar (green and orange) indicates that the protein families in that species include both Sec-containing and Cys-containing members. The rightmost blue bar chart shows the number of selenoprotein genes found in the genome of each species
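The branch coloring rule in the Fig. 1 caption is a threshold on the number of selenoproteins per species. The following is a minimal Python sketch of that rule, not the authors' code. The function name `selenium_level` is hypothetical, and the "intermediate" label for counts between the two thresholds is an assumption, since the caption names only the high and low groups.

```python
def selenium_level(n_selenoproteins: int) -> str:
    """Classify a species by its selenoprotein count.

    Thresholds follow the Fig. 1 caption: >= 20 is high-level, <= 2 is low-level.
    The "intermediate" label is an assumption; the caption does not name it.
    """
    if n_selenoproteins < 0:
        raise ValueError("selenoprotein count cannot be negative")
    if n_selenoproteins >= 20:
        return "high"          # green branch in Fig. 1
    if n_selenoproteins <= 2:
        return "low"           # red branch in Fig. 1
    return "intermediate"      # assumed label, uncolored in the caption
```

For example, `selenium_level(25)` falls in the high group and `selenium_level(1)` in the low group.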
Fig. 2 Multiple sequence alignment and phylogenetic analysis of novel selenoproteins. a PDI_e, b AhpC, c SymSEP. The Sec residue is marked with a green background. The sequence numbers, phyla names, and organism names are shown on the left, and the sequences from the NR database are shown with their accession IDs in brackets
Fig. 3 Substitution of Sec with other amino acids in algal selenoproteins. Statistics on the substitution of Sec in members of the selenoprotein families in all algae and related evolutionary phyla, based on 137 algal sequences and the NR database. The size and proportions of each pie chart schematically show the number of genes of each type in each evolutionary phylum. Different colors represent the amino acid found at the position corresponding to Sec, as explained in the legend on the right
Fig. 4 Gene clustering and fusion of algal selenoproteins. a Matrix of gene clusters of algal selenoproteins. A matrix cell composed of two or more colored boxes is a gene cluster. The colored box and the label on top indicate the family of the gene in the cluster. The U or C in the box represents the Sec or Cys form of the gene in the cluster. The species names are labeled on the left. b Genomic synteny of sequences containing the SELENOF-PDI_a gene cluster. c Conserved domain distribution matrix of algal selenoproteins. The abbreviation of each selenoprotein family is labeled at the top. The name and IPR ID of each conserved domain are marked on the left. The number in the colored box next to the domain name indicates how many selenoprotein families contain the domain. A colored box in the matrix indicates that the corresponding domain has been detected in the selenoprotein family labeled at the top. d Gene structure of fusion selenoprotein genes. The ruler on the top shows the genomic location. The arrow on the green box indicates the strand of the gene. The position of the EST matching the genome sequence is shown by the pink box
Fig. 5 Heatmap of algal selenoprotein distribution. The selenoprotein families and organisms were clustered based on the presence of selenoproteins or different types of homologs. The cluster trees are shown on the top and left side of the heatmap. In the organism cluster tree, the green/red branches indicate high-/low-level selenium algae, as in Fig. 1. The colored cells with different shades in the heatmap indicate the presence of the different types of selenoproteins or homologs. The meaning of the colors is shown in the square in the top-left corner. For example, “dark green”, labeled “Sec”, indicates that only the selenoprotein was identified; “light green”, labeled “Sec & Cys”, indicates that both the selenoprotein and a Cys-containing homolog were identified; “gray”, labeled “other”, indicates that only homologs containing neither Sec nor Cys were identified. The taxonomic description of the algae, such as Plantae, SAR group, Diatoms, Red algae, etc., is shown beside the organism names with different color backgrounds. At the bottom, the selenoproteome size, genome size, gene number, and living environment of each organism are shown in order. In the “selenoproteome size” chart, the length of the whole column (composed of green and gray areas) represents the total number of protein families (including selenoproteins and other homologs) in each species. The length of the green bar indicates the number of selenoprotein families. Additionally, the red bar inside the column indicates the number of genomic flanking region duplications found in a specific organism
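The cell labels described in the Fig. 5 caption can be read as a small decision rule over the residues found at the Sec position of a family's homologs in one organism. The sketch below is illustrative only and is not the authors' pipeline. The function name `heatmap_label` is hypothetical, and the "Cys" and "absent" labels are assumptions, because the caption lists only "Sec", "Sec & Cys", and "other" as examples. It is also an assumption that a Sec form found alongside a non-Cys homolog is still labeled "Sec".

```python
from typing import Iterable


def heatmap_label(residues: Iterable[str]) -> str:
    """Label one heatmap cell from single-letter residues at the Sec position.

    `residues` holds one code per homolog found in the organism
    ("U" = Sec, "C" = Cys, anything else = another residue).
    """
    found = {r.upper() for r in residues}
    if not found:
        return "absent"        # assumed label: no homolog detected
    has_sec = "U" in found
    has_cys = "C" in found
    if has_sec and has_cys:
        return "Sec & Cys"     # light green in the caption
    if has_sec:
        return "Sec"           # dark green in the caption
    if has_cys:
        return "Cys"           # assumed label, not listed in the caption
    return "other"             # gray in the caption
```

For example, an organism with both a Sec-containing and a Cys-containing member of a family, `heatmap_label(["U", "C"])`, maps to the "Sec & Cys" cell.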