Andrea Brancaccio1,2, Josephine C Adams3.
Abstract
BACKGROUND: Dystroglycan (DG) is an adhesion receptor complex composed of two non-covalently associated subunits, transcribed from a single gene. The extracellular α-DG is highly and heterogeneously glycosylated and binds with high affinity to laminins, and the transmembrane β-DG binds intracellular dystrophin. Multiple cellular functions have been proposed for DG, although its role in skeletal muscle appears central, as demonstrated by the primary and secondary severe muscular dystrophic phenotypes collectively known as dystroglycanopathies. We recently analysed the molecular phylogeny of the DG core protein and identified the α/β interface, transmembrane and cytoplasmic domains of β-DG as the most conserved region. We also found that the IG2_MAT_NU region has been independently duplicated in multiple lineages.
Keywords: Dystroglycan; Exon–intron junctions; Gene structure; IG domain; Intron expansion; Metazoan
Year: 2017 PMID: 28057052 PMCID: PMC5216574 DOI: 10.1186/s13104-016-2322-x
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Architecture of dystroglycan genes from different metazoan phyla. a The typical organization of the DG gene that is found in most Chordata. This panel also represents the DG gene structure identified in a hemichordate species (S. kowalevskii), the echinoderm S. purpuratus and the cephalopod mollusc O. bimaculoides. b Some species of Teleostei and Cyclostomata (Table 2) also have a paralogous DAG1 gene (designated as DAG1a, [26]) with an additional short intron that interrupts the sequence encoding the S6 domain. c Ciona intestinalis (Urochordata). d Branchiostoma floridae (Cephalochordata). e Typical DG gene organisation of Mollusca (Bivalvia and Gastropoda) and Annelida. f Drosophila melanogaster (Arthropoda). g Caenorhabditis elegans (Nematoda). h Hydra magnipapillata (Cnidaria). i Trichoplax adhaerens (Placozoa). j Amphimedon queenslandica (Porifera). In all diagrams, pre-coding exons are in grey and coding exons are in white or coloured to represent the encoded protein domains. IG domains of Ciona DG show a lower degree of homology. Alternatively spliced exons in Drosophila are boxed by dashed lines. Introns are indicated by black lines. Boundaries of the IG1-intron are indicated by two vertical dashed lines. Key: IG1 and IG2, immunoglobulin-like domains; S6, S6-like domain; MAT, C-terminal region of α-dystroglycan upstream of the Gly-Ser maturation site; a/b, α/β DG proteolysis site; NU, natively unfolded region that forms the N-terminal region of the ectodomain of β-dystroglycan; TM, transmembrane; Cyto, cytoplasmic domain; DBS, dystrophin-binding site. Not to scale. Underlying data and accession codes are given in Tables 1 and 2
Table 2 Details of the gene structures of the paralogous form of DAG1 (DAG1a) present in species of Teleostei and Cyclostomata
| Species | Exon1 (Pre-ATG exon) (bp) | Intron1 (pre-ATG intron) (bp) | Exon2 (ATG exon) (bp) | Intron2 (IG1-intron) (bp) | Exon3 (bp) | Intron3 (S6 mini-intron) (bp) | Exon4 (includes stop codon) (bp) | Gene accession code | Code name & notes |
|---|---|---|---|---|---|---|---|---|---|
| Chordata (teleostei) | |||||||||
| | b | b | b | b | 480 | 384 | 1824a | ENSGMOG00000003333 | >Gm2 |
| | b | b | 210 | 9155 | 459 | 726 | 1806a | ENSXMAG00000012250 | >Xm2 |
| | b | b | 336 | 2795 | 403 | 137 | 1827a | ENSTRUG00000002580 | >Tr2 1 add. 5′ intron (?) |
| Chordata (cyclostomata) | |||||||||
| | b | b | 159 | 2684 | 975 | 126c | 1356a | ENSPMAG00000009628 | >Pm2 |
a The genome-predicted sequence ends at the stop codon
b Not present or not annotated (see Fig. 1b for schematic details)
c A mini-intron is not present in S6 but within the mucin-like region
Table 1 Summary of DG gene structures from species representative of all the animal groups in which a DG gene was identified
| Metazoan main taxa | Exon1 (preATG exon) (bp) | Intron1 (preATG intron) (bp) | Exon2 (ATG exon) (bp) | Intron2 (IG1-intron) (bp) | Exon3 (stop codon & 3′UTR) (bp) | Gene accession code | Code name | Notes |
|---|---|---|---|---|---|---|---|---|
| Chordata (mammals) | ||||||||
| | 302 | 39,985 | 401 | 19,977 | 4819 | ENSG00000173402 | >Hs | |
| | 139 | 45,369 | 398 | 8351 | 3707 | ENSMUSG00000039952 | >Mm | |
| | 375d | 48,846 | 383 | 16,520 | 4968 | ENSCAFG00000011207 | >Clf | 1 add. preATG exon–intr. |
| | 161 | 58,270 | 398 | 30,785 | 5789 | ENSOANG00000013307 | >Oa | |
| | 203 | 62,147 | 400 | 22,513 | 2403a | ENSDNOG00000016917 | >Dn | |
| Chordata (aves) | ||||||||
| | h | h | 317 | 8464 | 2409 | ENSGALG00000027710 | >Gg | |
| | 175 | 34,691 | 410 | 12,789 | 2740 | ENSFALG00000009006 | >Fa | |
| Chordata (reptiles) | ||||||||
| | 392 | 56,285 | 403 | 26,079 | 7196 | ENSACAG00000007264 | >Ac | |
| | h | h | 380 | 25,784 | 6793 | ENSPSIG00000016851 | >Ps | |
| Chordata (amphibia) | ||||||||
| | 125 | 19,428 | 354 | 8686 | 2917 | ENSXETG00000005928 | >Xt | |
| Chordata (teleostei) | ||||||||
| | 169 | 39,769 | 528 | 14,466 | 2899 | ENSDARG00000016153 | >Dr | |
| | h | h | 345 | 1126 | 2310a | ENSGMOG00000017787 | >Gm1 | |
| | 595 | 8826 | 566 | 2800 | 3850 | ENSXMAG00000015708 | >Xm1 | |
| | h | h | 366 | 818 | 2307a | ENSTRUG00000007345 | >Tr1 | |
| Chordata (chondrichthyes) | ||||||||
| | h | h | 276b | 11,246 | 2373a | KI635869.1 (Elephant Shark Genome Project@IMCB) | >Cmi | |
| Chordata (cyclostomata) | ||||||||
| | 172 | 24,308 | 399 | 1218 | 2218a | ENSPMAG00000000367 | >Pm1 | |
| Urochordata | ||||||||
| | h | h | 348 | 6249 | 3703 | gene 293437 (Metazome) | >Ci | 4 add. 3′ introns |
| Cephalochordata | ||||||||
| | h | h | 708b | 10,023 | 1653a | fgenesh2_pg.scaffold_27000085 (Metazome) | >Bf | 2 add. large 5′ introns |
| Hemichordata | ||||||||
| | 161 | 5069 | 309 | 7194 | 5506 | Sakowv30014893 m.g (Metazome) | >Sk | |
| Echinodermata | ||||||||
| | h | h | 300 | 8426 | 2373a | LOC581503 (Metazome) | >Spu | |
| Arthropoda (insecta) | ||||||||
| | 437 | 8235 | 322 | 284e | 5774 | FBgn0034072 (EnsemblMetazoa) | >Dm | 12 add. 3′ introns |
| | h | h | 192b | 46 | 3153a | LOC663372 (Metazome) | >Tc | 4 add. 3′ introns |
| Arthropoda (crustacea) | ||||||||
| | h | h | 723 | 341 | 2956 | DAPPUDRAFT_300674 (EnsemblMetazoa) | >Dpu | 3 add. 3′ introns |
| Arthropoda (chelicerata) | ||||||||
| | h | h | 111b | 7852 | 2397a (2796)a,f | ISCW015049 (EnsemblMetazoa) | >Is | 2 add. 3′ intronsg |
| Mollusca (cephalopoda) | ||||||||
| | 561 | 14,288 | 1383 | 8418 | 2310a | Ocbimv22032669 m.g (Metazome) | >Ob | |
| Mollusca (bivalvia) | ||||||||
| | h | h | 231 | 3133 | 2286a | CGI_10020032 (EnsemblMetazoa) | >Cg | |
| Mollusca (gastropoda) | ||||||||
| | h | h | 137c | 3043 | 2185a | LgGsHFWreduced.7288 (Metazome) | >Lg | |
| Annelida (sedentaria) | ||||||||
| | h | h | 210 | 48 | 3408 | CapteG183589 (EnsemblMetazoa) | >Ct | |
| Annelida (clitellata) | ||||||||
| | h | h | 186 | 276 | 2602 | HelroG188507 (EnsemblMetazoa) | >Hr | |
| Nematoda (chromadorea) | ||||||||
| | 89 | 5376 | 229 | 328 | 2236 | WBGene00000961 (Metazome) | >Ce | 3 add. 3′ introns |
| Nematoda (secernentea) | ||||||||
| | h | h | 216 | 319 | 1653 | CRE07443 (EnsemblMetazoa) | >Cr | 3 add. 3′ introns |
| Cnidaria (hydrozoa) | ||||||||
| | h | h | 210b | 1751 | 2189 | Hydra_232607 (Metazome) | >Hm | 1 add. 3′ S6 intron |
| Cnidaria (anthozoa) | ||||||||
| | h | h | 162b | 2567 | 5553a | estExt_fgenesh1_pg.C_1310045 (JGI) | >Nv | 1 add. 3′ S6 intron |
| Placozoa | ||||||||
| | h | h | 114b | 320 | 2559a | TriadG60041 (EnsemblMetazoa) | >Ta | 1 add. 5′ and 1 add. 3′ introns |
| Porifera (demospongiae) | ||||||||
| | h | h | 170b | 47 | 4254a | Aqu1.217766 (EnsemblMetazoa) | >Aq | |
For species in which additional introns are present, either upstream (5′) or downstream (3′) of the IG1-intron (intron2), these introns are reported in the Notes column. In these species, the value reported for E1 and/or E3 refers to the combined size originating from all the resulting exons
a The genome-annotated sequence ends at the stop codon
b Additional nucleotides 5′ to the initial ATG codon may be missing
c The annotated gene sequence starts slightly downstream of the ATG codon
d An additional pre-ATG exon is reported >100 kb upstream
e Due to divergence, D. melanogaster DG lacks an IG1 domain; however, the IG1-intron is located in a similar 5′ position to that in other species that contain the IG1 domain
f A recent study has demonstrated that a gene region previously considered to be intronic is an exon, giving rise to a predicted protein product of 968 aa instead of 835 aa [25]
g The first additional intron is also present within the IG1 domain
h Not present or not annotated. See Fig. 1 for schematic details
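The size columns of Tables 1 and 2 mix plain numbers, thousands separators, footnote letters and placeholders, so they need a small amount of cleaning before any numerical use. The following is a minimal sketch of one way to do that in Python; the helper names `parse_bp` and `gene_span` are hypothetical and are not part of the original study. Summing the five size columns only approximates the annotated gene span, and only for rows with no additional introns, because the sizes of additional introns are not reported in the table.

```python
import re


def parse_bp(cell):
    """Return the size in bp from a table cell, or None if no number is present.

    Handles thousands separators ("39,985") and trailing footnote letters
    ("2403a"). Placeholder cells such as "h" (not present or not annotated)
    yield None. For cells with a parenthesised alternative, such as
    "2397a (2796)a,f", only the first number is used.
    """
    match = re.match(r"\s*(\d[\d,]*)", cell)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def gene_span(cells):
    """Sum exon and intron sizes for one row; None if any value is missing.

    Assumption: this is only an approximation of the gene span, valid for
    rows that list no additional introns in the Notes column.
    """
    sizes = [parse_bp(c) for c in cells]
    if any(s is None for s in sizes):
        return None
    return sum(sizes)


# Example with the first mammalian row of Table 1 (code name Hs)
hs_row = ["302", "39,985", "401", "19,977", "4819"]
hs_span = gene_span(hs_row)
```

Rows whose pre-ATG exon and intron are marked h return `None` from `gene_span`, which keeps incomplete annotations out of any downstream calculation rather than silently treating them as zero.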
Fig. 2 Expansion of the IG1-intron. a IG1-intron size as a function of genome size. b IG1-intron size as a function of DG gene size. The plots include data from 35 species representative of the metazoan phyla that encode DG. The fitted lines in the semi-logarithmic plots in panels a and b were obtained using a linear equation; the corresponding R² values are 0.68 and 0.75, respectively
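A linear fit on a semi-logarithmic plot, as described for Fig. 2, can be sketched as an ordinary least-squares fit after log-transforming one variable. The caption does not state which axis is logarithmic or which software was used, so the sketch below assumes the log transform is applied to the dependent variable; the function name `semilog_fit` is hypothetical, and the example data are synthetic rather than taken from the study.

```python
import math


def semilog_fit(x, y):
    """Least-squares fit of log10(y) = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination computed on the log-transformed values.
    Assumption: the log transform is applied to y; swap the arguments or
    transform x beforehand if the other axis is the logarithmic one.
    """
    log_y = [math.log10(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_ly) for a, b in zip(x, log_y))
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, log_y))
    ss_tot = sum((b - mean_ly) ** 2 for b in log_y)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


# Synthetic data that are exactly linear in log10(y)
slope, intercept, r2 = semilog_fit([0, 1, 2, 3], [10, 100, 1000, 10000])
```

The function is written with the standard library only, so it can be checked by hand; it is not expected to reproduce the reported R² values unless it is given the same 35-species data set and the same choice of axes.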
Fig. 3 Sequence features of the IG1-intron at the nucleotide and protein levels. a, c MUSCLE alignment of 50 nucleotides that span the AGGT exon–intron (a), or the intron–exon (c), boundaries of the IG1-intron. Data are from 23 species representative of the metazoan phyla that encode DG. b, d Multiple sequence alignments prepared in MUSCLE 3.8 of 15-aa-long regions from the IG1 domain that flank the exon–intron insertion site (b), or the intron–exon site (d) in the same species. The region shown in b includes aa 81–95 of human DG; the region shown in d includes aa 96–110 of human DG. e The secondary structural elements of the IG1 domain that encompass the intron insertion site (95KV96 in human DG, underlined) [11] demonstrate that the intronic sequence is not in register with the structural organization of the domain. f MUSCLE sequence alignment of the IG1-intron sequences from A. queenslandica and C. teleta. In all alignments, a black background indicates identical nucleotides or residues in >50% of the sequences, a grey background indicates conservative substitutions, and a white background indicates that the position is conserved in <50% of the sequences. Code names are as in Table 1 with the exception of Oc for Oryctolagus cuniculus and Bt for Bos taurus in (e)
Fig. 4 Model of DG gene evolution. The diagram does not include ctenophores because their evolutionary placement is uncertain and no DG-encoding sequences have been identified in them. See text for discussion