Pedro Seoane-Zonjic, Rafael A Cañas, Rocío Bautista, Josefa Gómez-Maldonado, Isabel Arrillaga, Noé Fernández-Pozo, M Gonzalo Claros, Francisco M Cánovas, Concepción Ávila.
Abstract
BACKGROUND: In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution.
Year: 2016 PMID: 26922242 PMCID: PMC4769843 DOI: 10.1186/s12864-016-2490-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Structure of maritime pine genomic DNA contained in the BAC clones GenBank: KP172187 (a), GenBank: KP172184 (b) and GenBank: KP172195 (c). Light red boxes represent exons, and the intervening lines represent introns. The length in base pairs of each intron and exon is also indicated. Segments with similarity to transposable elements and repetitive regions were identified with RepeatMasker and are represented by white and dark red boxes. The scale bar represents 5 kbp.
Exon length comparisons among complete AS genes from P. pinaster and AS from two angiosperm plants
| Exon length (nt) | | ASN_BAC | | | | |
|---|---|---|---|---|---|---|
| E1 | 80 | 80 | 80 | 80 | 80 | 80 |
| E2 | 139 | 139 | 139 | 139 | 139 | 139 |
| E3 | 96 | 96 | 100 | 96 | 96 | 96 |
| E4 | 142 | 142 | 135 | 333 | 142 | 142 |
| E5 | 96 | 96 | 96 | 191 | 96 | |
| E6 | 98 | 98 | 98 | 95 | ||
| E7 | 162 | 162 | 162 | 162 | 162 | 162 |
| E8 | 81 | 81 | 84 | 303 | 81 | 81 |
| E9 | 222 | 222 | 222 | 222 | 222 | |
| E10 | 135 | 135 | 129 | 135 | 135 | 135 |
| E11 | 81/84* | 81 | 81 | 168 | 81 | 81 |
| E12 | 87 | 87 | 87 | 87 | 87 | |
| E13 | 108 | 108 | 108 | 108 | 108 | 108 |
| E14 | 252/243/240 | 252 | 78 | 231 | 246 | 231 |
*Exon from AS5. The gene capture model is also included.
Intron length comparisons among complete AS genes from P. pinaster and AS from two angiosperm plants
| Intron length (nt) | | | | ASN_BAC | | | | |
|---|---|---|---|---|---|---|---|---|
| I1 | 225 | 213 | 124 | 225 | 260 | 422 | 509 | 78 |
| I2 | 101 | 88 | 112 | 101 | 186 | 88 | 156 | 227 |
| I3 | 426 | 136 | 422* | 426 | 192 | 82 | 152 | 369 |
| I4 | 119 | 133 | 126 | 119 | 102 | 78 | 94 | 263 |
| I5 | 88 | 112 | 117 | 88 | 184 | 115 | 520 | |
| I6 | 132 | 213 | 139 | 132 | 94 | 721 | ||
| I7 | 219 | 261 | 221 | 219 | 85 | 136 | 89 | 533 |
| I8 | 104 | 104 | 103 | 104 | 84 | 80 | 93 | 81 |
| I9 | 87 | 87 | 87 | 87 | 83 | 96 | 136 | |
| I10 | 92 | 89 | 99 | 92 | 209 | 83 | 120 | 453 |
| I11 | 139 | 100 | 109 | 139 | 85 | 91 | 85 | 110 |
| I12 | 89 | 104 | 102 | 89 | 91 | 91 | 236 | |
| I13 | 95 | 95 | 90 | 95 | 96 | 93 | 100 | 152 |
*Incomplete intron. The gene capture model is also included.
Content and type of repeats present in SuSy and AS1 BACs from P. glauca and P. pinaster
| | SuSy | SuSy | AS | AS |
|---|---|---|---|---|
| GenBank: | KC860252 | KP172192 | KC860234 | KP172187 |
| Total length: | 137047 bp | 59397 bp | 130154 bp | 46111 bp |
| GC level: | 38.66 % | 39.03 % | 38.17 % | 34.80 % |
| Number of elements/percentage of sequence | ||||
| Bases masked: | 10259 bp (7.49 %) | 5173 bp (8.71 %) | 7548 bp (5.80 %) | 595 bp (1.29 %) |
| Retroelements | 7 (6.78 %) | 7 (7.52 %) | 9 (4.65 %) | 0 |
| LINEs: | 0 | 1 (0.70 %) | 2 (0.16 %) | 0 |
| L1/CIN4 | 0 | 1 (0.70 %) | 2 (0.16 %) | 0 |
| LTR elements: | 7 (6.78 %) | 6 (6.82 %) | 7 (4.49 %) | 0 |
| Ty1/Copia | 1 (1.09 %) | 1 (0.24 %) | 1 (0.55 %) | 0 |
| Gypsy/DIRS1 | 6 (5.60 %) | 5 (6.58 %) | 6 (3.94 %) | 0 |
| DNA transposons | 0 | 0 | 2 (0.10 %) | 0 |
| Simple repeats: | 18 (0.67 %) | 17 (0.94 %) | 18 (0.69 %) | 11 (0.98 %) |
| Low complexity: | 1 (0.03 %) | 3 (0.25 %) | 7 (0.36 %) | 2 (0.31 %) |
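The percentages in the "Bases masked" row are consistent with the masked length divided by the total BAC length. As a quick check, the following minimal sketch recomputes them from the values in the table; the dictionary layout and the helper name `percent_masked` are illustrative assumptions and are not part of the original analysis.

```python
# Total length and masked bases (bp) per GenBank accession, copied from the table above.
# The data structure and function name are hypothetical, chosen only for illustration.
BACS = {
    "KC860252": (137047, 10259),
    "KP172192": (59397, 5173),
    "KC860234": (130154, 7548),
    "KP172187": (46111, 595),
}


def percent_masked(total_bp: int, masked_bp: int) -> float:
    """Return masked bases as a percentage of total length, rounded to 2 decimals."""
    return round(100.0 * masked_bp / total_bp, 2)


if __name__ == "__main__":
    for accession, (total_bp, masked_bp) in BACS.items():
        print(f"{accession}: {percent_masked(total_bp, masked_bp):.2f} %")
```

The recomputed values agree with the reported 7.49 %, 8.71 %, 5.80 % and 1.29 %.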
Fig. 2 Flow chart for the Gene Assembler pipeline. (a) Gene assignment and contig filtering. (b) Contig clustering by gene. (c) Gene model building. See text for details.
Fig. 3 Distribution of individual lengths of introns smaller than 2600 nt using the 866 reconstructed gene models in P. pinaster and their orthologs from P. patens, A. thaliana, O. sativa and P. trichocarpa. The box plot includes the median values as well as outlier lengths.
Fig. 4 Exon/intron gene model for the AS family in maritime pine. The red boxes are exons and the black lines introns. The corresponding size in nucleotides is indicated. The asterisks indicate introns with incomplete sequences.
Fig. 5 Frequency distribution of contigs containing the 5′ upstream region of the generated gene models. The recovered sequence length is expressed in nucleotides. The distribution was built from contigs that contain the first exon, include amino acid 10 or an earlier residue, and extend at least 100 nt upstream of the start of the first exon.
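The selection rule in the Fig. 5 caption can be read as a two-condition filter on contigs. The sketch below is one possible reading, not the authors' implementation: the `Contig` record, its field names, and the interpretation of "amino acid 10 or previous is present" as "the first aligned residue is at position 10 or earlier" are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_FIRST_AA = 10       # amino acid 10 or an earlier residue must be present
MIN_UPSTREAM_NT = 100   # at least 100 nt must precede the first exon


@dataclass
class Contig:
    """Hypothetical record for a contig that contains the first exon of a gene model."""
    name: str
    first_aa_covered: int  # 1-based protein position of the first aligned residue
    exon1_start: int       # 0-based offset of the first exon within the contig


def upstream_length(contig: Contig) -> int:
    """Nucleotides that lie upstream of the first exon (equals the 0-based offset)."""
    return contig.exon1_start


def keep_for_distribution(contig: Contig) -> bool:
    """Apply both caption criteria: early residue present and enough upstream sequence."""
    return (contig.first_aa_covered <= MAX_FIRST_AA
            and upstream_length(contig) >= MIN_UPSTREAM_NT)


if __name__ == "__main__":
    # Made-up example contigs, used only to exercise the filter.
    contigs = [
        Contig("c1", first_aa_covered=1, exon1_start=850),
        Contig("c2", first_aa_covered=25, exon1_start=1200),  # starts too late in the protein
        Contig("c3", first_aa_covered=8, exon1_start=40),     # too little upstream sequence
    ]
    kept = [c for c in contigs if keep_for_distribution(c)]
    print([(c.name, upstream_length(c)) for c in kept])
```

The upstream lengths of the retained contigs are the values that would feed a frequency distribution like the one in the figure.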
Fig. 6 Representation of gene intron size: (a) AS genes in P. pinaster, A. thaliana and P. trichocarpa; (b) SuSy gene from P. pinaster, A. thaliana and P. trichocarpa. The X axis shows the position of the introns and the Y axis shows their length in nucleotides.
Fig. 7 Phylogenetic tree of the deduced protein sequences of plant genes encoding asparagine synthetase (AS). The optimal tree with the sum of branch length = 2.28234152 is shown. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The following representative members of the asparagine synthetase (AS) family are included in the tree: Arabidopsis thaliana [AthASN1, Phytozome:AT3G47340; AthASN2, Phytozome:AT5G65010; AthASN3, Phytozome:AT5G10240], Medicago truncatula [Phytozome:Medtr5g071360; Phytozome:Medtr3g087220], Oryza sativa [Phytozome:LOC_Os06g15420; Phytozome:LOC_Os03g18130], Physcomitrella patens [Phytozome:Pp1s410_47V6; Phytozome:Pp1s350_23V6; Phytozome:Pp1s44_242V6], Pinus pinaster [AS1, GenBank:ADU02856; AS2, GenBank:ADK13052; AS3, PGC:geneCapture_all_rep_c7631; AS4, SPDB:sp_v3.0_unigene97582/sp_v3.0_unigene8248; AS5, PGC:geneCapture_all_rep_c8956/geneCapture_all_rep_c1052], Pinus taeda [PtAS1, Congenie:lcl|scaffold622225; PtAS2, Congenie:PgdbPtadea_48226; PtAS3, Congenie:lcl|tscaffold2448; PtAS4, Congenie:lcl|tscaffold2448; PtAS5, Congenie:lcl|scaffold870050.1], Populus trichocarpa [Phytozome:Potri.005G075700; Phytozome:Potri.009G072900; Phytozome:Potri.001G278400], Sorghum bicolor [Phytozome:Sobic.005G003200; Phytozome:Sobic.001G406800; Phytozome:Sobic.010G110000], Solanum lycopersicum [Phytozome:Solyc06g007180.2; Phytozome:Solyc04g055210.2.1- Phytozome:Solyc04g055200.2], Vitis vinifera [Phytozome:GSVIVG01024713001; Phytozome:VITISV_034450], Zea mays [ZmASN1, Phytozome:GRMZM2G074589; ZmASN2, Phytozome:GRMZM2G093175; ZmASN3, Phytozome:GRMZM2G053669; ZmASN4, Phytozome:GRMZM2G078472].