Mui-Keng Tan, Damian Collins, Zhiliang Chen, Anna Englezou, Marc R Wilkins.
Abstract
Using de novo assembly of 46 million paired-end sequence reads of length 250 bp for a myrtle rust isolate, we have estimated its genome size to be between 103 and 145 Mb and the number of proteins as >19,000. Annotation of the contigs found that a very large percentage of proteins are associated with molecular functions of DNA binding or binding, in biological processes for DNA integration and RNA-dependent DNA replication. A large proportion of these activities are attributed to the transposable elements (TEs). These elements are estimated to comprise 27% of the genome, with 22% retrotransposons and 5% DNA transposons. The exon and intron boundaries of 46 genes occurring on contigs >20,000 bp have been determined. The number of introns ranges from 2 to 20 with a mean of 7. Phylogenetic analyses using partial COXI, 18S rRNA and 28S rRNA genes have placed myrtle rust in the Pucciniaceae lineage on a separate taxonomic branch from the families of Pucciniaceae, Phragmidiaceae, Sphaerophragmiaceae, Uropyxidaceae, Chaconiaceae and Phakopsoraceae. Further work is thus required to determine the family placement of myrtle rust in the Pucciniaceae of Pucciniales.
Keywords: DNA repeats; Melampsora; Phakopsora; Puccinia; phylogeny; transposons
Year: 2014 PMID: 24999437 PMCID: PMC4066913 DOI: 10.1080/21501203.2014.919967
Source DB: PubMed Journal: Mycology ISSN: 2150-1203
A comparison of genome assembly results for myrtle rust using different de novo assemblers.
| Contig set | Total # of contigs | N50 (bp) | Max contig (bp) | Total length (bp) |
|---|---|---|---|---|
| all | 4,963,090 | 439 | 3099 | 2,147,976,637 |
| | 116,914 | 859 | | 103,828,908 |
| all | 1,321,742 | 493 | 8631 | 573,387,137 |
| | 135,453 | 976 | | 137,671,501 |
| all | 759,095 | 558 | 25,668 | 362,000,000 |
| | 99,806 | 1522 | | 145,365,606 |
| all | 766,280 | 567 | 6094 | 387,909,377 |
| | 109,692 | 949 | | 108,060,557 |
| all | 148,139 | 3165 | 47,187 | 387,958,276 |
| | 148,138 | 3165 | | 387,956,878 |
| | 37,684 | 5535 | | 203,507,520 |
| | 23,106 | 7929 | | 153,506,636 |
Notes: 1, overlap length; l, contig length (bp).
An N50 contig size of N means that 50% of the assembled bases are contained in contigs of length N or larger. N50 sizes are often used as a measure of assembly quality because they capture how much of the genome is covered by relatively large contigs.
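The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration and not the calculation used by any of the assemblers in the table; the function names `n50` and `n50_filtered` and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the length N such that contigs of length >= N
    contain at least 50% of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def n50_filtered(contig_lengths, min_length):
    """N50 after discarding contigs shorter than min_length (bp)."""
    return n50([l for l in contig_lengths if l >= min_length])


# Toy contig lengths (hypothetical, not taken from the paper).
toy_contigs = [100, 200, 300, 400, 500, 3000, 4000]

n50_all = n50(toy_contigs)                  # all contigs
n50_long = n50_filtered(toy_contigs, 3000)  # only contigs >= 3000 bp
```

With the toy data, dropping the short contigs raises N50, which is the same direction of change seen between the "all" rows and the filtered rows of the table.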
Protein composition in the myrtle rust genome.
| Proteins | # of contigs (l ≥ 3000 bp) | % (l ≥ 3000 bp) | # of contigs (l ≥ 4000 bp) | % (l ≥ 4000 bp) |
|---|---|---|---|---|
| Total # of contigs | 37,605 | | 23,106 | |
| No blastx hits | 17,714 | 47.0 | 9150 | 39.6 |
| Mapped proteins | 6800 | 18.1 | 4932 | 21.3 |
| Blast2GO annotated proteins | 2907 | 7.7 | 2032 | 8.8 |
| Mitochondrial contigs | 79 | 0.21 | 65 | 0.28 |
| Hypothetical proteins | 10,105 | 26.9 | 6927 | 30 |
| - Uncharacterized protein | 21 | 0.1 | 16 | 0.1 |
| - hypothetical protein | 8803 | 23.4 | 5994 | 25.9 |
| - hypothetical protein | 704 | 1.9 | 496 | 2.1 |
| - | 19 | 0.1 | 17 | 0.1 |
| - hypothetical protein | 30 | 0.1 | 20 | 0.1 |
| - | 8 | 0 | 7 | 0 |
Notes: Rows prefixed with "-" give the breakdown of hypothetical proteins.
PGTG_XXXXX: hypothetical protein [Puccinia graminis f. sp. tritici CRL 75-36-700-3].
MELLADRAFT_XXXXX: hypothetical protein [Melampsora larici-populina 98AG31].
TREMEDRAFT_XXXXX: hypothetical protein [Tremella mesenterica DSM 1558].
MPER_XXXXX: hypothetical protein [Moniliophthora perniciosa FA553].
Figure 1. Direct counts of sub-lists of GO terms for cellular components (A), biological process (B) and molecular function (C) in the contig set, l ≥ 3000 bp, of the myrtle rust genome.
Percentages of different classes of TEs in contigs, l ≥ 3000 bp and l ≥ 4000 bp from CLC-Bio Genomics Workbench.
| TEs | # contigs (l ≥ 3000 bp) | % | Total length (nucleotides) | % | # contigs (l ≥ 4000 bp) | % | Total length (nucleotides) | % |
|---|---|---|---|---|---|---|---|---|
| Class I: | | | | | | | | |
| LTR retro-transposons | 7995 | 21.2 | 43,373,367 | 21.4 | 4930 | 21.4 | 32,851,702 | 21.6 |
| Class II: | | | | | | | | |
| DNA transposons | 1626 | 4.3 | 10,648,124 | 5.3 | 1153 | 5.1 | 9,019,956 | 5.8 |
| -mutator | 311 | 19.1 | 2,040,658 | 19.1 | 223 | 19.3 | 1,739,036 | 19.3 |
| -hAT | 56 | 3.4 | 417,836 | 3.9 | 42 | 3.6 | 369,208 | 4.1 |
| -Tc1 | 29 | 1.8 | 161,062 | 1.5 | 20 | 1.7 | 129,046 | 1.4 |
Note: Rows prefixed with "-" give the breakdown of DNA transposons; the percentages in these rows are relative to the DNA transposons row.
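The percentages in the breakdown rows of the table appear to be shares of the DNA transposon class rather than of the whole contig set. The short check below reproduces two of them from the contig counts in the l ≥ 3000 bp columns. It is a sketch for reading the table, and the helper name `share_of_parent` is hypothetical.

```python
def share_of_parent(child_count, parent_count):
    """Percentage of a parent class accounted for by one sub-family,
    rounded to one decimal place as in the table."""
    if parent_count <= 0:
        raise ValueError("parent_count must be positive")
    return round(100 * child_count / parent_count, 1)


# Contig counts copied from the table (contig set l >= 3000 bp).
dna_transposon_contigs = 1626
subfamily_contigs = {"mutator": 311, "hAT": 56}

shares = {
    name: share_of_parent(count, dna_transposon_contigs)
    for name, count in subfamily_contigs.items()
}
```

The resulting values agree with the percentages printed next to the mutator and hAT contig counts.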
Figure 2. Distribution of coverage (log10 scale) of all contigs from the CLC-Bio Genomics Workbench assembled data.
Distribution of mean, median, minimum and maximum coverages of TEs and hypothetical protein in the contig set, l ≥ 3000 bp and l ≥ 4000 bp.
| Proteins | Mean (l ≥ 3000 bp) | Median (l ≥ 3000 bp) | Min (l ≥ 3000 bp) | Max (l ≥ 3000 bp) | Mean (l ≥ 4000 bp) | Median (l ≥ 4000 bp) | Min (l ≥ 4000 bp) | Max (l ≥ 4000 bp) |
|---|---|---|---|---|---|---|---|---|
| Retrotransposable elements | 21.5 | 11.55 | 4.22 | 11,286.87 | 21.4 | 11.59 | 4.23 | 11,286.87 |
| Copia polyprotein | 16 | 11.2 | 4.91 | 796.6 | 14.5 | 11.43 | 5.02 | 412.22 |
| Gypsy retrotransposon | 22.5 | 12.03 | 4.62 | 944.55 | 18.3 | 12.00 | 4.83 | 538.69 |
| DNA transposons | 26.4 | 11.315 | 4.61 | 11,188.22 | 13.8 | 11.6 | 5.23 | 462.26 |
| HAT | 12.7 | 11.57 | 6.7 | 40.34 | 12.2 | 11.84 | 6.7 | 24.57 |
| DDE | 12.4 | 10.61 | 5.47 | 113.58 | 13.1 | 11.34 | 5.8 | 113.58 |
| Mutator | 16.3 | 11.32 | 4.93 | 583.75 | 13.3 | 11.57 | 5.54 | 100.35 |
| Tc1 | 11.6 | 11.33 | 5.76 | 18.13 | 11.5 | 11.22 | 5.76 | 18.13 |
| RNase H | 18.7 | 11.5 | 4.44 | 896.15 | 15.4 | 11.82 | 5.99 | 255.31 |
| Hypothetical protein | 16.7 | 10.91 | 3.85 | 5704.75 | 16 | 11.05 | 4.33 | 5704.75 |
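In several rows of the table the mean coverage is well above the median while the maximum is very large, which is the pattern expected when a few contigs have extreme coverage. The sketch below shows how the four summary statistics could be computed for one class of contigs. The function name `coverage_summary` and the coverage values are hypothetical and are not data from the paper.

```python
from statistics import mean, median


def coverage_summary(coverages):
    """Mean, median, minimum and maximum of per-contig coverage values."""
    values = list(coverages)
    if not values:
        raise ValueError("no coverage values given")
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Toy per-contig coverages (hypothetical): one high-coverage outlier.
toy_coverages = [10.0, 11.0, 12.0, 13.0, 1000.0]
summary = coverage_summary(toy_coverages)
```

In the toy data a single outlier pulls the mean far above the median, while the median stays close to the typical contig.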
Annotated genes of the myrtle rust pathogen with their corresponding GenBank accession numbers.
| Gene | # exons | Contig length | GenBank accession # |
|---|---|---|---|
| AarF; Pkc_like | 5 | 22,551 | KF431993 |
| AdoMet Mtases | 6 | 22,485 | KF431980 |
| ATP12 | 7 | 22,485 | KF431980 |
| ATP_sub_h | 2 | 20,659 | KF431974 |
| C2_RasGAP | 3 | 33,008 | KF431975 |
| Clathrin-associated protein | 9 | 23,635 | KF431979 |
| Cpn60_TCP1 | 8 | 20,034 | KF431976 |
| DNA repair protein (rad1) | 14 | 32,354 | KF431977 |
| DnaJ | 7 | 21,047 | KF431981 |
| DSPc | 6 | 20,930 | KF431982 |
| Eukaryotic translation initiation factor 2 subunit | 7 | 26,736 | KF431983 |
| Farnesyl-diphosphate farnesyltransferase | 8 | 24,879 | KF431984 |
| FAT; TRRAP; PI3Kc | 14 | 31,264 | KF431988 |
| GITSHD | 10 | 21,927 | KF431985 |
| Glyco hydro 2 C | 12 | 28,805 | KF431986 |
| Glyco transf 25 | 2 | 22,428 | KF431989 |
| Glycoside hydrolase family 92 protein | 20 | 20,606 | KF431987 |
| Heterokaryon incompatibility protein Het-C | 17 | 26,736 | KF431990 |
| IbpA ACD LpsHSP like | 3 | 33,590 | KF431991 |
| MBOAT2 | 3 | 21,660 | KF431992 |
| nadF | 5 | 21,393 | KF431994 |
| Patatins and Phospholipases | 6 | 31,522 | KF431995 |
| Pectate lyase 3 | 2 | 22,483 | KF431996 |
| Peptidase C14 (Caspase domain; pfam00656) | 11 | 24,405 | KF431997 |
| Peptidase_M14NE-CP-C_like | 6 | 29,845 | KF431998 |
| peptidase M17 | 9 | 25,580 | KF431999 |
| Peptidylprolyl isomerase | 4 | 24,879 | KF432000 |
| Phox homology (PX) domain protein (COG5391) | 8 | 20,306 | KF432001 |
| rab family protein | 4 | 22,719 | KF432002 |
| Ras-like protein Rab7 | 6 | 30,285 | KF432003 |
| Ribosomal P0 like | 5 | 37,821 | KF432004 |
| RINT-1 TIP-1 family; pfam04437 | 9 | 20,112 | KF432005 |
| SCAMP family; pfam04144 | 5 | 24,168 | KF432006 |
| SecE | 4 | 20,357 | KF432007 |
| Sen15 | 4 | 22,496 | KF432008 |
| SF3b1 HSH155 | 13 | 21,811 | KF432009 |
| Sfi1 | 11 | 23,803 | KF432010 |
| Small nuclear ribonucleoprotein D3 | 4 | 24,696 | KF432011 |
| SPX_CitT_SLC13 permease | 5 | 20,601 | KF432012 |
| Sun_AdoMet_MTases | 10 | 20,962 | KF432013 |
| TFIIE beta winged helix | 7 | 23,401 | KF432014 |
| Ubiquitin thiolesterase | 9 | 20,356 | KF432015 |
| Ubox_RING_cyclophilin_RING | 11 | 29,845 | KF432016 |
| Uncharacterized conserved protein COG0397 | 5 | 20,101 | KF432017 |
| WD40 Peptidase C19 UCH 1 PAN2 exo | 10 | 22,935 | KF432018 |
| COXI gene | 11 | 38,639 | KF431978 |
| 28S-ITS-18S rRNA unit | – | 6326 | KF792096 |
Figure 3. A consensus tree generated using the neighbour-joining method on the Jukes-Cantor model with 1000 bootstrap replicates in the program MEGA 6.0 for the 28S rRNA gene sequence (5′ end; 543 characters) from 51 different rusts including myrtle rust. Only bootstrap values greater than or equal to 50% of 1000 replicates are shown. The GenBank accession numbers of the DNA sequences are indicated with the species name. Outgroups (O) are Herpobasidium filicinum and Eocronartium muscicola.
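The Jukes-Cantor model named in the caption converts the observed proportion of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below illustrates that correction only. It is not the MEGA implementation; the function names are hypothetical, the toy sequences are not from the paper, and skipping sites that are not A, C, G or T in both sequences is an assumption made here for simplicity.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Assumption: sites where either sequence is not A/C/G/T are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences (hypothetical): one difference in eight sites.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor_distance(p)
```

The corrected distance is slightly larger than p because the model allows for multiple substitutions at the same site. A matrix of such pairwise distances is the input to neighbour joining.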