
Genome analysis of rice-blast fungus Magnaporthe oryzae field isolates from southern India.

Malali Gowda¹, Meghana D Shirke¹, H B Mahesh², Pinal Chandarana³, Anantharamanan Rajamani¹, Bharat B Chattoo³.

Abstract

The Indian subcontinent is the center of origin and diversity for rice (Oryza sativa L.). The O. sativa ssp. indica is a major food crop grown in India, which occupies the first and second position in area and production, respectively. Blast disease caused by Magnaporthe oryzae is a major constraint to rice production. Here, we report the analysis of genome architecture and sequence variation of two field isolates, B157 and MG01, of the blast fungus from southern India. The 40 Mb genome of B157 and 43 Mb genome of MG01 contained 11,344 and 11,733 predicted genes, respectively. Genomic comparisons unveiled a large set of SNPs and several isolate specific genes in the Indian blast isolates. Avr genes were analyzed in several sequenced Magnaporthe strains; this analysis revealed the presence of Avr-Pizt and Avr-Ace1 genes in all the sequenced isolates. Availability of whole genomes of field isolates from India will contribute to global efforts to understand genetic diversity of M. oryzae population and to track the emergence of virulent pathotypes.


Keywords: Genome comparison; Isolate specific genes; Magnaporthe; Next generation sequencing; Single nucleotide polymorphism

Year: 2015 PMID: 26484270 PMCID: PMC4583678 DOI: 10.1016/j.gdata.2015.06.018

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Introduction

Rice (Oryza sativa) is the staple food for more than half of the world's population. Rice blast disease, caused by the ascomycete fungus Magnaporthe oryzae, is a predominant biotic stress affecting rice production worldwide. The outbreak of wheat (Triticum aestivum) blast in Brazil in the early 1990s is an important recent example of host shift and coevolution in this fungus [1]. The blast fungus also infects other food crops such as finger millet, small millets and barley, and it attacks rice at most growth stages, including leaf, stem, nodes, panicle and root [2], [3]. Thus, blast disease is a serious constraint on cereal crop production in India and at the global level. High genetic variability among M. oryzae isolates poses a major challenge to rice breeders and pathologists trying to control blast disease. The genome of M. oryzae strain 70-15 was the first among plant pathogenic fungi to be sequenced, using the Sanger method [4]. The 70-15 isolate was derived from a cross between isolates from rice and weeping lovegrass, followed by backcrossing to the rice isolate [5]. Subsequently, several field isolates of the blast fungus have been sequenced using next generation sequencing (NGS). Field isolates from Japan (Ina168 and P131) and China (Y34) [6], [7] were sequenced using 454 sequencing. More recently, two field isolates from China, FJ81278 and HN19311, were sequenced using Illumina technology [8]. Interestingly, whole genome sequencing of multiple isolates has revealed over a megabase of novel genomic regions and hundreds of novel genes. This could be due to race evolution over time through geographical separation, chromosomal variation and variability in repetitive elements [4], [7], [8]. Multiple clonal lineages of Magnaporthe are known to exist across the cropping zones of the world. The Indian subcontinent is a center of origin and diversity for the Magnaporthe species complex, but we lack information about genomic variability among M. oryzae isolates.
To understand the genomic variation within field isolates of M. oryzae, we carried out whole genome sequencing of two Indian strains, B157 and MG01, using Illumina sequencing technology. B157 is a virulent reference strain that has been used by several research groups in India for many years [9], [10], and MG01 is a recently isolated virulent strain from the HR12 cultivar. In addition, we used RNA-seq to examine strand-specific gene expression, which has not been explored in pathogenic fungi. To our knowledge, this is the first genome-level comparison of blast fungal isolates from India. This work should be useful to pathologists and breeders seeking to understand Magnaporthe virulence and improve blast resistance in rice.

Results

Genome assembly

The M. oryzae strain B157 was isolated during the 1990s from Maruteru near Hyderabad, India, and is a widely used standard strain in our laboratory experiments [9], [10]. The MG01 strain was isolated in 2012 from Karnataka, India. The genomes of these field isolates were initially assembled by both de novo and reference-based approaches using the Velvet assembler. However, the reference-based assembly was preferred for further analysis since it yielded better assembly quality and resolution of repeat elements. The genome assembly and annotation statistics are summarized in Table 1. Iterative mapping, contig ordering and scaffolding further improved the quality of the reference-based assembly (Supplementary Table S1, Supplementary Table S2). The reference-based assembly of the B157 genome yielded 2508 scaffolds, with an N50 of 92 Kb and a largest scaffold of 489 Kb (Table 1). The reference-based assembly of MG01 comprised 3060 scaffolds, with an N50 of 55 Kb and a largest scaffold of 292 Kb. Genome sizes of 41 Mb and 43 Mb were obtained for B157 and MG01, respectively, from the reference-based assembly (Table 1).
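The N50 values quoted above follow the usual definition: the length of the scaffold at which the running total of scaffold lengths, sorted from longest to shortest, first reaches half of the assembly size. A minimal Python sketch of that definition is given here for illustration; the function name and the toy scaffold lengths are hypothetical and are not taken from the B157 or MG01 assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half of the assembly is now covered
            return length


# Hypothetical toy lengths (in Kb), for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why it is reported alongside scaffold counts and the largest scaffold size in Table 1.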

Table 1

Genome assembly and annotation of M. oryzae strains, B157 and MG01.

	De novo based assembly		Reference based assembly
Isolate name	B157	MG01	B157	MG01
Illumina reads (millions)	35.83	38.81	35.83	38.81
Coverage (X)	89	97	89	97
No. of contigs	6815	7722	5364	6046
No. of scaffolds	3534	7303	2508	3060
Largest scaffold size (Kb)	649	324	489	292
N50 (Kb)	91.16	55.03	92.4	54.6
Assembly size (Mb)	38	40	41	43
% repeats	3.01	3.23	10.4	10.39
No. of predicted genes with Augustus	11,340	11,744	12,535	13,135
No. of predicted genes (> 200 bp)	–	–	11,334	11,733

Analysis of repetitive DNA sequences

Using de novo assembly, we obtained 3.01 Mb and 3.23 Mb of repeat elements for B157 and MG01, respectively (Table 1). Similar repeat content has been reported for other Magnaporthe isolates using de novo approaches [8]. However, the overall detection of repeat sequences in silico was dramatically increased in the reference-guided assembly, 4 Mb (10.4%) for B157 and 4.5 Mb (10.39%) for MG01 (Table 1). This repeat percentage is comparable to the first draft of Magnaporthe genome [4]. The predicted repeats in B157 and MG01 are largely composed of retrotransposons such as Pot2, MGR583, MAGGY and Pyret (Fig. 1). Pot2 copy number was higher in B157 (366 copies) and MG01 (396 copies) as compared to the laboratory strain 70-15 (272 copies). In contrast, Mg-SINE copy number is higher in 70-15 (172 copies) as compared to Indian isolates, B157 (65 copies) and MG01 (97 copies).

Fig. 1

Major repeat elements in the genomes of M. oryzae strains B157, MG01 and 70-15.

Gene prediction and annotation

De novo gene prediction with the Augustus tool yielded 11,340 and 11,744 genes in B157 and MG01, respectively. After discarding short coding sequences (< 200 bp), we obtained 11,334 and 11,733 genes in the genomes of B157 and MG01, respectively (Table 1; Supplementary Table S3, Supplementary Table S4). The average gene length was 1400 bp and the average protein length was 480 amino acids. To further validate the de novo gene prediction, a total of 87,947 ESTs from NCBI were mapped to the coding sequences of 70-15, B157 and MG01 genes. In total, 64.8%, 63.1% and 63.1% of the ESTs mapped to 52.6%, 56.3% and 54.3% of the coding sequences in 70-15, B157 and MG01, respectively, which supports the de novo gene prediction. Of the coding sequences with mapped ESTs, 20%, 18.8% and 19.1% were matched by a single EST, while 80%, 81.2% and 80.9% were matched by two or more ESTs in 70-15, B157 and MG01, respectively.

Strain specific genes in Magnaporthe

Annotated genes of B157 and MG01 were aligned against the 12,827 genes of the reference strain 70-15 using BLAST. A total of 489 and 596 genes had no hit for B157 and MG01, respectively. These genes were further aligned against the 70-15 reference genome using the FASTA tool [11]. This analysis identified 54, 73 and 134 isolate-specific genes in B157, MG01 and 70-15, respectively. The B157- and MG01-specific genes were then compared with the recently sequenced Magnaporthe field isolates Y34 and P131 [6]. This comparison left 17 isolate-specific genes in B157 and 24 in MG01 (Fig. 2; Supplementary Table S5, Supplementary Table S6). About 44% of the novel genes from MG01 had expression evidence and functional annotation.

Fig. 2

Isolate specific genes from B157 and MG01 based on the reference (70-15) genome assembly.

Genome-wide variation in M. oryzae

We detected 8650 and 10,797 inter-chromosomal variations (ICVs) in B157 and MG01, respectively (Fig. 3). A large number of these variations occurred on chromosomes 1, 2, 6 and 7. All possible inter-chromosomal translocations are shown in the innermost circle of the Circos plot [12] (Fig. 3). All translocated regions were screened for M. oryzae repeat elements (Fig. 3). About 51.6% and 54.59% of the rearrangements across the seven chromosomes of the B157 and MG01 genomes, respectively, were associated with repeats. These repeat elements were mainly retroelements (40% in B157 and 39.53% in MG01) and DNA transposable elements (10.85% in B157 and 14.56% in MG01).

Fig. 3

Distribution of genes, SNPs and inter-chromosomal variations in Indian isolates (B157 and MG01) in comparison with 70-15 reference genome. The outermost circle shows the distribution of genes in MG01, and the second circle shows seven chromosomes of 70-15 with their sizes in 1 Kb intervals. The third circle shows the distribution of genes in B157, and the fourth and fifth circles show SNP distribution for MG01 and B157, respectively at 1 Kb intervals. The high SNP density regions are marked as peaks. The innermost circle shows possible translocation patterns in both B157 and MG01.

We used the M. oryzae reference genome 70-15 to identify SNPs and short InDels in the genomes of the Indian isolates B157 and MG01. We identified 11,736 SNPs in B157 and 14,117 SNPs in MG01. The ratio of transitions (purine ↔ purine or pyrimidine ↔ pyrimidine, i.e. A ↔ G or C ↔ T) to transversions (purine ↔ pyrimidine, e.g. A ↔ C or G ↔ T), Ts/Tv, was 2.32 for B157 and 2.43 for MG01 (Supplementary Fig. S4). Relative to 70-15, SNPs were classified by location into genic regions, UTRs, coding regions, introns and intergenic regions (Supplementary Fig. S2; Supplementary Table S7, Supplementary Table S8). We also screened the coding regions of annotated genes for synonymous and non-synonymous substitutions. This analysis revealed non-synonymous SNPs in exonic regions of 2423 and 2948 genes for B157 and MG01, respectively. In addition, SNPs in 151 genes from B157 and 155 genes from MG01 changed start or stop codons (gained or lost). Around 6260 and 5695 InDels were identified in B157 and MG01, respectively, by comparison with the 70-15 genome (Supplementary Fig. S3, Supplementary Table S9, Supplementary Table S10).
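As an illustration of how a Ts/Tv ratio of the kind reported above can be computed, the following Python sketch classifies each SNP as a transition or a transversion and divides the two counts. It is not the pipeline used in this study; the function names and the toy SNP list are hypothetical. Note that every purine ↔ pyrimidine change is counted as a transversion, which covers A ↔ T and C ↔ G as well as A ↔ C and G ↔ T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref -> alt stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute Ts/Tv from an iterable of (ref, alt) single-base changes."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Hypothetical toy data: three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ts_tv_ratio(toy_snps))
```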

Analysis of host specificity factors and avirulence genes in Magnaporthe

We compared host specificity factors and Avr genes in all sequenced M. oryzae strains including 70-15, B157 and MG01 (Table 2). The host specificity factor PWL1 is absent in all sequenced isolates except MG01, whereas PWL2 is absent in B157, P131 and 4091-59-8. PWL3 and PWL4 are present in all isolates of M. oryzae except the Indian isolates B157 and MG01 (Table 2). The avirulence gene Avr-CO39 is absent in all strains except 4091-59-8 (Table 2). The avirulence factor Avr-ACE1, which belongs to the PKS/NRPS family, is present in all sequenced strains of Magnaporthe. We validated the in silico predictions for host specificity and Avr genes by PCR amplification using gene-specific primers (Supplementary Table S14). This validation confirmed the predicted presence and absence of these genes in B157 and MG01 (Supplementary Fig. S5).

Table 2

Host specificity factors and avirulence (Avr) genes in sequenced strains of M. oryzae.

M. oryzae genes	M. oryzae strains
	B157	MG01	70-15	HN19311	FJ81278	Y34	P131	KJ201	4091-59-8
Host specificity factors
PWL1	−	+	−	−	−	−	−	−	−
PWL2	−	+	+	+	+	+	−	+	−
PWL3	−	−	+	+	+	+	+	+	+
PWL4	−	−	+	+	+	+	+	+	+

Avirulence (Avr) genes
Avr-CO39	−	−	−	−	−	−	−	−	+
Avr-Piz-t	+	+	+	+	+	+	+	+	+
Avr-Pia	−	+	−	−	−	+	+	−	−
Avr-Pii	−	−	−	−	−	−	−	−	−
Avr-Pik	+	−	+	+	+	+	−	+	−
Avr-Pita	−	+	+	+	+	+	+	+	+
Avr-ACE1	+	+	+	+	+	+	+	+	+

Strand specific RNA-seq analysis

The strand-specific RNA-seq analysis revealed 8338 (71%) genes expressed in MG01 (Fig. 4A, Supplementary Table S11). Expressed genes were further classified into sense [7405 genes (89%); Fig. 4B], antisense [338 genes (4.05%); Fig. 4C] and sense/antisense [595 genes (7.15%); Fig. 4D]. Each of the main classes was sub-categorized into low (14.04%), medium (72.27%) and highly (13.68%) expressed genes based on FPKM values. We identified 835 genes that potentially generate overlapping transcripts, of which 575 genes have overlapping sense and antisense transcripts (Table 3). To validate our ssRNA-seq results, we used MPSS and RL-SAGE data from a previous study [13]. This validation showed that 17% of genes (58 out of 338) have evidence for antisense transcription from MPSS and RL-SAGE data, and 12% of genes (75 out of 595) have MPSS evidence for producing both sense and antisense transcripts (Table 3).

Fig. 4

Sense and antisense transcript expression support for annotated genes of MG01. A, the overall distribution of genes validated by strand-specific RNA-sequence data. A1, number of genes with no RNA-seq evidence; A2, number of genes with RNA-seq evidence. A2 is subdivided into sense (A2.1), antisense (A2.2) and both sense and antisense (A2.3). All expressed genes were classified based on their strand being expressed into three main classes as sense (B), antisense (C) or both (D) (sense and antisense). Each main class was subcategorized into three subclasses based on the FPKM value (Low, medium and high). The FPKM value was categorized as low (if FPKM ≤ 10), medium (if FPKM > 10 to ≤ 200) or high (> 200).
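The FPKM thresholds given in the caption of Fig. 4 can be written as a small classification rule. The Python sketch below is purely illustrative; the function name is hypothetical, and it assumes the rule is applied only to genes that already have RNA-seq evidence.

```python
def fpkm_class(fpkm):
    """Assign an expression class using the Fig. 4 thresholds.

    low:    FPKM <= 10
    medium: 10 < FPKM <= 200
    high:   FPKM > 200
    """
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    if fpkm <= 10:
        return "low"
    if fpkm <= 200:
        return "medium"
    return "high"


# Hypothetical FPKM values, for illustration only.
for value in (3.5, 10, 57.2, 200, 845.0):
    print(value, fpkm_class(value))
```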

Table 3

Strand specific RNA (ssRNA) seq analysis for MG01 strain.

Gene feature	B157	MG01
Total no. of genes predicted (> 200 bp)	11,334	11,733
No. of genes having RNA-seq evidence	8672	8338
a) No. of genes having sense expression	–	7405
b) No. of genes having antisense expression	–	338 (58a)
c) No. of genes having sense and antisense expression	–	595 (75b)
No. of genes with no RNA-seq evidence	2662	3395

a Genes confirmed by MPSS and RL-SAGE data [13].

b Genes confirmed by MPSS data [13].

Comparative analyses in Ascomycota fungi

We compared the core set of Magnaporthe proteins (10,778 genes) with those of sequenced non-pathogenic Ascomycetes fungi: Neurospora crassa, Aspergillus niger, Aspergillus clavatus, Aspergillus oryzae, Aspergillus flavus, Aspergillus nidulans and Aspergillus terreus. Magnaporthe protein sequences with no homolog in these non-pathogenic fungi were scanned for Pfam domains. From this analysis, we identified a few pathogenicity genes, Enterotoxin A (PF01375), GSP synthase (PF03738) and YgcI-YcgG (PF08892), which are present in Magnaporthe and absent in all seven non-pathogenic fungi. The enterotoxin A gene cluster consists of seven genes in 70-15 and five genes in the other sequenced strains, B157, MG01, P131, Y34, KJ201, 4091-5-8, FJ81278 and HN19311 (Table 4). Among these, the enterotoxin A genes MGG_05465, MGG_00390 and MGG_16989 were expressed at the mycelial stage in the MG01 isolate (Table 4). In contrast, other plant pathogenic fungi, including Phaeosphaeria nodorum, Glomerella graminicola, Colletotrichum higginsianum and Verticillium dahliae, have a single copy of the enterotoxin A gene (Table 4). We also searched specifically for enterotoxin A domain-containing genes in the non-pathogenic Ascomycetes and found no homologs in any of the seven species.

Table 4

Enterotoxin-A domain encoding genes in pathogenic fungi including the sequenced M. oryzae strains.

Pathogenic fungi	Host	No. of genes
P. nodorum	Plant	1
G. graminicola	Plant	1
C. higginsianum	Plant	1
V. dahliae	Plant	1
M. oryzae strains
70-15	Plant	7
B157, MG01, P131, Y34, FJ81278, HN19311, 4091-5-8, KJ201	Plant	5
M. acridum	Animal	7
M. anisopliae	Animal	16
C. militaris	Animal	4

Discussion

Comparative analysis of pathogenic field isolates of Magnaporthe from different parts of the world will help in understanding the fungal virulence spectrum. With the advent of next generation sequencing technologies, it is now possible to sequence multiple genomes of Magnaporthe species. In this study we sequenced two field isolates (B157 and MG01) of Magnaporthe from different regions of southern India. The genomes of B157 and MG01 were slightly larger than that of 70-15. Over 50% of the coding sequences in B157 and MG01 have EST evidence. To understand genomic variation and isolate-specific gene content, we compared the predicted gene sets of Magnaporthe isolates. Comparison with 70-15 revealed 54 and 73 isolate-specific genes in B157 and MG01, respectively. Of these, 17 and 24 remained isolate-specific in B157 and MG01, respectively, when also compared with Y34 and P131 [6]. The majority of the isolate-specific genes had no annotation; however, 47% of the isolate-specific genes from MG01 showed RNA-seq expression evidence. Avr-ACE1 and Avr-Piz-t had orthologs in all the sequenced isolates. We surveyed the coding regions of Avr genes for SNPs and InDels in the sequenced genomes of the rice blast isolates, but could not find any non-synonymous mutation in any of the cloned host specificity and Avr genes. This suggests that these genes are likely to be important for survival of the blast fungus in nature. Interestingly, the MG01 genome contains more host specificity factors and Avr genes (PWL1, PWL2, Avr-Pita, Avr-ACE1, Avr-Piz-t and Avr-Pia) than B157 (Avr-ACE1, Avr-Piz-t and Avr-Pik) (Table 2). Repetitive elements have played an important role in Magnaporthe genome evolution. Although individual repeat classes varied in copy number between the two isolates, the overall distribution of repeat content remained similar (Fig. 1; Table 1).
The percentages of most repeat elements, such as MAGGY, Pyret and Pot2, were slightly higher in the genomes of these isolates. The copy number of Mg-SINE was higher in the 70-15 genome than in the field isolates (Fig. 1). A large number of genome-wide translocations were associated with retrotransposons (40% in B157 and 39.53% in MG01) and transposons (11% in B157 and 15% in MG01). The absence of Avr-CO39 in rice isolates suggests that this gene was lost through accumulation of mutations and transpositions over time [14]. Insertion of repeats in Avr genes has been shown to be associated with gain of virulence. For example, insertion of a Pot3 element in the promoter of Avr-Pita led to gain of virulence towards rice varieties carrying the R gene Pi-ta [15]. Thus, analysis of repeat elements and their insertion sites in the genome will help to clarify the emergence of virulent Magnaporthe strains. We hypothesize that the genomic differences among fungal isolates may be due to variation in environmental factors, host factors and mating types, together with small-scale variation such as SNPs and InDels, chromosomal abnormalities and repetitive DNA elements [4], [7], [8]. We observed about one SNP per 3448 nt in B157 (11,736 SNPs) and one per 2866 nt in MG01 (14,117 SNPs). Non-synonymous substitutions were much more frequent than synonymous substitutions in MG01 and B157 (Supplementary Table S6, Supplementary Table S7). In 151 genes from B157 and 155 genes from MG01, SNPs altered start or stop codons, which is expected to have a significant effect on gene function. The larger genome size, higher percentage of repeat elements and excess of non-synonymous mutations are consistent with adaptive evolution of these isolates under field conditions.
Natural antisense transcripts have been identified in several fungi including Saccharomyces cerevisiae [16], Candida albicans [17], A. flavus [18], Cryptococcus neoformans [19] and Ustilago maydis [20], [21]. However, natural antisense transcripts and their role in pathogenicity have not been studied in fungal pathogens. For this purpose, we adopted strand-specific RNA sequencing using Illumina HiSeq chemistry. In this study we identified sense and antisense transcripts in the Magnaporthe isolate MG01 for several genes, including MGG_09706 (aerobactin siderophore biosynthesis protein; Supplementary Fig. S6A) and MGG_07705 (acyl-CoA ligase; Supplementary Fig. S6B), and a few pathogenicity genes such as beta-glucanase (MGG_06493), serine/threonine protein kinase (MGG_03207) and endoglucanase-4 (MGG_08020). This is the first report cataloguing antisense transcription across the Magnaporthe genome, and it should shed light on the role of antisense gene regulation in blast disease biology.
We compared Magnaporthe genomes with non-pathogenic Ascomycetes fungi (N. crassa, A. niger, A. clavatus, A. oryzae, A. flavus, A. nidulans and A. terreus). This comparison revealed the exclusive presence of pathogenicity genes such as Enterotoxin A (PF01375), GSP synthase (PF03738) and YgcI-YcgG (PF08892) in Magnaporthe isolates. Other genes, including the transcriptional regulators MGG_12865 (HOX7) and MGG_11346 (CDTF1) as well as MGG_07218, were likewise found only in Magnaporthe and were absent in all of these non-pathogenic fungi (Supplementary Table S12). The HOX family comprises homeobox transcription factors, which are stage-specific regulators of Magnaporthe development, and HOX7 is specifically involved in appressorium formation [22]. CDTF1 is a transcriptional regulator involved in appressorium development [23], and MGG_07218 is a hypothetical protein involved in host pathogenicity [24].
Interestingly, homologs of these genes were identified in other pathogenic fungi, including species of Fusarium, Colletotrichum and Verticillium as well as Gaeumannomyces graminis, V. dahliae, Glomerella cingulata, Colletotrichum graminicola and C. higginsianum. PTH11 (MGG_0587), a trans-membrane protein, is absent in the majority of non-pathogenic Ascomycetes except A. clavatus (Supplementary Table S13). PTH11 is known to be involved in appressorium development in M. oryzae [25]. Con7p (MGG_05287) is a transcription factor necessary for appressorium development and for establishment of Magnaporthe in the plant [26], and this gene is absent in A. clavatus, A. oryzae and A. flavus (Supplementary Table S13). Tps1 affects virulence-associated gene expression by modulating a set of NADP-dependent GATA factor/Nmr transcriptional regulators (NMR1: MGG_10017, NMR2: MGG_02860 and NMR3: MGG_09705), which are implicated in both fungal development and pathogenicity [27]. NMR genes are absent in non-pathogens including N. crassa, A. niger, A. clavatus and A. terreus (Supplementary Table S13). The exclusive presence of pathogenicity gene clusters in Magnaporthe indicates selective retention of these genes in pathogenic strains and their loss in non-pathogenic Ascomycetes counterparts, as previously reported for a range of pathogenic Ascomycetes fungi [28].
In summary, we have sequenced and analyzed the genomes of two field isolates of Magnaporthe from the Indian subcontinent, where rice is grown over large areas and the incidence of rice blast disease is very high. The variability of host specificity genes, avirulence genes and other pathogenicity genes identified in this study will be a valuable resource for functional genomic studies in M. oryzae. The availability of the genomes of the host plant (HR12; NCBI accession number AZTA00000000) and of its corresponding virulent M. oryzae isolates (B157 and MG01) from India will accelerate the study of host-pathogen interaction and the development of strategies for resistance breeding in rice.

Materials and methods

M. oryzae field isolates

A monoconidial isolation approach was followed to obtain pure cultures of B157 and MG01. These strains were isolated from the rice cultivar HR12, which belongs to the indica subspecies. HR12 is widely used as a blast-susceptible check cultivar in India, since it is extremely sensitive to the blast pathogen. Oatmeal agar medium was used for growth and maintenance of M. oryzae isolates. Cultures were kept on filter paper disks at −20 °C for long-term storage.

Nucleic acid isolation

M. oryzae strains were grown in liquid YEG medium (glucose 1 g and yeast extract 0.2 g in 100 ml of distilled water) for 3 days in the dark at 200 rpm and 28 °C. Mycelia were filtered and genomic DNA was extracted using the GenElute plant genomic Miniprep kit (Sigma Cat. No. G2N70-1KT). Total RNA was isolated from 3-day-old MG01 mycelia grown in YEG liquid medium using the TRIzol method [13].

DNA library construction and Illumina sequencing

Libraries were prepared with 1 μg of DNA using the TruSeq DNA sample preparation kit (Illumina Cat. No. FC-121-2001). The genomic DNA (1 μg) was fragmented using an ultrasonicator (S220; Covaris, USA) to obtain fragments averaging 350 bp. This was followed by end repair, A-tailing, ligation with Illumina adapters, size selection and PCR amplification. The prepared libraries were quantified using a Bioanalyzer and quantitative PCR (qPCR). Clusters were generated using cBot, and paired-end sequencing was carried out on an Illumina HiSeq 1000 instrument at the Center for Cellular and Molecular Platforms (C-CAMP), Bangalore, India.
Illumina paired-end reads were quality filtered using the FASTX-Toolkit (version 0.0.13.2). Adapter sequences were clipped using Cutadapt version 1.2.1 [29]. Read pairs having at least 80% of bases with a quality score greater than Q30 were retained for further analysis. (The Q score is the quality score specified by Illumina, which indicates the probability of an error in base calling; Q30 corresponds to a 1 in 1000 probability of an incorrect base call.)
We attempted both de novo and reference-based assembly of the genomes using Velvet 1.2.09; however, the reference-based assembly was used for further analysis since it yielded a better assembly [30]. M. oryzae 70-15 was used as the reference strain. The whole genome assemblies are available at NCBI/DDBJ/EMBL under accessions AXDJ01000000 for B157 and AYPX01000000 for MG01. Contig ordering, gap filling and re-scaffolding were performed using several tools in order to improve assembly quality. We used the ABACAS tool for contig ordering against the reference [31]. The Iterative Mapping and Assembly for Gap Elimination (IMAGE) method [32] was used to fill gaps in the assembly. After iterative assembly, the pre-assembled contigs were merged back into scaffolds using SSPACE (SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension) [33].
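The read-quality criterion described in this section (at least 80% of bases at Q30, with Q defined through the base-calling error probability) can be sketched in Python as follows. This is an illustration only and not the FASTX-Toolkit implementation. It assumes Phred+33 encoded quality strings, treats the Q30 threshold as inclusive, and requires each mate of a pair to pass on its own; all three are assumptions, and the function names are hypothetical.

```python
def error_probability(q):
    """Phred relationship: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_string, threshold=30, offset=33):
    """Fraction of bases whose Phred score is >= threshold.

    Assumes Phred+33 ASCII encoding (an assumption, not stated in the text).
    """
    if not quality_string:
        return 0.0
    passing = sum(1 for ch in quality_string if ord(ch) - offset >= threshold)
    return passing / len(quality_string)


def pair_passes(qual_r1, qual_r2, min_fraction=0.8, threshold=30):
    """Keep a read pair only if both mates meet the base-quality fraction."""
    return (fraction_at_or_above(qual_r1, threshold) >= min_fraction
            and fraction_at_or_above(qual_r2, threshold) >= min_fraction)


# Hypothetical quality strings: 'I' encodes Q40, '?' encodes Q30, '#' encodes Q2.
print(error_probability(30))             # probability of a wrong call at Q30
print(pair_passes("IIII#", "?????"))     # 80% and 100% of bases pass
print(pair_passes("III##", "IIIII"))     # first mate has only 60% passing
```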
Genes were predicted using Augustus [34] from the reference-based assemblies of the B157 and MG01 strains of M. oryzae. The predicted genes were aligned to 70-15 protein sequences using BLAST with an e-value cutoff of ≤ 10⁻⁵. To validate the genes predicted by Augustus, EST data sets of M. oryzae were downloaded from NCBI and mapped onto the coding sequences of 70-15, B157 and MG01 genes. Genes of the Indian isolates B157 and MG01 were predicted de novo using Augustus [34], whereas 70-15 genes had been predicted by FGENESH [35]. Because the gene sets came from different predictors, the FASTA v36 program [11] was used to compare genes against genomes in order to extract isolate-specific genes. The Annot8r tool [36] was used to annotate the genes predicted by Augustus.

Repetitive DNA prediction

Repeat detection and masking were carried out using RepeatMasker 4.0.2 [37]. A Magnaporthe species repeat library was used for repeat prediction. The predicted repeats were further classified into major classes/families of elements.

Variant analysis

To detect possible inter-chromosomal variations (ICVs), sequencing reads were mapped against the 70-15 reference genome. Anomalous read pairs (ARPs), in which the mapping distance between the mates exceeds the sequencing library insert size, were extracted from the mapping file. To avoid false positives, duplicate reads were removed prior to structural variation detection. ARPs whose mates mapped to different chromosomes were extracted and checked for ICVs. The alignments from reference mapping were used for short variant detection with SAMtools version 0.1.19 [38]. Both single nucleotide variations (SNVs) and short InDels were detected for B157 and MG01 in comparison with the 70-15 reference sequence. Predicted variants were retained if they had a mapping quality greater than 25, a read depth greater than 10 and strand-level evidence (at least one read from each direction). These variants were further annotated using the SnpEff tool [39] according to chromosomal location and biological effect (synonymous/non-synonymous, upstream/downstream, UTR, intergenic, etc.). The transition to transversion (Ts/Tv) ratio was also calculated for the single nucleotide variations.
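The three variant filters stated above (mapping quality, read depth and support from both strands) amount to a simple conjunction of conditions. The Python sketch below shows one way to express them; it does not reproduce the SAMtools commands used in this study, and the record type, its field names and the example calls are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Minimal, hypothetical summary of one predicted variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    mapq: float          # mapping quality of the supporting alignments
    depth: int           # read depth at the site
    fwd_reads: int       # supporting reads on the forward strand
    rev_reads: int       # supporting reads on the reverse strand


def passes_filters(call, min_mapq=25, min_depth=10):
    """Apply the thresholds described in the text (strictly greater than)."""
    return (call.mapq > min_mapq
            and call.depth > min_depth
            and call.fwd_reads >= 1
            and call.rev_reads >= 1)


# Hypothetical calls, for illustration only.
calls = [
    VariantCall("chr1", 1200, "A", "G", mapq=40, depth=35, fwd_reads=18, rev_reads=17),
    VariantCall("chr1", 5300, "C", "T", mapq=40, depth=12, fwd_reads=12, rev_reads=0),
    VariantCall("chr2", 880, "G", "T", mapq=20, depth=50, fwd_reads=25, rev_reads=25),
]
kept = [c for c in calls if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```

Requiring support on both strands removes calls that arise from strand-specific sequencing artifacts, at the cost of discarding some true variants in low-coverage regions.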

Analysis of host specificity and Avr genes in M. oryzae

Nucleotide sequences for host specificity genes [PWL1 (U36923.1), PWL2 (U26313.1), PWL3 (1045533), PWL4 (1045535)] and avirulence (Avr) genes [AvrPita (12642087), Avr-CO39 (27450408), AvrACE1 (47109413), AvrPiz-t (194293523), Avr-Pia (237858322), Avr-Pii (237858324) and Avr-Pik (237858326)] were downloaded from NCBI database. BLASTn was performed to identify Avr genes in the genomes of M. oryzae strains, N. crassa and A. niger. Genomic sequence of other M. oryzae strains including 70-15, Y34 (AHZS00000000), P131 (AHZT00000000), FJ81278 (ATNU00000000) and HN19311 (ATNT00000000) were downloaded from NCBI database.

PCR validation of host specificity and Avr genes

PCR primers were designed against the 5′ and 3′ untranslated regions (UTRs) of the Avr genes (Supplementary Table S14). Each 10 μl PCR reaction contained 20 ng of genomic DNA, 1 μl of 10× buffer, 0.4 μl of 20 mM dNTP mix, 0.5 μl each of 10 mM forward and reverse primer, and 0.75 units of DreamTaq DNA polymerase (Fermentas, Cat. No. EP0712, PA, USA). The PCR program consisted of an initial denaturation at 95 °C for 5 min; 30 cycles of 95 °C for 15 s, Ta (annealing temperature, variable for each primer pair) for 30 s and 72 °C for 1 min; and a final extension at 72 °C for 5 min. PCR products were separated on 2% agarose gels.

Strand specific RNA (ssRNA) library preparation and sequencing

About 1 μg of total RNA from the MG01 strain was used for mRNA purification with the Illumina TruSeq RNA sample preparation kit v2 (Cat. No. RS-122-2001). mRNA was chemically fragmented (100–150 bases). First strand cDNA was synthesized using dNTPs, followed by second strand cDNA synthesis using dUTPs along with dNTPs, random hexamers and reverse transcriptase. AMPure beads were used to select the double-stranded cDNA template, which was end repaired and adenylated by adding a single ‘A’ base to the ends of the cDNA before adapters were ligated. Fifteen cycles of PCR were performed to enrich the library. A Bioanalyzer was used to validate the RNA library, and paired-end sequencing (2 × 100 cycles) was performed on an Illumina HiSeq 1000 at C-CAMP, Bangalore, India.

Strand specific RNA seq (ssRNAseq) data analysis

Genes predicted by the Augustus pipeline were used for expression validation. The gene sequences were indexed using the Bowtie2 tool [40]. We used about 35 million paired ssRNAseq reads (2 × 100 nt) from mycelial tissue of the MG01 strain. The spliced read mapper TopHat2 [41] was used to map the ssRNA-seq reads to the indexed gene set. The SAM file (TopHat2 output) was processed further to remove reads with secondary alignments. Uniquely aligned reads were assembled into transcripts, and the relative abundance of transcripts for each gene was estimated using Cufflinks [42]. The transcript abundance per gene was expressed in fragments per kilobase of exon model per million mapped fragments (FPKM). The Cufflinks output was processed manually to count the genes with sense and antisense expression. We used an in-house Perl script to identify genes with overlapping sense and antisense transcripts. Genes with overlapping transcripts from both the sense and antisense strands were annotated using the Annot8r programme.
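The two numerical steps in this paragraph, discarding secondary alignments and expressing abundance as FPKM, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: Cufflinks estimates abundance with a statistical model, whereas the code below only applies the textbook FPKM definition. The function names `is_secondary` and `fpkm` are hypothetical, and the example counts are invented for illustration. The only external fact assumed is that the SAM format marks secondary alignments with FLAG bit 0x100.

```python
# Illustrative sketch only; function names and example numbers are hypothetical.

SECONDARY_FLAG = 0x100  # SAM FLAG bit that marks a secondary alignment


def is_secondary(flag: int) -> bool:
    """Return True if a SAM FLAG value has the secondary-alignment bit set."""
    return bool(flag & SECONDARY_FLAG)


def fpkm(fragments: float, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    FPKM = fragments * 1e9 / (exon_length_bp * total_mapped_fragments)
    """
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and total mapped fragments must be positive")
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Keep only records whose FLAG does not carry the secondary bit.
    flags = [99, 147, 355, 403]
    primary = [f for f in flags if not is_secondary(f)]
    print(primary)

    # A made-up gene: 500 fragments, 2,000 bp of exon, 35 million mapped fragments.
    print(round(fpkm(500, 2000, 35_000_000), 3))
```

Normalising by both exon length and library size makes values comparable between long and short genes within one library, which is what the sense versus antisense comparison relies on.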

Pathogenicity genes analyses in M. oryzae

Pathogenic (M. oryzae) and non-pathogenic Ascomycetous fungi were compared for pathogenicity genes using whole genome annotation data. Protein sequences of the reference strain 70-15 (assembly version MG8) and the Indian strains B157 and MG01 were compared to identify the core set of genes in M. oryzae. Protein sequences of N. crassa OR74A, A. clavatus, A. oryzae, A. flavus, A. nidulans and A. terreus were retrieved from the Broad Institute fungal database (http://www.broadinstitute.org), and A. niger CBS 513.88 sequences were retrieved from the KEGG database (http://www.genome.jp/kegg). BLASTp analysis was performed for the M. oryzae core proteins against the non-pathogenic Ascomycetous proteins. Significant homology was defined by an E-value threshold of 10−5 and greater than 55% query sequence length coverage. Additionally, we used a gradient of percent identity (I > 30%, I > 50% and I > 70%) to check the variation in gene numbers. These sequences were further classified into known, predicted and hypothetical proteins based on the information available in the fungal database at the Broad Institute (http://www.broadinstitute.org). Hypothetical proteins were then scanned against the Pfam domain database to identify conserved domains from M. oryzae that are present only in pathogenic fungi and absent in non-pathogenic fungi. M. oryzae genes involved in pathogenesis were obtained from the PHIbase V-3.4 database [43]. Reciprocal BLASTp was performed to identify homologous genes in the sequenced strains of M. oryzae, N. crassa, A. niger, A. clavatus, A. oryzae, A. flavus, A. nidulans and A. terreus. Genes were filtered based on percent identity and query coverage (PHIbase analysis parameters: identity ≥ 30% and query coverage ≥ 50%).

The following are the supplementary data related to this article.

Supplementary Fig. S1

Genome assembly improvement using iterative approaches.

Supplementary Fig. S2

A, Distribution of annotated genes and SNPs in the Indian isolates (B157 and MG01) with reference to 70-15; B, ratio of synonymous to non-synonymous substitutions in B157 and MG01.

Supplementary Fig. S3

Overall distribution of InDels across the genomes of B157 and MG01.

Supplementary Fig. S4

Transition and transversion (Ts/Tv) pattern of predicted variants.

Supplementary Fig. S5

PCR validation of AVR genes in the Indian isolates B157 and MG01.

Supplementary Fig. S6

A, Sense and antisense transcripts for the MGG_09706 gene (aerobactin siderophore biosynthesis protein IucB) in MG01. B, Sense and antisense transcripts for MGG_07705 (acyl-CoA ligase). In both panels, forward reads represent sense transcripts (red) and reverse reads represent antisense transcripts (blue).

Supplementary Table S1

Assembly statistics during pre- and post-scaffolding.

Supplementary Table S2

Iterative mapping and assembly for gap elimination analysis.

Supplementary Table S3

Functional annotation of B157 genes.

Supplementary Table S4

Functional annotation of MG01 genes.

Supplementary Table S5

Isolate specific genes in B157.

Supplementary Table S6

Isolate specific genes in MG01.

Supplementary Table S7

SNP analysis for annotated genes in B157.

Supplementary Table S8

SNP analysis for annotated genes in MG01.

Supplementary Table S9

InDel analysis for B157 annotated genes.

Supplementary Table S10

InDel analysis for MG01 annotated genes.

Supplementary Table S11

Genes expressed at the mycelial stage in MG01 based on strand specific RNAseq.

Supplementary Table S12

M. oryzae specific genes absent in non-pathogenic Ascomycetes fungi.

Supplementary Table S13

Pathogenicity related genes in other plant-pathogenic Ascomycetes.

Supplementary Table S14

Primers used for PCR analysis.
