Literature DB >> 22384341

Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa.

Kevin McCluskey, Aric E Wiest, Igor V Grigoriev, Anna Lipzen, Joel Martin, Wendy Schackwitz, Scott E Baker.

Abstract

Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

Entities: Chemical Disease Gene Mutation Species

Keywords: SNP; classical mutant; comparative genomics; indel; single nucleotide polymorphism

Year: 2011 PMID： 22384341 PMCID： PMC3276140 DOI： 10.1534/g3.111.000307

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Neurospora crassa has been used as a genetic model system since the early 1930s (Lindegren ) because it presented significant advantages for genetics research. It grows on chemically defined culture medium, has a haploid vegetative stage, can complete a full sexual generation in under a month, and produces ordered ascospores (the haploid products of meiosis), facilitating genetic analysis (Davis and Perkins 2002). Neurospora was quickly adopted by a number of laboratories and is famous for the demonstration of the “one-gene, one-enzyme hypothesis” (Beadle and Tatum 1941), which showed that classical mutations that could be followed in a genetic cross were associated with biochemical traits that could be characterized chemically. By the time the Fungal Genetics Stock Center (FGSC) was established in 1960 it was estimated that hundreds of genetic loci were being investigated. The classical genetic map of N. crassa now comprises over 1,000 phenotypic markers and an additional several hundred genetic markers such as telomeres, centromeres, the nucleolus organizer, and a variety of translocations, inversions and duplications (Perkins ). Because of its many advantages and the relative ease with which one could produce biochemical mutants, N. crassa was used for research into every aspect of genetics and biochemistry. N. crassa grows as a haploid filament that makes copious asexual spores in vegetative culture, and when compatible strains are paired, it can make either vegetative hetero-dikaryons or go into the sexual cycle. To facilitate allelism tests, strains that were vegetatively compatible and displayed high fertility were generated at Yale and at Oak Ridge National Laboratory, where Neurospora was utilized in extensive radiation biology studies (De Serres and Webber 1997). By 1963, much of the research community had adopted this so-called “Oak Ridge” lineage. Therefore, many mutants in the FGSC collection have a shared lineage (e.g., Perkins ). This lineage gave rise to the current wild type strains used in both the Neurospora genome sequencing project (Galagan ) and in the Neurospora functional genomics program (Colot ; Dunlap ). Additionally, a compendium of Neurospora genes, first published in 1982 (Perkins ) and updated in 2001 (Perkins ), is currently maintained as an electronic compendium online (http://bmbpcu36.leeds.ac.uk/~gen6ar/newgenelist/genes/gene_list.htm). Despite detailed genetic, physiological and biochemical characterization, many classical mutants remain anonymous at the level of the genome sequence. However, because of the extensive genetic mapping done with N. crassa, most classical markers are flanked by genes that have been identified to the level of DNA sequence. This makes it possible to exclude most of the genome from consideration when looking to associate a genetic marker with the open reading frame (ORF) responsible for the mutant phenotype in any given strain. Whether the approach is chromosome walking with cosmids that are mapped onto the genome sequence, or by gene sequencing, the high resolution N. crassa genetic map allows one to use flanking markers to delimit the search. This approach has been used at the FGSC to identify four temperature sensitive lethal mutations by gene complementation (Dieterle ; McCluskey ). Advances in genome sequencing technology (Hobert 2010) have made it possible to sequence the entire genome to identify individual mutations (Le Crom ; Pomraning ; Sarin ; Smith ). For Neurospora, this is complicated by the fact that not all classical genetic mutations have been crossed into the same genetic background as that of the reference genome strain. Eight of the strains sequenced in the current report were explicitly backcrossed into the reference genome prior to deposit into the FGSC collection. Ten were reported to have ‘mixed’ backgrounds. Neurospora is a cosmopolitan fungus and it has been isolated from locations all over the world (Turner ). While most genetic research is carried out with N. crassa, other species are commonly isolated from the environment and have unique characteristics. Neurospora intermedia is interfertile with both N. tetrasperma and N. crassa, and progeny can be recovered from crosses carried out in the laboratory. One such cross was carried out to allow a meiotic drive element, Sporekiller-2 (Sk-2), rarely identified in wild isolates of N. intermedia, to be studied in N. crassa (Turner and Perkins 1979). The FGSC collection holds and distributes a large number of morphological and developmental mutants. A subset of these for which the underlying nature of the gene defect remains unknown were chosen for this study (Table 1). In this article, we report the whole genome sequencing of seventeen classical mutant strains of N. crassa, and the putative identification of classical mutations in sixteen of them. An eighteenth strain that carries the Sk-2 meiotic drive element, originally identified in the related species N. intermedia and subsequently introgressed into N. crassa, was also sequenced. Overall, these mutant strains exhibit a wide range of sequence variability that is directly proportional to their being related to the reference genome strain. Much of the variation is shared among strains while other variation is strain-specific. Some regions of the genome show enrichment for unique variation suggesting that they are hot-spots for mutation. Insertions and deletions manifest a strong size bias associated with their presence in coding vs. non-coding DNA. The ability to compare among multiple strains and exclude shared variants aids in the association between neutral polymorphism and phenotypically relevant mutations.

Table 1

Strains of Neurospora crassa with their relevant genetic characteristics

FGSC No.	Gene	Contig	Flanking Markers	Range on Contig	No. of Genes in Range	Mutagen	Genetic Background^a	Reference	Genbank SRA #
106	Com	3	ace-2, ad-4	1565000 to 1827000	44	UV	SL3	(Perkins and Ishitani 1959)	SRP004312
305	amyc	1	ad-5, cen-I	2988000 to 3730000	166	?	SL3	(Atwood and Mukai 1954)	SRP004308
309	Ti	1	arg-3, T(39311)	3000000 to 3730000	163	X-rays	SL3	(Perkins 1959)	SRP004304
322	ty-1	3	tyr-1, cen-3	3000000 to 4500000	396	Spontaneous	M	(Horowitz et al. 1961)	SRP003564
821	ts	5	inl, cen-V	3000000 to 5000000	514	Spontaneous	M	(Nakamura and Egashira 1961)	SRP002767
1211	dot	1	ad-9, thi-1	3970000 to 5658063	415	Spontaneous	SL3	(Perkins et al. 1962)	SRP002761
1303	Fi	4	pyr-1, ace-4	432932 to 1547255	234	Spontaneous	M	(Perkins et al. 1962)	SRP002757
1363	smco-1	1	mat, rg-1	2500000 to 4500000	409	Mustard	L	(Garnjobst and Tatum 1967)	SRP004305
2261	do	7	nic-3, cen-7	>3475561	193	UV	SL2	(Perkins et al. 1962)	SRP004307
3114	Sk-2	3	cum, his-7	70000 to 2040000	407	Introgression	SL	(Turner and Perkins 1979)	SRP002747
3246	fs-n	1	mat, T(4637) al-1	1860000 to 1940000	18	Spontaneous	M	(Mylyk and Threlkeld 1974)	SRP002750
3562	Mb-1	7	nic-3	< 4255303	1048	UV	M	(Weijer and Vigfusson 1972)	SRP002749
3564	Mb-2	1	al-1, nit-1	7377187 to 8029920	180	UV	M	(Weijer and Vigfusson 1972)	SRP002748
3566	Mb-3	1	al-1, cen-1	>3970000	1433	UV	M	(Weijer and Vigfusson 1972)	SRP004280
3831	ff-1	2	un-20, aro-1	2897659 to 3571946	172	Spontaneous	M	(Tan and Ho 1970)	SRP004330
3921	tng	2	arg-5, pyr-4	513651 to 1551100	201	Spontaneous	SL2	(Springer and Yanofsky 1989)	SRP002768
7022	fld	4	∼arg-14, his-5	< 2792033	827	Spontaneous	M	(Perkins et al. 1962)	SRP003536
7035	Per-1	5	ilv-1, asn-1	1620310 to 5558912	1033	UV	SL3	(Howe and Benson 1974)	SRP002701

“SL” indicates the reference genome background (St. Lawrence), and the number indicates the number of backcrosses to a reference strain. “L” indicates the Lindegren background, and “M” indicates that there is a mixed background.

Materials and Methods

Neurospora strains are described in Table 1 and are available from the Fungal Genetics Stock Center. Genomic DNA was purified from mycelia grown in Vogels liquid medium, using a simple phenol/chloroform extraction technique (Lee ). Mycelia from mutants with limited vegetative growth was produced by macerating tissue in a glass tissue pulverizer under sterile conditions. DNA was randomly sheared into small fragments of between 200 and 300 bp in size using Covaris E210 according to the manufacturer's recommendation. The overhangs created by fragmentation were converted into blunt ends using T4 DNA polymerase and DNA polymerase I Klenow fragment. Using dATP, base 'A' is added to the 3′ end of the blunt phosphorylated DNA fragment to prepare the DNA for ligation to the adaptors. Adaptors were then ligated to the DNA fragment using DNA ligase so that they would hybridize on a flowcell. Finally, using DNA Phusion polymerase, PCR was performed to selectively enrich those DNA fragments that have adapter molecules on both ends, and to amplify the amount of DNA in the library. DNA was sequenced on Illumina genome analyzer II. Reads were aligned to the reference genome sequence (Galagan ) and single nucleotide polymorphisms (SNP), insertions, and deletions were characterized using maq-0.7.1 (Li ) and with BreakDancer (Chen ). Default maq parameters were used for SNP calling and for filtering as described in the maq paper, (maq.pl SNPfilter -f cns.indelse -F cns.indelpe -d 3 -q 40 -Q 60 -w 5 -N 2), with a subsequent minimum map quality filter of 30 applied after the indel proximity filters. The minimum consensus quality (phred scaled likelihood of the consensus base being incorrectly called) and depth filters minimize the likelihood that random sequencing errors lead to false positive SNP calls. A direct estimate of false SNP identification was not determined, but is expected to increase as the divergence from the reference increases. Though this group of strains is not a freely reproducing population and the variants have not been confirmed to exist at an allele frequency to fit the definition of a polymorphism, all of the single nucleotide variants are referred to as Single Nucleotide Polymorphisms (SNPs) for simplicity. To allow comparisons between strains, each SNP or indel was assigned a unique identifier that included the contig and position. For example, a SNP would have an identifier such as “3_591470_C” indicating that it is on contig 3 at position 591,470 and that the base at that position is C. It was necessary to include the base in the SNP identifier, as multiple polymorphisms may occur at the same position. An indel would have an identifier such as “6_91954” indicating the contig and location; size information was not encoded for indels but direct comparisons among strains were directly possible because of the small number of unique indels in any region of the genome. For validation of the deletion in strain FGSC 3921, PCR was carried out with 100 ng genomic DNA using 0.5 uM primers and taq polymerase following manufacturer’s instructions. Primers for amplification of a 1.2 kb coding fragment of wild type NCU03436 were as follows: Forward- 5′-CGATACTCGCTTCGTCTTCC-3′, Reverse- 5′-ATCATCAAGTCCGCCACTTC-3′. Photographic microscopy of culture edge morphology was carried out using a glass microscope slide culture as previously described (Dieterle ).

Results

Association of specific phenotypes with genes

tangerine:

The morphological mutant tangerine (tng) causes the production of enlarged conidia that are delimited by a membrane, but no cell wall (Springer and Yanofsky 1989). Although the sequence of strain FGSC 3921 revealed neither unique nonsynonymous or nonsense mutations, nor frame breaking indels in the region flanked by markers in the 1 Mb region delimited by flanking markers arg-5 and pyr-4 (supporting information, Figure S1), analysis of the genome sequence of FGSC 3921 using BreakDancer (Chen ) revealed a unique 572 base deletion in this region that occurs in NCU03436 (Table 2). PCR amplification of this gene from FGSC 3921 confirmed that the deletion was correctly identified in this strain (Figure 1). NCU03436 is the Neurospora ortholog of cpp-1 (cell-shape-control protein phosphatase) in Fusarium verticillioides where studies indicate an involvement in control of cell shape and fumonisin production. The phenotype for the NCU3436 knockout strain is identical to that of strains containing the classically derived mutation in tng (Figure 1).

Table 2

Mutations identified by whole genome sequencing

FGSC no.	Gene	Putative ORF	Mutation	Annotation
106	com	NCU06508	G->A: P->L	gpip-3
305	amyc	NCU02689	−1:G Frameshift	lrg-1
309	ti	NCU10497	8 bp Deletion	Oligosaccharyl transferase
322	ty-1	NCU00455	2 bp Deletion	ste50
821	ts	NCU01459	−1:T Frameshift	asl-2
1211	dot	NCU00896	−1:A Frameshift	Sac1 (yeast)
1303	fi	NCU04990	−1:A Frameshift	stk-17
1363	smco-1	NCU02762	8 bp insertion	Cch1/trm-11
2261	do	NCU06871	G->A:H->R	β-1,3 glucan synthase
3114	Sk-2	—	—
3246	fs-n	NCU02794	+1:A Frameshift	so, ham-2
3562	mb-1	?
3564	mb-2	NCU00652	W:TGG ->TAG Stop	Hypothetical
3566	mb-3	NCU00658	Premature Stop	Hypothetical
3831	ff-1	NCU01543	+1:C Frameshift	PTAB
3921	tng	NCU03436	572 base deletion	stk-15
7022	fld	NCU09739	−1:G Frameshift	ada-7
7035	per-1	NCU03584	−1:A Frameshift	PKS-7

Figure 1

Morphological characterization of FGSC 3921 and validation of the deletion detected by whole genome sequencing. (A) FGSC 2489 (wt). (B) FGSC 3921 (tng) showing hyperbranching, swollen hyphae and cytoplasmic bleeding at the tips. (C) FGSC 16003 (NCU03436 [stk-15] KO) showing hyperbranching and swollen hyphae. (D) Agarose gel electrophoretic analysis of PCR products from wild type or a tng genomic DNA.

fluffyoid:

Strains with a mutation in fluffyoid (fld) send up aerial hyphae that only very rarely conidiate under normal growth conditions. However, when fld mutants are grown under carbon limitation conditions, conidiation occurs at 25°C but not 34°C (Springer and Yanofsky 1989). Strain FGSC 7022 has over 78,000 SNPs and 8907 indels (Figure S2). Among 918 indels that are unique to strain FGSC 7022, one unique deletion was identified in the region near arg-14 and to the left of his-5 that causes a frameshift mutation in the annotated ORF NCU09739. This deletion is found in all reads covering this region in this strain and is the only frameshift inducing indel in NCU09739 among 18 strains (there is one instance of a 3 base deletion in this ORF in strain FGSC 821). There are no unique nonsynonymous or nonsense SNPs in strain 7022 left of the flanking marker his-5 further supporting the conclusion that the variant in NCU09739 is responsible for the fluffyoid phenotype. NCU09739 encodes a protein predicted to be a Zn(II)2Cys6 type fungal transcription factor and is annotated as all development altered-7 (ada-7) in The Neurospora crassa e-Compendium. Deletion of NCU09739 leads to defects in conidiation, hyphae and female (but not male) fertility(Dunlap ).

compact:

Strains with mutant com make small colonies with restricted radial growth (Perkins and Ishitani 1959). Genetic mapping places the mutation associated with com between ace-2 and ad-4 on LGIII, a span of 741 kb containing only 44 ORFs on supercontig 3. Strain FGSC 106 has over 23,000 SNPs including 1,033 unique SNPs (Figure S3). There is one unique variant in FGSC 106 that occurs in the region delimited by the flanking markers and it occurs in NCU06508. This variant encodes a C629T (ccg to ctg) mutation at the DNA level and a Phenylalanine to Leucine change at amino acid residue 210 out of 1123 and this variant occurred in all 76 reads covering this position. NCU06508 is annotated as a glycosylphophatidylinositol anchor phosphoethanolamine transferase-3 (gpip-3) that is involved in production of glycosylphosphatidylinositol (GPI) anchors between proteins and the plasma membrane. RIP mutants in this gene were found to have defects in morphology at both the colony and microscopic levels of gpip-3 (Bowman ). These defects are very similar to those described for strains containing a mutant in compact.

amycelial:

Multiple phenotypes associated with the mutation amyc are small colony size including, increased cell wall thickness, and non-hyphal, multipolar budding growth under restrictive conditions. Conidiation is considered “conditional” and is dependent in part on carbon source. A single unique deletion was found in strain FGSC 305 among the 166 ORFs in the 742 kb region delimited by the flanking markers (Figure S4). The G at position 3,212,445 in gene NCU02689 is deleted resulting in a frameshift mutation and this deletion is detected in all reads covering this region in this strain. There are no other indels in this ORF among all the strains sequenced and a knockout of this gene has altered sexual morphology (FGSC #11464 Mating Type: a NCU02689.2 Heterokaryon (lrg-1)) (Colot ). Additionally, a conditionally morphological mutant in this gene was identified and was named lrg-1 (Seiler and Plamann 2003). It was subsequently shown that LRG1 is essential for hyphal tip extension and that it plays a role in regulating (through RHO1) β1,3-glucan synthase activity. A mutation in lrg-1 was shown to increase activity of β1,3-glucan synthase as evidenced by decreased sensitivity to caspofungin; it is possible that the increased cell wall thickness of characteristic of amyc grown under restrictive conditions (Coniordos and Turian 1973) is due to increased β1,3-glucan synthase activity.

tiny:

Another morphological mutant, tiny (ti), shows temperature sensitivity and decreased colony size with increasing temperature. There are no nonsynonymous SNPs in strain 309 in the 730 kb region carrying the mutation tiny (ti) (Figure S5) and there is only one unique indel among the 163 ORFs in this region. This 8 base deletion causes a frameshift mutation in the coding sequence of NCU10497 beginning at amino acid 405 out of 750 total amino acids. This frameshift mutation introduces fourteen nonsense codons, beginning at amino acid 445. NCU10497 encodes the oligosaccharyl transferase STT3 subunit. Stt3 is an essential gene in both Saccharomyces cerevisiae and Schizosaccharomyces pombe. Similar to alleles in S. cereviseae, the tiny allele is temperature sensitive. Because stt3 is an essential component of the N-linked glycosylation pathway, it is logical to hypothesize that this causes the decrease in cell wall protein in the ti mutant. A gene deletion mutant of NCU10497 was nonviable as a homokaryon (Dunlap ).

fissure:

Strain FGSC 1303 contains only one unique indel and no unique SNPs in the 1 Mb region between pyr-1 and ace-4 (Figure S6). The deletion is the only indel in the gene NCU04990 and it removes a single nucleotide in the coding sequence causing multiple nonsense codons. This deletion is found in all reads covering this region in this strain. NCU04990 is predicted to encode the ortholog of the vhs/ran1 serine threonine kinase in S. cereviseae. The deletion of this gene in yeast leads to decreased resistance to hyperosmotic stress and accumulation of glycogen. An N. crassa knock-out mutant for this ORF has reduced asexual spore formation and abnormal sexual morphology (Dunlap ).

semicolonial-1:

FGSC 1363 has one unique insertion of 8 bases among the 409 ORFs in the 2 Mb region flanked by mating type and rg-1 (Figure S7). This indel occurs in NCU02762, an ORF that contains domains associated with calcium channels, and which is an ortholog of CCH1 from Saccharomyces cerevisiae. In S. cerevisiae, CCH1 mutants have impaired ability to take up Ca2+ (Peiter ) in response to α mating factor as well as increased sensitivity to some stress conditions. A deletion of the cch1 ortholog from Fusarium graminearum leads to a growth phenotype reminiscent of smco-1 (Hallen and Trail 2008). A Neurospora gene deletion mutant of NCU02762 was only recovered as a heterokaryon suggesting that a homokaryotic deletion would be lethal. All of the SNPs in this region are either shared, or occur in ORFs that have polymorphisms in multiple strains.

dot:

The mutation known as dot is in a 1.7 Mb region on LG IR that contains 415 ORFs. The only unique variant in this region in strain FGSC 1211 is the deletion of one T from NCU00896 (Figure S8), which is detected in all reads covering this sequence in FGSC 1211. As with other morphological mutants, a gene deletion mutant of this ORF was only recovered as a heterokaryon, suggesting that it may be essential for growth or sexual reproduction. This ORF encodes a phosphatidylinositol phosphate (PtdInsP) phosphatase and is the ortholog of SAC1 from S. cerevisiae. Mutations in this gene have been implicated in growth regulation in yeast, as well as in hereditary disease in humans (Manford ). In yeast, sac1p is localized to the ER and plays an important role in actin cytoskeleton organization, cell wall synthesis, Golgi function, lipid metabolism and vacuole morphology through regulation of phosphatidylinositol 4-phosphate levels (Rivas ).

doily:

The mutation doily (do) was deposited to the FGSC collection in 1972 and is characterized by colonial morphology that is sensitive to the carbon source in the culture medium. FGSC 2261 carries the only allele known of the mutation doily (do). While this strain has a relatively high number of polymorphisms (Table 3, Table 4, and Figure S9), most of the SNPs are not unique or occur in ORFs that have other polymorphisms in multiple strains. However, one unique nonsynonymous SNP (A->G at position 3,652,079) was found in the ORF designated NCU06871. This variant had the maximum possible quality score and changes a histidine to an arginine at position 1,246 of the predicted protein. NCU06871 encodes a glycoside transferase 48 family protein involved β-1,3 glucan synthesis. The ortholog of this gene in Aspergillus fumigatus has been shown to be essential. Similarly, strains of Fusarium solani in which RNAi has been used to decrease expression of its ortholog show defects in morphology (Beauvais ; Ha ).

Table 3

Types of single nucleotide polymorphisms among 18 strains of Neurospora crassa

Strain	Total	NC	Synonymous	Non Synonymous	3′	5′	Int	Nonsense	% NC	% Syn	% Non Synonymous	% Nonsense
106	23579	14110	4536	2269	851	494	1143	18	59.84	32.15	9.62	0.08
305	90195	49883	18740	8417	4875	2560	4709	67	55.31	37.57	9.33	0.07
309	13274	9949	1725	667	321	144	414	11	74.95	17.34	5.02	0.08
322	142489	88879	24733	11525	6692	3112	6413	95	62.38	27.83	8.09	0.07
821	188346	112490	35810	17193	8458	4177	8825	122	59.73	31.83	9.13	0.06
1211	20493	16100	2263	1146	339	187	380	19	78.56	14.06	5.59	0.09
1303	59356	41212	8773	4330	1828	816	2119	35	69.43	21.29	7.29	0.06
1363	146641	71259	35955	15999	8535	4463	8685	137	48.59	50.46	10.91	0.09
2261	44839	25265	9894	4507	1772	911	2111	37	56.35	39.16	10.05	0.08
3114	41085	24550	9176	3186	1461	751	1626	31	59.75	37.38	7.75	0.08
3246	21533	14301	3405	1556	839	404	843	14	66.41	23.81	7.23	0.07
3562	106533	57767	22686	10480	5800	2830	5862	78	54.22	39.27	9.84	0.07
3564	47981	31886	7821	3702	1661	777	1737	36	66.46	24.53	7.72	0.08
3566	37516	29134	3962	1875	806	354	1198	27	77.66	13.6	5	0.07
3831	22961	17290	2563	1369	523	253	792	15	75.3	14.82	5.96	0.07
3921	80311	47446	16023	7557	3158	1605	3901	84	59.08	33.77	9.41	0.1
7022	78991	51944	12585	5843	3110	1602	3270	45	65.76	24.23	7.4	0.06
7035	18487	13174	2656	1160	472	241	686	13	71.26	20.16	6.27	0.07

Table 4

Indel distribution among 18 strains of Neurospora crassa

Strain	Total Indels	Unique Indel	CDS Indels	NC Indels	Intron	Splice Site	Other
106	3607	136	294	2636	289	7	381
305	13335	816	1014	9492	1052	18	1759
309	1959	123	137	1466	151	3	202
322	15105	2001	701	11182	1236	22	1964
821	21224	6941	919	16066	1637	27	2575
1211	1494	100	103	1141	106	3	141
1303	6110	501	321	4746	413	7	623
1363	24952	5213	2091	17573	1997	31	3260
2261	7059	546	600	5143	584	6	726
3114	4211	3468	246	2971	369	6	619
3246	2310	89	124	1781	146	4	255
3562	13753	1920	741	10131	1119	19	1743
3564	4960	116	274	3747	387	8	544
3566	3714	471	199	2933	261	6	315
3831	2450	140	138	1922	168	4	218
3921	9743	979	581	7391	748	10	1013
7022	8907	918	471	6765	625	8	1038
7035	1869	93	134	1386	148	2	199

female fertile-1:

female fertile-1 (ff-1) was originally isolated as a mutant displaying a defect in female fertility. It was later shown to be allelic with a spontaneous mutant (glycerol phosphate-1; glp-1) that was characterized by its ability to efficiently conidiate and grow on glycerol as a carbon source. This mutation lies in a 675 kb region of chromosome 2 that includes 172 ORFs and that has very few polymorphisms in strain FGSC 3831 (Figure S10). In this strain, NCU01543 has a unique insertion of 1 base that introduces a frameshift mutation causing multiple stop codons, the first of which is located at position 113 out of 762. The insertion was identified in all reads covering this region in strain FGSC 3831. An orthologous gene in Magnaporthe grisea has been characterized (Li ); the gene, LDB1, is so named because it encodes a predicted LIM-binding-domain. LIM domains are involved in protein–protein interactions (Kadrmas and Beckerle 2004). The phenotypes associated with deletion of this gene in M. grisea are a lack of asexual and sexual spore production, easily-wettable colonies and an inability to develop appresoria.

female sterile-n:

Female sterile-n (fs-n) is on LG 1 far from mat (35%; 45%) and near T(4637) al-1, a region of 80 kb containing only 18 ORFs (Figure S11). This mutation was characterized in 1974 (Mylyk and Threlkeld 1974) and the original description of this mutant suggested that there could be two closely linked lesions responsible for the lack of female fertility in strains carrying this trait. The genome sequence of strain FGSC 3246 revealed a unique insertion of one A into NCU02794 at position 9,112,074 and this insertion was detected in all reads covering this region in this strain. While there are no other indels in strain FGSC 3246 in the 80 kb region delimited by the flanking markers there are a number of SNPs in this region, although most are shared among multiple strains. Although NCU02794, known variously as ham-2 or so (Table 2), has multiple nonsynonymous SNPs in other strains, further examination of the vegetative of morphology of FGSC 3246 indicates that is has a “soft” like phenotype. The protein encoded by NCU02794 contains a WW domain and is involved in hyphal fusion and was localized to septal plugs (Fleissner ). A knockout of this ORF is female sterile and has abnormal vegetative morphology (Dunlap ).

tyrosinaseless-1:

The tyrosinaseless-1 (ty-1) mutant phenotype was initially described as female infertile, “velvet” (short aerial hyphae) and tyrosinaseless (Horowitz ). Further analysis showed that tyrosinase was inducible under some conditions. Strain FGSC 322 carries over 1,400 indels in the 1.5 Mb region genetically shown to carry the mutation tyrosinaseless-1 (Figure S12). Of these, 69 are in coding sequence but only three are unique to this strain. Of these three, NCU00240 exhibits frameshift inducing indels in other strains, suggesting that it is not responsible for the tyrosinaseless phenotype. NCU00403 and NCU00455 each exhibit unique frameshift inducing indels and both deletions are strongly supported in the sequence data. While gene deletion mutants are available for both, the phenotype for the gene deletion of NCU00403 was characterized by the Neurospora Program Project and exhibited “normal” growth and sexual development, including pigmentation of perithecia and ascospores (Dunlap ). Thus, the likely gene associated with the ty-1 phenotype is NCU00455, which encodes the ortholog of S. cerevisiae Ste50, a scaffold protein that connects the mitogen-activated protein kinase (MAPK) cascade with cell cycle machinery. The MAP kinase signaling cascade regulates a significant number of processes in fungi including the regulation of tyrosinase (Park ). There are three MAP kinase cascades in Neurospora crassa that overlap in function and are involved in many biological processes, including cell morphology, conidiation, mating, and osmotic stress response (Borkovich ).

male barren-1:

The mutation known as mb-1 is on LG VII, and the strain carrying this mutation, FGSC 3562, has 741 indels in coding sequences of which 71 are on super contig 7 (Figure S13). Four of the 71 indels are unique to strain FGSC 3562 and three introduce frameshift mutations. Two deletions occur in NCU02251 and both are deletions of one base (at 1,076,703 and 1,076,825). However, a homokaryotic gene deletion mutant of this strain is available suggesting that this ORF is not responsible for the phenotype in FGSC 3562. The second ORF carrying a unique frameshift mutation in strain FGSC 3562 is NCU06930, a hypothetical protein with only limited orthology among filamentous ascomycetes. This same ORF has multiple nonsynonymous SNPs in strain FGSC 821. Four additional annotated ORFs have non-unique indels with NCU11995 having thirteen distinct indels in strains 3562 and 821. Although strain 3562 has no unique nonsense codons on contig 7 it has 800 nonsynonymous SNPs on contig 7 of which 13 are unique to this strain. Of these, NCU11995 has multiple nonsynonymous SNPs. This ORF encodes a C6 zinc finger domain-containing protein. Additionally, NCU06102 has two unique nonsynonymous SNPs although a homokaryotic gene deletion mutant for this ORF is available. Other ORFs containing unique nonsynonymous SNPs are in ORFs that contain other nonsynonymous SNPs in other strains. Because of the ambiguity presented by these multiple polymorphisms, it is not possible to uniquely identify one as being responsible for the male barren phenotype in strain FGSC 3562.

male barren-2:

Although there are no indels or nonsynonymous SNPs in the region carrying the male barren-2 trait in strain FGSC 3564 (Table 1 and Figure S14), there is one unique nonsense mutation in this region in strain FGSC 3564. The G to A substitution occurred in all reads covering this position and has a quality score of 111. This variant encodes a unique nonsense codon in NCU00652 that is the only unique variant in this ORF among all 18 strains. The nonsense SNP in strain FGSC 3564 occurs at amino acid 659 out of 1188. Only four other strains display any variants in this ORF and none of these SNPs are unique.

male barren-3:

The last of the male-barren strains in the current investigation is mb-3, which is on linkage group IR near al-1 and mb-2 (Table 1). There is one unique nonsense mutation in NCU00658 in strain FGSC 3566 (Figure S15) and it occurs with high confidence. The nonsense mutation occurs at amino acid 219 out of 4008. A gene deletion mutant for this ORF has normal morphology, but is male barren.

tan spore:

tan spore (ts) is found in strain FGSC 821, which was deposited into the FGSC collection in 1961, and is described as having been a spontaneous mutation in strain 4a, the so called Emerson lineage (Nakamura and Egashira 1961). The phenotype for ts is light colored and/or immature ascoscores that do not germinate. The ts mutation was used in studies that showed that the majority of N. crassa perithecia originate from fertilization by a single male spore each. Strain FGSC 821 has the greatest nucleotide divergence relative to the reference genome (Table 3, Table 4, and Figure S16) of all the strains characterized in the present study. Nevertheless, no unique SNPs, and only one unique indel, occur in coding sequence in the 2 Mb region flanked by the markers inl and cen-V (Table 1). The unique mutation is a deletion of one base leading to frameshift in NCU01459 (Table 2) and this deletion is detected in all reads covering this region in strain FGSC 821. NCU01459 is a BZIP transcription factor whose deletion leads to ascospore inviability. The deletion of NCU01459 leads to immature tan ascospores and was noted in (Colot ) who named the gene ascospore lethal-2 (asl-2).

perithecial-1:

Analysis of FGSC 7035 genome sequence led to the identification of a unique 1bp deletion in the gene coding region of NCU03584 as well as two ORFs carrying unique nonsynonymous SNPs among the 1,033 ORFs in the region flanked by the markers ilv-1 and asn-1 (Figure S17). While the deletion in NCU03584 is the only variant in this ORF, both of the nonsynonymous SNPs occur in ORFs that have multiple nonsynonymous SNPs or frameshift causing indels in other strains. To validate the identification of NCU03584, and the identification of unique variants by whole genome sequencing, NCU03584 from FGSC 7035 was sequenced manually and the deletion was identified by this approach as well (data not shown). The deletion in NCU03584 occurs at nucleotide 114 resulting in a stop codon at nucleotide 162 in a 7271 nucleotide ORF. NCU03584 is the likely ortholog of the polyketide synthetase responsible for perithecial pigmentation in the related fungus, Sordaria macrospora (Engh ).

Summary of results for combined strain set

Eighteen genomes were sequenced to an average depth ranging from 28 to nearly 108- fold coverage (Table 5). Sixteen strains were sequenced entirely with paired-end reads producing between 27 and 90 million paired end reads resulting in an average sequence depth ranging from 29X to 108X. Two strains, FGSC 3566 and FGSC 3831 had over 20 million single-end reads in addition to over sixty-nine million paired-end reads. While strain FGSC 7035 generated over 90 million reads, many of the reads were of low quality and only 52% were mapped to the genome. Thirteen strains were sequenced with an average read length of 35 bases while five strains were sequenced with an average read length of approximately 75 bases. There were no significant differences in the ability to map these reads to the genome for either approach and each genome was compared to the over 41 million bases of the reference genome. The portion of the genome that was aligned and compared to the reference genome ranged from 99% of the reference genome size in the most conserved strain to 81% for the most diverged strain. Since multi-alleleic calls are not expected in a haploid genome, 240 coding multi-alleleic sites and 1100 non-coding multi-alleleic sites were examined to determine what might have caused these calls. Out of 240 coding sites examined, 53% were caused by indels that were too large for maq to properly align the reads, 36% were in reads that had low mapping quality scores and/or were orphan pairs, suggesting that the read could have been misaligned, 9% appeared truly homozygous, and 2% could not be explained. For non-coding sites 4% could be explained by missed indels, 93% were in reads that had low mapping quality scores and/or were orphan pairs, suggesting that the read could have been misaligned, and 3% appeared truly homozygous. Preprocessed data from this project are available via GenBank (Table 1), or at http://genome.jgi-psf.org/Neucr1/Neucr1.home.html and tables with the processed data are available as supplemental data.

Table 5

Sequencing statistics

Strain	Average Depth	No. SE	% SE Mapped	No. PE	% PE Mapped	% PE Pairs	Read Length	% Ref. Covered	No. Homozygous SNPs	No. Multi-Allelic SNPs	No. Homozygous Indels
106	57.74	0	NA	38170318	84	97	75	97	23083	496	2974
305	107.95	0	NA	62988628	94	97	76	95	88337	1858	11242
309	73.5	0	NA	43519088	95	96	75	99	12793	481	1502
322	30.74	0	NA	39552812	94	95	35	95	118890	23599	13171
821	28.6	0	NA	36997202	93	94	35	93	163792	24554	18872
1211	29.85	0	NA	35860788	98	98	35	99	10322	10171	1214
1303	29.93	0	NA	36903172	97	96	35	98	42851	16505	5376
1363	33.71	0	NA	27162118	69	97	75	86	145722	919	21906
2261	73.42	0	NA	49455190	88	90	76	95	44123	716	6019
3114	28.82	0	NA	35889678	95	96	35	98	27169	13916	3783
3246	32.31	0	NA	39347694	99	95	35	99	17290	4243	1992
3562	32.22	0	NA	39097638	98	98	35	96	99284	7249	12009
3564	32.39	0	NA	39134336	98	97	35	98	33587	14394	4221
3566	83.31	27708365	97	69894872	98	96	36	99	23580	13936	3195
3831	80.55	22353786	92	72379318	99	97	36	99	12885	10076	2022
3921	31.19	0	NA	38460204	96	97	35	96	73150	7161	8564
7022	47.17	0	NA	55991650	97	96	36	97	61278	17713	7634
7035	34.23	0	NA	90007810	52	86	35	99	13310	5177	1557

SE, single end reads; PE, paired end reads.

SE, single end reads; PE, paired end reads. Because most of the target mutations were flanked by genetic markers that are already identified to the level of the gene sequence, the amount of the genome that require examination for each strain ranged from a low of 74 kilobases (kb) in strain FGSC 305 to over 4 megabases (Mb) in strain FGSC 3562 (Table 1). The average was just over 1.7 Mb, although for seven strains the amount of genome space that needed to be searched was less than 1 Mb. The number of annotated open reading frames provides another measure of the amount of genome space that must be searched to identify the mutation responsible for the relevant phenotype. For the strains in the current project this number ranged from a low of 18 ORFs in the 80 kb between flanking markers in strain FGSC 3246 to 1,433 ORFs in the 5.8 Mb to the right of cen-1 in strain FGSC 3566 (Table 1). In the 18 genomes, there are 1,184,610 identified single nucleotide variants (Table 4). Of these, 1,137,606 (96.03%) map to contigs 1-7 (Table 6). Among these, there are 527,620 distinct positions with identified variants in all 18 strains. 244,043 of these variants occur only in one strain. 96 variants are identified as occurring in all 18 genomes, suggesting that the actual polymorphism belongs to the reference genome (Table 7). The distribution of variants among the eighteen strains varies from a high of over 188,346 to a low of 13,274 (Table 3), while the number of unique SNPs ranged from a high of 73,336 to a low of 704 (Figure 2 and Table 8). Overall, there were nearly three times more transition mutations than transversions (Table S1 in File S1). The distribution of SNPs in the 100 kb region directly flanking each mutation was calculated, separating shared and unique SNPs, and there was no apparent association between either whether the mutation was spontaneous or induced, or which mutagen was employed (Table S2 in File S1), suggesting that strain lineage and back-crossing are more important than the mutagen in influencing variant distribution. Remarkably, there are only 413 ORFs, out of nearly 10,000, that have no polymorphisms among the 18 strains (Table S3 in File S1). Among these are 274 conserved hypothetical proteins, hypothetical, or predicted proteins, but also several ribosomal protein genes, histone genes, a mating type gene, and the frequency gene that is central to the cell's circadian rhythm.

Table 6

Distribution of SNPs among the supercontigs

Strain	Contig 1	Contig 2	Contig 3	Contig 4	Contig 5	Contig 6	Contig 7	Contigs > 7
106	5044	947	3607	2250	853	391	9186	1301
305	38046	2537	8514	1019	26797	857	8075	4350
309	4943	2055	891	592	896	454	604	2839
322	29658	7256	28243	12754	24790	16428	22415	946
821	45205	12122	16891	41336	15980	30781	19517	6514
1211	4379	2207	4267	1635	1094	1608	3633	1670
1303	7227	13847	10817	4606	5237	12333	3341	1947
1363	43626	21549	19525	8625	32371	3402	17307	236
2261	1304	789	10200	3545	261	10966	15258	2516
3114	2725	2165	27516	1868	1614	1710	1731	1756
3246	7811	845	792	2606	6415	482	547	2035
3562	9178	4064	19381	28334	20810	13947	8024	2795
3564	5492	3142	1510	4691	16787	12046	1828	2485
3566	8701	5056	1402	9660	4716	1698	1570	4713
3831	2410	1584	6176	1187	2824	1177	5243	2360
3921	16641	2307	17416	9018	14521	5537	12038	2833
7022	32843	3897	1749	3105	11562	11989	10164	3682
7035	6040	1396	3145	3668	675	668	869	2026

Table 7

Frequency of individual SNPs among eighteen strains of Neurospora crassa

No. Occurrences	No. SNPs
1	227988
2	115667
3	68186
4	40033
5	24418
6	13026
7	5285
8	3817
9	2912
10	769
11	559
12	68
13	49
14	33
15	43
16	48
17	42
18	96

Figure 2

Table 8

Number of unique SNPs in each strain

Strain	Unique SNPs
821	73336
3114	29263
1363	21607
3562	16432
7022	15721
3921	9433
1303	9355
305	6653
3566	5306
3564	4612
3246	4253
322	3496
1211	3468
2261	3227
7035	2714
3831	1962
106	1033
309	704

Unique single nucleotide polymorphisms (SNPs) in 18 strains of Neurospora crassa. The number of unique SNPs (Table 8) was plotted for each strain and strains were sorted based on the number of unique SNPs. For this analysis, SNPs that occurred more than once in the dataset were not included. There are 62,952 distinct indels (compared to the reference genome) present in the 18 strains analyzed (Table 4 and Table 9). Of these indels, 25,894 occur in only one strain (Table 9). Because many indels are present in multiple strains, the total number of indels mapped to the genomes is 146,762. Strains exhibit different numbers of indels (Table 4) ranging from a low of 1,494 in strain FGSC 1211 to a high of 24,952 in strain FGSC 1363. Thirteen strains have fewer than 10,000 indels and nine strains have fewer than 5,000 indels. Most strains have ten times more non coding indels than coding indels (Table 4). Indels affecting the splice junction are a small fraction of all indels, ranging from 1 to 3% of coding sequence indels in each strain. Of the distinct indels, 16,682 are repeated twice and 8,818 are repeated three times (Table 9). Relatively few distinct coding sequence indels are found more than ten times and there are more distinct indels that are present in all 18 strains than are in ten to seventeen strains combined, suggesting that the indels that are found in all 18 strains actually represent indels present in the reference genome relative to a putative consensus genome. 1,352 annotated ORFs have one indel and 495 contain two indels (Table 10). 198 ORFs have three indels and 96 ORFs have four indels. Fewer than 100 ORFs in total have more than five indels.

Table 9

Indels repeated among multiple strains

No. Indels	Present in How Many strains	No. CDS
25894	1	1885
16682	2	1168
8818	3	481
4968	4	284
3235	5	167
1586	6	69
640	7	39
453	8	25
347	9	17
108	10	2
56	11	4
25	12	0
22	13	0
14	14	1
15	15	1
13	16	1
19	17	2
57	18	15

Table 10

Number of indels per ORF among eighteen strains of Neurospora crassa

No. ORFs	Number of Indels
1352	1
495	2
198	3
96	4
55	5
31	6
19	7
9	8
3	9
4	10
3	11
0	12
4	13
1	23

The distribution of indel sizes was found to be biased, both with regard to size and distribution in the genome. Indels of size +4 or −4 nucleotides were more common than would be predicted (Figure 3) and are dominated indels of a few specific sequences. Though neither is overrepresented in the genome, TACC and TAGG, often tandemly repeated, comprise 30% of the tetrameric indels. Indels that are +/− 3 or are a multiple of 3 nucleotides are up to ten times more common among coding sequence than in non-coding regions (Figure 4). Over 24% of all indels of three nucleotides occur in protein coding sequence, whereas less than two percent of indels of two or four nucleotides occur in protein coding sequence. The size of indel that is identifiable by the maq software is dependent upon the length of the read, with insertions being harder to identify. To determine the effect of read length on the ability to identify small indels, we altered the reference of supercontig 1 at evenly spaced intervals to simulate indels, aligned un-altered data to the altered reference, and then determined if maq was able to correctly identify the simulated indels. In general insertions are harder to identify than deletions, and longer reads allow more indels to be identified (Figure S18). BreakDancer (Chen ) was used to identify larger events. It is known that this program will report false positive calls, especially for genomes that are repetitive, thus validation of predicted structural variation was necessary, as for FGSC 3921, above, although Neurospora has relatively low levels of repeated sequence (Galagan ).

Figure 3

Figure 4

Indel size frequency in coding sequence among eighteen strains of Neurospora crassa. The total number of indels of a given size that occur in coding sequence was compared to the total number of indels of the same size and the resulting fraction was multiplied by 100 (Table S5 in File S1).

Distribution of unique indel sizes among eighteen strains of Neurospora crassa. The number of indels (Y axis) is plotted vs. the size of the indels (X axis). Insertions and deletions are pooled to give a single value for each size (Table S4 in File S1). Indels of size four are strongly overrepresented. Indels that are a multiple of three are also overrepresented. While indels may occur in multiple strains, each indel was only counted once to generate this dataset. Indel size frequency in coding sequence among eighteen strains of Neurospora crassa. The total number of indels of a given size that occur in coding sequence was compared to the total number of indels of the same size and the resulting fraction was multiplied by 100 (Table S5 in File S1). Finally, the distribution of all polymorphisms is not uniform and demonstrates that different regions of each chromosome are inherited as blocks, or haplotypes (Figure S1, Figure S2, Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, Figure S8, Figure S9, Figure S10, Figure S11, Figure S12, Figure S13, Figure S14, Figure S15, Figure S16, Figure S17, and Figure S19). It is suggested that such a block occurs around the mating type locus (Pomraning ; Wik ), which is on supercontig 1 at approximately 1,850,000. While most strains have either uniformly low or uniformly high divergence from the reference genome (e.g., strains 106 or 821 in Figure S3 and Figure S16, respectively) in this region, strains that show abrupt changes from high divergence from the reference genome to low divergence from the reference genome in this region delimit a sub-region from position 1,840,000 on the left (Strain 7035) to position 2,280,000 on the right (in multiple strains), which has no shifts in the level of divergence from the reference genome (Figure 5). This sub-region contains 116 annotated ORFs (Colot ) and includes 440 kilobases of DNA sequence. Immediately adjacent to the right end of this sub-region (from 2,280,000 to 2,410,000) is another sub-region with a high number of unique SNPs (in 15 strains). This sub-region contains 1,677 unique SNPs and 2,721 shared SNPs. Of these, 4,015 SNPs are non-coding, 123 are synonymous, and only 80 are nonsynonymous. The density of unique SNPs in this region is 13 per kb, as compared to the overall unique SNP density, which ranges from 0.02 per kb (in strain FGSC 309) to 1.8 per kb (in strain 821). The mat a region is defined by high a region of high SNPs from 1,320,000 to 1,940,000 in strain FGSC 3246 (the only mat a strain, shown in Figure 5A).

Figure 5

Putative mating type haplotype region. (A) FGSC 3246, (B) FGSC 7035, (C) FGSC 3562, (D) FGSC 3566. Distribution of SNPs and indels on the left arm of supercontig 1 in representative strains. The region from the left telomere through 2.5 million bases is shown. Total SNPs are plotted in blue. SNPs that are unique to each strain are plotted in red. Indels are plotted in green. The total number in a 10 Kb moving window is plotted on the Y axis. The X axis corresponds to the position along the contig. Strain 3246 is mat a while strains 7035, 3562, and 3566 are mat A.

Nonsense mutations

Four hundred and five distinct nonsense mutations were detected in the current data set for a total of 884 nonsense codons. The number of nonsense mutations varies among strains from a low of 11 in strain FGSC 309 to a high of 137 in strain FGSC 1363 (Table 11). Of these 405 different nonsense mutations, 188 occur once each. Nonsense SNPs occur in 337 different NCUs of which 175 occur in ORFs that have been deleted by the functional genomics program. While many of the ORFs carrying nonsense mutations are hypothetical or putative genes, fifteen of the genes carrying nonsense mutations have annotations in the Neurospora gene compendium, including genes such as sad-2 (NCU04294; (Shiu )), or so (ham-1, NCU02794; (Fleissner )). Only four strains exhibit any variants in sad-2, including two strains, FGSC 821 and FGSC 3921, which feature a nonsense mutation that causes a stop codon at amino acid position 208 out of 1098. This same ORF has no indels in any strains analyzed in the current work.

Table 11

Number of nonsense mutations among eighteen strains of Neurospora crassa

Strain	Nonsense SNP
106	18
305	67
309	11
322	95
821	122
1211	19
1303	35
1363	137
2261	37
3114	31
3246	14
3562	78
3564	36
3566	27
3831	15
3921	84
7022	45
7035	13

Four strains carry nonsense mutations at amino acid position 21 in NCU09544 (pod-2; (Seiler and Plamann 2003)). Two of these, strains, FGSC 305 and FGSC 1363, have morphological phenotypes, while a third putatively carries a mutation in a regulatory protein, and the last is a male barren mutant. While there are multiple alleles of NCU09544 present among these four strains, all alleles carry the same nonsense mutation and vary in the number SNPs and the sizes of indels present. NCU09544 has been deleted by the Neurospora Functional Genomics program and the resulting mutant (FGSC 12737) has no phenotype (Dunlap ) suggesting that the temperature sensitive pod-2 allele previously isolated (Seiler and Plamann 2003) is actually a defective interfering mutation (Fujimura ). Multiple nonsense mutations in single ORFs led us to identify two genes that appear to have been RIPed (Freitag ), NCU09968 and NCU09969. Neither gene is well conserved across sequenced fungi, although there does appear to be an expansion of putatively NCU09969 related genes in Chaetomium globosum. These two ORFs are adjacent on supercontig 6 and the co-occurrence of polymorphisms in around these two ORFs suggest that they are co-inherited. Neither have been deleted by the functional genomics program. Of the 405 nonsense mutations, 313 (77%) occur in ORFs annotated as conserved hypothetical proteins, 21 as ”hypothetical protein” and 24 as “predicted protein.” There are 9,734 total ORFs in the dataset, and 6,481 (48%) are conserved hypothetical proteins. Comparing the ratio of nonsense mutations among conserved hypothetical proteins to the ratio of conserved hypothetical proteins among all proteins shows that conserved hypothetical proteins are overrepresented among the ORFs carrying nonsense mutations by nearly 30%.

Introgressed region in strain FGSC 3114

The genome sequence of the strain carrying the introgressed meiotic drive element, Sk-2, has relatively low divergence from the reference genome in terms of SNPs (Table 3) and indels (Table 4). The variants are predominantly seen on supercontig 3, consistent with the genetic identification of the meiotic drive element (Table 6 and Figure S19). The genome sequence data clearly shows that the introgressed region does not include the left telomere on contig 3, but does span the centromere (Figure 6). The number of SNPs on supercontig 3 ranges from a low of 792 in strain FGSC 3246 to a high of 28,243 in strain 322 (Table 6). This variability is not evenly distributed with strains FGSC 322 and FGSC 1363 sharing most SNPs while other strains, such as FGSC 106 and FGSC 305 share SNPs only on the right end. While there are large numbers of SNPs and indels in the introgressed region, in the absence of a complete sequence for N. intermedia, their identity or distribution do not provide significant insight into the nature of the Sk-2 element.

Figure 6

Distribution of polymorphisms on supercontig 3. (A) Strain FGSC 3114 containing the introgressed Sk-2 region. (B) Strain FGSC 106 with a vertical arrow showing the location of com. (C) Strain FGSC 305. (D) Strain FGSC 1363. Total SNPs are plotted in blue. Unique SNPs are plotted in red. Indels are plotted in green. Polymorphisms were sorted by supercontig and position and the total number in a 10 Kb moving window is plotted on the Y axis. The X axis corresponds to the position along the supercontig (X 10 Kb).

Spontaneous mutation rate

The ability to sample genome sequence of strains that have been highly back-crossed provides an estimate of the upper limit of the rate of spontaneous mutations in sexual reproduction (µ) (Baer ). To generate this estimate, the number of SNPs unique to each strain was calculated (Table 8). The distribution of unique SNPs among strains follows a generally Poisson distribution (Figure 2). Strains 106 and 309 were deposited into the FGSC collection in 1960 and had been backcrossed into the reference genome three times. These strains have the lowest number of unique SNPs and hence provide the best estimate for the maximum value of the background mutation rate in the organism. The total sequenced genome size in the current dataset is 41,061,603, and strain FGSC 309 has 704 unique SNPs, which translates to a mutation rate of 1.71 × 10−5, whereas strain FGSC 106 has 1,033 unique SNPs, which translates to a mutation rate of 2.5 × 10−5. These values are two orders of magnitude lower than the rate suggested in (Drake ) who reported a rate of 0.003 for N. crassa. Adding in the unique indels does not significantly change this calculation. Strain FGSC 309 has 123 unique indels and adding these to the 704 unique SNPs gives an estimate for the maximum value of µ = 2 × 10−5. While some of the difference in value could be explained by false base calls in our data set, it is doubtful that the false call rate would account for the magnitude of this difference.

Discussion

Our sequencing and analysis has led to the putative identification of sixteen of the seventeen mutations in the strains sequenced, and delimited an introgressed region in the eighteenth strain. Because the sequence of each strain can be evaluated in the context of the sequence of eighteen other strains, the ability to evaluate causative sequence polymorphisms was markedly enhanced. Moreover, since each mutant strain has a defined phenotype that segregates in a limited region of the genome, any sequence polymorphisms that are outside that defined region could be excluded from the specific analysis to identify the lesion responsible for the phenotype. Within the genetically defined region, each sequence polymorphism was evaluated relative to the eighteen other sequences. If a polymorphism occurred in a strain that did not display the mutant phenotype, it was presumed that the sequence polymorphism is not responsible for the phenotype. This procedure constituted, is in one manner of speaking, an in silico bulk analysis. The ability to recover the same mutation by manual sequencing, as in FGSC 7035 or by PCR in FGSC 3921 (Figure 1), provided technical validation of the whole genome sequencing approach. Additionally, these evaluations of putative mutations were strengthened by data from the Neurospora Functional Genomics program (Dunlap ), which has generated knockout strains for 7,669 genes (at the time of writing). Of these 7,699, 1,271 are only available as heterokaryons. These heterokaryons were not able to be purified by crossing to a wild type strains, suggesting that the deleted ORFs are essential, or at least essential for sexual reproduction. The availability of these mutants as viable homokaryons or as heterokaryons facilitates evaluation of whether the polymorphisms identified by whole genome sequencing would likely be responsible for the phenotypes seen in the mutants sequenced in the current project. Five mutations were associated with knock-out strains exhibiting the same phenotype and seven mutations exhibit similar phenotypes to the phenotypes of classical mutants in the same gene in Neurospora, or in related organisms. While complementing each mutation is outside the scope of the current study, the ability to associate an otherwise anonymous sequence variant with characterized mutations in Neurospora or related organisms validates the approach taken in the current project. The distribution of sequence divergence among these eighteen strains is highly related to the strain histories. Strains that are closely related to the reference genome strain have the lowest sequence divergence relative to the reference genome, while those that were generated in a different background and not crossed into the reference genome background have the highest sequence divergence. For example, strain FGSC 821 is in the Emerson lineage and this strain has the highest sequence divergence from the reference of all the strains. Strains FGSC 106 and FGSC 309 were both deposited into the FGSC collection in 1960, but both bore mutations that had been induced in the reference genome background, and both were backcrossed three times into the reference genome background. Both of these strains have very low sequence divergence from the reference. However, the expectation that the mutations would be found in small regions of relatively high divergence was not supported. Instead, each strain has small regions of high divergence that have the appearance of blocks of DNA that has been inherited together, or haplotypes (Figure S1, Figure S2, Figure S3, Figure S4, Figure S5, Figure S6, Figure S7, Figure S8, Figure S9, Figure S10, Figure S11, Figure S12, Figure S13, Figure S14, Figure S15, Figure S16, and Figure S17). The characterization of a region surrounding the mating type locus provides a starting point for additional analysis of haplotype distribution among these strains. The distribution of indels, and especially those whose size is a multiple of three, was highly biased in coding regions (Figure 4). This distribution is similar to the distribution seen in the human genome (Messer and Arndt 2007). Although the present analysis does not include a characterization of the nature of indels, beyond their size and location, the abundance of tetrameric indels was unexpected. Nonsense mutations were identified in a surprisingly large number of exons, which, when considered in the context of the large number of numerous nonsynonymous SNPs, emphasizes the observation that most genes can be deleted with no visible phenotype (Dunlap ). The number of nonsense mutations and frameshift inducing insertions and deletions suggest that most of the classical mutant strains carry significant cryptic second-site mutations. Thus careful evaluation of historical functional analysis of classical mutations is warranted. Furthermore, the value of backcrossing a strain into a well characterized genetic background is strongly validated. Our analysis of genomic polymorphisms among eighteen strains of Neurospora crassa has led to identification of the nature of mutations associated with historically described phenotypes. It has also associated some of these classical mutants with other strains that have been characterized both in terms of phenotype and genotype. The ability to evaluate these polymorphisms in the context of classical genetic analysis as well as in the context of a functional genomics program has leveraged the whole genome analysis. The value of the in silico bulk sequence analysis is demonstrated by the facility with which polymorphisms were associated with phenotypes. The added value of the whole genome resequencing has included a characterization of SNP distribution, the identification of unexpected nonsense mutations, and has revealed the strong bias in the size of insertions and deletions relative to their location in coding vs. non-coding sequence. While genome resequencing is becoming more accessible, the ability to find meaningful information in this data, especially when combined with both classical and modern genetic data, is demonstrated by the present approach.

49 in total

1. The genetic basis of cellular morphogenesis in the filamentous fungus Neurospora crassa.

Authors: Stephan Seiler; Michael Plamann
Journal: Mol Biol Cell Date: 2003-09-05 Impact factor: 4.138

Review 2. The LIM domain: from the cytoskeleton to the nucleus.

Authors: Julie L Kadrmas; Mary C Beckerle
Journal: Nat Rev Mol Cell Biol Date: 2004-11 Impact factor: 94.444

3. A high-throughput gene knockout procedure for Neurospora reveals functions for multiple transcription factors.

Authors: Hildur V Colot; Gyungsoon Park; Gloria E Turner; Carol Ringelberg; Christopher M Crew; Liubov Litvinkova; Richard L Weiss; Katherine A Borkovich; Jay C Dunlap
Journal: Proc Natl Acad Sci U S A Date: 2006-06-26 Impact factor: 11.205

4. Linkage Data for Group III Markers in Neurospora.

Authors: D D Perkins; C Ishitani
Journal: Genetics Date: 1959-11 Impact factor: 4.562

5. Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors: Heng Li; Jue Ruan; Richard Durbin
Journal: Genome Res Date: 2008-08-19 Impact factor: 9.043

6. A gene controlling the early development of protoperithecium in Neurospora crassa.

Authors: T S Tee; H C Choke
Journal: Mol Gen Genet Date: 1970

7. Characterization of MoLDB1 required for vegetative growth, infection-related morphogenesis, and pathogenicity in the rice blast fungus Magnaporthe oryzae.

Authors: Ya Li; Shen Liang; Xia Yan; Hong Wang; Debao Li; Darren M Soanes; Nicholas J Talbot; Zonghua Wang; Zhengyi Wang
Journal: Mol Plant Microbe Interact Date: 2010-10 Impact factor: 4.171

8. The Saccharomyces cerevisiae Ca2+ channel Cch1pMid1p is essential for tolerance to cold stress and iron toxicity.

Authors: Edgar Peiter; Marc Fischer; Kate Sidaway; Stephen K Roberts; Dale Sanders
Journal: FEBS Lett Date: 2005-10-05 Impact factor: 4.124

9. SAD-2 is required for meiotic silencing by unpaired DNA and perinuclear localization of SAD-1 RNA-directed RNA polymerase.

Authors: Patrick K T Shiu; Denise Zickler; Namboori B Raju; Gwenael Ruprich-Robert; Robert L Metzenberg
Journal: Proc Natl Acad Sci U S A Date: 2006-02-06 Impact factor: 11.205

10. Mutational analysis of the glycosylphosphatidylinositol (GPI) anchor pathway demonstrates that GPI-anchored proteins are required for cell wall biogenesis and normal hyphal growth in Neurospora crassa.

Authors: Shaun M Bowman; Amy Piwowar; Mash'el Al Dabbous; John Vierula; Stephen J Free
Journal: Eukaryot Cell Date: 2006-03

37 in total

Review 1. Centromeres of filamentous fungi.

Authors: Kristina M Smith; Jonathan M Galazka; Pallavi A Phatale; Lanelle R Connolly; Michael Freitag
Journal: Chromosome Res Date: 2012-07 Impact factor: 5.239

Review 2. The United States Culture Collection Network (USCCN): Enhancing Microbial Genomics Research through Living Microbe Culture Collections.

Authors: Kyria Boundy-Mills; Matthias Hess; A Rick Bennett; Matthew Ryan; Seogchan Kang; David Nobles; Jonathan A Eisen; Patrik Inderbitzin; Irnayuli R Sitepu; Tamas Torok; Daniel R Brown; Juliana Cho; John E Wertz; Supratim Mukherjee; Sherry L Cady; Kevin McCluskey
Journal: Appl Environ Microbiol Date: 2015-06-19 Impact factor: 4.792

Review 3. Using next-generation sequencing to isolate mutant genes from forward genetic screens.

Authors: Korbinian Schneeberger
Journal: Nat Rev Genet Date: 2014-08-20 Impact factor: 53.242

4. A-to-I RNA editing is developmentally regulated and generally adaptive for sexual reproduction in Neurospora crassa.

Authors: Huiquan Liu; Yang Li; Daipeng Chen; Zhaomei Qi; Qinhu Wang; Jianhua Wang; Cong Jiang; Jin-Rong Xu
Journal: Proc Natl Acad Sci U S A Date: 2017-08-28 Impact factor: 11.205

5. Global analysis of serine-threonine protein kinase genes in Neurospora crassa.

Authors: Gyungsoon Park; Jacqueline A Servin; Gloria E Turner; Lorena Altamirano; Hildur V Colot; Patrick Collopy; Liubov Litvinkova; Liande Li; Carol A Jones; Fitz-Gerald Diala; Jay C Dunlap; Katherine A Borkovich
Journal: Eukaryot Cell Date: 2011-09-30

6. period-1 encodes an ATP-dependent RNA helicase that influences nutritional compensation of the Neurospora circadian clock.

Authors: Jillian M Emerson; Bradley M Bartholomai; Carol S Ringelberg; Scott E Baker; Jennifer J Loros; Jay C Dunlap
Journal: Proc Natl Acad Sci U S A Date: 2015-12-08 Impact factor: 11.205

7. Molecular analysis of mutants of the Neurospora adenylosuccinate synthetase locus.

Authors: A Wiest; A J McCarthy; R Schnittker; K McCluskey
Journal: J Genet Date: 2012-08 Impact factor: 1.166

8. Molecular analysis of intragenic recombination at the tryptophan synthetase locus in Neurospora crassa.

Authors: A Wiest; D Barchers; M Eaton; R Henderson; R Schnittker; K McCluskey
Journal: J Genet Date: 2013-12 Impact factor: 1.166

9. The F-box protein gene exo-1 is a target for reverse engineering enzyme hypersecretion in filamentous fungi.

Authors: Raphael Gabriel; Nils Thieme; Qian Liu; Fangya Li; Lisa T Meyer; Simon Harth; Marina Jecmenica; Maya Ramamurthy; Jennifer Gorman; Blake A Simmons; Kevin McCluskey; Scott E Baker; Chaoguang Tian; Timo Schuerg; Steven W Singer; André Fleißner; J Philipp Benz
Journal: Proc Natl Acad Sci U S A Date: 2021-06-29 Impact factor: 11.205

10. Analysis of complete nucleotide sequences of 12 Gossypium chloroplast genomes: origin and evolution of allotetraploids.

Authors: Qin Xu; Guanjun Xiong; Pengbo Li; Fei He; Yi Huang; Kunbo Wang; Zhaohu Li; Jinping Hua
Journal: PLoS One Date: 2012-08-02 Impact factor: 3.240