
Identifying new sex-linked genes through BAC sequencing in the dioecious plant Silene latifolia.

N Blavet1,2, H Blavet2,3, A Muyle4, J Käfer4, R Cegan3, C Deschamps5, N Zemp1, S Mousset4, S Aubourg6, R Bergero7, D Charlesworth7, R Hobza2,3, A Widmer1, G A B Marais8.   

Abstract

BACKGROUND: Silene latifolia represents one of the best-studied plant sex chromosome systems. A new approach using RNA-seq data has recently identified hundreds of new sex-linked genes in this species. However, this approach is expected to miss genes that are either not expressed or are expressed at low levels in the tissue(s) used for RNA-seq. Therefore other independent approaches are needed to discover such sex-linked genes.
RESULTS: Here we used 10 well-characterized S. latifolia sex-linked genes and their homologs in Silene vulgaris, a species without sex chromosomes, to screen BAC libraries of both species. We isolated and sequenced 4 Mb of BAC clones of S. latifolia X and Y and S. vulgaris genomic regions, which yielded 59 new sex-linked genes (with S. vulgaris homologs for some of them). We assembled sequences that we believe represent the tip of the Xq arm. These sequences are clearly not pseudoautosomal, so we infer that the S. latifolia X has a single pseudoautosomal region (PAR) on the Xp arm. The estimated mean gene density in X BACs is 2.2 times lower than that in S. vulgaris BACs, agreeing with the genome size difference between these species. Gene density was estimated to be extremely low in the Y BAC clones. We compared our BAC-located genes with the sex-linked genes identified in previous RNA-seq studies, and found that about half of them (those with low expression in flower buds) were not identified as sex-linked in previous RNA-seq studies. We compiled a set of ~70 validated X/Y genes and X-hemizygous genes (without Y copies) from the literature, and used these genes to show that X-hemizygous genes have a higher probability of being undetected by the RNA-seq approach, compared with X/Y genes; we used this to estimate that about 30% of our BAC-located genes must be X-hemizygous. The estimate is similar when we use BAC-located genes that have S. vulgaris homologs, which excludes genes that were gained by the X chromosome.
CONCLUSIONS: Our BAC sequencing identified 59 new sex-linked genes. Our analysis of these BAC-located genes, in combination with RNA-seq data, suggests that gene losses from the S. latifolia Y chromosome could be as high as 30 %, higher than previous estimates of 10–20 %.

Year:  2015        PMID: 26223308      PMCID: PMC4520012          DOI: 10.1186/s12864-015-1698-7

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Of only a handful of plant sex chromosome systems that have been investigated at the molecular level, the XY chromosome system of Silene latifolia is one of the best-studied [1, 2]. However, finding sex-linked genes in this species has been a slow process and is still ongoing. Approaches such as screening cDNA libraries with probes from microdissected S. latifolia Y chromosomes identified only a few sex-linked genes (reviewed in [3]). Segregation analysis of intron variants and SNPs within plant families revealed more sex-linked genes (e.g. [4, 5]). Altogether, these approaches yielded about 30 validated S. latifolia sex-linked genes. Recently, however, three studies used RNA-seq to identify hundreds of S. latifolia sex-linked genes, either using segregation patterns within families [6, 7] or male and female full siblings from an inbred population [8]. Sex-linked genes were identified either by following allele transmission from parents to their progeny (in the two studies using families, [6, 7]), or by searching for SNPs homozygous in females and heterozygous in males, indicating Y-linkage [8]. As no S. latifolia reference genome is available, these searches started with either a de novo assembled reference transcriptome using the S. latifolia RNA-seq data [7, 8] or using 454 EST data from S. vulgaris, a close relative without sex chromosomes [6, 9], to map the S. latifolia reads and perform SNP-calling. Both approaches are subject to errors, especially when sex-linkage of a contig is inferred from the segregation pattern of only a single SNP, so the inferences were assessed by checking for complete sex-linkage of some of the inferred sex-linked genes, using PCR on sets of unrelated males and females [6, 7]. Further tests were done to check whether “tester sets” of well-validated sex-linked and autosomal genes (see above) were correctly assigned [6-8]. The results were encouraging, with most genes tested being correctly assigned. 
However, only a few newly inferred genes (~10 in each study) were checked experimentally, and the tester sets included only 10–20 sex-linked and 0–10 autosomal genes. Moreover, the RNA-seq studies focused on RNA from only one tissue (flower buds), so any sex-linked genes not expressed in flower buds, or expressed at low levels, will have been missed [6-8]. The number of sex-linked genes in S. latifolia is therefore not yet accurately known. An alternative approach to discovering new sex-linked genes is to sequence BAC clones from the sex chromosomes. A handful of BACs from the S. latifolia X and Y chromosomes have already been sequenced (e.g. [10, 11]), and they yielded few new sex-linked genes. To improve the yield, we screened a BAC library with probes from validated X-linked or Y-linked genes of S. latifolia, which establishes sex-linkage of all genes found in the BAC sequences. Identifying both X-linked and Y-linked genes is important for estimating the proportion of X-linked genes that have lost their Y counterparts, which indicates the extent of Y genetic degeneration in this plant sex chromosome system. Sequencing BACs should help identify genes with low expression levels, some of which were probably missed by previous studies, because most sex-linked genes identified so far in S. latifolia come from cDNA, ESTs or RNA-seq data, which will be enriched for highly expressed genes. Sequencing the complete S. latifolia sex chromosomes using BACs would be extremely costly, as the X is 400 Mb and the Y 550 Mb. However, BAC sequencing of portions of the sex chromosomes is very useful. In particular, it can provide a larger tester set to compare with results from RNA-seq studies (see above), and it supports analyses (explained below) that estimate changes in gene densities during the evolution of the X and Y chromosomes, and gene losses from the Y chromosome.
We obtained ~4 Mb of BAC sequences from the S. latifolia sex chromosomes and from Silene vulgaris, a closely related non-dioecious plant without sex chromosomes, in order to identify both new sex-linked genes and their S. vulgaris homologs, which can serve as outgroup sequences for comparing the evolution of S. latifolia X-linked and Y-linked genes. A BAC library from a S. latifolia male was screened using probes specific for X-linked and Y-linked alleles of 10 previously validated X/Y gene pairs (see Methods and Additional file 2: Table S1). Orthologs of all 10 genes have been identified in S. vulgaris, all mapping to a single linkage group [5, 12], indicating that they were all on the ancestral proto-sex chromosomes, and not gained during the evolution of the S. latifolia sex chromosomes. Their map locations in S. latifolia indicate that they represent all evolutionary strata (chromosomal regions with different levels of X-Y divergence) previously described for this species [5, 13] (see also Additional file 1: Figure S1A). Annotation of the BAC sequences yielded 49 new X-linked genes and 10 new Y-linked genes. We analysed the gene densities of the X-linked, Y-linked and S. vulgaris BACs. We also used Blast to search the previously published RNA-seq data with the sequences of the new sex-linked genes in the BACs, and used the results to develop a new, combined approach to estimate Y gene loss. The results of our re-evaluation suggest that gene loss may have been underestimated based on RNA-seq alone, although more work is still needed to get a precise estimate of Y gene loss in S. latifolia.

Results and discussion

Obtaining S. latifolia X and Y genomic sequences and identifying genes

A total of 25 positive BAC clones were selected and sequenced (see Methods, Additional files 2 and 3: Tables S1 and S2). After further validation (see Methods), 24 clones were retained for analysis. These included 6 triplets of X/Y/vulgaris sequences, one X/vulgaris pair, one Y/vulgaris pair, and two single X BAC clones without Y chromosome or S. vulgaris homologs (Additional file 2: Table S1). The 16 sex-linked chromosomal fragments sequenced total ~2.5 Mb, the largest set of S. latifolia sex-linked genomic sequences so far obtained. These BAC sequences were assembled and annotated (see Methods, Additional files 2 and 3: Tables S1 and S2), revealing a total of 153 genes, 78 of which are from S. vulgaris. Including the probe genes, the S. latifolia genes total 58 X-linked and 17 Y-linked genes (Table 1 and Additional files 2 and 3: Tables S1 and S2). 59 of them are newly identified in S. latifolia, tripling the number of S. latifolia fully sex-linked genes with complete genomic sequences; 49 of these 59 new sex-linked genes are X-linked, and 10 are Y-linked.
Table 1

Gene number and density in S. latifolia X and Y and in S. vulgaris BAC clones

| | S. latifolia X | S. latifolia Y | S. vulgaris |
|---|---|---|---|
| Total of all genes, including genes used as “probes” | 58 | 17 | 78 |
| Total number of new genes | 49 | 10 | 70 |
| Total number of S. latifolia / S. vulgaris homologous gene pairs | 13 | 2 | - |
| Total physical size (Mb) | 1.7 | 1.09 | 1.05 |
| Gene density per Mb | 34 | 16 | 74 |

Gene density was computed using all available BAC data. When only triplets are used, the results are similar

An all-against-all Blast search among the BAC-located genes revealed conserved blocks of several tens of kb around each probe gene in the S. latifolia X and S. vulgaris BAC sequences (Additional file 1: Figure S2). These blocks include 13 new X-vulgaris homologous gene pairs (Table 1 and Additional file 3: Table S2). When aligning X-linked and S. vulgaris sequences using MAUVE (Methods), we found conserved gene orders in the blocks around the probe genes, and sequence similarities in the intergenic regions. In contrast, Blast searching found only two new Y-vulgaris gene pairs (Table 1 and Additional file 3: Table S2), and MAUVE alignments found similarity between Y and S. vulgaris sequences mostly restricted to the probe gene itself (Additional file 1: Figure S2). This suggests the occurrence of insertions, deletions and other chromosomal rearrangements of the S. latifolia Y chromosome at a small (within BAC) scale, in addition to the large-scale rearrangements previously found [13-21].
To directly evaluate the extent of gene losses from the S. latifolia Y chromosome, we first searched for X/Y gene pairs (often called “gametologous pairs”, in which X and Y genes are alleles that diverged since X-Y recombination became suppressed) in which one copy is clearly recognizable as a pseudogene. We found no such pairs. All pseudogenes found in the BAC sequences were duplicates of other genes in the same BAC clone. The only X/Y gene pairs in our BAC sequences are the “probe” genes, which were already known (Additional file 3: Table S2); none of the new X-linked genes have gametologs in the corresponding Y chromosome BAC sequence (Additional file 3: Table S2).

Assembling BACs from the X4, X7 and X6a regions and implications for the number of pseudoautosomal regions in S. latifolia sex chromosomes

We found overlaps between the X BAC sequences from three probes, genes X4, X7 and X6a. These BAC sequences were therefore assembled into a scaffold (Additional file 1: Figure S1B). The end of this scaffold (BAC clone BAC65P13) consists of X43.1 repeats typical of Silene telomeres [22]. These X43.1 repeats probably represent the X telomere, based on the following reasoning. BAC assembly and sequencing statistics indicate that 7 % of reads in BAC65P13 are from X43.1, yielding an estimate that the X43.1 repeat forms a ~6 kb region of this BAC. No interstitial X43.1 signal was detected on the X chromosome in previous work using FISH [18], but a 6 kb sequence composed of units arranged in tandem should yield a clear fluorescent signal with the X43.1 probe. A non-telomeric location is therefore unlikely, which suggests that we have reached the end of the Xq arm in S. latifolia. In turn, this implies that only the Xp end is pseudoautosomal. Our results are thus consistent with the S. latifolia sex chromosomes having only a single pseudoautosomal region (PAR), and not two as AFLP mapping suggested [23]; a single PAR is also consistent with the latest genetic mapping [5] (although our work and [5] do not completely agree on the gene content of the Xq end).

Gene densities in S. latifolia X, Y and S. vulgaris BAC clones

We found an average of 34 genes/Mb in the S. latifolia X BAC sequences and 74 genes/Mb in those from S. vulgaris (Table 1). The gene densities we observed in both species’ BAC sequences are quite high, which suggests that we have sequenced gene-dense regions. The 2.2-fold lower gene density in the S. latifolia X is, however, broadly consistent with the expectation based purely on the genome sizes of the two species (2.7 Gb for S. latifolia and 1 Gb for S. vulgaris; see the Plant DNA C-value Database, http://data.kew.org/cvalues/). Assuming the same total number of genes in both species (which is likely as they are closely related species with an identical chromosome number of 2n = 24), and neglecting possible inter-chromosomal translocations in S. latifolia or S. vulgaris [5], the relative total genome sizes predict a 2.7-fold lower gene density in S. latifolia. In contrast, the S. latifolia Y BACs have an estimated average gene density of only 16 genes/Mb (Table 1), 2.1 times lower than the X. The S. latifolia Y chromosome is 550 Mb, considerably larger than the X (400 Mb; see [24]). If the number of genes were the same on both sex chromosomes (that is, if their size difference is due solely to the accumulation on the Y of sequences not present on the X, including transposable elements, NUMTs and NUPTs [14, 16, 18, 19, 21], and ignoring the possibility that the PAR may represent physically large regions [5]), the ratio of gene densities for X versus Y should be the same as the ratio of Y/X chromosome sizes, 550/400, predicting a mean Y density 1.4 times lower than that of the X. The observed gene density in the S. latifolia Y BAC sequences is considerably lower than this expectation, and suggests losses of as much as 34 % of genes from the Y.
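The density comparison above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it uses the rounded densities from Table 1 and the chromosome sizes quoted in the text, and the variable names are hypothetical. The text does not spell out how the 34 % figure was obtained; here it is assumed to be the shortfall of the observed Y density relative to the density expected if the Y carried as many genes as the X.

```python
# Rounded gene densities (genes/Mb) as reported in Table 1.
DENSITY = {"X": 34, "Y": 16, "S. vulgaris": 74}

# Chromosome sizes (Mb) quoted in the text.
X_SIZE_MB = 400
Y_SIZE_MB = 550

# Observed fold differences in gene density.
vulgaris_over_x = DENSITY["S. vulgaris"] / DENSITY["X"]
x_over_y_observed = DENSITY["X"] / DENSITY["Y"]

# Expected X/Y density ratio if both chromosomes carried the same
# number of genes: it equals the Y/X size ratio.
x_over_y_expected = Y_SIZE_MB / X_SIZE_MB

# Assumption: the implied fraction of genes lost from the Y is the
# shortfall of the observed Y density relative to the expected one.
def implied_y_loss(expected_ratio, observed_ratio):
    return 1 - expected_ratio / observed_ratio

loss_unrounded = implied_y_loss(x_over_y_expected, x_over_y_observed)
# Using the rounded 1.4-fold expectation quoted in the text instead.
loss_rounded = implied_y_loss(1.4, x_over_y_observed)

print(f"S. vulgaris / X density: {vulgaris_over_x:.1f}")
print(f"X / Y density observed:  {x_over_y_observed:.1f}")
print(f"X / Y density expected:  {x_over_y_expected:.1f}")
print(f"Implied Y gene loss:     {loss_rounded:.0%} (rounded ratio), "
      f"{loss_unrounded:.0%} (unrounded ratio)")
```

Under these assumptions the implied loss comes out at roughly a third of Y genes, and the exact percentage shifts by a point or so depending on where the rounding is applied.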

Searching for the BAC-located genes in RNA-seq data

We blasted our BAC-located genes to the RNA-seq contigs from previous studies (see Methods), which produced significant matches for 54 out of 63 genes (Table 2 and Additional file 1: Table S3), showing that most of our BAC-located genes (~85 %) are expressed in flower buds. Only half of these genes were identified as sex-linked by any of the previous studies (Table 2). As predicted (see Background) the genes not detected as sex-linked in any of the RNA-seq studies have much lower expression levels (as estimated by [8]) than those where sex-linkage was detected (RPKM values: 3008.3 versus 11251.2, respectively; the difference is significant by a one-tailed Student’s t test, p-value = 0.004). This suggests that failure to ascertain genes as sex-linked when they have low expression affects inferences using RNA-seq, in addition to absence of expression of some genes in flower buds.
Table 2

Comparison of BAC and RNA-seq data

| | Total number | BAC-located genes matching RNA-seq contigs | BAC-located genes matching X/Y-inferred RNA-seq contigs | BAC-located genes matching X-hemizygous-inferred RNA-seq contigs | Sources of RNA-seq data |
|---|---|---|---|---|---|
| X-linked BAC-located genes | 52 | 44 | 12 | - | M2012 |
| | | 36 | 5 | 2 | BC2011 |
| | | 31 | 13 | 5 | CF2011 |
| | | 46 | 19 | 5 | All 3 combined |
| Y-linked BAC-located genes | 11 | 7 | 3 | - | M2012 |
| | | 6 | 1 | 1 | BC2011 |
| | | 6 | 1 | 3 (a) | CF2011 |
| | | 8 | 3 | 1 | All 3 combined |
| All BAC-located genes | 63 | 54 | 22 | 6 | All 3 combined |

All BAC-located genes are included except the six probe genes for which both X and Y copies were already available. M2012: Muyle et al. 2012 (ref. [8]), BC2011: Bergero, Charlesworth 2011 (ref. [6]), CF2011: Chibalina, Filatov 2011 (ref. [7])

(a) Among those 3, two genes were found to be X-hemizygous in [7], and XY in [6, 8]. In the combined data (see details in Methods), we considered these genes to be XY


Re-evaluating Y gene loss using both BAC and RNA-seq data

Two RNA-seq studies have used X-linked genes to estimate Y gene loss in S. latifolia. Only 10 to 20 % of X-linked genes were estimated to have no Y transcripts, suggesting that Y degeneration and male hemizygosity may be modest in S. latifolia [6, 7]. Correct inference of X-hemizygous genes is critical for reliably estimating Y gene loss. If the Y copy of an X/Y gene pair is not expressed, or is expressed at low levels in the tissue(s) used for RNA-seq analysis, hemizygosity will be incorrectly inferred and gene losses from the Y will be overestimated. We found some examples of this when comparing the BAC and RNA-seq data (using stringent Blast criteria, see Methods). Two BAC-located genes matched contigs inferred as X/Y gene pairs in some studies but as X-hemizygous in others, and one Y-linked gene matched a contig inferred as X-hemizygous (Table 2). Among our X-linked BAC-located genes, five matched contigs inferred to be X-hemizygous (Table 2). Using our BAC-located genes that match RNA-seq contigs detected as sex-linked, this yields an estimate of 20 % Y gene loss, the same as in the published RNA-seq studies [6, 7]. However, if coverage is low due to a low expression level, SNPs may not be identified; individuals cannot then be genotyped and no inferences about sex-linkage are possible. Recent data from animals suggest that average expression levels are lower for X-hemizygous genes than for X/Y gene pairs [25, 26], and therefore the RNA-seq approach may fail to detect X-hemizygous genes more often than X/Y gene pairs, resulting in an underestimation of gene losses from the Y. If this bias occurs, the BAC-located genes not matching contigs inferred as sex-linked should include more X-hemizygous genes than the ~20 % estimate above. To evaluate this possibility, it would be helpful to have an estimate of the proportion of X-hemizygous genes that were undetected by the RNA-seq studies.
When these studies were done, very few validated X-hemizygous genes were available in S. latifolia. Only two fully degenerated Y-linked genes in S. latifolia have so far been documented [27, 28]. Two recent studies used segregation analysis in large families and inferred further X-hemizygous genes, one being a segregation analysis using RadSeq data [5, 29]; however  comparing these genes with the sex-linked contigs from RNA-seq studies reveals that ~57 % might be X/Y gene pairs, so we cannot use them as well-validated X-hemizygous genes (see the list of genes with X-hemizygous segregation patterns in Additional file 4: Table S4). We therefore used an indirect approach. Many well-validated X/Y gene pairs are now available, and can be used to estimate the probability that the combined RNA-seq studies fail to detect such a gene pair. Given this estimate, one can infer how many of the BAC-located genes that do not match sex-linked RNA-seq contigs could represent such missed X/Y gene pairs, and thus how many are probably truly X-hemizygous genes (schematized in Additional file 1: Figure S3). For the required estimate, we used all published well-validated X/Y gene pairs: the 17 experimentally validated ones (see references in Additional file 4: Table S4), 20 sex-linked contigs from RNA-seq studies that were validated by PCR [6], and 12 more from a recent segregation analysis [5]. All these are probably highly expressed genes. We added 21 more X/Y gene pairs from the RadSeq study [29], which uses genomic DNA, and can therefore ascertain genes even if their expression levels are low, for a total of 70 tester genes that were previously inferred as sex-linked. 78 % of these genes had significant matches with contigs from at least one of the three RNA-seq studies, implying that they are expressed in flower buds. 
Genes matching contigs not assigned as sex-linked in one study often matched sex-linked ones in another, so that only around 25 % of true X/Y gene pairs remained undetected in the three RNA-seq studies combined (Additional file 4: Table S4). This estimated proportion suggests that, out of our total number of 43 new X-linked BAC-located genes expressed in flower buds, 0.25*43 = 10.75 are probably X/Y gene pairs undetected in the combined RNA-seq data. Thus, 10.75 of the 22 BAC-located genes not matching sex-linked RNA-seq contigs (category (iii) in Table 3) are accounted for. This leaves 22 – 10.75 = 11.25 genes that are probably X-hemizygous, but failed to be detected by the RNA-seq studies. Only X-linked genes newly ascertained by our BAC sequencing are “ancestral” genes relevant for estimating gene losses (the probe genes were ascertained through detecting Y-linked variants, and were therefore previously known to have Y copies); there were probably 50 “ancestral” genes in our BAC sequences, 43 X BAC-located genes that lack copies in our Y BACs but have RNA-seq matches, plus the 7 Y-only BAC-located genes with RNA-seq matches (the total is 60 including the probe genes). The estimated number of Y gene losses is then as follows: 5 genes detected as X-hemizygous (category (ii) in Table 3) + 11.25 X-hemizygous genes that failed to be detected by the RNA-seq studies (see above). Dividing by 50 ancestral genes yields 33 % (or 27 % including the probe genes, Table 3). Using a similar approach to estimate gene losses from the X chromosome gives a considerably lower fraction, 5 % (or 4 % including the probe genes), significantly different from the estimate for the Y (Table 3, Fisher’s exact test p-values < 10−3 in either case). Estimates of ancestral gene numbers are particularly reliable when an outgroup is used to exclude genes that were gained after the sex chromosomes originated, by duplication and/or relocation onto the X. 
We therefore repeated this analysis, restricting it to genes with homologs on the S. vulgaris BAC sequences (which must have been present on the ancestral proto-sex chromosomes). The results are similar; excluding the “probe” genes, we estimate 34 % gene loss from the Y, and none from the X (Fisher’s exact test p-value = 0.003; see Additional file 1: Table S5, or, including the “probe” genes, 23 % and 0 % Y and X gene loss, respectively; Fisher’s exact test p-value < 0.05).
Table 3

Analysis of gene loss in X and Y chromosomes combining BAC and RNA-seq data

| Categories of genes | X-linked genes | Y-linked genes |
|---|---|---|
| All new genes in BAC sequences | 49 | 10 |
| No match to RNA-seq contigs | 6 | 3 |
| Genes retained for analysis | 43 | 7 |
| Category (i): X/Y gene pair results in RNA-seq analysis | 16 | 2 |
| Category (ii): X-hemizygous results in RNA-seq analysis | 5 | 1 |
| Category (iii): Not ascertained as sex-linked by RNA-seq analysis | 22 | 4 |
| Estimated X/Y false negative rate for gene pairs for RNA-seq analysis (a) | 25 % | 25 % |
| Expected number of XY pairs undetected in RNA-seq analysis | 10.75 | 1.75 |
| Potential number of X-hemizygous (X0) or Y0 genes undetected in RNA-seq analysis | 11.25 | 2.25 |
| Potential total number of X-hemizygous (X0) or Y0 genes (sum of detected + undetected in RNA-seq analysis numbers above) | 16.25 | 2.25 |
| Potential proportion of X-hemizygous (X0) or Y0 genes (b) | 27-33 % | 4-5 % |

(a) Based on 39 genes previously known to have X-linked and Y-linked copies, see Additional file 4: Table S4

(b) Based on total numbers of potential ancestral genes, either including the probe genes, or excluding them, respectively (see text for details).
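The arithmetic behind Table 3 can be written out as a short sketch. The counts are taken from the table and the text; the function and field names are hypothetical. One reading of the table is assumed here: for the Y-linked column, the single gene in category (ii) is not counted as a Y0 gene (it has a Y copy, so its X-hemizygous call is a misclassification by RNA-seq), which is why the Y-linked total stays at 2.25.

```python
from dataclasses import dataclass

@dataclass
class SingleCopyEstimate:
    missed_pairs: float       # X/Y pairs expected to go undetected by RNA-seq
    undetected_single: float  # category (iii) genes left after removing those
    total_single: float       # detected + undetected single-copy genes

def estimate_single_copy(retained, not_ascertained, detected_single,
                         false_negative_rate):
    """Apply the X/Y false negative rate to the genes retained for analysis."""
    missed_pairs = false_negative_rate * retained
    undetected_single = not_ascertained - missed_pairs
    return SingleCopyEstimate(missed_pairs, undetected_single,
                              detected_single + undetected_single)

FALSE_NEGATIVE_RATE = 0.25  # Table 3
ANCESTRAL_GENES = 50        # 43 X-linked + 7 Y-linked genes with RNA-seq matches
ANCESTRAL_WITH_PROBES = 60  # total including the probe genes (from the text)

# X-linked column: X-hemizygous genes, i.e. losses from the Y.
y_loss = estimate_single_copy(retained=43, not_ascertained=22,
                              detected_single=5,
                              false_negative_rate=FALSE_NEGATIVE_RATE)

# Y-linked column: Y0 genes, i.e. losses from the X.
# Assumption: no detected Y0 genes are added (see the lead-in).
x_loss = estimate_single_copy(retained=7, not_ascertained=4,
                              detected_single=0,
                              false_negative_rate=FALSE_NEGATIVE_RATE)

for label, est in (("Y", y_loss), ("X", x_loss)):
    without_probes = est.total_single / ANCESTRAL_GENES
    with_probes = est.total_single / ANCESTRAL_WITH_PROBES
    print(f"Losses from the {label}: {est.total_single} genes, "
          f"{without_probes:.1%} excluding probes, "
          f"{with_probes:.1%} including probes")
```

The proportions obtained this way round to the ranges in the last row of Table 3, assuming the denominators of 50 and 60 ancestral genes given in the text.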

Correct estimation of the proportion of X-hemizygous genes among the BAC-located genes depends on the representativeness of the X/Y gene pairs used as the tester set. To further check our set of inferred X-hemizygous genes, we searched for genes that were wrongly classified as X-hemizygous, but which were actually X/Y gene pairs whose sequences are so diverged that they assembled into different contigs, one of which (the Y contig) was not detected. RNA-seq contigs representing the Y copies of such genes should be found only in males. To test for such sequences among the RNA-seq contigs, we examined the BAC-located genes that the published RNA-seq analyses did not ascertain as sex-linked by blasting them against a set of RNA-seq contigs that were found only in males (from [8]). This yielded only between 3 and 5 significant matches (depending on the filtering of the RNA-seq data, see Methods). Thus, very few potentially highly diverged Y copies are present among the RNA-seq contigs; moreover, some of the male-specific contigs may not represent divergent Y copies but may simply be autosomal paralogs specifically expressed in males. The lack of evidence for the existence of many undetected X/Y gene pairs with diverged Y-linked copies agrees with our estimate that no more than about 10 of the genes not ascertained as sex-linked by RNA-seq analysis are actually X/Y gene pairs (Table 3).

Conclusions

Our BAC sequencing effort resulted in 59 new validated sex-linked genes in S. latifolia, adding to the 43 previously published ones (listed in Additional file 4: Table S4). Comparing our new genes to sex-linked genes identified by RNA-seq studies shows that failure to ascertain genes as sex-linked when they have low expression is an important limitation of RNA-seq, in addition to non-expression in the flower bud tissues that have been used. This illustrates the difficulty of reliably inferring sex-linkage, X-hemizygosity and gene loss from the Y chromosome without a reference genome. Analyses that take this ascertainment bias into account suggest that gene losses from the S. latifolia Y could be higher than previously thought, perhaps around 30 %, consistent with the gene densities in X/Y and S. vulgaris BACs. However, further work is needed to estimate Y gene loss in this species more precisely.

Methods

Isolation and sequencing of BAC clones

The BAC library was screened following [30]. Clones were gridded on nylon membrane filters and hybridized. The S. latifolia BAC library includes a total of 119,808 clones, with an average insert-size of 128 kb, which equates to 5.3 times the male haploid genome. The S. vulgaris BAC library (total of 55,296 clones), with an average insert-size of 110 kb, represents 6.8 haploid genomes of this species. We screened these libraries using probes designed from 10 published sex-linked genes and their homologs in S. vulgaris (shown in Additional file 1: Figure S1A, plus the triplet SlAP3X/Y-SvAP3). For each “probe” gene, the X-linked copy was used to screen the S. latifolia BAC library, and the Y copy to identify Y-linked BAC clones in the S. latifolia BAC library, while the S. vulgaris homolog was used to identify S. vulgaris BAC clones. For each probe, we found 1 to >100 positive clones. We selected clones showing strong hybridization with the probe, and only those that were confirmed by PCR with probe-derived primers were used in further analyses. Whenever possible, we sequenced one BAC clone for each probe gene. These clones were sequenced with coverage varying from 5–6 to 8–600 X for Sanger and 454, respectively (some clones with mate-pairs, and some without). The BAC sequences were validated by comparing the sequence of the “probe” gene from the BAC to the published sequence of the “probe” gene; this excluded only one BAC clone. This yielded complete triplets of X, Y and S. vulgaris BACs for some probe genes, but not all (Additional file 2: Table S1). All the “probe” genes except SlAP3 have already been mapped on the S. latifolia X chromosomes [4, 5], and their Y copies have been mapped on Y chromosome physical maps, see [13]. All the BAC contigs are available in Genbank (Accession numbers KC978922-KC977838). Additional file 2: Table S1 provides more details.
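The library coverage figures quoted above follow from the relation coverage = (number of clones × mean insert size) / haploid genome size. The Methods do not state which genome sizes were used, so the sketch below works backwards from the reported numbers; the function names are hypothetical and the implied genome sizes are derived values, not figures from the paper.

```python
def library_size_gb(n_clones, mean_insert_kb):
    """Total DNA held in the library, in Gb (1 Gb = 1e6 kb)."""
    return n_clones * mean_insert_kb / 1e6

def implied_genome_gb(n_clones, mean_insert_kb, coverage):
    """Haploid genome size implied by a stated genome-equivalent coverage."""
    return library_size_gb(n_clones, mean_insert_kb) / coverage

# Clone counts, insert sizes and coverages as reported in the Methods.
libraries = {
    "S. latifolia": {"n_clones": 119_808, "mean_insert_kb": 128, "coverage": 5.3},
    "S. vulgaris": {"n_clones": 55_296, "mean_insert_kb": 110, "coverage": 6.8},
}

for species, lib in libraries.items():
    total = library_size_gb(lib["n_clones"], lib["mean_insert_kb"])
    genome = implied_genome_gb(**lib)
    print(f"{species}: library holds {total:.2f} Gb, "
          f"implying a haploid genome of about {genome:.2f} Gb")
```

The implied sizes need not match the C-values quoted elsewhere in the text exactly, since the authors may have used different genome size estimates when computing coverage.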

Assembly and annotation of BAC sequences

For each BAC clone, the reads were assembled de novo using Newbler v.2.5.3 (2010), except for three BAC clones sequenced using Sanger sequencing (19P24, 93 L17 and 78D08), which were assembled with phrap v.16 (2007). The assembly statistics in Additional file 2: Table S1 were obtained using QUAST [31]. Annotation (see Additional file 3: Table S2) was done using both homology-based and expression-data-based strategies using Uniprot and S. latifolia RNA-seq data from [8]. Truncated genes and genes with premature stop codons and/or frameshifts were annotated as pseudogenes. DNA repeats (including transposable elements) were annotated using the latest update of the database of DNA repeats in S. latifolia, based on an extensive search using genomic library screening and low coverage sequencing of the S. latifolia data [18, 20].

Sequence analysis

Homology among BAC clones from the same X/Y probe gene pair was assessed by aligning the BAC sequences with MAUVE 2.3.1 [32] after masking the repeats using RepeatMasker v3.3.0 (http://www.repeatmasker.org/) with the Silene DNA repeat database mentioned above. Homology between X/Y BAC pairs was also assessed by performing an all-against-all Blast search (with the default parameters) among the genes found in the X/Y BAC pair. The results are shown in Additional file 1: Figure S2, and the X-vulgaris and Y-vulgaris pairs that we found are listed in Additional file 1: Table S6. To obtain the results shown in Table 2, we performed a Blast search of all coding sequences (CDS, obtained by annotating the BAC sequences, see previous section) against the RNA-seq data from the three previous studies [6-8] using data available in Genbank [7] and our own data [6, 8]. We retained only manually checked Blast hits with e-values < 10−5, % identities > 90 %, and alignment lengths > 50 bp. Multiple corresponding RNA-seq contigs were allowed for a single BAC CDS to account for assembly problems in the RNA-seq data. The three RNA-seq studies were then combined to infer each CDS gene as being X/Y, X-hemizygous, or not detected as sex-linked in the RNA-seq data (Additional file 3: Table S2). A gene was classified as X/Y in RNA-seq data if any one of the matching RNA-seq contigs was classified as X/Y, and as X-hemizygous if it satisfied two criteria: (i) at least one matching RNA-seq contig was classified as X-hemizygous, and (ii) all other matching RNA-seq contigs were not classified as X/Y gene pairs. Finally, the gene was classified as not having been detected as sex-linked in RNA-seq data whenever all matching RNA-seq contigs failed to be detected as sex-linked. Expression level estimates were obtained from [8]. 
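The hit filter and the rule for combining the three RNA-seq studies can be expressed compactly. This is a minimal sketch of the logic as described in this section, not the authors' code: the label strings, function names and the representation of a hit as three numbers are all hypothetical, and the manual checking of hits mentioned above is not modelled.

```python
from typing import Iterable

def passes_blast_filter(evalue: float, pct_identity: float, aln_length: int) -> bool:
    """Thresholds from the text: e-value < 1e-5, identity > 90 %, length > 50 bp."""
    return evalue < 1e-5 and pct_identity > 90 and aln_length > 50

def classify_gene(contig_calls: Iterable[str]) -> str:
    """Combine the calls of all RNA-seq contigs matching one BAC CDS.

    Each call is one of the hypothetical labels "XY", "X_hemizygous"
    or "not_detected".
    """
    calls = list(contig_calls)
    if not calls:
        return "no_match"        # no RNA-seq contig matched this CDS
    if "XY" in calls:
        return "XY"              # any X/Y call takes precedence
    if "X_hemizygous" in calls:
        return "X_hemizygous"    # at least one X-hemizygous call, no X/Y call
    return "not_detected"        # every matching contig lacked a sex-linked call

# A gene called X-hemizygous in one study but X/Y in another is treated as X/Y.
print(classify_gene(["X_hemizygous", "XY", "not_detected"]))
print(classify_gene(["X_hemizygous", "not_detected"]))
print(classify_gene(["not_detected"]))
```

Giving precedence to X/Y calls reflects the reasoning that detecting a Y transcript in any study is positive evidence for a Y copy, whereas an X-hemizygous call can result from a Y copy that is simply not expressed or not assembled.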
To check our X-hemizygous genes, we blasted them all (including those detected as X-hemizygous in the RNA-seq studies) against a set of RNA-seq contigs expressed only in males (using data from [8]). Some of these genes might correspond to sex-linked genes with highly diverged X and Y copies that assembled in separate RNA-seq contigs and might therefore be wrongly classified as X-hemizygous, or not be detected as sex-linked at all. To test for potentially Y-linked sequences, we used a set of male-specific contigs from the RNA-seq results. We required these contigs to be expressed in all males and none of the females, using (i) all male-specific contigs, N = 5,504 (ii) male-specific contigs without matches to any transposable element sequence (using the S. latifolia TE database mentioned above) and with more than 10 mapped reads in one of the libraries (to remove noisy expression), N = 3,400. Only sequences with Blast hits of > 100 bp, e-values < 10−4, scores > 80 and identities > 80 % were retained. Fisher’s exact tests and Student’s t tests were done using the relevant statistical functions in R (http://www.r-project.org/).

Availability of supporting data

The BAC contigs are available in GenBank (Accession numbers KC978922-KC977838).