
Satellite DNAs are conserved and differentially transcribed among Gryllus cricket species.

Octavio Manuel Palacios-Gimenez1,2, Vanessa Bellini Bardella1, Bernardo Lemos2, Diogo Cavalcanti Cabral-de-Mello1.   

Abstract

Satellite DNA (satDNA) is an abundant class of non-coding repetitive DNA that is preferentially found as tandemly repeated arrays in gene-poor heterochromatin but is also present in gene-rich euchromatin. Here, we used DNA- and RNA-seq from Gryllus assimilis to address the content and transcriptional patterns of satDNAs. We also mapped RNA-seq libraries for other Gryllus species against the satDNAs found in G. assimilis and G. bimaculatus genomes to investigate their evolutionary conservation and transcriptional profiles in Gryllus. Through DNA-seq read clustering analysis using RepeatExplorer, dotplots analysis and fluorescence in situ hybridization mapping, we found that ∼4% of the G. assimilis genome is represented by 11 well-defined A + T-rich satDNA families. These are mainly located in heterochromatic areas, with some repeats able to form high-order repeat structures. By in silico transcriptional analysis we identified satDNAs that are conserved in Gryllus but differentially transcribed. The data regarding satDNA presence in G. assimilis genome were discussed in an evolutionary context, with transcriptional data enabling comparisons between sexes and across tissues when possible. We discuss hypotheses for the conservation and transcription of satDNAs in Gryllus, which might result from their role in sexual differentiation at the chromatin level, heterochromatin formation and centromeric function.

Year:  2018        PMID: 29096008      PMCID: PMC5909420          DOI: 10.1093/dnares/dsx044

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

Satellite DNA (satDNA) is an abundant class of non-coding repetitive DNA in most eukaryotic genomes. SatDNAs constitute clustered arrays of tandemly repeated sequences often located in the gene-poor heterochromatin of centromeres and telomeres. SatDNA arrays are also dispersed in the eu/heterochromatin of sex chromosomes, for example in the sorrel Rumex acetosa, the flatworm Schistosoma mansoni and the cricket Eneoptera surinamensis. SatDNA can also occur as single or short arrays near protein-coding genes within autosomal euchromatin. Clustered and dispersed organizational patterns of satDNA are achieved by multiple mechanisms of non-reciprocal transfer such as unequal crossing-over, intrastrand homologous recombination, gene conversion, rolling-circle replication and transposition. These genetic exchanges lead to homogenization of arrays over time in a process called concerted evolution. Even though most satDNA arrays are embedded in tightly condensed heterochromatin that is considered transcriptionally silent and inert, evidence for their transcription has been documented in insects, vertebrates and plants. It has been shown that satDNA can be processed into small RNAs, such as siRNAs and piRNAs, which are involved, for example, in the epigenetic process of heterochromatin formation in organisms as diverse as Schizosaccharomyces pombe, Drosophila, nematodes and plants. Recently, it has also been shown that the major satDNA TCAST1 plays a role in the modulation of protein-coding gene expression in the beetle Tribolium castaneum. Finally, satDNA arrays adopt specific folding structures, known as high order repeat structures (HORs), in which a block of multiple repeat units forms large folding units that are tandemly repeated and attract nuclear proteins. This folding makes satellites potential carriers of a ‘chromatin code’, possibly contributing to cell identity and the specificity of chromosome territories.
In Orthoptera, satDNAs have been described in about 25 species, including several grasshoppers, but also the desert locust Schistocerca gregaria, in which transcription patterns were investigated, and the cave cricket genus Dolichopoda. For crickets, satDNAs were isolated in about 15 species, and among Gryllus two distal satDNAs were isolated from the genome of G. bimaculatus, with one of them (GBH535) observed in two other representatives of the genus, suggesting conservation. The genus Gryllus is composed of about 94 species (http://orthoptera.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1122353) that have been used as models for studies of speciation, behavior, ecology, physiology, developmental biology, population genetics and evolution over several decades. With respect to chromosomal analysis, the genus presents highly conserved karyotypes with 2n = 29♂/30♀ and X0♂/XX♀, although in some cases diploid number reduction and chromosomal polymorphisms have been reported. In a previous study, we characterized the chromosomes of G. assimilis to understand the genome organization and evolution of some repetitive DNAs (e.g. 18S and 5S rDNA, U1 and U2 snDNA, histone H3 gene, microsatellite arrays and C DNA fractions) and to use this species as a model for chromosomal and genomic analysis. The species presents the typical 2n = 29♂/30♀ karyotype, an X0♂/XX♀ sex-determining system and C-positive heterochromatin around centromeres and terminal regions that primarily did not reveal A + T or G + C base pair richness. C-positive regions were enriched with the C DNA fraction. The chromosomal localization of 18S rDNA, 5S rDNA, U1 and U2 snDNA was shown to be conserved, with the occurrence of a small number of clusters, in contrast to the dispersed localization and multiple clusters of the histone H3 gene and several microsatellite arrays.
In the present study, we used DNA- and RNA-seq from a G. assimilis inbred line kept in our lab to address the content and transcriptional patterns of satDNAs in the species. Additionally, we mapped the RNA-seq libraries available at NCBI for other Gryllus species (G. bimaculatus, G. rubens and G. firmus) against the satDNAs found in the G. assimilis and G. bimaculatus genomes to investigate the conservation of satDNAs and their transcriptional profiles in Gryllus. The data regarding satDNA presence in the genome of G. assimilis are discussed in an evolutionary context, with transcriptional data enabling comparisons between the sexes and across tissues when possible. We hypothesize that the functionality of satDNAs in Gryllus might result from their role in sexual differentiation at the chromatin level, heterochromatin formation and centromeric function.

2. Materials and methods

Samples, chromosome preparation and genomic DNA extraction

Males and females of Gryllus assimilis were obtained from a pool of individuals that had been bred at the Univ. Estadual Paulista–UNESP (Rio Claro, SP, Brazil). Mitotic chromosome preparations were obtained from embryo neuroblasts using standard procedures described elsewhere, with slight modifications according to Palacios-Gimenez et al. Genomic DNA of adult males and females was extracted from femurs using the phenol/chloroform-based procedure described in Sambrook and Russell.

Illumina sequencing and graph-based clustering of sequencing reads

Paired-end sequencing (2x101) was performed on libraries constructed from female genomic DNA, as recommended, using the Illumina TruSeq DNA PCR-Free kit (Illumina Inc., San Diego, CA, USA). The library fragment sequencing was performed by the Macrogen service facility (Macrogen Inc., South Korea) using a HiSeq2000 system. Read quality was checked with FASTQC and the reads were quality-filtered with the FASTX-Toolkit suite. The paired-end reads were joined using the ‘fastq-join’ software of the FASTX-Toolkit suite with default options. To search for satDNA in the G. assimilis genome, we performed a graph-based clustering and assembly of these sequences using the standard RepeatExplorer pipeline. Subsequently, we examined those clusters that displayed high graph density in the RepeatExplorer summary output to identify satDNA families. We refined the identification using dotplot graphics implemented in Dotlet to confirm their tandem organization. All clusters containing reads with sequences represented above 0.01% of genome proportion (high copy number sequences) were analysed in detail.
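The dotplot check described above can be illustrated with a minimal Python sketch. It is not the Dotlet algorithm: it simply compares a sequence with shifted copies of itself and reports the smallest shift at which most word-sized windows match, which corresponds to the offset of the first diagonal parallel to the main one in a self-comparison dotplot. The function name, the word size, the matching threshold and the example sequences are all assumptions made for illustration.

```python
def tandem_period(seq, word=4, min_fraction=0.8):
    """Return the smallest shift at which seq matches itself, or None.

    Each shift corresponds to one diagonal of a self-comparison dotplot.
    A diagonal is scored by the fraction of word-sized windows that are
    identical between the sequence and its shifted copy.
    """
    seq = seq.upper()
    n = len(seq)
    for shift in range(1, n // 2 + 1):
        windows = n - shift - word + 1
        if windows <= 0:
            break
        hits = sum(
            seq[i:i + word] == seq[i + shift:i + shift + word]
            for i in range(windows)
        )
        if hits / windows >= min_fraction:
            return shift
    return None


# Hypothetical sequences, not taken from the G. assimilis data.
monomer = "ACGTTGCAAT"
tandem_array = monomer * 6
unique_sequence = "ACGTTGCAATGGCTAGCTTACGATCGGATCCA"

print(tandem_period(tandem_array))     # period of the tandem array
print(tandem_period(unique_sequence))  # no periodicity expected
```

A real contig would contain diverged monomers, so a lower `min_fraction` or a shorter `word` may be needed in practice.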

Isolation and sequence analysis of satDNAs

Clusters with high graph density were analysed using the Tandem Repeats Finder (TRF) algorithm to identify the DNA sequence that maximized the alignment scores between the different monomers that could be defined in tandem. All clusters were processed with TRF using alignment parameters 2, 3, 5 for match, mismatch and indels, respectively, and a minimum alignment score of 50. Moreover, we used the dotplot graphic alignment tool implemented in Dotlet to identify the exact start and end of monomers of the same family and to confirm their tandem organization. The monomer with maximum length was used as the representative copy for each satDNA family, and also as the query sequence in further BLAST (http://www.ncbi.nlm.gov/Blast/) and Repbase (http://www.girinst.org/repbase/) searches to check similarity with published sequences. These canonical monomers were also BLASTed against the satDNAs of the cricket Eneoptera surinamensis and the grasshoppers Ronderosia bergii, Locusta migratoria and Eumigus monticola. Sequence alignments of satDNA copies were performed using Muscle implemented in MEGA5. MEGA5 was also used to estimate nucleotide divergence (p distance) and A + T content, and to perform repeat length analysis. Evolutionary relationships among sequences were inferred by neighbor-joining (NJ) trees using the proportion of nucleotide differences (p distance) in MEGA5. To predict the secondary structure of G. assimilis satDNAs we used the CentroidFold software with the McCaskill inference engine and a weight of 2^–5 for base pairs as options. The assembled consensus sequences of each satDNA family were used to design primers with opposite directions (Supplementary File S1), using the Primer3 software or manually.
To verify the presence of the satDNA families, we performed polymerase chain reactions (PCR). PCRs were performed using 10× PCR Rxn Buffer, 0.2 mM MgCl2, 0.16 mM dNTPs, 2 mM of each primer, 1 U of Taq Platinum DNA Polymerase (Invitrogen, San Diego, CA, USA) and 50–100 ng/μl of template DNA. The PCR conditions included an initial denaturation at 94 °C for 5 min and 30 cycles at 94 °C (30 s), 55 °C (30 s) and 72 °C (80 s), plus a final extension at 72 °C for 5 min. The PCR products were visualized on a 1% agarose gel. The monomeric bands were isolated and purified using the Zymoclean™ Gel DNA Recovery Kit (Zymo Research Corp., The Epigenetics Company, USA) according to the manufacturer's recommendations and then used as the source for reamplification for subsequent analysis. To check the isolation of the sequences of interest, purified PCR products were Sanger sequenced in both directions using the service of Macrogen Inc., and then compared to the consensus sequences obtained by genomic analysis. The monomer consensus sequences belonging to each of the satDNA families were deposited in the NCBI database under the accession numbers MF991236-MF991248. In addition, the consensus sequences for each satDNA family can be found in Supplementary File S2, and sequence alignments are available upon request.
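To make the TRF settings quoted above more concrete, the following Python sketch shows how a weighted alignment score behaves with a match weight of 2, a mismatch penalty of 3, an indel penalty of 5 and a minimum score of 50. It only reproduces the arithmetic of the scoring scheme; it is not the TRF program, and the function names and example counts are hypothetical.

```python
def tandem_alignment_score(matches, mismatches, indels,
                           match_weight=2, mismatch_penalty=3, indel_penalty=5):
    """Weighted score of an alignment summarized by its event counts."""
    return (match_weight * matches
            - mismatch_penalty * mismatches
            - indel_penalty * indels)


def is_reported(score, min_score=50):
    """True when the score reaches the minimum alignment score."""
    return score >= min_score


# Hypothetical alignments, summarized as (matches, mismatches, indels).
examples = {
    "25 perfect matches": (25, 0, 0),
    "22 perfect matches": (22, 0, 0),
    "40 matches, 5 mismatches, 2 indels": (40, 5, 2),
}
for label, counts in examples.items():
    score = tandem_alignment_score(*counts)
    print(label, score, is_reported(score))
```

Under these weights, an error-free alignment needs at least 25 matching positions to reach the threshold, which is why very short monomers are only detected when several copies are aligned in tandem.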

Probes and fluorescence in situ hybridization

PCR products for each satDNA with more than 50 bp were labelled by nick translation using biotin-14-dATP (Invitrogen) or digoxigenin-11-dUTP (Roche, Mannheim, Germany). SatDNAs with less than 50 bp were labelled directly at the 5’ end with biotin-14-dATP (Sigma-Aldrich, St Louis, MO, USA) during their synthesis. Single- or two-colour fluorescence in situ hybridization (FISH) was performed according to Pinkel et al., with modifications, using mitotic chromosome preparations. The 18S rDNA probe from Dichotomius geminatus was used to check for possible overlap with satDNAs in the secondary constriction of pair 1. The probes labelled with digoxigenin-11-dUTP were detected using anti-digoxigenin-rhodamine (Roche) and the probes labelled with biotin-14-dATP were detected using streptavidin conjugated with Alexa Fluor 488 (Invitrogen). Following FISH, chromosomal preparations were counterstained using 4′,6-diamidino-2-phenylindole (DAPI) and mounted in VECTASHIELD (Vector, Burlingame, CA, USA). Chromosomes and hybridization signals were observed using an Olympus BX61 fluorescence microscope equipped with appropriate filter sets. Black-and-white images were recorded using a DP71 cooled digital camera. The images were pseudo-coloured in blue (chromosomes) and red or green (signals), merged and optimized for brightness and contrast using Adobe Photoshop CS2.

Transcription of satDNAs

We used Illumina RNA-seq reads (2x126) from G. assimilis male and female heads, testis and ovary transcriptome projects that are in preparation in our lab (unpublished data) to investigate satDNA transcription in each tissue. For this species, three biological replicates for every tissue were used. For comparative purposes, we investigated whether the 13 satDNAs from G. assimilis identified here were transcribed across the available tissues in G. bimaculatus, G. rubens and G. firmus RNA-seq data. We also used the GBH535 and GBH542 satDNA families isolated from G. bimaculatus for the same approach. Consensus sequences for each satDNA from G. assimilis were used for analysis, and for G. bimaculatus the consensus sequences of the satDNA families were generated using clones deposited at NCBI with the following accession numbers: GBH535 family, AB204914-AB204938; GBH542 family, AB204939-AB204951. We downloaded RNA-seq data from three Gryllus species from the NCBI database, as follows: mixed ovaries and embryos RNA-seq reads (accession SRX023832), mixed-stage embryos RNA-seq reads (accession SRX0238310) and ovaries RNA-seq reads (accession SRX023831) from G. bimaculatus; whole animal samples RNA-seq reads (accession SRX1596750, SRX1596749, SRX1596748, SRX1596747, SRX1596746, SRX1596745, SRX1596744, SRX1596743, SRX1596742, SRX1596741, SRX1596740, SRX1596739, SRX1596738, SRX1596737, SRX1596736, SRX1596735, SRX1596734, SRX1596733, SRX1596732, SRX1596731, SRX1596730, SRX1596729, SRX1596728, SRX1596727) from G. rubens; flight muscle from long winged female with histolyzed flight muscle (LWFHFM) samples RNA-seq reads (accession SRX272161, SRX272160, SRX272159), flight muscle from long winged female with functional flight muscle (LWFFM) samples RNA-seq reads (accession SRX272158, SRX272157, SRX272156, SRX272155), fat body from short winged female incapable of flight (FBSWFIF) samples RNA-seq reads (accession SRX272155, SRX272154, SRX272153, SRX272152, SRX272151, SRX272150, SRX272127, SRX272125), and fat body from long winged female with functional flight muscles (FBLWFFM) samples RNA-seq reads (accession SRX272124, SRX272122, SRX272120, SRX272119, SRX272117, SRX272111, SRX272106, SRX272104) from G. firmus.
Raw RNA-seq reads from all tissue libraries were mapped to each of the satDNA sequences using Bowtie2 with the --sensitive option. For smaller repeats, for example Gas1 (11 bp), Gas4 (73 bp), Gas9 (82 bp) and Gas11 (10 bp), the mapping was performed on dimers, or several monomers were concatenated until reaching a length of 200 bp. The mapping results were converted into sorted binary format using SAMtools, and the aligned reads were counted using SAMtools options to compare between sequences and tissues and to estimate their proportion (i.e. the number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library). The Bowtie2 output was used to estimate the relative abundances of these transcripts with Cufflinks. The quantification step includes raw read counts and scaled read counts. The scaling method applied was FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in each RNA-seq library).
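Two steps of this mapping procedure lend themselves to a short illustration: building a reference of at least 200 bp from a short monomer, and the FPKM normalization as defined in the text. The Python sketch below is a simplified reading of both. Cufflinks estimates abundances with a statistical model, so the `fpkm` function here only reproduces the textbook definition; the function names and the example numbers are assumptions.

```python
import math


def concatenate_monomers(monomer, min_length=200):
    """Repeat a monomer head-to-tail until the array is at least min_length bp."""
    copies = math.ceil(min_length / len(monomer))
    return monomer * copies


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Placeholder monomers with the lengths reported for the short repeats;
# the real consensus sequences are not reproduced here.
for name, length in {"Gas1": 11, "Gas4": 73, "Gas9": 82, "Gas11": 10}.items():
    reference = concatenate_monomers("N" * length)
    print(name, len(reference))

# Hypothetical counts: 500 fragments on a 200-bp reference in a library
# with 10 million mapped fragments.
print(fpkm(500, 200, 10_000_000))
```

Because FPKM divides by reference length, the value obtained for a concatenated array depends on how many monomers were joined, which is worth keeping in mind when comparing repeats of very different lengths.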

3. Results

SatDNAs identification and sequence characterization

Illumina DNA sequencing produced 37,297,670 paired-end reads with a total of 3,767,064,670 nucleotides (nt). The G + C content is 38.66% and the ratio of reads that have a phred quality score over 30 (Q30) is 91.3%. Given that the mean genome size of G. assimilis is 2.13 pg, this represents about 1.8x genome coverage. For clustering analysis through the RepeatExplorer pipeline, we used 4,035,746 Illumina paired-end reads as input. This subset was randomly selected for computational efficiency and returned 277,347 clusters (containing 37% of reads) that corresponded to the most abundant repetitive sequences in G. assimilis, including satDNAs and other non-characterized repetitive elements. The number of singleton sequences was 2,540,450, containing 63% of reads. To search for satDNAs, the top 224 most abundant clusters representing repetitive elements in the RepeatExplorer summary output, with a number of reads above 0.01% of genome proportion, were analysed in detail. The abundance of each family was variable, ranging from 0.014% to 1.35% of the genome (Table 1).
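The coverage figure above can be checked with a few lines of Python. The sketch assumes the commonly used conversion of 1 pg of DNA to roughly 978 Mbp, which is not stated in the text, and treats 2.13 pg as the genome size to be covered; the function and constant names are illustrative.

```python
# Assumed conversion factor (not given in the text): 1 pg of DNA ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def genome_coverage(total_nucleotides, genome_size_pg):
    """Sequenced nucleotides divided by the estimated genome size in bp."""
    return total_nucleotides / (genome_size_pg * BP_PER_PG)


reads = 37_297_670
read_length = 101
total_nt = reads * read_length      # equals the 3,767,064,670 nt reported

coverage = genome_coverage(total_nt, 2.13)
print(total_nt)
print(round(coverage, 2))           # close to the ~1.8x stated in the text
```

The product of read number and read length reproduces the reported nucleotide total exactly, which suggests that the 37,297,670 figure counts individual 101-nt reads.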
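Table 1

Main features of the satDNAs isolated from G. assimilis genome

| Repeat family | Monomer length (bp) | AT (%) | Genome proportion (%) | Nucleotide divergence (±SE) (%) | Reads/contigs | Total monomers in clusters | Max. number of tandem arrays per contig | Total repeat family length (kb) |
|---|---|---|---|---|---|---|---|---|
| Gas1 | 11 | 54.5 | 1.35 | 8.5 (±2.9) | 222.81 | 3,138 | 216 | 34.518 |
| Gas2 | 205 | 64.1 | 1.03 | 15.5 (±0.9) | 95.35 | 112 | 4 | 22.96 |
| Gas3 | 187 | 59.4 | 0.246 | 9.3 (±1) | 90.24 | 21 | 2 | 3.927 |
| Gas4 | 73 | 43.8 | 0.242 | 30.5 (±2.9) | 696.71 | 51 | 25 | 3.723 |
| Gas5 | 165 | 63.6 | 0.212 | 16.6 (±1.2) | 22.08 | 71 | 5 | 11.715 |
| Gas6 | | | | | | | | |
| Gas6-1 | 199 | 38.7 | 0.151 | 7 (±1) | 1218.2 | 8 | 3 | 1.592 |
| Gas6-2 | 200 | 35.2 | 0.182 | 21.7 (±1.7) | 915.625 | 6 | 3 | 1.224 |
| Gas7 | 19 | 73.7 | 0.150 | 6.4 (±1.5) | 377.62 | 157 | 31 | 2.983 |
| Gas8 | | | | | | | | |
| Gas8-1 | 179 | 65.9 | 0.144 | 2.8 (±1.3) | 724.75 | 2 | 2 | 0.358 |
| Gas8-2 | 181 | 62.4 | 0.127 | 5.4 (±0.8) | 731 | 12 | 2 | 2.172 |
| Gas9 | 82 | 46.3 | 0.049 | 15.7 (±2.1) | 14.25 | 15 | 3 | 1.29 |
| Gas10 | 161 | 68.9 | 0.019 | 13.2 (±1.6) | 21.4 | 8 | 2 | 1.288 |
| Gas11 | 10 | 50 | 0.014 | 3.9 (±1.1) | 279 | 45 | 35 | 0.45 |

SE, standard deviation.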

Through dotplot analysis we confirmed the tandem organization of 13 satDNAs. These were grouped into 11 well-defined satDNA families, named Gas1, Gas2, Gas3, Gas4, Gas5, Gas6-1, Gas6-2, Gas7, Gas8-1, Gas8-2, Gas9, Gas10 and Gas11 according to decreasing abundance. Together, satDNAs comprised about 4% of the female genome, with sequences showing an A + T content ranging from 35.2% to 73.7%. Repetitive monomeric units ranged from 10 to 205 bp in length, with nucleotide divergence within the families varying from 2.8% to 30.5%. Most satDNAs found here constitute heavy satDNAs due to high A + T content (Table 1). To search for HORs, we used dotplot analysis to recover and count, where possible, the maximum number of tandem arrays per contig for each satDNA family. Then, we counted the total number of monomers present in each of the clusters (Table 1). These results indicate that some satDNA families, e.g. Gas1, Gas4, Gas7 and Gas11, are present in multiple distinct contigs, suggesting the presence of distinct sequence subtypes, possibly containing sequences organized into HORs. For the Gas6 satDNA family we identified two subfamilies, named Gas6-1 and Gas6-2 (Table 1). The size of Gas6-1 is 199 bp while that of Gas6-2 is 200 bp. Nucleotide divergence between Gas6-1 and Gas6-2 is 21.4%. Similarly, Gas8 displayed two subfamilies, named Gas8-1 and Gas8-2, with monomer units of 179 bp and 181 bp, respectively (Table 1). The nucleotide divergence between them is 12.8%. The presence of different satDNA subfamilies in the G. assimilis genome is supported by NJ trees, which showed Gas6-1, Gas6-2, Gas8-1 and Gas8-2 allocated in cluster-specific branches, indicating that each subfamily is composed of exclusive repeat variants probably originating from a common ancestor (Supplementary File S3).
The search for similarity with previously described sequences in NCBI BLAST and Repbase for each satDNA showed that Gas9 (82 bp) has 93% identity with a Gryllus bimaculatus mRNA of 91 bp in length (NCBI accession number AK277574.1) and that Gas10 (161 bp) has 87% identity with a G. bimaculatus mRNA of 109 bp in length (NCBI accession number AK272100.1). The remaining satDNAs did not reveal similarity with any other previously described sequences.
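The two sequence statistics reported in this section, nucleotide divergence as p distance and A + T content, are simple proportions. The Python sketch below computes both for aligned sequences; it skips gapped positions pair by pair, which is one of several gap treatments and not necessarily the one used in MEGA5. The function names and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def at_content(seq):
    """Fraction of A and T among the non-gap characters of a sequence."""
    bases = seq.upper().replace("-", "")
    return (bases.count("A") + bases.count("T")) / len(bases)


# Hypothetical aligned monomers (10 comparable sites, one difference).
copy_1 = "ACGTACGTAC"
copy_2 = "ACGTTCGTAC"
print(p_distance(copy_1, copy_2))
print(at_content("AATTGC"))
```

For a whole family, the within-family divergence would be the mean of such pairwise values over all monomer pairs in the alignment.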

Chromosomal localization of satDNAs

In most autosomes, satDNAs were located preferentially in pericentromeric regions, extending to the short arm, which corresponds to the C-band positive blocks observed by Palacios-Gimenez et al. (Figs. 1 and 2). This pattern was observed for Gas1 (Fig. 1a), Gas4 (Fig. 1d), Gas7 (Fig. 1f), Gas8 (Fig. 2c and d), Gas9 (Fig. 1g), Gas10 (Fig. 1h) and Gas11 (Fig. 1i). Besides pericentromeric signals, some small autosomes (from two to six chromosomes) revealed terminal labelling in both arms, which in some cases extended to the pericentromere (Figs. 1a, 1d, 1e–i, 2c and 2d). Moreover, interstitial blocks were noticed for Gas7, including in pair 1 (Fig. 1f). For Gas1, Gas4, Gas8-1, Gas9, Gas10 and Gas11 no signals were observed in pair 1 (Figs. 1a, 1d, 1g–i and 2c). The other four satDNAs showed distinct patterns, as follows: Gas2, blocks restricted to the pericentromeric region of all chromosomes (Fig. 1b); Gas3, interstitial signals in five pairs (Fig. 1c); Gas5, small terminal blocks in the short arm of some chromosomes and two small elements with signals at both ends (Fig. 1e); Gas6, both subfamilies restricted to the secondary constriction of pair 1 (Fig. 2a and b), which also corresponds to a C-positive heterochromatic band, G + C positive blocks and the location of 18S rDNA (Fig. 2a and b) and U2 snDNA observed by Palacios-Gimenez et al.
Figure 1

Chromosomal location of nine satDNAs in mitotic chromosomes of embryos of G. assimilis by FISH. The satDNA names are shown in the images. Asterisks indicate chromosomes with signals in both termini.

Figure 2

Chromosomal location of the Gas6 (a and b) and Gas8 (c and d) satDNA subfamilies in mitotic chromosomes of embryos of G. assimilis by FISH. The satDNA names are shown in the images. Note the specific chromosomal localization of the Gas6 subfamilies on pair 1 (a and b), contrasting with the scattered clusters of the Gas8 subfamilies (c and d). The insets in (a) and (b) show pair 1 with overlapping hybridization signals for the satDNAs and 18S rDNA. Asterisks indicate chromosomes with signals in both termini.

For the X chromosome no signals were observed for Gas3 (Fig. 1c), Gas5 (Fig. 1e) and Gas6 (Fig. 2a and b). The other satDNAs occurred at one end of the X chromosome (Figs. 1a, 1d, 1f–i, 2c and 2d), except for Gas2, which is pericentromeric (Fig. 1b). Additionally, one interstitial block was noticed for Gas8-2 (Fig. 2d). The different chromosomal localization of Gas8-1 and Gas8-2 (Fig. 2c and d) highlights that the two subfamilies occupy separate genomic regions.
To explore the possibility of satDNA transcription we used RNA-seq data from four Gryllus species: Illumina paired-end reads from G. assimilis male heads (143,521,774 reads), female heads (156,505,578 reads), testis (133,834,504 reads) and ovary (135,002,056 reads); Illumina paired-end reads from G. rubens whole animal (792,280,896 reads); Illumina single-end reads from G. firmus FBLWFFM (56,930,863), FBSWFIF (40,216,469), LWFFM (25,270,610) and LWFHFM (18,443,958); and Titanium 454 GS FLX reads from G. bimaculatus mixed ovaries and embryos (1,542,093 reads), mixed-stage embryos (9,867 reads) and ovaries (8,421 reads). Raw RNA-seq reads from all tissue libraries were mapped to each of the satDNA sequences. The quantification step includes raw read counts and scaled read counts. The scaling method applied was FPKM (fragments per kilo-base of transcript per million mapped reads). The mapping of transcriptomic libraries of G. assimilis, G. bimaculatus, G. firmus and G. rubens against the 13 G. assimilis satDNAs detected here and the two G. bimaculatus satDNAs (GBH535 and GBH542) published by Yoshimura et al. revealed the transcription of satDNAs in several species and tissues. We found evidence that seven satDNA families (Gas1, Gas3, Gas5, Gas6-1, Gas8-2, Gas11 and GBH535) were transcribed in at least one of the three species (G. assimilis, G. firmus and G. rubens). The degree of expression was variable depending on the repeat mapped, the species, the tissue and the sex (Fig. 3, Tables 2–4). We did not find evidence for satDNA transcription in G. bimaculatus, even when we examined the satDNAs isolated from its own genome (i.e. GBH535 and GBH542) by Yoshimura et al., suggesting either no transcription of satellites or the absence of Gas repeats in this species when compared with the congeneric species.
Figure 3

Differential expression of satDNAs between males and females in distinct body parts of G. assimilis (a), between different tissues in G. firmus females (b) and in whole animal samples of G. rubens (c). The quantification method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in each RNA-seq library). FBLWFFM, fat body from long winged female with functional flight muscle; FBSWFIF, fat body from short winged female incapable of flight; LWFFM, long winged female with functional flight muscle; LWFHFM, long winged female with histolyzed flight muscle.

Table 2

Number of raw reads from each tissue sequencing library that align to each G. assimilis satDNAs studied and their proportion with respect to the total number of reads (i.e. number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library) in the gDNA and in each of the four different tissue transcriptomes obtained by Illumina sequencing

| SatDNA | Genome proportion | Male head: Reads | Male head: Proportion | Male head: FPKM | Female head: Reads | Female head: Proportion | Female head: FPKM | Testis: Reads | Testis: Proportion | Testis: FPKM | Ovary: Reads | Ovary: Proportion | Ovary: FPKM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gas1 | 0.0135 | 0 | 0 | 0 | 0 | 0 | 0 | 714 | 5.3E-06 | 2.7E+07 | 28 | 2.1E-07 | 1.9E+07 |
| Gas3 | 0.00246 | 62 | 4.3E-07 | 3.2E+06 | 134 | 8.6E-07 | 3.1E+06 | 106 | 7.9E-07 | 392,449 | 0 | 0 | 0 |
| Gas5 | 0.00212 | 26 | 1.8E-07 | 3.1E+06 | 70 | 4.5E-07 | 2.3E+06 | 318 | 2.4E-06 | 11.8E+06 | 32 | 2.4E-07 | 4.7E+06 |
| Gas8-2 | 0.00127 | 0 | 0 | 0 | 0 | 0 | 0 | 66 | 4.9E-07 | 385,392 | 0 | 0 | 0 |
| Total reads | | 143,521,774 | | | 156,505,578 | | | 133,834,504 | | | 135,002,056 | | |

The quantification scaling method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in each RNA-seq library).
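The proportion column of Table 2 follows directly from the read counts, as the caption states. A minimal Python check, using counts taken from the table and the library totals given in the text, is shown below; the function name and the output format are choices made for this illustration.

```python
def read_proportion(aligned_reads, total_reads):
    """Raw reads aligned to a satDNA divided by all raw reads in the library."""
    return aligned_reads / total_reads


# (aligned reads, total reads in the library), as listed in Table 2.
table_2_counts = {
    "Gas1, testis": (714, 133_834_504),
    "Gas3, male head": (62, 143_521_774),
    "Gas3, female head": (134, 156_505_578),
    "Gas5, testis": (318, 133_834_504),
}

for label, (aligned, total) in table_2_counts.items():
    # One decimal in scientific notation, matching the table's precision.
    print(label, f"{read_proportion(aligned, total):.1E}")
```

The same ratio applied to the genomic library gives the genome proportion column, so transcript and genome abundances are expressed on a comparable scale.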

Table 3

Number of raw reads from each tissue sequencing library from G. firmus female that align to each of the G. assimilis (Gas) and G. bimaculatus (GB) satDNAs studied and their proportion with respect to the total number of reads (i.e. number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library) in each of the four different tissue transcriptomes obtained by Illumina sequencing

The quantification scaling method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in each RNA-seq library). FBLWFFM, fat body from long winged female with functional flight muscle; FBSWFIF, fat body from short winged female incapable of flight; LWFFM, long winged female with functional flight muscle; LWFHFM, long winged female with histolyzed flight muscle.

Table 4

Number of raw reads from whole animal samples sequencing library from G. rubens that align to each of the G. assimilis (Gas) and G. bimaculatus (GB) satDNAs studied and their proportion with respect to the total number of reads (i.e. number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library) in the transcriptome obtained by Illumina sequencing

The quantification scaling method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in the RNA-seq library).

The repeat Gas1 is transcribed in G. assimilis, Gas11 is transcribed in fat body tissues (FBLWFFM and FBSWFIF) of G. firmus and Gas6-1 is transcribed in G. rubens (Fig. 3a–c). The repeat GBH535 is transcribed only in G. firmus and G. rubens, and in the former species only in FBSWFIF (Fig. 3b and c). Three repeats, Gas3, Gas5 and Gas8-2, are transcribed in the three species, but Gas3 is not expressed in the G. assimilis ovary and Gas5 is absent in G. firmus LWFFM (Fig. 3a and b). Gas8-2 is transcribed in G. firmus FBLWFFM and FBSWFIF, in G. assimilis only in testis, and in G. rubens in the whole animal sample (Fig. 3a, b and c). Because the tissues used for transcriptional analysis were all different among the three species, we could not assess species-specific or tissue-specific transcription.

Secondary structure prediction

Of the 13 satDNAs identified in G. assimilis, 7 of them are able to adopt secondary structures with well-defined helices, i.e. Gas2, Gas4, Gas5, Gas6-1, Gas8-1, Gas8-2 and Gas10. The base pairing probability of the secondary structures showed that the abovementioned repeats are able to form short and long helices ranging from 2 bp to 7 bp in length (Supplementary File S4). Interestingly, both transcription and adoption of secondary structure were observed for the repeats Gas5, Gas6-1 and Gas8-2 (Table 2, Fig. 3, Supplementary File S4).

4. Discussion

Structural organization of satDNAs in G. assimilis

Gryllus species have been used as models in a variety of fields, from behavioral ecology to physiology and genetics. However, information concerning their genome organization and chromosomal features is scarce. A more complete picture of the chromosomal and genomic organization of distinct DNA classes would help future genomic studies. Our analysis revealed that repetitive DNAs represent a large portion of the G. assimilis genome (about 40% of a genome size of 2.13 pg). This high abundance of repeats could create gaps and other problems in future genome assembly projects in this species, and putatively in other species of the genus should they have similarly high repeat content. These assembly gaps are expected to be particularly problematic in peri/centromeric chromosomal areas and in the short arms of acrocentric chromosomes, which, as shown by chromosomal mapping, contain large arrays of satDNAs. Prior knowledge of the composition, localization and chromosomal distribution of repeated sequences, including satDNAs as reported here, is therefore needed.

Among crickets, satDNAs have been studied in Dolichopoda species, Gryllus bimaculatus, G. rubens, Gryllus sp. and Eneoptera surinamensis. However, only in E. surinamensis was the same strategy used here applied to characterize the satellites. Comparative analysis of the two satDNA libraries, i.e. G. assimilis and E. surinamensis, revealed no similarity between them. The satDNAs of G. assimilis were clustered primarily in heterochromatin, coincident with the C-DNA fraction. Moreover, the chromosomal distribution of G. assimilis satDNAs is similar to that of the A + T-rich GBH535 and GBH542 families described for G. bimaculatus, but it is distinct from the clustered, dispersed and intermingled pattern seen in the eu/heterochromatin of E. surinamensis. Considering the data from G. assimilis and G. bimaculatus, three features of satDNA in this genus are worth highlighting: (i) the predominance of A + T-rich satDNA families, (ii) the tendency of these arrays to occupy a more restricted chromosomal localization (i.e. constitutive heterochromatin), as in other Orthoptera but unlike E. surinamensis, and (iii) the possibility that the predominance of A + T-rich satDNAs is a common feature of Gryllus genomes. This contrasts with E. surinamensis, a species with a highly rearranged karyotype, in which G + C-rich satDNAs predominate, with some of them scattered across multiple genomic locations.

Sequences organized as HORs, in which a block of multiple basic repeat units forms a larger array unit that is itself repeated tandemly, as observed for Gas1, Gas4, Gas7 and Gas11, are known to be present in centromeres of primates, mouse and insects. The occurrence of such structures in peri/centromeric regions, their sequence conservation (low nucleotide divergence) and their wide evolutionary distribution suggest their involvement in the structural and functional organization of centromeres, with a putative structural function in G. assimilis chromosomes.

A few features of the chromosomal distribution of satDNA repeats in Gryllus are evident. Some satDNAs are located on multiple chromosomes, while others are located on fewer chromosomes (Gas3) or are even exclusive to specific chromosomal elements (Gas6). This suggests differential dynamics of expansion and cluster dispersion for distinct repeats. Several observations follow from the satDNA distribution. First, some satellites are coincident with C-positive heterochromatin (i.e. Gas1, Gas4, Gas7, Gas8, Gas9, Gas10 and Gas11), revealing the contribution of this class of repetitive DNA to the amount and complexity of heterochromatin in Gryllus.
Second, the overlap of Gas6-1 and Gas6-2 with 18S rDNA and U2 snDNA in the secondary constriction of pair 1 reveals the diverse composition of this region. Third, the localization of Gas2 in all centromeres of G. assimilis suggests the possible involvement of this repeat in centromere function. Fourth, interstitial satDNAs placed in C-negative areas indicate the possible occurrence of heterochromatin of low complexity, not revealed by C-banding. Fifth, the chromosomal distribution of satDNAs (i.e. Gas1, Gas4, Gas5, Gas9, Gas10 and Gas11) in both centromeric and telomeric regions of some acrocentric chromosomes suggests putative inversions involving repeats near centromeres, leading to intrachromosomal dispersion of satDNAs. The putative inversions are supported by the occurrence of karyotypes that vary in morphology among Gryllus species.

Finally, we noted that such differential chromosomal distribution of repeats is not strictly necessary for the emergence of new repetitive variants, as is evident when the subfamilies of Gas6 and Gas8 are compared. For example, Gas6-1 and Gas6-2 are restricted to pair 1, suggesting a common origin within the same chromosome. These repeats diverged by accumulation of mutations in each array followed by amplification and spreading through well-known molecular mechanisms, such as unequal crossing-over, intrastrand homologous recombination, gene conversion, rolling-circle replication and transposition. In contrast, Gas8-1 and Gas8-2 are distributed on several different chromosomes, suggesting the possible emergence of new repertoires from multiple chromosomes.

As expected from the euchromatic nature of the G. assimilis X chromosome, in which heterochromatin is restricted to the terminal regions, the satDNAs were placed primarily at the ends of this chromosome, and only Gas8-2 presented one interstitial block on the X. Low accumulation of satDNA on the X chromosome of G. bimaculatus was also noticed, with signals restricted to interstitial heterochromatin. These data contrast with the high accumulation of repeats on the sex chromosomes of E. surinamensis, which has a highly differentiated neo-X1X2Y system.

Transcription activity of satDNA in cricket species reveals wide evolutionary conservation of sequences and putative functionality

SatDNA transcription has been detected in several organisms, with growing evidence pointing to the importance of this kind of non-coding sequence as a global genome regulator. For instance, in chicken, zebrafish and the newts Triturus cristatus carnifex and Notophthalmus viridescens, satDNAs are transcribed throughout embryogenesis. In many insect species, satDNAs are also expressed throughout development and display expression differences between tissues and between sexes. Here, we found evidence that seven satDNAs, comprising six isolated here from G. assimilis (Gas1, Gas3, Gas5, Gas6-1, Gas8-2 and Gas11) and one isolated from G. bimaculatus (GBH535), are shared among cricket species but are differentially transcribed in different body parts as well as between sexes. This indicates wide evolutionary conservation of satDNAs among cricket species after satDNA library divergence, possibly as a consequence of their functionality. The seven transcribed satDNAs are present as polyadenylated mRNAs, though we do not know whether such transcripts are exported to the cytoplasm or kept in the nucleus. Three repeats (Gas3, Gas5 and Gas8-2) are transcribed in all of G. assimilis, G. firmus and G. rubens and, considering the chromosomal localization of these repeats in the constitutive heterochromatin of G. assimilis, such transcripts may be important for heterochromatin formation. In this scenario, satDNA transcripts would be processed into small interfering RNAs (siRNAs) that participate in heterochromatin formation and the control of gene expression. In other cases, we observed that Gas satDNAs show differential transcription in different tissues among species. The library sizes and construction protocols could account for these differences. Because the tissues studied differed among the three species, we could not draw conclusions about species-specific or tissue-specific transcription. 
Gas1 was transcribed in testis and ovary, and no transcription of this element was seen in male or female heads. Gas8-2 showed differential transcription in gonads, probably due to the high specificity of the tissues studied. These findings suggest a possible involvement of these satDNAs (Gas1 and Gas8-2) in male and female meiosis or in gonad maturation and function. However, further experimental evidence is needed to test this possibility. In other cases, when the same tissues were compared, we observed that GBH535 is transcribed in FBSWFIF but not in FBLWFFM. Furthermore, Gas5 is transcribed in LWFHFM but not in LWFFM. This finding seems to indicate some relation between satDNA transcription and flight ability in G. firmus, although it requires experimental validation. Bearing in mind the high diversity of satDNAs and their transcripts, several sequence-specific regulatory signals might reside within them, acting as a bar code that allows the cell to identify specific chromosome territories. For example, these signals can involve DNAs, RNAs or proteins as well as secondary or tertiary structures involved in RNA-mediated catalysis, as noticed for example in cave crickets and beetles. Through in silico analysis of G. assimilis satDNAs we found that seven of them are able to adopt secondary structures with well-defined stem-loops of double-stranded RNA stretches. Moreover, three of these are both transcribed and able to adopt such stem-loops. These secondary structures could determine RNA-protein interactions, suggesting functional roles. 
Transcripts of satDNAs able to adopt hammerhead-like structures have also been detected, for example, in salamanders, schistosomes and Dolichopoda cave cricket species. It has been demonstrated that these hammerhead-like structures can function as ribozymes with self-cleavage activity, though their physiological role remains unclear. In the light of such evidence, it is also possible that the folding helps satDNA dispersion along the genome by a rolling-circle replication mechanism, in which circular monomers result from the processing of structured RNA into linear monomers and their subsequent circularization by a host-specific RNA ligase. Our findings raise the prospect that the large repertoire of satDNAs and their transcripts might be relevant to the organization and regulation of genomes, which sets the stage for further functional genome analyses in Gryllus. Bearing in mind the transcription of satellites, it is plausible that the transcripts could be responsible for epigenetic chromatin modification and might also affect gene expression. Although further research in this area is needed, our structural and functional study provides an important step towards understanding the biology of satDNAs in Orthoptera, highlighting the importance of crickets as classical model organisms for evolutionary studies.
Table 3

Number of raw reads from each tissue sequencing library of G. firmus females that align to each of the G. assimilis (Gas) and G. bimaculatus (GB) satDNAs studied, and their proportion with respect to the total number of reads (i.e. number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library), in each of the four tissue transcriptomes obtained by Illumina sequencing

| SatDNA | FBLWFFM Reads | FBLWFFM Proportion | FBLWFFM FPKM | FBSWFIF Reads | FBSWFIF Proportion | FBSWFIF FPKM | LWFFM Reads | LWFFM Proportion | LWFFM FPKM | LWFHFM Reads | LWFHFM Proportion | LWFHFM FPKM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gas3 | 6,200 | 1.09E-04 | 5.7E+06 | 4,690 | 1.2E-04 | 5.82E+06 | 453 | 1.8E-05 | 7.7E+06 | 451 | 2.4E-05 | 6.2E+06 |
| Gas5 | 86 | 1.5E-06 | 117,117 | 68 | 1.7E-06 | 134,350 | 0 | 0 | 0 | 22 | 1.2E-06 | 443,217 |
| Gas8-2 | 33 | 5.8E-07 | 546,893 | 12 | 3.0E-07 | 475,428 | 0 | 0 | 0 | 0 | 0 | 0 |
| Gas11 | 197 | 3.5E-06 | 1.23E+06 | 113 | 2.8E-06 | 949,536 | 0 | 0 | 0 | 0 | 0 | 0 |
| GBH535 | 0 | 0 | 0 | 17 | 4.2E-07 | 91,237.1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total reads | 56,930,863 | | | 40,216,469 | | | 25,270,610 | | | 18,443,958 | | |

The quantification scaling method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in each RNA-seq library). FBLWFFM, fat body from long winged female with functional flight muscle; FBSWFIF, fat body from short winged female incapable of flight; LWFFM, long winged female with functional flight muscle; LWFHFM, long winged female with histolyzed flight muscle.
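
The Proportion column of Table 3 can be recomputed from the read counts and library totals, as in the minimal sketch below. The counts are transcribed from Table 3; the helper names `proportion` and `transcribed_in` are hypothetical, and treating any non-zero read count as evidence of transcription is an assumption made only for this illustration.

```python
# Total raw reads per G. firmus library (Table 3, "Total reads" row)
TOTAL_READS = {
    "FBLWFFM": 56_930_863,
    "FBSWFIF": 40_216_469,
    "LWFFM": 25_270_610,
    "LWFHFM": 18_443_958,
}

# Raw reads aligned to each satDNA in each library (Table 3, "Reads" columns)
READS = {
    "Gas3":   {"FBLWFFM": 6200, "FBSWFIF": 4690, "LWFFM": 453, "LWFHFM": 451},
    "Gas5":   {"FBLWFFM": 86,   "FBSWFIF": 68,   "LWFFM": 0,   "LWFHFM": 22},
    "Gas8-2": {"FBLWFFM": 33,   "FBSWFIF": 12,   "LWFFM": 0,   "LWFHFM": 0},
    "Gas11":  {"FBLWFFM": 197,  "FBSWFIF": 113,  "LWFFM": 0,   "LWFHFM": 0},
    "GBH535": {"FBLWFFM": 0,    "FBSWFIF": 17,   "LWFFM": 0,   "LWFHFM": 0},
}


def proportion(sat, library):
    """Reads aligned to a satDNA divided by all raw reads in that library."""
    return READS[sat][library] / TOTAL_READS[library]


def transcribed_in(sat):
    """Libraries with at least one aligned read (illustrative threshold)."""
    return [lib for lib, n in READS[sat].items() if n > 0]


for sat in READS:
    print(sat, transcribed_in(sat),
          [f"{proportion(sat, lib):.2e}" for lib in TOTAL_READS])
```

Under this threshold, GBH535 appears only in FBSWFIF and Gas5 is missing only from LWFFM, matching the pattern described in the text.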

Table 4

Number of raw reads from the whole animal sample sequencing library of G. rubens that align to each of the G. assimilis (Gas) and G. bimaculatus (GB) satDNAs studied, and their proportion with respect to the total number of reads (i.e. number of raw reads that align to a satDNA divided by the total number of raw reads in the sequencing library), in the transcriptome obtained by Illumina sequencing

| SatDNA | Reads | Proportion | FPKM |
|---|---|---|---|
| Gas3 | 14,144 | 1.8E-05 | 4.1E+06 |
| Gas5 | 1,206 | 1.5E-06 | 476,399.7 |
| Gas6-1 | 46 | 5.8E-08 | 11,844.3 |
| Gas8-2 | 142 | 1.8E-07 | 44,825.5 |
| GBH535 | 4,348 | 5.5E-06 | 656,004.45 |
| Total reads | 792,280,896 | | |

The quantification scaling method applied is FPKM (fragments per kilo-base of transcript per million mapped reads, the expression value obtained after normalization of read counts by both transcript length and number of mapped reads in the RNA-seq library).
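
The FPKM definition given above corresponds to the standard formula: fragment count, divided by transcript length in kilobases, divided by mapped fragments in millions. The sketch below implements that textbook formula only. The function name `fpkm` and the example numbers are hypothetical, and the FPKM values in the tables cannot be recomputed from the columns shown here because the satDNA reference lengths and the mapped read totals used for normalization are not listed.

```python
def fpkm(fragments, length_bp, mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments        -- fragments aligned to the transcript
    length_bp        -- transcript (reference) length in base pairs
    mapped_fragments -- total mapped fragments in the library
    """
    if length_bp <= 0 or mapped_fragments <= 0:
        raise ValueError("length_bp and mapped_fragments must be positive")
    # Equivalent to fragments / (length_bp / 1e3) / (mapped_fragments / 1e6)
    return fragments * 1e9 / (length_bp * mapped_fragments)


# Hypothetical numbers: 100 fragments on a 500 bp reference
# in a library with 2 million mapped fragments
print(fpkm(100, 500, 2_000_000))
```

Because the value is divided by both length and library depth, a short repeat with few reads in a small pool of mapped reads can still yield a large FPKM, which is worth keeping in mind when comparing the Reads and FPKM columns.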
