Tihana Vondrak, Laura Ávila Robledillo, Petr Novák, Andrea Koblížková, Pavel Neumann, Jiří Macas.
Abstract
Amplification of monomer sequences into long contiguous arrays is the main feature distinguishing satellite DNA from other tandem repeats, yet it is also the main obstacle in its investigation because these arrays are in principle difficult to assemble. Here we explore an alternative, assembly-free approach that utilizes ultra-long Oxford Nanopore reads to infer the length distribution of satellite repeat arrays, their association with other repeats and the prevailing sequence periodicities. Using the satellite DNA-rich legume plant Lathyrus sativus as a model, we demonstrated this approach by analyzing 11 major satellite repeats using a set of nanopore reads ranging from 30 to over 200 kb in length and representing 0.73× genome coverage. We found surprising differences between the analyzed repeats because only two of them were predominantly organized in long arrays typical for satellite DNA. The remaining nine satellites were found to be derived from short tandem arrays located within LTR-retrotransposons that occasionally expanded in length. While the corresponding LTR-retrotransposons were dispersed across the genome, this array expansion occurred mainly in the primary constrictions of the L. sativus chromosomes, which suggests that these genome regions are favourable for satellite DNA accumulation.
Keywords: Lathyrus sativus; centromeres; fluorescence in situ hybridization (FISH); heterochromatin; long-range organization; nanopore sequencing; satellite DNA; sequence evolution; technical advance
Year: 2019 PMID: 31559657 PMCID: PMC7004042 DOI: 10.1111/tpj.14546
Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 6.417
Table 1. Characteristics of the investigated satellite repeats
| Satellite family / subfamily | Monomer [bp] | AT [%] | Genomic abundance [%] | Genomic abundance [Mbp/1C] | FISH probe |
|---|---|---|---|---|---|
| FabTR‐2 | 49 | 71.4 | 1.700 | 110.8 | LASm3H1 |
| FabTR‐51 | | | 3.101 | 202.2 | |
| | 80 | 46.3 | 2.500 | 163.0 | LASm1H1 |
| | 79 | 51.9 | 0.560 | 36.5 | LasTR6_H1 |
| | 118 | 50.0 | 0.041 | 2.7 | |
| FabTR‐52 | | | 2.019 | 131.6 | |
| | 55 | 47.3 | 2.000 | 130.4 | LASm2H1 |
| | 32 | 50.0 | 0.019 | 1.2 | |
| FabTR‐53 | | | 2.600 | 169.5 | c1644 + c1645 |
| | 660 | 76.6 | n.d. | n.d. | |
| | 368 | 76.4 | n.d. | n.d. | |
| | 565 | 75.9 | n.d. | n.d. | |
| FabTR‐54 | 104 | 51.0 | 0.840 | 54.8 | LasTR5_H1 |
| FabTR‐55 | 78 | 55.1 | 0.480 | 31.3 | LasTR7_H1 |
| FabTR‐56 | 46 | 60.9 | 0.250 | 16.3 | LasTR8_H1 |
| FabTR‐57 | 61 | 65.6 | 0.130 | 8.5 | LasTR9_H1 |
| FabTR‐58 | 86 | 59.3 | 0.140 | 9.1 | LasTR10_H1 |
| FabTR‐59 | 131 | 49.6 | 0.110 | 7.2 | LasTR11_H1 |
| FabTR‐60 | 86 | 52.3 | 0.110 | 7.2 | LasTR12_H1 |
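The two abundance columns of the table are linked by the haploid (1C) genome size: abundance in Mbp/1C equals the percentage times the genome size. The genome size itself is not stated in this record, so the sketch below only back-calculates the value implied by a few table rows. The function names are illustrative and not taken from the paper.

```python
# Rows copied from the table: (family, abundance [%], abundance [Mbp/1C]).
ROWS = [
    ("FabTR-2", 1.700, 110.8),
    ("FabTR-54", 0.840, 54.8),
    ("FabTR-55", 0.480, 31.3),
    ("FabTR-56", 0.250, 16.3),
]


def implied_genome_size_mbp(percent, mbp):
    """Genome size (Mbp/1C) implied by one row, from mbp = percent / 100 * size."""
    return mbp / (percent / 100.0)


def mean_implied_size(rows):
    """Average the per-row estimates; rounding in the table makes them differ slightly."""
    sizes = [implied_genome_size_mbp(p, m) for _, p, m in rows]
    return sum(sizes) / len(sizes)
```

The per-row estimates agree to within rounding of the tabulated values, which is a quick consistency check on the reconstructed columns.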
Figure 1. Schematic representation of the analysis strategy. (a) Nanopore read (grey bar) containing arrays of satellites a (orange) and b (green). The orientations of the arrays with respect to sequences in the reference database are indicated. (b) LASTZ search against the reference database results in similarity hits (displayed as arrows showing their orientation, with colours distinguishing satellite sequences) that are quality‐filtered to remove non‐specific hits (c). The filtered hits are used to identify the satellite arrays as regions of specified minimal length that are covered by overlapping hits to the same repeat (d). The positions of these regions are recorded in the form of coded reads where the sequences are replaced by satellite codes and array orientations are distinguished using uppercase and lowercase characters (e). The coded reads are then used for various downstream analyses. (f) Array lengths are extracted and analyzed regardless of orientation of the arrays but while distinguishing the complete and truncated arrays (here it is shown for satellite a). (g) Analysis of the sequences adjacent to the satellite arrays includes 10 kb regions upstream (−) and downstream (+) of the array. This analysis is performed with respect to the array orientation (compare the positions of upstream and downstream regions for arrays in forward (A1, A3) versus reverse orientation (A2)).
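Steps (d) and (e) of the strategy can be illustrated with a minimal sketch, assuming the filtered hits are already available as intervals on the read. The `Hit` and `Array` types, the `min_len` parameter, the rule that an array touching either read end counts as truncated, and the run-length text format of the coded read are all hypothetical choices made for this illustration; the caption does not specify them.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int      # 0-based start on the read
    end: int        # end on the read (exclusive)
    repeat: str     # single-letter satellite code, e.g. "a"
    forward: bool   # orientation relative to the reference sequence


class Array(NamedTuple):
    start: int
    end: int
    repeat: str
    forward: bool
    truncated: bool  # True if the array reaches either end of the read


def find_arrays(hits: List[Hit], read_len: int, min_len: int) -> List[Array]:
    """Merge overlapping (or abutting) hits to the same repeat and orientation."""
    arrays = []
    ordered = sorted(hits, key=lambda h: (h.repeat, h.forward, h.start))
    current = None  # [start, end, repeat, forward] of the region being extended
    for h in ordered + [None]:  # the trailing None flushes the last region
        same = (current is not None and h is not None
                and (h.repeat, h.forward) == (current[2], current[3])
                and h.start <= current[1])
        if same:
            current[1] = max(current[1], h.end)
            continue
        if current is not None and current[1] - current[0] >= min_len:
            start, end, repeat, forward = current
            truncated = start == 0 or end == read_len
            arrays.append(Array(start, end, repeat, forward, truncated))
        current = [h.start, h.end, h.repeat, h.forward] if h is not None else None
    return sorted(arrays, key=lambda a: a.start)


def encode_read(arrays: List[Array], read_len: int) -> str:
    """Hypothetical coded read: 'CODE:length' runs, uppercase = forward array."""
    parts, pos = [], 0
    for a in arrays:  # assumes arrays are sorted and do not overlap
        if a.start > pos:
            parts.append(f"-:{a.start - pos}")
        code = a.repeat.upper() if a.forward else a.repeat.lower()
        parts.append(f"{code}:{a.end - a.start}")
        pos = max(pos, a.end)
    if pos < read_len:
        parts.append(f"-:{read_len - pos}")
    return " ".join(parts)
```

Keeping orientation in the letter case means that downstream steps, such as the length analysis in (f), can ignore it, while the flank analysis in (g) can still recover it.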
Figure 2. Length distributions of the satellite repeat arrays. The lengths of the arrays detected in the nanopore reads are displayed as weighted histograms with a bin size of 5 kb; the last bin includes all arrays longer than 120 kb. Arrays that were completely embedded within the reads (red bars) are distinguished from those that were truncated by their positions at the ends of the reads (blue bars). Because of this truncation, the latter values underestimate the real lengths of the corresponding genomic arrays and should be read as lower bounds. Tandem repeats forming long arrays are shown in panel (a), while the remaining repeats forming predominantly short arrays are in panel (b).
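The binning described in the caption can be sketched as follows. The caption does not define the weights, so this example assumes that each array is weighted by its own length, so that a bar reflects the total number of bases in arrays of that size class rather than the number of arrays. The function names are illustrative.

```python
BIN_SIZE = 5_000      # 5 kb bins, as in the caption
CAP = 120_000         # arrays longer than this go to the last bin
N_BINS = CAP // BIN_SIZE + 1  # 24 regular bins plus one overflow bin


def bin_index(length):
    """Index of the 5 kb bin for an array length; the last index is the >120 kb bin."""
    if length > CAP:
        return N_BINS - 1
    return min(length // BIN_SIZE, N_BINS - 2)


def weighted_histogram(arrays):
    """arrays: iterable of (length_bp, truncated) pairs.

    Returns per-bin summed lengths, kept separate for complete and truncated
    arrays. Weighting by length is an assumption of this sketch.
    """
    hist = {"complete": [0] * N_BINS, "truncated": [0] * N_BINS}
    for length, truncated in arrays:
        key = "truncated" if truncated else "complete"
        hist[key][bin_index(length)] += length
    return hist
```

Keeping the two classes in separate series mirrors the red and blue bars: the truncated series gives only lower bounds on array length.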
Figure 3. Sequence composition of the genomic regions adjacent to the satellite repeat arrays. The plots show the proportions of repetitive sequences identified within 10 kb regions upstream (positions −1 to −10 000) and downstream (1 to 10 000) of the arrays of individual satellites (the array positions are marked by vertical lines, and the plots are related to the forward‐oriented arrays). Only the repeats detected in proportions exceeding 0.05 are plotted (coloured lines). The black lines represent the same satellite as examined. Tandem repeats forming long arrays are shown in panel (a), while the remaining repeats forming predominantly short arrays are in panel (b).
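A minimal sketch of such a flank profile is given below, assuming each array comes with a list of repeat annotations on the same read. The dictionary layout, the function names and the choice of denominator (only arrays whose read actually extends to a given offset are counted) are assumptions of this example, not details given in the caption.

```python
from collections import defaultdict


def read_position(array, offset):
    """Map a signed offset (negative = upstream) to a 0-based read coordinate.

    For forward arrays upstream lies before the array start; for reverse
    arrays the two flanks are mirrored.
    """
    k = abs(offset)
    upstream = offset < 0
    if upstream == array["forward"]:
        return array["start"] - k
    return array["end"] - 1 + k


def flank_profile(arrays, window=10_000):
    """Proportion of arrays annotated with each repeat at every flank offset.

    arrays: list of dicts with keys 'start', 'end' (exclusive), 'forward',
    'read_len' and 'annotations' (list of (start, end, repeat_name)).
    Returns {repeat_name: {offset: proportion}} for offsets with any hit.
    """
    hits = defaultdict(lambda: defaultdict(int))
    covered = defaultdict(int)  # number of arrays whose read reaches the offset
    offsets = [o for o in range(-window, window + 1) if o != 0]
    for a in arrays:
        for o in offsets:
            pos = read_position(a, o)
            if not 0 <= pos < a["read_len"]:
                continue
            covered[o] += 1
            names = {name for s, e, name in a["annotations"] if s <= pos < e}
            for name in names:
                hits[name][o] += 1
    return {name: {o: c / covered[o] for o, c in per.items()}
            for name, per in hits.items()}
```

Applying the 0.05 threshold from the caption would then be a filter on the returned proportions before plotting.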
Figure 4. Detection of the Ogre sequences coding for the retrotransposon conserved protein domains in the genomic regions adjacent to the satellite repeat arrays. (a) The plots show the proportions of similarity hits from the individual domains and their orientation with respect to the forward‐oriented satellite arrays. (b) A schematic representation of the Ogre element with the positions of the protein domains and short tandem repeats downstream of the coding region.
Figure 5. Periodicity spectra revealed by the fast Fourier transform analysis of the satellite repeat arrays. Each spectrum is an average of the spectra calculated for the individual arrays longer than 30 kb of the same satellite family or subfamily. The numbers of arrays used for the calculations are in parentheses. The peaks corresponding to the monomer lengths listed in Table 1 are marked with red asterisks. The peaks in the FabTR‐2 spectrum corresponding to higher‐order repeats are indicated by the horizontal line.
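One common way to obtain a periodicity spectrum from a DNA sequence is to encode it as four binary indicator tracks (one per base), take the FFT of each and sum the power spectra. The sketch below follows that approach with NumPy; it is an assumption about how such a spectrum can be computed, not a description of the procedure used for the figure, and it handles a single array without the averaging across arrays mentioned in the caption.

```python
import numpy as np


def periodicity_spectrum(seq):
    """Return (periods, power) for one sequence.

    The sequence is split into four 0/1 indicator tracks (A, C, G, T); the
    power spectra of the mean-centred tracks are summed. Periods are given
    in bases and correspond to the non-zero FFT frequencies.
    """
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    n = codes.size
    power = np.zeros(n // 2 + 1)
    for base in b"ACGT":  # iterating bytes yields the integer ASCII codes
        indicator = (codes == base).astype(float)
        indicator -= indicator.mean()  # remove the zero-frequency offset
        power += np.abs(np.fft.rfft(indicator)) ** 2
    freqs = np.fft.rfftfreq(n)  # cycles per base
    return 1.0 / freqs[1:], power[1:]
```

For a perfect tandem array the power is concentrated at the monomer length and its harmonics (monomer length divided by an integer), so the strongest peak is not necessarily the monomer itself. Averaging spectra from arrays of different lengths would additionally require resampling onto a common grid of periods, which is not shown here.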
Figure 6. Distribution of the satellite repeats on the metaphase chromosomes of Lathyrus sativus (2n = 14). (a–f) The satellites were visualized using multi‐colour FISH, with individual probes labelled as indicated by the colour‐coded descriptions. The chromosomes counterstained with DAPI are shown in grey. The numbers in panel (c) correspond to the individual chromosomes that were distinguished using the hybridization patterns of the FabTR‐54 sequences. This satellite was then used for chromosome discrimination in combination with other probes. (g–j) Simultaneous detection of the Ogre integrase probe (INT) and the satellite FabTR‐52‐LAS‐A demonstrates the different distribution of these sequences in the genome. The probe signals and DAPI counterstaining are shown as separate grayscale images (g–i) and a merged image (j). The arrowheads point to the primary constrictions of chromosomes 7.