Paul Bilinski, Yonghua Han, Matthew B Hufford, Anne Lorant, Pingdong Zhang, Matt C Estep, Jiming Jiang, Jeffrey Ross-Ibarra.
Abstract
Highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Previous research has investigated variation in centromere repeats across eukaryotes, positing that the highest abundance tandem repeat in a genome is often the centromeric repeat. To test this assumption, we used shotgun sequencing and a bioinformatic pipeline to identify common tandem repeats across a number of grass species. We find that de novo assembly and subsequent abundance ranking of repeats can successfully identify tandem repeats with homology to known tandem repeats. Fluorescent in-situ hybridization shows that de novo assembly and ranking of repeats from non-model taxa identifies chromosome domains rich in tandem repeats both near pericentromeres and elsewhere in the genome.
Year: 2017 PMID: 28570674 PMCID: PMC5453492 DOI: 10.1371/journal.pone.0177896
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Counts of reads per sequence library for each taxon.
An accession ID of NA indicates a purchase from a local nursery or sample not registered with GRIN. Taxa were selected broadly from across the Andropogoneae tribe, with higher density sampling in the Tripsacum genus to study tandem repeat variation within a genus. We used A. nepalensis, rice, and bamboo as outgroups to the Andropogoneae. Asterisks indicate genome size estimates published in this study. GS = Genome size.
| Genus | Species | Reads | GS (pg/1C) | AccessionID |
|---|---|---|---|---|
| | | 746994 | 1.79* | PI 219568 |
| | | 662118 | 2.02[ | PI 384059 |
| | | 861995 | 1.86* | PI 206889 |
| | | 920258 | 0.75* | Kew 0183574 |
| | | 599567 | 0.50[ | NA |
| | | 628030 | 2.1[ | NA |
| | | 473944 | 0.75[ | PI 564163 |
| | | 288175 | 5.8* | MIA 34430 |
| | | 391848 | 3.88[ | MIA 34597 |
| | | 743668 | 3.47* | MIA 34719 |
| | | 723097 | 3.04* | MIA 34792 |
| | | 238983 | 4.55* | MIA 34501 |
| | | 435815 | 4.93[ | PI 428198 |
| | | 661535 | 0.73* | SM3109 |
| | | 4422188 | 2.73[ | RIMMA0019 |
| | | 5106091 | 5.28[ | NA |
Fig 1. Percentage genomic composition of all tandem repeat contigs in monocot taxa.
Values are derived from the proportion of all reads mapping to any tandemly repetitive contig derived from TRF after MIRA assembly. Species are ordered in approximate phylogenetic relationship, with a phylogenetic schematic below the graph.
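The percentage described in this caption can be sketched as a simple proportion. This is a minimal illustration, not the authors' pipeline: the function name `percent_tandem`, the read-to-contig dictionary, and the contig identifiers are all hypothetical, and it assumes each read is assigned to at most one contig by whatever mapper is used.

```python
from typing import Dict, Optional, Set


def percent_tandem(read_to_contig: Dict[str, Optional[str]],
                   tandem_contigs: Set[str]) -> float:
    """Percentage of all reads that map to any tandemly repetitive contig.

    read_to_contig: hypothetical mapping of read ID -> contig ID,
                    with None for reads that did not map anywhere.
    tandem_contigs: hypothetical set of contig IDs flagged as tandem repeats.
    """
    total_reads = len(read_to_contig)
    if total_reads == 0:
        raise ValueError("library contains no reads")
    # Count each read once if its contig is in the tandem repeat set.
    mapped = sum(1 for contig in read_to_contig.values()
                 if contig is not None and contig in tandem_contigs)
    return 100.0 * mapped / total_reads


# Toy library of 8 reads: 2 map to tandem contigs, 1 maps elsewhere, 5 are unmapped.
toy_reads = {
    "r1": "tandem_a", "r2": "tandem_b", "r3": "other_contig",
    "r4": None, "r5": None, "r6": None, "r7": None, "r8": None,
}
toy_percent = percent_tandem(toy_reads, {"tandem_a", "tandem_b"})
```

The denominator is the whole library, including unmapped reads, so the value reads as a share of the genome sampled by the shotgun library rather than a share of mapped reads.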
Fig 2. Genomic composition of top 4 tandemly repetitive contigs.
The top 4 contigs in each species were defined as not having homology to one another, in order to identify independent repeat motifs. Species are ordered in approximate phylogenetic relationship, with a phylogenetic schematic below the graph. Values were calculated as a percentage of total genomic reads mapping to each tandem repeat family. Tandem repeat families are ordered by their genomic abundance from left to right.
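One way to read the selection rule in this caption is as a greedy pass over contigs ranked by abundance, keeping a contig only if it shares no homology with any contig already kept. The sketch below assumes this greedy interpretation; the function name `top_independent_contigs`, the contig names, and the precomputed set of homologous pairs are hypothetical, and the homology test itself (however it was actually performed) is treated as given input.

```python
from typing import Dict, FrozenSet, List, Set


def top_independent_contigs(abundance: Dict[str, float],
                            homologous_pairs: Set[FrozenSet[str]],
                            n: int = 4) -> List[str]:
    """Return up to n contigs, most abundant first, with no homology between them.

    abundance:        hypothetical contig ID -> percentage of genomic reads.
    homologous_pairs: hypothetical set of unordered contig pairs judged homologous.
    """
    chosen: List[str] = []
    # Rank by descending abundance; ties are broken by contig ID for determinism.
    for contig in sorted(abundance, key=lambda c: (-abundance[c], c)):
        independent = all(frozenset((contig, kept)) not in homologous_pairs
                          for kept in chosen)
        if independent:
            chosen.append(contig)
        if len(chosen) == n:
            break
    return chosen


# Toy data: c2 is homologous to c1, and c4 to c3, so both are skipped.
toy_abundance = {"c1": 5.0, "c2": 4.0, "c3": 3.0, "c4": 2.0, "c5": 1.0, "c6": 0.5}
toy_pairs = {frozenset(("c1", "c2")), frozenset(("c3", "c4"))}
toy_top = top_independent_contigs(toy_abundance, toy_pairs)
```

Because the result is built in rank order, it is already sorted from most to least abundant, matching the left-to-right ordering described for the figure.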
Fig 3. Fluorescent in situ hybridization of the highest abundant tandem repeats in three grasses.
(A1-C1) Somatic metaphase chromosomes prepared from A. nepalensis (A1), H. hirta (B1), and U. digitatum (C1), respectively. (A2-C2) FISH signals derived from the three repeats identified in the three species. (A3-C3) Images merged from chromosomes and FISH signals. Scale bar = 10 microns. On all images, knobs are indicated with white arrows.