Thomas K Wolfgruber, Gernot G Presting.
Abstract
BACKGROUND: Repeat-rich regions such as centromeres receive less attention than their gene-rich euchromatic counterparts because the former are difficult to assemble and analyze. Our objectives were to 1) map all ten centromeres onto the maize genetic map and 2) characterize the sequence features of maize centromeres, each of which spans several megabases of highly repetitive DNA. Repetitive sequences can be mapped using special molecular markers that are based on PCR with primers designed from two unique "repeat junctions". Efficient screening of large amounts of maize genome sequence data for repeat junctions, as well as for key centromere sequence features, required the development of specific annotation software.
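To make the idea of screening for repeat junctions concrete, the following is a minimal sketch and not JunctionViewer's actual algorithm. It assumes, purely for illustration, that repeat annotations are available as non-overlapping `(start, end, name)` intervals on one sequence, and that a junction can be approximated as a position where two differently named annotations abut. The function names `repeat_junctions` and `junction_window` and the `flank` parameter are hypothetical.

```python
def repeat_junctions(annotations):
    """Return (position, left_name, right_name) for each abutting pair.

    annotations: list of (start, end, name) tuples, 0-based and half-open,
    assumed (for this sketch) to be non-overlapping on a single sequence.
    """
    ordered = sorted(annotations)
    junctions = []
    for (s1, e1, n1), (s2, e2, n2) in zip(ordered, ordered[1:]):
        # A junction is recorded only where one element ends exactly
        # where a differently named element begins.
        if e1 == s2 and n1 != n2:
            junctions.append((e1, n1, n2))
    return junctions


def junction_window(seq, pos, flank):
    """Return the subsequence spanning `flank` bases on each side of `pos`."""
    return seq[max(0, pos - flank):pos + flank]


if __name__ == "__main__":
    # Toy annotations; names echo the element classes in the record.
    toy = [(0, 10, "CRM1"), (10, 25, "CentC"), (30, 40, "CRM2"), (25, 30, "CentC")]
    for pos, left, right in repeat_junctions(toy):
        print(pos, left, right)
```

A window around each reported position is the kind of region from which a junction-spanning primer could be chosen; real annotations include nested insertions, which this sketch does not handle.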
Year: 2010 PMID: 20067643 PMCID: PMC2824676 DOI: 10.1186/1471-2105-11-23
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Data flow through JunctionViewer 2.0. This figure illustrates how JunctionViewer 2.0 transforms input documents into a graphical representation of DNA sequence features within a query sequence. The query sequence represented here is BAC CH201-530C10 (GenBank accession AC184133.3) (Figure 2).
Figure 2. JunctionViewer 2.0 display of a BAC sequence. (a) Sequences annotated include: LTRs from 5 subfamilies of CRM1 (different shades of blue) [8], CRM2 (maroon), and CRM3 (pink), as well as CDSs of CRM1/CRM2/CRM3 (tan), CentA LTR and CDS (orange), and tandem repeat CentC (green). For clarity, all CRM CDSs are drawn in the same color; the subfamily and recombinant subtype of each CRM is identified by the LTR. UTRs are not displayed, since these are highly variable regions that include long stretches of homopolymers. Our annotations also included sequences homologous to non-centromeric "maize repeats" (grey), "maize genes" (red), "maize organelle" (purple), and "rice genes" (yellow) [for details please see Additional file 4]. (b) Bird's-eye view of the complete BAC CH201-530C10/AC184133.3 (labeled c0530C10 in the image, as named in FPC) showing the complex arrangement of CentC and CRM sequences. Two overlapping chart sets above the BLAST/cross_match homologies use different y-axes that indicate the number of anti-CENH3 ChIP-Seq reads covering each nucleotide. Red and blue datapoints represent coverage by reads that match one or two BACs, once or any number of times within the BAC, respectively. Grey points are plotted in the background graph using a different y-axis and represent coverage by reads matching any number of BACs any number of times within a BAC. Thus, the grey charts indicate the general degree of association of a given sequence class (e.g., CRM element) with the centromere protein CENH3, while red and blue charts highlight sequence regions that are specifically bound to CENH3. At the bottom of the display, thin red and blue arrows indicate exact matches of >= 100 nt within the query. (c) Close-up of a BAC section showing nested insertions consisting of a CRM1 B element insertion into a CentC array, followed by insertion of a CRM1 R4 or R5 element into the CRM1 B element, moving the CentC sequences even further apart.
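The three per-nucleotide coverage tracks described in panel (b) can be illustrated with a short sketch. This is one reading of the caption, not the JunctionViewer implementation: it assumes red counts reads matching at most two BACs and exactly once within this BAC, blue counts reads matching at most two BACs any number of times within this BAC, and grey counts all reads. The `Read` class, its fields `n_bacs` and `hits`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Read:
    n_bacs: int   # number of BACs the read matches (hypothetical field)
    hits: list    # (start, end) hits within this BAC, 0-based, half-open


def coverage(reads, length, keep):
    """Per-nucleotide coverage over `length` bases for reads passing `keep`."""
    diff = [0] * (length + 1)
    for read in reads:
        if not keep(read):
            continue
        for start, end in read.hits:
            # Clamp each hit to the sequence, then mark its two endpoints.
            start, end = max(0, start), min(length, end)
            if start < end:
                diff[start] += 1
                diff[end] -= 1
    # A running sum of the endpoint marks gives the depth at each base.
    return list(accumulate(diff[:-1]))


def is_red(read):
    return read.n_bacs <= 2 and len(read.hits) == 1


def is_blue(read):
    return read.n_bacs <= 2


def is_grey(read):
    return True


if __name__ == "__main__":
    reads = [Read(1, [(0, 3)]), Read(2, [(1, 4), (2, 5)]), Read(7, [(0, 2)])]
    for name, keep in (("red", is_red), ("blue", is_blue), ("grey", is_grey)):
        print(name, coverage(reads, 5, keep))
```

Under this reading, every read counted in the red track is also counted in the blue track, and every blue read in the grey track, so red coverage never exceeds blue and blue never exceeds grey at any position.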