William Nelson, Meizhong Luo, Jianxin Ma, Matt Estep, James Estill, Ruifeng He, Jayson Talag, Nicholas Sisneros, David Kudrna, HyeRan Kim, Jetty S S Ammiraju, Kristi Collura, Arvind K Bharti, Joachim Messing, Rod A Wing, Phillip SanMiguel, Jeffrey L Bennetzen, Carol Soderlund.
Abstract
BACKGROUND: Many plant genomes resist whole-genome assembly because of an abundance of repetitive sequence, which has led to the development of gene-enrichment sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-enriched datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends.
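The principle behind hypomethylated partial restriction can be illustrated with a small in silico digest: an enzyme whose cleavage is blocked by CpG methylation can only cut sites in unmethylated DNA. The sketch below assumes that HpaII (CCGG), HpyCH4 IV (ACGT) and SalI (GTCGAC) are each blocked when the CpG inside their recognition site is methylated, and it represents methylation as a hypothetical set of positions of methylated CpG cytosines. It illustrates the idea only and is not the authors' protocol.

```python
import re

# Recognition sequences (top strand). Each contains one CpG; this sketch
# assumes methylation of that CpG blocks cleavage.
SITES = {"HpaII": "CCGG", "HpyCH4IV": "ACGT", "SalI": "GTCGAC"}


def cleavable_sites(seq, enzyme, methylated_cpg):
    """Return 0-based start positions of recognition sites the enzyme can cut.

    methylated_cpg is a hypothetical input: the set of 0-based positions of the
    C in each methylated CpG on this strand.
    """
    motif = SITES[enzyme]
    cg_offset = motif.index("CG")  # position of the CpG within the motif
    seq = seq.upper()
    # Zero-width lookahead so that overlapping occurrences are all reported.
    starts = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return [s for s in starts if s + cg_offset not in methylated_cpg]


seq = "AACCGGTTACGTAACCGGTT"
print(cleavable_sites(seq, "HpaII", set()))   # [2, 14]
print(cleavable_sites(seq, "HpaII", {15}))    # [2]: the second site is methylated
print(cleavable_sites(seq, "HpyCH4IV", {9}))  # []
```

A real partial digest cuts only a random subset of the cleavable sites, which this sketch does not model.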
Year: 2008 PMID: 19099592 PMCID: PMC2628917 DOI: 10.1186/1471-2164-9-621
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
HMPR and MSLL libraries, and alignment to 16,861 BACs.
| HMPR | Enzyme | Digestion | Reads | Paired hits | Avg Span (kb) | Avg % |
| Ha | HpyCH4 IV | 10 U/ug | 171 | 27 | 3.1 | 28% |
| Hb | HpyCH4 IV | 0.1 U/ug | 181 | 26 | 2.7 | 19% |
| Hc | HpaII | 10 U/ug | 734 | 120 | 3.5 | 28% |
| Hd | HpaII | 0.5 U/ug | 1482 | 278 | 3.5 | 25% |
| He | HpaII | 0.25 U/ug | 5189 | 913 | 3.3 | 27% |
| Hf | HpaII | 0.125 U/ug | 10428 | 1838 | 3.1 | 25% |
| Hg | HpaII | 0.05 U/ug | 1488 | 246 | 3.2 | 27% |
| Hh | HpaII | 0.025 U/ug | 177 | 19 | 3.1 | 19% |
| Hi | HpaII | 0.0125 U/ug | 174 | 22 | 2.6 | 18% |
| Hj | HpaII | 0.005 U/ug | 177 | 34 | 2.9 | 38% |
| Hk | HpaII | 0.00125 U/ug | 183 | 29 | 2.4 | 37% |
| Hl | HpyCH4 IV | 0.5 U/ug | 1441 | 112 | 2.2 | 31% |
| Hm | HpyCH4 IV | 0.25 U/ug | 2239 | 239 | 3.0 | 36% |
| Hn | HpyCH4 IV | 0.125 U/ug | 1492 | 239 | 3.0 | 34% |
| Ho | HpyCH4 IV | 0.05 U/ug | 2550 | 492 | 1.9 | 21% |
| Hp | HpyCH4 IV | 0.025 U/ug | 8218 | 1483 | 3.0 | 22% |
| Hq | HpyCH4 IV | 0.0125 U/ug | 1782 | 329 | 2.9 | 25% |
| Hr | HpyCH4 IV | 0.005 U/ug | 1544 | 262 | 2.3 | 23% |
| Hs | HpyCH4 IV | 0.002 U/ug | 649 | 115 | 2.3 | 29% |
| Total | | | 40,299 | 6865 | 3.0 | 26% |
| MSLL | Enzyme | Length (kb) | Reads | Paired hits | Avg Span (kb) | Avg % |
| La | SalI | 35–60 | 7536 | 93 | 46 | 78% |
| Lb | HpaII | 35–60 | 8733 | 146 | 39 | 82% |
| Lc | HpaII | 20–35 | 9639 | 483 | 26 | 78% |
| Ld | SalI | 20–35 | 10916 | 469 | 24 | 56% |
| Le | HpaII | 12–20 | 8920 | 630 | 15 | 65% |
| Lf | SalI | 60–100 | 5436 | 10 | 80 | 78% |
| Lh | SalI | >100 | 6158 | 0 | n/a | n/a |
| Li | HpaII | 60–100 | 8495 | 21 | 70 | 86% |
| Lj | HpaII | >100 | 7715 | 0 | n/a | n/a |
| Lz | HpaII | 7–12 | 7184 | 807 | 11 | 55% |
| Total – HpaII | | | 50,686 | 2087 | | 66% |
| Total – SalI | | | 30,046 | 572 | | 60% |
| Total | | | 80,732 | 2659 | | 65% |
1 The ZMMB libraries followed by the indicated suffix (e.g. ZMMBLj) can be ordered from the Arizona Genomics Institute [43].
2 HMPR clones range from 2 to 4 kb in size.
3 Percentage of sequence masked within the spanned BAC regions, as determined by RepeatMasker against the TIGR maize repeat database.
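The MSLL total rows can be checked against the per-library rows. Reads and paired hits are plain sums. For the Avg % column, the sketch assumes each total is the per-library Avg % weighted by paired hits; that assumption reproduces the HpaII (66%) and SalI (60%) totals, but gives about 64% for the combined total rather than the reported 65%, so the published combined figure may have been computed differently. The rows below are copied from the table.

```python
# (library, enzyme, reads, paired_hits, avg_pct_masked) from the MSLL table;
# avg_pct_masked is None where no paired hits were found.
MSLL = [
    ("La", "SalI", 7536, 93, 78),
    ("Lb", "HpaII", 8733, 146, 82),
    ("Lc", "HpaII", 9639, 483, 78),
    ("Ld", "SalI", 10916, 469, 56),
    ("Le", "HpaII", 8920, 630, 65),
    ("Lf", "SalI", 5436, 10, 78),
    ("Lh", "SalI", 6158, 0, None),
    ("Li", "HpaII", 8495, 21, 86),
    ("Lj", "HpaII", 7715, 0, None),
    ("Lz", "HpaII", 7184, 807, 55),
]


def totals(rows, enzyme=None):
    """Sum reads and paired hits; weight Avg % by paired hits (an assumption)."""
    sel = [r for r in rows if enzyme is None or r[1] == enzyme]
    reads = sum(r[2] for r in sel)
    hits = sum(r[3] for r in sel)
    weighted = sum(r[3] * r[4] for r in sel if r[3] > 0)
    return reads, hits, weighted / hits


for enz in ("HpaII", "SalI", None):
    reads, hits, pct = totals(MSLL, enz)
    print(enz or "All", reads, hits, f"{pct:.0f}%")
# HpaII 50686 2087 66%
# SalI 30046 572 60%
# All 80732 2659 64%
```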
Comparison of different gene enrichment and unfiltered methods, showing percentages of sequence masked by repeats in various categories of the TIGR v4.0 maize repeat database, as well as percentages of sequences having similarity to a maize EST contig.
| | MSLL (SalI) | MSLL (HpaII) | HMPR | MF | HC | RM | UF |
| Sequences¹ | 30,046 | 50,686 | 40,299 | 133,806 | 172,600 | 191,715 | 49,364 |
| Avg. size (bp) | 671 | 698 | 932 | 1172 | 1094 | 337 | 748 |
| Retro | 20% | 20% | 11% | 26% | 12% | 4% | 64% |
| Transposon | 0.6% | 0.9% | 0.7% | 0.5% | 0.7% | 0.3% | 0.6% |
| MITE | 0.6% | 1.3% | 1.0% | 0.4% | 0.7% | 0.6% | 0.2% |
| Centromere | 0.2% | 0.9% | 0.2% | 0.4% | 0.09% | 0.02% | 0.8% |
| Telomere | 0.001% | 0.03% | 0.08% | 0.02% | 0.02% | 0.02% | 0.01% |
| Ribosomal | 0.4% | 1.5% | 1.3% | 0.2% | 0.1% | 0.1% | 2.1% |
| Unknown | 14% | 19% | 13% | 9% | 14% | 7% | 14% |
| Total repeat | 37% | 42% | 28% | 38% | 27% | 12% | 82% |
| EST Contigs | 31% | 16% | 22% | 25% | 21% | 22% | 7% |
1 The MF and HC sequences are contigs assembled from reads; all other types are individual reads.
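Percentages like those in this table can be derived from repeat annotations by counting masked bases per category and dividing by the total sequence length. The sketch below assumes a hypothetical input of half-open (start, end, category) intervals per sequence, such as might be parsed from RepeatMasker output; overlapping intervals are merged so that each base is counted at most once per category and once in the total.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total length covered by half-open (start, end) intervals after merging overlaps."""
    total, cur_start, cur_end = 0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def masked_percentages(seq_lengths, annotations):
    """seq_lengths: {seq_id: length}; annotations: {seq_id: [(start, end, category), ...]}."""
    per_cat = defaultdict(int)
    any_repeat = 0
    for hits in annotations.values():
        by_cat = defaultdict(list)
        for s, e, cat in hits:
            by_cat[cat].append((s, e))
        for cat, ivs in by_cat.items():
            per_cat[cat] += merged_length(ivs)
        # Union over all categories for the "Total repeat" row.
        any_repeat += merged_length([(s, e) for s, e, _ in hits])
    total_bp = sum(seq_lengths.values())
    result = {cat: 100 * bp / total_bp for cat, bp in per_cat.items()}
    result["Total repeat"] = 100 * any_repeat / total_bp
    return result


pct = masked_percentages(
    {"r1": 100, "r2": 100},
    {"r1": [(0, 30, "Retro"), (20, 40, "Retro"), (35, 45, "MITE")],
     "r2": [(50, 60, "Unknown")]},
)
print(pct)  # {'Retro': 20.0, 'MITE': 5.0, 'Unknown': 5.0, 'Total repeat': 27.5}
```

In this sketch, bases annotated by more than one category are counted once in the total, so the category percentages need not sum exactly to the total.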
Figure 1. Alignment of MSLL-. The horizontal scale in the upstream and downstream regions is in base pairs; in the gene interior, it is the fractional distance along the gene length. The vertical scale indicates the fraction of gene-aligning sequences that cover the given gene region.
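The mixed horizontal scale of Figure 1 (absolute distance outside the gene, relative position inside it) amounts to a coordinate transform. The sketch assumes a gene given as a hypothetical half-open interval [start, end) with a strand, and it computes, for a fixed number of interior bins, the fraction of gene-aligning sequences overlapping each bin; the bin count is an arbitrary choice, not the authors'.

```python
def gene_coord(pos, start, end, strand="+"):
    """Map a genomic position onto the mixed scale of Figure 1.

    Returns ("upstream", bp), ("interior", fraction of gene length) or
    ("downstream", bp). The gene occupies the half-open interval [start, end).
    """
    length = end - start
    # Offset from the gene's 5' end, following transcriptional orientation.
    rel = pos - start if strand == "+" else (end - 1) - pos
    if rel < 0:
        return ("upstream", -rel)
    if rel >= length:
        return ("downstream", rel - length + 1)
    return ("interior", rel / length)


def interior_coverage(alignments, start, end, strand="+", bins=10):
    """Fraction of gene-aligning sequences overlapping each interior bin, 5' to 3'.

    alignments are half-open (a_start, a_end) genomic intervals.
    """
    if not alignments:
        return [0.0] * bins
    length = end - start
    counts = [0] * bins
    for a_start, a_end in alignments:
        for b in range(bins):
            lo = start + b * length // bins
            hi = start + (b + 1) * length // bins
            if a_start < hi and a_end > lo:  # interval overlap test
                counts[b] += 1
    if strand == "-":
        counts.reverse()  # genomic order runs 3' to 5' on the minus strand
    return [c / len(alignments) for c in counts]


print(gene_coord(90, 100, 200))                         # ('upstream', 10)
print(gene_coord(150, 100, 200))                        # ('interior', 0.5)
print(interior_coverage([(100, 150)], 100, 200, bins=2))  # [1.0, 0.0]
```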
Figure 2. Starting-point locations of alignments of sequence types to gene regions. MSLL, HMPR, MF, HC, RM, and UF sequences were aligned to 151 curated gene sequences (see text). Bars indicate, for each sequence type, the percentage of alignments whose initial base lies in the indicated gene region (exon, intron, 5' intergenic, 3' intergenic). The "size" field shows the total amount of sequence of each type.
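The per-region percentages in Figure 2 can be sketched as a classification of each alignment's initial base against a gene model. The sketch assumes a single hypothetical gene model given as half-open exon intervals plus a strand, and takes whatever position the caller regards as the alignment's initial base; aggregating over the 151 curated genes would simply pool these counts.

```python
from collections import Counter


def classify(pos, exons, strand="+"):
    """Classify a genomic position against one gene model of half-open exon intervals."""
    gene_start = min(s for s, _ in exons)
    gene_end = max(e for _, e in exons)
    if pos < gene_start:
        return "5' intergenic" if strand == "+" else "3' intergenic"
    if pos >= gene_end:
        return "3' intergenic" if strand == "+" else "5' intergenic"
    if any(s <= pos < e for s, e in exons):
        return "exon"
    return "intron"


def start_point_distribution(starts, exons, strand="+"):
    """Percentage of alignment start points falling in each gene region."""
    counts = Counter(classify(p, exons, strand) for p in starts)
    n = len(starts)
    return {region: 100 * c / n for region, c in counts.items()}


exons = [(100, 200), (300, 400)]
print(start_point_distribution([50, 150, 250, 350], exons))
# {"5' intergenic": 25.0, 'exon': 50.0, 'intron': 25.0}
```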
Figure 3. Example alignments of MSLL BACs with maize genomic assemblies. (a) A routine observation, with genes shown by arrows and the regions between genes composed primarily of LTR retrotransposons and a few other repeats. (b) An unusual alignment, in which MSLL ends flank apparently methylated genes. Predicted genes are shown by arrows whose size and orientation indicate the predicted size and transcriptional orientation of each candidate gene. Each vertical line indicates a gap in the sequence assembly, and the triangles indicate sites for the restriction enzyme SalI, which was used to generate the two BACs shown. The genomic sequence scaffolds depicted are from Bruggmann and coworkers [31].
Figure 4. The maize mini-BAC page [28]. The Name entries link to the GenBank record, and the Contig entries link to the WebFPC contig display [48], which shows the position of the clone. Clicking a BAC icon opens a Genome Browser view of the BAC, with additional tracks for RM, HC, and MF sequences.