Fusheng Wei, Joshua C Stein, Chengzhi Liang, Jianwei Zhang, Robert S Fulton, Regina S Baucom, Emanuele De Paoli, Shiguo Zhou, Lixing Yang, Yujun Han, Shiran Pasternak, Apurva Narechania, Lifang Zhang, Cheng-Ting Yeh, Kai Ying, Dawn H Nagel, Kristi Collura, David Kudrna, Jennifer Currie, Jinke Lin, Hyeran Kim, Angelina Angelova, Gabriel Scara, Marina Wissotski, Wolfgang Golser, Laura Courtney, Scott Kruchowski, Tina A Graves, Susan M Rock, Stephanie Adams, Lucinda A Fulton, Catrina Fronick, William Courtney, Melissa Kramer, Lori Spiegel, Lydia Nascimento, Ananth Kalyanaraman, Cristian Chaparro, Jean-Marc Deragon, Phillip San Miguel, Ning Jiang, Susan R Wessler, Pamela J Green, Yeisoo Yu, David C Schwartz, Blake C Meyers, Jeffrey L Bennetzen, Robert A Martienssen, W Richard McCombie, Srinivas Aluru, Sandra W Clifton, Patrick S Schnable, Doreen Ware, Richard K Wilson, Rod A Wing.
Abstract
Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs), which are mostly nested within each other; most TE families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible for most disruptions of synteny with sorghum and rice in this region. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These results, obtained from approximately 1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.
Year: 2009 PMID: 19936048 PMCID: PMC2773423 DOI: 10.1371/journal.pgen.1000728
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Physical and genetic features of AR182.
The genetic (center), optical (left), and physical and synteny maps (right) of AR182 are shown. (A) Magnified view of the alignment between the in silico restriction map of the AR182 pseudomolecule (blue track) and the optical map (gold track) after finishing (each box = one restriction fragment). (B) The entire ∼22 Mb alignment (∼1,000 ordered restriction fragments); the turquoise box marks the region magnified in (A). (C) A simplified maize Chr4 genetic map. (D) The maize Chr4 physical map. (E) Overall syntenic relationships among maize, sorghum, and rice with respect to AR182. This three-way synteny map was built with SyMap [83] from pseudomolecule-to-pseudomolecule comparisons.
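An optical map is aligned against an in silico restriction map, i.e., the fragment sizes predicted by cutting the sequence at every recognition site. The sketch below computes such a map for a linear sequence. The function name is hypothetical, the source does not state which enzyme was used (the SwaI site ATTTAAAT in the example is purely illustrative), and real optical-map alignment software also tolerates sizing error and missing or extra cuts, which this sketch ignores.

```python
import re


def in_silico_restriction_map(seq: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return restriction fragment lengths (bp) for a linear DNA sequence.

    `site` is the recognition sequence and `cut_offset` is the cut position
    within it (0 = cut just before the site). Only the top strand is scanned,
    which is sufficient for palindromic sites.
    """
    seq = seq.upper()
    pattern = f"(?={re.escape(site.upper())})"  # lookahead also finds overlapping sites
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    boundaries = [0] + cuts + [len(seq)]
    # Fragment = distance between consecutive cut positions; skip empty pieces.
    return [end - start for start, end in zip(boundaries, boundaries[1:]) if end > start]


# Illustrative sequence with two copies of the SwaI site, which cuts ATTT^AAAT.
seq = "GGG" + "ATTTAAAT" + "CCCC" + "ATTTAAAT" + "GG"
print(in_silico_restriction_map(seq, "ATTTAAAT", cut_offset=4))  # [7, 12, 6]
```

The fragment lengths always sum to the sequence length, which is a quick sanity check when comparing against the ordered fragments of an optical map.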
Summary of transposable elements^a.
| Class | Superfamily | TE families, B73 | TE families, AR182 | TE families, AGP182 | TEs (×1000), B73 | TEs (×1000), AR182 | TEs (×1000), AGP182 | Coverage (Mb), B73 | Coverage (Mb), AR182 | Coverage (Mb), AGP182 | Fraction of genome (%), B73 | Fraction of genome (%), AR182 | Fraction of genome (%), AGP182 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | | 109 | 77 | 74 | 404 | 4.572 | 4.734 | 484 | 6.333 | 6.421 | 23.7 | 29.2 | 28.8 |
| I | | 134 | 69 | 69 | 477 | 3.488 | 3.691 | 948 | 8.448 | 8.668 | 46.4 | 38.9 | 38.9 |
| I | | 163 | 91 | 89 | 222 | 2.294 | 2.361 | 92.9 | 1.19 | 1.274 | 4.5 | 5.5 | 5.7 |
| I | | 31 | 31 | 31 | 35 | 0.31 | 0.31 | 20 | 0.225 | 0.224 | 1 | 1 | 1 |
| I | | 4 | 2 | 2 | 2 | 0.028 | 0.024 | 0.5 | 0.007 | 0.007 | 0 | 0 | 0 |
| I | Subtotal | 441 | 270 | 265 | 1,140 | 10.69 | 11.12 | 1,546 | 16.2 | 16.594 | 75.6 | 74.6 | 74.4 |
| II | | 156 | 66 | 65 | 12.4 | 0.092 | 0.095 | 64.7 | 0.576 | 0.586 | 3.2 | 2.7 | 2.7 |
| II | | 230 | 178 | 181 | 31.8 | 0.42 | 0.432 | 23.4 | 0.31 | 0.317 | 1.1 | 1.4 | 1.5 |
| II | | 155 | 88 | 88 | 12.9 | 0.163 | 0.166 | 20.2 | 0.219 | 0.221 | 1 | 1 | 1 |
| II | | 127 | 45 | 47 | 14 | 0.165 | 0.166 | 2.3 | 0.025 | 0.025 | 0.1 | 0.1 | 0.1 |
| II | | 179 | 137 | 137 | 49.7 | 0.579 | 0.588 | 19.8 | 0.229 | 0.232 | 1 | 1 | 1.1 |
| II | | 8 | 6 | 6 | 22 | 1.149 | 1.299 | 45.5 | 0.653 | 0.647 | 2.2 | 3 | 3 |
| II | Subtotal | 855 | 520 | 524 | 143 | 2.568 | 2.746 | 176 | 2.012 | 2.028 | 8.6 | 9.2 | 9.4 |
| Total | | 1,296 | 790 | 789 | 1,283 | 13.26 | 13.866 | 1,722 | 18.22 | 18.622 | 84.5 | 83.8 | 83.8 |
a Data for the whole B73 genome and for AGP182 (the sequence in B73 RefGen_v1 corresponding to AR182) are from Schnable et al.
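Because most TEs in AR182 are nested, summing the lengths of individual elements would count nested bases more than once. Below is a minimal sketch of one way to compute a coverage in which each base counts once, assuming half-open [start, end) coordinates. The source does not describe how the coverage column was computed, and the function names are hypothetical.

```python
def merged_coverage(intervals: list[tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or nested element: extend the current block if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_of_region(intervals: list[tuple[int, int]], region_length: int) -> float:
    """Covered bases as a percentage of the region length."""
    return 100.0 * merged_coverage(intervals) / region_length


# A 20-bp element nested inside a 100-bp element, plus one separate element.
tes = [(0, 100), (20, 40), (150, 200)]
print(merged_coverage(tes))         # 150 (a naive sum of lengths gives 170)
print(percent_of_region(tes, 300))  # 50.0
```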
Figure 2. TE and gene distribution along AR182.
The distribution was computed from the nucleotide length of each TE classification in 100-kb sliding windows. The left vertical axis gives the nucleotide length of each TE classification; the right vertical axis gives gene counts.
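A windowed profile like the one in Figure 2 can be sketched as follows. Several choices here are assumptions not stated in the source: coordinates are half-open, the step equals the window size unless set otherwise, TE bases are summed per element without merging nested copies, and a gene is counted in a window when its midpoint falls inside it. Function names are hypothetical; running `window_profile` once per TE classification would give one curve per class.

```python
def sliding_windows(seq_len: int, window: int = 100_000, step: int = 100_000):
    """Yield half-open (start, end) windows; the last one may be shorter."""
    start = 0
    while start < seq_len:
        yield start, min(start + window, seq_len)
        start += step


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def window_profile(tes, genes, seq_len, window=100_000, step=100_000):
    """Return (start, end, TE bp, gene count) for each window."""
    rows = []
    for w_start, w_end in sliding_windows(seq_len, window, step):
        te_bp = sum(overlap(s, e, w_start, w_end) for s, e in tes)
        n_genes = sum(1 for s, e in genes if w_start <= (s + e) // 2 < w_end)
        rows.append((w_start, w_end, te_bp, n_genes))
    return rows


tes = [(50_000, 150_000), (200_000, 210_000)]
genes = [(10_000, 20_000), (120_000, 130_000), (240_000, 250_000)]
for row in window_profile(tes, genes, seq_len=250_000):
    print(row)
# (0, 100000, 50000, 1)
# (100000, 200000, 50000, 1)
# (200000, 250000, 10000, 1)
```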
Comparison of maize, sorghum, and rice genes^a.
| Parameter | Gene set^b | Mean | Std dev. | Median | Max |
|---|---|---|---|---|---|
| | Zm (all) | 3.5 | 3.7 | 2.4 | 29.2 |
| | Zm (syn) | 3.4 | 3.8 | 2.1 | 23 |
| | Sb (syn) | 3.1 | 2.5 | 2.4 | 14.5 |
| | Os (syn) | 2.9 | 2.2 | 2.3 | 12 |
| | Zm (all) | 306 | 394 | 156 | 3,389 |
| | Zm (syn) | 251 | 337 | 134 | 3,087 |
| | Sb (syn) | 246 | 329 | 133 | 3,090 |
| | Os (syn) | 243 | 330 | 131 | 3,627 |
| | Zm (all) | 482 | 1079 | 151 | 18,487 |
| | Zm (syn) | 498 | 1123 | 144 | 18,487 |
| | Sb (syn) | 361 | 532 | 149 | 8,794 |
| | Os (syn) | 329 | 478 | 147 | 9,436 |
| | Zm (all) | 5 | 4.6 | 3 | 37 |
| | Zm (syn) | 5.3 | 4.9 | 4 | 28 |
| | Sb (syn) | 5.6 | 5.3 | 4 | 28 |
| | Os (syn) | 5.6 | 5.2 | 4 | 28 |
a Where multiple transcripts are described for a gene, the one with the longest coding sequence was used.
b Zm = maize; Sb = sorghum; Os = rice. “all” refers to the entire set of 544 maize genes; “syn” refers to a set of 341 presumed orthologous and syntenic genes in each species. For consistency, only exons and introns within the CDS were characterized in the “syn” set.
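Footnote a describes a selection step (one transcript per gene, the one with the longest CDS) that precedes the summary statistics in the table's columns. A minimal sketch with hypothetical input tuples and identifiers; the source does not say whether the standard deviation is the sample or the population value, and the sample version is assumed here.

```python
import statistics


def longest_cds_per_gene(transcripts):
    """transcripts: iterable of (gene_id, transcript_id, cds_length).

    Keep, for each gene, the transcript with the longest CDS
    (the first one seen wins a tie).
    """
    best = {}
    for gene_id, tx_id, cds_len in transcripts:
        if gene_id not in best or cds_len > best[gene_id][1]:
            best[gene_id] = (tx_id, cds_len)
    return best


def summarize(values):
    """Mean, sample standard deviation, median, and maximum."""
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),
        "median": statistics.median(values),
        "max": max(values),
    }


transcripts = [("g1", "g1.t1", 900), ("g1", "g1.t2", 1200), ("g2", "g2.t1", 600)]
print(longest_cds_per_gene(transcripts))  # {'g1': ('g1.t2', 1200), 'g2': ('g2.t1', 600)}
print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

The same `summarize` call would be applied per gene set (e.g., Zm (all) versus Zm (syn)) to any per-gene or per-feature measurement of the selected transcripts.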
Figure 3. Comparative mapping of protein-coding and miRNA genes in orthologous segments of the rice, sorghum, and maize genomes.
Abbreviations: Osj2 = rice (japonica) chromosome 2; Sb4 = sorghum chromosome 4; Zm4 = maize chromosome 4 (AR182); Zm5 = maize chromosome 5 (homeologous region). All mappings are drawn relative to rice as a common reference. Genes are shown as tick marks on the outer radius of the correspondence lines. Inversions are highlighted in yellow. For Zm4, the density of repetitive sequence is shown in gray. Zm5 mappings are to individual BACs (boxes) projected onto the FPC map. (A) Mappings of protein-coding genes based on reciprocal best hits. (B) Mappings of miRNA genes based on family membership.
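Panel (A) relies on reciprocal best hits: gene a in one genome and gene b in another are paired only if each is the other's top-scoring match. The sketch below assumes precomputed similarity scores (for example, BLAST bit scores) supplied as (query, subject, score) tuples; the gene identifiers and function names are hypothetical, and ties are broken by input order.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, score). Return {query: best-scoring subject}."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


maize_vs_rice = [("zm1", "os1", 500), ("zm1", "os2", 300),
                 ("zm2", "os2", 400), ("zm3", "os2", 450)]
rice_vs_maize = [("os1", "zm1", 510), ("os2", "zm3", 460), ("os2", "zm2", 390)]
print(reciprocal_best_hits(maize_vs_rice, rice_vs_maize))  # [('zm1', 'os1'), ('zm3', 'os2')]
```

Here zm2 is dropped: its best hit is os2, but os2 scores higher against zm3, so the pair is not reciprocal.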
Figure 4. Distribution of truncated genes among syntenic and non-syntenic maize loci.
The ratio of protein-coding lengths between highest-scoring homologs (maize length / sorghum or rice length) is used as a measure of truncation. Unlike syntenic loci, non-syntenic loci show a bimodal distribution.
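The truncation measure is a simple ratio, and a bimodal distribution shows up once the ratios are binned. A sketch with made-up lengths; the bin width of 0.1, the 15 bins, and the open-ended last bin are assumptions, not values from the source, and the function names are hypothetical.

```python
from bisect import bisect_right


def length_ratio(maize_len: int, homolog_len: int) -> float:
    """Maize protein-coding length divided by that of its best homolog."""
    return maize_len / homolog_len


def ratio_histogram(ratios, bins_per_unit=10, n_bins=15):
    """Count ratios into bins of width 1/bins_per_unit.

    Bin i covers [i/bins_per_unit, (i+1)/bins_per_unit); the last bin
    is open-ended and collects every larger ratio.
    """
    # Edges are built by division so that 3/10 equals the literal 0.3 exactly.
    edges = [i / bins_per_unit for i in range(1, n_bins)]
    counts = [0] * n_bins
    for r in ratios:
        counts[bisect_right(edges, r)] += 1
    return counts


pairs = [(300, 1000), (350, 1000), (980, 1000), (1000, 1000), (1020, 1000)]
ratios = [length_ratio(m, h) for m, h in pairs]
print(ratio_histogram(ratios))  # [0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0]
```

Two separated clusters of counts (here around 0.3 and around 1.0) are what a bimodal distribution of truncated and full-length genes would look like.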
Figure 5. Box plots showing divergence rates among syntenic (SYN) and non-syntenic (nSYN) maize genes relative to their best-scoring homolog in sorghum.
(A) Ks. (B) Ka. Genes were categorized by CDS length ratio using a threshold of 0.8 (maize CDS length / sorghum CDS length). Sample sizes: nSYN(<0.8) n = 68; nSYN(≥0.8) n = 32; SYN(<0.8) n = 54; SYN(≥0.8) n = 340.
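The grouping behind these box plots can be sketched as follows: genes are split by synteny status and by whether the maize/sorghum CDS length ratio falls below 0.8. The sketch assumes Ks (or Ka) values were already computed for each maize-sorghum pair by a separate tool; the tuple layout and function names are hypothetical, and only group sizes and medians (the center line of each box) are reported.

```python
import statistics
from collections import defaultdict


def group_label(syntenic: bool, ratio: float, threshold: float = 0.8) -> str:
    """Label in the caption's style, e.g. 'nSYN(<0.8)' or 'SYN(>=0.8)'."""
    status = "SYN" if syntenic else "nSYN"
    side = "<" if ratio < threshold else ">="
    return f"{status}({side}{threshold})"


def summarize_groups(genes, threshold: float = 0.8):
    """genes: iterable of (syntenic, maize_cds_len / sorghum_cds_len, ks)."""
    groups = defaultdict(list)
    for syntenic, ratio, ks in genes:
        groups[group_label(syntenic, ratio, threshold)].append(ks)
    return {label: {"n": len(v), "median": statistics.median(v)}
            for label, v in groups.items()}


genes = [
    (True, 0.95, 0.10), (True, 1.00, 0.12), (True, 0.50, 0.20),
    (False, 0.90, 0.30), (False, 0.40, 0.50), (False, 0.30, 0.70),
]
for label, stats in sorted(summarize_groups(genes).items()):
    print(label, stats)
```

A ratio of exactly 0.8 falls in the "≥0.8" group, matching the caption's notation.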
Total number of distinct small RNAs originating from different classes of sequences.
| Library | DNA transposons | LTR retrotransposons | Centromeric and satellite sequences | Tandem repeats | Genes |
|---|---|---|---|---|---|
| | 1837 | 60154 | 578 | 36424 | 14161 |
| | 4286 | 124401 | 986 | 71688 | 25281 |
| | 4626 | 134070 | 970 | 79825 | 28035 |
| | 2633 | 219291 | 1342 | 145308 | 35234 |
| | 4899 | 153478 | 751 | 98543 | 36230 |
| | 3656 | 138279 | 925 | 86358 | 27788 |
Figure 6. Distributions of DNA transposons and their related small RNAs.
(A) Number of distinct DNA transposon-related small RNAs in different genetic backgrounds. (B) Total number of DNA transposon-related small RNAs in different genetic backgrounds. (C) Small RNA discrepancies between the B73 and K55-wt backgrounds.
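Panels (A) and (B) differ in what is counted: distinct small RNAs (unique sequences) versus the total number of small RNA reads. A sketch of both counts per sequence class, assuming each read has already been assigned to a single class; how multi-mapping reads were handled is not stated in the source, and the names and made-up sequences are hypothetical.

```python
from collections import Counter


def small_rna_counts(assigned_reads):
    """assigned_reads: iterable of (sequence_class, small_rna_sequence), one per read.

    Returns {class: (distinct sequences, total reads)}.
    """
    per_class = {}
    for seq_class, srna in assigned_reads:
        per_class.setdefault(seq_class, Counter())[srna] += 1
    return {c: (len(cnt), sum(cnt.values())) for c, cnt in per_class.items()}


reads = [
    ("LTR retrotransposons", "AAGCTTAGCCTAGCTAGCTAG"),
    ("LTR retrotransposons", "AAGCTTAGCCTAGCTAGCTAG"),
    ("LTR retrotransposons", "TCGATCGGATCGATCGGATCA"),
    ("DNA transposons", "GGATCCATGGCATGCATGCAT"),
]
print(small_rna_counts(reads))
# {'LTR retrotransposons': (2, 3), 'DNA transposons': (1, 1)}
```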
Sequence and gene content comparison between AR182 and AGP182.
| Feature | AR182 | AGP182 |
|---|---|---|
| | 21,702,972 | 22,259,975 |
| | 21,640,322 | 22,140,315 |
| | 910 | 1170 |
| | 140,460 | 555,826 |
| | 0.63% | 2.50% |
| | 2,192,652 | 2,288,766 |
| | 19,307,210 | 19,295,723 |
| | 544 | 493 |
| | 304 | 305 |
| | 125 | 122 |
| | 115 | 63 |
a The difference in number is due to an unresolved tandem duplication.
b Excluding three genes derived from contamination.