Carol Soderlund, Anne Descour, Dave Kudrna, Matthew Bomhoff, Lomax Boyd, Jennifer Currie, Angelina Angelova, Kristi Collura, Marina Wissotski, Elizabeth Ashley, Darren Morrow, John Fernandes, Virginia Walbot, Yeisoo Yu.
Abstract
Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).
Year: 2009 PMID: 19936069 PMCID: PMC2774520 DOI: 10.1371/journal.pgen.1000740
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
ZM_BF ESTs after removal of slippage and poly(A).
| EST prefix | ZM_BFa | ZM_BFb | ZM_BFc | Total |
| From library | B | B | C | B and C |
| # ESTs | 19,027 | 227,558 | 111,421 | 358,006 |
| # Mate-pairs | 7,862 | 89,674 | 43,146 | 140,682 |
| % 5′ ESTs | 55% | 56% | 58% | 56% |
| % 3′ ESTs | 44% | 42% | 41% | 42% |
| Average length | 673 | 671 | 688 | 677 |
These ESTs were sequenced as a preliminary project and were submitted to GenBank in 2002.
TE analysis of the 27k FLcDNAs.
| | LTR | DNA TE | Non-LTR | Total |
| # of insertions | 1068 | 1073 | 168 | 2309 |
| Length (kb) | 227.5 | 370.2 | 59.8 | 657.5 |
| | | | | |
| 5′ UTR | 75 | 93 | 30 | 198 (12%) |
| 5′ UTR-CDS | 55 | 51 | 4 | 110 (6%) |
| CDS | 134 | 217 | 25 | 376 (22%) |
| CDS-3′ UTR | 92 | 66 | 18 | 176 (10%) |
| 3′ UTR | 375 | 394 | 51 | 820 (49%) |
| Total | 731 | 821 | 128 | 1680 |
SSR analysis of the 27k FLcDNAs.
| SSR location | Di-NR | Tri-NR | Tetra-NR | Penta-NR | Hexa-NR | Total | % |
| 5′ UTR | 177 | 256 | 124 | 225 | 32 | 814 | 38.2% |
| 5′ UTR-CDS | 1 | 4 | 2 | 3 | 0 | 10 | 0.5% |
| CDS | 17 | 752 | 15 | 22 | 40 | 846 | 39.7% |
| CDS-3′ UTR | 0 | 1 | 1 | 6 | 0 | 8 | 0.4% |
| 3′ UTR | 121 | 85 | 102 | 130 | 16 | 454 | 21.3% |
| Total | 316 | 1098 | 244 | 386 | 88 | 2132 | |
| % | 14.8% | 51.5% | 11.4% | 18.1% | 4.1% | | |
NR = nucleotide repeat.
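The percentage row and column of the SSR table can be recomputed from the raw counts. The following is a minimal sketch of that arithmetic, with the counts copied from the table; the variable and function names are illustrative and not taken from the paper.

```python
# SSR counts copied from the table above.
# Rows are transcript locations; columns are Di-, Tri-, Tetra-, Penta-, Hexa-NR.
ssr_counts = {
    "5' UTR":     [177, 256, 124, 225, 32],
    "5' UTR-CDS": [1, 4, 2, 3, 0],
    "CDS":        [17, 752, 15, 22, 40],
    "CDS-3' UTR": [0, 1, 1, 6, 0],
    "3' UTR":     [121, 85, 102, 130, 16],
}
repeat_classes = ["Di", "Tri", "Tetra", "Penta", "Hexa"]


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Totals per location (row), per repeat class (column), and overall.
row_totals = {loc: sum(counts) for loc, counts in ssr_counts.items()}
col_totals = [sum(col) for col in zip(*ssr_counts.values())]
grand_total = sum(row_totals.values())

# Share of all SSRs found at each location and in each repeat class.
row_pct = {loc: percent(t, grand_total) for loc, t in row_totals.items()}
col_pct = {cls: percent(t, grand_total) for cls, t in zip(repeat_classes, col_totals)}

for loc, pct in row_pct.items():
    print(f"{loc:12s} {row_totals[loc]:5d} {pct:5.1f}%")
```

Under this reading, trinucleotide repeats dominate because of the CDS row, where 752 of the 846 SSRs are trinucleotide repeats.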
Figure 1. Homolog analysis with rice, sorghum, Arabidopsis, and poplar.
The 27k FLcDNAs were searched against annotated protein gene models of each species, and the overlapping matches between species are displayed in the Venn diagram. Two overlaps (25 overlapping hits between rice and poplar, and two overlapping hits between Arabidopsis and sorghum) are not listed.
Gene Ontology (GO) analysis of putative unique maize FLcDNAs.
| Ontology | GO ID | GO lineage | FLcDNA count | Child GO description |
| Biological Process (GO:0008150) | GO:0044260 | Cellular macromolecule metabolic process | 3 | lipoprotein metabolic process, lipoprotein biosynthetic process, protein amino acid lipidation, phosphoinositide biosynthetic process, GPI anchor biosynthetic process |
| | GO:0043933 | Macromolecular complex subunit organization | 23 | macromolecular complex assembly, protein complex assembly, protein oligomerization, protein homooligomerization |
| | GO:0065007 | Biological regulation | 112 | regulation of biological quality, homeostatic process, temperature homeostasis, homoiothermy |
| | GO:0032501 | Multicellular organismal process | 58 | temperature homeostasis, homoiothermy |
| | GO:0051869 | Response to stimulus | 73 | response to stress, response to abiotic stimulus, response to temperature stimulus, response to cold, response to freezing |
| Molecular Function (GO:0003674) | GO:0005488 | Binding | 53 | water binding, ice binding |
| | GO:0030528 | Transcription regulator activity | 3 | transcription activator activity |
| | GO:0060089 | Molecular transducer activity | 20 | ligand-dependent nuclear receptor activity |
| | GO:0045735 | Nutrient reservoir activity | 4 | nutrient reservoir activity |
| | GO:0015457 | Auxiliary transport protein activity | 1 | channel regulator activity, channel inhibitor activity, potassium channel regulator activity, ion channel inhibitor activity, potassium channel inhibitor activity |
Top 10 putative transcription factors in the 27k FLcDNA set compared with rice and Arabidopsis.
| TF family | Maize count | Maize % | Japonica rice count | Japonica rice % | Arabidopsis count | Arabidopsis % |
| bHLH | 141 | 7.2% | 184 | 7.7% | 127 | 6.6% |
| MYB | 137 | 7.0% | 138 | 5.8% | 150 | 7.8% |
| bZIP | 127 | 6.5% | 109 | 4.6% | 72 | 3.7% |
| HB | 103 | 5.2% | 103 | 4.3% | 87 | 4.5% |
| C3H | 96 | 4.9% | 90 | 3.8% | 59 | 3.1% |
| AP2/EREBP | 90 | 4.6% | 182 | 7.6% | 146 | 7.6% |
| NAC | 85 | 4.3% | 149 | 6.3% | 107 | 5.6% |
| WRKY | 82 | 4.2% | 113 | 4.7% | 72 | 3.7% |
| C2H2 | 64 | 3.3% | 113 | 4.7% | 134 | 7.0% |
| MADS | 61 | 3.1% | 83 | 3.5% | 104 | 5.4% |
Gao et al. [27].
Guo et al. [28].
Percent of 1,965 putative TFs identified from the maize FLcDNAs.
Percent of 2,383 TFs identified from the japonica rice annotated genes.
Percent of 1,922 TFs identified from the Arabidopsis annotated genes.
Top 10 GO Slim annotations and number of FLcDNAs.
| Biological | FLcDNAs | Cellular | FLcDNAs | Molecular | FLcDNAs |
| metabolic process | 4819 | membrane | 6359 | binding | 6150 |
| cellular process | 4785 | nucleus | 4162 | catalytic activity | 5326 |
| response to stress | 3873 | plasma membrane | 3849 | protein binding | 4933 |
| biosynthetic process | 2894 | plastid | 3395 | transferase activity | 4102 |
| transport | 2431 | cytoplasm | 2917 | nucleotide binding | 4076 |
| transcription | 2333 | extracellular region | 1886 | hydrolase activity | 3950 |
| biological process | 2059 | intracellular | 1862 | DNA binding | 2666 |
| protein modification process | 2034 | mitochondrion | 1450 | oxidoreductase activity | 2567 |
| response to abiotic stimulus | 1977 | vacuole | 1386 | kinase activity | 1973 |
| catabolic process | 1832 | cell wall | 1138 | nucleic acid binding | 1368 |
Summary of FLcDNA mapping to the maize genome.
| Total cDNAs | 27,455 | 100.0% |
| Mapped | 25,753 | 93.8% |
| Single locus | 24,354 | 88.7% |
| Multi-loci | 1,399 | 5.1% |
| Unmapped | 1,531 | 5.6% |
| Homologs | 1,417 | 5.2% |
| Unknown | 114 | 0.4% |
| Mapped to unknown chromosome | 35 | 0.1% |
| Contaminants | 136 | 0.5% |
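The mapping summary reads as a nested breakdown: single-locus and multi-loci hits appear to sum to the mapped total, and homologs plus unknowns to the unmapped total. The row indentation is not preserved here, so that nesting is an assumption inferred from the numbers. A short sketch that checks it, with illustrative key names:

```python
# Counts copied from the mapping summary table above.
mapping = {
    "total": 27455,
    "single_locus": 24354,
    "multi_loci": 1399,
    "homologs": 1417,
    "unknown": 114,
    "unknown_chromosome": 35,
    "contaminants": 136,
}

# Assumed nesting: Mapped = single locus + multi-loci;
# Unmapped = homologs + unknown.
mapped = mapping["single_locus"] + mapping["multi_loci"]
unmapped = mapping["homologs"] + mapping["unknown"]

# The four top-level categories should account for every FLcDNA.
accounted = mapped + unmapped + mapping["unknown_chromosome"] + mapping["contaminants"]

mapped_pct = round(100.0 * mapped / mapping["total"], 1)
print(mapped, unmapped, accounted, mapped_pct)
```

With these groupings the categories account for all 27,455 FLcDNAs, and the mapped fraction matches the "approximately 94%" quoted in the abstract.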
Figure 2. FLcDNA density heat-map displayed on the maize chromosomes.
The 24,354 FLcDNAs that mapped to a single locus were counted in 1 Mb bins (number of FLcDNAs/Mb), color-coded, and plotted on the maize chromosomes. Yellow indicates average density (~12 FLcDNAs/Mb), red is higher than average, and blue is lower. The brown bars next to each chromosome mark regions where FLcDNA density exceeds the average plus 2 standard deviations (= 32 FLcDNAs/Mb).
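The binning described in this legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes 0-based base-pair start coordinates for one chromosome and uses the population standard deviation, since the legend does not say which estimator was used. The function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the figure legend


def bin_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Count mapped positions (0-based bp coordinates) per fixed-width bin."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def dense_bins(counts, n_sd=2.0):
    """Return indices of bins whose count exceeds mean + n_sd * SD."""
    threshold = mean(counts) + n_sd * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


# Toy 10 Mb chromosome: one hit in every bin plus 19 extra hits in bin 3.
toy_positions = [i * BIN_SIZE + 500 for i in range(10)]
toy_positions += [3_200_000 + k for k in range(19)]

toy_counts = bin_counts(toy_positions, chrom_length=10_000_000)
print(toy_counts)              # per-bin counts
print(dense_bins(toy_counts))  # bins flagged as unusually dense
```

In the figure, the equivalent threshold over all bins is reported as 32 FLcDNAs/Mb against an average of about 12.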
Figure 3. Detection of homeologous genes in the maize genome.
Potential homeologous genes for the 24,354 single locus-mapped FLcDNAs were computed by aligning them to the maize genome with relaxed mapping parameters. Approximately 44% of SL-FLcDNAs had homeologous regions (ID, identity; AL, alignment length; SL, single locus; ML, multi loci (>4); 2–4L, 2–4 mapped loci).
Assemblies of four maize FLcDNA projects.
| Project | #FLcDNA | Line | Libraries |
| Yu-BC | 27455 | B73 | Different tissues and treatments |
| Wang | 2073 | Inbred W22 | Endosperm development |
| Messing | 3370 | Han 21 | Osmotically stressed seedling |
| Feldman | 36316 | Hybrid | Different tissues |
The 69,306 downloaded FLcDNA sequences were further trimmed with our scripts, which removed some clones from the assembled set.
The first letter of each project name is used to indicate whether the contig contains at least one FLcDNA from that project.
Strict assembly 1: 100% identity, ≤10 ignored end bases (see Materials and Methods).
Semi-strict assembly 2: 90% identity, ≤15 ignored end bases.
Loose assembly 3: Reverse orientation, 80% identity, ≤350 ignored end bases.
Length of individual FLcDNA clones and assembled FLcDNA contigs.
| Lengths in bases | 27k FLcDNAs | 27k Strict | 27k Semi | 27k Loose | 69k FLcDNAs | 69k Strict | 69k Semi | 69k Loose |
| <100 | 0 | 0 | 0 | 0 | 171 | 156 | 67 | 64 |
| 101–500 | 1052 | 685 | 614 | 435 | 5535 | 4992 | 2774 | 1046 |
| 501–1000 | 4552 | 3833 | 3644 | 2353 | 23530 | 21311 | 14676 | 7603 |
| 1001–1500 | 9903 | 9023 | 8536 | 6613 | 20317 | 18619 | 14007 | 9883 |
| 1501–2000 | 8251 | 7442 | 7071 | 6079 | 13680 | 12526 | 10077 | 8278 |
| 2001–2500 | 2768 | 2579 | 2470 | 2317 | 4585 | 4298 | 3822 | 3505 |
| 2501–3000 | 697 | 672 | 659 | 626 | 1112 | 1075 | 1027 | 1007 |
| 3001–3500 | 188 | 187 | 182 | 185 | 238 | 233 | 238 | 272 |
| 3501–4000 | 34 | 35 | 33 | 35 | 36 | 36 | 38 | 41 |
| >4001 | 10 | 10 | 12 | 13 | 10 | 10 | 13 | 16 |
| Total | 27455 | 24467 | 23221 | 18656 | 69214 | 63256 | 46739 | 31715 |
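The assembly totals in the last row appear to follow from counting each contig once plus each clone that joined no contig (a singleton) once. That relationship is an inference from the numbers, not a statement from the paper. The sketch below checks it against the clone and contig counts reported in the "Analysis of assembled contigs" table. Under this assumption five of the six totals are reproduced exactly, and the 69k semi-strict total differs by one.

```python
def non_redundant(total_clones, clones_in_contigs, contigs):
    """Singletons (clones outside every contig) plus one sequence per contig."""
    return (total_clones - clones_in_contigs) + contigs


# name: (total clones, FLcDNAs in contigs, total contigs, reported total)
assemblies = {
    "27k strict": (27455, 5282, 2294, 24467),
    "27k semi":   (27455, 7373, 3139, 23221),
    "27k loose":  (27455, 14450, 5651, 18656),
    "69k strict": (69214, 10453, 4495, 63256),
    "69k semi":   (69214, 35989, 13513, 46739),
    "69k loose":  (69214, 54119, 16620, 31715),
}

# Difference between the recomputed and the reported non-redundant totals.
diffs = {
    name: non_redundant(total, in_contigs, contigs) - reported
    for name, (total, in_contigs, contigs, reported) in assemblies.items()
}

for name, diff in diffs.items():
    print(f"{name:10s} difference from reported total: {diff:+d}")
```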
Analysis of assembled contigs.
| | 27k Strict | 27k Semi | 27k Loose | 69k Strict | 69k Semi | 69k Loose |
| FLcDNA in contigs | 5282 | 7373 | 14450 | 10453 | 35989 | 54119 |
| Reversed | 0 | 0 | 146 | 0 | 0 | 1043 |
| Total Contigs | 2294 | 3139 | 5651 | 4495 | 13513 | 16620 |
| 2 clones | 1823 | 2402 | 3669 | 3521 | 8422 | 7138 |
| 3–5 clones | 451 | 700 | 1885 | 933 | 4681 | 8126 |
| 6–10 clones | 20 | 37 | 95 | 38 | 382 | 1253 |
| 11–20 clones | 0 | 0 | 1 | 3 | 26 | 95 |
| 21–50 clones | 0 | 0 | 1 | 0 | 2 | 8 |
| Contigs ≥4 FLcDNA | 150 | 230 | 731 | 311 | 2053 | 5341 |
| SNPs | 0 | 35 | 23263 | 0 | 4786 | 176626 |
| Contigs with SNPs | 0 | 10 | 315 | 0 | 780 | 3317 |
| GPs | 0 | 13 | 360 | 0 | 935 | 41060 |
| Contigs with GPs | 0 | 10 | 139 | 0 | 505 | 10889 |
| with SNPs+GPs | 0 | 4 | 111 | 0 | 423 | 1747 |
| Alternative 5′ sites | 48% | 49% | 54% | 44% | 53% | 60% |
| Clustered sites | 44% | 26% | 50% | 39% | 46% | 52% |
| ≥100 (# ≥2 FLc) | 31%(13) | 31%(17) | 32%(33) | 22%(45) | 26%(148) | 29%(248) |
| ≥50<100 | 7% (1) | 7% (6) | 9% (7) | 8% (3) | 7% (28) | 9% (68) |
| ≥25<50 | 5% (3) | 5% (5) | 7% (9) | 8% (6) | 8% (45) | 9%(110) |
| ≥10<25 | 5% (3) | 5% (7) | 6%(14) | 9%(10) | 10% (80) | 10%(210) |
| Alternative 3′ sites | 44% | 45% | 52% | 47% | 54% | 61% |
| Clustered sites | 33% | 34% | 40% | 33% | 37% | 43% |
| ≥100 (# ≥2 FLc) | 7% (1) | 8% (1) | 12% (8) | 8% (5) | 9% (37) | 14%(115) |
| ≥50<100 | 9% (6) | 8%(10) | 9%(18) | 8%(13) | 8% (55) | 9%(138) |
| ≥25<50 | 8% (4) | 8% (9) | 9%(18) | 7%(10) | 8% (69) | 9%(199) |
| ≥10<25 | 8% (9) | 9%(17) | 9%(38) | 8%(22) | 10%(168) | 11%(413) |
Reverse complemented clones (only allowed in the loose assembly).
SNPs and GPs (gap polymorphisms) were only identified in these contigs.
Many of these SNPs and GPs for the loose assembly are in the end regions.
Percentage of ends that are not the first/last two bases of the consensus sequence.
Percentage of clustered ends, where each cluster contains the ends within 10 bases of another.
Percentage of clusters ≥100 bases from the previous cluster (number of these that have at least 2 FLcDNAs in both clusters). The next three rows are the same but with different distances.
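The end-clustering rule in these notes can be illustrated with a small sketch. It assumes that "within 10 bases of another" means single-linkage grouping of sorted end coordinates with a gap of at most 10, and that the distance between clusters is measured from the last end of one cluster to the first end of the next. Both are interpretations, and the function names are hypothetical.

```python
def cluster_ends(ends, max_gap=10):
    """Group end coordinates so each end lies within max_gap bases of a neighbour."""
    clusters = []
    for pos in sorted(ends):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough to the previous end
        else:
            clusters.append([pos])     # start a new cluster
    return clusters


def cluster_gaps(clusters):
    """Distance from the last end of each cluster to the first end of the next."""
    return [nxt[0] - prev[-1] for prev, nxt in zip(clusters, clusters[1:])]


# Toy 5' end positions of clones in one contig (consensus coordinates).
toy_ends = [3, 5, 12, 130, 133, 400]
toy_clusters = cluster_ends(toy_ends)
toy_gaps = cluster_gaps(toy_clusters)

# Clusters at least 100 bases from the previous one where both clusters
# hold two or more clones, mirroring the "(# >=2 FLc)" counts.
well_supported = [
    (prev, nxt)
    for prev, nxt, gap in zip(toy_clusters, toy_clusters[1:], toy_gaps)
    if gap >= 100 and len(prev) >= 2 and len(nxt) >= 2
]
print(toy_clusters, toy_gaps, well_supported)
```

Requiring at least two clones in both clusters, as in the parenthesized counts, guards against calling an alternative site from a single truncated clone.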
Figure 4. A contig in the 69k loose assembly with multiple SNPs and GPs.
(A) The red vertical lines indicate a mismatch with the consensus sequence, and green vertical lines indicate a gap in relation to the consensus sequence. The clone prefix indicates the project (BC – Yu, W – Wang, M – Messing, F – Feldman). (B) Base view of the 5′ ends of 7 FLcDNAs with alternative start sites in relation to the consensus sequence. This is a close-up of bases 97–164 from the alignment shown in (A). The red bases do not agree with the consensus.
Size and number of small gaps.
| Gap size | 27k Semi | 27k Loose | 69k Semi | 69k Loose |
| 1 | 437 | 3294 | 9446 | 18439 |
| 2 | 156 | 1411 | 2470 | 5902 |
| 3 | 141 | | 2288 | |
| 4 | 58 | 417 | 919 | 2049 |
| 5 | 54 | 290 | 572 | 1236 |
| 6 | | | | |
| 7 | 32 | 154 | 359 | 733 |
| 8 | 21 | 100 | | 802 |
| 9 | | | 373 | |
| 10 | 21 | 53 | 121 | 279 |
| 11 | 14 | 37 | 95 | 203 |
| 12 | | | | |
| 13 | 14 | 27 | 71 | 155 |
| 14 | | 39 | 70 | 128 |
| 15 | 21 | | | |
| 16 | 20 | 23 | 53 | 95 |
| 17 | 10 | 21 | 17 | 49 |
| 18 | 5 | | | |
| 19 | 5 | 8 | 6 | 29 |
| 20 | 3 | 18 | 4 | 52 |
| 21 | 1 | 17 | 5 | 72 |
| 22 | 1 | 6 | 5 | 40 |
| 23 | 0 | 6 | 2 | 27 |
| 24 | 0 | 8 | 2 | 52 |
| 25 | 1 | 6 | 4 | 28 |
Bold text highlights counts that are greater than the previous count.