Yunxia Zhu, Benjamin L King, Babak Parvizi, Brian P Brunk, Christian J Stoeckert, John Quackenbush, Joel Richardson, Carol J Bult.
Abstract
Databases of experimentally generated and computationally derived transcript sequences are valuable resources for genome analysis and annotation. The utility of such databases is enhanced when the sequences they contain are integrated with such biological information as genomic location, gene function, gene expression and phenotypic variation. We present the analysis and results of a semi-automated process of connecting transcript assemblies with highly curated biological information for mouse genes that is available through the Mouse Genome Informatics (MGI) database.
Year: 2003 PMID: 12620126 PMCID: PMC151306 DOI: 10.1186/gb-2003-4-2-r16
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Selected database content statistics for the MGI information resource
| Category | Number |
| References | 74,845 |
| Genetic markers | 51,398 |
| Genes | 31,708 |
| Genetic markers mapped | 41,342 |
| Genes mapped | 22,645 |
| Curated mouse/human orthologs | 7,566 |
| Genes with molecular probes and segments data | 25,672 |
| Number of genetic markers with molecular polymorphisms | 12,718 |
| Number of genes with molecular polymorphisms | 3,599 |
| MGI markers with GenBank sequence associations | 29,144 |
| Genes with SwissProt-TrEMBL protein sequences | 13,633 |
The database content of MGI is updated daily. The current database content statistics can be found at the MGI FTP site (MGI Data and Statistical Reports). MGI contains information on genetic markers (such as sequence-tagged site (STS) markers), genes and other genomic features.
Figure 1. Association of MGI genes with TIGR mouse TCs or DoTS mouse DTs through the shared references of GenBank accession identifiers can be represented as a set of graphs. The associations can be classified into four categories: one-to-one, one-to-many, many-to-one, and many-to-many.
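The four categories can be illustrated with a small sketch. It assumes that each connected component of the gene-to-assembly graph is treated as one association, which is consistent with the counting convention in the table footnotes but is not confirmed as the authors' implementation. The function name `classify_associations` and the identifiers `G1`, `A1`, and so on are hypothetical.

```python
from collections import defaultdict


def classify_associations(links):
    """Group (gene, assembly) links into connected components and label each.

    Returns a list of (label, genes, assemblies) tuples, one per component.
    """
    # Build an undirected bipartite graph. Nodes are tagged with their kind
    # so that a gene and an assembly can never be confused with each other.
    adjacency = defaultdict(set)
    for gene, assembly in links:
        gene_node = ("gene", gene)
        assembly_node = ("assembly", assembly)
        adjacency[gene_node].add(assembly_node)
        adjacency[assembly_node].add(gene_node)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects every node reachable from `start`.
        genes, assemblies = set(), set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            kind, name = node
            (genes if kind == "gene" else assemblies).add(name)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)

        # Label the component by how many genes and assemblies it contains.
        if len(genes) == 1 and len(assemblies) == 1:
            label = "one-to-one"
        elif len(genes) == 1:
            label = "one-to-many"
        elif len(assemblies) == 1:
            label = "many-to-one"
        else:
            label = "many-to-many"
        components.append((label, genes, assemblies))
    return components


# Hypothetical links, one component per category.
example_links = [
    ("G1", "A1"),
    ("G2", "A2"), ("G2", "A3"),
    ("G3", "A4"), ("G4", "A4"),
    ("G5", "A5"), ("G5", "A6"), ("G6", "A6"),
]
for label, genes, assemblies in classify_associations(example_links):
    print(label, sorted(genes), sorted(assemblies))
```

Counting components rather than individual links means that one gene linked to several assemblies contributes a single one-to-many association.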
Figure 2. Examples of MGI-to-TC and TC-to-MGI associations with supporting GenBank sequences. (a) MGI genes may associate with zero, one or more TCs. Each association is supported by one or more GenBank sequences that are shared by the MGI gene and the related TC. For example, the association of MGI gene Nes (nestin; MGI:101784) with TC577815 is supported by AK012622, and its association with TC601026 is supported by AK009706, AF076623, AA166324, BC022629 and C78523. (b) TCs may associate with zero, one or more MGI genes. Each association is supported by one or more GenBank sequences that are shared by the TC and the related MGI gene.
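As a rough illustration of how shared GenBank accessions support an association, the sketch below intersects the accessions attached to a gene with those contained in each assembly. The function name `supporting_sequences` and the input layout are assumptions made for illustration. Only the accessions named in the caption are used, plus one clearly hypothetical accession (`HYPO0001`) to show that unshared sequences are ignored. The real assemblies may contain further sequences.

```python
from collections import defaultdict


def supporting_sequences(gene_accessions, assembly_accessions):
    """Map each (gene, assembly) pair to the sorted accessions they share."""
    # Invert assembly -> accessions into accession -> assemblies so that each
    # gene accession can be looked up directly.
    assemblies_by_accession = defaultdict(set)
    for assembly, accessions in assembly_accessions.items():
        for accession in accessions:
            assemblies_by_accession[accession].add(assembly)

    support = defaultdict(set)
    for gene, accessions in gene_accessions.items():
        for accession in accessions:
            for assembly in assemblies_by_accession.get(accession, ()):
                support[(gene, assembly)].add(accession)
    return {pair: sorted(shared) for pair, shared in support.items()}


# Accessions taken from the Nes example in the caption.
gene_accessions = {
    "MGI:101784": {
        "AK012622", "AK009706", "AF076623", "AA166324", "BC022629", "C78523",
    },
}
assembly_accessions = {
    "TC577815": {"AK012622"},
    # HYPO0001 is a made-up accession that the gene does not reference.
    "TC601026": {
        "AK009706", "AF076623", "AA166324", "BC022629", "C78523", "HYPO0001",
    },
}
links = supporting_sequences(gene_accessions, assembly_accessions)
for (gene, assembly), shared in sorted(links.items()):
    print(gene, assembly, shared)
```

The keys of the returned dictionary are gene-assembly pairs that share at least one accession, so they can serve as the links from which association categories are derived.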
Statistics of associations between MGI genes and transcript assemblies
| Datasets | TIGR TCs | DoTS DTs |
| Sequences used to build TCs and DTs | 2,611,422 | 2,495,338 |
| Sequences included in the assemblies (excluding singletons) | 2,254,999 | 2,044,540 |
| Assemblies (excluding singletons) | 105,520 | 128,341 |
| GenBank sequences shared by MGI markers and assemblies | 43,200 | 52,754 |
| MGI genes linked to assemblies through GenBank sequences | 20,783 | 24,340 |
| Assemblies linked to MGI genes through GenBank sequences | 20,942 | 25,799 |
Classification of associations between MGI genes and both DT and TC gene indices
| Datasets | TIGR | DoTS |
| One-to-one MGI gene to assembly | 13,451 | 16,996 |
| One-to-many MGI gene to assembly* | 1,975 | 2,522 |
| Many-to-one MGI gene to assembly† | 1,932 | 1,675 |
| Many-to-many MGI gene to assembly‡ | 454 | 531 |
*The link of one MGI gene to multiple assemblies is counted as one association. †The link of multiple MGI genes to one assembly is counted as one association. ‡The link of multiple MGI genes to multiple assemblies is counted as one association.
Figure 3. Transcripts can be aligned to the mouse genome assembly using BLAT search at the UCSC Genome Browser. Aligning regions (usually exons) are shown as black blocks. The aligning regions are connected by lines representing gaps (usually spliced-out introns), with arrowheads indicating the direction of transcription. (a) The alignment of TCs and DTs associated with MGI gene Ncam1 (MGI:97281) to the annotated Ncam1 gene on chromosome 9 shows that alternatively spliced exons and alternative poly(A) addition sites give rise to multiple transcripts from one gene. The tracks of *TC640342 and *DT.487850 are matches with lower percentage identity over a shorter region of the sequence. (b) The alignment of TCs and DTs associated with the MGI gene Dtna (MGI:106039) to the annotated Dtna gene on chromosome 18 demonstrates that three alternative promoters are actively used, as suggested by published experimental results.
Comparison of the constituent sequences of TCs and DTs
| Category | Number |
| DT and TC pairs analyzed* | 11,126 |
| DT and TC that have the same constituent sequences† | 1,305 |
| DT is a subset of TC† | 1,416 |
| TC is a subset of DT† | 736 |
| DT and TC assemblies that share one sequence | 148 |
| DT and TC assemblies that share 2-4 sequences | 466 |
| DT and TC assemblies that share 5-9 sequences | 709 |
| DT and TC assemblies that share 10-99 sequences | 4,890 |
| DT and TC assemblies that share 100 or more sequences | 1,448 |
| DT and TC assemblies that share zero sequences | 8 |
*Only those with a one-to-one relationship to the same MGI gene were compared. †These were not included in the counts of DT and TC assemblies with shared sequences.
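The categories in this table can be expressed as a set comparison. The following is a minimal sketch, assuming each assembly is available as a collection of constituent sequence identifiers. The function name `compare_assemblies` and the identifiers in the usage lines are hypothetical. Identical and subset pairs are tested first, matching the footnote that excludes them from the shared-sequence counts.

```python
def compare_assemblies(dt_members, tc_members):
    """Return the table category for one DT/TC pair of constituent sequences."""
    dt, tc = set(dt_members), set(tc_members)

    # Identity and strict-subset cases take priority over shared-count bins.
    if dt == tc:
        return "same constituent sequences"
    if dt < tc:
        return "DT is a subset of TC"
    if tc < dt:
        return "TC is a subset of DT"

    # Remaining pairs are binned by the number of sequences they share.
    shared = len(dt & tc)
    if shared == 0:
        return "share zero sequences"
    if shared == 1:
        return "share one sequence"
    if shared <= 4:
        return "share 2-4 sequences"
    if shared <= 9:
        return "share 5-9 sequences"
    if shared <= 99:
        return "share 10-99 sequences"
    return "share 100 or more sequences"


# Hypothetical sequence identifiers.
print(compare_assemblies({"s1", "s2"}, {"s1", "s2", "s3"}))
print(compare_assemblies({"s1", "s2", "s9"}, {"s1", "s2", "s3"}))
```

Because every pair falls into exactly one category, the category counts should add up to the number of pairs analyzed. The rows in the table do sum to 11,126.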