Jakob Fredslund, Lene H Madsen, Birgit K Hougaard, Anna Marie Nielsen, David Bertioli, Niels Sandal, Jens Stougaard, Leif Schauser.
Abstract
BACKGROUND: Complete or near-complete genomic sequence information is presently only available for a few plant species representing a large phylogenetic diversity among plants. In order to effectively transfer this information to species lacking sequence information, comparative genomic tools need to be developed. Molecular markers permitting cross-species mapping along co-linear genomic regions are central to comparative genomics. These "anchor" markers, defining unique loci in genetic linkage maps of multiple species, are gene-based and possess a number of features that make them relatively sparse. To identify potential anchor marker sequences more efficiently, we have established an automated bioinformatic pipeline that combines multi-species Expressed Sequence Tags (EST) and genome sequence data.
Year: 2006 PMID: 16907970 PMCID: PMC1570147 DOI: 10.1186/1471-2164-7-207
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The pipeline of the marker candidate algorithm. In the first step, EST collections of selected species are compared with the proteome of the reference species in order to estimate the copy number. Sequences with one or two homologs in the Arabidopsis proteome are considered because Arabidopsis has undergone a recent whole-genome duplication whereas legumes have not. EST sequences passing this criterion are compared to L. japonicus and M. truncatula genomic sequences in order to score the presence and length of introns. Sequences with the same Arabidopsis reference are then aligned and primers are designed using this alignment as input. For this purpose, the PriFi software [13] is used.
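The copy-number filter and the grouping by shared Arabidopsis reference described in the caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout, the function names, the use of the first-listed homolog as the group key, and the `min_species` threshold are all assumptions made for illustration, and the intron-scoring and PriFi primer-design steps are left out.

```python
from collections import defaultdict

# Hypothetical record layout: (species, est_id, homologs), where `homologs`
# lists the Arabidopsis proteins matched by the EST in a prior similarity search.

def filter_by_copy_number(records, allowed=(1, 2)):
    """Keep ESTs whose number of distinct Arabidopsis homologs is in `allowed`."""
    return [r for r in records if len(set(r[2])) in allowed]

def group_by_reference(records):
    """Group ESTs by Arabidopsis reference.

    Assumption: the first-listed homolog serves as the reference.
    """
    groups = defaultdict(list)
    for species, est_id, homologs in records:
        groups[homologs[0]].append((species, est_id))
    return dict(groups)

def alignment_candidates(groups, min_species=2):
    """Keep groups spanning at least `min_species` species (assumed threshold)."""
    return {
        ref: members
        for ref, members in groups.items()
        if len({species for species, _ in members}) >= min_species
    }

# Made-up example data; identifiers are placeholders, not real accessions.
records = [
    ("L. japonicus", "lj_est_1", ["ref_A"]),
    ("M. truncatula", "mt_est_1", ["ref_A", "ref_B"]),
    ("G. max", "gm_est_1", ["ref_C", "ref_D", "ref_E"]),  # three homologs: dropped
    ("G. max", "gm_est_2", ["ref_F"]),                    # single species: no alignment
]

kept = filter_by_copy_number(records)
candidates = alignment_candidates(group_by_reference(kept))
print(candidates)
```

In this sketch each surviving group would be handed to an alignment step and then to primer design; how the pipeline actually chooses the reference when an EST has two homologs is not stated in the caption.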
Figure 2. Phylogenetic trees of legumes and grasses. Phylogenetic relationship of a) legumes and b) grasses. Species with sequence information used in this study are shown together with selected other species. (Modified after [40])
Arabidopsis homolog count for the collections of gene indices.
| Total number of gene indices | 28,460 | 36,878 | 63,676 | 9,484 |
| One Arabidopsis homolog | 2,647 (278) | 3,613 (869) | 4,281 | 1,441 |
| Two Arabidopsis homologs | 1,613 (172) | 2,231 (522) | 3,349 | 1,009 |
| Total (one and two Arabidopsis homologs) | 4,062 (450) | 5,644 (1,391) | 7,630 | 2,450 |
The species name is followed by the release version (in parentheses), and the number of gene indices (GI, combined EST clusters and singleton ESTs) is indicated for each species. Numbers of gene indices with one and two Arabidopsis homologs are listed. The numbers in parentheses indicate the number of gene indices that show an intron when compared to the respective genomic sequence. L. japonicus GIs were compared to L. japonicus genomic sequences, whereas M. truncatula GIs were compared to M. truncatula genomic sequences.
The numbers of identified CATS and their anchoring in current L. japonicus and M. truncatula maps
| Reference Genome | ||
| Number of CATS | 148 | 311 |
| Number of map- anchored CATS | 29 | 66 |
Figure 3. Distribution of CATS on a) . Red and green triangles indicate positions of markers with one and two homologous gene sequences in Arabidopsis, respectively. Chromosomes are scaled according to their genetic length.
Figure 4. Distribution of CATS on rice chromosomes. Red and blue marks indicate the positions of markers with one and two rice homologous gene sequences, respectively. The scale of chromosome diagrams reflects their relative physical sizes.