Haibao Tang, Vivek Krishnakumar, Shelby Bidwell, Benjamin Rosen, Agnes Chan, Shiguo Zhou, Laurent Gentzbittel, Kevin L Childs, Mark Yandell, Heidrun Gundlach, Klaus F X Mayer, David C Schwartz, Christopher D Town.
Abstract
BACKGROUND: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences originating from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011.
Year: 2014 PMID: 24767513 PMCID: PMC4234490 DOI: 10.1186/1471-2164-15-312
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of (A) assembly and (B) annotation strategies used in the Mt4.0 genome release.
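Summary of sequencing libraries as input to the ALLPATHS-LG assembler
| Type | Library | Insert size (bp) | Reads | Sequence coverage (×) | Valid pairs | Physical coverage (×) |
| --- | --- | --- | --- | --- | --- | --- |
| Frag | Illumina PE-200 | 207 ± 40 | 212,635,636 | 49.5 | 84,508,836 | 57.9 |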
| Frag | Illumina PE-376 | 244 ± 75 | 269,583,440 | 48.3 | 87,087,424 | 73.1 |
| Frag | total | | 482,219,076 | 97.8 | 171,596,260 | 131.1 |
| Jump | Illumina 3Kb | 2014 ± 785 | 117,669,776 | 13.2 | 19,112,272 | 34 |
| Jump | Illumina 4.5Kb | 4866 ± 549 | 78,918,228 | 7.3 | 6,426,863 | 104.2 |
| Jump | Illumina 5Kb | 5062 ± 776 | 200,273,082 | 32.5 | 9,097,003 | 154.9 |
| Jump | Illumina 7Kb | 7455 ± 998 | 50,076,448 | 0.8 | 424,740 | 10.8 |
| Jump | 454 FLX 3Kb | 2260 ± 816 | 1,499,510 | 0.4 | 440,037 | 3.7 |
| Jump | total | | 448,437,044 | 54.2 | 35,500,915 | 307.7 |
| Long_jump | Fosmid lib | 35000 ± 7000 | 68,372 | 0 | 7,626 | 0.8 |
| Long_jump | BAC lib mtrs | 65000 ± 13000 | 40,080 | 0 | 9,269 | 1.9 |
| Long_jump | BAC libs mte1 and mth2 | 100000 ± 20000 | 151,538 | 0 | 38,306 | 14.5 |
| Long_jump | BAC lib mth4 | 200000 ± 40000 | 17,042 | 0 | 4,303 | 2.7 |
| Long_jump | total | | 277,032 | 0.1 | 59,504 | 19.9 |
Type refers to the ALLPATHS terminology for sequencing libraries: “frag” refers to short-insert paired-end libraries, typically the two ends of fragments shorter than 1 kb; “jump” refers to long-insert mate-pair libraries, typically between 1 kb and 10 kb; “long_jump” refers to the ends of fosmids and BACs.
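The three library types described in the note above can be illustrated with a small sketch that assigns a type from the mean insert size. The function name and the hard 1 kb and 10 kb cutoffs are assumptions made for illustration; the note only says "typically", and this is not how ALLPATHS-LG itself is configured.

```python
def allpaths_library_type(mean_insert_bp: int) -> str:
    """Assign a library type from mean insert size (illustrative cutoffs only)."""
    if mean_insert_bp < 1_000:
        return "frag"        # short-insert paired ends
    if mean_insert_bp <= 10_000:
        return "jump"        # long-insert mate pairs
    return "long_jump"       # fosmid and BAC ends

# Mean insert sizes (bp) copied from the table above.
libraries = {
    "Illumina PE-200": 207,
    "Illumina PE-376": 244,
    "Illumina 3Kb": 2014,
    "Illumina 7Kb": 7455,
    "454 FLX 3Kb": 2260,
    "Fosmid lib": 35_000,
    "BAC lib mth4": 200_000,
}

assigned = {name: allpaths_library_type(size) for name, size in libraries.items()}
for name, lib_type in assigned.items():
    print(f"{name}: {lib_type}")
```

With these cutoffs every library listed here falls into the same type as in the first column of the table.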
Figure 2. Example of breakpoint identification using (A) GBS map and (B) optical map alignment. Red arrows mark the same breakpoint on Scaffold0004, as indicated by both the GBS map and the optical map alignment.
Evidence tracks used in Medicago reannotation pipeline
| Prediction | AUGUSTUS | Yes | Yes |
| Prediction | FGENESH | Yes | Yes |
| Prediction | GENEMARK | No | Yes |
| Transcript | Medicago ESTs | Yes | Yes |
| Transcript | RNA-seq assembled with Rnnotator | Yes | Yes |
| Transcript | RNA-seq assembled with CLC | Yes | No |
| Transcript | RNA-seq assembled with CUFFLINKS | Yes | No |
| Transcript | Legacy Mt3.5 loci transferred using GMAP and liftOver | Yes | Yes |
| Protein | Plant uniref90 proteins | Yes | Yes |
| Protein | Six plant proteomes ( | Yes | Yes |
| Protein | GENEWISE with | Yes | No |
Statistics of the final assembly, including the total numbers of base pairs on each chromosome and unplaced scaffolds
| chr1 | 50,275,726 | 2,715,429 | 52,991,155 | 94.9 % | 86.9 % |
| chr2 | 43,694,219 | 2,035,453 | 45,729,672 | 95.5 % | 84.3 % |
| chr3 | 52,386,245 | 3,128,907 | 55,515,152 | 94.4 % | 83.8 % |
| chr4 | 54,533,855 | 2,048,528 | 56,582,383 | 96.4 % | 89.6 % |
| chr5 | 43,376,507 | 254,224 | 43,630,731 | 99.4 % | 92.6 % |
| chr6 | 31,992,419 | 3,283,294 | 35,275,713 | 90.7 % | 79.3 % |
| chr7 | 46,512,325 | 2,660,098 | 49,172,423 | 94.6 % | 85.4 % |
| chr8 | 43,183,948 | 2,386,037 | 45,569,985 | 94.8 % | 81.9 % |
| chr total | 365,955,244 | 18,511,970 | 384,467,214 | 95.2 % | 85.7 % |
| Unplaced | 24,050,008 | 4,319,556 | 28,369,564 | 84.8 % | n. a. |
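The table above carries no column headers in this record, but its arithmetic can be checked directly. The sketch below assumes, purely from how the numbers relate, that the fourth column is the sum of the second and third, and that the fifth column is the second divided by the fourth; the last column is left out because its meaning cannot be inferred. All names are hypothetical.

```python
# Rows copied from the table above: (col 2, col 3, col 4, col 5 in percent).
# Column meanings are inferred from the arithmetic, not stated in the source.
CHROMOSOMES = {
    "chr1": (50_275_726, 2_715_429, 52_991_155, 94.9),
    "chr2": (43_694_219, 2_035_453, 45_729_672, 95.5),
    "chr3": (52_386_245, 3_128_907, 55_515_152, 94.4),
    "chr4": (54_533_855, 2_048_528, 56_582_383, 96.4),
    "chr5": (43_376_507, 254_224, 43_630_731, 99.4),
    "chr6": (31_992_419, 3_283_294, 35_275_713, 90.7),
    "chr7": (46_512_325, 2_660_098, 49_172_423, 94.6),
    "chr8": (43_183_948, 2_386_037, 45_569_985, 94.8),
}
CHR_TOTAL = (365_955_244, 18_511_970, 384_467_214, 95.2)
UNPLACED = (24_050_008, 4_319_556, 28_369_564, 84.8)


def row_is_consistent(first, second, total, percent, tolerance=0.05):
    """True if first + second == total and first/total matches the printed percent."""
    return first + second == total and abs(100 * first / total - percent) <= tolerance


def column_sums(rows):
    """Sum the three base-pair columns over an iterable of rows."""
    rows = list(rows)
    return tuple(sum(row[i] for row in rows) for i in range(3))


all_rows_ok = all(
    row_is_consistent(*row) for row in [*CHROMOSOMES.values(), CHR_TOTAL, UNPLACED]
)
sums_match_total = column_sums(CHROMOSOMES.values()) == CHR_TOTAL[:3]
print(all_rows_ok, sums_match_total)
```

Both checks pass for the values as printed, which supports reading the "chr total" row as the sum of the eight chromosome rows.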
Figure 3. Increased amount of chromosome-anchored sequence in Medicago Mt4.0 compared to Mt3.5. Red portions of the chromosomes represent BAC sequences used in Mt3.5, while white regions represent sequences newly anchored in Mt4.0.
Figure 4. Heatmap of linkage disequilibrium between pairwise SNP markers in the Mt4.0 assemblies. Pairwise linkage disequilibrium (LD) between markers was calculated as the r value based on the segregation of individuals within the LR4 mapping population.
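The caption does not give the formula or software used for the LD statistic. As a minimal sketch, one common reading is that r is the Pearson correlation between allele codes at two markers across the individuals of the population. The 0/1 coding, the function name, and the genotype vectors below are all assumptions for illustration, not data from the LR4 population.

```python
from math import sqrt


def ld_r(marker_a, marker_b):
    """Pearson correlation between two equally long vectors of allele codes."""
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("markers must be non-empty and of equal length")
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in marker_a) / n
    var_b = sum((b - mean_b) ** 2 for b in marker_b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("r is undefined for a marker that does not segregate")
    return cov / sqrt(var_a * var_b)


# Made-up genotypes: 0 and 1 stand for the two parental alleles.
snp1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp2 = [0, 0, 1, 1, 0, 1, 0, 0]   # differs from snp1 in one individual
snp3 = [1 - g for g in snp1]      # every allele swapped relative to snp1

print(ld_r(snp1, snp1), ld_r(snp1, snp2), ld_r(snp1, snp3))
```

Markers that co-segregate perfectly give r close to 1 (or -1 if the allele coding is flipped), so a heatmap of pairwise values highlights blocks of tightly linked markers.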
Characteristics of high confidence and low confidence gene sets
| Feature | High confidence | Low confidence |
| --- | --- | --- |
| Number of genes | 31,661 | 19,233 |
| Number of single-exon genes | 6,103 (19%) | 5,351 (28%) |
| Number of multi-exon genes | 25,558 (81%) | 13,882 (72%) |
| Number of genes with alternative transcript variants | 6,041 (19%) | 347 (2%) |
| Number of predicted transcripts | 42,481 | 19,838 |
| Number of distinct exons | 174,533 | 52,850 |
| Mean gene locus size (first to last exon) | 3,280 | 1,526 |
| Mean transcript size (UTR, CDS) | 1,618 | 841 |
| Mean number of transcripts per gene | 1.3 | 1.0 |
| Mean number of distinct exons per gene | 5.5 | 2.7 |
| Mean exon size | 308 | 296 |
Figure 5. Syntenic dot plot of the Medicago genome against itself, showing blocks derived from the papilionoid whole-genome duplication event. The panels contrast (A) Mt3.5 and (B) Mt4.0 under the same synteny block chaining settings (see Methods).
Figure 6. Ks analyses of comparisons between legume species with whole-genome sequences. The percentage of gene pairs is the count of syntenic homologs within a Ks range (bin size 0.05) divided by the total number of syntenic homologs identified.
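The normalization described in this caption can be sketched as follows: Ks values are grouped into bins of width 0.05, and each bin count is divided by the total number of syntenic homolog pairs. The function name and the Ks values are made up for illustration, and the small epsilon guarding against floating-point edge cases is a design choice of this sketch.

```python
from collections import Counter


def ks_percentages(ks_values, bin_size=0.05):
    """Map each bin's lower edge to the percentage of gene pairs falling in it."""
    total = len(ks_values)
    if total == 0:
        raise ValueError("need at least one Ks value")
    # The epsilon keeps values such as 0.15 from dropping into the bin below
    # because of floating-point division.
    counts = Counter(int(ks / bin_size + 1e-9) for ks in ks_values)
    return {
        round(index * bin_size, 2): 100 * count / total
        for index, count in sorted(counts.items())
    }


# Hypothetical Ks values for eight syntenic gene pairs.
example_ks = [0.02, 0.04, 0.07, 0.12, 0.51, 0.53, 0.58, 0.61]
histogram = ks_percentages(example_ks)
for lower_edge, percent in histogram.items():
    print(f"Ks {lower_edge:.2f}-{lower_edge + 0.05:.2f}: {percent:.1f}%")
```

Because every bin is divided by the same total, the percentages sum to 100 and distributions from comparisons with different numbers of gene pairs can be drawn on one axis.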
Figure 7. JCVI website for Medicago genome resources, showing a number of services and tools for interacting with the Mt4.0 datasets: (A) JBrowse shows the alignment of annotation evidence to the genome; (B) keyword search supports extraction of gene lists; (C) TAIR-style gene information page for Mt4.0 gene models; (D) Textpresso for mining Medicago-related literature.