Parwinder Kaur1, Christopher Lui2, Olga Dudchenko2,3, Raja Sekhar Nandety4, Bhavna Hurgobin5,6, Melanie Pham2,3, Erez Lieberman Aiden1,2,3,7,8, Jiangqi Wen4, Kirankumar Mysore4.
Abstract
Legumes are of great interest for sustainable agricultural production as they fix atmospheric nitrogen to improve the soil. Medicago truncatula is a well-established model legume, and extensive studies in fundamental molecular, physiological, and developmental biology have been undertaken to translate into trait improvements in economically important legume crops worldwide. However, the M. truncatula reference genome was generated in the accession Jemalong A17, which is highly recalcitrant to transformation. M. truncatula R108 is more attractive for genetic studies due to its high transformation efficiency and Tnt1-insertion population resource for functional genomics. The need to perform accurate synteny analysis and comprehensive genome-scale comparisons necessitates a chromosome-length genome assembly for M. truncatula cv. R108. Here, we performed in situ Hi-C (48×) to anchor, order, and orient scaffolds, and to correct misjoins of contigs in a previously published genome assembly (R108 v1.0), resulting in an improved genome assembly containing eight chromosome-length scaffolds that span 97.62% of the sequenced bases in the input assembly. The long-range physical information generated using Hi-C allowed us to obtain a chromosome-length ordering of the genome assembly, better validate previous draft misjoins, and provide further insights by accurately predicting synteny between A17 and R108 regions corresponding to the known chromosome 4/8 translocation. Furthermore, mapping the Tnt1 insertion landscape on this reference assembly presents an important resource for M. truncatula functional genomics by supporting efficient mutant gene identification in Tnt1 insertion lines. Our data provide a much-needed foundational resource that supports functional and molecular research into the Leguminosae for sustainable agriculture and feeding the future.
Keywords: HiC; Leguminosae; Medicago truncatula cv. R108; Tnt1 insertion landscape; chromosome-length genome assembly
Year: 2021 PMID: 33919286 PMCID: PMC8122578 DOI: 10.3390/ijms22094326
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Hi-C map of the draft and chromosome-length assemblies of the Medicago truncatula cv. R108 genome. Contact matrices were generated by aligning the same Hi-C data set to the R108 v1.0 draft genome (left) and the MedtrR108_hic genome assembly generated using Hi-C (right). Pixel intensity in the matrix indicates how often a pair of loci co-locate in the nucleus. Correspondence between loci in the draft and final assemblies is illustrated using chromograms. The chromosome-length scaffolds in MedtrR108_hic are assigned a linear color gradient; the same colors are then used for the corresponding loci in R108 v1.0 (left). The draft scaffolds are ordered by sequence name. Gridlines highlight the boundaries of the eight chromosome-length scaffolds in MedtrR108_hic (right). Scaffolds smaller than 10 kb in R108 v1.0 are not included.
Figure 2. Snail plots describing the assembly statistics of the (A) MedtrR108_hic assembly and (B) R108 v1.0 assembly. Note that the longest scaffold, N50, and N90 values are larger for MedtrR108_hic than for R108 v1.0. The plots were generated using https://github.com/rjchallis/assembly-stats, accessed on 17 March 2021.
Assembly statistics for the MedtrR108_hic genome assembly. Note that scaffolds smaller than 1 Kbp are excluded from the analysis.
| Statistics | MedtrR108_hic |
|---|---|
| Draft scaffolds | |
| Base pairs | 399,348,955 |
| Number of contigs | 1005 |
| Contig N50 | 5,925,378 |
| Number of scaffolds | 909 |
| Scaffold N50 | 12,848,239 |
| Chromosome-length scaffolds | |
| Base pairs | 390,045,474 |
| Number of contigs | 209 |
| Contig N50 | 6,045,855 |
| Number of scaffolds | 8 |
| Scaffold N50 | 51,860,634 |
| Small scaffolds | |
| Base pairs | 5,840,890 |
| Number of contigs | 248 |
| Contig N50 | 24,000 |
| Number of scaffolds | 236 |
| Scaffold N50 | 24,736 |
| Tiny scaffolds | |
| Base pairs | 3,462,591 |
| Number of contigs | 558 |
| Contig N50 | 9246 |
| Number of scaffolds | 557 |
| Scaffold N50 | 9246 |
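The N50 and N90 values reported here follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of all bases. The following is a minimal sketch of that calculation, not the code used by the authors; the function name `nx` and the toy lengths are hypothetical. The constants at the end copy the base-pair totals from the table to show that the three size classes add up to the draft total.

```python
def nx(lengths, x=50):
    """Return the Nx statistic (N50 by default) for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x percent of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Hypothetical toy scaffold lengths (bp); not taken from the assembly.
toy_lengths = [50, 40, 30, 20, 10]
print("toy N50:", nx(toy_lengths))
print("toy N90:", nx(toy_lengths, 90))

# Base-pair totals copied from the table above.
DRAFT_BP = 399_348_955
CLASS_BP = {
    "chromosome-length": 390_045_474,
    "small": 5_840_890,
    "tiny": 3_462_591,
}
print("classes sum to draft total:", sum(CLASS_BP.values()) == DRAFT_BP)
```

Applied to the real scaffold lengths of an assembly FASTA, the same function would be expected to reproduce the N50 column, provided the same minimum-length cutoff (1 Kbp in this table) is applied first.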
Figure 3. Assembly using Hi-C improves comparative analysis. (A) Whole-genome alignments of MedtrR108_hic versus A17 Mt5.0 highlight the peculiarity of the A17 genotype better than those between R108 v1.0 and A17 Mt5.0 [14]. (B) Circos plot depicting the syntenic relationship between MedtrR108_hic (chromosome names on right in black) and A17 Mt5.0 (chromosome names on left in blue) via syntenic links. The translocated regions on chromosomes 4 and 8 are highlighted: A denotes a 12 Mb syntenic region between MedtrR108_hic chromosome 4 (41.1–53.2 Mb) and A17 Mt5.0 chromosome 8 (37–49.7 Mb), and B denotes a 17 Mb syntenic region between MedtrR108_hic chromosome 8 (32.9–50.2 Mb) and A17 Mt5.0 chromosome 4 (46.9–64.7 Mb). The syntenic links represent syntenic blocks that are at least 50 Kbp long, and chromosome sizes are shown in Mb. Syntenic relationships were determined using only the largest scaffolds/chromosomes.
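The 50 Kbp cutoff and the sizes of regions A and B can be illustrated with a short sketch. The `SyntenicBlock` record, the chromosome labels, and the rule that a block must reach the cutoff in both genomes are assumptions made for illustration; the caption does not say in which genome the block length is measured. Coordinates are the rounded Mb values from the caption converted to bp.

```python
from dataclasses import dataclass

MIN_BLOCK_BP = 50_000  # cutoff stated in the caption (50 Kbp)


@dataclass(frozen=True)
class SyntenicBlock:
    """Hypothetical record for one syntenic block between R108 and A17."""
    r108_chrom: str
    r108_start: int
    r108_end: int
    a17_chrom: str
    a17_start: int
    a17_end: int

    def r108_span(self) -> int:
        return self.r108_end - self.r108_start

    def a17_span(self) -> int:
        return self.a17_end - self.a17_start


def keep_block(block: SyntenicBlock, min_bp: int = MIN_BLOCK_BP) -> bool:
    # Assumption: the block must reach the cutoff in both genomes.
    return min(block.r108_span(), block.a17_span()) >= min_bp


# Regions A and B, using the rounded coordinates given in the caption.
region_a = SyntenicBlock("chr4", 41_100_000, 53_200_000, "chr8", 37_000_000, 49_700_000)
region_b = SyntenicBlock("chr8", 32_900_000, 50_200_000, "chr4", 46_900_000, 64_700_000)
# A made-up short block that the cutoff would discard.
short_block = SyntenicBlock("chr1", 0, 20_000, "chr1", 0, 25_000)

for name, block in [("A", region_a), ("B", region_b), ("short", short_block)]:
    print(name, block.r108_span(), block.a17_span(), keep_block(block))
```

With the rounded coordinates, region A spans 12.1 Mb in R108 and 12.7 Mb in A17, and region B spans 17.3 Mb and 17.8 Mb, consistent with the approximately 12 Mb and 17 Mb sizes quoted in the caption.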
Tnt1 insertion distribution on the Medicago truncatula R108 Hi-C genome. FST: flanking sequence tag.
| Mapping description | No. of FSTs | % of total FSTs |
|---|---|---|
| FSTs mapped to Chromosome 1 | 27,902 | 12.61 |
| FSTs mapped to Chromosome 2 | 24,559 | 11.1 |
| FSTs mapped to Chromosome 3 | 27,679 | 12.51 |
| FSTs mapped to Chromosome 4 | 26,975 | 12.19 |
| FSTs mapped to Chromosome 5 | 25,313 | 11.44 |
| FSTs mapped to Chromosome 6 | 16,433 | 7.43 |
| FSTs mapped to Chromosome 7 | 25,451 | 11.5 |
| FSTs mapped to Chromosome 8 | 27,115 | 12.25 |
| Total mapped to 8 chromosomes | 201,427 | 91.03 |
| Total mapped to non Chr scaffolds | 1361 | 0.62 |
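The table can be cross-checked with a few lines of arithmetic. The sketch below sums the per-chromosome counts and recomputes each percentage. The total number of FSTs is not stated in the table, so it is back-calculated here from the reported 91.03% figure; that derived total is an assumption of this sketch, not a value reported by the authors.

```python
# Counts and percentages copied from the table above.
fst_counts = {
    "Chr1": 27_902, "Chr2": 24_559, "Chr3": 27_679, "Chr4": 26_975,
    "Chr5": 25_313, "Chr6": 16_433, "Chr7": 25_451, "Chr8": 27_115,
}
reported_pct = {
    "Chr1": 12.61, "Chr2": 11.10, "Chr3": 12.51, "Chr4": 12.19,
    "Chr5": 11.44, "Chr6": 7.43, "Chr7": 11.50, "Chr8": 12.25,
}
NON_CHR_COUNT = 1_361
REPORTED_CHR_TOTAL = 201_427
REPORTED_CHR_PCT = 91.03

chr_total = sum(fst_counts.values())
print("per-chromosome counts sum to reported total:", chr_total == REPORTED_CHR_TOTAL)

# Assumption: total FSTs inferred from the reported percentage, since the
# table does not list the denominator explicitly.
implied_total = chr_total / (REPORTED_CHR_PCT / 100)

recomputed_pct = {
    chrom: 100 * count / implied_total for chrom, count in fst_counts.items()
}
for chrom, pct in recomputed_pct.items():
    print(f"{chrom}: recomputed {pct:.2f}%, reported {reported_pct[chrom]:.2f}%")

non_chr_pct = 100 * NON_CHR_COUNT / implied_total
print(f"non-chromosomal scaffolds: recomputed {non_chr_pct:.2f}%")
```

Chromosome 6 carries the fewest mapped FSTs of the eight chromosomes, which the recomputed percentages also show.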
Figure 4. Circular genomic visualization of Tnt1 insertions in the Medicago truncatula R108 genome. The figure was generated using the RCircos package in the R statistical platform. The outer band (outer circle) shows the chromosome locations (Chr1-Chr8). Each chromosome was divided into 500 Kb bins, and values were plotted per bin at the corresponding genomic location. The first band of the circle represents the GC percentage of each bin. The second inner circle represents Tnt1 insertions (as a measure of their FST lengths) on the chromosomes of the MedtrR108_hic assembly.
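The binning described in the caption can be sketched as follows. This is an illustrative Python version, not the R/RCircos workflow used for the figure; the function names and toy inputs are hypothetical. It also counts insertions per bin, which is a simplification of the caption's "measure of their FST lengths".

```python
BIN_SIZE = 500_000  # 500 Kb bins, as stated in the caption


def gc_percent_by_bin(sequence, bin_size=BIN_SIZE):
    """Return the GC percentage of each consecutive bin of a sequence.

    Ambiguous bases (e.g. N) are ignored; a bin with no called bases gives None.
    """
    result = []
    for start in range(0, len(sequence), bin_size):
        window = sequence[start:start + bin_size].upper()
        gc = window.count("G") + window.count("C")
        called = gc + window.count("A") + window.count("T")
        result.append(100 * gc / called if called else None)
    return result


def insertions_by_bin(positions, chrom_length, bin_size=BIN_SIZE):
    """Count insertion sites (0-based positions) falling in each bin."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[pos // bin_size] += 1
    return counts


# Toy example with a 4 bp bin size so the result is easy to follow.
toy_sequence = "GGCC" + "ATAT" + "GCNN"
toy_positions = [0, 1, 5, 11]
print(gc_percent_by_bin(toy_sequence, bin_size=4))
print(insertions_by_bin(toy_positions, len(toy_sequence), bin_size=4))
```

For a real chromosome, the two lists have one entry per 500 Kb bin and could be passed to any circular plotting tool as the GC track and the insertion track.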