Hideki Hirakawa1, Yoshihiro Okada2, Hiroaki Tabuchi3, Kenta Shirasawa1, Akiko Watanabe1, Hisano Tsuruoka1, Chiharu Minami1, Shinobu Nakayama1, Shigemi Sasamoto1, Mitsuyo Kohara1, Yoshie Kishida1, Tsunakazu Fujishiro1, Midori Kato1, Keiko Nanri1, Akiko Komaki1, Masaru Yoshinaga3, Yasuhiro Takahata3, Masaru Tanaka3, Satoshi Tabata1, Sachiko N Isobe4.
Abstract
Ipomoea trifida (H. B. K.) G. Don. is the most likely diploid ancestor of the hexaploid sweet potato, I. batatas (L.) Lam. To assist in analysis of the sweet potato genome, de novo whole-genome sequencing was performed with two lines of I. trifida, namely the selfed line Mx23Hm and the highly heterozygous line 0431-1, using the Illumina HiSeq platform. We classified the sequences thus obtained as either 'core candidates' (common to the two lines) or 'line specific'. The total length of the assembled sequences of Mx23Hm (ITR_r1.0) was 513 Mb, while that of 0431-1 (ITRk_r1.0) was 712 Mb. Of the assembled sequences, 240 Mb (Mx23Hm) and 353 Mb (0431-1) were classified as core candidate sequences. A total of 62,407 (62.4 Mb) and 109,449 (87.2 Mb) putative genes were identified in the genomes of Mx23Hm and 0431-1, respectively, of which 11,823 were derived from the core candidate sequences of Mx23Hm and 28,831 from the core candidate sequences of 0431-1. A total of 1,464,173 single-nucleotide polymorphisms (SNPs) and 16,682 copy number variations (CNVs) were found between the two assembled genomic sequences (under the condition of a log2 ratio of >1 and a CNV size of >1,000 bases). The results presented here are expected to contribute to the progress of genomic and genetic studies of I. trifida, as well as studies of the sweet potato and the genus Ipomoea in general.
Keywords: CNVs; Ipomoea trifida; SNPs; core- and line-specific sequences; genome sequence assembly
Year: 2015 PMID: 25805887 PMCID: PMC4401327 DOI: 10.1093/dnares/dsv002
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Statistics of the assembled genome sequences for Mx23Hm and 0431-1
| Sequenced line | Mx23Hm (ITR_r1.0) | 0431-1 (ITRk_r1.0) |
|---|---|---|
| Number of sequences | 77,400 | 181,194 |
| Total length (bases) | 512,990,885 | 712,155,587 |
| Average length (bases) | 6,628 | 3,930 |
| Max length (bases) | 910,847 | 1,352,076 |
| Min length (bases) | 300 | 300 |
| N50 length (bases) | 42,586 | 36,283 |
| A | 108,919,552 | 155,339,270 |
| T | 108,380,339 | 154,432,148 |
| G | 60,024,339 | 86,821,603 |
| C | 60,253,902 | 87,276,414 |
| N | 175,412,753 | 228,286,152 |
| Total (ATGC) | 337,578,132 | 483,869,435 |
| GC% (GC/ATGC) | 35.6 | 36.0 |
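The derived rows of the table can be rechecked from the raw counts. The following is a minimal sketch, not the authors' pipeline: the function names `n50` and `composition_stats` are hypothetical, and the N50 definition used (the length at which the cumulative sum of sequences, sorted longest first, reaches half of the total) is the conventional one and is assumed here rather than stated in the record. The per-sequence lengths needed to recompute N50 are not given, so `n50` is shown on a toy list only.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Assumed (conventional) definition: sort lengths from longest to shortest
    and return the length at which the running sum first reaches at least
    half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def composition_stats(a, t, g, c, total_length, n_sequences):
    """Recompute the derived table rows from the raw base counts."""
    atgc = a + t + g + c
    return {
        "total_atgc": atgc,
        # Bases that are not A, T, G, or C (total length minus ATGC).
        "non_atgc": total_length - atgc,
        # GC% is defined in the table as GC/ATGC, so gaps are excluded.
        "gc_percent": round(100 * (g + c) / atgc, 1),
        "average_length": round(total_length / n_sequences),
    }


# Values copied from the Mx23Hm (ITR_r1.0) column of the table.
mx23hm = composition_stats(
    a=108_919_552, t=108_380_339, g=60_024_339, c=60_253_902,
    total_length=512_990_885, n_sequences=77_400,
)

# Values copied from the 0431-1 (ITRk_r1.0) column of the table.
line_0431_1 = composition_stats(
    a=155_339_270, t=154_432_148, g=86_821_603, c=87_276_414,
    total_length=712_155_587, n_sequences=181_194,
)
```

Both columns reproduce the reported totals, GC percentages, and average lengths; the difference between total length and total ATGC matches the row listed between C and Total (ATGC).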
Figure 1. Total lengths of core candidates and line-specific sequences of ITR_r1.0 (Mx23Hm) and ITRk_r1.0 (0431-1).
Figure 2. Venn diagram showing the numbers of gene clusters in Ipomoea trifida and other plant species, i.e. Arabidopsis thaliana, potato (Solanum tuberosum), and cassava (Manihot esculenta). The black and white numbers in parentheses represent the numbers of non-clustered sequences and sequences clustered with other species, respectively. The green, purple, orange, aqua, and red numbers in parentheses represent the total numbers of putative genes subjected to clustering.
Figure 3. CNV distributions and corresponding positions of core candidates and line-specific sequences of scaffold Itr_sc000048.1 (1,177,009 bases in length). Red dots show the log2 ratios of CNVs between Mx23Hm and 0431-1. The upper bar represents core candidate (red) and line-specific (blue) sequences in their corresponding positions.
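The CNV condition quoted in the abstract (log2 ratio of >1 and CNV size of >1,000 bases) can be illustrated with a small filter. This is a sketch under stated assumptions, not the method used in the study: the `Segment` type, its read-depth fields, and the `is_cnv` function are hypothetical, the ratio is assumed to be computed from per-line read depth, and whether the threshold applies to the signed ratio or to its magnitude is not stated in this record, so it is exposed as the `use_magnitude` option.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical genomic segment with a read depth for each line."""
    start: int          # 1-based inclusive start position
    end: int            # 1-based inclusive end position
    depth_mx23hm: float
    depth_0431_1: float


def log2_ratio(segment):
    """log2 of the depth ratio between the two lines (assumed definition)."""
    return math.log2(segment.depth_mx23hm / segment.depth_0431_1)


def is_cnv(segment, min_log2=1.0, min_size=1000, use_magnitude=True):
    """Apply the thresholds from the abstract: log2 ratio > 1, size > 1,000 bases.

    use_magnitude=True compares |log2 ratio| to the threshold; this is an
    assumption, since the record does not say how negative ratios are treated.
    """
    size = segment.end - segment.start + 1
    ratio = log2_ratio(segment)
    if use_magnitude:
        ratio = abs(ratio)
    return ratio > min_log2 and size > min_size


# Toy segments, invented for illustration only.
gain = Segment(start=1, end=2000, depth_mx23hm=40.0, depth_0431_1=10.0)
borderline = Segment(start=1, end=2000, depth_mx23hm=20.0, depth_0431_1=10.0)
too_short = Segment(start=1, end=1000, depth_mx23hm=40.0, depth_0431_1=10.0)
loss = Segment(start=1, end=5000, depth_mx23hm=10.0, depth_0431_1=40.0)
```

Both thresholds are strict inequalities, so a segment with a log2 ratio of exactly 1 or a size of exactly 1,000 bases is excluded.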