Yoshihiro Kawahara, Melissa de la Bastide, John P Hamilton, Hiroyuki Kanamori, W Richard McCombie, Shu Ouyang, David C Schwartz, Tsuyoshi Tanaka, Jianzhong Wu, Shiguo Zhou, Kevin L Childs, Rebecca M Davidson, Haining Lin, Lina Quesada-Ocampo, Brieanne Vaillancourt, Hiroaki Sakai, Sung Shin Lee, Jungsok Kim, Hisataka Numa, Takeshi Itoh, C Robin Buell, Takashi Matsumoto.
Abstract
BACKGROUND: Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group).
Year: 2013 PMID: 24280374 PMCID: PMC5395016 DOI: 10.1186/1939-8433-6-4
Source DB: PubMed Journal: Rice (N Y) ISSN: 1939-8425 Impact factor: 4.783
Assessment and processing of re-sequencing datasets used in this study
| Library | Typeᵃ | Original read length (bp) | Initial purity-filtered reads | Remaining after low-quality trimming | Remaining after adaptor trimming | Remaining after pairingᵇ |
|---|---|---|---|---|---|---|
| NIAS | SE | 36 | 70,387,992 | 59,966,207 | 59,344,842 | - |
| NIAS | SE | 51 | 60,164,564 | 51,717,577 | 50,981,527 | - |
| CSHL | PE | 76 | 269,062,790 | 261,197,347 | 257,096,019 | 246,779,548 (10,316,471) |
a SE: single-end reads, PE: paired-end reads.
b The value in parentheses is the number of unpaired reads, that is, reads whose mate was discarded during the preprocessing step that removed low-quality reads.
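Footnote b implies a simple bookkeeping identity: after pairing, every adaptor-trimmed read is either still in an intact pair or left as an unpaired read, so the two numbers in the last column should add up to the adaptor-trimmed count. The minimal sketch below checks this for the CSHL row and reports how many reads each earlier step removed. The counts are copied from the table; the helper `reads_removed` and the variable names are hypothetical.

```python
"""Bookkeeping check for the read-preprocessing table (CSHL paired-end library).

Counts are copied from the table; helper and variable names are illustrative.
"""


def reads_removed(counts):
    """Return how many reads each successive preprocessing step discarded."""
    return [before - after for before, after in zip(counts, counts[1:])]


# Initial purity-filtered -> after low-quality trimming -> after adaptor trimming
cshl_steps = [269_062_790, 261_197_347, 257_096_019]

# After pairing: reads still in intact pairs, plus unpaired reads whose mate was dropped
cshl_paired = 246_779_548
cshl_unpaired = 10_316_471

removed = reads_removed(cshl_steps)
print("removed per step:", removed)  # removed per step: [7865443, 4101328]

# Every adaptor-trimmed read should end up either paired or unpaired.
assert cshl_paired + cshl_unpaired == cshl_steps[-1]
print("paired + unpaired =", cshl_paired + cshl_unpaired)
```

The same check applies to any paired-end library processed this way; a mismatch would point to reads lost or double-counted during pairing.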
Statistics of read-mapping results obtained with BWA
| Dataset | Read type | Number of pre-processed reads | Uniquely mappedᵃ [incl. MAPQ < 20]ᵉ | Uniquely & properly mappedᵇ | Multipleᶜ | Unmappedᵈ |
|---|---|---|---|---|---|---|
| NIAS | SE, 36 bp | 59,344,842 | 38,193,181 (64.4%) [40,579,410 (68.4%)] | - | 18,463,375 (31.1%) | 302,057 (0.5%) |
| NIAS | SE, 51 bp | 50,981,527 | 34,124,172 (66.9%) [35,981,988 (70.6%)] | - | 14,613,663 (28.7%) | 385,876 (0.8%) |
| CSHL | PE, 76 bp | 246,779,548 | 191,891,774 (77.8%) [190,608,407 (77.2%)] | 185,630,272 (75.2%) | 40,901,655 (16.6%) | 13,986,119 (5.7%) |
| CSHL | Unpaired PEᶠ, 76 bp | 10,316,471 | 7,235,730 (70.1%) [7,587,688 (73.5%)] | - | 1,909,048 (18.5%) | 819,735 (7.9%) |
a Number of uniquely mapped reads on the assembled pseudomolecules.
b Number of uniquely mapped reads with a proper distance between the two reads of a pair.
c Number of reads that mapped to multiple positions on the assembled pseudomolecules with the same score.
d Number of reads that could not be mapped to the assembled pseudomolecules.
e The read counts in square brackets include mapped reads with lower mapping quality (MAPQ < 20).
f Unpaired reads whose mate was discarded during preprocessing; these were therefore mapped as single-end reads.
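Footnotes a and c to e describe a partition of the pre-processed reads: each read is uniquely mapped, mapped to multiple equally scoring positions, or unmapped, and the bracketed count adds the uniquely mapped reads with MAPQ < 20 back in. The minimal sketch below assumes that partition and recomputes the percentages for the two NIAS rows. The `MappingRow` class is hypothetical, and reading the unbracketed value as "MAPQ ≥ 20 only" is an inference from footnote e.

```python
"""Recompute mapping-category percentages for the BWA mapping table.

Assumes each pre-processed read falls into exactly one category: uniquely
mapped (any MAPQ), multiply mapped, or unmapped. MappingRow is hypothetical.
"""
from dataclasses import dataclass


@dataclass
class MappingRow:
    total: int          # number of pre-processed reads
    unique_mapq20: int  # uniquely mapped, assumed MAPQ >= 20 (unbracketed value)
    unique_all: int     # uniquely mapped incl. MAPQ < 20 (bracketed value)
    multiple: int       # equally good hits at several positions
    unmapped: int       # no alignment on the pseudomolecules

    def pct(self, n: int) -> float:
        """Percentage of pre-processed reads, rounded to one decimal as in the table."""
        return round(100 * n / self.total, 1)

    def is_partition(self) -> bool:
        """True if unique (all MAPQ) + multiple + unmapped accounts for every read."""
        return self.unique_all + self.multiple + self.unmapped == self.total


nias_36 = MappingRow(59_344_842, 38_193_181, 40_579_410, 18_463_375, 302_057)
nias_51 = MappingRow(50_981_527, 34_124_172, 35_981_988, 14_613_663, 385_876)

for name, row in [("NIAS 36 bp", nias_36), ("NIAS 51 bp", nias_51)]:
    print(name,
          row.pct(row.unique_mapq20), row.pct(row.unique_all),
          row.pct(row.multiple), row.pct(row.unmapped),
          row.is_partition())
```

For both NIAS rows the recomputed percentages match the table, and the bracketed unique count plus the multiple and unmapped counts equals the number of pre-processed reads.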
Figure 1. Depth of coverage of Illumina reads on the assembled genome. Coverage at depths of ≥1, ≥5, and ≥10 reads is shown for the (A) NIAS, (B) CSHL, and (C) NIAS + CSHL datasets.
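Figure 1 summarizes coverage as the fraction of positions that reach a minimum read depth. A minimal sketch of that calculation, assuming per-base depths are already available as a list of integers (for example, exported from a pileup); the function name and the toy depth values are hypothetical:

```python
"""Fraction of genome positions covered at minimum read-depth thresholds.

The per-base depth list and the helper name are illustrative assumptions;
the thresholds (1, 5, 10 reads) follow Figure 1.
"""


def fraction_covered(depths, thresholds=(1, 5, 10)):
    """Map each threshold t to the fraction of positions with depth >= t."""
    n = len(depths)
    if n == 0:
        raise ValueError("depth list is empty")
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


# Toy per-base depths for a 10-bp region (not real data).
toy_depths = [0, 1, 3, 5, 7, 10, 12, 0, 9, 15]
print(fraction_covered(toy_depths))  # {1: 0.8, 5: 0.6, 10: 0.3}
```

Because each threshold is tested independently, the fractions are non-increasing as the threshold rises, which is the pattern a cumulative depth plot shows.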
Coverage and number of effective sites for detection of variation
| Chr | Length (bp)ᵃ | NIAS Covᵇ (%) | NIAS Effᶜ (%) | CSHL Cov (%) | CSHL Eff (%) | NIAS + CSHL Cov (%) | NIAS + CSHL Eff (%) |
|---|---|---|---|---|---|---|---|
| chr01 | 43,270,899 | 35,791,098 (82.7%) | 18,246,140 (42.2%) | 39,956,568 (92.3%) | 38,068,615 (88.0%) | 39,961,540 (92.4%) | 38,175,168 (88.2%) |
| chr02 | 35,937,247 | 30,154,356 (83.9%) | 15,638,850 (43.5%) | 33,524,321 (93.3%) | 32,062,712 (89.2%) | 33,528,628 (93.3%) | 32,148,332 (89.5%) |
| chr03 | 36,413,819 | 31,059,089 (85.3%) | 15,977,692 (43.9%) | 34,365,834 (94.4%) | 32,996,819 (90.6%) | 34,369,143 (94.4%) | 33,082,516 (90.9%) |
| chr04 | 35,502,790 | 26,736,290 (75.3%) | 12,921,248 (36.4%) | 31,176,439 (87.8%) | 29,064,227 (81.9%) | 31,186,864 (87.8%) | 29,153,050 (82.1%) |
| chr05 | 29,958,438 | 23,400,797 (78.1%) | 11,810,143 (39.4%) | 26,853,884 (89.6%) | 25,256,988 (84.3%) | 26,857,426 (89.6%) | 25,327,281 (84.5%) |
| chr06 | 31,248,789 | 24,651,481 (78.9%) | 12,398,636 (39.7%) | 28,084,165 (89.9%) | 26,516,110 (84.9%) | 28,088,052 (89.9%) | 26,594,708 (85.1%) |
| chr07 | 29,697,629 | 23,447,981 (79.0%) | 11,725,346 (39.5%) | 26,880,079 (90.5%) | 25,363,522 (85.4%) | 26,883,654 (90.5%) | 25,438,043 (85.7%) |
| chr08 | 28,443,027 | 22,122,986 (77.8%) | 11,013,746 (38.7%) | 25,508,624 (89.7%) | 23,953,312 (84.2%) | 25,512,422 (89.7%) | 24,024,380 (84.5%) |
| chr09 | 23,012,721 | 18,240,332 (79.3%) | 9,052,990 (39.3%) | 20,858,168 (90.6%) | 19,681,323 (85.5%) | 20,860,908 (90.6%) | 19,739,054 (85.8%) |
| chr10 | 23,208,246 | 18,119,906 (78.1%) | 8,873,977 (38.2%) | 20,931,379 (90.2%) | 19,673,976 (84.8%) | 20,934,165 (90.2%) | 19,732,853 (85.0%) |
| chr11 | 29,021,139 | 22,586,659 (77.8%) | 11,082,625 (38.2%) | 25,987,560 (89.5%) | 24,468,775 (84.3%) | 25,990,922 (89.6%) | 24,546,522 (84.6%) |
| chr12 | 27,531,905 | 20,923,037 (76.0%) | 10,270,579 (37.3%) | 24,427,691 (88.7%) | 22,818,854 (82.9%) | 24,432,616 (88.7%) | 22,891,243 (83.1%) |
| Total | 373,246,649 | 297,234,012 (79.6%) | 149,011,972 (39.9%) | 338,554,712 (90.7%) | 319,925,233 (85.7%) | 338,606,340 (90.7%) | 320,853,150 (86.0%) |
a Length before error correction.
b Cov: Number of sites covered with at least one read.
c Eff: Number of effective sites, that is, sites covered by at least 10 high-quality (base quality ≥20) reads.
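Footnotes b and c define two per-site tests: a site counts toward Cov if at least one read covers it, and toward Eff if at least 10 covering reads have base quality ≥ 20 at that site. The minimal sketch below applies those two rules to per-site lists of base qualities, a hypothetical input shape (a real pipeline would derive them from a pileup), and then reproduces two table percentages and the Total length from the published numbers. Function and constant names are illustrative.

```python
"""Count Cov and Eff sites as defined in the coverage-table footnotes.

Input is assumed to be one list of base qualities per reference position
(a hypothetical format; in practice these would come from a pileup).
"""

MIN_BASE_QUALITY = 20  # footnote c: base-quality cutoff for a high-quality read
MIN_HQ_READS = 10      # footnote c: high-quality reads needed for an effective site


def cov_eff(site_quals):
    """Return (cov, eff) counts for a list of per-site base-quality lists."""
    cov = eff = 0
    for quals in site_quals:
        if quals:  # footnote b: at least one read covers the site
            cov += 1
        high_quality = sum(q >= MIN_BASE_QUALITY for q in quals)
        if high_quality >= MIN_HQ_READS:
            eff += 1
    return cov, eff


def pct(count, length):
    """Percentage of a chromosome (or genome) length, rounded to one decimal."""
    return round(100 * count / length, 1)


# Toy sites (not real data).
toy_sites = [
    [],                   # uncovered: neither Cov nor Eff
    [30] * 3,             # covered, but only 3 high-quality reads
    [35] * 12,            # covered and effective
    [15] * 8 + [25] * 9,  # 17 reads, only 9 with quality >= 20: Cov only
]
cov, eff = cov_eff(toy_sites)
print(cov, eff)  # 3 1

# Chromosome lengths (bp) from the table; their sum is the Total row.
lengths = [43_270_899, 35_937_247, 36_413_819, 35_502_790, 29_958_438, 31_248_789,
           29_697_629, 28_443_027, 23_012_721, 23_208_246, 29_021_139, 27_531_905]
total = sum(lengths)
print(total)                        # 373246649
print(pct(35_791_098, lengths[0]))  # 82.7 (chr01, NIAS Cov)
print(pct(319_925_233, total))      # 85.7 (Total, CSHL Eff)
```

Note that Eff is not a strict subset test on depth alone: a site with many low-quality reads (the last toy site) is covered but not effective.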
Figure 2. Frequency of allelic sites per 10 kb across the 12 rice chromosomes.
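Figure 2 reports allelic-site density in 10-kb windows. A minimal sketch of the binning step, assuming 1-based site positions on a single chromosome; the function name and toy positions are hypothetical:

```python
"""Count allelic sites per 10-kb window along one chromosome."""
from collections import Counter

WINDOW = 10_000  # 10 kb, as in Figure 2


def sites_per_window(positions, chrom_length, window=WINDOW):
    """Return the number of sites in each window, empty windows included.

    Positions are assumed to be 1-based, so position 10,000 falls in
    window 0 and position 10,001 in window 1.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter((p - 1) // window for p in positions)
    return [counts[i] for i in range(n_windows)]  # Counter returns 0 for missing keys


toy_positions = [15, 9_999, 10_000, 10_001, 25_500]
print(sites_per_window(toy_positions, chrom_length=32_000))  # [3, 1, 1, 0]
```

Keeping empty windows in the output matters for plotting density along a chromosome, since gaps with no allelic sites would otherwise disappear.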