Kentaro Mishima¹, Hideki Hirakawa², Taiichi Iki³, Yoko Fukuda⁴, Tomonori Hirao⁵, Akira Tamura⁵, Makoto Takahashi⁵.
Abstract
BACKGROUND: Japanese larch (Larix kaempferi) is an economically important deciduous conifer that grows in cool-temperate forests and is endemic to Japan. Kuril larch (L. gmelinii var. japonica) is a variety of Dahurian larch that is naturally distributed in the Kuril Islands and Sakhalin. The hybrid larch (L. gmelinii var. japonica × L. kaempferi) exhibits heterosis, which manifests as rapid juvenile growth and high resistance to vole grazing. Because forestry managers value these characteristics, the hybrid larch is one of the most important plantation species in Hokkaido. To accelerate molecular breeding in these species, we collected and compared full-length cDNA isoforms (Iso-Seq) and RNA-Seq short reads, and merged them to construct candidate gene sets to serve as references for both Larix species. To validate the results, candidate protein-coding genes (ORFs) corresponding to several flowering signal-related genes were screened from the reference sequences, and their phylogenetic relationships with those of closely related species were elucidated.
Keywords: Flowering signal-related genes; Isoform sequencing; L. gmelinii var. japonica; Larix kaempferi; Short-read sequences
Year: 2022 PMID: 36192701 PMCID: PMC9531402 DOI: 10.1186/s12870-022-03862-9
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 5.260
Summary of PacBio transcript isoform (Iso-Seq) data collected for Japanese larch and Kuril larch
| | Japanese larch, 1–2 kb (3 cells) | Japanese larch, 2–3 kb (2 cells) | Japanese larch, 3–6 kb (2 cells) | Japanese larch, 5–10 kb (2 cells) | Japanese larch, total | Kuril larch, 1–2 kb (3 cells) | Kuril larch, 2–3 kb (3 cells) | Kuril larch, 3–4 kb (2 cells) | Kuril larch, 4–10 kb (2 cells) | Kuril larch, total |
|---|---|---|---|---|---|---|---|---|---|---|
| Reads of insert (ROIs) | 150,023 | 122,662 | 114,064 | 114,537 | 501,286 | 124,921 | 141,921 | 102,575 | 89,851 | 459,268 |
| Bases in reads of insert | 277,644,841 | 313,765,905 | 421,270,147 | 553,831,092 | 1,566,511,985 | 189,871,361 | 342,758,385 | 342,431,605 | 349,381,956 | 1,224,443,307 |
| Mean read length of insert (bases) | 1,850 | 2,557 | 3,693 | 4,835 | 3,125 | 1,519 | 2,415 | 3,338 | 3,888 | 2,666 |
| Mean read quality of insert | 0.9236 | 0.9153 | 0.8953 | 0.8579 | 4 | 0.9256 | 0.9241 | 0.9059 | 0.8881 | 4 |
| Mean number of passes | 10 | 8 | 5 | 3 | 26 | 12 | 8 | 5 | 4 | 29 |
| Number of filtered short reads of insert | 7,192 | 3,558 | 1,374 | 1,075 | 13,199 | 6,125 | 3,233 | 1,365 | 1,184 | 11,907 |
| Number of non-full-length reads of insert | 62,498 | 50,252 | 51,889 | 77,378 | 242,017 | 49,722 | 53,965 | 45,424 | 49,310 | 198,421 |
| Number of full-length reads of insert | 80,333 | 68,852 | 60,801 | 36,084 | 246,070 | 69,074 | 84,723 | 55,786 | 39,357 | 248,940 |
| Number of full-length non-chimeric reads | 78,930 | 68,352 | 60,554 | 34,810 | 242,646 | 67,549 | 84,271 | 55,564 | 38,659 | 246,043 |
| Average full-length non-chimeric read length (bases) | 1,745 | 2,608 | 4,002 | 5,798 | - | 1,359 | 2,445 | 3,599 | 4,411 | - |
| Number of consensus isoforms | 46,124 | 43,436 | 28,889 | 13,958 | 132,407 | 33,508 | 45,851 | 29,364 | 24,534 | 133,257 |
| Average consensus isoform read length (bases) | 1,753 | 2,567 | 4,023 | 5,246 | 13,589 | 1,439 | 2,583 | 3,809 | 4,253 | 12,084 |
| Number of polished high-quality isoforms | 36,281 | 29,509 | 17,489 | 8,435 | 91,714 | 26,093 | 30,438 | 15,888 | 10,607 | 83,026 |
| Number of polished low-quality isoforms | 9,843 | 13,927 | 11,400 | 5,523 | 40,693 | 7,415 | 15,410 | 13,476 | 13,924 | 50,225 |
| Non-redundant transcripts | - | - | - | - | 79,832 | - | - | - | - | 66,002 |
| Min. isoform length (bases) | - | - | - | - | 300 | - | - | - | - | 307 |
| Max. isoform length (bases) | - | - | - | - | 8,880 | - | - | - | - | 10,117 |
| Mean isoform length (bases) | - | - | - | - | 2,715 | - | - | - | - | 2,446 |
| BUSCO v3 (odb10; 1,375): Complete (%) | - | - | - | - | 65.1 | - | - | - | - | 71.9 |
| Complete and single-copy (%) | - | - | - | - | 31.9 | - | - | - | - | 33.1 |
| Complete and duplicated (%) | - | - | - | - | 33.2 | - | - | - | - | 38.8 |
| Fragmented (%) | - | - | - | - | 7.8 | - | - | - | - | 6.7 |
| Missing (%) | - | - | - | - | 27.1 | - | - | - | - | 21.4 |
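The total mean read length of insert appears to be a read-weighted mean: total bases divided by total reads of insert. It is not the average of the four per-bin means. The Python sketch below recomputes it from the per-bin values in the table. The helper name `pooled_mean_length` is hypothetical. Treating the left block as Japanese larch follows the species order in the table caption.

```python
def pooled_mean_length(total_bases: list[int], read_counts: list[int]) -> float:
    """Mean read length over all reads pooled across size-selected bins.

    Each bin is weighted by its number of reads, so the result is
    sum(bases) / sum(reads), not the average of the per-bin means.
    """
    if not read_counts or len(total_bases) != len(read_counts):
        raise ValueError("need one base total and one read count per bin")
    return sum(total_bases) / sum(read_counts)


# Per-bin values copied from the Iso-Seq summary table.
japanese_bases = [277_644_841, 313_765_905, 421_270_147, 553_831_092]
japanese_rois = [150_023, 122_662, 114_064, 114_537]
kuril_bases = [189_871_361, 342_758_385, 342_431_605, 349_381_956]
kuril_rois = [124_921, 141_921, 102_575, 89_851]

pooled_jp = pooled_mean_length(japanese_bases, japanese_rois)
pooled_kl = pooled_mean_length(kuril_bases, kuril_rois)
print(round(pooled_jp), round(pooled_kl))  # 3125 2666

# For contrast: the unweighted mean of the per-bin means is larger,
# because the long-insert bins tend to contain fewer reads.
per_bin_means = [b / n for b, n in zip(japanese_bases, japanese_rois)]
unweighted_jp = sum(per_bin_means) / len(per_bin_means)
print(unweighted_jp > pooled_jp)  # True
```

The total entries in the "Mean number of passes" and "Average consensus isoforms read length" rows work differently. They equal plain sums of the per-bin values (for example, 10 + 8 + 5 + 3 = 26 passes for Japanese larch), so they should not be read as means.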
Open reading frames (ORFs) predicted by ANGEL from the PacBio transcript isoform data
| | Japanese larch | Kuril larch |
|---|---|---|
| Number of ORFs | 80,557 | 67,332 |
| Total length (bases) | 109,217,543 | 78,919,963 |
| Average (bases) | 1,356 | 1,172 |
| Maximum (bases) | 7,956 | 6,609 |
| Minimum (bases) | 145 | 146 |
| N50 (bases) | 1,752 | 1,551 |
| G + C% | 44.8 | 44.5 |
| ANGEL category | | |
| Confident | 37,508 | 29,372 |
| Confident-complete | 20,886 | 20,201 |
| Confident-5'partial | 16,277 | 8,866 |
| Confident-3'partial | 234 | 251 |
| Confident-internal | 111 | 54 |
| Likely-NA | 9,064 | 6,910 |
| Suspicious-NA | 16,834 | 15,718 |
| Dumb-complete | 16,992 | 15,187 |
| Dumb-5'partial | 11 | 3 |
| Dumb-3'partial | 148 | 142 |
| BUSCO v3 (odb10; 1,375) | | |
| Complete (%) | 63.2 | 68.2 |
| Complete and single-copy (%) | 30.5 | 32.5 |
| Complete and duplicated (%) | 32.7 | 35.7 |
| Fragmented (%) | 7.7 | 8.1 |
| Missing (%) | 29.1 | 23.7 |
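The counts in this table are internally consistent in two ways:

- The Confident row equals the sum of its four Confident subcategories.
- The six top-level categories add up to the total number of ORFs.

The minimal Python check below tests both relations. It assumes the left column is Japanese larch and the right column is Kuril larch. The function and constant names are illustrative and are not part of ANGEL.

```python
# Category labels as printed in the ANGEL table above.
CONFIDENT_SUBTYPES = (
    "Confident-complete", "Confident-5'partial",
    "Confident-3'partial", "Confident-internal",
)
TOP_LEVEL = (
    "Confident", "Likely-NA", "Suspicious-NA",
    "Dumb-complete", "Dumb-5'partial", "Dumb-3'partial",
)


def check_angel_counts(counts: dict[str, int], n_orfs: int) -> None:
    """Raise ValueError if the category counts do not add up."""
    confident = sum(counts[k] for k in CONFIDENT_SUBTYPES)
    if confident != counts["Confident"]:
        raise ValueError(f"Confident subtypes sum to {confident}, "
                         f"table says {counts['Confident']}")
    total = sum(counts[k] for k in TOP_LEVEL)
    if total != n_orfs:
        raise ValueError(f"top-level categories sum to {total}, "
                         f"table says {n_orfs}")


japanese = {
    "Confident": 37_508, "Confident-complete": 20_886,
    "Confident-5'partial": 16_277, "Confident-3'partial": 234,
    "Confident-internal": 111, "Likely-NA": 9_064,
    "Suspicious-NA": 16_834, "Dumb-complete": 16_992,
    "Dumb-5'partial": 11, "Dumb-3'partial": 148,
}
kuril = {
    "Confident": 29_372, "Confident-complete": 20_201,
    "Confident-5'partial": 8_866, "Confident-3'partial": 251,
    "Confident-internal": 54, "Likely-NA": 6_910,
    "Suspicious-NA": 15_718, "Dumb-complete": 15_187,
    "Dumb-5'partial": 3, "Dumb-3'partial": 142,
}

check_angel_counts(japanese, 80_557)
check_angel_counts(kuril, 67_332)
print("ANGEL category counts are consistent")
```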
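BUSCO completeness of all confident ORFs (confident-complete, confident-5'partial, confident-3'partial and confident-internal) and of confident-complete ORFs only
| | All confident ORFs: Japanese larch | All confident ORFs: Kuril larch | Confident-complete ORFs: Japanese larch | Confident-complete ORFs: Kuril larch |
|---|---|---|---|---|
| Complete (%) | 46.8 | 52.3 | 35.1 | 42.4 |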
| Complete and single-copy (%) | 23.8 | 26.8 | 19.9 | 23.1 |
| Complete and duplicated (%) | 23.0 | 25.5 | 15.2 | 19.3 |
| Fragmented (%) | 6.5 | 4.9 | 4.9 | 3.9 |
| Missing (%) | 46.7 | 42.8 | 60.0 | 53.7 |
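The four value columns here are unlabelled in the source. The Python check below assumes they are, in order:

1. All confident ORFs, Japanese larch
2. All confident ORFs, Kuril larch
3. Confident-complete ORFs, Japanese larch
4. Confident-complete ORFs, Kuril larch

Under that assumption, it confirms the usual BUSCO identities: Complete = single-copy + duplicated, and Complete + Fragmented + Missing ≈ 100%. It also confirms that each superset scores at least as high as its confident-complete subset. The dictionary keys and the rounding tolerance `tol` are illustrative assumptions.

```python
import math

# BUSCO percentages as (complete, single-copy, duplicated, fragmented, missing).
# The (subset, species) labels are an ASSUMPTION about the unlabelled columns.
busco = {
    ("all confident", "Japanese larch"): (46.8, 23.8, 23.0, 6.5, 46.7),
    ("all confident", "Kuril larch"): (52.3, 26.8, 25.5, 4.9, 42.8),
    ("confident-complete", "Japanese larch"): (35.1, 19.9, 15.2, 4.9, 60.0),
    ("confident-complete", "Kuril larch"): (42.4, 23.1, 19.3, 3.9, 53.7),
}


def busco_issues(scores, tol=0.15):
    """Return the BUSCO identities that one column violates (beyond rounding)."""
    complete, single, duplicated, fragmented, missing = scores
    issues = []
    if not math.isclose(complete, single + duplicated, abs_tol=tol):
        issues.append("C != S + D")
    if not math.isclose(complete + fragmented + missing, 100.0, abs_tol=tol):
        issues.append("C + F + M != 100")
    return issues


for key, scores in busco.items():
    print(key, busco_issues(scores) or "ok")

# Adding sequences should not, as a rule, lower the Complete score, so each
# "all confident" column is expected to be >= its confident-complete column.
for species in ("Japanese larch", "Kuril larch"):
    superset = busco[("all confident", species)][0]
    subset = busco[("confident-complete", species)][0]
    print(species, "superset >= subset:", superset >= subset)
```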
Summary of short-read (RNA-Seq) transcript data collected for each species (LK, Japanese larch, L. kaempferi; LG, Kuril larch, L. gmelinii var. japonica)
| Species | Sample name | Total reads | Total read bases | GC (%) | AT (%) | Q20 (%) | Q30 (%) | Accession | Species | Sample name | Total reads | Total read bases | GC (%) | AT (%) | Q20 (%) | Q30 (%) | Accession |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LK | GFE33389 | 144,905,924 | 14,635,498,324 | 45.26 | 54.74 | 96.43 | 94.07 | DRA: 011937, Experiment: DRX279213 | LG | GFF08127 | 149,916,032 | 15,141,519,232 | 45.45 | 54.55 | 95.30 | 92.09 | DRA: 011937, Experiment: DRX279223 |
| LK | GFE02991 | 159,873,888 | 16,146,262,688 | 45.22 | 54.78 | 95.46 | 92.34 | DRA: 011937, Experiment: DRX279214 | LG | GFF03187 | 178,050,660 | 17,983,116,660 | 45.05 | 54.95 | 94.61 | 91.03 | DRA: 011937, Experiment: DRX279221 |
| LK | GEF32200 | 124,904,528 | 12,615,357,328 | 45.49 | 54.51 | 99.00 | 97.34 | DRA: 011937, Experiment: DRX279215 | LG | GFF03199 | 171,178,302 | 17,289,008,502 | 44.79 | 55.21 | 95.55 | 92.45 | DRA: 011937, Experiment: DRX279220 |
| LK | GEF32203 | 127,808,704 | 12,908,679,104 | 45.96 | 54.04 | 99.01 | 97.34 | DRA: 011937, Experiment: DRX279216 | LG | GFF08097 | 180,587,588 | 18,239,364,388 | 45.33 | 54.67 | 95.38 | 92.17 | DRA: 011937, Experiment: DRX279222 |
| LK | GFE02901 | 131,108,822 | 13,241,991,022 | 45.08 | 54.92 | 99.02 | 97.37 | DRA: 011937, Experiment: DRX279217 | LG | GFF03179 | 154,611,148 | 15,615,725,948 | 44.87 | 55.13 | 96.16 | 93.65 | DRA: 011937, Experiment: DRX279224 |
| LK | GFE02910 | 135,559,950 | 13,691,554,950 | 45.29 | 54.71 | 98.94 | 97.18 | DRA: 011937, Experiment: DRX279218 | LG | GFF03183 | 137,362,464 | 13,873,608,864 | 44.87 | 55.13 | 96.18 | 93.69 | DRA: 011937, Experiment: DRX279225 |
| LK | GFE02911 | 135,061,888 | 13,641,250,688 | 45.28 | 54.72 | 99.03 | 97.39 | DRA: 011937, Experiment: DRX279219 | LG | GFF08058 | 123,067,288 | 12,429,796,088 | 45.06 | 54.94 | 96.04 | 93.69 | DRA: 011937, Experiment: DRX279226 |
| Total | | 959,223,704 | 96,880,594,104 | - | - | - | - | - | Total | | 1,094,773,482 | 110,572,139,682 | - | - | - | - | - |
| After normalization | | 182,578,084 | - | - | - | - | - | - | After normalization | | 209,039,560 | - | - | - | - | - | - |
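The per-sample read counts sum exactly to the Total row. Comparing that total with the "After normalization" row shows what fraction of reads was retained. This section does not name the normalization method, so the Python sketch below only does the arithmetic. The helper `retained_after_normalization` is hypothetical.

```python
# Per-sample read counts copied from the short-read table above.
lk_reads = [144_905_924, 159_873_888, 124_904_528, 127_808_704,
            131_108_822, 135_559_950, 135_061_888]
lg_reads = [149_916_032, 178_050_660, 171_178_302, 180_587_588,
            154_611_148, 137_362_464, 123_067_288]


def retained_after_normalization(per_sample_reads, reads_after):
    """Return (total input reads, fraction kept after normalization)."""
    total = sum(per_sample_reads)
    if total == 0:
        raise ValueError("no input reads")
    return total, reads_after / total


for label, reads, after in (("LK", lk_reads, 182_578_084),
                            ("LG", lg_reads, 209_039_560)):
    total, kept = retained_after_normalization(reads, after)
    print(f"{label}: {total:,} reads in, {kept:.1%} kept")
# LK: 959,223,704 reads in, 19.0% kept
# LG: 1,094,773,482 reads in, 19.1% kept
```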
Summary of unigenes constructed from short-read transcript data
| | Japanese larch contigs | Japanese larch unitranscripts | Japanese larch unigenes | Kuril larch contigs | Kuril larch unitranscripts | Kuril larch unigenes |
|---|---|---|---|---|---|---|
| Number of sequences | 912,369 | 118,141 | 58,396 | 1,133,931 | 72,294 | 36,972 |
| Total length (bases) | 619,249,181 | 154,148,063 | 68,361,245 | 744,563,879 | 98,330,037 | 49,327,511 |
| Average (bases) | 679 | 1,305 | 1,171 | 657 | 1,360 | 1,334 |
| Maximum (bases) | 17,703 | 17,703 | 17,703 | 19,838 | 18,429 | 18,429 |
| Minimum (bases) | 172 | 172 | 188 | 170 | 180 | 185 |
| N50 (bases) | 1,112 | 2,311 | 2,246 | 1,016 | 2,374 | 2,474 |
| G + C% | 44.2 | 41.7 | 41.8 | 43.7 | 41.8 | 41.7 |
| A | 172,604,111 | 44,911,691 | 19,905,530 | 209,250,990 | 28,589,958 | 14,364,447 |
| T | 172,848,343 | 44,883,640 | 19,908,628 | 209,910,416 | 28,611,870 | 14,387,526 |
| G | 135,206,494 | 31,849,992 | 14,133,013 | 160,755,371 | 20,361,148 | 10,199,216 |
| C | 138,590,233 | 32,502,740 | 14,414,074 | 164,647,102 | 20,767,061 | 10,376,319 |
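The G + C% row can be recomputed from the A, T, G and C rows. The Python sketch below divides G + C by A + T + G + C. That denominator is an assumption, because ambiguous bases would make it differ from the total length. For example, in the last column (assumed to be Kuril larch unigenes), A + T + G + C is 49,327,508, three bases fewer than the reported total length of 49,327,511. The example column is assumed to be Japanese larch contigs.

```python
def gc_percent(a: int, t: int, g: int, c: int) -> float:
    """G + C as a percentage of all A/T/G/C bases (ambiguous bases excluded)."""
    acgt = a + t + g + c
    if acgt == 0:
        raise ValueError("no A/T/G/C bases")
    return 100.0 * (g + c) / acgt


# First column of the unigene table (assumed: Japanese larch contigs).
jp_contigs = dict(a=172_604_111, t=172_848_343, g=135_206_494, c=138_590_233)
print(sum(jp_contigs.values()))            # 619249181, the reported total length
print(round(gc_percent(**jp_contigs), 1))  # 44.2
```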
Statistics and completeness of the ORFs predicted from RNA-Seq short reads
| | Japanese larch | Kuril larch |
|---|---|---|
| Number of ORFs | 27,130 | 20,207 |
| Total length (bp) | 27,027,003 | 22,448,592 |
| Average (bp) | 996 | 1,111 |
| Maximum (bp) | 14,493 | 14,493 |
| Minimum (bp) | 255 | 255 |
| N50 (bp) | 1,341 | 1,500 |
| G + C% | 45.0 | 44.6 |
| A | 7,714,114 | 6,465,651 |
| T | 7,152,243 | 5,973,339 |
| G | 6,610,707 | 5,484,799 |
| C | 5,549,939 | 4,524,803 |
| BUSCO v3 (odb10; 1,375) | | |
| Complete (%) | 82.0 | 87.5 |
| Complete and single-copy (%) | 79.5 | 84.7 |
| Complete and duplicated (%) | 2.5 | 2.8 |
| Fragmented (%) | 6.7 | 2.8 |
| Missing (%) | 11.3 | 9.7 |
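N50 is reported in this and the other ORF tables. It is conventionally the length L such that sequences of length L or longer together account for at least half of the total length. The per-ORF lengths are not given here, so the Python sketch below uses a small hypothetical list. The function `n50` is illustrative; it is not the tool the authors used.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("N50 of an empty set of sequences is undefined")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-negative lengths")


toy = [2, 3, 4, 5, 6, 8, 10]  # hypothetical ORF lengths, total 38
print(n50(toy))                # 6, because 10 + 8 + 6 = 24 >= 19
```

Because N50 weights sequences by length, it is typically larger than the mean length. This matches the tables, where every N50 exceeds the corresponding average.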
Statistics and completeness of ORFs merged from Iso-Seq and RNA-Seq short-read data
| | Japanese larch: merged ORFs (Iso-Seq + RNA-Seq) | Japanese larch: merged ORFs after clustering | Kuril larch: merged ORFs (Iso-Seq + RNA-Seq) | Kuril larch: merged ORFs after clustering |
|---|---|---|---|---|
| Number of ORFs | 107,687 | 50,690 | 87,539 | 38,684 |
| Total length (bases) | 136,244,546 | 53,804,534 | 101,368,555 | 39,053,584 |
| Average (bases) | 1,265 | 1,061 | 1,158 | 1,010 |
| Maximum (bases) | 14,493 | 14,493 | 14,493 | 14,493 |
| Minimum (bases) | 145 | 145 | 146 | 146 |
| N50 (bases) | 1,665 | 1,473 | 1,539 | 1,413 |
| G + C% | 44.8 | 46.4 | 44.5 | 45.1 |
| A | 39,163,057 | 15,014,796 | 29,209,051 | 11,107,507 |
| T | 35,977,224 | 13,798,508 | 27,009,101 | 10,313,639 |
| G | 33,156,639 | 13,227,788 | 24,804,297 | 9,554,898 |
| C | 27,947,626 | 11,763,442 | 20,346,106 | 8,077,540 |
| ANGEL category | | | | |
| Confident | - | 14,264 | - | 12,023 |
| Confident-complete | - | 8,933 | - | 8,315 |
| Confident-5'partial | - | 5,174 | - | 3,589 |
| Confident-3'partial | - | 122 | - | 105 |
| Confident-internal | - | 35 | - | 14 |
| Likely-NA | - | 4,143 | - | 3,142 |
| Suspicious-NA | - | 6,991 | - | 6,611 |
| Dumb-complete | - | 7,126 | - | 5,644 |
| Dumb-5'partial | - | 6 | - | 3 |
| Dumb-3'partial | - | 80 | - | 67 |
| TransDecoder category | | | | |
| Complete | - | 10,099 | - | 6,489 |
| 5'-partial | - | 2,946 | - | 1,552 |
| 3'-partial | - | 1,885 | - | 1,023 |
| Internal | - | 3,150 | - | 2,130 |
| BUSCO v3 (odb10; 1,375) | | | | |
| Complete (%) | 90.6 | 90.5 | 92.1 | 92.1 |
| Complete and single-copy (%) | 39.8 | 83.3 | 32.8 | 84.8 |
| Complete and duplicated (%) | 50.8 | 7.2 | 59.3 | 7.3 |
| Fragmented (%) | 3.1 | 3.0 | 1.5 | 1.5 |
| Missing (%) | 6.3 | 6.5 | 6.4 | 6.0 |
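The merged-ORF counts are consistent with simple bookkeeping. The Iso-Seq ORFs (80,557 and 67,332) plus the RNA-Seq ORFs (27,130 and 20,207) give the merged totals. After clustering, the ANGEL categories plus the TransDecoder categories sum to the clustered totals. This suggests, though the table does not state it, that each clustered ORF carries exactly one label: ANGEL for Iso-Seq-derived ORFs or TransDecoder for RNA-Seq-derived ORFs. The Python check below assumes the left two columns are Japanese larch. The Kuril larch Confident count is taken as 12,023, the sum of its four Confident subcategories.

```python
# Counts copied from the ORF tables; first columns assumed to be Japanese larch.
species = {
    "Japanese larch": {
        "iso_seq_orfs": 80_557, "rna_seq_orfs": 27_130,
        "merged": 107_687, "clustered": 50_690,
        # Confident, Likely-NA, Suspicious-NA, Dumb-complete, Dumb-5', Dumb-3'
        "angel": [14_264, 4_143, 6_991, 7_126, 6, 80],
        # Complete, 5'-partial, 3'-partial, Internal
        "transdecoder": [10_099, 2_946, 1_885, 3_150],
    },
    "Kuril larch": {
        "iso_seq_orfs": 67_332, "rna_seq_orfs": 20_207,
        "merged": 87_539, "clustered": 38_684,
        # Confident taken as 12,023, the sum of its four subcategories.
        "angel": [12_023, 3_142, 6_611, 5_644, 3, 67],
        "transdecoder": [6_489, 1_552, 1_023, 2_130],
    },
}

for name, s in species.items():
    merged_ok = s["iso_seq_orfs"] + s["rna_seq_orfs"] == s["merged"]
    labelled = sum(s["angel"]) + sum(s["transdecoder"])
    print(f"{name}: merge adds up: {merged_ok}; "
          f"ANGEL + TransDecoder = {labelled:,} of {s['clustered']:,} clustered ORFs")
# Japanese larch: merge adds up: True; ANGEL + TransDecoder = 50,690 of 50,690 clustered ORFs
# Kuril larch: merge adds up: True; ANGEL + TransDecoder = 38,684 of 38,684 clustered ORFs
```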
Fig. 1 Venn diagram showing the overlap between the open reading frames obtained from Japanese larch and Kuril larch in this study.
Interspecific comparison of Japanese larch and Kuril larch ORFs
| | Intersection: Japanese larch ORFs | Intersection: Kuril larch ORFs | Specific to Japanese larch | Specific to Kuril larch |
|---|---|---|---|---|
| Number of clusters | 19,813 (shared) | | 7 | 5 |
| Number of ORFs | | | | |
| PacBio Iso-seq | 13,150 | 14,232 | 12 | 8 |
| Illumina RNA-seq | 9,421 | 8,435 | 21 | 7 |
| Total | 22,571 | 22,667 | 33 | 15 |
| Single clusters | | | | |
| PacBio Iso-seq | - | - | 19,448 | 13,250 |
| Illumina RNA-seq | - | - | 8,638 | 2,752 |
| Total | - | - | 28,086 | 16,002 |
| Homologs in the NCBI GenBank plant division (gbpln) | | | | |
| PacBio Iso-seq | 11,908 | 12,917 | 13,098 | 9,613 |
| Illumina RNA-seq | 8,427 | 7,547 | 5,821 | 1,921 |
| Total | 20,335 | 20,464 | 18,919 | 11,534 |
| No hits against the NR database | | | | |
| PacBio Iso-seq | 932 | 1,052 | 5,057 | 3,354 |
| Illumina RNA-seq | 743 | 665 | 2,280 | 759 |
| Total | 1,675 | 1,717 | 7,337 | 4,113 |
| BUSCO v3 (odb10; 1,375) | | | | |
| Complete (%) | 88.5 | 90.1 | 16.5 | 9.9 |
| Complete and single-copy (%) | 84.6 | 84.9 | 14.5 | 9.2 |
| Complete and duplicated (%) | 3.9 | 5.2 | 2.0 | 0.7 |
| Fragmented (%) | 3.0 | 1.7 | 13.1 | 13.5 |
| Missing (%) | 8.5 | 8.2 | 70.4 | 76.6 |
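The Venn table appears to split each species' clustered ORFs three ways:

- ORFs in the 19,813 shared clusters
- ORFs in the few species-specific multi-member clusters (7 and 5 clusters holding 33 and 15 ORFs)
- ORFs left in single clusters

Summing those parts per data source reproduces the ANGEL (Iso-Seq) and TransDecoder (RNA-Seq) category totals of the merged-ORF table: 32,610 and 18,080 for Japanese larch, 27,490 and 11,194 for Kuril larch. It also reproduces the clustered totals of 50,690 and 38,684. The Python sketch below assumes the column order intersection (Japanese larch), intersection (Kuril larch), specific to Japanese larch, specific to Kuril larch. The key names are hypothetical.

```python
# ORF counts copied from the Venn table (column order assumed as stated).
venn = {
    "Japanese larch": {
        "Iso-Seq": {"shared": 13_150, "specific_multi": 12, "singleton": 19_448},
        "RNA-Seq": {"shared": 9_421, "specific_multi": 21, "singleton": 8_638},
    },
    "Kuril larch": {
        "Iso-Seq": {"shared": 14_232, "specific_multi": 8, "singleton": 13_250},
        "RNA-Seq": {"shared": 8_435, "specific_multi": 7, "singleton": 2_752},
    },
}


def orfs_per_source(species_counts):
    """Total ORFs per data source across shared, specific and single clusters."""
    return {src: sum(parts.values()) for src, parts in species_counts.items()}


for name, counts in venn.items():
    per_source = orfs_per_source(counts)
    print(f"{name}: Iso-Seq {per_source['Iso-Seq']:,}, "
          f"RNA-Seq {per_source['RNA-Seq']:,}, total {sum(per_source.values()):,}")
# Japanese larch: Iso-Seq 32,610, RNA-Seq 18,080, total 50,690
# Kuril larch: Iso-Seq 27,490, RNA-Seq 11,194, total 38,684
```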
Fig. 2 Phylogenetic tree showing the relationships between known MADS-box genes and a set of other angiosperm and gymnosperm sequences. Japanese larch open reading frames are shown in green. Kuril larch open reading frames are shown in blue. Numbers adjacent to some nodes show bootstrap percentages.
Fig. 3 Phylogenetic tree showing the relationships between known LEAFY and NEEDLY genes and a set of other angiosperm and gymnosperm sequences. Japanese larch open reading frames are shown in green. Kuril larch open reading frames are shown in blue. Numbers adjacent to some nodes show bootstrap percentages.
Fig. 4 Phylogenetic tree showing the relationships between known FT/FT-like and MFT genes and a set of other angiosperm and gymnosperm sequences. Japanese larch open reading frames are shown in green. Kuril larch open reading frames are shown in blue. Numbers adjacent to some nodes show bootstrap percentages.
Fig. 5 Phylogenetic tree showing the relationships between known CONSTANS genes and a set of other angiosperm and gymnosperm sequences. Japanese larch open reading frames are shown in green. Kuril larch open reading frames are shown in blue. Numbers adjacent to some nodes show bootstrap percentages.