Eliandro Espindula, Edilena Reis Sperb, Evelise Bach, Luciane Maria Pereira Passaglia.
Abstract
In Dual RNA-Seq experiments, simultaneously extracting RNA and analyzing gene expression data from both interacting organisms can be a challenge. One alternative is to separate the reads during in silico data analysis. Two main mapping methods are used: sequential and combined. Here we present a combined approach in which the libraries were aligned to a concatenated genome to sort the reads before mapping them to the respective annotated genomes, and we compare this method with the sequential analysis. Two RNA-Seq libraries available in public databases, one from a eukaryotic organism (Zea mays) and one from a prokaryotic organism (Herbaspirillum seropedicae), were mixed to simulate a Dual RNA-Seq experiment. Libraries from real Dual RNA-Seq experiments were also used. The sequential analysis consistently attributed more reads to the first reference genome used in the analysis (due to cross-mapping) than the combined approach. More importantly, the combined analysis resulted in lower numbers of cross-mapped reads. Our results highlight the necessity of combining the reference genomes to sort reads before the counting step to avoid losing information in Dual RNA-Seq experiments. Since most studies first map the RNA-Seq libraries to the eukaryotic genome, much prokaryotic information has probably been lost.
Year: 2020 PMID: 32442239 PMCID: PMC7249662 DOI: 10.1590/1678-4685-GMB-2019-0215
Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771
Figure 1. Mapping strategies for Dual RNA-Seq analysis. (A) Sequential analysis aligning libraries to the eukaryotic genome first (Eukaryote 1st); (B) Sequential analysis aligning libraries to the prokaryotic genome first (Prokaryote 1st); (C) Combined analysis.
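The three strategies in Figure 1 can be illustrated with a toy model. The sketch below is not the authors' pipeline: it assumes each read has a hypothetical alignment score against each reference (`None` when the read fails the mapping thresholds), and the read names and scores are invented for illustration. Sequential assignment keeps the first reference that yields any hit, whereas combined assignment keeps the best-scoring reference.

```python
from typing import Dict, Optional

# Hypothetical per-read alignment scores against each reference.
# None means the read did not pass the mapping thresholds for that reference.
Scores = Dict[str, Optional[float]]


def sequential(scores: Scores, first: str, second: str) -> str:
    """Assign the read to `first` if it maps there at all, otherwise try `second`."""
    if scores.get(first) is not None:
        return first
    if scores.get(second) is not None:
        return second
    return "unmapped"


def combined(scores: Scores) -> str:
    """Assign the read to the reference with the best score (ties: first listed)."""
    hits = {ref: s for ref, s in scores.items() if s is not None}
    if not hits:
        return "unmapped"
    return max(hits, key=lambda ref: hits[ref])


# Invented example reads; read_2 maps to both references (a cross-mapping read).
reads: Dict[str, Scores] = {
    "read_1": {"eukaryote": 0.99, "prokaryote": None},
    "read_2": {"eukaryote": 0.85, "prokaryote": 0.98},
    "read_3": {"eukaryote": None, "prokaryote": 0.95},
    "read_4": {"eukaryote": None, "prokaryote": None},
}

euk_first = {r: sequential(s, "eukaryote", "prokaryote") for r, s in reads.items()}
prok_first = {r: sequential(s, "prokaryote", "eukaryote") for r, s in reads.items()}
comb = {r: combined(s) for r, s in reads.items()}

for name in reads:
    print(name, euk_first[name], prok_first[name], comb[name])
```

In this toy model only the cross-mapping read changes owner with the order of the references, which is the effect the sequential strategies are prone to. A real aligner working on a concatenated reference may also report multi-mapping reads, which this sketch does not model.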
Library features and number of total reads attributed to the Herbaspirillum seropedicae or Zea mays genomes according to the mapping approach. The analyses were performed with the genomes without annotations, with the mapping parameters of 0.8 of minimum length fraction and 0.8 of minimum similarity fraction. Values for sensitivity, specificity, accuracy, and precision were determined according to Table S1.
| Library | Total reads | Total reads after trimming | Number of reads after library filtration | Cross-mapping | Cross-mapping | Sequential analysis: Eukaryote 1st | Sequential analysis: Eukaryote 1st | Sequential analysis: Prokaryote 1st | Sequential analysis: Prokaryote 1st | Combined analysis | Combined analysis | Combined analysis: unmapped |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 158,053,843 | 92,987,843 | 44,469,308 | - | 13,847,693 | - | - | - | - | 43,661,668 | 779,556 | 28,084 |
| | 24,300,211 | 24,255,170 | 22,200,875 | 7,659 | - | - | - | - | - | 394 | 22,200,465 | 16 |
| | - | - | 66,670,183 | - | - | 30,621,615 | 36,048,568 | 44,476,967 | 22,193,216 | 43,662,066 | 22,980,017 | 28,100 |
| Sensitivity | - | - | - | - | - | 0.6886 | 1.0000 | 1.0000 | 0.9997 | 0.9825 | 1.0000 | - |
| Specificity | - | - | - | - | - | 1.0000 | 0.6886 | 0.9997 | 1.0000 | 1.0000 | 0.9825 | - |
| Accuracy | - | - | - | - | - | 0.7923 | 0.7923 | 0.9999 | 0.9999 | 0.9883 | 0.9883 | - |
| Precision | - | - | - | - | - | 1.0000 | 0.6159 | 0.9998 | 1.0000 | 1.0000 | 0.9661 | - |
Cross-mapping: reads of each individual library were mapped to the reference genome of the other organism.
Sequential analysis: the library was first mapped to one reference genome; reads that failed to map to the first genome were then mapped to the other genome. Eukaryote 1st/Prokaryote 1st indicates the first reference used.
Combined analysis: libraries were mapped to a merged file containing both reference genomes (Combined Reference).
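The sensitivity, specificity, accuracy, and precision values in the table can be checked with a short calculation. Table S1 is not reproduced here, so the sketch below assumes the standard confusion-matrix definitions. It also assumes that every surplus read in the second "Eukaryote 1st" column is a misassigned read from the other library. Under those assumptions the read counts in the table reproduce the reported "Eukaryote 1st" metrics. The function and variable names are illustrative.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard confusion-matrix metrics (assumed definitions; Table S1 not shown)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
    }


# Read counts taken from the table above.
lib_a = 44_469_308       # filtered reads in the first single-organism library
lib_b = 22_200_875       # filtered reads in the second single-organism library
assigned_a = 30_621_615  # chimera reads in the first "Eukaryote 1st" column
assigned_b = 36_048_568  # chimera reads in the second "Eukaryote 1st" column

# Assumption: reads missing from column A were misassigned to column B.
misassigned = lib_a - assigned_a  # 13,847,693, the cross-mapping count

metrics_a = classification_metrics(tp=assigned_a, fp=0, fn=misassigned, tn=lib_b)
metrics_b = classification_metrics(tp=lib_b, fp=misassigned, fn=0, tn=assigned_a)

for label, metrics in (("first column", metrics_a), ("second column", metrics_b)):
    print(label, {name: round(value, 4) for name, value in metrics.items()})
```

Rounded to four decimals this gives 0.6886, 1.0, 0.7923, 1.0 for the first column and 1.0, 0.6886, 0.7923, 0.6159 for the second, matching the "Eukaryote 1st" entries in the table.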
Number of reads mapped to tRNA, rRNA, and coding loci (CDS) according to the mapping methodology used, with the mapping parameters of 0.8 of minimum length fraction and 0.8 of minimum similarity fraction.
| Mapping strategy | Library | Reference used to count the reads | tRNA | rRNA | CDS loci (unique) | CDS loci (multi) | Unmapped reads | Proportion of multi-reads from total |
|---|---|---|---|---|---|---|---|---|
| Direct Mapping | | | 1,423,990 | 31,630,385 | 9,005,409 | 79,429 | 2,330,095 | 0.18% |
| Direct Mapping | | | 1,692 | 3,003 | 20,163,387 | 888,259 | 1,144,534 | 4.00% |
| Eukaryote 1st | Chimera Library | | 1,181,068 | 21,550,448 | 6,052,838 | 31,666 | 1,805,595 | 0.10% |
| Eukaryote 1st | Chimera Library | | 89,216 | 3,254,994 | 24,843,247 | 2,591,042 | 5,270,069 | 7.19% |
| Prokaryote 1st | Chimera Library | | 1,423,992 | 31,631,168 | 9,010,911 | 80,116 | 2,330,780 | 0.18% |
| Prokaryote 1st | Chimera Library | | 1,686 | 2,255 | 20,157,930 | 887,162 | 1,144,183 | 4.00% |
| Combined Analysis | Chimera Library | | 1,419,674 | 31,304,115 | 8,591,366 | 59,339 | 2,287,572 | 0.14% |
| Combined Analysis | Chimera Library | | 1,971 | 79,917 | 20,530,853 | 1,052,003 | 1,315,273 | 4.58% |
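The last column of the table can be recomputed from the other columns. The sketch below assumes that "from total" means the sum of all five read categories in a row (tRNA, rRNA, unique CDS, multi CDS, and unmapped). That interpretation is an assumption, but it reproduces the reported percentages for the Direct Mapping and Eukaryote 1st rows. The function name is illustrative.

```python
def multiread_percentage(trna: int, rrna: int, cds_unique: int,
                         cds_multi: int, unmapped: int) -> float:
    """Percentage of multi-mapped CDS reads over all reads in the row (assumed total)."""
    total = trna + rrna + cds_unique + cds_multi + unmapped
    return 100.0 * cds_multi / total


# Rows copied from the table above: (tRNA, rRNA, CDS unique, CDS multi, unmapped).
rows = {
    "Direct Mapping, row 1": (1_423_990, 31_630_385, 9_005_409, 79_429, 2_330_095),
    "Direct Mapping, row 2": (1_692, 3_003, 20_163_387, 888_259, 1_144_534),
    "Eukaryote 1st, row 1": (1_181_068, 21_550_448, 6_052_838, 31_666, 1_805_595),
    "Eukaryote 1st, row 2": (89_216, 3_254_994, 24_843_247, 2_591_042, 5_270_069),
}

percentages = {label: multiread_percentage(*counts) for label, counts in rows.items()}
for label, value in percentages.items():
    print(f"{label}: {value:.2f}%")
```

The same helper can be applied to the remaining rows to spot transcription problems, since each row total should also equal the number of reads attributed to that reference.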
Comparison of the number of reads incorrectly mapped due to cross-mapping, with the mapping parameters of 0.8 of minimum length fraction and 0.8 of minimum similarity fraction. Reads that incorrectly mapped to the reference genome were counted using the annotated genome indicated in the table. Unmapped reads result from the counting parameters, which discard reads that mapped to more than five loci, and from reads that fall in intergenic regions.
| Library | Reference Used to Map the Reads | Reference Used to Count the Cross-Mapped Reads | tRNA | rRNA | CDS Loci | CDS* | Unmapped reads |
|---|---|---|---|---|---|---|---|
| | | | 242,922 | 10,079,937 | 3,000,334 | 4,553 | 524,500 |
| | | | 87,524 | 3,251,991 | 6,382,643 | 23,216 | 4,125,535 |
| | Combined Reference | | 4,156 | 320,514 | 413,822 | 3,299 | 41,064 |
| | Combined Reference | | 279 | 77,231 | 531,276 | 3,298 | 170,770 |
| | | | 6 | 748 | 6,554 | 72 | 351 |
| | | | 2 | 783 | 6,189 | 65 | 685 |
| | Combined Reference | | 0 | 308 | 57 | 43 | 29 |
| | Combined Reference | | 0 | 329 | 59 | 49 | 6 |
Figure 2. Percentage of reads mapped to (A) Bradyrhizobium elkanii or (B) Glycine max depending on the methodology used in the Glycine max – Bradyrhizobium elkanii experiment. Bars indicate twice the Standard Error. BR16 and ER48: soybean varieties BR16 and Embrapa 48, respectively.
Figure 3. Percentage of reads mapped to (A) Fusarium verticillioides or (B) Zea mays depending on the methodology used in the Zea mays – Fusarium verticillioides experiment. Bars indicate twice the Standard Error.
Number of reads mapped to tRNA, rRNA, and coding loci (CDS) according to the mapping methodology, with the mapping parameters of 0.8 of minimum length fraction and 0.8 of minimum similarity fraction, and experiment used. BR16 and ER48: soybean varieties BR16 and Embrapa 48, respectively. CO354: susceptible maize variety CO354 inoculated with F. verticillioides from Lanubile .
| Samples | Biological Repetition | Mapping Strategy | Reference Used to Count the Reads | tRNA | rRNA | CDS loci (unique) | CDS loci (multi) | Unmapped reads | Proportion of multi-reads from total |
|---|---|---|---|---|---|---|---|---|---|
| BR16 | I | Eukaryote 1st | | 9,262 | 458,414 | 1,386,492 | 5,505,200 | 275,078 | 72.11% |
| BR16 | II | Eukaryote 1st | | 14,526 | 524,794 | 1,553,300 | 2,218,202 | 298,956 | 48.12% |
| BR16 | I | Eukaryote 1st | | 6,140 | 137,368 | 18,423 | 408 | 1,349 | 0.25% |
| BR16 | II | Eukaryote 1st | | 8,431 | 153,677 | 20,804 | 318 | 2,025 | 0.17% |
| ER48 | I | Eukaryote 1st | | 7,486 | 284,116 | 853,250 | 34,198,215 | 161,811 | 96.32% |
| ER48 | II | Eukaryote 1st | | 12,566 | 400,613 | 1,496,084 | 4,115,974 | 263,977 | 65.44% |
| ER48 | I | Eukaryote 1st | | 5,423 | 87,283 | 8,217 | 194 | 1,057 | 0.19% |
| ER48 | II | Eukaryote 1st | | 5,219 | 64,293 | 16,609 | 524 | 1,401 | 0.60% |
| BR16 | I | Prokaryote 1st | | 7,784 | 337,378 | 1,381,270 | 5,493,766 | 271,697 | 73.33% |
| BR16 | II | Prokaryote 1st | | 11,509 | 383,520 | 1,547,288 | 5,502,769 | 293,615 | 71.11% |
| BR16 | I | Prokaryote 1st | | 8,597 | 262,024 | 30,995 | 919 | 3,704 | 0.30% |
| BR16 | II | Prokaryote 1st | | 13,152 | 301,072 | 34,724 | 1,183 | 6,201 | 0.33% |
| ER48 | I | Prokaryote 1st | | 5,803 | 209,395 | 848,055 | 3,406,326 | 158,528 | 73.60% |
| ER48 | II | Prokaryote 1st | | 10,145 | 321,792 | 1,485,462 | 4,091,851 | 258,616 | 66.34% |
| ER48 | I | Prokaryote 1st | | 8,443 | 165,541 | 20,871 | 703 | 4,387 | 0.35% |
| ER48 | II | Prokaryote 1st | | 9,329 | 147,323 | 43,393 | 2,016 | 7,363 | 0.96% |
| BR16 | I | Combined Analysis | | 8,571 | 420,015 | 1,384,623 | 5,504,514 | 273,168 | 72.51% |
| BR16 | II | Combined Analysis | | 13,227 | 477,056 | 1,550,710 | 5,517,511 | 295,728 | 70.25% |
| BR16 | I | Combined Analysis | | 7,097 | 172,756 | 18,760 | 493 | 1,595 | 0.25% |
| BR16 | II | Combined Analysis | | 10,299 | 200,206 | 21,413 | 429 | 2,236 | 0.18% |
| ER48 | I | Combined Analysis | | 6,617 | 252,998 | 850,854 | 3,418,620 | 159,801 | 72.91% |
| ER48 | II | Combined Analysis | | 11,597 | 378,032 | 1,493,400 | 4,114,632 | 260,854 | 65.74% |
| ER48 | I | Combined Analysis | | 6,907 | 118,911 | 8,806 | 251 | 1,166 | 0.18% |
| ER48 | II | Combined Analysis | | 6,872 | 89,370 | 17,570 | 631 | 1,613 | 0.54% |
| CO354 | I | Maize 1st | | 257 | 46,053 | 63,605,211 | 5,020,302 | 1,973,035 | 7.11% |
| CO354 | II | Maize 1st | | 280 | 43,537 | 64,060,395 | 5,074,624 | 2,015,545 | 7.13% |
| CO354 | III | Maize 1st | | 279 | 36,277 | 55,320,528 | 4,407,818 | 1,690,684 | 7.17% |
| CO354 | I | Maize 1st | | 47 | 509 | 2,620,154 | 218,485 | 150,957 | 7.31% |
| CO354 | II | Maize 1st | | 23 | 293 | 1,424,365 | 119,595 | 83,640 | 7.35% |
| CO354 | III | Maize 1st | | 53 | 526 | 3,381,869 | 282,827 | 196,770 | 7.32% |
| CO354 | I | | | 257 | 35,577 | 63,061,802 | 4,983,688 | 1,678,420 | 7.14% |
| CO354 | II | | | 280 | 35,190 | 63,559,272 | 5,042,606 | 1,713,630 | 7.17% |
| CO354 | III | | | 279 | 26,101 | 54,799,352 | 4,372,544 | 1,446,308 | 7.21% |
| CO354 | I | | | 48 | 583 | 3,139,676 | 276,241 | 458,718 | 7.13% |
| CO354 | II | | | 23 | 369 | 1,903,321 | 170,812 | 396,794 | 6.91% |
| CO354 | III | | | 53 | 601 | 3,879,192 | 339,538 | 453,663 | 7.27% |
| CO354 | I | Combined Analysis | | 257 | 38,312 | 63,519,081 | 5,007,715 | 1,960,084 | 7.10% |
| CO354 | II | Combined Analysis | | 280 | 37,882 | 63,994,094 | 5,066,322 | 2,006,113 | 7.13% |
| CO354 | III | Combined Analysis | | 279 | 28,336 | 55,232,851 | 4,393,751 | 1,677,822 | 7.16% |
| CO354 | I | Combined Analysis | | 48 | 516 | 2,629,064 | 221,907 | 166,912 | 7.35% |
| CO354 | II | Combined Analysis | | 23 | 291 | 1,417,789 | 121,161 | 94,107 | 7.42% |
| CO354 | III | Combined Analysis | | 53 | 533 | 3,403,439 | 287,282 | 213,560 | 7.36% |