João P Marques, Graciela Sotelo, Juan Galindo, Pragya Chaube, Diana Costa, Sandra Afonso, Marina Panova, Katja Nowick, Roger Butlin, Johan Hollander, Rui Faria.
Abstract
The flat periwinkles, Littorina fabalis and L. obtusata, are two sister gastropod species with enormous potential to elucidate the mechanisms involved in ecological speciation in the marine realm. However, the molecular resources currently available for these species are still scarce. To circumvent this limitation, we used RNA-seq data to characterize the transcriptomes of four individuals of each species, sampled at different locations across the Iberian Peninsula. Four de novo transcriptome assemblies were generated, as well as a pseudo-reference using the L. saxatilis reference transcriptome as a backbone. After transcript annotation, variant calling identified 19,072 to 45,340 putatively species-diagnostic SNPs. The discriminatory power of a subset of these SNPs was validated with an independent genotyping assay on reference populations, which accurately classified individuals into each species and identified hybrids between the two. These data constitute valuable genomic resources for a wide range of evolutionary and conservation studies in flat periwinkles and related taxa.
Year: 2020 PMID: 32127542 PMCID: PMC7054417 DOI: 10.1038/s41597-020-0408-8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
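The abstract does not spell out the criterion for a "putatively species-diagnostic" SNP. One common reading is a fixed difference: every sampled individual of one species is homozygous for one allele, and every individual of the other species is homozygous for the other. The Python sketch below applies that criterion to genotypes coded as alternate-allele counts (0, 1, 2). The coding, the function names and the example calls are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, List, Sequence, Tuple


def is_diagnostic(fab: Sequence[int], obt: Sequence[int]) -> bool:
    """True if all L. fabalis share one homozygous genotype and all
    L. obtusata share the opposite homozygous genotype.

    Genotypes are alternate-allele counts per diploid individual:
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    """
    fab_set, obt_set = set(fab), set(obt)
    # Each species must be monomorphic within the sample ...
    if len(fab_set) != 1 or len(obt_set) != 1:
        return False
    # ... and the two species must be fixed for different alleles.
    return fab_set | obt_set == {0, 2}


def diagnostic_snps(calls: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> List[str]:
    """Return the IDs of SNPs whose (fabalis, obtusata) genotypes are diagnostic."""
    return [snp_id for snp_id, (fab, obt) in calls.items() if is_diagnostic(fab, obt)]


# Hypothetical genotype calls for four individuals per species.
calls = {
    "tx1:100": ([0, 0, 0, 0], [2, 2, 2, 2]),  # fixed difference: diagnostic
    "tx2:57": ([0, 1, 0, 0], [2, 2, 2, 2]),   # heterozygote in L. fabalis: not diagnostic
    "tx3:12": ([2, 2, 2, 2], [2, 2, 2, 2]),   # same allele in both species: not diagnostic
}
print(diagnostic_snps(calls))  # ['tx1:100']
```

With only four individuals per species, a fixed difference in the sample does not guarantee fixation in the species, which is why such SNPs are described as putatively diagnostic and were validated on reference populations.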
Fig. 1. Flat periwinkles dwelling on Fucus spp. in the rocky intertidal of Galicia, Spain.
Fig. 2. Flowchart of our pipeline, giving an overview of the two main approaches used to analyse the RNA-seq data. Processes common to both pipelines are shown in orange.
Sampling information: location and coordinates, collection date, sample size (N) and species of the samples used for transcriptome sequencing (locations 1 to 8) and for SNP genotyping (locations 9 to 11).
| Location | Collection date | N | Species | Latitude | Longitude |
|---|---|---|---|---|---|
| 1. Cangas | 26.11.2014 | 1 | L. obtusata | 42°15′21″N | 8°47′16″W |
| 2. Baiona | 21.11.2014 | 1 | L. obtusata | 42°07′28″N | 8°50′52″W |
| 3. Rio de Moinhos | 20.11.2014 | 1 | L. obtusata | 41°34′00″N | 8°47′50″W |
| 4. Mindelo | 09.12.2014 | 1 | L. obtusata | 41°18′36″N | 8°44′33″W |
| 5. Abelleira | 22.11.2014 | 1 | L. fabalis | 42°47′53″N | 9°01′30″W |
| 6. Muros | 22.11.2014 | 1 | L. fabalis | 42°44′34″N | 8°58′53″W |
| 7. Tirán | 26.11.2014 | 1 | L. fabalis | 42°15′49″N | 8°45′16″W |
| 8. Samil | 27.11.2014 | 1 | L. fabalis | 42°13′22″N | 8°46′25″W |
| 9. Redondela | 03.07.2015 | 12 | L. obtusata | 42°17′15″N | 8°37′22″W |
| 10. Canido | 18.10.2012 | 12 | L. fabalis | 42°11′32″N | 8°48′19″W |
| 11. Cabo do Mundo | 16.11.2012; 10.09.2014; 19.03.2015 | 12 | Admixed* | 41°13′33″N | 8°43′03″W |
*Previously analysed by Carvalho et al.[7] and Costa et al.[8].
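Fig. 3 plots these sites on a map. Mapping libraries generally expect signed decimal degrees rather than the degrees-minutes-seconds notation used in the table, so a small conversion helper is sketched below. The regular expression and function name are illustrative and are not part of the authors' workflow. The pattern assumes the prime (′) and double-prime (″) symbols but also tolerates straight or curly quote look-alikes.

```python
import re

# Degrees, minutes, seconds and hemisphere, e.g. 42°15′21″N.
# Prime symbols (′ ″) and plain/curly quote look-alikes are all accepted.
DMS = re.compile(r"(\d+)°(\d+)[′'](\d+)[″“”\"]([NSEW])")


def dms_to_decimal(coord: str) -> float:
    """Convert a degrees-minutes-seconds coordinate to signed decimal degrees."""
    m = DMS.fullmatch(coord.strip())
    if m is None:
        raise ValueError(f"unrecognised coordinate: {coord!r}")
    degrees, minutes, seconds, hemisphere = m.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    # By convention, south latitudes and west longitudes are negative.
    return -value if hemisphere in "SW" else value


# Location 1 (Cangas): latitude, longitude in decimal degrees.
print(dms_to_decimal("42°15′21″N"), dms_to_decimal("8°47′16″W"))
```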
Fig. 3. Map of the sampling locations. Samples were collected in northwestern Iberia (a); a zoom-in of the sampling area is shown in (b). L. obtusata sampling sites are shown in blue, L. fabalis sampling sites in orange, and the site where hybrids were previously described is marked with both colours. Circles represent sampling sites for transcriptome sequencing, and squares represent sampling sites for SNP genotyping/validation.
Information on the samples sequenced for the transcriptome and deposited in the NCBI database, including sample ID, species and population of origin, tissue used for RNA extraction, number of raw reads obtained and NCBI BioSample ID.
| Sample (NCBI BioSample ID) | Library ID | Species (population) | Tissue | Raw reads |
|---|---|---|---|---|
| fSAM (SAMN12385853) | G03R12 | L. fabalis (Samil) | Whole body* | 119,228,416 |
| fABE (SAMN12385854) | G04R13 | L. fabalis (Abelleira) | Whole body* | 79,516,812 |
| fTIR (SAMN12385855) | G10R14 | L. fabalis (Tirán) | Whole body* | 98,014,382 |
| fMUR (SAMN12385856) | G11R15 | L. fabalis (Muros) | Whole body* | 93,801,878 |
| oMOI (SAMN12385849) | G01R02 | L. obtusata (Rio de Moinhos) | Whole body* | 61,668,390 |
| oCAN (SAMN12385850) | G07R05 | L. obtusata (Cangas) | Whole body* | 69,638,066 |
| oBAI (SAMN12385851) | G08R06 | L. obtusata (Baiona) | Whole body* | 82,217,966 |
| oMIN (SAMN12385852) | G09R07 | L. obtusata (Mindelo) | Whole body* | 132,446,602 |
*Includes all soft tissues except hepatopancreas.
Number of cleaned reads per sample, and number (and percentage) of those reads mapped to each reference.
| Sample (NCBI BioSample ID) | Cleaned reads | Pseudo-reference | (%) | Obt4ind | (%) | Fab4ind | (%) | Obt_oMIN_G09R07 | (%) | Fab_fSAM_G03R12 | (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| fSAM (SAMN12385853) | 114,099,676 | 34,647,272 | 30 | 29,904,860 | 26 | 31,621,272 | 28 | 25,236,448 | 22 | 35,806,196 | 31 |
| fABE (SAMN12385854) | 76,361,344 | 25,830,522 | 34 | 18,191,808 | 24 | 20,130,386 | 26 | 16,197,512 | 21 | 23,067,304 | 30 |
| fTIR (SAMN12385855) | 94,063,264 | 33,652,190 | 36 | 22,255,784 | 24 | 25,529,056 | 27 | 20,562,192 | 22 | 29,081,596 | 31 |
| fMUR (SAMN12385856) | 89,686,822 | 36,978,396 | 41 | 25,514,228 | 28 | 30,045,284 | 34 | 23,484,314 | 26 | 34,513,156 | 38 |
| oMOI (SAMN12385849) | 59,319,362 | 22,235,674 | 37 | 15,294,270 | 26 | 17,299,842 | 29 | 13,750,790 | 23 | 18,962,054 | 32 |
| oCAN (SAMN12385850) | 66,962,082 | 20,748,762 | 31 | 15,014,942 | 22 | 17,287,946 | 26 | 12,765,034 | 19 | 17,849,418 | 27 |
| oBAI (SAMN12385851) | 78,796,744 | 26,999,034 | 34 | 18,335,338 | 23 | 22,215,720 | 28 | 16,190,684 | 21 | 23,677,716 | 30 |
| oMIN (SAMN12385852) | 125,795,370 | 52,877,348 | 42 | 34,999,924 | 28 | 42,037,964 | 33 | 33,964,752 | 27 | 47,661,904 | 38 |
| Total | 705,084,664 | 253,969,198 | | 179,511,154 | | 164,129,506 | | 162,151,726 | | 230,619,344 | |
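The (%) columns are consistent with each sample's mapped reads divided by its cleaned reads, rounded to an integer. Under that assumption, the Python sketch below recomputes the pseudo-reference column from the counts in the table, along with the pooled mapping rate across all eight samples. The dictionary and function names are illustrative.

```python
# Cleaned reads and reads mapped to the pseudo-reference, per sample
# (values copied from the mapping table above).
CLEANED = {
    "fSAM": 114_099_676, "fABE": 76_361_344, "fTIR": 94_063_264, "fMUR": 89_686_822,
    "oMOI": 59_319_362, "oCAN": 66_962_082, "oBAI": 78_796_744, "oMIN": 125_795_370,
}
MAPPED_PSEUDO = {
    "fSAM": 34_647_272, "fABE": 25_830_522, "fTIR": 33_652_190, "fMUR": 36_978_396,
    "oMOI": 22_235_674, "oCAN": 20_748_762, "oBAI": 26_999_034, "oMIN": 52_877_348,
}


def mapping_rate(mapped: int, cleaned: int) -> float:
    """Percentage of a sample's cleaned reads that mapped to a reference."""
    return 100.0 * mapped / cleaned


# Integer percentages, as reported in the (%) column.
per_sample = {s: round(mapping_rate(MAPPED_PSEUDO[s], CLEANED[s])) for s in CLEANED}
# Pooled rate: total mapped reads over total cleaned reads.
overall = mapping_rate(sum(MAPPED_PSEUDO.values()), sum(CLEANED.values()))
print(per_sample)
print(f"overall: {overall:.1f}%")
```

The same function applies unchanged to the other reference columns.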
Summary statistics for the transcriptome pseudo-reference assembly.
| Periwinkles Pseudo-reference | Value |
|---|---|
| Raw reads | 736,532,512 |
| Cleaned reads | 705,084,664 |
| Number of contigs | 37,873 |
| Largest (bp) | 18,165 |
| Smallest (bp) | 297 |
| N50 (bp) | 2,963 |
| Mean (bp) | 2,139 |
| Swiss-Prot annotated transcripts (%) | 29 |
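N50 and mean contig length are the main contiguity statistics in this table and in the per-assembly table that follows. The Python sketch below uses the standard definition of N50: sort contigs from longest to shortest, and report the length of the contig at which the running total first reaches half of all assembled bases. Assembly QC tools can differ in edge cases, and the tool the authors used is not named here, so this helper is not guaranteed to reproduce their exact values. The function names are illustrative.

```python
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding total/2
            return length
    raise AssertionError("unreachable: running sum always reaches total")


def assembly_summary(lengths: Sequence[int]) -> Dict[str, int]:
    """Contig statistics in the same order as the summary tables."""
    return {
        "contigs": len(lengths),
        "largest": max(lengths),
        "smallest": min(lengths),
        "N50": n50(lengths),
        "mean": round(sum(lengths) / len(lengths)),
    }


print(assembly_summary([2, 3, 4, 5, 6, 7, 8, 9, 10]))
# {'contigs': 9, 'largest': 10, 'smallest': 2, 'N50': 8, 'mean': 6}
```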
Summary statistics of the Littorina obtusata and L. fabalis transcriptome assemblies.
| Statistic | L. fabalis fSAM_G03R12 | L. obtusata oMIN_G09R07 | L. fabalis 4ind# | L. obtusata 4ind# |
|---|---|---|---|---|
| Raw reads | 119,228,416 | 132,446,602 | 390,561,488 | 345,971,024 |
| Clean reads | 114,099,676 | 125,795,370 | 374,211,106 | 330,873,558 |
| Number of contigs | 177,022 | 208,362 | 396,047 | 349,459 |
| Largest (bp) | 15,009 | 16,581 | 17,410 | 29,198 |
| Smallest (bp) | 201 | 201 | 201 | 201 |
| N50 (bp) | 715 | 1,159 | 832 | 1,133 |
| Mean (bp) | 566 | 743 | 612 | 721 |
| Number of contigs | 141,456 | 102,412 | 325,814 | 186,239 |
| Largest (bp) | 15,009 | 11,334 | 17,410 | 29,198 |
| Smallest (bp) | 201 | 201 | 201 | 201 |
| N50 (bp) | 696 | 773 | 676 | 668 |
| Mean (bp) | 558 | 607 | 547 | 546 |
| Number of contigs | 133,230 | 101,042 | 225,829 | 180,798 |
| Largest (bp) | 15,009 | 11,334 | 17,410 | 29,198 |
| Smallest (bp) | 201 | 201 | 201 | 201 |
| N50 (bp) | 660 | 775 | 799 | 665 |
| Mean (bp) | 540 | 607 | 612 | 544 |
| Number of contigs | 31,279 | 24,047 | 53,214 | 32,433 |
| Largest (bp) | 11,904 | 11,232 | 13,317 | 28,629 |
| Smallest (bp) | 255 | 258 | 255 | 258 |
| N50 (bp) | 801 | 912 | 1,080 | 861 |
| Mean (bp) | 685 | 740 | 818 | 707 |
| | 17 | 25 | 27 | 34 |
*Due to the complexity of the L. fabalis assembly based on multiple individuals, the order of the curation steps was reversed.
#Based on the data from the four individuals from each species described in Table 2.
Fig. 4. Principal component analysis (PCA) of the eight flat periwinkle samples sequenced for the transcriptome, based on a total of 7,061 SNPs randomly sampled from independent transcripts.
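Fig. 4 summarises genotype variation with PCA. Below is a minimal NumPy sketch, assuming genotypes are coded as 0/1/2 alternate-allele counts in an individuals × SNPs matrix with no missing data. Each SNP column is centred on its mean, and the matrix is decomposed by singular value decomposition. Many population-genetic PCA tools also scale each SNP by an allele-frequency-based factor. This sketch does not, and it is not necessarily the procedure behind Fig. 4. The function name and toy data are hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 genotype codes.

    Returns the principal-component scores of each individual and the
    fraction of total variance explained by each retained component.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # centre every SNP (column) on its mean
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical toy data: four individuals fixed for one allele and four
# for the other at four SNPs, plus one SNP that is invariant across all eight.
G = np.array([[0, 0, 0, 1, 0]] * 4 + [[2, 2, 2, 1, 2]] * 4)
scores, explained = genotype_pca(G)
print(np.round(scores[:, 0], 3), np.round(explained, 3))
```

In the toy example, all the variance lies on PC1, and the two groups fall on opposite sides of zero. The sign of a principal component is arbitrary.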
Fig. 5. Admixture plot showing the membership of each individual sequenced for the transcriptome in the two genetic clusters. Sample codes are the same as in Table 1. No signatures of admixture were found in these individuals. The analysis was based on 7,061 random SNPs from different transcripts.
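Figs. 4 and 5 both use SNPs sampled from different transcripts. SNPs within a single transcript are physically linked and therefore not independent, so keeping one SNP per transcript is a simple way to reduce that dependence before PCA or admixture analysis. The Python sketch below does this by random choice within each transcript. The (transcript ID, position) layout, the optional subsampling to a fixed number `n`, and the function name are assumptions, not the authors' documented procedure.

```python
import random
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Snp = Tuple[str, int]  # (transcript ID, position within the transcript)


def one_snp_per_transcript(snps: Iterable[Snp], n: Optional[int] = None,
                           seed: Optional[int] = None) -> List[Snp]:
    """Keep one randomly chosen SNP per transcript, optionally subsampling to n SNPs."""
    rng = random.Random(seed)
    by_transcript = defaultdict(list)
    for snp in snps:
        by_transcript[snp[0]].append(snp)
    # Iterating over sorted transcript IDs makes the result reproducible for a given seed.
    picked = [rng.choice(by_transcript[tx]) for tx in sorted(by_transcript)]
    if n is not None:
        picked = rng.sample(picked, n)  # raises ValueError if n > number of transcripts
    return picked


snps = [("t1", 10), ("t1", 50), ("t2", 5), ("t3", 7), ("t3", 9)]
print(one_snp_per_transcript(snps, seed=1))
```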
Fig. 6. Structure plot showing the membership of each individual genotyped for a subset of putatively diagnostic SNPs in each genetic cluster. Each column represents one individual. Signatures of admixture were found only in Cabo do Mundo, whereas individuals from Redondela (L. obtusata) and Canido (L. fabalis) form two distinct genetic clusters.
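Fig. 6 assigns individuals to clusters with a STRUCTURE-style model. For diagnostic markers, a simpler and widely used alternative is a hybrid index: the fraction of an individual's alleles that come from one parental species. The Python sketch below implements that alternative technique, not the STRUCTURE model. It assumes each genotype is recorded as the number of L. fabalis-type alleles (0, 1 or 2) at a diagnostic SNP, with None for a missing call. The 0.1 and 0.9 classification thresholds are arbitrary, illustrative assumptions.

```python
from typing import Optional, Sequence


def hybrid_index(fab_allele_counts: Sequence[Optional[int]]) -> float:
    """Fraction of L. fabalis-type alleles across diagnostic SNPs.

    Each entry is the number of L. fabalis-type alleles (0, 1 or 2) that an
    individual carries at one diagnostic SNP, or None if not genotyped.
    """
    called = [g for g in fab_allele_counts if g is not None]
    if not called:
        raise ValueError("no genotyped diagnostic SNPs")
    return sum(called) / (2 * len(called))


def classify(h: float, lower: float = 0.1, upper: float = 0.9) -> str:
    """Label an individual from its hybrid index (thresholds are illustrative only)."""
    if h <= lower:
        return "L. obtusata"
    if h >= upper:
        return "L. fabalis"
    return "hybrid"


print(classify(hybrid_index([1, 1, 0, 2, None])))  # hybrid
```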
| Measurement(s) | RNA • sequence_assembly • sequence feature annotation • SNP |
|---|---|
| Technology Type(s) | RNA sequencing • sequence assembly process • sequence annotation • single-nucleotide polymorphism analysis |
| Factor Type(s) | geographic location • species |
| Sample Characteristic - Organism | Littorina fabalis • Littorina obtusata |
| Sample Characteristic - Environment | marine biome |
| Sample Characteristic - Location | Iberian Peninsula |