Daniel Blande¹, Pauliina Halimaa¹, Arja I Tervahauta¹, Mark G M Aarts², Sirpa O Kärenlampi¹
Abstract
Noccaea caerulescens of the Brassicaceae family has become the key model plant among the metal hyperaccumulator plants. Populations/accessions of N. caerulescens from geographic locations with different soil metal concentrations differ in their ability to hyperaccumulate and hypertolerate metals. Comparing transcriptomes across several accessions provides candidates for detailed exploration of the mechanisms of metal accumulation and tolerance and of local adaptation. This can have implications for the development of plants for phytoremediation and improved mineral nutrition. Transcriptomes from root and shoot tissues of four N. caerulescens accessions with contrasting Zn, Cd and Ni hyperaccumulation and tolerance traits were sequenced with the Illumina HiSeq 2000. The transcriptomes were assembled de novo with Trinity and annotated, and protein sequences were predicted. Comparison against the BUSCO plant early-release dataset indicated high-quality assemblies. The predicted protein sequences have been clustered into ortholog groups with closely related species. The data serve as important reference sequences for whole-transcriptome studies, for analyses of genetic differences between the accessions and other species, and for primer design.
Year: 2017; PMID: 28140388; PMCID: PMC5283065; DOI: 10.1038/sdata.2016.131
Source DB: PubMed; Journal: Sci Data; ISSN: 2052-4463; Impact factor: 6.444
Figure 1. Overview of data processing.
Raw reads (1) were assembled with the Trinity assembler (2) at two k-mer values, 25 and 32. Assembly quality was assessed with BUSCO and TransRate (3), using external sequence and protein data together with the initial raw reads. A final assembly was then chosen for each accession (4). For the MP accession, reads were also subsampled to the same read depth with seqtk (5), and assemblies were produced at both read depths. Predicted protein sequences were obtained with TransDecoder (6). BLAST searches of the protein and transcript sequences were run against the UniProt and UniRef databases (7), and the results were combined into an annotation with Trinotate (8). Protein sequences were also clustered into orthogroups with OrthoFinder (9), together with protein sequences from other plant species, and a multiple alignment was produced for each orthogroup with MUSCLE (10). Key: yellow, input data; blue, processing steps; orange, intermediate data/files produced during the process; green, data from public databases; red, final output data.
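Trinity, like other de Bruijn graph assemblers, breaks reads into overlapping k-mers, so the two k values above change how much overlap two reads need before they can be joined. The Python sketch below illustrates only that decomposition step on hypothetical toy reads; it is not Trinity's implementation, which adds greedy k-mer extension, graph partitioning and path resolution on top.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k; empty when seq is shorter than k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Toy k-mer graph: nodes are k-mers, and an edge links two k-mers that
    occur next to each other in a read (so they overlap by k - 1 bases)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    return dict(graph)


read = "ACGT" * 25  # hypothetical 100-nt read
for k in (25, 32):
    # A read of length L yields L - k + 1 k-mers: fewer, longer seeds at k = 32.
    print(k, len(kmers(read, k)))

# Two hypothetical reads sharing the 4-mer GTAC end up connected in the graph.
print(de_bruijn(["ACGTAC", "GTACGG"], k=4))
```

In general, a smaller k lets reads with shorter overlaps connect (helpful for weakly covered transcripts), while a larger k resolves repeats shorter than k; assembling at both values allows the two trade-offs to be compared.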
Number of raw reads per accession.
| Accession | Raw reads |
|---|---|
| Ganges (GA) | 104,697,851 |
| La Calamine (LC) | 103,109,619 |
| Lellingen (LE) | 105,026,919 |
| Monte Prinzera (MP) | 219,339,925 |
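MP has roughly twice as many raw reads as the other accessions, which is why step 5 in Figure 1 subsamples it. The Python sketch below computes a subsampling fraction and draws a reproducible subset of read pairs. Two assumptions: the target depth is taken as the mean of GA, LC and LE (the text says only "the same read depth"), and reservoir sampling (Algorithm R) is shown as a generic technique, not as seqtk's code. With seqtk itself, the usual practice is to run `seqtk sample` with the same `-s` seed on both mate files.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")

# Raw read counts from the table above.
RAW_READS = {
    "GA": 104_697_851,
    "LC": 103_109_619,
    "LE": 105_026_919,
    "MP": 219_339_925,
}


def reservoir_sample(records: Iterable[T], k: int, seed: int) -> list[T]:
    """Draw k records uniformly in a single pass (Algorithm R).

    A fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Inclusive bounds: record i is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Assumption: "the same read depth" is taken here as the mean of GA, LC and LE.
others = [n for acc, n in RAW_READS.items() if acc != "MP"]
target = round(sum(others) / len(others))
fraction = target / RAW_READS["MP"]
print(f"target depth {target:,}; keep {fraction:.1%} of MP reads")

# Hypothetical paired-end data: sampling (mate1, mate2) tuples keeps mates together.
mate1 = [f"read{i}/1" for i in range(1_000)]
mate2 = [f"read{i}/2" for i in range(1_000)]
subset = reservoir_sample(zip(mate1, mate2), k=10, seed=100)
print(subset[:3])
```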
Assembly quality metrics. A subset of the assembly quality metrics calculated by TransRate using the trimmed read sequences and
| Metric | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No of transcripts | 73,139 | 40,440 | 23,725 | 65,998 | 37,718 | 71,508 | 41,307 | 108,623 | 48,400 | 74,623 | 46,505 |
| % fragments mapped | 92 | 94 | 82 | 92 | 95 | 92 | 95 | 91 | 86 | 90 | 93 |
| % good mappings | 82 | 84 | 71 | 83 | 84 | 83 | 84 | 79 | 74 | 79 | 80 |
| % bases uncovered | 24 | 0 | 6 | 25 | 0 | 23 | 0 | 15 | 0 | 15 | 0 |
| Comparative metrics | |||||||||||
| % contigs with CRBB | 48 | 51 | 76 | 51 | 55 | 46 | 50 | 27 | 44 | 39 | 44 |
| % refs with CRBB | 60 | 58 | 49 | 60 | 59 | 60 | 59 | 60 | 59 | 59 | 59 |
| Reference coverage | 60 | 59 | 37 | 60 | 59 | 60 | 60 | 61 | 59 | 60 | 59 |
| TransRate assembly score | 0.2343 | 0.4564 | 0.3438 | 0.2367 | 0.4666 | 0.2587 | 0.4607 | 0.2746 | 0.3795 | 0.2755 | 0.4183 |
| % good contigs | 66 | 80 | 81 | 70 | 77 | 71 | 81 | 75 | 76 | 68 | 75 |
| % complete | 92 | 90 | 62 | 93 | 91 | 93 | 90 | 93 | 91 | 93 | 90 |
| % duplicated | 47 | 21 | 18 | 45 | 20 | 44 | 21 | 41 | 23 | 39 | 23 |
| % fragmented | 1.5 | 1.4 | 16 | 1.6 | 2.0 | 1.4 | 2.4 | 1.1 | 1.7 | 0.9 | 2.3 |
| % missing | 5.5 | 7.7 | 20 | 4.6 | 6.7 | 5.2 | 7.2 | 5.7 | 6.9 | 5.3 | 7.1 |
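The comparative rows use conditional reciprocal best BLAST (CRBB) hits against a reference protein set. CRBB starts from plain reciprocal best hits and additionally accepts some non-reciprocal hits whose e-value is good enough for their alignment length. The Python sketch below implements only the plain reciprocal-best-hit step, on hypothetical pre-parsed hits ranked by bit score, to show how "% contigs with CRBB" and "% refs with CRBB" are derived; it is not TransRate's code.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    bitscore: float


def best_hits(hits: Iterable[Hit]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (ties keep the first seen)."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> set[tuple[str, str]]:
    """(contig, protein) pairs that are each other's best hit in both search directions."""
    contig_best = best_hits(forward)   # contig -> reference protein
    protein_best = best_hits(reverse)  # reference protein -> contig
    return {(c, p) for c, p in contig_best.items() if protein_best.get(p) == c}


# Hypothetical toy search results: contigs c1-c3 against reference proteins p1-p2.
forward = [Hit("c1", "p1", 200), Hit("c1", "p2", 50), Hit("c2", "p1", 150), Hit("c3", "p2", 90)]
reverse = [Hit("p1", "c1", 210), Hit("p1", "c2", 160), Hit("p2", "c3", 95)]

pairs = reciprocal_best_hits(forward, reverse)
contigs, proteins = {"c1", "c2", "c3"}, {"p1", "p2"}
pct_contigs = 100 * len({c for c, _ in pairs}) / len(contigs)
pct_refs = 100 * len({p for _, p in pairs}) / len(proteins)
print(sorted(pairs), round(pct_contigs, 1), round(pct_refs, 1))
```

Here c2 hits p1 best, but p1's best hit is c1, so c2 is not counted; CRBB's extra conditional step exists to rescue some cases like this.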
BUSCO quality metrics after assembly filtering.
| Metric | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| No of transcripts | 40,440 | 28,885 | 37,718 | 28,655 | 41,307 | 28,745 | 46,505 | 28,599 |
| % complete | 90 | 90 | 91 | 90 | 90 | 90 | 90 | 90 |
| % duplicated | 21 | 20 | 20 | 20 | 21 | 20 | 23 | 22 |
| % fragmented | 1.4 | 1.4 | 2 | 2.3 | 2.4 | 2.4 | 2.3 | 2.1 |
| % missing | 7.7 | 7.9 | 6.7 | 6.9 | 7.2 | 7.4 | 7.1 | 7.4 |
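BUSCO counts duplicated genes within "complete", so single-copy completeness is the difference of those two rows, and complete + fragmented + missing should come to about 100%. The short Python check below re-derives these values and the fraction of transcripts kept by filtering. It assumes the eight columns are (before, after) filtering pairs, which the unlabeled table suggests (e.g. 40,440 then 28,885) but does not state.

```python
# Values transcribed from the table above, in column order.
# Assumption: columns form (before, after) filtering pairs; the table does not label them.
TRANSCRIPTS = [40_440, 28_885, 37_718, 28_655, 41_307, 28_745, 46_505, 28_599]
COMPLETE = [90, 90, 91, 90, 90, 90, 90, 90]
DUPLICATED = [21, 20, 20, 20, 21, 20, 23, 22]
FRAGMENTED = [1.4, 1.4, 2, 2.3, 2.4, 2.4, 2.3, 2.1]
MISSING = [7.7, 7.9, 6.7, 6.9, 7.2, 7.4, 7.1, 7.4]

# "Complete" includes duplicated BUSCOs, so single-copy = complete - duplicated.
single_copy = [c - d for c, d in zip(COMPLETE, DUPLICATED)]

# Complete + fragmented + missing should be ~100 %, up to rounding in the table.
totals = [c + f + m for c, f, m in zip(COMPLETE, FRAGMENTED, MISSING)]

# Fraction of transcripts kept within each assumed (before, after) pair.
retained = [after / before for before, after in zip(TRANSCRIPTS[::2], TRANSCRIPTS[1::2])]

print("single-copy %:", single_copy)
print("C+F+M:", [round(t, 1) for t in totals])
print("retained:", [f"{r:.0%}" for r in retained])
```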
Description of samples that have been submitted to the NCBI Sequence Read Archive.
| # | Sample group | SRA run | BioSample | Sample name |
|---|---|---|---|---|
| 1 | GA Root | SRR3742999 | SAMN05335705 | GA3KR |
| 2 | GA Root | SRR3743000 | SAMN05335706 | GA4KR |
| 3 | GA Root | SRR3743011 | SAMN05335707 | GA6KR |
| 4 | GA Shoot | SRR3743016 | SAMN05335708 | GA3KS |
| 5 | GA Shoot | SRR3743017 | SAMN05335709 | GA4KS |
| 6 | GA Shoot | SRR3743018 | SAMN05335710 | GA6KS |
| 7 | LC Root | SRR3743019 | SAMN05335711 | LC3KR |
| 8 | LC Root | SRR3743020 | SAMN05335712 | LC4KR |
| 9 | LC Root | SRR3743021 | SAMN05335713 | LC6KR |
| 10 | LC Shoot | SRR3743022 | SAMN05335714 | LC3KS |
| 11 | LC Shoot | SRR3743001 | SAMN05335715 | LC4KS |
| 12 | LC Shoot | SRR3743002 | SAMN05335716 | LC6KS |
| 13 | LE Root | SRR3743003 | SAMN05335717 | LE3KR |
| 14 | LE Root | SRR3743004 | SAMN05335718 | LE4KR |
| 15 | LE Root | SRR3743005 | SAMN05335719 | LE6KR |
| 16 | LE Shoot | SRR3743006 | SAMN05335720 | LE3KS |
| 17 | LE Shoot | SRR3743007 | SAMN05335721 | LE4KS |
| 18 | LE Shoot | SRR3743008 | SAMN05335722 | LE6KS |
| 19 | MP Root | SRR3743009 | SAMN05335723 | MP3KR |
| 20 | MP Root | SRR3743010 | SAMN05335724 | MP4KR |
| 21 | MP Root | SRR3743012 | SAMN05335725 | MP6KR |
| 22 | MP Shoot | SRR3743013 | SAMN05335726 | MP3KS |
| 23 | MP Shoot | SRR3743014 | SAMN05335727 | MP4KS |
| 24 | MP Shoot | SRR3743015 | SAMN05335728 | MP6KS |
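The sample names encode accession and tissue, which makes it straightforward to rebuild a sample sheet for re-analysis. A minimal Python sketch, assuming the naming convention visible in the table: a two-letter accession code, a middle label whose meaning the table does not explain (kept as an opaque string), and a final R for root or S for shoot. `SAMPLES` and `parse_sample` are illustrative names.

```python
import re
from collections import defaultdict

# (SRA run, sample name) pairs transcribed from the table above.
SAMPLES = [
    ("SRR3742999", "GA3KR"), ("SRR3743000", "GA4KR"), ("SRR3743011", "GA6KR"),
    ("SRR3743016", "GA3KS"), ("SRR3743017", "GA4KS"), ("SRR3743018", "GA6KS"),
    ("SRR3743019", "LC3KR"), ("SRR3743020", "LC4KR"), ("SRR3743021", "LC6KR"),
    ("SRR3743022", "LC3KS"), ("SRR3743001", "LC4KS"), ("SRR3743002", "LC6KS"),
    ("SRR3743003", "LE3KR"), ("SRR3743004", "LE4KR"), ("SRR3743005", "LE6KR"),
    ("SRR3743006", "LE3KS"), ("SRR3743007", "LE4KS"), ("SRR3743008", "LE6KS"),
    ("SRR3743009", "MP3KR"), ("SRR3743010", "MP4KR"), ("SRR3743012", "MP6KR"),
    ("SRR3743013", "MP3KS"), ("SRR3743014", "MP4KS"), ("SRR3743015", "MP6KS"),
]

# Assumed convention: accession code, opaque label, then R (root) or S (shoot).
NAME_RE = re.compile(r"(GA|LC|LE|MP)(\w+)([RS])")
TISSUE = {"R": "root", "S": "shoot"}


def parse_sample(name: str) -> tuple[str, str, str]:
    """Split a sample name into (accession, label, tissue)."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name}")
    accession, label, tissue_code = match.groups()
    return accession, label, TISSUE[tissue_code]


# Group SRA runs by (accession, tissue).
groups: dict[tuple[str, str], list[str]] = defaultdict(list)
for run, name in SAMPLES:
    accession, _label, tissue = parse_sample(name)
    groups[(accession, tissue)].append(run)

for (accession, tissue), runs in sorted(groups.items()):
    print(f"{accession} {tissue}: {', '.join(runs)}")
```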
Accession numbers of the sequences submitted to the NCBI Transcriptome Shotgun Assembly (TSA) Sequence Database.
| Assembly | Samples | SRA runs | TSA accession |
|---|---|---|---|
| GA assembly | 1–6 | SRR3742999, SRR3743000, SRR3743011, SRR3743016, SRR3743017, SRR3743018 | GEVI00000000 |
| LC assembly | 7–12 | SRR3743019, SRR3743020, SRR3743021, SRR3743022, SRR3743001, SRR3743002 | GEVK00000000 |
| LE assembly | 13–18 | SRR3743003, SRR3743004, SRR3743005, SRR3743006, SRR3743007, SRR3743008 | GEVL00000000 |
| MP assembly | 19–24 | SRR3743009, SRR3743010, SRR3743012, SRR3743013, SRR3743014, SRR3743015 | GEVM00000000 |
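To re-use the raw reads behind one assembly, the runs listed above can be fetched with NCBI's SRA Toolkit. The Python sketch below only builds the command lines (`prefetch`, then `fasterq-dump`); it assumes the toolkit is installed and on `PATH`, and the output layout (`reads/<TSA accession>`) and the `fetch_commands` helper are hypothetical choices.

```python
import shlex

# Runs per TSA assembly, transcribed from the table above.
ASSEMBLY_RUNS = {
    "GEVI00000000": ["SRR3742999", "SRR3743000", "SRR3743011", "SRR3743016", "SRR3743017", "SRR3743018"],
    "GEVK00000000": ["SRR3743019", "SRR3743020", "SRR3743021", "SRR3743022", "SRR3743001", "SRR3743002"],
    "GEVL00000000": ["SRR3743003", "SRR3743004", "SRR3743005", "SRR3743006", "SRR3743007", "SRR3743008"],
    "GEVM00000000": ["SRR3743009", "SRR3743010", "SRR3743012", "SRR3743013", "SRR3743014", "SRR3743015"],
}


def fetch_commands(tsa: str, outdir: str = "reads") -> list[list[str]]:
    """Build (but do not run) SRA Toolkit commands for every run behind one assembly."""
    commands = []
    for run in ASSEMBLY_RUNS[tsa]:
        commands.append(["prefetch", run])
        commands.append(["fasterq-dump", run, "--outdir", f"{outdir}/{tsa}"])
    return commands


for cmd in fetch_commands("GEVM00000000"):
    print(shlex.join(cmd))
```

Each command list can be executed with `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes the list easy to inspect first.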
Comparison of assembled sequences with sequences available in GenBank.
Pairwise amino acid identity and the lengths of the longest and shortest sequences are reported.
| Protein | Assembled sequence | GenBank sequence(s) | No. of sequences | Pairwise identity (%) | Longest (aa) | Shortest (aa) |
|---|---|---|---|---|---|---|
| nicotianamine synthase | GA_TR9812_c0_g1_i1_m.31802 | gi\|27528464\|emb\|CAC82913.1\| | 2 | 99.7 | 322 | 321 |
| ZIP-like zinc transporter ZNT1 | GA_TR13622\|c0_g1_i1\|m.43014 | gi\|1003366144\|gb\|AMO45683.1\| | 2 | 99.3 | 408 | 408 |
| YSL transporter 2 | GA_TR17962_c0_g1_i1_m.57647 | gi\|82468793\|gb\|ABB76762.1\|; gi\|86559333\|gb\|ABD04074.1\| | 3 | 99.86; 99.86 | 716 | 716 |
| YSL transporter 3 | GA_TR18642_c0_g1_i1_m.60069 | gi\|82468795\|gb\|ABB76763.1\|; gi\|86559335\|gb\|ABD04075.1\| | 3 | 99.7; 99.85 | 672 | 672 |
| YSL transporter 1 | GA_TR19192_c0_g1_i1_m.61490 | gi\|82468791\|gb\|ABB76761.1\|; gi\|86559337\|gb\|ABD04076.1\| | 3 | 100; 100 | 693 | 693 |
| heavy metal ATPase 4 | GA_TR19259_c0_g1_i1_m.62343 | gi\|391225627\|gb\|AFM38012.1\|; gi\|391225629\|gb\|AFM38013.1\|; gi\|391225631\|gb\|AFM38014.1\| | 4 | 83.92; 83.89; 81.81 | 1194 | 1090 |
| heavy metal transporter | GA_TR20593_c0_g1_i1_m.68485 | gi\|66394766\|gb\|AAY46197.1\| | 2 | 100 | 387 | 387 |
| hypothetical protein | GA_TR21001_c0_g1_i1_m.69807 | gi\|91680661\|emb\|CAI77926.2\| | 2 | 86.8 | 352 | 349 |
| putative Fe(II) transporter IRT1 | GA_TR21885_c0_g1_i1_m.72011 | gi\|16304676\|emb\|CAC86382.1\| | 2 | 90.1 | 346 | 312 |
| ZIP-like zinc transporter ZNT1 (ATZIP4 homolog) | LC_TR1212_c10_g1_i1_m.3330 | gi\|14582255\|gb\|AAK69429.1\|AF275751_1; gi\|1003366140\|gb\|AMO45681.1\| | 3 | 100; 99.51 | 408 | 408 |
| metal transporter NRAMP3 | LC_TR1754_c0_g1_i1_m.5997 | gi\|149688670\|gb\|ABR27746.1\| | 2 | 99.2 | 512 | 512 |
| heavy metal ATPase 4 | LC_TR10517_c0_g1_i1_m.37057 | gi\|391225623\|gb\|AFM38010.1\|; gi\|391225625\|gb\|AFM38011.1\| | 3 | 98.9; 99.7 | 1187 | 1186 |
| ZIP-like zinc transporter ZNT2 (ATZIP4 homolog) | LC_TR11232_c0_g1_i1_m.39479 | gi\|14582257\|gb\|AAK69430.1\|AF275752_1 | 2 | 100 | 422 | 422 |
| nicotianamine synthase 4 | LC_TR12807\|c0_g1_i1\|m.44700 | gi\|333733184\|gb\|AEF97346.1\| | 2 | 100 | 322 | 322 |
| chloroplast carbonic anhydrase precursor | LC_TR15339_c0_g1_i1_m.51902 | gi\|45451864\|gb\|AAS65454.1\| | 2 | 99.1 | 336 | 333 |
| metal transporter NRAMP4 | LC_TR15506_c0_g1_i1_m.53093 | gi\|149688672\|gb\|ABR27747.1\| | 2 | 99.6 | 511 | 497 |
| zinc transporter ZTP1 (ATMTP1 homolog) | LC_TR19215_c0_g1_i1_m.64186 | gi\|14582253\|gb\|AAK69428.1\|AF275750_1 | 2 | 99.7 | 396 | 396 |
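The identity column compares each assembled protein with one or more GenBank sequences. The source does not say which tool or identity definition was used, so the Python sketch below shows one common convention as an assumption: identical residues divided by aligned columns in which neither sequence has a gap. Other tools divide by the full alignment length or by the shorter sequence, which gives lower values when gaps are present. The example sequences are hypothetical.

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, counted over columns where
    neither sequence has a gap (one of several common conventions)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def ungapped_length(aligned: str, gap: str = "-") -> int:
    """Length of the original sequence, i.e. the aligned string without gaps."""
    return len(aligned.replace(gap, ""))


# Hypothetical aligned protein fragments (not taken from the table).
assembled = "MKT-AVL"
reference = "MKSLAVL"
lengths = [ungapped_length(s) for s in (assembled, reference)]
print(round(pairwise_identity(assembled, reference), 2), max(lengths), min(lengths))
```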