Umberto Rosani, Stefania Domeneghetti, Alberto Pallavicini, Paola Venier.
Abstract
Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing the whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich selected genomic regions. The enrichment of DNA target sequences was validated by real-time PCR, whereas the efficacy of the applied strategy was evaluated by mapping the 454-output reads against reference transcript data already available for M. galloprovincialis and by measuring coverage, SNPs, number of de novo sequenced introns, and complete gene sequences. Focusing on a target size of nearly 1.5 Mbp, we obtained a target coverage which allowed the identification of more than 250 complete introns, 10,741 SNPs, and also complete gene sequences. This study confirms the transcriptome-based enrichment of gDNA regions as a good strategy to expand knowledge on specific subsets of genes also in nonmodel organisms.
Year: 2014 PMID: 25101286 PMCID: PMC4101229 DOI: 10.1155/2014/538549
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Genome size of sequenced mollusk species.
| Species name | Length (Gbp) | C-value (pg) |
|---|---|---|
| | 0.91 | 1.00 |
| | 0.43 | 0.35 |
| | / | 1.20 |
| | 1.8/2.0 | 0.74 |
| | 1.42 | 1.40 |
C-values (pg) are summarized according to http://www.genomesize.com.
Sequencing output data and summary of de novo assembling results.
| Sequencing output | RUN_1 | RUN_2 | RUN_1 + 2 |
|---|---|---|---|
| Total reads | 472,122 | 473,409 | |
| Total high quality reads | 287,362 | 339,407 | |
| Average length (bp) | 380 | 114 | |
| On-target reads (number and %) | 179,201 (62%) | 175,432 (52%) | |
| Covered targets (number and %) | 1,262 (83%) | 1,032 (68%) | |
Total raw and HQ reads, average length (bp), number of mapped reads (on-target), and covered contigs are reported for the subsets (RUN_1, RUN_2) and total sequenced data (RUN_1 + 2).
| De novo assembly | On-target reads | Off-target reads | All dataset |
|---|---|---|---|
| Total contigs | 5,547 | 12,423 | |
| Total assembled reads (number and %) | 279,922 (79%) | 347,439 (45%) | |
| Average contig length (bp) | 490 | 476 | |
| N50 (bp) | 557 | 523 | |
| N75 (bp) | 388 | 402 | |
| Longest contig (bp) | 2,234 | 3,538 | |
| Contigs with blast annotation | 28% | 21% | |
De novo assembly of the reads on-target and off-target and of the whole dataset. Number of resulting contigs and related reads, contig length, quality parameters, longest contig, and percentage of annotated contigs are reported.
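The N50 and N75 values reported for the assemblies follow the usual definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches 50% (or 75%) of the total assembled length. A minimal sketch of that calculation is given below; the function name `n_stat` and the example contig lengths are hypothetical and are not taken from the study.

```python
def n_stat(contig_lengths, percent=50):
    """Return the Nx statistic (N50 by default) for a list of contig lengths.

    Contigs are sorted from longest to shortest; the result is the length of
    the contig at which the running total first reaches `percent` of the
    total assembled length. Integer arithmetic avoids rounding issues.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [2, 3, 4, 5, 6]
n50 = n_stat(example_lengths, 50)
n75 = n_stat(example_lengths, 75)
```

With this definition N75 can never exceed N50 for the same set of contigs, which is a quick sanity check when reading assembly summaries.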
Figure 1: (a) Target coverage distribution. Number of targets per coverage class. (b) Length percentage of the genomic contigs covered by RNA-seq reads.
Enrichment fold and coverage of selected transcripts.
| ID | Status | Enrichment fold (RT_PCR) | Sequenced reads (NGS) |
|---|---|---|---|
| MGC04518 | On-target | 32 | 22 |
| MGC00300 | On-target | 60 | 45 |
| MCG05878 | On-target | 2 | 7 |
| Target 4 | Off-target | −867 | / |
Real-time PCR enrichment analysis was performed on three targets and on one transcript not selected for capture (Target 4).
Enrichment fold was measured by qRT-PCR, comparing the DNA library before and after enrichment; the number of sequenced reads mapping uniquely to each target is reported alongside.
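The source does not state the formula used to derive the enrichment fold from the real-time PCR measurements. One common convention, assumed here purely for illustration, converts the difference in threshold cycle (Ct) between the pre-enrichment and post-enrichment library into a fold change, and reports depletion as a negative fold (as the off-target value in the table suggests). The function name, the Ct values, and the default amplification efficiency of 2 are all assumptions.

```python
def enrichment_fold(ct_before, ct_after, efficiency=2.0):
    """Signed fold change between pre- and post-enrichment libraries.

    Assumes the template roughly multiplies by `efficiency` per PCR cycle,
    so a lower Ct after enrichment means more template. Ratios below 1
    (depletion) are reported as negative folds, e.g. 1/32 becomes -32.
    """
    ratio = efficiency ** (ct_before - ct_after)
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical Ct values, not taken from the study.
enriched = enrichment_fold(ct_before=25, ct_after=20)   # target detected 5 cycles earlier
depleted = enrichment_fold(ct_before=20, ct_after=25)   # target detected 5 cycles later
```

Under these assumptions a target detected five cycles earlier after capture is enriched 32-fold, and one detected five cycles later is depleted 32-fold.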
Figure 2: (a) Genomic contigs resulting from de novo assembling and evaluated for confirmatory Sanger sequencing. Exon and intron regions are depicted in green and blue, respectively. Pink arrows represent the primer positions. (b) Predicted mytilin C gene: (A) gene structure with exons (green), introns (red), and primer positions (pink); (B) mapping of genomic reads with related coverage graph; (C) mapping of transcriptomic reads with related coverage graph; (D) variant positions detected on genomic sequences; (E) variant positions detected on transcriptomic sequences.
Fully sequenced introns.
| Parameter | Value |
|---|---|
| Total contigs with introns | 204 |
| Total introns | 263 |
| Total intron length (bp) | 110,643 |
| Average intron length (bp) | 434 |
| Maximum intron length (bp) | 1,008 |
| Minimum intron length (bp) | 100 |
Figure 3: Partial LITAF gene structure. Alignment of the three targeted LITAF transcripts, translated into amino acids. Conservation graph shows the high conservation of the 3′ region, located entirely in one exon. The 5′ region displays a lower conservation level.
SNP identification in genomic and transcriptomic data of M. galloprovincialis.
| Dataset | Total SNPs | Total contigs with SNPs | SNP frequency (%) | SNPs in exons (%) |
|---|---|---|---|---|
| Genome | 10,741 | 2,326 | 0.71 | 0.58 |
| Transcriptome | 13,821 | 2,057 | 0.96 | 0.87 |
| Common | 1,135 | 447 | 0.31 | / |
Figure 4: SNP frequency in mussel exons. Predicted frequency of SNPs in exons using transcriptomic data (T, in blue), genomic data (G, in red), and the common SNP dataset (C, in green).
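The source does not spell out how the SNP frequency or the "common" SNP set was computed. The sketch below shows one plausible reading, under two stated assumptions: frequency is the number of SNPs per 100 examined bases, and a SNP counts as common when the same contig, position, and variant allele are called in both the genomic and the transcriptomic data. All names and example calls are hypothetical.

```python
def snp_frequency_percent(n_snps, n_bases):
    """SNPs per 100 examined bases (assumed definition of 'SNP frequency (%)')."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return 100.0 * n_snps / n_bases


def common_snps(genomic_calls, transcriptomic_calls):
    """SNPs called in both datasets, matched on (contig, position, allele)."""
    return set(genomic_calls) & set(transcriptomic_calls)


# Hypothetical variant calls: (contig ID, 1-based position, variant allele).
genomic_calls = {
    ("contig_1", 120, "A"),
    ("contig_1", 340, "T"),
    ("contig_2", 15, "G"),
}
transcriptomic_calls = {
    ("contig_1", 120, "A"),   # same site, same allele: counted as common
    ("contig_2", 15, "C"),    # same site, different allele: not common
    ("contig_3", 77, "T"),
}

shared = common_snps(genomic_calls, transcriptomic_calls)
frequency = snp_frequency_percent(n_snps=5, n_bases=1000)
```

Matching on the allele as well as the position is the stricter choice; matching on position alone would count more SNPs as common.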
Overview of Mytilus AMP data.
| AMP name | ID NCBI | Reads | Sequence length (NCBI) (bp) | Sequence extension (bp) | SNPs (genomic) | SNPs (transcriptomic) | Common SNPs |
|---|---|---|---|---|---|---|---|
| Mytilin B | AF177540 | 271 | 3,125 | 0 | 17 | 9 | 3 |
| Mytilin C | / | 130 | / | 1,834 | 7 | 23 | 4 |
| Mytilin D | EU810204 | 53 | / | 1,165 | 8 | 5 | 3 |
| Myticin A | / | 95 | / | 1,650 | 87 | 54 | 17 |
| Myticin B | EU088427 | 72 | 2,775 | 0 | 20 | 30 | 14 |
| Myticin C | EU927419 | 163 | 1,409 | 466 | 61 | 78 | 26 |
Selection of mussel AMPs listed by name, NCBI ID (if present), number of aligned reads, length of publicly available sequences (bp), sequence extension (bp), and number of genomic, transcriptomic, and common SNPs.