Jingping Fang, Andrew Michael Wood, Youqiang Chen, Jingjing Yue, Ray Ming.
Abstract
BACKGROUND: The safety of genetically transformed plants remains a subject of scrutiny. Genomic variants in PRSV-resistant transgenic papaya will provide evidence to rationally address such concerns.
Keywords: Carica papaya L.; Genomic variation; Nuclear mitochondria DNA (NUMT); Nuclear plastid DNA (NUPT); Whole-genome resequencing
Year: 2020 PMID: 32532215 PMCID: PMC7291442 DOI: 10.1186/s12864-020-06804-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Papaya Sunset genome-wide sequencing and mapping statistics
| Category | Metric | Value |
|---|---|---|
| Sunset genome-wide | Total read count | 74,169,662 |
| | Read length (bp) | 124 |
| | Total read length (Gb) | 9.197 |
| | Average coverage (×) | 24.72 |
| After removing multi-mapped and duplicate reads | Total read count | 48,170,821 |
| | Mapped read count | 48,154,999 |
| | Mapped read rate (%) | 99.97 |
| | Unmapped read count | 15,822 |
| | Properly paired read count | 46,139,627 |
| | Properly paired read rate (%) | 95.78 |
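The derived figures in this table follow directly from the raw counts. The Python sketch below recomputes them; the function names are illustrative and do not come from the paper.

```python
def total_bases(read_count: int, read_length: int) -> int:
    """Total sequenced bases = number of reads x read length."""
    return read_count * read_length


def rate_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Values copied from the table above.
raw_reads, read_len = 74_169_662, 124
filtered_reads = 48_170_821
mapped_reads = 48_154_999
properly_paired = 46_139_627

print(total_bases(raw_reads, read_len) / 1e9)         # total read length in Gb
print(filtered_reads - mapped_reads)                  # unmapped read count
print(rate_percent(mapped_reads, filtered_reads))     # mapped read rate (%)
print(rate_percent(properly_paired, filtered_reads))  # properly paired rate (%)
```

Both rates use the read count after removal of multi-mapped and duplicate reads as the denominator, which is what reproduces the tabulated percentages.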
Number of homo/hetero SNPs and InDels detected before and after data filtering
| Variant type | Raw | DP10-100Q50 |
|---|---|---|
| Homo SNPs | 83,926 | 40,871 |
| Hetero SNPs | 603,970 | 269,493 |
| Total SNPs | 687,896 | 310,364 |
| Homo InDels | 41,218 | 19,135 |
| Hetero InDels | 29,504 | 14,936 |
| Total InDels | 70,722 | 34,071 |
| Total variants | 758,618 | 344,435 |
Notes: (a): Validated depth and quality. DP10-100Q50: variant calls with a read depth of < 10 or > 100, and polymorphic sites with a quality of < 50, were filtered out
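The DP10-100Q50 rule in the note can be written as a simple predicate. The sketch below is a minimal illustration, not the authors' pipeline: the `Variant` record, the function name and the toy calls are all hypothetical, and it assumes that a depth of exactly 10 or 100 and a quality of exactly 50 are retained.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    depth: int    # read depth at the site (DP)
    qual: float   # variant quality score (QUAL)


def passes_dp10_100_q50(v: Variant, min_dp: int = 10, max_dp: int = 100,
                        min_qual: float = 50.0) -> bool:
    """Keep a call only if min_dp <= depth <= max_dp and qual >= min_qual."""
    return min_dp <= v.depth <= max_dp and v.qual >= min_qual


# Toy calls, invented for illustration only.
calls = [
    Variant("CHROM_1", 1_200, depth=8, qual=90.0),    # depth too low
    Variant("CHROM_1", 3_400, depth=35, qual=72.5),   # kept
    Variant("CHROM_2", 9_870, depth=140, qual=99.0),  # depth too high
    Variant("CHROM_6", 5_500, depth=50, qual=30.0),   # quality too low
]
kept = [v for v in calls if passes_dp10_100_q50(v)]
print(kept)
```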
Summary of polymorphisms between SunUp and Sunset
| Chrom. | Total size (bp) | No. of SNPs | No. of InDels | SNPs per 1 kb | InDels per 1 kb |
|---|---|---|---|---|---|
| CHROM_1 | 22,976,894 | 16,246 | 2214 | 0.71 | 0.10 |
| CHROM_2 | 28,675,255 | 6842 | 1893 | 0.24 | 0.07 |
| CHROM_3 | 29,397,938 | 18,294 | 2630 | 0.62 | 0.09 |
| CHROM_4 | 27,056,416 | 12,813 | 2426 | 0.47 | 0.09 |
| CHROM_5 | 24,352,217 | 13,952 | 2150 | 0.57 | 0.09 |
| CHROM_6 | 30,516,430 | 50,463 | 3821 | 1.65 | 0.13 |
| CHROM_7 | 22,375,162 | 17,294 | 2361 | 0.77 | 0.11 |
| CHROM_8 | 21,952,264 | 12,610 | 2001 | 0.57 | 0.09 |
| CHROM_9 | 27,303,179 | 12,021 | 1986 | 0.44 | 0.07 |
| Unanchored scaffolds | 135,176,073 | 149,829 | 12,589 | 1.11 | 0.09 |
| Genome-wide | 369,781,828 | 310,364 | 34,071 | 0.84 | 0.09 |
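The density columns and the genome-wide totals can be recomputed from the per-chromosome rows. The sketch below does so in Python; the names `ROWS` and `per_kb` are illustrative.

```python
# (size in bp, SNP count, InDel count) per row, copied from the table above.
ROWS = {
    "CHROM_1": (22_976_894, 16_246, 2_214),
    "CHROM_2": (28_675_255, 6_842, 1_893),
    "CHROM_3": (29_397_938, 18_294, 2_630),
    "CHROM_4": (27_056_416, 12_813, 2_426),
    "CHROM_5": (24_352_217, 13_952, 2_150),
    "CHROM_6": (30_516_430, 50_463, 3_821),
    "CHROM_7": (22_375_162, 17_294, 2_361),
    "CHROM_8": (21_952_264, 12_610, 2_001),
    "CHROM_9": (27_303_179, 12_021, 1_986),
    "Unanchored scaffolds": (135_176_073, 149_829, 12_589),
}


def per_kb(count: int, size_bp: int) -> float:
    """Variants per 1 kb of sequence, rounded to two decimals."""
    return round(count / (size_bp / 1000), 2)


genome_size = sum(size for size, _, _ in ROWS.values())
total_snps = sum(snps for _, snps, _ in ROWS.values())
total_indels = sum(indels for _, _, indels in ROWS.values())

for name, (size, snps, indels) in ROWS.items():
    print(name, per_kb(snps, size), per_kb(indels, size))
print("Genome-wide", total_snps, total_indels,
      per_kb(total_snps, genome_size), per_kb(total_indels, genome_size))
```

The summed size equals the genome-wide size given in the table, and the summed counts reproduce the genome-wide densities of 0.84 SNPs and 0.09 InDels per kb.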
Pattern of homozygous and heterozygous SNPs
| SNP pattern | Substitution | Homo SNPs | Hetero SNPs | Total SNPs |
|---|---|---|---|---|
| Transition | A/G | 5315 | 45,067 | 50,382 |
| | T/C | 5768 | 44,871 | 50,639 |
| | G/A | 4701 | 47,543 | 52,244 |
| | C/T | 4908 | 47,160 | 52,068 |
| Transversion | A/C | 2329 | 12,114 | 14,443 |
| | A/T | 2327 | 11,999 | 14,326 |
| | T/A | 2310 | 12,199 | 14,509 |
| | T/G | 2274 | 12,193 | 14,467 |
| | G/C | 2509 | 6589 | 9098 |
| | G/T | 3020 | 11,576 | 14,596 |
| | C/A | 3104 | 11,522 | 14,626 |
| | C/G | 2306 | 6660 | 8966 |
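A substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The sketch below applies that rule to the "Total SNPs" column to obtain the transition/transversion ratio. The helper names are illustrative, and reading each pattern as a pair of bases is an assumption; the classification does not depend on which base is the reference allele.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


# "Total SNPs" column of the table above, keyed by substitution pattern.
TOTALS = {
    "A/G": 50_382, "T/C": 50_639, "G/A": 52_244, "C/T": 52_068,
    "A/C": 14_443, "A/T": 14_326, "T/A": 14_509, "T/G": 14_467,
    "G/C": 9_098, "G/T": 14_596, "C/A": 14_626, "C/G": 8_966,
}

ts = sum(n for pat, n in TOTALS.items() if is_transition(*pat.split("/")))
tv = sum(n for pat, n in TOTALS.items() if not is_transition(*pat.split("/")))
print(ts, tv, ts / tv)
```

With these counts, transitions outnumber transversions roughly two to one.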
Fig. 1 Histogram of InDel number and length in the Sunset genome compared to the SunUp reference genome
Fig. 2 Annotation of single-nucleotide polymorphisms (SNPs) and InDels in the Sunset genome compared to the SunUp reference genome. a. Distribution of SNPs and InDels in intergenic, upstream and downstream regions. b. Distribution of SNPs in different genic regions. c. Distribution of InDels in genic regions. The number of synonymous and non-synonymous SNPs detected within the CDS region is also shown
Prediction of the effects of SNPs and InDels
| Impact (count, percentage in Sunset) | Effect type | Count | Percentage (%) |
|---|---|---|---|
| HIGH (1591, 0.27%) | frameshift_variant | 1033 | 0.1762 |
| | frameshift_variant+splice_region_variant | 66 | 0.0113 |
| | frameshift_variant+start_lost | 12 | 0.0020 |
| | frameshift_variant+stop_gained | 9 | 0.0015 |
| | frameshift_variant+stop_gained+splice_region_variant | 1 | 0.0002 |
| | frameshift_variant+stop_lost | 1 | 0.0002 |
| | frameshift_variant+stop_lost+splice_region_variant | 15 | 0.0026 |
| | splice_acceptor_variant+intron_variant | 75 | 0.0128 |
| | splice_acceptor_variant+splice_region_variant+intron_variant | 2 | 0.0003 |
| | splice_donor_variant+intron_variant | 87 | 0.0148 |
| | splice_donor_variant+splice_region_variant+intron_variant | 1 | 0.0002 |
| | start_lost | 24 | 0.0041 |
| | start_lost+splice_region_variant | 1 | 0.0002 |
| | stop_gained | 185 | 0.0316 |
| | stop_gained+disruptive_inframe_insertion | 1 | 0.0002 |
| | stop_gained+splice_region_variant | 6 | 0.0010 |
| | stop_lost | 23 | 0.0039 |
| | stop_lost+inframe_insertion+splice_region_variant | 1 | 0.0002 |
| | stop_lost+splice_region_variant | 48 | 0.0082 |
| MODERATE (7533, 1.28%) | missense_variant+splice_region_variant | 130 | 0.0222 |
| | disruptive_inframe_deletion | 3 | 0.0005 |
| | disruptive_inframe_insertion | 7 | 0.0012 |
| | inframe_deletion | 17 | 0.0029 |
| | inframe_insertion | 22 | 0.0038 |
| | missense_variant | 7354 | 1.2544 |
| LOW (6114, 1.04%) | initiator_codon_variant | 9 | 0.0015 |
| | splice_region_variant+intron_variant | 833 | 0.1421 |
| | splice_region_variant+stop_retained_variant | 13 | 0.0022 |
| | splice_region_variant+synonymous_variant | 100 | 0.0171 |
| | stop_retained_variant | 4 | 0.0007 |
| | synonymous_variant | 5155 | 0.8793 |
| MODIFIER (571,039, 97.40%) | downstream_gene_variant | 128,197 | 21.8663 |
| | intergenic_region | 278,076 | 47.4308 |
| | intron_variant | 36,054 | 6.1497 |
| | upstream_gene_variant | 128,712 | 21.9541 |
Notes: Variants (SNPs and InDels) that may affect protein function were categorized into 35 types. These types were further grouped into HIGH, MODERATE, LOW, and MODIFIER according to potential severity. The assignment criteria were pre-defined in the annotation program (SnpEff)
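Combined annotations such as `frameshift_variant+stop_gained` can be assigned to a severity group by taking the most severe of their component terms. The sketch below illustrates that rule. The term-to-impact mapping reflects my reading of SnpEff's predefined impact categories and should be treated as an assumption; the function name is hypothetical and this is not SnpEff's own code.

```python
# Higher rank = more severe.
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}

# Assumed impact of each single annotation term appearing in the table.
TERM_IMPACT = {
    "frameshift_variant": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "splice_donor_variant": "HIGH",
    "start_lost": "HIGH",
    "stop_gained": "HIGH",
    "stop_lost": "HIGH",
    "missense_variant": "MODERATE",
    "disruptive_inframe_deletion": "MODERATE",
    "disruptive_inframe_insertion": "MODERATE",
    "inframe_deletion": "MODERATE",
    "inframe_insertion": "MODERATE",
    "initiator_codon_variant": "LOW",
    "splice_region_variant": "LOW",
    "stop_retained_variant": "LOW",
    "synonymous_variant": "LOW",
    "downstream_gene_variant": "MODIFIER",
    "intergenic_region": "MODIFIER",
    "intron_variant": "MODIFIER",
    "upstream_gene_variant": "MODIFIER",
}


def impact_of(effect: str) -> str:
    """Most severe impact among the '+'-joined terms of an annotation."""
    impacts = (TERM_IMPACT[term] for term in effect.split("+"))
    return max(impacts, key=IMPACT_RANK.__getitem__)


print(impact_of("frameshift_variant+stop_gained+splice_region_variant"))
print(impact_of("splice_region_variant+intron_variant"))
```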
Fig. 3 Pipeline for identifying SunUp-specific genomic integrations of nuclear organelle DNA fragments. a. Quality control of raw sequencing data. b. Search for SunUp nuclear organelle junction sites by BLASTN [25]. BLASTN was used to search the SunUp genome for nuclear plastid DNA (NUPT) and nuclear mitochondria DNA (NUMT) integrations, with the papaya organelle genomes as databases. Only hits with ≥30 bp mapped to organelle genomes were considered. c. Alignment of Sunset reads to the SunUp reference genome. Unmapped reads were removed from subsequent analysis. d. Nuclear organelle junction sites shared by SunUp and Sunset. A junction site was considered shared by the SunUp and Sunset genomes when reads mapped to and spanned its position in the SunUp reference genome. e. Extraction of reliable shared junction sites. Reads that align back to the reference genome may originate from different sources of DNA in the Sunset genome: nuclear DNA (nuDNA), nuclear organelle DNA (norgDNA) and organelle DNA (orgDNA). To discriminate these three categories of reads and extract the reliable junction sites shared by SunUp and Sunset, the flanking regions (5 bp upstream and downstream) of the junction sites were used as an indicator. Reads were selected as reliable norgDNA reads if they spanned the junction site and mapped to at least 5 bp of norgDNA or nuDNA. f. SunUp-specific junction sites. If no reads mapped to a junction site, or no reliable norgDNA reads spanned it, the site was considered a SunUp-specific norgDNA junction site
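Steps d to f of this caption amount to a per-junction decision rule. The sketch below shows one possible reading of it in Python. It assumes that a read counts as reliable only when it covers at least 5 bp on both sides of the junction, and that reads are given as half-open `(start, end)` intervals in reference coordinates; the function names and toy coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Read = Tuple[int, int]  # (start, end) of the aligned read, end exclusive


def spans_junction(read: Read, junction: int, min_flank: int = 5) -> bool:
    """True if the read covers >= min_flank bp on each side of the junction."""
    start, end = read
    return junction - start >= min_flank and end - junction >= min_flank


def classify_junction(junction: int, reads: Iterable[Read],
                      min_flank: int = 5) -> str:
    """'shared' if any reliable spanning read exists, else 'SunUp-specific'."""
    if any(spans_junction(r, junction, min_flank) for r in reads):
        return "shared"
    return "SunUp-specific"


# Toy example: the first read overlaps the junction by only 3 bp on the right.
print(classify_junction(1_000, [(900, 1_003)]))
print(classify_junction(1_000, [(900, 1_003), (990, 1_010)]))
```

A junction with no mapped reads at all falls through to "SunUp-specific", matching panel f.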
Fig. 4 Pipeline for identifying Sunset-specific genomic integrations of nuclear organelle DNA fragments. a. Alignment of Sunset reads to the organelle reference genome. Unmapped reads were removed from subsequent analysis. Soft-clipped reads, that is, reads with mismatches at their extremities, are shown in the red box. b. Extraction of reads with at least 5 bp of mismatches at the extremities. c. De novo assembly of norgDNA by SOAPdenovo. d. Extraction of reliable Sunset norgContigs. Only BLAST hits of norg contigs with ≥30 bp mapped to organelle genomes and ≥5 bp unmatched at the edges were considered reliable norgContigs. e. Sunset-specific junction sites. Norg sequences were considered Sunset-specific when a BLAST search against the SunUp reference genome returned no hits. f. Identity between the six organelle-like borders of transgenic insertions in SunUp and Sunset norgDNA
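In SAM alignments, soft-clipped bases are recorded as `S` operations at the ends of the CIGAR string, so step b can be approximated by inspecting each read's CIGAR. The sketch below is a minimal pure-Python illustration under that assumption. The function names are hypothetical and the authors' actual extraction tool is not specified here.

```python
import re
from typing import Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def terminal_soft_clips(cigar: str) -> Tuple[int, int]:
    """Return (left, right) soft-clip lengths of a CIGAR string."""
    # Hard clips (H) may sit outside soft clips; ignore them.
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def has_clipped_end(cigar: str, min_clip: int = 5) -> bool:
    """True if either extremity is soft-clipped by at least min_clip bases."""
    return max(terminal_soft_clips(cigar)) >= min_clip


for cigar in ["124M", "3S121M", "12S112M", "100M24S"]:
    print(cigar, terminal_soft_clips(cigar), has_clipped_end(cigar))
```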
Junction site numbers and identities of NUPT and NUMT
| Junction site type | NUPT count | NUPT percentage | NUPT identity (nupt/pt) (a) | NUMT count | NUMT percentage | NUMT identity (numt/mt) (a) |
|---|---|---|---|---|---|---|
| SunUp (total) | 3430 | 100.00% | | 2764 | 100.00% | |
| Shared | 3327 | 97.00% | 91.92% | 2642 | 95.59% | 92.97% |
| Specific in SunUp | 103 | 3.00% | 94.03% | 122 | 4.41% | 93.77% |
| Sunset (total) | 3346 | 100.00% | | 2745 | 100.00% | |
| Shared | 3327 | 99.43% | 91.92% | 2642 | 95.50% | 92.97% |
| Specific in Sunset | 19 | 0.57% | 95.64% | 103 | 4.50% | 96.95% |
Notes: (a): the identity between nupt/numt and corresponding organelle genome. Chloroplast (pt); mitochondria (mt)
The chromosome information for organelle DNA integration sites
| Chromosome | SunUp-specific NUPT sites | Percentage of NUPT | SunUp-specific NUMT sites | Percentage of NUMT |
|---|---|---|---|---|
| CHROM_1 | 3 | 2.91% | 7 | 5.74% |
| CHROM_2 | 12 | 11.65% | 12 | 9.84% |
| CHROM_3 | 2 | 1.94% | 8 | 6.56% |
| CHROM_4 | 9 | 8.74% | 10 | 8.20% |
| CHROM_5 | 6 | 5.83% | 6 | 4.92% |
| CHROM_6 | 9 | 8.74% | 13 | 10.66% |
| CHROM_7 | 6 | 5.83% | 10 | 8.20% |
| CHROM_8 | 9 | 8.74% | 11 | 9.02% |
| CHROM_9 | 8 | 7.77% | 8 | 6.56% |
| Unanchored scaffolds | 39 | 37.86% | 37 | 30.33% |
| Total | 103 | 100.00% | 122 | 100.00% |
Fig. 5 Workflow for identifying the origin of the norgDNA flanking the transgenic inserts. a. Sequences of the three SunUp transformation-plasmid-derived inserts with borders, and the bwa alignment process. b. A strategy using high-throughput paired-end mapping to identify deletions in Sunset relative to the reference genome. Insertions in SunUp were predicted from paired-end spans larger than a specified cutoff (the size of a transgenic insert). c. Histograms of the inner distance of mapped paired-ends in the regions of the three inserts with borders
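The paired-end strategy in panel b rests on one comparison: a Sunset read pair that straddles a SunUp-only insert maps with an inner distance larger than the cutoff. The sketch below illustrates that test. The coordinates and the cutoff are invented for illustration, the function names are hypothetical, and the inner distance is assumed to be the gap between the end of the upstream mate and the start of the downstream mate.

```python
from typing import List, Tuple

# (end of upstream mate, start of downstream mate) in reference coordinates.
Pair = Tuple[int, int]


def inner_distance(pair: Pair) -> int:
    """Gap between the two mates of a mapped pair."""
    mate1_end, mate2_start = pair
    return mate2_start - mate1_end


def pairs_exceeding_cutoff(pairs: List[Pair], cutoff: int) -> List[Pair]:
    """Pairs whose inner distance is larger than the cutoff."""
    return [p for p in pairs if inner_distance(p) > cutoff]


# Toy data: only the second pair spans a gap larger than the cutoff.
toy_pairs = [(10_000, 10_250), (10_100, 12_400), (10_180, 10_390)]
spanning = pairs_exceeding_cutoff(toy_pairs, cutoff=1_500)
print(spanning)
```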
Comparative analysis of 6 organelle-like borders of 3 transgenic insertions
| Insertion | Border | Sequence type | Length (bp) | Identity with orgDNA (%) | Sunset matches: identity with inserts (%) | Sunset matches: count | Sunset matches: combined length (bp) |
|---|---|---|---|---|---|---|---|
| | A | pt | 4000 | 100.00 | 97.01 | 12 | 4180 |
| | B | pt | 1790 | 99.94 | 99.09 | 6 | 1944 |
| | A | pt | 363 | 100.00 | 97.96 | 1 | 49 |
| | B | pt | 827 | 100.00 | 93.68 | 5 | 1231 |
| | A | pt | 6299 | 98.60 | 95.09 | 43 | 4242 |
| | B | mt | 1708 | 98.18 | 99.09 | 2 | 1738 |
Notes: chloroplast (pt); mitochondria (mt)