Boas Pucker¹, Christian Rückert², Ralf Stracke³, Prisca Viehöver³, Jörn Kalinowski², Bernd Weisshaar³.
Abstract
Arabidopsis thaliana is one of the best-studied plant model organisms. Besides being grown as whole plants in greenhouses, A. thaliana can also be propagated as cells in suspension culture. At7 is one such cell line, established about 25 years ago. Here, we report the sequencing and analysis of the At7 genome. Large-scale duplications and deletions relative to the Columbia-0 (Col-0) reference sequence were detected. The number of deletions exceeds the number of insertions, indicating an ongoing reduction of the haploid genome size. Patterns of small sequence variants differ from those observed between A. thaliana accessions; e.g., the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.
Keywords: copy number variations; long read sequencing; next generation sequencing; variant calling
Year: 2019 PMID: 31480756 PMCID: PMC6770967 DOI: 10.3390/genes10090671
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Genome-wide coverage of the Columbia-0 (Col-0) reference sequence. Hemizygous regions in At7 were revealed by a read coverage of approximately 50-fold when combining Illumina and ONT sequencing reads. Different multiples of this value can be observed, revealing the presence of large-scale multiplications.
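The copy-number reasoning in Figure 1 can be illustrated with a minimal Python sketch. It assumes that the roughly 50-fold coverage of hemizygous regions corresponds to one copy and rounds the median coverage of fixed-size windows to the nearest multiple of that value. The function names, the window size, and the use of the median are hypothetical choices for illustration, not the authors' pipeline.

```python
from statistics import median

# Assumption: ~50-fold coverage of hemizygous regions (Figure 1) = one copy.
HAPLOID_COVERAGE = 50.0


def estimate_copy_number(coverage: float, haploid_coverage: float = HAPLOID_COVERAGE) -> int:
    """Round a region's coverage to the nearest multiple of the one-copy coverage."""
    if haploid_coverage <= 0:
        raise ValueError("haploid_coverage must be positive")
    return round(coverage / haploid_coverage)


def copy_numbers_per_window(depths: list, window: int) -> list:
    """Summarise per-position depths in fixed windows (hypothetical: median per window)
    and convert each window's median depth into an estimated copy number."""
    return [
        estimate_copy_number(median(depths[i:i + window]))
        for i in range(0, len(depths), window)
    ]


# Usage: three windows at roughly 1x, 2x and 3x the one-copy coverage.
print(copy_numbers_per_window([48, 52, 50, 101, 99, 100, 149, 151, 150], 3))  # [1, 2, 3]
```

Using the median rather than the mean makes each window's estimate less sensitive to isolated coverage spikes, at the cost of ignoring short events inside a window.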
Figure 2. Genome-wide distribution of small sequence variants between At7 and Col-0. Homozygous variants (magenta) and variants with deviating frequency were counted in genomic blocks of 100 kb on all five chromosomes of the Col-0 reference sequence.
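The block-wise counting behind Figure 2 can be sketched as follows. The input format, tuples of chromosome, 1-based position and allele frequency, and the 0.9 cut-off used to call a variant homozygous are assumptions made for this example; only the 100 kb block size comes from the caption.

```python
from collections import Counter

BLOCK_SIZE = 100_000  # 100 kb blocks, as stated in the Figure 2 caption


def count_variants_per_block(variants, block_size=BLOCK_SIZE, hom_min_freq=0.9):
    """Count variants per (chromosome, block index, class).

    `variants` is an iterable of (chromosome, 1-based position, allele frequency).
    Both this input format and `hom_min_freq` (hypothetical homozygosity
    cut-off) are illustrative assumptions.
    """
    counts = Counter()
    for chrom, pos, allele_freq in variants:
        block = (pos - 1) // block_size  # positions 1..100000 fall into block 0
        label = "homozygous" if allele_freq >= hom_min_freq else "deviating"
        counts[(chrom, block, label)] += 1
    return counts


# Usage with made-up variants:
calls = [("Chr1", 1, 1.0), ("Chr1", 100_000, 0.95), ("Chr1", 100_001, 0.5)]
print(count_variants_per_block(calls))
```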
Figure 3. Illumina sequencing coverage depth at variant positions. Only Illumina reads were considered in this analysis because the variant calling was limited to this set of high-quality sequences. Therefore, the average coverage of 25-fold is about half the coverage observed for the complete sequence read data set.
Figure 4. Gene copy numbers. Read coverage per gene was calculated as a proxy for the copy number of the respective gene. Magenta lines indicate the central positions of coverage valleys enclosed by peaks. Distances between these valleys are not identical because the peaks differ in height and, consequently, in width. Absolute coverage values might be underestimated because read pairs that appeared to be PCR duplicates were removed. Values above 400 are pooled in the largest bin so that the whole distribution fits in one figure.
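A per-gene coverage histogram like the one in Figure 4 can be sketched as below, with everything at or above the maximum value pooled into the last bin. The valley finder simply returns bins lower than both neighbours; the bin width and this strict local-minimum rule are hypothetical choices, and noisy real data would likely need smoothing before valleys are called.

```python
def coverage_histogram(gene_coverages, bin_width=10, max_value=400):
    """Histogram of per-gene coverage; values >= max_value go into the last bin."""
    n_bins = max_value // bin_width + 1  # the final bin collects all values >= max_value
    counts = [0] * n_bins
    for cov in gene_coverages:
        index = min(int(cov // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def find_valleys(counts):
    """Indices of bins strictly lower than both neighbours (valleys between peaks).

    Flat-bottomed valleys (plateaus) are not detected by this simple rule.
    """
    return [
        i for i in range(1, len(counts) - 1)
        if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]
    ]


# Usage with made-up per-gene coverages and a small maximum value:
hist = coverage_histogram([5, 12, 15, 18, 25, 41, 45, 47, 500], bin_width=10, max_value=40)
print(hist, find_valleys(hist))
```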
Figure 5. Allele-specific transcript abundance at high-impact variant positions. The numbers of RNA-Seq reads supporting the reference (Col-0) allele and the alternative allele were determined at high-impact variant positions.
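One way to express the comparison in Figure 5 is sketched below: the fraction of reads supporting the alternative allele is computed from RNA-Seq counts and compared with the same fraction in genomic (DNA) reads. The caption only mentions RNA-Seq read counts, so the DNA baseline, the `VariantCounts` container, and the `margin` threshold are all hypothetical additions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCounts:
    """Read counts at one variant position (hypothetical container)."""
    dna_ref: int
    dna_alt: int
    rna_ref: int
    rna_alt: int


def alt_fraction(ref: int, alt: int) -> Optional[float]:
    """Fraction of reads supporting the alternative allele, or None without reads."""
    total = ref + alt
    return alt / total if total else None


def alt_depleted_in_rna(v: VariantCounts, margin: float = 0.1) -> Optional[bool]:
    """True if the alternative allele is rarer among RNA-Seq reads than among
    DNA reads by more than `margin` (hypothetical threshold)."""
    dna = alt_fraction(v.dna_ref, v.dna_alt)
    rna = alt_fraction(v.rna_ref, v.rna_alt)
    if dna is None or rna is None:
        return None  # no reads at this position in one of the data sets
    return dna - rna > margin


# Usage: balanced in DNA (0.5), but only 20% alternative reads in RNA-Seq.
print(alt_depleted_in_rna(VariantCounts(dna_ref=25, dna_alt=25, rna_ref=40, rna_alt=10)))  # True
```

Comparing against the DNA fraction rather than a fixed 0.5 accounts for positions where copy-number changes already skew the genomic allele ratio.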