Sergei Kliver, Mike Rayko, Alexey Komissarov, Evgeny Bakin, Daria Zhernakova, Kasavajhala Prasad, Catherine Rushworth, R Baskar, Dmitry Smetanin, Jeremy Schmutz, Daniel S Rokhsar, Thomas Mitchell-Olds, Ueli Grossniklaus, Vladimir Brukhin.
Abstract
Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid, sexually reproducing species and is thought to be an ancestral parent species of apomictic species. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from 1 paired-end and 3 mate-pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated to be 227 Mb, and the N50 of the assembled scaffolds was 2.3 Mb. Using a hybrid approach that combines homology-based and de novo methods, 27,048 protein-coding genes were predicted. Repeats, transfer RNA (tRNA) genes and ribosomal RNA (rRNA) genes were also annotated. Finally, genes of B. retrofracta and 6 other Brassicaceae species were used for phylogenetic tree reconstruction. In addition, we explored the histidine exonuclease APOLLO locus, which is related to apomixis in Boechera, and proposed a model of its evolution through a series of duplications. The assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species.
Keywords: Boechera; Brassicaceae; annotation; apomixis; assembly; genome
Year: 2018 PMID: 29597328 PMCID: PMC5924527 DOI: 10.3390/genes9040185
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Plant (a) and flower (b) of Boechera retrofracta.
Sequencing scheme of Boechera retrofracta genome.
| ID | Library Type | Platform | Read Length (bp) | Mean Insert Size (bp) | Number of Read Pairs |
|---|---|---|---|---|---|
| LIB400 | paired ends | Illumina | 250 | 402 | 189788627 |
| LIB4000R | mate pairs | Roche | - | 4014 | 3259085 |
| LIB5000 | mate pairs | Illumina | 150 | 4877 | 19083787 |
| LIB7000 | mate pairs | Illumina | 150 | 6882 | 34066282 |
| LIB24000R | mate pairs | Roche | - | 24,332 | 672098 |
| BES | BAC end sequencing | Sanger | - | 147,708 | 17775 |
Abbreviations: BAC, bacterial artificial chromosome; BES, BAC end sequencing.
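The read counts in the sequencing table can be turned into a rough depth estimate. The sketch below assumes that every pair contributes two reads of the listed length (no trimming or filtering) and uses the 227 Mb genome size estimate from the abstract; the function name `nominal_coverage` is illustrative only. The Roche and Sanger libraries are left out because the table gives no read length for them.

```python
# Nominal sequencing depth per Illumina library, based on the table above.
GENOME_SIZE_BP = 227_000_000  # genome size estimate quoted in the abstract

# (read length in bp, number of read pairs), copied from the table
illumina_libraries = {
    "LIB400": (250, 189_788_627),
    "LIB5000": (150, 19_083_787),
    "LIB7000": (150, 34_066_282),
}


def nominal_coverage(read_length, read_pairs, genome_size=GENOME_SIZE_BP):
    """Raw fold coverage, assuming two full-length reads per pair."""
    return 2 * read_length * read_pairs / genome_size


coverage = {
    name: nominal_coverage(length, pairs)
    for name, (length, pairs) in illumina_libraries.items()
}

for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}x")
```

These are upper bounds on usable depth, since real reads are usually trimmed and filtered before assembly.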
General statistics for all stages of the assembly pipeline.
| Parameter | Contigs | Extended Contigs | Raw Scaffolds | Intermediate Scaffolds | Gap Closed Scaffolds | Final Scaffolds |
|---|---|---|---|---|---|---|
| Longest contig | 791,985 | 792,340 | 8,101,256 | 9,045,706 | 9,049,080 | 9,049,080 |
| Ns | 28,100 | 28,100 | 11,890,519 | 16,366,994 | 12,409,189 | 12,409,189 |
| Total length | 225,649,216 | 226,402,628 | 236,469,041 | 240,945,496 | 241,014,839 | 222,253,471 |
Figure A1. Pipeline used to assemble the genome of Boechera retrofracta.
Figure 4. Phylogenetic tree of the isoforms of the APOLLO locus (exonuclease NEN) in seven species of interest and alleles of the APOLLO locus of apomictic Boechera species from Corral et al. (2013) [56]. Sequences of Populus trichocarpa, Vitis vinifera and Glycine max were used as the outgroup. The clade related to the APOLLO locus is shown in green, with apo-alleles shown in red. Numbers near nodes represent the corresponding bootstrap support.
Figure 2. Distribution of 23-mers for the PE LIB400 library. Only one major peak, at 371× coverage, is present; however, there are detectable duplications and triplications at 737× and 1120× coverage (upper plot; the Y axis is on a logarithmic scale).
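The reading of the secondary peaks as duplications and triplications rests on their depth being close to two and three times the depth of the major peak. A minimal check of that arithmetic, using only the three peak positions quoted in the caption (the variable names are illustrative):

```python
# Peak depths of the 23-mer distribution, as quoted in the Figure 2 caption.
MAIN_PEAK = 371
peaks = [371, 737, 1120]

# Ratio of each peak to the major peak; values near 2 and 3 suggest
# sequence present in two and three copies, respectively.
ratios = [depth / MAIN_PEAK for depth in peaks]
copy_numbers = [round(ratio) for ratio in ratios]

for depth, ratio, copies in zip(peaks, ratios, copy_numbers):
    print(f"peak at {depth}x: ratio {ratio:.2f}, implied copy number {copies}")
```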
N50 values for all stages of the assembly pipeline and several different cutoffs for minimal scaffold length.
| Scaffold Length Cutoff (bp) | Contigs | Extended Contigs | Raw Scaffolds | Intermediate Scaffolds | Gap Closed Scaffolds | Final Scaffolds |
|---|---|---|---|---|---|---|
| all | 85,286 | 84,648 | 1,256,534 | 1,898,006 | 1,898,985 | 2,297,899 |
| ≥100 | 85,286 | 84,648 | 1,256,534 | 1,898,006 | 1,898,985 | 2,297,899 |
| ≥250 | 101,388 | 100,393 | 1,442,421 | 2,296,484 | 2,297,899 | 2,297,899 |
| ≥500 | 115,732 | 115,486 | 1,538,795 | 2,678,857 | 2,680,941 | 2,680,941 |
| ≥1000 | 122,300 | 121,678 | 1,704,064 | 2,678,857 | 2,680,941 | 2,680,941 |
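For readers unfamiliar with the statistic, N50 is the length L such that sequences of length at least L together cover at least half of the total assembly length. The sketch below uses the common definition with a minimum-length cutoff like the one in the table; it is not the authors' script, and the toy lengths are invented purely for illustration.

```python
def n50(lengths, min_length=0):
    """N50 of the sequences that are at least min_length long."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences pass the length cutoff")
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        # The first sequence that pushes the running sum to half the
        # total length defines N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: four long scaffolds plus many short fragments.
toy_lengths = [900, 700, 500, 300] + [90] * 20

n50_all = n50(toy_lengths)
n50_filtered = n50(toy_lengths, min_length=100)
print(n50_all, n50_filtered)
```

Dropping short fragments shrinks the total length, so N50 can only stay the same or increase as the cutoff rises, which matches the trend within each column of the table.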
Repeats found by RepeatMasker.
| Class | Number of Elements | Total Length (bp) | Fraction of Assembly (%) |
|---|---|---|---|
| SINEs | 577 | 125,298 | 0.06 |
| LINEs | 7075 | 4,351,241 | 1.96 |
| LTR elements | 51,040 | 40,608,195 | 18.27 |
| DNA elements | 31,638 | 12,868,684 | 5.79 |
| Unclassified | 82,693 | 24,363,135 | 10.96 |
| Total interspersed repeats | - | 82,316,553 | 37.04 |
| Small RNA | 5461 | 1,599,354 | 0.72 |
| Satellites | 1541 | 573,026 | 0.26 |
| Simple repeats | 2044 | 363,642 | 0.16 |
| Low complexity | 56 | 7456 | 0 |
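The percentage column can be reproduced from the base-pair counts. The sketch below assumes that the denominator is the total length of the final scaffolds (222,253,471 bp, from the assembly statistics table); that assumption is not stated with the table, but it does reproduce the listed values. The helper name `percent_of_assembly` is illustrative.

```python
# Assumption: fractions are relative to the final scaffold total length.
ASSEMBLY_LENGTH_BP = 222_253_471

# Total length (bp) per interspersed repeat class, copied from the table.
interspersed = {
    "SINEs": 125_298,
    "LINEs": 4_351_241,
    "LTR elements": 40_608_195,
    "DNA elements": 12_868_684,
    "Unclassified": 24_363_135,
}


def percent_of_assembly(length_bp, assembly_length=ASSEMBLY_LENGTH_BP):
    """Share of the assembly covered, in percent, rounded to two decimals."""
    return round(100 * length_bp / assembly_length, 2)


total_interspersed = sum(interspersed.values())
percentages = {name: percent_of_assembly(bp) for name, bp in interspersed.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print(f"Total interspersed repeats: {total_interspersed} bp, "
      f"{percent_of_assembly(total_interspersed)}%")
```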
Results of repeat masking performed by three different tools: RepeatMasker [32], TRF [34], WindowMasker [35].
| Tool | Number of Repeats | Total Length (Mbp) |
|---|---|---|
| RepeatMasker | 173,023 | 82.31 |
| TRF | 100,593 | 17.41 |
| WindowMasker | 1,104,650 | 64.20 |
Annotated transfer RNAs (tRNAs).
| tRNA Type | Number |
|---|---|
| tRNAs decoding standard 20 AA | 1126 |
| Selenocysteine tRNAs (TCA) | 0 |
| Possible suppressor tRNAs (CTA,TTA) | 3 |
| tRNAs with undetermined isotypes | 5 |
| Predicted pseudogenes | 32 |
| Total tRNAs | 1166 |
Annotated ribosomal RNAs (rRNAs).
| rRNA | Complete (≥80% of Expected Length) | Partial (<80% of Expected Length) |
|---|---|---|
| 5.8S | 178 | 53 |
| 5S | 601 | 104 |
| 28S | 0 | 1782 |
| 18S | 1 | 1458 |
| 12S | 0 | 173 |
| 16S | 0 | 607 |
Figure 3. Phylogenetic tree of the seven Brassicaceae species used for analysis. The maximum likelihood tree was reconstructed by RAxML using 8959 single-copy orthologs and was tested with 1000 bootstrap replicates. Numbers near nodes represent the corresponding bootstrap support.
Comparison of genome characteristics of Boechera retrofracta with the previously sequenced Boechera stricta, Arabidopsis lyrata and Arabidopsis thaliana genomes. Sources: B. retrofracta, this paper; B. stricta, A. lyrata and A. thaliana, Phytozome v12.1 database [57].
| Characteristic | B. retrofracta | B. stricta | A. lyrata | A. thaliana |
|---|---|---|---|---|
| Total length | 227 Mb | 184 Mb | 207 Mb | 135 Mb |
| Chromosomes | | | | |
| Protein-coding loci | 27,048 | 27,416 | 31,073 | 27,416 |
| Transcripts | 28,269 | 29,812 | 33,132 | 35,386 |