Thu-Phuong Nguyen, Cornelia Mühlich, Setareh Mohammadin, Erik van den Bergh, Adrian E Platts, Fabian B Haas, Stefan A Rensing, M Eric Schranz.
Abstract
The genus Aethionema is a sister group to the core group of the Brassicaceae family, which includes Arabidopsis thaliana and the Brassica crops. Aethionema is therefore phylogenetically well placed for investigating and understanding genome and trait evolution across the family. We aimed, first, to improve the quality of the draft reference genome of the annual species Aethionema arabicum and, second, to construct the first Ae. arabicum genetic map. The improved reference genome and the genetic map were developed in tandem, each enabling improvements to the other. We started with the initially published genome (version 2.5) and incorporated PacBio and MinION sequencing data, together with genetic map v2.5, to produce the new reference genome v3.0. The improved genome contains 203 Mb of sequence assembled into 2,883 scaffolds; approximately 94% of the assembly consists of called (non-gap) bases, and only 6% consists of non-called bases (Ns). The N50 (10.3 Mb) represents an 80-fold increase over the initial genome release. We generated a Recombinant Inbred Line (RIL) population derived from two ecotypes, Cyprus and Turkey (the reference genotype). Using a Genotyping by Sequencing (GBS) approach, we generated a high-density genetic map with 749 SNPs (v2.5) and then 632 SNPs (v3.0). The genetic map and reference genome were integrated, greatly improving the scaffolding of the reference genome into 11 linkage groups. We show that long-read sequencing data and genetics are complementary, resulting in an improved genome assembly for Ae. arabicum. These resources will facilitate comparative genetic mapping in the Brassicaceae family and are also valuable for investigating a wide range of life history traits in Aethionema.
Keywords: Aethionema arabicum; Brassicaceae; Genotyping by Sequencing; MinION; PacBio; genetic map; genome improvement
Year: 2019 PMID: 31554715 PMCID: PMC6829135 DOI: 10.1534/g3.119.400657
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Overview of the analyses performed in this study. Data sets are shown in filled boxes; approaches and accompanying tools are shown in open boxes.
Overview of the Aethionema arabicum PacBio reads (Mérai et al. 2019)

| | Total | Cyprus | Turkey |
|---|---|---|---|
| Number of reads | 381,069 | 152,415 | 228,654 |
| Length range | 11–57,910 | 11–55,919 | 11–57,910 |
| Average length | 5,845 | 5,795 | 5,879 |
| Average quality | 10.5 | 10.1 | 10.7 |

Lengths are given in nucleotides and quality as a Phred score.
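The "Total" column can be recovered from the two ecotype columns: the total read count is a sum, and the pooled averages are read-count-weighted means of the per-ecotype averages. A minimal sketch, assuming the published averages are simple per-read means (a mean of Phred values is a convenient summary, not the Phred score of the mean error rate); `pooled_stats` is a hypothetical helper.

```python
"""Pool per-ecotype PacBio read statistics into whole-data-set values.

Assumption: each ecotype's average is a plain per-read mean, so the
pooled value is the read-count-weighted mean of the ecotype averages.
"""

# (number of reads, average length in nt, average Phred quality), from the table
ECOTYPES = {
    "Cyprus": (152_415, 5_795, 10.1),
    "Turkey": (228_654, 5_879, 10.7),
}


def pooled_stats(groups):
    """Return (total reads, pooled mean length, pooled mean quality)."""
    total = sum(n for n, _, _ in groups.values())
    mean_len = sum(n * length for n, length, _ in groups.values()) / total
    mean_q = sum(n * q for n, _, q in groups.values()) / total
    return total, mean_len, mean_q


total, mean_len, mean_q = pooled_stats(ECOTYPES)
print(f"{total:,} reads, mean length {mean_len:,.0f} nt, mean quality {mean_q:.1f}")
```

Running it prints `381,069 reads, mean length 5,845 nt, mean quality 10.5`, matching the Total column.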
Figure 2. Problem arising from applying PBjelly2 to vAM. Scaffold borders are shown in blue and scaffold extensions introduced by PBjelly2 in brown. Suppose the true order of the scaffolds is as shown at the top of the figure, but scaffolds X and Z were already joined in the vAM assembly (second bar from top). PBjelly2 could then partially fill the N-stretch and possibly extend scaffold Y, but it would not be able to place scaffold Y between the other two scaffolds (middle bar). If the scaffolds were split again (second bar from bottom), applying PBjelly2 to the split version could make the connections correctly (bottom bar). This figure shows only a theoretical case; in this work, PBjelly2 connected scaffolds X and Y every time, and scaffold Z had to be reconnected afterward.
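The "split again" step in Figure 2 amounts to cutting a scaffold wherever it contains a run of Ns, so that gap-filling can reconsider each piece independently. A minimal Python sketch, assuming sequences are already loaded as strings and that a gap is any run of at least `min_gap` Ns; `split_at_gaps`, `min_gap` and the `_partN` naming are hypothetical and do not describe the authors' pipeline.

```python
import re


def split_at_gaps(name, seq, min_gap=1):
    """Split a scaffold into contiguous pieces at runs of >= min_gap Ns.

    Returns (piece_name, sequence) tuples; the hypothetical '_partN' suffix
    records the original order so pieces can be traced back later.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")  # e.g. "[Nn]{10,}"
    pieces = [p for p in gap.split(seq) if p]  # drop empty leading/trailing pieces
    return [(f"{name}_part{i}", p) for i, p in enumerate(pieces, start=1)]


# Toy scaffold in which X and Z were joined by a 10-N gap.
scaffold = "ACGTACGT" + "N" * 10 + "TTGGCCAA"
for piece_name, piece in split_at_gaps("scaffold_XZ", scaffold, min_gap=10):
    print(piece_name, piece)
```

Shorter N runs (below `min_gap`) stay inside their piece, which keeps small internal gaps from fragmenting the assembly.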
Overview of gene liftover: GFF migration statistics

| Category | Number |
|---|---|
| Lifted only by flo | 4,346 |
| Lifted only by GeMoMa | 36 |
| Lifted with both programs | 18,177 |
| Manually lifted | 34 |
| Partially lifted | 14 |
| Number of corrected CDS | 10,259 |
| Marked as pseudo | 3,230 |
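The first three rows partition genes by which program transferred them, which is a set operation on the two tools' results. A minimal sketch with Python sets, assuming each tool's output has already been reduced to a list of successfully lifted gene IDs; `classify_liftover` and the toy IDs are hypothetical.

```python
def classify_liftover(all_genes, flo_lifted, gemoma_lifted):
    """Partition gene IDs by which liftover tool(s) transferred them."""
    flo, gemoma = set(flo_lifted), set(gemoma_lifted)
    return {
        "only_flo": flo - gemoma,
        "only_gemoma": gemoma - flo,
        "both": flo & gemoma,
        # Genes neither tool lifted, e.g. candidates for manual inspection.
        "neither": set(all_genes) - flo - gemoma,
    }


genes = ["g1", "g2", "g3", "g4", "g5"]
categories = classify_liftover(genes, flo_lifted=["g1", "g2", "g3"], gemoma_lifted=["g2", "g4"])
for label, ids in categories.items():
    print(label, sorted(ids))
```

With the toy input, `g1` and `g3` are lifted only by flo, `g4` only by GeMoMa, `g2` by both, and `g5` by neither.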
Statistical overview of Aethionema arabicum genome versions

| Genome version | Draft | v2.5 | vAM | v2.6 | v3.0 |
|---|---|---|---|---|---|
| # Bases | | 196,005,095 | 196,022,695 | 203,150,143 | 203,449,326 |
| # Scaffolds | 59,101 | 3,166 | 2,990 | 2,895 | 2,883 |
| # Scaffolds containing Ns | | 1,910 | 1,734 | 1,542 | 1,539 |
| # Ns | | 25,768,296 | 25,785,896 | 13,946,922 | 13,790,434 |
| N50 | 115,195 | 564,741 | 10,141,718 | 10,328,388 | 10,328,388 |
| L50 | | 56 | 9 | 9 | 9 |
The vAM assembly includes all scaffolds: 199 of the 3,166 v2.5 scaffolds were scaffolded via the genetic map into the 11 linkage groups (LGs) of vAM (included in the 2,990 scaffolds). The 11 LGs comprise 125,484,166 bp (64% of vAM).
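N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together contain at least half of the assembly, and L50 is the number of scaffolds in that set; the called-base fraction is the share of bases that are not N. A minimal sketch, assuming scaffold lengths (gaps included) and the total N count are known; `assembly_stats` and the toy values are hypothetical, and the tables above only summarise the per-scaffold lengths needed to run it on the real assemblies.

```python
def assembly_stats(lengths, n_count):
    """Return (N50, L50, fraction of called bases) for scaffold lengths.

    `lengths` are scaffold lengths in bp (gaps included); `n_count` is the
    total number of N bases across all scaffolds.
    """
    if not lengths:
        raise ValueError("no scaffolds")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    total = sum(ordered)
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:  # at least half of the assembly reached
            n50, l50 = length, rank
            break
    return n50, l50, (total - n_count) / total


n50, l50, called = assembly_stats([10, 40, 20, 30], n_count=6)
print(n50, l50, f"{called:.0%}")
```

For the toy input the two longest scaffolds (40 + 30 bp) pass the 50 bp halfway mark, so N50 is 30 and L50 is 2, and 94 of 100 bases are called.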
Mapping efficiency of PBjelly2’s mapping step. The percentages in brackets give the percentage of the total number of reads (CYP, TUR or CYP + TUR). The row “# covered scaffolds” gives the number of scaffolds to which at least one read was mapped; here, the number in brackets gives the percentage of the total number of scaffolds.

| Setup | PacBio | PacBio | MinION | MinION |
|---|---|---|---|---|
| # mapped reads (TUR) | 198,675 (86.9%) | 198,629 (86.9%) | 14,098 (45.6%) | 15,886 (51.4%) |
| # mapped reads (CYP) | 131,976 (86.6%) | 131,942 (86.6%) | — | — |
| # mapped reads (CYP + TUR) | 330,651 (86.8%) | 330,571 (86.8%) | 14,098 (45.6%) | 15,886 (51.4%) |
| # unmapped reads | 50,371 (13.2%) | 50,451 (13.2%) | 16,837 (54.4%) | 15,049 (48.6%) |
| # scaffolds | 3,166 | 2,990 | 3,166 | 2,895 |
| # covered scaffolds | 2,971 (93.8%) | 2,804 (93.8%) | 1,689 (53.3%) | 1,429 (49.4%) |
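The figures in this table are simple ratios: mapped and unmapped reads over all reads, and covered scaffolds over all scaffolds, where a scaffold counts as covered once at least one read maps to it. A minimal sketch, assuming each read has been reduced to a single best-hit scaffold ID (or `None` when unmapped); `mapping_summary` and the toy read IDs are hypothetical and do not reflect PBjelly2's own output format.

```python
def mapping_summary(best_hits, n_scaffolds):
    """Summarise a read-mapping step as (count, percentage) pairs.

    `best_hits` maps each read ID to the scaffold of its best alignment,
    or None if the read did not map.
    """
    mapped = [s for s in best_hits.values() if s is not None]
    n_reads = len(best_hits)
    covered = set(mapped)  # scaffolds with at least one mapped read
    return {
        "mapped": (len(mapped), 100 * len(mapped) / n_reads),
        "unmapped": (n_reads - len(mapped), 100 * (n_reads - len(mapped)) / n_reads),
        "covered_scaffolds": (len(covered), 100 * len(covered) / n_scaffolds),
    }


hits = {"r1": "scf1", "r2": "scf1", "r3": "scf2", "r4": None}
summary = mapping_summary(hits, n_scaffolds=4)
for key, (count, pct) in summary.items():
    print(f"{key}: {count} ({pct:.1f}%)")
```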
Overview of the PBjelly2 result statistics for the different setups

| Setup | v2.5 assembly | PacBio | MinION |
|---|---|---|---|
| # Scaffolds | 3,166 | 3,066 | 3,123 |
| # Bases | 196,005,095 | 203,024,676 | 196,600,700 |
| N50 | 564,741 | 542,490 | 564,741 |
| L50 | 56 | 58 | 56 |
Gap/N analysis of different genome versions

| Setup | v2.5 | PacBio | MinION |
|---|---|---|---|
| # Scaffolds containing Ns | 1,910 (60.3%) | 1,711 (56.0%) | 1,901 (60.0%) |
| # Ns | 25,768,296 (13.2%) | 13,940,203 (7.1%) | 25,142,571 (12.8%) |
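Both rows of the gap analysis come from a single pass over the scaffold sequences: count the Ns in each scaffold, then report how many scaffolds contain any and how many Ns there are overall. A minimal sketch, assuming the percentages are taken relative to the same assembly's scaffold count and total length; `gap_summary` and the toy sequences are hypothetical.

```python
def gap_summary(scaffolds):
    """Count scaffolds containing Ns and total Ns, with percentages.

    `scaffolds` maps scaffold names to sequences (any letter case).
    """
    n_per_scaffold = {name: seq.upper().count("N") for name, seq in scaffolds.items()}
    total_bases = sum(len(seq) for seq in scaffolds.values())
    with_ns = sum(1 for n in n_per_scaffold.values() if n > 0)
    total_ns = sum(n_per_scaffold.values())
    return {
        "scaffolds_with_Ns": (with_ns, 100 * with_ns / len(scaffolds)),
        "Ns": (total_ns, 100 * total_ns / total_bases),
    }


toy = {"scf1": "ACGTNNNNACGT", "scf2": "ACGTACGT", "scf3": "NNAC"}
result = gap_summary(toy)
print(result)
```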
Figure 3. Aethionema arabicum genetic map v2.5. Genetic map v2.5 consists of eleven linkage groups. For each linkage group, genetic distance in cM is shown on the left and SNP markers on the right.
Figure 4. Aethionema arabicum genetic map v3.0. Genetic map v3.0 consists of eleven linkage groups. For each linkage group, genetic distance in cM is shown on the left and SNP markers on the right.
Figure 5. Alignment of genetic maps v2.5 and v3.0 with the physical map. Genetic maps v2.5 and v3.0 were aligned using corresponding SNPs. The left ruler indicates genetic distance in cM, and the right ruler indicates physical distance in bp according to genome v3.0.
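Aligning the genetic map with the physical map, as in Figure 5 and in building the 11 linkage groups, can be pictured as three steps: assign each scaffold to the linkage group where most of its markers fall, order scaffolds by the mean cM of those markers, and orient each one by whether cM rises or falls along its physical coordinates. A simplified Python sketch under those assumptions; it is not the procedure used in the study, it ignores conflicting markers and map-order errors, and `order_scaffolds` plus the marker tuples are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def order_scaffolds(markers):
    """Assign, order and orient scaffolds from genetic-map markers.

    `markers` is a list of (scaffold, physical_pos_bp, linkage_group, cM).
    Returns {linkage_group: [(scaffold, strand), ...]} in map order.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos, lg, cm in markers:
        by_scaffold[scaffold].append((pos, lg, cm))

    groups = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        # Majority vote for the linkage group.
        lg = Counter(h[1] for h in hits).most_common(1)[0][0]
        in_lg = sorted((pos, cm) for pos, g, cm in hits if g == lg)
        # Flip the scaffold when cM decreases from its first to its last
        # marker; single-marker scaffolds default to '+'.
        strand = "-" if in_lg[-1][1] < in_lg[0][1] else "+"
        groups[lg].append((mean(cm for _, cm in in_lg), scaffold, strand))

    return {lg: [(s, strand) for _, s, strand in sorted(entries)]
            for lg, entries in sorted(groups.items())}


markers = [
    ("scfA", 1_000, "LG1", 10.0), ("scfA", 90_000, "LG1", 12.5),
    ("scfB", 5_000, "LG1", 30.0), ("scfB", 70_000, "LG1", 25.0),
    ("scfC", 2_000, "LG2", 4.0),
]
print(order_scaffolds(markers))
```

With the toy markers, scfA precedes scfB on LG1 and scfB is reversed because its cM values fall along the scaffold.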