Sara Lado, Jean P Elbers, Mark F Rogers, José Melo-Ferreira, Adiya Yadamsuren, Jukka Corander, Petr Horin, Pamela A Burger.
Abstract
BACKGROUND: Immune-response (IR) genes have an important role in the defense against highly variable pathogens, and therefore, diversity in these genomic regions is essential for species' survival and adaptation. Although current genome assemblies from Old World camelids are very useful for investigating genome-wide diversity, demography and population structure, they have inconsistencies and gaps that limit analyses at local genomic scales. Improved and more accurate genome assemblies and annotations are needed to study complex genomic regions like adaptive and innate IR genes.
Keywords: Chromosome conformation capture; Chromosome mapping; Dromedary; Genetic diversity; Genome annotation; Genome assembly; Immune response genes; Scaffolding
Year: 2020 PMID: 32883205 PMCID: PMC7468183 DOI: 10.1186/s12864-020-06990-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Frequency polygons of query sequence length (predicted proteins) divided by subject sequence length (UniProt/TrEMBL) for MAKER [14] predicted proteins mapped with DIAMOND [13] against the UniProt/TrEMBL release 2018_07 database for: (red line) the original North African dromedary genome (CamDro1; [8]; GenBank accession: GCA_000803125.1); (green line) the North African dromedary genome after adding ~11x PacBio sequencing reads (CamDro2); and (blue line) CamDro3
Fig. 2 Cumulative proportion of transcripts with a given or lower annotation edit distance (AED) for CamDro2 (solid line) and CamDro3 (dashed line). CamDro2 had AED ≤ 0.50 for 78.4% of transcripts, whilst MAKER run 2 had 79.1% of transcripts with AED ≤ 0.50. Note that a larger proportion of low AED values indicates a genome annotation that is more congruent with the evidence used during the annotation process
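
The cumulative proportion plotted in Fig. 2 can be illustrated with a minimal sketch. The AED scores below are hypothetical toy values, not data from the paper, and `fraction_at_or_below` is an illustrative helper name rather than part of MAKER.

```python
def fraction_at_or_below(aed_values, threshold=0.5):
    """Return the fraction of transcripts whose AED is <= threshold."""
    if not aed_values:
        raise ValueError("need at least one AED value")
    return sum(1 for aed in aed_values if aed <= threshold) / len(aed_values)


# Hypothetical AED scores for ten transcripts (NOT data from the paper).
toy_aed = [0.02, 0.10, 0.15, 0.22, 0.31, 0.40, 0.48, 0.55, 0.73, 1.00]

# One point of the cumulative curve per threshold.
for threshold in (0.25, 0.50, 0.75, 1.00):
    print(threshold, fraction_at_or_below(toy_aed, threshold))
```

A curve that rises faster at low thresholds corresponds to an annotation with more transcripts that agree closely with the supporting evidence.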
Assembly statistics for CamBac1 (GCF_000767855.1) and CamFer1 (GCF_000311805.1) before and after improvement (CamBac2 and CamFer2, respectively) by reference-guided assembly with Ragout [16], using Progressive Cactus [17] alignments to CamDro3, followed by gap filling with GapFiller [18]
| Assembly statistic | CamBac1 | CamBac2 | CamFer1 | CamFer2 |
|---|---|---|---|---|
| Total size | 1,992,663,268 | 2,039,590,309 | 2,009,194,609 | 2,086,258,888 |
| Gap length | 13,666,687 | 57,965,943 | 23,778,176 | 99,159,843 |
| Scaffolds: number | 35,455 | 33,593 | 13,334 | 9158 |
| Scaffolds: longest | 46,538,883 | 122,729,119 | 15,735,958 | 123,639,755 |
| Scaffolds: N90 (a) | 1,821,536 | 24,994,512 | 341,469 | 25,431,863 |
| Scaffolds: L90 (b) | 255 | 29 | 1167 | 30 |
| Scaffolds: N50 (a) | 8,812,066 | 68,446,253 | 2,005,940 | 69,671,486 |
| Scaffolds: L50 (b) | 68 | 11 | 274 | 11 |
| Contigs (c): number | 67,435 | 56,044 | 68,872 | 66,352 |
| Contigs (c): longest | 1,143,031 | 2,938,098 | 853,441 | 1,096,594 |
| Contigs (c): N90 | 29,656 | 43,365 | 16,267 | 16,886 |
| Contigs (c): L90 | 15,603 | 10,214 | 25,475 | 23,951 |
| Contigs (c): N50 | 139,019 | 219,031 | 90,263 | 97,198 |
| Contigs (c): L50 | 3963 | 2415 | 5814 | 5272 |
| Single-copy BUSCOs (d) | 3827 | 3835 | 3796 | 3816 |
| Duplicated BUSCOs | 22 | 18 | 48 | 32 |
| Fragmented BUSCOs | 164 | 157 | 175 | 168 |
| Missing BUSCOs | 91 | 94 | 85 | 88 |
(a) N90/N50 are the scaffold or contig lengths such that the sum of the lengths of all scaffolds or contigs of this size or larger is equal to 90%/50% of the total assembly length
(b) L90/L50 are the smallest number of scaffolds or contigs that make up at least 90%/50% of the total assembly length
(c) Using minimum gap length of 10 bp
(d) BUSCOs: Benchmarking Universal Single-Copy Orthologs [19] are mammalian BUSCOs from OrthoDB v. 9.1 genes [20]
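
The N90/N50 and L90/L50 definitions in the footnotes above can be made concrete with a short sketch. It is an illustration of the definitions only, not the software used to produce the table; `nx_lx` is a hypothetical helper name and the lengths are toy values.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a list of scaffold or contig lengths.

    Nx is the length of the sequence at which the running total, taken from
    the longest sequence downwards, first reaches `fraction` of the assembly
    size. Lx is the number of sequences needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    # Only reachable if fraction > 1; report the whole assembly.
    return ordered[-1], len(ordered)


# Toy assembly of five sequences totalling 300 bp (NOT data from the paper).
toy_lengths = [100, 80, 60, 40, 20]
print("N50, L50:", nx_lx(toy_lengths, 0.5))
print("N90, L90:", nx_lx(toy_lengths, 0.9))
```

In the toy assembly, the two longest sequences (100 + 80 = 180 bp) already cover half of the 300 bp, so N50 is 80 and L50 is 2; four sequences are needed to reach 270 bp, so N90 is 40 and L90 is 4. A larger N50 combined with a smaller L50, as for CamBac2 and CamFer2 in the table, indicates a more contiguous assembly.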
Mean coverage and number of different types of variants per sample. DC for domestic Bactrian camel (Camelus bactrianus), Drom for dromedary (Camelus dromedarius), and WC for wild camel (Camelus ferus). SD for standard deviation
| Sample | Mean coverage | Total SNPs | Synonymous SNPs | Non-synonymous SNPs | Insertions | Deletions |
|---|---|---|---|---|---|---|
| DC158 | 41.42 | 3,713,662 | 16,761 | 18,352 | 258,367 | 237,987 |
| DC269 | 14.25 | 3,238,412 | 14,206 | 15,473 | 230,164 | 205,242 |
| DC399 | 13.80 | 3,199,637 | 14,370 | 16,112 | 226,223 | 199,701 |
| DC400 | 14.54 | 3,213,008 | 14,130 | 15,608 | 226,945 | 200,953 |
| DC402 | 14.84 | 3,130,745 | 13,756 | 15,296 | 218,205 | 193,720 |
| DC408 | 15.11 | 3,328,223 | 14,592 | 16,693 | 234,064 | 209,759 |
| DC423 | 14.46 | 3,738,504 | 17,182 | 17,866 | 250,856 | 227,449 |
| Drom439 | 14.30 | 1,929,784 | 8528 | 9135 | 163,100 | 147,765 |
| Drom795 | 11.78 | 1,907,261 | 8600 | 9679 | 186,969 | 158,190 |
| Drom796 | 14.23 | 1,991,649 | 8476 | 9193 | 170,719 | 156,795 |
| Drom797 | 13.76 | 1,992,724 | 8945 | 9576 | 178,917 | 160,938 |
| Drom800 | 40.73 | 1,500,998 | 6844 | 7255 | 140,148 | 122,312 |
| Drom802 | 14.59 | 2,006,825 | 9311 | 10,122 | 188,392 | 166,360 |
| Drom806 | 9.52 | 1,854,989 | 7944 | 8692 | 164,993 | 149,508 |
| Drom816 | 10.33 | 1,929,982 | 8476 | 9263 | 173,380 | 154,757 |
| Drom820 | 9.66 | 1,881,945 | 7694 | 8162 | 167,680 | 152,220 |
| WC214 | 14.43 | 2,517,749 | 9919 | 10,071 | 157,630 | 162,297 |
| WC216 | 12.86 | 2,654,274 | 11,040 | 10,871 | 170,009 | 176,405 |
| WC218 | 14.22 | 1,825,617 | 7396 | 8026 | 109,795 | 107,655 |
| WC219 | 14.04 | 2,707,996 | 11,187 | 11,038 | 173,685 | 179,297 |
| WC220 | 14.92 | 2,707,716 | 11,067 | 10,982 | 170,579 | 179,365 |
| WC247 | 14.06 | 2,956,856 | 11,567 | 11,235 | 189,010 | 196,986 |
| WC303 | 41.54 | 2,937,692 | 11,625 | 11,313 | 189,408 | 204,838 |
| WC304 | 14.67 | 2,748,380 | 11,047 | 10,844 | 180,435 | 186,048 |
| WC305 | 14.05 | 2,704,263 | 10,599 | 10,520 | 176,820 | 181,412 |
| Drom mean | 15.43 | 1,888,462 | 8313 | 9009 | 170,478 | 152,094 |
| Drom SD | 9.7 | 154,355 | 729 | 867 | 14,512 | 12,552 |
| DC mean | 18.35 | 3,366,027 | 15,000 | 16,486 | 234,975 | 210,687 |
| DC SD | 10.2 | 252,904 | 1376 | 1210 | 14,409 | 16,125 |
| WC mean | 17.20 | 2,640,060 | 10,605 | 10,544 | 168,597 | 174,923 |
| WC SD | 9.1 | 334,004 | 1307 | 1017 | 24,154 | 28,002 |
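
The per-species summary rows can be recomputed from the per-sample rows. The sketch below does this for dromedary mean coverage only, with the nine values copied from the table. It assumes the SD rows report the sample standard deviation (n − 1 denominator), which the table does not state; under that assumption the reported 15.43 and 9.7 are reproduced.

```python
import statistics

# Mean coverage of the nine dromedary (Drom) samples, copied from the table.
drom_coverage = [14.30, 11.78, 14.23, 13.76, 40.73, 14.59, 9.52, 10.33, 9.66]

mean_cov = statistics.mean(drom_coverage)
# Assumption: sample standard deviation (n - 1 denominator).
sd_cov = statistics.stdev(drom_coverage)

print(f"Drom mean coverage: {mean_cov:.2f}")
print(f"Drom SD coverage:   {sd_cov:.1f}")
```

The large SD relative to the mean reflects the single deeply sequenced sample per species (here Drom800 at 40.73x) among samples sequenced at roughly 10x to 15x.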
Fig. 3 Means with 95% bootstrap confidence intervals (see Methods) of nucleotide diversity for alignments made with non-synonymous and synonymous SNPs and indels (a) and only non-synonymous SNPs (b) for: dromedary (C. dromedarius; top panel), domestic Bactrian camel (C. bactrianus; middle panel), and wild camel (C. ferus; bottom panel) gene groups. AD for adaptive genes, IN for innate genes, MHC for MHC class I and II genes, and RG for rest-of-genome genes. Rest-of-genome genes are those not classified as adaptive or innate genes (see Methods). Uppercase letters above the upper 95% confidence limits indicate whether group means differ (non-matching letters) or do not differ (matching letters), based on non-overlapping confidence intervals
Means with 95% bootstrap confidence limits (CL, see Methods) of nucleotide diversity for alignments made with non-synonymous and synonymous SNPs and indels and only non-synonymous SNPs for: DROM (dromedary; Camelus dromedarius), DC (domestic Bactrian camel; Camelus bactrianus), and WC (wild camel; Camelus ferus) gene groups. AD for adaptive genes, IN for innate genes, MHC for MHC class I and II genes, and RG for rest-of-genome genes. Rest-of-genome genes are those not classified as adaptive or innate IR genes (see Methods)
| Variant type | Species | Gene groups | Mean | 95% lower CL | 95% upper CL |
|---|---|---|---|---|---|
| SNPs and indels | DROM | MHC | 6.26E-04 | 1.83E-04 | 9.65E-04 |
| SNPs and indels | DROM | AD | 8.81E-05 | 5.70E-05 | 1.14E-04 |
| SNPs and indels | DROM | IN | 6.81E-05 | 4.74E-05 | 8.49E-05 |
| SNPs and indels | DROM | RG | 6.55E-05 | 6.22E-05 | 6.87E-05 |
| SNPs and indels | DC | MHC | 1.35E-03 | 5.58E-04 | 2.04E-03 |
| SNPs and indels | DC | AD | 2.97E-04 | 2.11E-04 | 3.64E-04 |
| SNPs and indels | DC | IN | 1.94E-04 | 1.61E-04 | 2.23E-04 |
| SNPs and indels | DC | RG | 1.66E-04 | 1.60E-04 | 1.71E-04 |
| SNPs and indels | WC | MHC | 2.73E-04 | 9.06E-06 | 4.77E-04 |
| SNPs and indels | WC | AD | 1.06E-04 | 4.52E-05 | 1.51E-04 |
| SNPs and indels | WC | IN | 8.36E-05 | 5.45E-05 | 1.08E-04 |
| SNPs and indels | WC | RG | 6.71E-05 | 6.24E-05 | 7.13E-05 |
| Non synonymous SNPs | DROM | MHC | 1.72E-04 | −7.09E-05 | 3.22E-04 |
| Non synonymous SNPs | DROM | AD | 1.58E-05 | −8.83E-06 | 2.80E-05 |
| Non synonymous SNPs | DROM | IN | 4.79E-06 | 1.29E-06 | 7.42E-06 |
| Non synonymous SNPs | DROM | RG | 1.28E-05 | 1.13E-05 | 1.42E-05 |
| Non synonymous SNPs | DC | MHC | 2.07E-04 | 6.94E-05 | 3.27E-04 |
| Non synonymous SNPs | DC | AD | 2.63E-05 | 1.04E-05 | 3.80E-05 |
| Non synonymous SNPs | DC | IN | 2.26E-05 | 9.31E-06 | 3.17E-05 |
| Non synonymous SNPs | DC | RG | 2.97E-05 | 2.70E-05 | 3.25E-05 |
| Non synonymous SNPs | WC | MHC | 7.23E-05 | −1.52E-05 | 1.45E-04 |
| Non synonymous SNPs | WC | AD | 2.61E-05 | −2.17E-06 | 4.37E-05 |
| Non synonymous SNPs | WC | IN | 1.23E-05 | 4.52E-06 | 1.87E-05 |
| Non synonymous SNPs | WC | RG | 1.72E-05 | 1.45E-05 | 1.99E-05 |
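
The idea of a bootstrap confidence interval for a mean can be sketched as follows. This is a plain percentile bootstrap on hypothetical per-gene diversity values, and it is an assumption that it resembles the authors' procedure: a percentile interval on non-negative data cannot go below zero, whereas several lower limits in the table are negative, so the interval type described in the Methods presumably differs. `bootstrap_mean_ci` is an illustrative name.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[round((alpha / 2) * n_boot)]
    upper = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical per-gene nucleotide diversity values (NOT data from the paper).
toy_pi = [0.0, 0.0, 1e-5, 2e-5, 3e-5, 5e-5, 6e-5, 8e-5,
          1.2e-4, 2.0e-4, 3.1e-4, 9.0e-4]

mean, lower, upper = bootstrap_mean_ci(toy_pi)
print(f"mean={mean:.2e}  95% CL=({lower:.2e}, {upper:.2e})")
```

Two gene groups would then be called different, in the sense used for Fig. 3, when their intervals do not overlap. Small groups such as MHC tend to give wide intervals, which is consistent with the wide limits reported for MHC in the table.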
Assembly statistics for CamDro2; CamDro3 (Pilon), polished with one round of Pilon [51]; and CamDro3 (BBMap), polished with one round of variant calling with BBMap (https://sourceforge.net/projects/bbmap/). Note that CamDro3 (BBMap) was chosen over CamDro3 (Pilon) as the final version of CamDro3 because of its better BUSCO and RNA-Seq mapping percentages
| Assembly statistic | CamDro2 | CamDro3 (Pilon) | CamDro3 (BBMap) |
|---|---|---|---|
| Total size | 2,154,386,959 | 2,194,229,671 | 2,169,346,739 |
| Gap length | 20,603,579 | 17,930,821 | 17,043,352 |
| Scaffolds: number | 23,439 | 21,070 | 21,070 |
| Scaffolds: longest | 124,992,380 | 125,472,505 | 124,715,342 |
| Scaffolds: N90 (a) | 4,922,612 | 25,062,887 | 24,767,672 |
| Scaffolds: L90 (b) | 31 | 32 | 32 |
| Scaffolds: N50 (a) | 75,021,453 | 70,557,636 | 70,369,702 |
| Scaffolds: L50 (b) | 11 | 12 | 11 |
| Contigs (c): number | 45,969 | 41,934 | 53,085 |
| Contigs (c): longest | 9,490,880 | 14,412,615 | 2,012,572 |
| Contigs (c): N90 | 177,587 | 202,272 | 49,444 |
| Contigs (c): L90 | 1944 | 1436 | 10,023 |
| Contigs (c): N50 | 1,333,162 | 1,961,815 | 236,380 |
| Contigs (c): L50 | 423 | 303 | 2637 |
| Single-copy BUSCOs (d) | 3851 | 3853 | 3852 |
| Duplicated BUSCOs | 24 | 23 | 25 |
| Fragmented BUSCOs | 133 | 132 | 134 |
| Missing BUSCOs | 96 | 96 | 93 |
| RNA-Seq mapping percentage (e) | 88.30 | 90.36 | 92.04 |
(a) N90/N50 are the scaffold or contig lengths such that the sum of the lengths of all scaffolds or contigs of this size or larger is equal to 90%/50% of the total assembly length
(b) L90/L50 are the smallest number of scaffolds or contigs that make up at least 90%/50% of the total assembly length
(c) Using minimum gap length of 25 bp
(d) BUSCOs: Benchmarking Universal Single-Copy Orthologs [19] are mammalian BUSCOs from OrthoDB v. 9.1 genes [20]
(e) Overall mapping rates using HiSat v. 2.1.0 [53] of dromedary RNA-Seq reads from Sequence Read Archive accession: SRP017619 and Alim et al. [54]