Arnab Ghosh1, Matthew G Johnson1, Austin B Osmanski1, Swarnali Louha2, Natalia J Bayona-Vásquez2, Travis C Glenn2, Jaime Gongora3, Richard E Green4, Sally Isberg3,5, Richard D Stevens6, David A Ray1.
Abstract
Crocodilians are an economically, culturally, and biologically important group. To improve researchers' ability to study genome structure, evolution, and gene regulation in the clade, we generated a high-quality de novo genome assembly of the saltwater crocodile, Crocodylus porosus, from Illumina short-read data from genomic libraries and in vitro proximity-ligation libraries. The assembled genome is 2,123.5 Mb, with an N50 scaffold size of 17.7 Mb and an N90 scaffold size of 3.8 Mb. We then annotated this new assembly, increasing the number of annotated genes by 74%. In total, 96% of 23,242 annotated genes were associated with a functional protein domain. Furthermore, multiple noncoding functional regions and mappable genetic markers were identified. By overlapping the results of branch length estimation and site selection tests, we identified 16 genes putatively under positive selection in crocodilians: 10 in C. porosus and 6 in Alligator mississippiensis. The annotated C. porosus genome will serve as an important platform for osmoregulatory, physiological, and sex determination studies, as well as an important reference for investigating the phylogenetic relationships of crocodilians, birds, and other tetrapods.
Keywords: Crocodylus porosus; evolution; selection
Year: 2020 PMID: 31821505 PMCID: PMC6946029 DOI: 10.1093/gbe/evz269
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Quality Statistics for Available Assemblies of C. porosus, including Our Draft and the Current HiRise Assembly
| | Cpor_2.0 (AllPaths-LG; GCA_000768395) | CroPor_comp1 (Ragout; GCA_001723895) | Cpor_3.0 (Chicago-HiRise; MDVP00000000) |
|---|---|---|---|
| Total length (Mb) | 2,120.6 | 2,049.5 | 2,125.62 |
| Scaffold N50 (Mb) | 0.205 | 84.4 | 7.6 |
| Scaffold L50 | 2,891 | 7 | 87 |
| Scaffold N90 (Mb) | 0.051 | 18.28 | 1.8 |
| Scaffold L90 | 10,845 | 26 | 300 |
| Longest scaffold (Mb) | 2.117 | 270.7 | 33.35 |
| Number of scaffolds | 23,365 | 70 | 2,430 |
| Number of scaffolds >1 kb | 23,296 | 70 | 2,361 |
| Contig N50 (kb) | 32.7 | 34.1 | 32.9 |
| Contig L50 | 18,929 | 17,096 | 18,837 |
| Number of contigs | 112,407 | 97,109 | 112,088 |
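The N50/L50 and N90/L90 rows follow the usual definition: sort the sequences from longest to shortest and accumulate lengths until a given fraction of the total assembly length is reached. The following is a minimal sketch of that calculation, not the tool used by the authors; the function name `nx_lx` and the toy lengths are hypothetical.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of scaffold or contig lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover `fraction` of the assembly;
    Lx is the number of sequences in that set.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable through floating-point rounding when fraction is 1.
    return ordered[-1], len(ordered)


# Hypothetical toy assembly of five scaffolds (arbitrary units).
toy_lengths = [10, 8, 6, 4, 2]
n50, l50 = nx_lx(toy_lengths, 0.5)
n90, l90 = nx_lx(toy_lengths, 0.9)
print(n50, l50, n90, l90)
```

In this toy case the two longest scaffolds already cover half of the total length, so N50 is the length of the second scaffold and L50 is 2. A larger N50 together with a smaller L50 indicates a more contiguous assembly.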
BUSCO Summary Stats When Searching for 3,950 Orthologous Genes from Tetrapods
| | Cpor_2.0 | CroPor_comp1 | Cpor_3.0 |
|---|---|---|---|
| Total BUSCOs (comp + frag) | 3,723 (94.3) | 3,682 (93.2) | 3,808 (96.4) |
| Complete single copy | 3,338 (84.5) | 3,435 (87.0) | 3,535 (89.5) |
| Complete duplicated | 22 (0.6) | 20 (0.5) | 23 (0.6) |
| Total complete BUSCOs | 3,360 (85.1) | 3,455 (87.5) | 3,558 (90.1) |
| Fragmented BUSCOs | 385 (9.7) | 247 (6.3) | 250 (6.3) |
| Missing BUSCOs | 205 (5.2) | 248 (6.2) | 142 (3.6) |
Note.—Percentages of genes relative to the total in the database are given in parentheses.
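As a worked example of the percentages in parentheses, the sketch below recomputes the Cpor_3.0 column from its complete, fragmented, and missing counts relative to the 3,950 BUSCOs searched. The function name `busco_percentages` is hypothetical and is not part of the BUSCO software.

```python
from typing import Dict

TOTAL_BUSCOS = 3950  # size of the tetrapod ortholog set given in the table title

# Counts copied from the Cpor_3.0 column of the table above.
cpor_3_0 = {"complete": 3558, "fragmented": 250, "missing": 142}


def busco_percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Express each BUSCO category as a percentage of the searched set."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = busco_percentages(cpor_3_0, TOTAL_BUSCOS)
print(percentages)  # {'complete': 90.1, 'fragmented': 6.3, 'missing': 3.6}
```

The three categories are mutually exclusive, so their counts sum to the size of the ortholog set and their percentages sum to 100.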
Fig. 1.—Synteny analyses between our Chicago-HiRise assembly and the highly contiguous Ragout assembly from Rice et al. (2017). (A) Jupiter plot of correspondence between assemblies considering the total length of both reference and query genomes. (B) Dot plot (MUMmer plot) of the percent identity in the alignment generated by MUMmer. The blue dots along the slope demonstrate that both assemblies are highly colinear. Blue dots represent forward matches and purple dots represent reverse matches.
Fig. 2.—Total number of unique genes in the Crocodylus porosus genome assembly, shown as a percentage at each corresponding AED score, as analyzed by the MAKER2 pipeline.
Fig. 3.—Histogram of the branch length ratio of Alligator mississippiensis and Crocodylus porosus with chicken as the outgroup. The two tails of the histogram correspond to the 2.5% of the genes in A. mississippiensis and C. porosus, respectively, that are under potential selection. Vertical lines indicate the 2.5% cutoff limits in the histogram.
Fig. 4.—Histogram of dN/dS values for all genes of Crocodylus porosus using the M0 model with Alligator mississippiensis. As expected, most of the 2,357 single-copy orthologous genes are under purifying selection.
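The 2.5% tail cutoffs described for the branch length ratio histogram can be illustrated as follows. This is only a sketch under stated assumptions: the function name `tail_genes`, the gene identifiers, and the ratio values are hypothetical, and which tail corresponds to which species depends on how the ratio is defined, which the caption does not state.

```python
import math
from typing import Dict, Set, Tuple


def tail_genes(ratios: Dict[str, float], tail: float = 0.025) -> Tuple[Set[str], Set[str]]:
    """Return the gene names in the lower and upper `tail` fraction of ratios."""
    if not 0 < tail < 0.5:
        raise ValueError("tail must be between 0 and 0.5")
    # Sort gene names by their branch length ratio, smallest first.
    names = sorted(ratios, key=ratios.get)
    # Number of genes per tail; the small epsilon guards against float rounding.
    k = math.floor(len(names) * tail + 1e-9)
    if k == 0:
        return set(), set()
    return set(names[:k]), set(names[-k:])


# Hypothetical input: 40 genes with made-up ratios 1.0, 2.0, ..., 40.0.
toy_ratios = {f"gene{i}": float(i) for i in range(1, 41)}
lower, upper = tail_genes(toy_ratios)
print(sorted(lower), sorted(upper))
```

With 40 genes, each 2.5% tail contains exactly one gene. Candidates from such tails can then be intersected with the results of a separate site selection test, which is the kind of overlap the abstract describes.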
List of 16 Genes under Potential Selection (and Overlap of Two Selection Tests) in C. porosus and A. mississippiensis
| Query | Species | Abbreviation | Annotation |
|---|---|---|---|
| amisp005461 | A. mississippiensis | CHST7 | Carbohydrate sulfotransferase 7 |
| amisp005516 | A. mississippiensis | HOXC13 | Homeobox protein Hox-C13a-like |
| amisp016775 | A. mississippiensis | IFT122 | Intraflagellar transport protein 122 homolog isoform X1 |
| amisp017613 | A. mississippiensis | JMJD4-1 | jmjC domain-containing protein 4 |
| amisp032461 | A. mississippiensis | RNF126 | E3 ubiquitin-protein ligase RNF126 |
| amisp034033 | A. mississippiensis | POLDIP3 | Polymerase delta-interacting protein 3 |
| cPor_01965-RA | C. porosus | Zinc finger protein 143 | |
| cPor_06982-RA | C. porosus | GALK1 | Galactokinase |
| cPor_09447-RA | C. porosus | CTBP2 | C-terminal-binding protein 2-like isoform X1; belongs to the |
| cPor_11586-RA | C. porosus | KCNK10 | Potassium channel subfamily K member 10; belongs to the two pore domain potassium channel (TC 1.A.1.8) family |
| cPor_15403-RA | C. porosus | TLX1-1 | T-cell leukemia homeobox protein 1 |
| cPor_15737-RA | C. porosus | SLC25A17 | Peroxisomal membrane protein PMP34; belongs to the mitochondrial carrier (TC 2.A.29) family |
| cPor_15755-RA | C. porosus | NPTXR | Neuronal pentraxin receptor |
| cPor_18091-RA | C. porosus | GRAMD2 | GRAM domain-containing protein 2 |
| cPor_18867-RA | C. porosus | HNRNPLL | Heterogeneous nuclear ribonucleoprotein L like |
| cPor_19471-RA | C. porosus | UBXN2B | UBX domain-containing protein 2B |
Fig. 5.—Gene networking pathways for the 16 genes under potential selection in Crocodylus porosus and Alligator mississippiensis. Analysis was performed in REACTOME (v. 69) with Gallus gallus and Homo sapiens as the ortholog comparison species. The networking pathways signify interacting genes and pathways as predicted from the 16 input genes. The yellow color gradient (intensity) corresponds to the probability of overlap of the query genes with the gene networking pathways on the REACTOME server. Darker colors signify a higher probability of overlap (closer to P = 0.05), whereas a lighter yellow signifies a lower probability of overlap with a networking pathway (P = 0).