Aundrea K Westfall, Rory S Telemeco, Mariana B Grizante, Damien S Waits, Amanda D Clark, Dasia Y Simpson, Randy L Klabacka, Alexis P Sullivan, George H Perry, Michael W Sears, Christian L Cox, Robert M Cox, Matthew E Gifford, Henry B John-Alder, Tracy Langkilde, Michael J Angilletta, Adam D Leaché, Marc Tollis, Kenro Kusumi, Tonia S Schwartz.
Abstract
BACKGROUND: High-quality genomic resources facilitate investigations into behavioral ecology, morphological and physiological adaptations, and the evolution of genomic architecture. Lizards in the genus Sceloporus have a long history as important ecological, evolutionary, and physiological models, making them a valuable target for the development of genomic resources.
Keywords: genome; reptile; squamate; transcriptome
Year: 2021 PMID: 34599334 PMCID: PMC8486681 DOI: 10.1093/gigascience/giab066
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Summary statistics across genome assemblies for Sceloporus undulatus
| Metric | Supernova Assembly (10X Chromium) | HiRise Assembly (10X Chromium + Hi-C) | PBJelly Assembly (SceUnd1.0) (10X Chromium + Hi-C + PacBio) |
|---|---|---|---|
| Coverage | 46× | 4,859× | 4,859× |
| Contig N50 | 0.049 Mb | 0.073 Mb | 0.193 Mb |
| Scaffold N50 | 2.55 Mb | 265.4 Mb | 275.6 Mb |
| Scaffold N90 | 0.241 Mb | 35.4 Mb | 37.1 Mb |
| Scaffold L50 | 218 Scaffolds | 3 Scaffolds | 3 Scaffolds |
| Scaffold L90 | 987 Scaffolds | 9 Scaffolds | 9 Scaffolds |
| Tetrapoda BUSCO (n = 3,950) | | | |
| On whole genome | 89.5% Complete; 6.4% Fragmented; 4.1% Missing | 90.2% Complete; 5.5% Fragmented; 4.3% Missing | 90.9% Complete; 5.0% Fragmented; 4.1% Missing |
| On top 24 scaffolds | 90.7% Complete; 4.9% Fragmented; 4.4% Missing | | |
| On predicted proteins from top 24 scaffolds | 79.1% Complete; 13.7% Fragmented; 7.2% Missing | | |
| Assembly size | 1.61 Gb | 1.836 Gb | 1.9056 Gb with gaps; 1.8586 Gb without gaps |
| Annotation: 21,050 of our predicted proteins had hits in ENSEMBL | | | |
N50 (N90): The contig or scaffold length such that the sum of the lengths of all scaffolds of this size or larger is equal to 50% (90%) of the total assembly length; L50 (L90): The smallest number of scaffolds that make up 50% (90%) of the total assembly length.
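The N50 and L50 definitions above can be made concrete with a short calculation. The following is a minimal Python sketch, not the pipeline used to produce the table; the function name `nx_lx` and the toy scaffold lengths are illustrative assumptions.

```python
def nx_lx(lengths, percent=50):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least `percent` of the
    total assembly length; Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


# Toy example with hypothetical lengths (arbitrary units); total = 300.
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 50)
n90, l90 = nx_lx(toy, 90)
```

With these toy lengths, the two longest sequences (80 + 70) reach half of the total, so N50 is 70 and L50 is 2; five sequences are needed to reach 90%, so N90 is 30 and L90 is 5.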
Figure 1: Adult male Sceloporus undulatus (eastern fence lizard) from Andalusia, AL, pictured outside of Samford Hall at Auburn University: (a) profile, (b) ventral, and (c) dorsal view. This specimen was used for genome sequencing at Dovetail Genomics. Photo credit: R. Telemeco.
Figure 2: An evaluation of Sceloporus undulatus genome assembly quality. (a) Comparison of the contiguity of the 3 S. undulatus genome assemblies (fence lizard) relative to other squamate genome assemblies based on the log10 of the scaffold length. The X axis is the N(x) with the N50 and the N90 emphasized with a vertical line, representing the scaffold size that contains 50% or 90% of the data, respectively. The legend lists the assemblies in the order of the lines from most contiguous (top) to least contiguous (bottom). Note the Fence Lizard PBJelly (dark blue, SceUnd1.0) and Fence Lizard HiRise (light green) assemblies are the second and third from the top and are nearly indistinguishable. (b–d) Scaffold size distribution of SceUnd1.0 and the number of BUSCO genes that mapped to each scaffold. (b) The length of the first 24 scaffolds, where the first 11 scaffolds likely represent the haploid N = 11 chromosomes (6 macrochromosomes and 5 microchromosomes). The numbers above each bar represent scaffold length to the nearest Mb. The number of BUSCO genes that mapped to each scaffold based on (c) the genome assembly, and (d) the predicted proteins from the annotation. The 11 large scaffolds inferred to correspond to chromosomes have many unique and complete BUSCO genes (green), whereas the smaller contigs have duplicated BUSCOs (purple), suggesting that they are the result of reads not mapping correctly to the chromosomes.
Sceloporus undulatus de novo transcriptome assembly statistics
| Assembly | 1 Tissue [51] | 3 Tissues | 4 Tissues |
|---|---|---|---|
| Total of Trinity transcripts | 158,323 | 492,249 | 547,370 |
| Total of Trinity "genes" | 138,031 | 422,687 | 467,658 |
| GC% | 43.81 | 42.85 | 42.76 |
| Contig N50 | 1,720 | 1,648 | 1,438 |
| Contig E90N50 | 2,254 | 2,640 | 2,550 |
| Mean contig length (bp) | 833.0 | 822.4 | 781.5 |
| Transcripts with the longest ORFs | 86,630 (54.7%) | 212,172 (43.1%) | 217,756 (39.8%) |
The 4 tissues comprise 3 tissues first reported in this study (brain, skeletal muscle, and embryos) from gravid females collected in Edgefield County, SC, plus liver tissue previously reported by McGaugh et al. 2015 [51].
BUSCO results for transcriptomes of 2 lizard species
| Parameter | S. undulatus, 1 tissue | S. undulatus, 3 tissues | S. undulatus, 4 tissues | A. carolinensis, 14 tissues |
|---|---|---|---|---|
| Complete genes (%) | 72.5 | 91.7 | 92.3 | 96.7 |
| Duplicated genes (%) | 25 | 43.8 | 43.9 | 37.9 |
| Fragmented genes (%) | 9.2 | 4.8 | 4.8 | 1.1 |
| Missing genes (%) | 18.3 | 3.5 | 2.9 | 2.2 |
| Reference | McGaugh et al. 2015 [51] | This study | This study | Eckalbar et al. 2013 [59] |
For Sceloporus undulatus, the 4 tissues are the 3 tissues (brain, skeletal muscle, and embryos) first reported here with the addition of 1 tissue (liver) from McGaugh et al. 2015 [51]. For Anolis carolinensis, see Eckalbar et al. 2013 [59] for the complete list of tissues used.
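The BUSCO percentages above are easier to read once one keeps in mind that, in standard BUSCO reporting, duplicated genes are a subset of complete genes, so complete, fragmented, and missing should sum to 100%. The sketch below is an illustrative consistency check on the table values, not part of the original analysis; the dictionary layout and the `summarize` helper are assumptions made for the example.

```python
# Percentages copied from the BUSCO table above.
busco = {
    "S. undulatus, 1 tissue": {"complete": 72.5, "duplicated": 25.0, "fragmented": 9.2, "missing": 18.3},
    "S. undulatus, 3 tissues": {"complete": 91.7, "duplicated": 43.8, "fragmented": 4.8, "missing": 3.5},
    "S. undulatus, 4 tissues": {"complete": 92.3, "duplicated": 43.9, "fragmented": 4.8, "missing": 2.9},
    "A. carolinensis, 14 tissues": {"complete": 96.7, "duplicated": 37.9, "fragmented": 1.1, "missing": 2.2},
}


def summarize(entry):
    """Return (single-copy %, complete + fragmented + missing %)."""
    # Complete = single-copy + duplicated, so single-copy is the difference.
    single_copy = round(entry["complete"] - entry["duplicated"], 1)
    total = round(entry["complete"] + entry["fragmented"] + entry["missing"], 1)
    return single_copy, total


summary = {name: summarize(values) for name, values in busco.items()}
for name, (single_copy, total) in summary.items():
    print(f"{name}: single-copy {single_copy}%, C+F+M = {total}%")
```

Read this way, roughly half of the complete BUSCOs in each S. undulatus transcriptome are present as a single copy, with the remainder counted as duplicated.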
Annotation of Sceloporus undulatus de novo transcriptome assembly using 4 tissues
| Annotation | Value |
|---|---|
| Annotated genes | 467,658 |
| Annotated transcript isoforms | 547,370 |
| Annotated isoforms/genes | 1.17 |
| Transcripts with Swiss-Prot annotation | (71,944) |
| Transcripts with PFAM annotation | 51,018 (46,432) |
| Transcripts with KEGG annotation | 65,694 (21,520) |
| Transcripts with GO annotation | 73,936 (66,554) |
Parentheses indicate unique annotation numbers.
RNAseq datasets used for training the genome annotation pipeline
| Dataset | Data type | NCBI SRA Accession No. | Tissue | Age | Sex | Treatment/Condition |
|---|---|---|---|---|---|---|
| 1. This article | 100 bp PE | SAMN06312743 | Skeletal muscle | Adult | Female | Post-reproductive |
| | | SAMN06312741 | Brain | Adult | Female | Post-reproductive |
| | | SAMN06312742 | Whole embryo | Embryo | N/A | |
| 2. McGaugh et al. 2015 [51] | 100 bp PE | SRR629640 | Liver | Juvenile | Male | Control lab |
| 3. Cox et al. submitted for publication | 125 bp PE | SAMN14774299–321 | Liver | Juvenile | Female | Blank |
| | | | | | Male | Castrated |
| | | | | | Male | Control |
| | | | | | Female | Testosterone |
| | | | | | Male | Testosterone |
| 4. Simpson et al. in preparation | 150 bp PE | SAMN08687228–45 | Liver | Adult | Male | Control lab |
| | | | | | | Acute heat stress |
| | | | | | | Fire ant bitten |
Datasets 1 and 2 were also used in the de novo transcriptome assembly. Data are accessible through NCBI BioProjects: 1. PRJNA371829; 3. PRJNA629371; 4. PRJNA437943.
Figure 3: Age distributions of the major repetitive elements found in the Anolis carolinensis (AnoCar2.0) and Sceloporus undulatus (SceUnd1.0) genome assemblies. The repeat landscapes depict the relative abundance of repeat types in the genome vs their Kimura divergence from their consensus. DNA: DNA transposons; LINE: long interspersed nuclear element; LTR: long terminal repeat retrotransposon; RC: rolling circle Helitron; SINE: short interspersed nuclear element.
Comparison of each genome assembly type as a reference for population-level analyses for RNAseq and WGS of Sceloporus undulatus individuals from Alabama (AL, either low or high coverage), Tennessee (TN), and Arkansas (AR)
| Assembly | Parameter | RNAseq, AL | Low coverage, AL | High coverage, AL | High coverage, TN | High coverage, AR |
|---|---|---|---|---|---|---|
| SuperNova | QC-passed reads | 3.28E7 ± 6.83E6 | 5.11E7 ± 3.36E7 | 3.33E8 ± 2.66E7 | 3.47E8 ± 9.39E7 | 3.33E8 ± 6.14E7 |
| | Reads mapped, No. | 2.68E7 ± 6.19E6 | 5.07E7 ± 3.34E7 | 3.30E8 ± 2.65E7 | 3.43E8 ± 9.13E7 | 3.23E8 ± 6.69E7 |
| | Reads mapped, % | 81.49 ± 0.09 | 99.29 ± 0.11 | 99.29 ± 0.08 | 98.80 ± 0.60 | 96.84 ± 4.75 |
| | Whole-genome coverage (×) | NA | 3.56 ± 2.95 | 23.02 ± 10.52 | 23.33 ± 11.25 | 22.27 ± 10.81 |
| | HET SNP sensitivity | NA | 0.58 | 0.93 | 0.91 | 0.91 |
| HiRise | QC-passed reads | 3.30E7 ± 6.86E6 | 5.11E7 ± 3.36E7 | 3.33E8 ± 2.66E7 | 3.47E8 ± 9.39E7 | 3.33E8 ± 6.14E7 |
| | Reads mapped, No. | 2.71E7 ± 6.30E6 | 5.07E7 ± 3.34E7 | 3.30E8 ± 2.65E7 | 3.43E8 ± 9.13E7 | 3.23E8 ± 6.69E7 |
| | Reads mapped, % | 82.37 ± 0.09 | 99.29 ± 0.11 | 99.29 ± 0.08 | 98.80 ± 0.60 | 96.84 ± 4.75 |
| | Whole-genome coverage (×) | NA | 3.56 ± 2.95 | 23.02 ± 10.52 | 23.33 ± 11.25 | 22.27 ± 10.81 |
| | HET SNP sensitivity | NA | 0.58 | 0.93 | 0.91 | 0.91 |
| PBJelly | QC-passed reads | 3.29E7 ± 6.84E6 | 5.09E7 ± 3.35E7 | 3.31E8 ± 2.64E7 | 3.45E8 ± 9.29E7 | 3.31E8 ± 6.09E7 |
| | Reads mapped, No. | 2.71E7 ± 6.25E6 | 5.06E7 ± 3.33E7 | 3.29E8 ± 2.63E7 | 3.41E8 ± 9.05E7 | 3.22E8 ± 6.66E7 |
| | Reads mapped, % | 82.28 ± 0.09 | 99.46 ± 0.11 | 99.47 ± 0.08 | 98.97 ± 0.61 | 97.00 ± 4.78 |
| | Whole-genome coverage (×) | NA | 3.36 ± 2.97 | 21.75 ± 11.46 | 22.04 ± 12.14 | 21.04 ± 11.64 |
| | HET SNP sensitivity | NA | 0.55 | 0.88 | 0.87 | 0.86 |
Datasets were mapped to either the SuperNova Assembly containing only the 10X Genomics Chromium data, the HiRise Assembly containing 10X Genomics Chromium and Hi-C data, or the PBJelly assembly (SceUnd1.0) containing 10X Genomics Chromium, Hi-C, and PacBio data. Mean SAMTOOLS QC-passed reads, reads mapped, and percentage of mapped QC-passed reads for every sequencing depth and population are shown along with mean whole-genome coverage and theoretical HET SNP sensitivity for every assembly and population. Data are available in NCBI BioProject: PRJNA656311.
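One detail worth keeping in mind when reading the table: dividing the mean number of mapped reads by the mean number of QC-passed reads does not reproduce the reported mean percentage exactly (for the SuperNova RNAseq column, 2.68E7 / 3.28E7 is about 81.7%, against a reported 81.49%). A plausible explanation, stated here as an assumption rather than something the source confirms, is that the percentage was computed per sample and then averaged. The sketch below illustrates the difference with hypothetical read counts; the function name `mapping_summary` and the numbers are invented for the example.

```python
from statistics import mean


def mapping_summary(samples):
    """Summarize mapping rate for (qc_passed_reads, mapped_reads) pairs.

    Returns (mean of per-sample percentages, pooled percentage).
    """
    per_sample = [100 * mapped / qc for qc, mapped in samples]
    pooled = 100 * sum(mapped for _, mapped in samples) / sum(qc for qc, _ in samples)
    return mean(per_sample), pooled


# Hypothetical read counts for two samples of unequal depth.
samples = [(100, 80), (300, 270)]
mean_pct, pooled_pct = mapping_summary(samples)
print(f"mean of per-sample rates: {mean_pct:.1f}%, pooled rate: {pooled_pct:.1f}%")
```

The two summaries agree only when every sample has the same mapping rate or the same depth, so a small gap between them is expected for libraries of uneven size.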
Figure 4: Relationship between divergence time and effectiveness of using the Sceloporus undulatus assembly for reference-based mapping. (a) Phylogenetic relationships and divergence times of selected Sceloporus species, according to Leaché et al. [99]. For the purpose of illustration, only the species used in our analysis are shown. (b) Relationship between percent reads mapped to the S. undulatus reference genome (SceUnd1.0) and time of divergence from S. undulatus with a linear regression. The color of the dots represents the percent of the genome that is covered, which was affected by the number of redundant sequences in the reduced representation library for a particular species.
Sceloporus species with partial genomic sequence assemblies updated using SceUnd1.0 as a reference
| Species | SRA Accession | Gigabases | Original: Coverage | Original: BUSCO Comp | Original: BUSCO Frag | Reference-based: Mapped, % | Reference-based: Coverage, % | Reference-based: BUSCO Comp, % | Reference-based: BUSCO Frag, % |
|---|---|---|---|---|---|---|---|---|---|
| | SRX545583 | 40.88 | 61.01 | 16.2 | 32.8 | 96.59 | 88.68 | 90.2 | 5.7 |
| | SRX542351 | 6.14 | 0.88 | 0 | 0 | 94.18 | 63.2 | 25.8 | 23.3 |
| | SRX542352 | 5.9 | 1.18 | 0.1 | 1.1 | 74.73 | 46.43 | 33.0 | 27.7 |
| | SRX542353 | 5.1 | 1.74 | 0.2 | 1.6 | 92.52 | 42.26 | 7.0 | 19.5 |
| | SRX542354 | 7.96 | 1.38 | 0.2 | 1.2 | 75.11 | 46.47 | 31.7 | 31.1 |
| | SRX542380 | 3.92 | 0.08 | 0.0 | 0.0 | 86.84 | 15.71 | 0.8 | 3.0 |
| | SRX542355 | 4.93 | 3.78 | 0.2 | 3.1 | 97.88 | 60.17 | 13.7 | 21.6 |
| | SRX542356 | 4.57 | 1.37 | 0.1 | 1.4 | 95.94 | 58.21 | 13.8 | 20.8 |
| | SRX542357 | 3.57 | 0.04 | 1.7 | 0.3 | 80.2 | 52.16 | 6.0 | 16.3 |
| | SRX542358 | 6.5 | 1.81 | 0.1 | 1.7 | 96.19 | 70.49 | 39.1 | 27.1 |
| | SRX542359 | 5.82 | 1.06 | 0.2 | 0.9 | 87.34 | 40.13 | 4.4 | 14.8 |
| | SRX542383 | 4.53 | NA | 0.1 | 0.4 | 84.72 | 7.13 | 0.1 | 0.4 |
| | SRX542360 | 4.76 | 1.81 | 0.1 | 1.7 | 92.92 | 52.8 | 12.2 | 20.7 |
| | SRX542361 | 3.74 | 0.17 | 0.2 | 0.9 | 95.92 | 37.49 | 1.6 | 7.0 |
| | SRX542362 | 4.42 | 1.14 | 1.8 | 0.9 | 83.3 | 38.41 | 2.8 | 10.6 |
| | SRX542363 | 6.96 | 1.5 | 0.0 | 0.0 | 88.12 | 56.49 | 34.4 | 31.0 |
| | SRX542364 | 3.38 | 0.95 | 1.4 | 1.0 | 93.31 | 36.81 | 2.1 | 9.1 |
| | SRX542365 | 3.5 | 0.8 | 1.7 | 0.7 | 84.26 | 31.74 | 1.2 | 5.6 |
| | SRX542384 | 4.55 | 0.11 | 0.1 | 0.4 | 91.15 | 22.27 | 0.9 | 4.2 |
| | SRX542366 | 5.54 | 1.25 | 0.2 | 1.4 | 94.23 | 60.02 | 20.9 | 25.3 |
| | SRX542367 | 6.63 | 1.57 | 0.3 | 2.5 | 78.84 | 46.78 | 17.6 | 21.6 |
| | SRX542368 | 3.14 | 1.11 | 1.2 | 0.9 | 95.38 | 35.89 | 1.4 | 8.2 |
| | SRX542369 | 3.88 | 0.99 | 1.8 | 0.9 | 81.14 | 35.79 | 1.9 | 8.8 |
| | SRX542370 | 6.59 | 1.58 | 0.1 | 1.5 | 90.49 | 42.11 | 3.4 | 11.3 |
| | SRX542371 | 6.56 | 1.04 | 0.2 | 1.8 | 89.93 | 65.53 | 47.0 | 24.9 |
| | SRX542373 | 4.75 | 1.18 | 0.1 | 0.8 | 77.35 | 39.47 | 7.7 | 16.8 |
| | SRX542374 | 5.91 | 1.51 | 0.1 | 1.1 | 96.8 | 69.15 | 36.0 | 26.9 |
| | SRX542382 | 3.68 | 0.14 | 0.1 | 0.4 | 88.58 | 22.35 | 0.9 | 3.7 |
| | SRX542375 | 6.78 | 1.75 | 0.3 | 2.2 | 90.15 | 57.36 | 20.1 | 21.4 |
| | SRX542376 | 5.36 | 4.67 | 0.3 | 3.4 | 98.29 | 62.09 | 17.4 | 22.8 |
| | SRX542381 | 4.13 | 0.06 | 0.0 | 0.3 | 63.97 | 17.42 | 1.1 | 3.7 |
| | SRX542377 | 7.59 | 1.5 | 0.2 | 1.2 | 76.93 | 52.22 | 38.8 | 30.2 |
| | SRX542378 | 3.52 | 0.7 | 1.7 | 0.8 | 94.64 | 52.36 | 6.4 | 17.9 |
| | SRX542379 | 2.71 | 0.62 | 1.3 | 0.9 | 93.48 | 29.39 | 0.7 | 5.3 |
| Mean (excluding S. occidentalis) | | | 1.23 | | | | 44.4 | | |
Genomic resources for 34 of the species were obtained using reduced representation libraries [93], while 1 species, S. occidentalis, was sequenced using whole-genome shotgun sequencing [92]. The data were downloaded from the SRA (Study Accession SRP041983 [93]). Gigabases refer to the amount of sequence data for each library.
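The two values in the "Mean" row can be checked directly against the coverage columns of the table. The sketch below is an illustrative recomputation, not the authors' code. It assumes the excluded library is the first row (SRX545583, the only one with about 41 Gb of data) and that the "NA" entry is skipped; the helper name `mean_excluding_first` is made up for the example.

```python
from statistics import mean

# Coverage values copied from the table above, in row order.
# None marks the "NA" entry in the original-coverage column.
original_cov = [
    61.01, 0.88, 1.18, 1.74, 1.38, 0.08, 3.78, 1.37, 0.04, 1.81, 1.06, None,
    1.81, 0.17, 1.14, 1.5, 0.95, 0.8, 0.11, 1.25, 1.57, 1.11, 0.99, 1.58,
    1.04, 1.18, 1.51, 0.14, 1.75, 4.67, 0.06, 1.5, 0.7, 0.62,
]
reference_cov = [
    88.68, 63.2, 46.43, 42.26, 46.47, 15.71, 60.17, 58.21, 52.16, 70.49,
    40.13, 7.13, 52.8, 37.49, 38.41, 56.49, 36.81, 31.74, 22.27, 60.02,
    46.78, 35.89, 35.79, 42.11, 65.53, 39.47, 69.15, 22.35, 57.36, 62.09,
    17.42, 52.22, 52.36, 29.39,
]


def mean_excluding_first(values):
    """Mean of all entries after the first row, skipping missing values."""
    kept = [v for v in values[1:] if v is not None]
    return mean(kept)


orig_mean = mean_excluding_first(original_cov)
ref_mean = mean_excluding_first(reference_cov)
print(f"original coverage: {orig_mean:.2f}; reference-based coverage: {ref_mean:.1f}")
```

Under those assumptions the recomputed means round to 1.23 and 44.4, matching the table.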
Figure 5: Marker-based synteny painting of fence lizard (Sceloporus undulatus) scaffolds/chromosomes onto the tegu (Salvator merianae), green anole (Anolis carolinensis), and python (Python bivittatus) assemblies. The color indicates synteny for that scaffold. The linkage groups representing microchromosomes in the green anole are lettered and expanded to visualize the colors. The white areas did not have a high-confidence match between the anole and the fence lizard to paint. Putative sex chromosomes are indicated with uppercase letters.