Kun Zhou, Beibei Huang, Ming Zou, Dandan Lu, Shunping He, Guoxiu Wang.
Abstract
Two sets of lineage-specific genes (LSGs) were identified using BLAST: Caenorhabditis elegans species-specific genes (SSGs, 1423) and Caenorhabditis genus-specific genes (GSGs, 4539). The data contained in this article show that SSGs and GSGs differ significantly in their evolution and that most of them were formed by gene duplication and integration of transposable elements (TEs). Subsequent analysis of temporal expression and protein function shows that many SSGs and GSGs are expressed and that genes involved in sex determination, specific stress responses, immune response, and morphogenesis are the most represented. The data are related to the research article "Genome-wide identification of lineage-specific genes within Caenorhabditis elegans" in Journal of Genomics [1].
Year: 2015 PMID: 26442285 PMCID: PMC4552949 DOI: 10.1016/j.dib.2015.07.032
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Procedure for identifying lineage-specific genes within C. elegans. BLASTP was the primary tool in this pipeline, with an e-value threshold of less than 1e−5. C. elegans proteins are marked in orange, the proteins of species other than C. elegans in green, and the resulting SSGs, GSGs, and ECs (evolutionarily conserved genes) in blue. "Hit" indicates that a C. elegans protein has at least one BLASTP hit, whereas "No hit" indicates that a C. elegans protein has no hit.
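The hit/no-hit logic in the caption can be illustrated with a minimal sketch. It assumes a simplified reading of the pipeline: a protein with no qualifying hit in any other species is an SSG, one with hits only in other Caenorhabditis species is a GSG, and one with any hit outside the genus is an EC. The function name `classify_genes`, the `(gene, species, e-value)` tuple format, and the example species are hypothetical and are not taken from the article; the actual pipeline in Fig. 1 may contain additional steps.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the Fig. 1 caption


def classify_genes(all_genes, hits, genus_species):
    """Label each C. elegans gene as "SSG", "GSG", or "EC".

    all_genes:     iterable of C. elegans gene IDs
    hits:          iterable of (gene_id, subject_species, e_value) tuples
                   (hypothetical format, e.g. parsed from BLASTP tabular output)
    genus_species: set of non-elegans Caenorhabditis species names
    """
    # Collect, per gene, the species in which it has a qualifying hit.
    species_hit = defaultdict(set)
    for gene, species, e_value in hits:
        if e_value < E_VALUE_CUTOFF:
            species_hit[gene].add(species)

    classes = {}
    for gene in all_genes:
        matched = species_hit.get(gene, set())
        if not matched:
            classes[gene] = "SSG"   # no hit outside C. elegans
        elif matched <= genus_species:
            classes[gene] = "GSG"   # hits confined to the genus
        else:
            classes[gene] = "EC"    # at least one hit outside the genus
    return classes


# Illustrative input only; these are not data from the article.
example_hits = [
    ("g2", "C. briggsae", 1e-20),
    ("g3", "C. briggsae", 1e-30),
    ("g3", "D. melanogaster", 1e-8),
    ("g4", "D. melanogaster", 1e-3),  # above the cutoff, so ignored
]
example = classify_genes(["g1", "g2", "g3", "g4"], example_hits,
                         {"C. briggsae", "C. remanei"})
```

Grouping hits by gene first keeps the classification step a simple set comparison, so the cutoff and the genus definition can each be changed in one place.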
Characteristics of SSGs, GSGs, and ECs.
| 1057.39±30.22 | 142.47±3.02 | 3.12±0.06 | 37.84±0.15 | 32.04 | |
| 1828.92±37.48 | 282.21±2.97 | 4.45±0.04 | 36.93±0.06 | 58.69 | |
| 4744.42±41.00 | 500.43±3.80 | 7.45±0.04 | 37.07±0.03 | 80.97 |
Categories of C. elegans LSGs.
| Mechanism of formation | SSGs | GSGs |
|---|---|---|
| Exaptation from TEs | 374 (26.28%) | 1588 (34.99%) |
| Gene duplication | 355 (24.95%) | 2527 (55.67%) |
| Exaptation from TEs and gene duplication | 107 (7.52%) | 887 (19.54%) |
| Total | 622 (43.71%) | 3228 (71.12%) |
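The "Total" row is smaller than the sum of the first two rows, which is consistent with the third row being the overlap between them. A short check of that arithmetic, assuming the total counts genes explained by at least one of the two mechanisms and using the SSG and GSG set sizes from the abstract (the helper names `union_count` and `percent` are illustrative):

```python
SSG_TOTAL = 1423  # number of SSGs reported in the abstract
GSG_TOTAL = 4539  # number of GSGs reported in the abstract


def union_count(from_tes, from_duplication, from_both):
    """Genes explained by at least one mechanism (inclusion-exclusion)."""
    return from_tes + from_duplication - from_both


def percent(count, total):
    """Share of `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


ssg_explained = union_count(374, 355, 107)    # 622
gsg_explained = union_count(1588, 2527, 887)  # 3228

print(ssg_explained, percent(ssg_explained, SSG_TOTAL))  # 622 43.71
print(gsg_explained, percent(gsg_explained, GSG_TOTAL))  # 3228 71.12
```

Under this reading, the remaining 801 SSGs and 1311 GSGs are not assigned to either mechanism in the table.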
Fig. 2. The proportion of LSGs expressed at different developmental stages. The vertical axis shows the proportion of genes supported by reads, and the horizontal axis shows the developmental stages.
Fig. 3. Results of RT-PCR. The first lane contains the trans2k DNA Marker, and the remaining lanes contain the 16 LSGs WBGene00007393, WBGene00018297, WBGene00077563, WBGene00017986, WBGene00013778, WBGene00045297, WBGene00015229, WBGene00013713, WBGene00044080, WBGene00009308, WBGene00008603, WBGene00003764, WBGene00015821, WBGene00015597, WBGene00021289, and WBGene00002117, respectively.
| Subject area | |
| More specific subject area | |
| Type of data | |
| How data was acquired | |
| Data format | |
| Experimental factors | |
| Experimental features | |
| Data source location | |
| Data accessibility |
| • | The data in our study show the genetic features of SSGs and GSGs and their expression profiles at different developmental stages. |
| • | The data from the origin analysis of SSGs and GSGs indicate that these genes were generated mainly by gene duplication and exaptation from TEs. |
| • | The data derived from in silico protein function prediction suggest that SSGs and GSGs may be involved in specific stress responses, morphogenesis, and immune response, which indicates that these genes might be relevant to processes essential for adapting to extreme environments. |