Abstract
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and are collected in the NCBI Reference Sequence (RefSeq) database. Gene reconstruction from the transcribed-gene evidence of RNA-seq can now accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here. It includes human orthologs missing from the current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. The method used for accurate and complete gene set reconstruction from RNA is the automated SRA2Genes pipeline of the EvidentialGene project.
Keywords: Agricultural genomics; Biomedical genomics; Genome informatics pipeline; Model organism; Precision genomics; Transcriptome assembly
Year: 2019 PMID: 30723633 PMCID: PMC6361002 DOI: 10.7717/peerj.6374
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Sus scrofa (pig) gene set numbers, summary output of SRA2Genes, version Susscr4EVm.
| 39,879 gene loci, all supported by RNA-seq, most also have protein homology evidence |
Sus scrofa gene sets compared for gene evidence recovery: (A) conserved vertebrate genes in pig gene sets (BUSCO), (B) human reference genes (Homo sapiens RefSeq).
| Geneset | Align | Miss | Frag | Best |
|---|---|---|---|---|
| A. Vertebrate conserved genes | ||||
| Evigene | 447 aa | 8 | 10 | 776 |
| NCBI | 440 aa | 17 | 2 | 80 |
| Ensembl | 431 aa | 14 | 20 | na |
| B. Human reference genes | ||||
| Evigene | 97% | 0.7% | 1.4% | 30% |
| NCBI | 96% | 0.7% | 0.7% | 7% |
| Ensembl | 95% | 0.9% | 1.1% | 3% |
Note:
Scores are counts for (A) and percentages of the reference count (n = 37,883) for (B). The exception is Align in (A), which is an average aligned length in amino acids (aa). Align = alignment to reference proteins; Miss = no alignment; Frag = fragment alignment, size <50% of reference; Best = count (A) or percent (B) of greater alignments when the gene sets are matched pairwise to each reference gene.
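The scores defined in this note can be made concrete with a small sketch. It assumes, hypothetically, that each gene set has already been reduced to one number per reference protein: the length in amino acids of its best alignment to that reference (0 when nothing aligns). The function names and input layout are illustrative only and are not part of SRA2Genes or BUSCO. Two points are interpretations rather than statements from the source: Align is averaged over aligned genes only, and "Best" counts a reference gene for a set only when that set's alignment is strictly greater than every other set's.

```python
from typing import Dict, List


def summarize(aligned: List[int], ref_len: List[int]) -> Dict[str, float]:
    """Align, Miss and Frag scores for one gene set (hypothetical input layout).

    aligned[i]: aligned length in amino acids to reference protein i (0 = no alignment).
    ref_len[i]: length in amino acids of reference protein i.
    """
    hits = [a for a in aligned if a > 0]
    # Miss: the reference protein has no alignment at all.
    miss = sum(1 for a in aligned if a == 0)
    # Frag: an alignment exists but covers under 50% of the reference protein.
    frag = sum(1 for a, r in zip(aligned, ref_len) if 0 < a < 0.5 * r)
    # Align (A): average aligned length, assumed here to use aligned genes only.
    align_avg = sum(hits) / len(hits) if hits else 0.0
    return {"align_avg_aa": align_avg, "miss": miss, "frag": frag}


def best_counts(sets: Dict[str, List[int]]) -> Dict[str, int]:
    """Count reference genes where one set's alignment exceeds every other set's.

    Ties award no point; this is one possible reading of "Best".
    """
    names = list(sets)
    best = {name: 0 for name in names}
    n_refs = len(sets[names[0]])
    for i in range(n_refs):
        for name in names:
            if all(sets[name][i] > sets[other][i] for other in names if other != name):
                best[name] += 1
    return best


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    ref_len = [400, 300, 500, 200]
    sets = {"Evigene": [400, 290, 240, 0], "NCBI": [380, 290, 100, 0]}
    for name, aligned in sets.items():
        print(name, summarize(aligned, ref_len))
    print(best_counts(sets))
```

Keeping Best as a separate function reflects that it is a comparison across gene sets, whereas Align, Miss and Frag are computed for each set on its own.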
Assembler method effects on human reference gene recovery in pig gene sets: (A) sample Pig1a (PRJNA416432) and (B) sample Pig2b (PRJNA353772).
| Method | Miss (%) | Frag (%) | Short (%) |
|---|---|---|---|
| A. Sample Pig1a | |||
| Velvet | 5 | 7 | 23 |
| Idba | 8 | 12 | 30 |
| Soap | 12 | 16 | 36 |
| Trinity | 20 | 28 | 49 |
| B. Sample Pig2b | |||
| Illumina_all | 4 | 6 | 20 |
| Illum_velvet | 5 | 7 | 23 |
| PacBio+ | 12 | 15 | 33 |
Note:
Scores are percentages of the reference count (n = 37,883). Miss = no alignment; Frag = fragment alignment, size <50% of reference; Short = size <95% of reference.
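The three thresholds in this note can be expressed as a short classification sketch. It assumes, hypothetically, that each human reference protein has been assigned a coverage fraction, meaning the share of its length covered by the best-matching assembled pig gene (0.0 when nothing aligns). The note does not say whether Frag and Short include missing genes, or whether the categories overlap. The sketch therefore counts Frag and Short over aligned genes only and treats them as nested, so every fragment (<50%) is also short (<95%). The function name is illustrative.

```python
from typing import Dict, Sequence


def recovery_percent(coverage: Sequence[float]) -> Dict[str, float]:
    """Percent of reference genes that are missing, fragmented, or short.

    coverage[i]: hypothetical fraction of reference protein i covered by the
    best-matching assembled gene (0.0 = no alignment).
    Assumption: Frag and Short count aligned genes only and are nested.
    """
    n = len(coverage)
    if n == 0:
        raise ValueError("no reference genes given")
    miss = sum(1 for c in coverage if c == 0.0)          # no alignment
    frag = sum(1 for c in coverage if 0.0 < c < 0.50)    # under 50% of reference
    short = sum(1 for c in coverage if 0.0 < c < 0.95)   # under 95% of reference
    return {
        "miss": 100.0 * miss / n,
        "frag": 100.0 * frag / n,
        "short": 100.0 * short / n,
    }


if __name__ == "__main__":
    # Toy coverages for four reference genes, not data from the paper.
    print(recovery_percent([0.0, 0.3, 0.9, 1.0]))
```

Under this reading, Short is always at least as large as Frag. That ordering is consistent with every row of the table above, though the table alone cannot confirm the nesting.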
Conserved genes in model animals mis-modeled by each of three gene-set methods.
| Gene set | Pig | Cow | Mouse | Rat | Fish | Human |
|---|---|---|---|---|---|---|
| Evigene | 18 | – | – | – | 14 | – |
| NCBI | 19 | 22 | 9 | 24 | 25 | 1 |
| Ensembl | 34 | 58 | 5 | 32 | 79 | 1 |
Notes:
Mis-model = Missing + Fragmented, calculated for 2,586 vertebrate conserved genes, as in Table 2A. The 2018 gene sets from NCBI and Ensembl, and the two Evigene gene sets, are listed in “Data and Software Citations.”
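Because mis-model is defined as Missing + Fragmented, the Pig column can be checked directly against the Miss and Frag counts in Table 2A. This is a small arithmetic sketch. The counts are copied from that table. The names `TABLE_2A` and `mis_model` are illustrative, and the percentage is simply the count divided by the 2,586 conserved genes stated in the note.

```python
# Miss and Frag counts for pig, copied from Table 2A (vertebrate conserved genes).
TABLE_2A = {"Evigene": (8, 10), "NCBI": (17, 2), "Ensembl": (14, 20)}
N_CONSERVED = 2586  # vertebrate conserved genes scored


def mis_model(miss: int, frag: int) -> int:
    """Mis-modeled genes = missing + fragmented, per the table note."""
    return miss + frag


for gene_set, (miss, frag) in TABLE_2A.items():
    count = mis_model(miss, frag)
    print(f"{gene_set}: {count} mis-modeled ({100 * count / N_CONSERVED:.1f}% of conserved genes)")
```

The sums are 18, 19 and 34, matching the Evigene, NCBI and Ensembl entries of the Pig column.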