Harry D Dawson, Celine Chen, Brady Gaynor, Jonathan Shao, Joseph F Urban.
Abstract
BACKGROUND: The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation needed to study pig models that are relevant to human studies and to compare them with rodent models. Furthermore, they contain a significant number of errors because they rely primarily on machine-based annotation. To address these deficiencies, a comprehensive literature-based survey was conducted to identify selected genes that have demonstrated function in humans, mice or pigs.
Keywords: Comparative genomics; Database; Porcine
Year: 2017 PMID: 28830355 PMCID: PMC5568366 DOI: 10.1186/s12864-017-4009-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Porcine Translation Research Database (PTR) Construction Flowchart
Current Database Statistics (07/12/2017)
| Parameter | Metric |
|---|---|
| Number of Entries | 13,054 |
| Number of Full-Length RNA Sequences (5′ and 3′ Representation) | 9720 |
| Number of Genes with Full-Length RNA Sequences | 9165 |
| Dawson Lab Full Length Submissions to Genbank | 1351 |
| Percent of Genome in Database with RNA Sequences | 41.7 |
| Number of Protein Coding Genes with Full-Length RNA Sequences | 7805 |
| Number of Protein Coding Gene Splice Variants | 667 |
| Number of Genes in Database with Full-Length Protein Sequences | 8099 |
| Percent of Genome in Database with Protein Sequences | 42.6 |
| Percent of Proteins in Database with Full-Length RNA Sequences | 96.4 |
| Number of Unigene Numbers Assigned | 10,232 |
| Unigene/Gene | 1.45 |
| Percentage of Entries with a Unigene Assignment | 77.0 |
| Entries with a Unigene Assignment | 7056 |
| Entries without a Unigene Assignment | 2109 |
| Number of NCBI Loci Represented | 9967 |
| NCBI Loci/Gene | 1.088 |
| Percentage of Entries with an NCBI Loci Assignment | 82.4 |
| Entries with a NCBI loci Assignment | 7549 |
| Entries without a NCBI loci Assignment | 1616 |
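
Several rows in the table above appear to be ratios of other rows. The following is a minimal sketch, not the authors' method, that recomputes them as fractions (multiply by 100 for percent). The choice of denominators is an assumption inferred from which counts reproduce the reported values; the variable names are hypothetical.

```python
# Counts copied from the statistics table above.
genes_with_rna = 9165            # Number of Genes with Full-Length RNA Sequences
protein_coding_with_rna = 7805   # Protein Coding Genes with Full-Length RNA Sequences
genes_with_protein = 8099        # Genes with Full-Length Protein Sequences
unigene_numbers = 10232          # Number of Unigene Numbers Assigned
entries_with_unigene = 7056
entries_without_unigene = 2109
ncbi_loci = 9967                 # Number of NCBI Loci Represented
entries_with_ncbi = 7549
entries_without_ncbi = 1616


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to three decimals."""
    return round(numerator / denominator, 3)


# Assumed numerator/denominator pairs (inferred, not stated in the source).
derived = {
    "proteins_with_full_length_rna": ratio(protein_coding_with_rna, genes_with_protein),
    "unigene_per_gene": ratio(unigene_numbers, entries_with_unigene),
    "entries_with_unigene": ratio(entries_with_unigene, genes_with_rna),
    "ncbi_loci_per_gene": ratio(ncbi_loci, genes_with_rna),
    "entries_with_ncbi": ratio(entries_with_ncbi, genes_with_rna),
}

# The "with" and "without" rows should partition the same set of genes.
unigene_total = entries_with_unigene + entries_without_unigene
ncbi_total = entries_with_ncbi + entries_without_ncbi

for name, value in derived.items():
    print(f"{name}: {value}")
```

Under these assumptions the "with" and "without" assignment rows each sum to 9165, the number of genes with full-length RNA sequences.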
Functional Annotations for 1041 Protein-Coding Genes that are Missing from Ensembl build 10.2
| Category | Term | # | P-value | Benjamini |
|---|---|---|---|---|
| GOTERM MF DIRECT | Nucleic acid binding transcription factor activity | 68 | 2.6E-3 | 3.0E-1 |
| GOTERM MF DIRECT | Interleukin-1 receptor binding | 7 | 1.8E-5 | 7.4E-3 |
| GOTERM MF DIRECT | Cytokine activity | 24 | 1.9E-5 | 5.2E-3 |
| INTERPRO | Ly-6 antigen / uPA receptor -like | 9 | 2.2E-6 | 2.9E-3 |
| INTERPRO | Homeodomain-like | 34 | 9.9E-5 | 3.2E-2 |
| INTERPRO | DNA binding HTH domain, Psq-type | 6 | 2.5E-4 | 6.2E-2 |
| PFAM | CENP-B N-terminal DNA-binding domain | 6 | 9.8E-5 | 3.5E-2 |
| GOTERM BP | RNA biosynthetic process | 241 | 4.5E-5 | 4.7E-2 |
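
The Benjamini column presumably reports p-values adjusted for multiple testing with the Benjamini-Hochberg procedure. The sketch below shows that procedure in general form. It is an illustration under that assumption, not the enrichment tool's implementation, and it cannot reproduce the table's values because the total number of terms tested is not given here. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, not taken from the table.
example = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example)
print(example_adjusted)
```

Because the adjustment scales each p-value by the number of tests divided by its rank, an adjusted value can be far larger than the raw one when many terms are tested, which is consistent with the gap between the two columns above.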
Fig. 2 Chromosomal Locations of 1307 Duplicated Gene Artifacts (2889 Loci)
Fig. 3a–d Sample Database Entry
Number and Types of Errors Located in Publicly Available Porcine Databases
| Parameter | Metric |
|---|---|
| Number of Errors | 8187 |
| Number of Entries with Errors | 5337 |
| Number of Genes Not Identified in Ensembl Build 10.2 | 1354 |
| Missing from Genome | 1019 |
| Present but not Annotated | 335 |
| Artifactually Duplicated Loci | 1400 |
| Truncated proteins | 2291 |
| Elongated proteins | 199 |
Extensive Gene Fragmentation/Truncation Frequently Occurs Among Proteins of Extreme Size
| Protein | Accession | # of Exons | Nucleotides | Amino Acids | NCBI Loci | Ensembl Loci |
|---|---|---|---|---|---|---|
| TTN | Predicted | 312 | 103,020 | 33,921 | 9 | 0 |
| SYNE1 | Predicted | 152 | 27,499 | 8798 | 3 | 3 |
| OBSCN | Predicted | 106 | 26,424 | 8755 | 5 | 2 |
| MACF1 | Predicted | 141 | 23,519 | 7353 | 1 (truncated) | 1 (truncated) |
| SYNE2 | Predicted | 116 | 21,767 | 6911 | 1 | 1 |
| MUC6 | Predicted | 33 | 19,628 | 5692 | 1 (truncated) | 1 (truncated) |
| MDN1 | Predicted | 102 | 17,684 | 5600 | 2 | 2 |
| KMT2D | Predicted | 56 | 17,324 | 5584 | 0 | 1 (truncated) |
| HMCN1 | Predicted | 107 | 17,839 | 5519 | 9 | 8 |
| RNF213 | Predicted | 72 | 17,574 | 5245 | 4 | 3 |
| UBR4 | JAA53804.1 | 106 | 15,865 | 5182 | 4 | 4 |
| RYR1 | NP_001001534.1 | 106 | 15,384 | 5035 | 3 | 1 (truncated) |
| FAT4 | Predicted | 18 | 17,651 | 4983 | 4 | 1 (elongated) |
| RYR2 | Predicted | 107 | 16,588 | 4967 | 1 (truncated) | 1 (truncated) |
| KMT2C | Predicted | 64 | 15,669 | 4960 | 4 | 3 |
| RYR3 | Predicted | 107 | 15,574 | 4870 | 8 | 1 (truncated) |
| BIRC6 | Predicted | 78 | 15,159 | 4861 | 2 | 1 (truncated) |
| HERC1 | XP_001927286.4 | 80 | 15,199 | 4859 | 2 | 2 |
| HERC2 | JAG69485.1 | 98 | 15,070 | 4847 | 2 | 2 |
| DNHD1 | XP_013844910.1 | 41 | 16,014 | 4737 | 1 | 1 (truncated) |
| DNAH8 | XP_001924974.2 | 97 | 14,418 | 4729 | 1 | 1 |
| MYCBP2 | Predicted | 89 | 15,037 | 4675 | 1 (truncated) | 1 (truncated) |
| DYNC1H1 | Predicted | 78 | 14,323 | 4646 | 3 | 1 (truncated) |
| LRP1B | Predicted | 91 | 16,455 | 4590 | 13 | 3 |
| FAT1 | Predicted | 29 | 14,904 | 4588 | 1 | 1 (elongated) |
| APOB | Predicted | 31 | 14,158 | 4573 | 4 | 4 |
| FAT3 | Predicted | 33 | 18,857 | 4557 | 1 (truncated) | 1 (elongated) |
| LRP1 | JAA53703.1 | 89 | 14,074 | 4544 | 2 | 2 |
| ABCA13 | Predicted | 75 | 15,009 | 4444 | 3 | 0 |
| SACS | Predicted | 12 | 15,381 | 4441 | 4 | 3 |
| ANK3 | XP_005671069.1 | 52 | 15,032 | 4376 | 1 | 1 (truncated) |
| HUWE1 | Predicted | 91 | 14,590 | 4373 | 1 (elongated) | 2 |
| VPS13D | Predicted | 70 | 16,126 | 4364 | 3 | 1 (truncated) |
| FAT2 | Predicted | 31 | 14,579 | 4350 | 2 | 1 (truncated) |
| PKD1 | NP_001233131.1 | 47 | 14,212 | 4305 | 1 | 1 |
| HECTD4 | Predicted | 76 | 19,834 | 4271 | 1 (truncated) | 1 (truncated) |
| PRKDC | Predicted | 86 | 14,362 | 4135 | 5 | 2 |
| ANK2 | Predicted | 55 | 14,520 | 4100 | 1 (truncated) | 1 (truncated) |
| VPS13B | JAG69054.1 | 80 | 13,584 | 3993 | 3 | 3 |
| KMT2A | JAG69421.1 | 37 | 16,597 | 3967 | 1 (truncated) | 1 (truncated) |
| DNAH12 | Predicted | 78 | 11,946 | 3961 | 3 | 1 (truncated) |
| AKAP9 | NP_001240753.1 | 55 | 12,489 | 3898 | 1 | 1 |
| LYST | JAA53665.1 | 61 | 12,677 | 3798 | 3 | 1 (truncated) |
| MUC4 | XP_005670193.1 | 25 | 11,665 | 3745 | 1 | 1 (truncated) |
| ZNF469 | Predicted | 3 | 12,517 | 3736 | 0 | 0 |
| VPS13C | JAA53695.1 | 88 | 11,772 | 3714 | 3 | 2 |
| ZFHX3 | Predicted | 11 | 15,821 | 3713 | 1 (truncated) | 1 (truncated) |
| DMD | NP_001012408.1 | 87 | 13,770 | 3674 | 5 | 2 |
| SMG1 | JAG69152.1 | 64 | 15,532 | 3659 | 2 | 0 |
| SPEN | JAG69140.1 | 15 | 12,261 | 3655 | 1 (truncated) | 1 (truncated) |
| CUBN | Predicted | 71 | 11,536 | 3620 | 4 | 2 |
| ZFHX4 | Predicted | 15 | 14,156 | 3611 | 1 (truncated) | 1 (truncated) |
| WDFY3 | XP_005656619.1 | 74 | 14,209 | 3594 | 1 | 1 |
| USP34 | JAA53700.1 | 80 | 11,327 | 3547 | 3 | 3 |
| UTRN | JAA53694.1 | 84 | 10,547 | 3432 | 4 | 5 |
| COL6A3 | XP_013840079.1 | 50 | 13,801 | 3199 | 1 | 1 (truncated) |
| VPS13A | Predicted | 76 | 11,078 | 3172 | 2 | 1 |
| CEP350 | JAA53656.1 | 40 | 9910 | 3121 | 2 | 2 |
| CELSR1 | Predicted | 38 | 11,081 | 3031 | 2 | 1 (truncated) |
| FRY | Predicted | 66 | 10,455 | 3016 | 3 | 2 |
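
The locus cells in the table above mix a count with an optional qualifier such as "(truncated)". The following is a minimal sketch of one way to read them programmatically. The helper names and the category labels ("missing", "fragmented", "single locus") are assumptions made for illustration, reading a count above 1 as one gene split across several annotated loci.

```python
import re

# A cell is a non-negative integer, optionally followed by a qualifier.
_CELL = re.compile(r"^\s*(\d+)\s*(?:\((truncated|elongated)\))?\s*$")


def parse_loci(cell):
    """Split a cell such as '1 (truncated)' into (count, qualifier or None)."""
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized loci cell: {cell!r}")
    return int(match.group(1)), match.group(2)


def classify(cell):
    """Label a cell using the assumed categories described in the lead-in."""
    count, qualifier = parse_loci(cell)
    if count == 0:
        return "missing"
    if count > 1:
        return "fragmented"
    return qualifier or "single locus"


# A few rows copied from the table: gene -> (NCBI Loci, Ensembl Loci).
rows = {
    "TTN": ("9", "0"),
    "MACF1": ("1 (truncated)", "1 (truncated)"),
    "FAT1": ("1", "1 (elongated)"),
    "PKD1": ("1", "1"),
}

for gene, (ncbi, ensembl) in rows.items():
    print(f"{gene}: NCBI={classify(ncbi)}, Ensembl={classify(ensembl)}")
```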
Fig. 4 Hemicentin (a) and Titin (b) Assembly Blasts
Fig. 5 Analysis of MicroRNA Sequence Origin and Species Similarity. The three sources of information for our 1047 microRNA sequences overlap substantially (a) and include 81 sequences that we predicted based on their presence in other species and in other unfinished porcine genomes. Of the 1047 sequences, 454 are unique to pigs; 318 are shared among the four species (b); 55 are shared between humans and pigs but not mice and cows; and 25 are shared between mice and pigs but not humans and cows.