Feng-Yun Xie1, Yu-Long Feng2, Hong-Hui Wang1, Yun-Feng Ma2, Yang Yang3, Yin-Chao Wang4, Wei Shen1, Qing-Jie Pan5, Shen Yin1, Yu-Jiang Sun3, Jun-Yu Ma1.
Abstract
Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
Year: 2015 PMID: 26208029 PMCID: PMC4514889 DOI: 10.1371/journal.pone.0133258
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Length distribution of the de novo assembled donkey RNAs and predicted donkey protein fragments.
The RNA and protein data for the horse and wild horse were downloaded from the NCBI genome database (ftp://ftp.ncbi.nih.gov/genomes).
Table 1. Summary of the RNA/protein numbers and lengths of donkey white blood cells, horse, and wild horse.
| Species (sample) | Donkey | Horse | Wild horse |
|---|---|---|---|
| RNA number | 264,714 | 35,999 | 45,076 |
| Average RNA length | 594 nt | 2,711 nt | 2,854 nt |
| Total RNA length | 157,214,590 nt | 97,609,315 nt | 128,651,487 nt |
| Protein number | 38,949 | 32,352 | 38,416 |
| Average protein length | 168 aa | 573 aa | 575 aa |
| Total protein length | 6,538,837 aa | 18,540,986 aa | 22,124,456 aa |
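The averages in this table follow from the counts and totals (average length = total length / number of sequences). The short Python sketch below recomputes them as a consistency check; the dictionary layout and function names are illustrative assumptions, not part of the original analysis. The reported averages agree with the recomputed values to within one unit, and the donkey unigenes are much shorter on average than the horse and wild horse RNAs even though their total length is larger.

```python
# Values copied from the summary table above: (count, total length, reported average).
rna = {
    "donkey": (264_714, 157_214_590, 594),
    "horse": (35_999, 97_609_315, 2_711),
    "wild horse": (45_076, 128_651_487, 2_854),
}
protein = {
    "donkey": (38_949, 6_538_837, 168),
    "horse": (32_352, 18_540_986, 573),
    "wild horse": (38_416, 22_124_456, 575),
}


def mean_length(count, total):
    """Average sequence length = total length / number of sequences."""
    return total / count


def check(table):
    """Return {species: (computed mean, reported mean)} for one table section."""
    return {
        species: (mean_length(count, total), reported)
        for species, (count, total, reported) in table.items()
    }


for label, table in (("RNA", rna), ("protein", protein)):
    for species, (computed, reported) in check(table).items():
        print(f"{label:8s}{species:12s}computed={computed:8.1f} reported={reported}")
```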
Table 2. Statistical summary of predicted donkey protein fragments associated with mammalian phenotypes.
| Mammalian Phenotype Term | Mouse symbol number | Mouse symbol number 2* (percentage) | Horse protein number | Donkey protein fragment number | Donkey protein length (aa) | Donkey RNA length (nt) |
|---|---|---|---|---|---|---|
| adipose tissue phenotype | 663 | 289 (43.59%) | 732 | 885 | 196,803 | 1,704,605 |
| behavior/neurological phenotype | 2,701 | 954 (35.32%) | 2,765 | 3,012 | 624,409 | 5,874,119 |
| cardiovascular system phenotype | 2,053 | 893 (43.50%) | 2,257 | 2,858 | 609,864 | 6,025,705 |
| cellular phenotype | 2,702 | 1,361 (50.37%) | 3,213 | 4,024 | 901,466 | 8,036,005 |
| craniofacial phenotype | 1,067 | 386 (36.18%) | 1,024 | 1,274 | 285,326 | 2,325,180 |
| digestive/alimentary phenotype | 1,021 | 393 (38.49%) | 1,077 | 1,138 | 215,507 | 1,830,532 |
| embryogenesis phenotype | 1,451 | 730 (50.31%) | 1,694 | 2,099 | 512,414 | 4,358,464 |
| endocrine/exocrine gland phenotype | 1,636 | 705 (43.09%) | 1,846 | 2,108 | 455,282 | 4,297,417 |
| growth/size/body phenotype | 3,649 | 1,494 (40.94%) | 3,787 | 4,644 | 1,076,374 | 9,211,980 |
| hearing/vestibular/ear phenotype | 595 | 186 (31.26%) | 562 | 575 | 143,938 | 1,201,663 |
| hematopoietic system phenotype | 2,549 | 1,299 (50.96%) | 2,784 | 4,198 | 928,483 | 8,584,817 |
| homeostasis/metabolism phenotype | 3,649 | 1,586 (43.46%) | 3,977 | 5,096 | 1,046,174 | 9,943,604 |
| immune system phenotype | 2,604 | 1,287 (49.42%) | 2,873 | 4,269 | 927,245 | 8,861,768 |
| integument phenotype | 1,699 | 588 (34.61%) | 1,541 | 1,911 | 413,608 | 3,736,364 |
| limbs/digits/tail phenotype | 986 | 298 (33.15%) | 790 | 891 | 227,258 | 1,696,015 |
| liver/biliary system phenotype | 899 | 476 (52.95%) | 1,055 | 1,318 | 298,347 | 2,662,206 |
| mortality/aging | 4,173 | 1,903 (45.60%) | 4,576 | 6,156 | 1,400,869 | 12,975,182 |
| muscle phenotype | 1,070 | 466 (43.55%) | 1,305 | 1,456 | 309,052 | 3,014,111 |
| nervous system phenotype | 2,689 | 1,068 (39.72%) | 3,108 | 3,267 | 702,258 | 6,064,300 |
| other phenotype | 160 | 73 (45.63%) | 173 | 248 | 43,176 | 551,780 |
| pigmentation phenotype | 582 | 150 (25.77%) | 371 | 544 | 127,213 | 1,149,165 |
| renal/urinary system phenotype | 913 | 372 (40.74%) | 972 | 1,195 | 265,898 | 2,162,417 |
| reproductive system phenotype | 1,649 | 658 (39.90%) | 1,760 | 2,020 | 462,698 | 3,716,326 |
| respiratory system phenotype | 1,024 | 426 (41.60%) | 1,127 | 1,251 | 256,160 | 2,294,374 |
| skeleton phenotype | 1,579 | 639 (40.47%) | 1,620 | 1,996 | 428,884 | 3,722,388 |
| taste/olfaction phenotype | 111 | 37 (33.33%) | 123 | 122 | 19,594 | 172,984 |
| tumorigenesis | 552 | 298 (53.99%) | 627 | 1,177 | 247,409 | 1,907,737 |
| vision/eye phenotype | 1,274 | 497 (39.01%) | 1,263 | 1,528 | 289,137 | 2,620,264 |
*Mouse symbol number 2 represents the number of mouse symbols whose donkey homologous protein fragments are detected in donkey white blood cells.
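The percentage column is the share of mouse symbols for each phenotype term whose donkey homologs were detected, that is, 100 × (mouse symbol number 2) / (mouse symbol number). A minimal Python sketch of that arithmetic, using a few rows from the table, is given below; the function name and the row selection are illustrative assumptions.

```python
def detected_percentage(total_symbols, detected_symbols):
    """Percentage of mouse symbols with a detected donkey homolog, to 2 decimals."""
    return round(100 * detected_symbols / total_symbols, 2)


# (mouse symbol number, mouse symbol number 2) copied from the table above.
rows = {
    "behavior/neurological phenotype": (2_701, 954),
    "hearing/vestibular/ear phenotype": (595, 186),
    "pigmentation phenotype": (582, 150),
    "tumorigenesis": (552, 298),
}

for term, (total, detected) in rows.items():
    print(f"{term}: {detected_percentage(total, detected):.2f}%")
```

In these rows, the lowest coverage is for pigmentation and the highest for tumorigenesis, which may reflect which genes are expressed in white blood cells rather than which genes exist in the donkey genome.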
Fig 2. Outer ear morphology-associated genes.
Only those genes whose homologous sequences were identified in the donkey white blood cell transcriptome are shown. Pink, mammalian phenotype term; green and red, mouse protein symbols associated with outer ear morphology. Red symbols represent the homologous donkey proteins that are not conserved between donkeys and horses.
Fig 3. Multiple alignment of the HIC1 protein.
The protein IDs and corresponding GenBank accession numbers are listed in Table 3.
Fig 4. Multiple alignment of the PRKRA protein.
The protein IDs and corresponding GenBank accession numbers are listed in Table 3.
Table 3. Protein sequences used for multiple sequence alignment.
| Symbol / Full name | Abbreviation | Species | ID |
|---|---|---|---|
| HIC1 / hypermethylated in cancer 1 | D_HIC1 | Donkey | Unigene110778_NormalA |
| | H_HIC1 | Horse | gi\|545180679 |
| | M_HIC1-1 | Mouse | gi\|148226885 |
| | M_HIC1-2 | Mouse | gi\|148228529 |
| | H_HIC1-1 | Human | gi\|61676186 |
| | H_HIC1-2 | Human | gi\|148237270 |
| | P_HIC1-X1 | Pig | gi\|311268073 |
| | P_HIC1-X2 | Pig | gi\|545859615 |
| PRKRA / protein kinase, interferon inducible double stranded RNA dependent activator | D_PRKRA | Donkey | Unigene69853_NormalA |
| | H_PRKRA | Horse | gi\|545191414 |
| | WH_PRKRA | Wild Horse | gi\|664713499 |
| | M_PRKRA | Mouse | gi\|755499568 |
| | M_PRKRA-X1 | Mouse | gi\|755499566 |
| | M_PRKRA-X2 | Mouse | gi\|6755162 |
| | Ho_PRKRA | Human | gi\|49065476 |
| | P_PRKRA | Pig | gi\|194043952 |
| KMT2A / lysine-specific methyltransferase 2A | D_KMT2A | Donkey | Unigene1721_NormalA |
| | H_KMT2A | Horse | gi\|545221883 |
| | WH_KMT2A | Wild Horse | gi\|664709525 |
| | M_KMT2A | Mouse | gi\|124486682 |
| | H_KMT2A-1 | Human | gi\|308199413 |
| | H_KMT2A-2 | Human | gi\|56550039 |
| | P_KMT2A | Pig | gi\|350588548 |