
Analysis of muscle and ovary transcriptome of Sus scrofa: assembly, annotation and marker discovery.

Qinghua Nie, Meixia Fang, Xinzheng Jia, Wei Zhang, Xiaoning Zhou, Xiaomei He, Xiquan Zhang.

Abstract

The pig (Sus scrofa) is an important organism for both agricultural and medical purposes. This study investigated the S. scrofa transcriptome using Roche 454 pyrosequencing. We obtained 558 743 and 528 260 reads for the back-leg muscle and ovary, respectively. The combined 1 087 003 reads comprise 421 767 341 bp, averaging 388 bp per read. De novo assembly yielded 11 057 contigs and 60 270 singletons for the back-leg muscle, 12 204 contigs and 70 192 singletons for the ovary, and 18 938 contigs and 102 361 singletons for the combined tissues. The overall GC content of the S. scrofa transcriptome is 42.3% for assembled contigs. Alternative splicing was found within 4394 contigs, which were grouped into 1267 isogroups or genes. Gene ontology analyses showed that a total of 56 589 transcripts are involved in molecular function (40 916), biological process (38 563) and cellular component (35 787). Comparative analyses showed that 336 and 553 genes had significantly higher expression in the back-leg muscle and ovary, respectively. In addition, we identified a total of 24 214 single-nucleotide polymorphisms and 11 928 simple sequence repeats. These results contribute to the understanding of the genetic makeup of the S. scrofa transcriptome and provide useful information for future functional genomic research.

Year:  2011        PMID: 21729922      PMCID: PMC3190955          DOI: 10.1093/dnares/dsr021

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

As a predominant domestic animal, the pig (Sus scrofa) not only provides plentiful meat but is also an important model organism for medical research. Understanding the genetic basis of growth and reproduction traits is helpful for S. scrofa production and also has scientific significance for basic biology and human medicine.[1] Notable progress has been made over the past decades in identifying causative genes or single-nucleotide polymorphisms (SNPs) underlying complex traits. A recessive missense mutation of the S. scrofa ryanodine receptor (RYR1) gene was shown to induce malignant hyperthermia and halothane sensitivity in S. scrofa, and this finding has been applied effectively in selection programmes by many breeding farms.[2] In 2003, an SNP in intron 3 of insulin-like growth factor 2 was identified as a causative quantitative trait nucleotide for porcine muscle growth.[3] With the completion of S. scrofa genome sequencing, it has become realistic and efficient to identify genes or genetic markers underlying complex traits at the whole-genome or transcriptome level. In 2005, a 0.66× coverage S. scrofa genome survey comprising 3.84 million shotgun sequences was generated from Hampshire, Yorkshire, Landrace, Duroc and ErHuaLian pigs, giving rise to 2.08 billion nucleotide bases and indicating that the S. scrofa genome is much closer to the human genome than the mouse genome is.[4] Later, the complete S. scrofa genome became available online (November 2009, SGSC Sscrofa9.2/susScr2; http://genome.ucsc.edu/cgi-bin/hgBlat).
From 4.8 million whole-genome shotgun sequences, 98 151 SNPs were predicted with one sequence representing the polymorphism, and most SNPs were confirmed by testing in three purebred boar lines and wild boar.[5] Recently, a high-density SNP chip (Illumina Porcine 60K + SNP iSelect Beadchip), which included 64 232 SNPs after reliable validation, was designed and supplied for commercial use.[6] This SNP chip was subsequently used in whole-genome association analysis to identify genes for body composition and structural soundness traits in S. scrofa.[7] Given this genomic background, investigation of the S. scrofa transcriptome is realistic and extremely useful for identifying candidate genes that account for quantitative traits at the global level. The cDNA microarray, or gene chip, is a common tool for transcriptomic analysis. Some candidate genes and SNP markers for S. scrofa reproduction traits have been identified by microarray profiling with the Affymetrix Porcine Genechip™ and confirmed by real-time RT–PCR and association analyses.[8] Using an S. scrofa whole-genome 70-mer oligonucleotide microarray, 62 expression quantitative trait loci (eQTLs) were identified in loin muscle tissue through global genome-wide linkage analysis.[9] Nevertheless, cDNA microarrays have limitations, as they fail to detect new genes (transcripts) and sequence variations. RNA sequencing (RNA-seq) is a newer and efficient technology for thorough investigation of the transcriptome. With the rapid development of second-generation sequencing, RNA-seq has become more efficient and less costly on recently developed platforms, e.g. Roche 454, Illumina Solexa GA IIx, Life Technologies SOLiD, Helicos Biosciences tSMS and others.[10] Roche 454 can generate long reads and is widely used in transcriptome analysis of human,[11,12] other mammals,[13] insects,[14] fish,[15] plants[16,17] and microorganisms.[18] Until now, reports on the S. scrofa transcriptome by RNA-seq technology have been very limited. In this study, we performed 454 pyrosequencing of muscle and ovary tissues to characterize the S. scrofa transcriptome and to identify potential markers for growth and reproduction traits.

Materials and methods

Animal and RNA preparation

One 6-month-old Landrace female was used for transcriptomic analysis of S. scrofa. Landrace is a typical commercial S. scrofa breed that is widely used in domestic livestock production. The animal was slaughtered quickly and two tissues, the ovary and back-leg muscle, were collected. The fresh tissues were immersed in liquid nitrogen immediately after collection and then stored in a −80°C freezer (Thermo Forma, USA) until use. Total RNA was isolated with Trizol (Invitrogen, CA, USA) following the manufacturer's protocol.

Construction of cDNA library and 454 sequencing

Approximately 10 µg of total RNA was delivered to Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China) for cDNA library construction. RNA quality was assessed by 260/280 and 260/230 ratios with the Agilent 2100 Bioanalyzer. The SMART cDNA library construction kit (Clontech, Mountain View, CA, USA) was used to construct the cDNA libraries of the muscle and ovary tissues following the manufacturer's protocol. cDNA was sheared by nebulization, and DNA bands (500–800 bp) were extracted from the gel after agarose gel electrophoresis. The recovered DNA was purified, blunt-ended and ligated to adapters, and small fragments were finally removed. Quality control of the double-stranded DNA library was performed using a High Sensitivity Chip (Agilent Technologies), and the concentration was measured with a TBS 380 Fluorometer. One full-plate sequencing run was performed with GS FLX Titanium chemistry (Roche Diagnostics, Indianapolis, IN, USA) by Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. following the manufacturer's protocol.

Bioinformatic analysis

Read trimming and assembly

For each sequencing read, low-quality bases and the sequencing adapter were trimmed using LUCY and SeqClean. The remaining 454 reads of the ovary and back-leg muscle were first assembled using the Newbler software with default parameters. The combined reads of the ovary and back-leg muscle were also assembled using Newbler.

Assembly of ESTs

To improve the assembly quality, we collected S. scrofa ESTs from the PigEST database (http://pigest.ku.dk/download/index.html), which included 398 837 Pub EST sequences and 823 871 Sino-Danish S. scrofa EST sequences. All ESTs from the Roche 454 data and the PigEST database were used for the final assembly of S. scrofa. We chose the Trans-ABySS software for the final assembly, which assembles the ESTs by combining the results obtained with different k-mer parameters. Unigenes >100 bp in length were used for the subsequent analysis. Moreover, all ESTs were mapped to the S. scrofa draft genome version 9 (downloaded from the Ensembl FTP server http://asia.ensembl.org/info/data/ftp/index.html). An EST was considered successfully mapped if it had over 95% identity to the corresponding genome sequence.

Transcriptome annotation

The unigenes were compared with the non-redundant (nr) protein database using BlastX[19]; hits with E-values <1.0 × 10^−5 were considered significant. Gene ontology (GO) terms[20] were extracted from the best BlastX hits against the nr database (E-value ≤ 1.0 × 10^−6) using Blast2GO and then sorted into GO categories using in-house Perl scripts. Metabolic pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG).
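The best-hit selection described above can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes the BlastX results are available in the standard 12-column tabular layout, and the function name `best_hits` and the example identifiers are hypothetical.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output.
# Assumption: the search results were exported in this layout.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue)} with the lowest-E-value hit per query.

    Hits with an E-value at or above max_evalue are discarded.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue >= max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (rec["sseqid"], evalue)
    return best


# Hypothetical example records (identifiers are made up for illustration).
example = "\n".join([
    "unigene1\tprotA\t98.5\t200\t3\t0\t1\t600\t1\t200\t3e-40\t150",
    "unigene1\tprotB\t80.0\t150\t30\t0\t1\t450\t1\t150\t1e-10\t70",
    "unigene2\tprotC\t45.0\t60\t33\t0\t1\t180\t1\t60\t0.002\t30",
])
hits = best_hits(example)
```

Here `unigene2` is dropped because its only hit does not pass the E-value cutoff, and `unigene1` keeps the hit with the smaller E-value.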

Expression analysis

EST reads from the ovary and back-leg muscle were each mapped to the unigene sequences using the SSAHA software. The expression of each unigene was calculated from the number of reads with a specific match. Differentially expressed genes were identified using the R package DEGseq.[21]
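To make the count-based comparison concrete, the sketch below normalizes read counts to reads per million and computes a log2 fold change between two tissues. It is not the DEGseq method and performs no significance test; the pseudocount, the function names and the gene names are assumptions introduced for illustration.

```python
import math


def reads_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """log2 ratio of normalized expression (library A over library B).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes that have no reads in one library.
    """
    rpm_a = reads_per_million(counts_a)
    rpm_b = reads_per_million(counts_b)
    genes = sorted(set(rpm_a) | set(rpm_b))
    return {
        g: math.log2((rpm_a.get(g, 0.0) + pseudocount) /
                     (rpm_b.get(g, 0.0) + pseudocount))
        for g in genes
    }


# Hypothetical read counts per unigene in the two libraries.
muscle = {"geneA": 300, "geneB": 100, "geneC": 600}
ovary = {"geneA": 100, "geneB": 100, "geneC": 800}
lfc = log2_fold_change(muscle, ovary)
```

A positive value indicates higher normalized expression in the first library; in practice a statistical test such as the one provided by DEGseq is still needed to call a gene differentially expressed.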

Bioinformatic mining of microsatellites and SNP markers

The unigene sequences were screened for microsatellites using the MISA software (MicroSAtellite, http://pgrc.ipk-gatersleben.de/misa/). All 454 reads were mapped to the unigenes using SSAHA. SNPs were called using VarScan with default parameters, and only when both alleles were detected in the 454 reads. The released S. scrofa genome (downloaded from the Ensembl FTP server http://asia.ensembl.org/info/data/ftp/index.html) was used to confirm and locate the SNPs. Only SNPs that both matched a single genomic region specifically and had a minor allele frequency of at least 20% were included in the analysis.
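The microsatellite screen can be illustrated with a small regular-expression search. This is a simplified sketch and not the MISA implementation; the minimum repeat numbers are an assumption inferred from the smallest repeat counts listed in Table 7, and the function names are hypothetical.

```python
import re

# Minimum number of repeats per motif length. Assumption: inferred from the
# smallest repeat counts in Table 7, not from the actual MISA settings.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" or "ATAT"
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


# Hypothetical test sequence: (AC)7 and (GTT)4 embedded in flanking bases.
example_seq = "TTG" + "AC" * 7 + "GGA" + "GTT" * 4 + "C"
ssrs = find_ssrs(example_seq)
```

Only perfect repeats are reported, and compound or interrupted microsatellites are ignored in this sketch.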

Data deposition

The Roche 454 reads of S. scrofa by this study are now available from Animal Genomics databases by National Animal Genome Research Program (NAGRP, USA) with URL of http://www.animalgenome.org/repository/pub/SCAU2011.0502/.

Results

454 sequencing and assembly

In this study, we constructed two cDNA libraries and subsequently obtained two sets of transcriptomic reads, one each for the back-leg muscle and ovary. The schematic of the 454 EST analyses is shown in Fig. 1. For the back-leg muscle, Roche 454 sequencing yielded a total of 558 743 reads with a total nucleotide size of 219 021 745 bp, an average of 392 bp per read (Table 1; Supplementary Fig. S1). For the ovary, we obtained 528 260 reads totalling 202 745 596 bp, an average of 384 bp per read (Table 1; Supplementary Fig. S1).
Figure 1.

Schematic of 454 EST analyses. The steps include 454 sequencing, assembly of reads into contigs and isogroups, GO annotation, KEGG analysis and discovery of SNPs and SSRs.

Table 1.

Draft sequence data by 454 sequencing

Types | Muscle (RL4) | Ovary (RL10) | Combined tissues
Number of reads | 558 743 | 528 260 | 1 087 003
Total residues (bp) | 219 021 745 | 202 745 596 | 421 767 341
Smallest (bp) | 27 | 23 | 23
Largest (bp) | 772 | 813 | 813
Average length (bp) | 392 | 384 | 388
By assembly analysis, we obtained 71 327 ESTs (11 057 contigs averaging 787 bp and 60 270 singletons) for the back-leg muscle and 82 396 ESTs (12 204 contigs averaging 780 bp and 70 192 singletons) for the ovary, as well as 121 299 ESTs (18 938 contigs averaging 810 bp and 102 361 singletons) for combined tissues (Table 2; Supplementary Fig. S2). Most of these contigs fell in the 401–1500-bp range, and about half of them were 401–700 bp long in each of the back-leg muscle (51.35%), ovary (51.79%) and combined tissues (49.62%) (Table 3; Supplementary Fig. S2).
Table 2.

Summary of assembly analysis

Types | Muscle (RL4) | Ovary (RL10) | Combined tissues
Number of contigs | 11 057 | 12 204 | 18 938
Smallest (bp) | 42 | 44 | 42
Largest (bp) | 3540 | 3462 | 4218
Total length (bp) | 8 703 645 | 9 517 255 | 15 332 944
Average length (bp) | 787 | 780 | 810
Number of isogroups | 9496 | 10 440 | 15 825
Number of isogroups (contigs ≥ 2) | 662 | 719 | 1267
Singletons | 60 270 | 70 192 | 102 361
Total | 71 327 | 82 396 | 121 299
Table 3.

Statistics of contigs by 454 sequencing

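Length (bp) | Muscle: number | Muscle: per cent (%) | Ovary: number | Ovary: per cent (%) | Combined tissues: number | Combined tissues: per cent (%)
1–100 | 12 | 0.11 | 11 | 0.09 | 12 | 0.06
101–400 | 225 | 2.03 | 309 | 2.53 | 411 | 2.17
401–700 | 5678 | 51.35 | 6320 | 51.79 | 9397 | 49.62
701–1000 | 2813 | 25.44 | 3050 | 24.99 | 4760 | 25.13
1001–1500 | 1756 | 15.88 | 1928 | 15.80 | 3187 | 16.83
1501–2000 | 449 | 4.06 | 474 | 3.88 | 864 | 4.56
>2000 | 124 | 1.12 | 112 | 0.92 | 307 | 1.62
Total | 11 057 | 100 | 12 204 | 100 | 18 938 | 100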
Contigs were further grouped into 9496 isogroups for the back-leg muscle, 10 440 isogroups for the ovary and 15 825 isogroups for combined tissues, of which 662, 719 and 1267, respectively, contained two or more contigs (Table 2). On average, each isogroup has 1.16, 1.17 and 1.20 contigs for the back-leg muscle (11 057 contigs for 9496 isogroups), ovary (12 204 contigs for 10 440 isogroups) and combined tissues (18 938 contigs for 15 825 isogroups), respectively. Across all contigs, the proportions of A, T, C and G were 29.0, 28.7, 21.1 and 21.2%, respectively, giving an overall GC content of 42.3% for the S. scrofa transcriptome. When our original 454 sequences were compared with the S. scrofa genome, 246 628 ESTs matched the genome; of these, 216 739 mapped specifically to one region and 29 889 mapped to two or more regions. Among the chromosomes, chromosome 1 (23 237 or 10.72%), 4 (17 087 or 7.88%), 2 (16 957 or 7.82%), 14 (16 724 or 7.72%) and 13 (15 738 or 7.26%) contained more transcripts than the others, whereas the X chromosome (4495 or 2.07%) and mitochondrial DNA (mtDNA; 1665 or 0.77%) had far fewer ESTs (Supplementary Fig. S3).
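The GC content reported above is simply the sum of the G and C proportions (21.2% + 21.1% = 42.3%). A minimal sketch of how such base proportions could be computed from a set of contig sequences is given below; the function names and the toy sequences are hypothetical, and ambiguous symbols such as N are ignored by assumption.

```python
from collections import Counter


def base_composition(sequences):
    """Percentage of A, C, G and T over all sequences.

    Symbols other than A, C, G and T (e.g. N) are ignored. At least one
    valid base is required.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(sequences):
    """GC content in per cent, i.e. the G percentage plus the C percentage."""
    comp = base_composition(sequences)
    return comp["G"] + comp["C"]


# Toy contigs for illustration only.
toy_contigs = ["AATT", "GC", "ACGTNN"]
toy_gc = gc_content(toy_contigs)
```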

Alternative splicing

There are more contigs than isogroups because some contigs (also called isocontigs) are assigned to the same isogroup owing to alternative splicing (Tables 2 and 4). In the back-leg muscle, ovary and combined tissues, 7.0% (662 of 9496), 6.9% (719 of 10 440) and 8.0% (1267 of 15 825) of isogroups, respectively, have two or more contigs. These alternatively spliced isogroups have on average 3.4, 3.5 and 3.5 isocontigs, respectively (Table 4).
Table 4.

Variant transcripts from assembly analysis

No. (a) | Muscle: isogroups (contigs) | Ovary: isogroups (contigs) | Combined tissues: isogroups (contigs)
2 | 452 (904) | 498 (996) | 857 (1714)
3 | 71 (213) | 76 (228) | 143 (429)
4 | 64 (256) | 58 (232) | 115 (460)
5 | 20 (100) | 12 (60) | 29 (145)
6 | 18 (108) | 20 (120) | 29 (174)
7 | 6 (42) | 8 (56) | 17 (119)
8 | 5 (40) | 10 (80) | 15 (120)
9 | 4 (36) | 8 (72) | 14 (126)
≥10 | 22 (529) | 29 (644) | 48 (1106)
In total (≥2) | 662 (2228) | 719 (2488) | 1267 (4393)

(a) Number of contigs per isogroup.

GO assignments

A total of 56 589 transcripts of S. scrofa were assigned for GO analysis based on matches with sequences whose functions were known previously. Among these transcripts, 45 846 transcripts were successfully annotated with confident matches. As many as 38 563 transcripts are involved in biological process, including cellular process (34 217 transcripts with percentages of 16.98%), metabolic process (29 471; 14.62%), biological regulation (16 944; 8.41%), regulation of biological process (16 002; 7.94%), multicellular organismal process (10 148; 5.04%), response to stimulus (10 023; 4.97%), localization (9680; 4.80%), cellular component organization or biogenesis (9276; 4.60%), developmental process (8610; 4.27%), establishment of localization (8393; 4.16%), signalling (7083; 3.51%), positive regulation of biological process (6576; 3.26%), negative regulation of biological process (6487; 3.22%) and signalling process (5517; 2.74%), as well as other activities (23 100; 11.46%) (Fig. 2A).
Figure 2.

Functional classification of S. scrofa transcriptome. (A) GO: Biological process. (B) Cellular component. (C) GO: Molecular function. In some cases, one transcript or gene has multiple functions.

Moreover, 35 787 transcripts were assigned to cellular components and could be divided into cell (33 486; 24.06%), cell part (33 483; 24.06%), organelle (25 767; 18.52%), organelle part (16 894; 12.14%), macromolecular complex (13 151; 9.45%), membrane-enclosed lumen (6469; 4.65%), extracellular region (5378; 3.86%), extracellular region part (3457; 2.48%) and others (1082; 0.78%) (Fig. 2B). GO analysis also showed that 40 916 transcripts had a potential molecular function, such as binding (36 550; 48.15%), catalytic activity (21 865; 28.8%), structural molecule activity (4385; 5.78%), transporter activity (3188; 4.2%), molecular transducer activity (2379; 3.13%), enzyme regulator activity (2337; 3.08%), transcription regulator activity (2224; 2.93%), nucleic acid binding transcription factor activity (1317; 1.74%), electron carrier activity (948; 1.25%) and others (714; 0.94%) (Fig. 2C).

Metabolic pathways by KEGG analysis

A total of 4268 transcripts are involved in 132 predicted KEGG metabolic pathways, and the number of transcripts per pathway ranged from 1 to 1352. The top 20 pathways by EST number are shown in Table 5; the largest number of transcripts is involved in the biosynthesis of secondary metabolites. The ten biosynthesis pathways were biosynthesis of alkaloids derived from histidine and purine (417), biosynthesis of alkaloids derived from ornithine, lysine and nicotinic acid (368), biosynthesis of alkaloids derived from the shikimate pathway (380), biosynthesis of alkaloids derived from terpenoid and polyketide (399), biosynthesis of secondary metabolites (1352), biosynthesis of plant hormones (625), biosynthesis of phenylpropanoids (466), biosynthesis of terpenoids and steroids (401), biosynthesis of unsaturated fatty acids (104) and biosynthesis of ansamycins (7).
Table 5.

The top 20 pathways with the highest EST numbers

No. | Pathway | Number of ESTs
1 | Biosynthesis of secondary metabolites | 1352
2 | Oxidative phosphorylation | 1111
3 | Microbial metabolism in diverse environments | 964
4 | Purine metabolism | 690
5 | Biosynthesis of plant hormones | 625
6 | Biosynthesis of phenylpropanoids | 466
7 | Biosynthesis of alkaloids derived from histidine and purine | 417
8 | Pyrimidine metabolism | 408
9 | Biosynthesis of terpenoids and steroids | 401
10 | Biosynthesis of alkaloids derived from terpenoid and polyketide | 399
11 | Methane metabolism | 399
12 | Biosynthesis of alkaloids derived from shikimate pathway | 380
13 | Glycolysis/gluconeogenesis | 379
14 | Biosynthesis of alkaloids derived from ornithine, lysine and nicotinic acid | 368
15 | Glutathione metabolism | 318
16 | Pyruvate metabolism | 268
17 | Arginine and proline metabolism | 261
18 | Glycerophospholipid metabolism | 222
19 | Fatty acid metabolism | 221
20 | Valine, leucine and isoleucine degradation | 218

Tissue-specific analysis for differentially expressed genes

Comparison of gene expression by DEGseq showed that a total of 336 genes were expressed at a significantly higher level in the back-leg muscle than in the ovary. These genes are involved in biological process (160 genes), cellular components (96) and molecular function (80) (Supplementary Fig. S4). In addition, another 553 genes were expressed at a significantly higher level in the ovary than in the back-leg muscle. These genes are involved in biological process (295 genes), cellular components (130) and molecular function (128) (Supplementary Fig. S5).

Single-nucleotide polymorphisms

After excluding SNPs that either could not be matched specifically to the S. scrofa genome or had minor allele frequencies lower than 20%, we obtained a total of 24 214 SNPs, comprising the substitutions A-G (8484), C-T (8697), A-C (1635), A-T (1991), C-G (1325) and G-T (2082). The ratio of transitions (17 181) to transversions (7033) is ∼2.44. Except for 125 SNPs in mtDNA, all others (24 089) are in nuclear DNA, including autosomes 1–18 (23 481) and the X chromosome (608). The distribution of SNPs across chromosomes is given in Table 6.
Table 6.

Distribution of SNPs in the S. scrofa genome

Chromosome (a) | Count
1 | 2358
2 | 1901
3 | 1467
4 | 1693
5 | 1133
6 | 1492
7 | 1922
8 | 1260
9 | 1415
10 | 968
11 | 533
12 | 1073
13 | 1418
14 | 2008
15 | 1086
16 | 580
17 | 680
18 | 494
X | 608
MT | 125
Total | 24 214

(a) Chromosomes 1–18 are the 18 autosomes and X is the sex chromosome, whereas MT indicates mtDNA.
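The transition/transversion ratio quoted for these SNPs can be recomputed directly from the six substitution counts reported in the text. The sketch below does this; the function name `ts_tv` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Transitions are purine<->purine (A-G) or pyrimidine<->pyrimidine (C-T)
# substitutions; all other substitution classes are transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv(substitution_counts):
    """Return (transitions, transversions, ratio) from counts keyed 'X-Y'."""
    ts = tv = 0
    for pair, n in substitution_counts.items():
        a, b = pair.split("-")
        if frozenset((a, b)) in TRANSITIONS:
            ts += n
        else:
            tv += n
    return ts, tv, ts / tv


# Substitution counts as reported in the text.
reported = {"A-G": 8484, "C-T": 8697, "A-C": 1635,
            "A-T": 1991, "C-G": 1325, "G-T": 2082}
ts, tv, ratio = ts_tv(reported)
```

With the reported counts this gives 17 181 transitions and 7033 transversions, in agreement with the ratio of about 2.44 stated above.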

Simple sequence repeats or microsatellites

We obtained 11 928 simple sequence repeats (SSRs), of which 45.72% were di-nucleotide repeats (5453), followed by 36.06% tri-nucleotide repeats (4301), 14.85% tetra-nucleotide repeats (1771) and 3.38% penta-nucleotide repeats (403) (Table 7). There are six types of di-nucleotide repeats, among which (GT), (AC) and (AT) are the three predominant types, with frequencies of 28.5, 27.6 and 23.12%, respectively (Supplementary Table S1). Among tri-nucleotide repeats, the frequencies of the 20 SSR types vary from 0.19 to 17.11%, and the most common repeats are (GTT) (17.11%) and (ACC) (10.14%) (Supplementary Table S2). As many as 45 tetra-nucleotide repeat types are present, and five of them [15.70% for (ATTT), 14.85% for (AAAT), 14.00% for (GTTT), 12.70% for (CTTT) and 12.37% for (AAAC)] are major ones with frequencies over 10% (Supplementary Table S3). Among the 21 penta-nucleotide repeat types, (GTTTT) (34.99%) and (AAAAC) (20.60%) are the two predominant types, followed by (ATTTT) at 10.42%, with the rest each below 10% (Supplementary Table S4).
Table 7.

Summary on microsatellite loci in S. scrofa transcriptome

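Number of repeats | Di-nucleotide repeats | Tri-nucleotide repeats | Tetra-nucleotide repeats | Penta-nucleotide repeats
4 | - | 2511 | 1134 | 303
5 | - | 1004 | 386 | 74
6 | 1858 | 434 | 118 | 18
7 | 986 | 189 | 31 | 6
8 | 592 | 80 | 20 | -
9 | 384 | 23 | 26 | 1
10 | 257 | 22 | 17 | -
11 | 202 | 16 | 9 | 1
12 | 187 | 11 | 6 | -
13 | 184 | 6 | 13 | -
14 | 131 | 2 | 11 | -
15 | 115 | 1 | - | -
16 | 100 | - | - | -
17 | 93 | - | - | -
18 | 49 | - | - | -
19 | 63 | 2 | - | -
20 | 54 | - | - | -
21 | 48 | - | - | -
22 | 41 | - | - | -
23 | 29 | - | - | -
24 | 18 | - | - | -
25 | 18 | - | - | -
26 | 9 | - | - | -
27 | 10 | - | - | -
28 | 14 | - | - | -
29 | 11 | - | - | -
Total | 5453 | 4301 | 1771 | 403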
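Because ESTs assembled de novo may represent either strand, motifs such as (GT) and (AC), or (GTT) and (AAC), can describe the same repeat class read from opposite strands. One way to merge such counts, sketched here as an assumption and not as a step the authors performed, is to map every motif to a canonical form: the alphabetically smallest string among all rotations of the motif and of its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif written with A, C, G and T."""
    return motif.translate(COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


# (GT) and (AC) collapse to one class, as do (GTTTT) and (AAAAC).
classes = {m: canonical_motif(m)
           for m in ["GT", "AC", "GTT", "AAC", "ACC", "GTTTT", "AAAAC"]}
```

Under this convention (GTT) falls in the same class as (AAC), whereas (ACC) forms a class with (GGT).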

Discussion

The transcriptomic sequences obtained in this study are useful for understanding the genetic makeup of the whole S. scrofa transcriptome, about which, to our knowledge, very little is known so far. Even though the draft S. scrofa genome was released in November 2009, very few articles based on large-scale sequence data have addressed the S. scrofa genome or transcriptome. In this study, 454 pyrosequencing yielded 1 087 003 reads in total (558 743 for the back-leg muscle and 528 260 for the ovary), amounting to 421 Mb of nucleotide residues for the S. scrofa transcriptome. A recent study obtained 1 253 361 sequences by 454 sequencing for skeletal muscle (701 695) and heart (551 666) and demonstrated the reproducibility of 454 sequencing and cDNA microarray results; however, further analyses of the S. scrofa transcriptome with these 454 sequences were not reported there.[13] Most transcripts (78.7%) identified in this study matched the released S. scrofa genome and were scattered across chromosomes 1–18, X and mtDNA. Larger chromosomes appear to contain more transcripts than the others. The overall GC content of the S. scrofa transcriptome obtained here was 42.3%, which is lower than the reported GC content of 5′UTR (59.2%) and coding regions (49.6%) but slightly higher than that of 3′UTR (41.8%) in the S. scrofa genome.[4] This suggests that more 3′UTR sequence than other regions (coding and 5′UTR) was included in the analysis, given that only assembled contigs, not singletons, were considered. The genome-wide average GC content of the human genome is 41%, varying among chromosomes and regions.[22] The mouse is known to have a slightly higher overall GC content (42%), but with a tighter distribution.[23] The more than 1 million 454 reads and 121 299 ESTs from this study are a useful resource for further research on S. scrofa functional genomics.

Both gene annotation and pathway analyses help to predict potential genes and their functions at the whole-transcriptome level. In the S. scrofa transcriptome, as found in this study, the predominant gene clusters are involved in cellular process and metabolic process (biological process), binding and catalytic activity (molecular function), and cell, cell part and organelle (cellular component). Similar results have been found in European eel,[24] rainbow trout[15] and red bugs.[14] In the chickpea transcriptome, by contrast, genes are predominantly involved in protein metabolism (biological process), transferase activity (molecular function) and chloroplast (cellular component), which indicates notable differences between animals and plants.[25] In addition, we predicted a total of 4268 ESTs (or transcripts) involved in 132 predicted KEGG metabolic pathways, and two major pathways (biosynthesis of secondary metabolites and oxidative phosphorylation) each comprised over 1000 ESTs. The predicted pathways, together with gene annotation, are useful for further investigation of gene function.

The differentially expressed genes in the back-leg muscle and ovary tissues are probably related to their metabolism and functions. In this study, we found 336 and 553 genes with significantly higher expression in the back-leg muscle and ovary, respectively, and further gene annotation showed that they are involved in biological process, cellular components and molecular function. A recent study identified a total of 306 differentially expressed genes between muscle and heart tissue based on 1 253 361 Roche 454 reads and confirmed most of them by a microarray approach.[13] In this study, more differentially expressed genes were found between back-leg muscle and ovary based on 1 087 003 reads. Because the two studies used different tissues, these differentially expressed genes still require further confirmation.
As the muscle and ovary are crucial tissues for growth and reproduction, the differentially expressed genes identified here are potential candidates for growth and reproduction traits of S. scrofa.

The many expression SNPs (eSNPs) identified in this study are valuable molecular markers for further research on S. scrofa. A total of 24 214 SNPs (minor allele frequencies ≥0.2) matched the S. scrofa genome specifically, of which 608 and 125 are on the X chromosome and in mtDNA, respectively, and the rest (23 481) are on the 18 autosomes. In general, these SNPs should be reliable eSNPs and can serve as candidate markers for the identification of eQTLs. At the genome level, a total of 98 151 SNPs were predicted from 4.8 million whole-genome shotgun sequences.[5] Another study discovered as many as 372 886 SNPs by sequencing with Illumina's Genome Analyzer (GA), and 62 621 of these loci were used to design the Illumina Porcine 60K + SNP iSelect Beadchip.[6] This SNP chip is useful for identifying candidate genes or QTLs underlying quantitative traits such as body composition.[7]

SSRs, or microsatellites, are neutral molecular markers that are widely distributed in a genome. SSRs have been shown to comprise 3% of the human genome, with the greatest contribution from di-nucleotide repeats (0.5%).[22] In this study, 45.72% of the 11 928 SSRs are di-nucleotide repeats, followed by tri-nucleotide repeats (36.06%), tetra-nucleotide repeats (14.85%) and penta-nucleotide repeats (3.38%). In addition to (AC) and (AT) among di-nucleotide repeats and (AAT) and (AAC) among tri-nucleotide repeats, (GT) and (GTT) also have high frequencies in this study, which differs from the human genome.[22] This is probably because (GT) and (GTT) are the reverse complements of (AC) and (AAC), respectively, and the reverse strands of some ESTs are used in de novo analysis. In fact, slight differences in SSRs are found among human, mouse and dog, as well as S. scrofa.[23]

In conclusion, we have characterized the muscle and ovary transcriptome of S. scrofa using high-throughput 454 pyrosequencing. Our study obtained a set of 121 299 transcripts or ESTs and described some important features of the S. scrofa transcriptome, such as GC content, gene annotation and pathways across the whole transcriptome. In addition, we identified 24 214 SNPs and 11 928 SSRs as reliable markers. This study is helpful for understanding the genetic architecture of the S. scrofa transcriptome and provides useful resources and markers for future functional genomic research.

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This research was supported by the National Major Special Projects on New Varieties Cultivation for Transgenic Organisms (2008ZX08006-005 and 2009ZX08009-145B) and the Important Projects in Key Fields in Guangdong and Hongkong, 2008 (2008A02).
  25 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Initial sequencing and comparative analysis of the mouse genome.

Authors:  Robert H Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R Brent; Daniel G Brown; Stephen D Brown; Carol Bult; John Burton; Jonathan Butler; Robert D Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T Chinwalla; Deanna M Church; Michele Clamp; Christopher Clee; Francis S Collins; Lisa L Cook; Richard R Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D Delehaunty; Justin Deri; Emmanouil T Dermitzakis; Colin Dewey; Nicholas J Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M Dunn; Sean R Eddy; Laura Elnitski; Richard D Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A Fewell; Paul Flicek; Karen Foley; Wayne N Frankel; Lucinda A Fulton; Robert S Fulton; Terrence S Furey; Diane Gage; Richard A Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A Graves; Eric D Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B Jaffe; L Steven Johnson; Matthew Jones; Thomas A Jones; Ann Joy; Michael Kamal; Elinor K Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W James Kent; Andrew Kirby; Diana L Kolbe; Ian Korf; Raju S Kucherlapati; Edward J Kulbokas; David Kulp; Tom Landers; J P Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R Maglott; Elaine R Mardis; Lucy Matthews; Evan Mauceli; John H Mayer; Megan McCarthy; W Richard McCombie; Stuart McLaren; 
Kirsten McLay; John D McPherson; Jim Meldrim; Beverley Meredith; Jill P Mesirov; Webb Miller; Tracie L Miner; Emmanuel Mongin; Kate T Montgomery; Michael Morgan; Richard Mott; James C Mullikin; Donna M Muzny; William E Nash; Joanne O Nelson; Michael N Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S Pohl; Alex Poliakov; Tracy C Ponce; Chris P Ponting; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A Roe; Krishna M Roskin; Edward M Rubin; Alistair G Rust; Ralph Santos; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Matthias S Schwartz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B Singer; Guy Slater; Arian Smit; Douglas R Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P Vinson; Andrew C Von Niederhausern; Claire M Wade; Melanie Wall; Ryan J Weber; Robert B Weiss; Michael C Wendl; Anthony P West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K Wilson; Eitan Winter; Kim C Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M Zdobnov; Michael C Zody; Eric S Lander
Journal:  Nature       Date:  2002-12-05       Impact factor: 49.962

3.  Domestic-animal genomics: deciphering the genetics of complex traits. (Review)

Authors:  Leif Andersson; Michel Georges
Journal:  Nat Rev Genet       Date:  2004-03       Impact factor: 53.242

4.  DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.

Authors:  Likun Wang; Zhixing Feng; Xi Wang; Xiaowo Wang; Xuegong Zhang
Journal:  Bioinformatics       Date:  2009-10-24       Impact factor: 6.937

5.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. (Review)

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

6.  A regulatory mutation in IGF2 causes a major QTL effect on muscle growth in the pig.

Authors:  Anne-Sophie Van Laere; Minh Nguyen; Martin Braunschweig; Carine Nezer; Catherine Collette; Laurence Moreau; Alan L Archibald; Chris S Haley; Nadine Buys; Michael Tally; Göran Andersson; Michel Georges; Leif Andersson
Journal:  Nature       Date:  2003-10-23       Impact factor: 49.962

7.  Identification of a mutation in porcine ryanodine receptor associated with malignant hyperthermia.

Authors:  J Fujii; K Otsu; F Zorzato; S de Leon; V K Khanna; J E Weiler; P J O'Brien; D H MacLennan
Journal:  Science       Date:  1991-07-26       Impact factor: 47.728

8.  Transcriptome sequencing of malignant pleural mesothelioma tumors.

Authors:  David J Sugarbaker; William G Richards; Gavin J Gordon; Lingsheng Dong; Assunta De Rienzo; Gautam Maulik; Jonathan N Glickman; Lucian R Chirieac; Mor-Li Hartman; Bruce E Taillon; Lei Du; Pascal Bouffard; Stephen F Kingsmore; Neil A Miller; Andrew D Farmer; Roderick V Jensen; Steven R Gullans; Raphael Bueno
Journal:  Proc Natl Acad Sci U S A       Date:  2008-02-26       Impact factor: 11.205

9.  Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing.

Authors:  Rasmus Wernersson; Mikkel H Schierup; Frank G Jørgensen; Jan Gorodkin; Frank Panitz; Hans-Henrik Staerfeldt; Ole F Christensen; Thomas Mailund; Henrik Hornshøj; Ami Klein; Jun Wang; Bin Liu; Songnian Hu; Wei Dong; Wei Li; Gane K S Wong; Jun Yu; Jian Wang; Christian Bendixen; Merete Fredholm; Søren Brunak; Huanming Yang; Lars Bolund
Journal:  BMC Genomics       Date:  2005-05-10       Impact factor: 3.969

10.  Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology.

Authors:  Antonio M Ramos; Richard P M A Crooijmans; Nabeel A Affara; Andreia J Amaral; Alan L Archibald; Jonathan E Beever; Christian Bendixen; Carol Churcher; Richard Clark; Patrick Dehais; Mark S Hansen; Jakob Hedegaard; Zhi-Liang Hu; Hindrik H Kerstens; Andy S Law; Hendrik-Jan Megens; Denis Milan; Danny J Nonneman; Gary A Rohrer; Max F Rothschild; Tim P L Smith; Robert D Schnabel; Curt P Van Tassell; Jeremy F Taylor; Ralph T Wiedmann; Lawrence B Schook; Martien A M Groenen
Journal:  PLoS One       Date:  2009-08-05       Impact factor: 3.240


1.  Identification of glutathione S-transferase genes responding to pathogen infestation in Populus tomentosa.

Authors:  Weihua Liao; Lexiang Ji; Jia Wang; Zhong Chen; Meixia Ye; Huandi Ma; Xinmin An
Journal:  Funct Integr Genomics       Date:  2014-05-29       Impact factor: 3.410

2.  Transcriptome analysis of silver carp (Hypophthalmichthys molitrix) by paired-end RNA sequencing.

Authors:  Beide Fu; Shunping He
Journal:  DNA Res       Date:  2012-01-24       Impact factor: 4.458

3.  Characterization and comparative analyses of muscle transcriptomes in Dorper and small-tailed Han sheep using RNA-Seq technique.

Authors:  Chunlan Zhang; Guizhi Wang; Jianmin Wang; Zhibin Ji; Zhaohuan Liu; Xiushuang Pi; Cunxian Chen
Journal:  PLoS One       Date:  2013-08-30       Impact factor: 3.240

4.  Histological and transcriptome analyses of testes from Duroc and Meishan boars.

Authors:  Haisheng Ding; Yan Luo; Min Liu; Jingshu Huang; Dequan Xu
Journal:  Sci Rep       Date:  2016-02-11       Impact factor: 4.379

5.  Elucidating a molecular mechanism that the deterioration of porcine meat quality responds to increased cortisol based on transcriptome sequencing.

Authors:  Xuebin Wan; Dan Wang; Qi Xiong; Hong Xiang; Huanan Li; Hongshuai Wang; Zezhang Liu; Hongdan Niu; Jian Peng; Siwen Jiang; Jin Chai
Journal:  Sci Rep       Date:  2016-11-11       Impact factor: 4.379

6.  Transcriptome profiling identifies differentially expressed genes in postnatal developing pituitary gland of miniature pig.

Authors:  Lei Shan; Qi Wu; Yuli Li; Haitao Shang; Kenan Guo; Jiayan Wu; Hong Wei; Jianguo Zhao; Jun Yu; Meng-Hua Li
Journal:  DNA Res       Date:  2013-11-26       Impact factor: 4.458

7.  Comprehensive transcriptome profiling and functional analysis of the frog (Bombina maxima) immune system.

Authors:  Feng Zhao; Chao Yan; Xuan Wang; Yang Yang; Guangyin Wang; Wenhui Lee; Yang Xiang; Yun Zhang
Journal:  DNA Res       Date:  2013-08-13       Impact factor: 4.458

8.  Comparative transcriptome analysis of tomato (Solanum lycopersicum) in response to exogenous abscisic acid.

Authors:  Yan Wang; Xiang Tao; Xiao-Mei Tang; Liang Xiao; Jiao-Long Sun; Xue-Feng Yan; Dan Li; Hong-Yuan Deng; Xin-Rong Ma
Journal:  BMC Genomics       Date:  2013-12-01       Impact factor: 3.969

9.  Comparative transcriptome analysis of grapevine in response to copper stress.

Authors:  Xiangpeng Leng; Haifeng Jia; Xin Sun; Lingfei Shangguan; Qian Mu; Baoju Wang; Jinggui Fang
Journal:  Sci Rep       Date:  2015-12-17       Impact factor: 4.379

10.  De Novo Transcriptome Assembly of the Chinese Swamp Buffalo by RNA Sequencing and SSR Marker Discovery.

Authors:  Tingxian Deng; Chunying Pang; Xingrong Lu; Peng Zhu; Anqin Duan; Zhengzhun Tan; Jian Huang; Hui Li; Mingtan Chen; Xianwei Liang
Journal:  PLoS One       Date:  2016-01-14       Impact factor: 3.240
