Jan Gorodkin, Susanna Cirera, Jakob Hedegaard, Michael J Gilchrist, Frank Panitz, Claus Jørgensen, Karsten Scheibye-Knudsen, Troels Arvin, Steen Lumholdt, Milena Sawera, Trine Green, Bente J Nielsen, Jakob H Havgaard, Carina Rosenkilde, Jun Wang, Heng Li, Ruiqiang Li, Bin Liu, Songnian Hu, Wei Dong, Wei Li, Jun Yu, Jian Wang, Hans-Henrik Staefeldt, Rasmus Wernersson, Lone B Madsen, Bo Thomsen, Henrik Hornshøj, Zhan Bujie, Xuegang Wang, Xuefei Wang, Lars Bolund, Søren Brunak, Huanming Yang, Christian Bendixen, Merete Fredholm.
Abstract
BACKGROUND: Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.
Year: 2007 PMID: 17407547 PMCID: PMC1895994 DOI: 10.1186/gb-2007-8-4-r45
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
cDNA libraries
| Lib name | Tissue (Animals) | Description | Reads |
| Amn | Amnion (S) | - | 2,394 |
| Aor | Aorta (M) | - | 5,121 |
| Bla | Bladder (M) | - | 8,042 |
| Nbma | Bone marrow (S) | 115 days, bone marrow | 10,068 |
| Cbe | Brain (M) | Cerebellum | 4,180 |
| Cbrb | Brain (B) | Brain (cortex) | 7,814 |
| Fco | Brain (M) | Frontal cortex | 6,361 |
| Hyp | Brain (S) | Hypothalamus | 7,001 |
| Pgl | Brain (M) | Pituitary gland | 8,440 |
| Ecca | Brain (S) | F 50 days, cortex | 8,693 |
| Ecea | Brain (S) | F 50 days, cerebellum | 4,361 |
| Fcea | Brain (S) | F 100, cerebellum | 3,682 |
| Fcca | Brain (S) | F 107, cortex cerebri | 5,056 |
| Fhia | Brain (S) | F 107 Hippocampus | 5,897 |
| Cblb | Haemopoietic (B) | Blood | 8,711 |
| Jca | Cartilage (S) | Joint capsule | 8,775 |
| Ncaa | Cartilage (S) | 115 days, cartilage | 7,306 |
| Panb | Endocrine glands (M) | Pancreas | 4,238 |
| Ret | Eye (M) | Retina | 7,768 |
| Eyea | Eye (S) | F 50, eye | 5,865 |
| Fat | Fat (M) | Fat | 6,783 |
| Che | Heart (B) | - | 7,336 |
| Hea | Heart (M) | - | 4,890 |
| Hlv | Heart (S) | Left ventricle | 7,181 |
| Cje | Intestine (B) | Jejunum | 6,052 |
| Col | Intestine (S) | Large intest, colon asc. | 5,128 |
| Duo | Intestine (S) | Small intest, duodenum | 5,787 |
| Ill | Intestine (S) | Small intest, ileum | 5,695 |
| Jej | Intestine (S) | Small intest, jejunum | 10,109 |
| Lin | Intestine (M) | Large intestine | 6,868 |
| Sin | Intestine (M) | Small intestine | 5,716 |
| Ejea | Intestine (S) | F 50, Jejunum | 10,118 |
| Ncoa | Intestine (S) | 115 days, colon | 6,183 |
| Njea | Intestine (S) | 115 days, jejunum | 6,027 |
| Cki | Kidney (B) | - | 6,052 |
| Kid | Kidney (M) | - | 7,708 |
| Cli | Liver (B) | - | 6,544 |
| Liv | Liver (M) | - | 6,836 |
| Elia | Liver (S) | F 50, liver | 6,587 |
| Flia | Liver (S) | F 100, liver | 4,929 |
| Clu | Lung (B) | - | 8,358 |
| Lunc | Lung (M) | - | 6,645 |
| Elua | Lung (S) | F 50 days, lung | 2,595 |
| Nlua | Lung (S) | 115 days, lung | 5,217 |
| Cly | Lymphatic gland (B) | - | 8,289 |
| Lyg | Lymphatic gland (M) | - | 7,513 |
| Lnt | Lymphatic gland (S) | - | 7,027 |
| Cga | Mammary gland (B) | - | 3,583 |
| Mcp | Mammary gland (S) | Mammae, colostrum prod | 5,860 |
| Mga | Mammary gland (M) | 7 days after weaning | 6,242 |
| Mgmb | Mammary gland (M) | 14 days after birth | 5,545 |
| Mgp | Mammary gland (M) | 7 days pre-birth | 4,335 |
| Med | Mediastinum (S) | - | 8,602 |
| Bfe | Muscles (M) | M. biceps femoris | 6,673 |
| Ctlb | Muscles (B) | Tenderloin | 6,533 |
| Isp | Muscles (M) | M. infraspinatus | 6,650 |
| Ldo | Muscles (M) | M. longissimus dorsi | 10,309 |
| Mas | Muscles (S) | M. masseter | 4,755 |
| Sme | Muscles (M) | M. semimembranosus | 3,274 |
| Ssp | Muscles (M) | M. supraspinatus | 7,379 |
| Ste | Muscles (M) | M. semitendinosus | 7,396 |
| Tbr | Muscles (M) | M. triceps brachii | 6,486 |
| Vin | Muscles (M) | M. vastus intermedius | 3,007 |
| Esea | Muscles (S) | F 50, M. semitendinosus | 7,905 |
| Nmsa | Muscles (S) | 115 days, M. semitendinosus | 4,676 |
| Gul | Oesophagus (M) | - | 5,631 |
| Ova | Ovary (M) | - | 7,744 |
| Cov | Ovary (S) | - | 7,567 |
| Plad | Placenta (M) | - | 7,481 |
| Pro | Prostata (M) | - | 1,953 |
| Rec | Rectum (M) | - | 5,778 |
| Cmu | Rhinal mucosal membrane (B) | - | 5,365 |
| Nmma | Rhinal mucosal membrane (S) | 115 days, mucosal memb. | 7,530 |
| Sag | Salivary gland (M) | - | 5,473 |
| Csk | Skin (B) | - | 7,105 |
| Ski | Skin (M) | - | 6,815 |
| Ton | Skin (S) | Tip of tongue, mucosa | 5,698 |
| Eepa | Skin (S) | F 50, epidermis | 8,159 |
| Erua | Skin (S) | F 50, regio umbilicalis | 8,330 |
| Nepa | Skin (S) | 115 days, epidermis | 5,437 |
| Spc | Spinal cord (M) | Spinal cord | 8,821 |
| Ebsa | Spinal cord (S) | F 50 days, brainstem | 8,453 |
| Fbsa | Spinal cord (S) | F 107 brainstem | 5,703 |
| Spl | Spleen (M) | - | 6,984 |
| Csp | Spleen (B) | - | 6,204 |
| Cst | Stomach (B) | - | 7,141 |
| Sto | Stomach (M) | - | 5,561 |
| Sug | Suprarenal glands (M) | - | 7,856 |
| Cag | Suprarenal glands (B) | Adrenal gland | 6,614 |
| Cte | Testicle (B) | - | 3,416 |
| Tes | Testicle (M) | - | 4,812 |
| Cty | Thyroid glands (B) | - | 9,608 |
| Thg | Thyroid glands (M) | - | 7,887 |
| Pty | Thyroid glands (S) | Piglet 2 days, thymus | 7,007 |
| Ftya | Thyroid glands (S) | F 100, thymus | 5,687 |
| Tra | Trachea (M) | - | 8,124 |
| Ute | Uterus (S) | - | 7,531 |
| Cut | Uterus (B) | - | 5,885 |
The generated cDNA libraries, representing 35 tissues. They are shown here as two (overlapping) sets: a physiologic set and a developmental set. The column 'Lib name' gives the three-letter code for the library; a trailing lowercase a, b, c, or d after the code is a footnote marker (see the end of this caption). 'Tissue' indicates the overall tissue the library was generated from, where '(Animals)' indicates whether the library was generated from a single (S or B) or multiple (M) animals. Libraries listed with (M) and (S) represent the pig breeds (mostly cross-breeds) used in Danish breeding (Landrace, Yorkshire, Duroc, and Hampshire), whereas the libraries listed with (B) represent Chinese pig breeds. 'Description' provides a short description. The column 'Reads' shows the number of reads that went into that library after cleaning. The sum of all 'Reads' corresponds to the number of generated reads that contributed to the assembly, that is, the number of reads after cleaning for vector, repeats, and so on. Library names beginning with 'C' originate from Chinese pig breeds (except for 'Col' and 'Cbe'), whereas the remaining libraries originate from Danish pig breeds. Footnotes: (a) developmental tissue; (b) ignored in expression analysis (see Materials and methods); (c) likely to be heavily contaminated by liver expressed sequence tags; (d) a normalized library.
Figure 1. Distribution of cluster sizes. The number of clusters on the y-axis versus the cluster size (number of reads) on the x-axis exhibits a power law-like region. The distribution marked 'All' indicates the cluster size distribution for the entire dataset, whereas the other distributions are examples from specific libraries: 'Pla' (placenta, normalized) and 'Fcc' (cerebellum F100 days).
Match of contigs and singletons to known databases
| Match level (ID/Sbj) | Contigs: UniProt | Contigs: ncRNAdb | Singletons: UniProt | Singletons: ncRNAdb |
| M0 (98%/100%) | 1,982 | 21 | 173 | 6 |
| M1 (95%/95%) | 1,304 | 18 | 101 | 12 |
| M2 (85%/90%) | 2,517 | 72 | 236 | 20 |
| M3 (70%/70%) | 3,480 | - | 749 | - |
| M4 (60%/50%) | 3,603 | - | 1,355 | - |
| M5 (20%/20%) | 11,973 | - | 12,337 | - |
The table lists the number of hits to the given databases at various levels of matching for contigs and singletons. The cutoffs for each match level are given as alignment identity (ID) and subject coverage (Sbj) for UniProt and the noncoding RNA databases (ncRNAdb). For ncRNAs, only match levels up to M2 (alignment length larger than 30 nucleotides) are included (counting each contig/singleton only once), and the matches have been cleaned of tRNAs because these appear to be the most frequent RNAs arising from contamination, such as from E. coli. A curated list of ncRNAs for levels M0 and M1 can be found in Additional data file 1 (Table S1). Also see text for details. Note that a few conreads match the same UniProt ID. This can be due to phylogenetic decomposition or single reads not being assembled. The total number of contigs was 48,629; the number of singletons was 73,171.
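As an illustration of how the match levels in the table could be applied, the following minimal sketch assigns a hit to the strictest level whose identity and subject-coverage cutoffs are both met. The threshold values are copied from the table. The function name `match_level`, the use of inclusive lower bounds, and the rule of assigning each hit to exactly one level are assumptions made for illustration, not details confirmed by the source.

```python
# Cutoffs (percent identity, percent subject coverage) copied from the table,
# ordered from strictest (M0) to most permissive (M5).
MATCH_LEVELS = [
    ("M0", 98.0, 100.0),
    ("M1", 95.0, 95.0),
    ("M2", 85.0, 90.0),
    ("M3", 70.0, 70.0),
    ("M4", 60.0, 50.0),
    ("M5", 20.0, 20.0),
]


def match_level(identity, subject_coverage):
    """Return the strictest level whose two cutoffs are both met, else None.

    Assumption: cutoffs are inclusive lower bounds and each hit is
    assigned to a single level.
    """
    for name, min_identity, min_coverage in MATCH_LEVELS:
        if identity >= min_identity and subject_coverage >= min_coverage:
            return name
    return None


# Hypothetical hits, given as (identity %, subject coverage %).
print(match_level(99.0, 100.0))  # M0
print(match_level(90.0, 60.0))   # M4
```

Because both cutoffs must hold at once, a hit with high identity but low subject coverage falls to a lower level, as the second example shows.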
Figure 2. Diversity of cDNA libraries. The libraries (x-axis) are ranked according to their diversity (blue dots on the y-axis). The names of the libraries on the x-axis correspond to those listed in Table 1. The diversity of a library is computed as the number of conreads in which the library has at least one read included, divided by the total number of reads present in the library. (See Materials and methods, in the text, for further details.) Two additional measures are included as well. 'top10' (green dots) refers to the fraction of reads comprising the 10 most expressed contigs in that particular library. 'hk80' (red dots) refers to the fraction of reads representing the 65 housekeeping candidates expressed in more than 80 libraries listed in Additional data file 1 (Table S2). Brain and testes libraries are among the most diverse. They also rank as the most diverse when diversity is averaged for each of the 35 tissues (not shown). Note that the normalized library Pla is among the most diverse libraries, as one would expect of a normalized library.
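The diversity and 'top10' measures described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a mapping from each read of a single library to the conread it was assigned to, and the function names `library_diversity` and `top_n_fraction` as well as the toy data are hypothetical.

```python
from collections import Counter


def library_diversity(read_to_conread):
    """Distinct conreads hit by a library divided by its total read count.

    read_to_conread maps every read ID of ONE library to the conread
    (contig or singleton) that the read was assigned to.
    """
    if not read_to_conread:
        raise ValueError("library has no reads")
    return len(set(read_to_conread.values())) / len(read_to_conread)


def top_n_fraction(read_to_conread, n=10):
    """Fraction of the library's reads that fall in its n largest conreads."""
    if not read_to_conread:
        raise ValueError("library has no reads")
    counts = Counter(read_to_conread.values())
    top_reads = sum(count for _, count in counts.most_common(n))
    return top_reads / len(read_to_conread)


# Toy library (hypothetical data): four reads spread over three conreads.
toy_library = {"r1": "A", "r2": "A", "r3": "B", "r4": "C"}
print(library_diversity(toy_library))    # 0.75
print(top_n_fraction(toy_library, n=1))  # 0.5
```

A library whose reads all land in different conreads has diversity 1, while a library dominated by a few highly expressed transcripts has low diversity and a high 'top10' fraction.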
Figure 3. Distribution of cluster coverage of cDNA libraries. The values on the x-axis indicate the number of libraries for which there is at least one expressed sequence tag (EST) read present. The corresponding value on the y-axis shows the number of conreads for a given number of libraries. The vertical lines at 60 and 80 indicate cut-offs for potential housekeeping genes. The data indicate the presence of power law-like behavior. The data also show that we can only expect a small portion of the clusters to be composed of reads from many libraries.
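The distribution in this figure amounts to counting, for each conread, how many distinct libraries contribute at least one read, and then tallying conreads by that number. A minimal sketch, assuming a mapping from conread ID to the library of each of its reads, is given below. The function names and the toy data are hypothetical, and the strict "more than" comparison for the housekeeping cut-off is an assumption.

```python
from collections import Counter


def libraries_per_conread(conread_to_read_libraries):
    """Map each conread to the number of distinct libraries among its reads."""
    return {
        conread: len(set(libraries))
        for conread, libraries in conread_to_read_libraries.items()
    }


def coverage_histogram(conread_to_read_libraries):
    """Count conreads (y-axis) for each number of libraries (x-axis)."""
    return Counter(libraries_per_conread(conread_to_read_libraries).values())


def housekeeping_candidates(conread_to_read_libraries, min_libraries=80):
    """Conreads with reads from more than min_libraries distinct libraries."""
    coverage = libraries_per_conread(conread_to_read_libraries)
    return sorted(c for c, n in coverage.items() if n > min_libraries)


# Toy data (hypothetical): one entry per read, naming the read's library.
toy = {
    "contig1": ["Liv", "Liv", "Kid", "Spl"],
    "contig2": ["Liv"],
    "singleton1": ["Kid"],
}
print(coverage_histogram(toy))                        # Counter({1: 2, 3: 1})
print(housekeeping_candidates(toy, min_libraries=2))  # ['contig1']
```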
Figure 4. Patterns of differential expression. Differential expression within brain and spinal cord tissues. The clusterings were made using the package of de Hoon and coworkers [43], with options 'uncentered correlation' and 'average-linkage'. Gray fields indicate that the number of reads did not exceed the read cutoff of four reads for a given contig in a given library. However, such numbers were still counted as having the value zero when centering the expression values for the gene cluster. The tree has arbitrary scale.
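For readers unfamiliar with the 'uncentered correlation' option, the sketch below shows the standard definition of that similarity (the dot product of two profiles divided by the product of their norms, with no mean subtraction) together with one reading of the read cutoff. It is applied to raw read counts for simplicity and is not the clustering package's own code. The function names, the treatment of all-zero profiles, and the toy counts are assumptions.

```python
import math

READ_CUTOFF = 4  # read cutoff stated in the caption


def apply_read_cutoff(read_counts, cutoff=READ_CUTOFF):
    """Set counts that do not exceed the cutoff to zero (the gray fields)."""
    return [count if count > cutoff else 0 for count in read_counts]


def uncentered_correlation(x, y):
    """Dot product of x and y over the product of their Euclidean norms.

    Unlike the Pearson correlation, the means are not subtracted first.
    Assumption: an all-zero profile yields a similarity of 0.0.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Toy profiles (hypothetical read counts for two contigs in three libraries).
profile_a = apply_read_cutoff([10, 3, 20])  # [10, 0, 20]
profile_b = apply_read_cutoff([5, 2, 10])   # [5, 0, 10]
similarity = uncentered_correlation(profile_a, profile_b)
print(similarity)      # 1.0, the profiles are proportional
print(1 - similarity)  # corresponding distance for clustering
```

In average-linkage clustering, the distance between two clusters is the mean of such pairwise distances over all pairs of members.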
Primers and PCR conditions
| PigEST name | Gene symbol | Oligo sequence (5' to 3') | Amplicon length | Tm (°C) |
| Ss1.1-rhlv24b_a21.5 | Troponin | CCAGAGTCCCCAGGATA | 100 | 63 |
| Ss1.1-rcst01_n6.5 | Pepsinogen A precursor | TACTGCTGCTCAGCTTG | 106 | 60 |
| Ss1.1-rcst21_l12.5 | Pepsinogen C | TCCTGGTCCTTTTTGACACCTAGAGGACTTGCTGGGGTTG | 108 | 60 |
| Ss1.1-rhyp08c_e13.5.5 | Myelin basic protein | GCAGGGCATAGAGATGGTGTCCCGACCCTGTTAGGAAGAT | 100 | 60 |
| Ss1.1-Liv1-LVRM1E040203.5.5 | Fetuin B | GCCCTGTGTTTCAAATCCTGAGGAGCCACAAGGACAGCTA | 100 | 60 |
| Ss1.1-rnlu1830b_g11.5 | SP-C | TGTACATCTAGGAAACATCAGATTCTTTGGTGGTAGAAGCC | 201 | 60 |
| Ss1.1-rill310b_f20.5 | Gastrotropin | TGAACAGCCCCAACTACCACTCATGCCAGCTTCTTGCTTA | 110 | 60 |
| Ss1.1-rduo424b_g21.5 | Vitamin D-dependent calcium binding protein | TGAGTGCCCAAAAGTCTCCTCAGTTGCTTCAGCTCCTCCT | 153 | 60 |
Table of the primers used and their corresponding polymerase chain reaction (PCR) conditions for the eight selected genes for which quantitative PCR (qPCR) was carried out. The column 'PigEST name' indicates the name of the contig. 'Gene symbol' is the gene name of the selected genes according to UniProt match. 'Oligo sequence (5' to 3')' gives the oligonucleotide sequences used in the qPCR experiment. 'Amplicon length' is the length of the amplified product in the qPCR. 'Tm' indicates the annealing temperature used in the qPCR.
Figure 5. Gene Ontology content of cDNA libraries and tissues. A heat map of the log odds values (in bits) for each library, found by comparing the observed fraction of the Gene Ontology top level categories of (a) 'molecular function' and (b) 'biological process' with the respective averages. Gene Ontology categories were taken from corresponding M0 to M3 BLAST matches to UniProt. The libraries are grouped by their corresponding tissues, and the coloring indicates the category where we find higher expression than by chance. Only the relevant tissues are indicated by numbers and listed by their range of cDNA library names.
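A log odds value in bits is the base-2 logarithm of the ratio between an observed fraction and a reference fraction. The sketch below shows one way the values in such a heat map could be computed. It assumes the reference for each category is the unweighted mean of that category's fraction across libraries, which the caption does not specify. The function names and the toy fractions are hypothetical.

```python
import math


def log_odds_bits(observed_fraction, average_fraction):
    """log2(observed / average); positive values mean over-representation."""
    if average_fraction <= 0 or observed_fraction < 0:
        raise ValueError("fractions must be non-negative, average positive")
    if observed_fraction == 0:
        return float("-inf")  # category absent from this library
    return math.log2(observed_fraction / average_fraction)


def category_log_odds(library_fractions):
    """Log odds per library and category against the mean across libraries.

    library_fractions maps library name -> {category: fraction of matches}.
    Assumption: the average is the unweighted mean over libraries.
    """
    categories = {c for fractions in library_fractions.values() for c in fractions}
    averages = {
        c: sum(f.get(c, 0.0) for f in library_fractions.values())
        / len(library_fractions)
        for c in categories
    }
    return {
        library: {
            c: log_odds_bits(fractions.get(c, 0.0), averages[c])
            for c in categories
        }
        for library, fractions in library_fractions.items()
    }


# Toy fractions (hypothetical) for two libraries and two categories.
toy_fractions = {
    "LibA": {"binding": 0.4, "catalytic activity": 0.6},
    "LibB": {"binding": 0.2, "catalytic activity": 0.8},
}
toy_log_odds = category_log_odds(toy_fractions)
print(round(toy_log_odds["LibA"]["binding"], 3))  # 0.415
print(log_odds_bits(0.2, 0.1))                    # 1.0, twice the average
```

A value of 1 bit means the category is twice as frequent in the library as on average, and a value of -1 bit means half as frequent.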