Charu G Kumar, Joshua H Larson, Mark R Band, Harris A Lewin.
Abstract
BACKGROUND: Among the eutherian mammals, placental architecture varies to a greater extent than that of any other tissue. The diversity of placental types, even within a single mammalian order, suggests that genes expressed in placenta are under strong Darwinian selection. Thus, the ruminant placenta may be a rich source of genes to explore adaptive evolutionary responses in mammals. The aim of our study was to identify novel transcripts expressed in ruminant placenta, and to characterize them with respect to their expression patterns, organization of coding sequences in the genome, and potential functions.
Year: 2007 PMID: 17488528 PMCID: PMC1884150 DOI: 10.1186/1471-2164-8-113
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Bioinformatics scheme for identifying novel transcripts
| Steps in the scheme | Sequences removed | Sequences remaining |
| Starting set of 5' placenta ESTs | | 12,614 |
| PipeBLASTN (automated BLASTN analysis) | 10,235 | 2,379 |
| | 581 | 1,798 |
| 3' reads from single-pass sequencing using anchored oligodT primer | 618 | 1,180 |
| PipeBLASTN of 3' reads | 404 | 776 |
| Assembly of 3' and extended 5' mates | NA^a | 493 |
| Full-clone sequencing of unassembled 3' ESTs | NA | 283 |
| | 171 | 322 |
| | 232 | 51 |
| Working set of divergent homologs and novel transcripts (322+51) | NA | 373 |
| TBLASTX against UniGene databases to separate novel transcripts from divergent homologs | 75 | 298 |
| | 134 | 164 |
| Screening for internal poly A tracts | 73 | 91 |
a NA; not applicable.
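The counts in the scheme can be checked with simple bookkeeping. The sketch below is only an illustration: the helper name `apply_steps` and the variable names are assumptions introduced here, and the numbers are copied from the table (steps whose labels are missing in the table are included by their counts only).

```python
def apply_steps(start, removed_counts):
    """Return the running 'remaining' totals after each removal step."""
    remaining = []
    current = start
    for removed in removed_counts:
        current -= removed
        remaining.append(current)
    return remaining


# Linear part of the scheme: 12,614 starting ESTs, four removal steps.
front = apply_steps(12_614, [10_235, 581, 618, 404])

# The 776 remaining 3' reads split into assembled and unassembled sets.
assembled, unassembled = 493, 283

# Each branch loses sequences before the two are pooled (322 + 51).
working_set = (assembled - 171) + (unassembled - 232)

# Final three removal steps, ending at the 91 NTs analysed in the paper.
back = apply_steps(working_set, [75, 134, 73])

print(front, working_set, back)
```

Each "remaining" value in the table equals the previous value minus the "removed" value, and the two branches sum to the 776 reads that entered the assembly step.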
Figure 1. Length distribution of ORFs. Length distribution of 78 ORFs (orange bars) predicted in 64 NTs with >33 codons. The expected frequency of false positives (yellow bars) was calculated according to Frishman et al. [52].
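To make the ">33 codons" cutoff concrete, here is a minimal forward-strand ORF scan. It is a sketch under stated assumptions, not the prediction tool used in the study: the function name `find_orfs`, the ATG-to-stop definition of an ORF, and the choice to count codons excluding the stop codon are all assumptions made here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=34):
    """Return (start, end, n_codons) for ATG..stop ORFs on the forward strand.

    min_codons=34 mirrors the '>33 codons' cutoff; the stop codon is not
    counted (an assumption, the caption does not say either way).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return orfs


long_orf = "ATG" + "GCT" * 40 + "TAA"   # 41 codons before the stop
short_orf = "ATG" + "GCT" * 10 + "TAA"  # 11 codons before the stop
print(find_orfs(long_orf), find_orfs(short_orf))
```

A real analysis would also scan the reverse strand and weigh ORF length against a false-positive expectation, as the yellow bars in the figure do.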
Evidence of functionality in 91 NTs^a
| Type of functional element | No. of NTs |
| Protein motif | 4 |
| Coiled-coil repeats | 1 |
| Transmembrane (TMHMM, SMART) | 1 |
| Novel protein domain | 1 |
| Signal peptide (SignalP-NN, SignalP-HMM, SMART) | 4 |
| UTR elements | 13 |
| Inverse repeats | 12 |
| G-quartets | 7 |
| CpG Islands | 30 |
| Non-coding RNA structural elements | 2 |
| ncRNA | 21 |
a Sequences annotated with the above elements can be found in Additional file 2.
Sample feature table for 91 NTs
| BTC1_14FL (XM_873204) | 868 | 63 | 4 | AATAAA | CpG | muscle | 7p13 | chr4:43,188,679–43,191,770 | intergenic | interstitial | ||
| BTC1_15FL | 630 | none | C | 1 | AATAAA | ncRNA | none | 14q11.2 | scaffold3389:59,086–59,711 | intergenic | interstitial | |
| BTC1_33FL | 691 | 35 | C | 5 | ATTAAA | CpG | none | 6q14.3 | scaffold4626:20,602–22,398 | intergenic | interstitial | |
| BTC1_34FL | 977 | 100 K | A | 3 | AATAAA | ADH_DRE, GY-Box, signalP | placentome | 9q32 | scaffold9964:10,570–17,798 | intergenic | interstitial | |
| BTC1_53FL | 525 | 106 | C | 1 | AATAAA | none | none | 19q13.31 | chr6:49,142,506–49,143,026 | intergenic | interstitial | |
| BTC1_55FL | 777 | 82 K | A | 2 | AATAGA | signalP | none | 16q22.1 | chr18:13,861,852–13,862,619 | intergenic | boundary | |
| BTC1_57FL | 911 | none | C | 1 | AATAAA | none | thymus | 7p22.1 | chr25:33,360,786–33,361,318 | interstitial | ||
| BTC1_58FL | 1544 | 64 | A | 2 | ATTAAA | ADH_DRE, GY-Box | none | 11q12.2 | chr29:31,601,481–31,607,081 | interstitial | ||
| BTC1_71FL | 691 | none | C | 1 | AATAAA | none | 3p21.31 | chr22:42,680,522–42,681,188 | interstitial | |||
| BTC1_77FL | 1095 | 84 | C | 1 | AATAAA | GY-Box; CpG | thymus | 18q21.33 | chr24:44,739,526–44,740,613 | subtelomeric | ||
| BTC1_78FL | 1736 | 91 K | A | 1 | AATAAA | GY-Box; IR | mesenteric lymph node, thymus | 1p36.13 | chr2:85,996,428–85,997,304 | intergenic | subtelomeric | |
| BTC1_79FL | 732 | none | C | 4 | TATAAA | CpG; ncRNA | cerebrum | 2p11.2 | chr11:34,570,339–34,575,174 | intergenic | interstitial | |
| BTC1_92FL (XM_587188) | 933 | 93 K; 127 | C | 1 | TGTAAA | none | cerebrum | 17q25.1 | chr19:49,709,796–49,710,719 | intergenic | interstitial | |
| BTC1_93FL | 1269 | 147 K | C | 1 | AATAAA | GY-Box | thymus | 22q12.2 | chr17:42,667,501–42,668,530 | interstitial | ||
| BTC1_95FL | 2492 | 108 | C | 1 | AATAAA | none | skin | 6p21.2 | chr23:8,629,884–8,632,373 | intergenic | interstitial | |
| BTC1_102FL | 795 | 38 K | C | 1 | AATAAA | G-quartet; IR | none | 10q11.21 | chr28:33,147,612–33,148,436 | intergenic | subtelomeric | |
| BTC1_113FL (XM_611248) | 659 | 69 K | C | 5 | AATAAA | CpG | thalamus | 1p35.3 | chr2:76,776,847–76,779,658 | intergenic | interstitial | |
| BTC1_115FL | 962 | 36; 35 | C | 1 | AATAAA | none | thymus | 1p34.1 | chr3:67,576,355–67,577,638 | interstitial | ||
| BTC1_130FL (XM_611254) | 664 | 125 K; 65 K | A | 3 | AATAAA | none | adrenal, cerebellum, thalamus | 19q13.43 | chr18:55,591,568–55,593,819 | subtelomeric | ||
| BTC1_132FL | 2174 | none | A | 1 | AATAAA | IR; ncRNA | thymus | Xq25 | scaffold4717:48,263–50,399 | interstitial | ||
| BTC3_7JE | 548 | 35 | A | 5 | AATAAA | none | muscle | 7p22.1 | chr4:43,188,676–43,191,750 | intergenic | interstitial | |
| BTC1_22JE | 1569 | none | A | 1 | ATTAAA | IR; ncRNA | placentome | 5q33.3 | scaffold399:92,975–94,544 | interstitial | ||
| BTC2_43JE | 1043 | none | C | 2 | AAACAA | G-quartet; CpG | cerebrum, thalamus | 8q24.3 | scaffold12671:9,483–10,533 | interstitial | ||
| BTC1_51JE | 958 | 130 | C | 2 | AATAAA | none | skin | 11p15.5 | scaffold2079:18,521–19,645 | intergenic | interstitial | |
| BTC1_100JE | 2381 | 92 K | C | 1 | GATAAA | GY-Box | none | 19p13.11 | chr7:5,772,691–5,775,068 | intergenic | interstitial | |
| BTC1_104JE (XM_883284) | 619 | 59; 68 | A | 5 | AATAAA | none | placentome | none | scaffold6598:22,994–33,093 | intergenic | interstitial | |
| BTC1_113JE | 830 | none | A | 2 | AATAAA | CpG; ncRNA | none | 1p13.2 | scaffold1237:214,289–215,207 | intergenic | interstitial | |
| BTC1_118JE | 2257 | 50; 92 | C | 1 | AATAAA | GY-Box | none | 22q13.2 | scaffold9684:8,273–9,215 | intergenic | interstitial | |
| BTC1_144JE | 832 | 150 K | C | 1 | AATAAA | none | none | 1p36.33 | chr16:37,642,343–37,643,174 | intergenic | boundary | |
| BTC1_146JE | 2206 | none | A | 3 | ATTAAA | IR; ncRNA | muscle, skin | 14q32.31 | chr21:45,674,164–45,678,931 | intergenic | interstitial | |
| BTC1_203JE | 773 | 129 K | C | 3 | ATTAAA | CpG | heart | 8q21.2 | chr14:46,885,318–46,890,350 | intergenic | interstitial | |
| BTC1_215JE | 1038 | 109 K | C | 1 | AATAAA | Eu_thiol protease | none | 2p25.3 | scaffold473:40,328–41,349 | intergenic | interstitial | |
| BTC1_8NG | 703 | 71 K | A | 4 | AATATA | none | thalamus | none | chr19:48,755,357–48,757,924 | interstitial | ||
| BTC1_39NG | 767 | 172 K | C | 2 | AATAAA | Proline-rich with_coiled-coil | none | 1p31.3 | scaffold997:257,434–259,042 | interstitial | ||
| BTC1_58NG | 562 | 49 | C | 1 | AATAAA | CpG | cerebrum | 13q34 | chr12:48,063,584–48,064,091 | subtelomeric | ||
| BTC1_63NG | 483 | 142 K | C | 2 | unknown | signalP; ATPaseE1. E2 | none | 16p13.12 | chr25:14,893,445–14,897,173 | intergenic | interstitial | |
| BTC1_104NG | 728 | 144 K; 149 K | C | 1 | AATAAA | TOP | none | 14q32.32 | chr21:48,386,644–48,387,349 | subtelomeric | ||
| BTC1_149NG | 690 | 35 | C | 1 | TATAAA | CpG | skin, testis | 22q11.21 | chr17:45,366,139–45,366,804 | subtelomeric | ||
| BTC1_237NG | 622 | 67 | C | 1 | AATAAA | CpG | cerebrum | 17q25.3 | chr19:47,237,560–47,238,117 | intergenic | interstitial | |
| BTC1_255NG (XM_596632) | 651 | 64 | C | 2 | AATAAA | CpG;Heavy metal_ion transport | muscle | 3q29 | chr1:43,207,885–43,208,506 | intergenic | interstitial | |
| BTC1_269NG | 716 | 116 | C | 1 | ATTAAA | CpG | none | 11q13.3 | chr29:41,407,248–41,407,965 | intergenic | boundary | |
| BTC1_286NG (XM_870778) | 788 | 36; 184 K | C | 1 | unknown | CpG | placentome | none | chr14:662,313–663,087 | subtelomeric |
a Number in parentheses is the GenBank accession of the cattle RefSeq prediction matching the NT.
b GenBank accession No. of 3' sequence including primer-walked sequence.
c K denotes start-codon of ORF flanked by Kozak consensus.
d C, cattle-specific; A, cetartiodactyla-specific.
e 'Unknown' indicates that a polyadenylation signal could not be identified.
f 'IR' indicates presence of an inverse repeat; ADH_DRE, GY-box, K-box, Brd box and CPE are functional UTR elements.
g A gene symbol indicates that the NT is located in an intron of that gene.
h "Boundary" indicates that the NT anchors within 1 Mbp of the end of a homologous synteny boundary on a human chromosome as defined in [23];
"Subtelomeric" indicates that the NT aligns within 2 Mbp from the end of a cattle chromosome.
Lineage specificity and expression features of 91 cattle NTs
| Cetartiodactyla-specific transcripts^c | No | 7 | 7 | 3 |
| | Yes | 16 | 14 | 15 |
| Novel transcripts^d | No | 20 | 19 | 6 |
| | Yes | 48 | 46 | 18 |
a Gene expression validated by microarray analysis.
b Splice-site analysis was conducted using est2genome. Some of the NTs had missing exons due to gaps in the sequence scaffolds, or due to alignment reaching the end of a scaffold.
c BLASTN hits with E ≤ 10^-5 to cetartiodactyl EST database.
d Sequences that have BLASTN hits with E > 10^-5 and TBLASTX hits with E > 10^-10.
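The E-value cutoffs in footnotes c and d amount to a two-step decision rule, sketched below. The function name `classify_nt`, the treatment of "no hit" as an infinite E-value, and the "unclassified" fallback are assumptions introduced for illustration; the footnotes do not say how sequences failing both tests were handled, and footnote d does not name the databases searched.

```python
import math

BLASTN_CUTOFF = 1e-5     # footnote c: E <= 10^-5 against cetartiodactyl ESTs
TBLASTX_CUTOFF = 1e-10   # footnote d: E > 10^-10 for TBLASTX hits


def classify_nt(blastn_evalue=None, tblastx_evalue=None):
    """Assign a lineage label from best-hit E-values (None means no hit)."""
    blastn = math.inf if blastn_evalue is None else blastn_evalue
    tblastx = math.inf if tblastx_evalue is None else tblastx_evalue
    if blastn <= BLASTN_CUTOFF:
        return "cetartiodactyla-specific"
    if tblastx > TBLASTX_CUTOFF:
        return "novel"
    return "unclassified"  # hypothetical fallback, not defined in the source


labels = [
    classify_nt(1e-20, None),   # strong EST hit
    classify_nt(0.5, 1e-3),     # weak hits only
    classify_nt(None, None),    # no hits at all
    classify_nt(0.5, 1e-30),    # strong translated hit, weak nucleotide hit
]
print(labels)
```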
Figure 2. Genomic context of BTC1_14RD and BTC1_130FL. Top panel (A). A modified image from the UCSC cow genome browser (March 2005, Btau_2.0) showing BTC1_14RD and its alternatively spliced product BTC1_130FL aligned to the cattle genome (Contig455) [GenBank:AAFC02000448]. Cattle BAC AC146804 [25] aligned to the same region using BLAT (regions of similarity shown with vertical bars). A track for human proteins is shown to demonstrate that there are no known human homologs in this region. A scaled track for cattle ESTs (partial representation of "squish mode" due to the large number of ESTs) shows high support for alternatively spliced cattle transcripts encoded in this region. Bottom panel (B). A modified image from the UCSC human genome browser (May 2004, Hg17) showing the in silico anchoring of BTC1_14RD, represented as a block arrow, to a subtelomeric region of HSA19q on the basis of flanking sequence similarity in cattle BAC AC146804. This region is syntenic to a segment of BTA18 (shown as a separate track at the top). BTC1_130FL anchors in the same region (not shown to maintain clarity of the figure). The assumptive map location of the gene encoding the artiodactyl-specific transcript Ast1 [GenBank:AY427788] is also shown. No significant flanking match was identified in Contig455 (due to its shorter length). Unmodified UCSC Genome Browser tracks for known human proteins, Genscan genes, retroposed genes, conserved sequences, and segmental duplication are shown.
Figure 3. Genomic context of BTC1_146JE. Top panel (A). A modified image from the UCSC cow genome browser (March 2005, Btau_2.0) showing BTC1_146JE aligned to the cattle genome (Contig54150) [GenBank:AAFC02053608]. Cow ESTs are shown in a scaled "squish" mode to conserve space due to the large number of ESTs aligning in the region. Bottom panel (B). Anchoring of BTC1_146JE, represented as a block arrow, to HSA14q32.31 (May 2004, Hg17). BTC1_146JE was anchored on the basis of its complete alignment to Contig54150 and similarity of flanking sequence in the contig to the human genome (shown as a vertical line along the track). This region is syntenic to a segment of BTA21 (shown as a separate track), consistent with RH mapping data [23]. UCSC Genome Browser tracks for known human proteins, Genscan genes, sno/miRNA, conserved sequences, and segmental duplication are unmodified.
Figure 4. Genomic context of BTC1_113FL. Top panel (A). A modified image from the UCSC cow genome browser (March 2005, Btau_2.0) showing BTC1_113FL aligned to the cow genome (Contig74653) [GenBank:AAFC02073929]. A track for cattle ESTs ("squish mode") shows high support for alternatively spliced cattle transcripts encoded in this region and the presence of 3' exons of SECP43. Genscan predicts a gene within the alignment of BTC1_113FL but does not predict all of the exons from ESTs. A CpG island is located at the 5' end of the predicted gene. Bottom panel (B). Anchoring of BTC1_113FL, represented as a block arrow, to HSA1p35.3 (May 2004, Hg17). BTC1_113FL was anchored on the basis of its complete alignment to cattle Contig74653 and similarity of flanking sequence in the contig to the human genome (shown as vertical lines along the track). This region is syntenic to a segment of BTA2 (shown as a separate track), consistent with RH mapping data [23]. UCSC Genome Browser tracks for known human proteins, Genscan genes, sno/miRNA, conserved sequences, and segmental duplication are unmodified.
Expression levels of 86 NTs
| High | 12 (14%) | 5 (6%) |
| Moderate | 14 (16%) | 10 (11%) |
| Low | 33 (38%) | 12 (14%) |
a See Materials and Methods for definition
Figure 5. Preferential expression of 39 NTs in 18 sampled tissues. Orange bars indicate NTs with no ORF and preferential expression in more than one tissue. Yellow bars indicate NTs with ORFs and preferential expression in more than one tissue. Blue bars represent NTs with no ORF and preferential expression in a single tissue. Purple bars represent NTs with ORFs and preferential expression in a single tissue. M.L., mesenteric lymph node.