Geng Chen, Ruiyuan Li, Leming Shi, Junyi Qi, Pengzhan Hu, Jian Luo, Mingyao Liu, Tieliu Shi.
Abstract
BACKGROUND: A complete and accurate human reference genome is important for functional genomics research. Consequently, the incompleteness of the reference genome and the existence of individual-specific sequences can have significant effects on various studies.
Year: 2011 PMID: 22133125 PMCID: PMC3288009 DOI: 10.1186/1471-2164-12-590
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the identification of missing expressed genes beyond the human reference genome. Human brain and cell line transcriptome sequencing reads were used to validate the transcribed regions in Asian and African novel sequences, quantify the expression of unalignable RefSeq genes, and identify novel transcript contigs.
Figure 2. Expression levels of the unalignable RefSeq genes in brain and cell lines. The expression threshold is 0.1 RPKM (reads per kilobase of transcript per million mapped reads).
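RPKM normalizes a gene's read count C by its transcript length L (in bp) and the total number of mapped reads N, i.e. RPKM = 10^9 × C / (N × L). The minimal sketch below computes that value and applies the 0.1 RPKM cut-off from the caption. The function names are hypothetical, and treating a gene at exactly 0.1 RPKM as expressed (`>=`) is an assumption; the caption does not say whether the threshold is inclusive.

```python
EXPRESSION_THRESHOLD = 0.1  # RPKM cut-off quoted in the Figure 2 caption


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


def is_expressed(read_count: int, transcript_length_bp: int, total_mapped_reads: int,
                 threshold: float = EXPRESSION_THRESHOLD) -> bool:
    """Assumption: a gene at or above the threshold counts as expressed."""
    return rpkm(read_count, transcript_length_bp, total_mapped_reads) >= threshold


# Hypothetical counts, not data from the study:
# 50 reads on a 2,000 bp transcript with 25 million mapped reads -> 1.0 RPKM.
print(rpkm(50, 2_000, 25_000_000))
print(is_expressed(1, 2_000, 25_000_000))  # 0.02 RPKM, below the cut-off
```

Dividing by kilobases and millions separately, rather than multiplying by 10^9 first, keeps each factor readable and matches how the acronym is spelled out.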
Novel transcript contigs in brain and cell lines.
| Items | Brain | Cell lines |
|---|---|---|
| Total number of transcript contigs | 254769 | 204625 |
| Number of contigs unaligned to GRCh37, RefSeq genes and EST | 16225 | 11638 |
| Number of unmapped contigs aligned to Human Fosmid sequences | 41 | 25 |
| Number of unmapped contigs aligned to HuRef genome | 184 | 100 |
| Number of unmapped contigs aligned to Celera genome | 181 | 90 |
| Number of unmapped contigs aligned to YH novel sequences | 103 | 42 |
| Number of unmapped contigs aligned to NA18507 novel sequences | 137 | 46 |
| Number of unmapped contigs aligned to chimpanzee genome | 119 | 56 |
| Number of unmapped contigs aligned to macaque genome | 64 | 36 |
| Total number of aligned unmapped contigs | 313 | 173 |
| Total contig length (bp) | 50324 | 29664 |
| N50 contig size (bp) | 195 | 194 |
*The brain and cell line transcript contigs were aligned to the human reference genome (GRCh37), RefSeq genes and EST sequences, using 90% identity and 90% coverage as thresholds. The unalignable brain and cell line transcript contigs were then aligned to human Fosmid sequences, the HuRef genome, the Celera genome, Asian (YH) and African (NA18507) novel sequences, and the chimpanzee and macaque genomes, using 90% identity and 100% coverage as thresholds.
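The table relies on two simple computations: a two-stage identity/coverage filter (90%/90% against GRCh37, RefSeq and EST, then 90%/100% against the other assemblies) and the N50 contig size. The sketch below illustrates both under stated assumptions: the `Alignment` record and its fields are hypothetical, coverage is taken as aligned contig bases divided by contig length, and the aligner that produced the hits is not specified here. N50 uses the common definition: the length L such that contigs of length at least L together hold at least half of all assembled bases.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Alignment:
    """Hypothetical contig-to-target hit; field names are illustrative only."""
    identity: float      # fraction of aligned bases that match (0-1)
    aligned_bases: int   # contig bases covered by the alignment
    contig_length: int   # full length of the query contig

    @property
    def coverage(self) -> float:
        return self.aligned_bases / self.contig_length


# (min_identity, min_coverage) for each stage described in the table footnote.
STAGE1 = (0.90, 0.90)  # GRCh37, RefSeq genes, EST sequences
STAGE2 = (0.90, 1.00)  # Fosmid, HuRef, Celera, YH, NA18507, chimpanzee, macaque


def passes(aln: Alignment, min_identity: float, min_coverage: float) -> bool:
    """True if the hit meets both thresholds (inclusive comparison assumed)."""
    return aln.identity >= min_identity and aln.coverage >= min_coverage


def n50(lengths: Sequence[int]) -> int:
    """Smallest length L such that contigs >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


# Hypothetical hit: 95% identity over 90 of 100 contig bases.
hit = Alignment(identity=0.95, aligned_bases=90, contig_length=100)
print(passes(hit, *STAGE1), passes(hit, *STAGE2))  # stage 1 only
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Stage 2 demands full-length (100%) coverage, so a hit that is good enough to discard a contig in stage 1 can still fail to place it in stage 2.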
The locations of seven conserved brain novel transcript contigs on GRCh37.
| Contig | Length (bp) | Chromosome | Estimated deletion (kb) | Deletion start coordinate |
|---|---|---|---|---|
| NODE_319067 | 163 | Chr3 | 5.2 | 90651858 |
| NODE_373345 | 100 | Chr9 | 2.4 | 18778464 |
| NODE_445444 | 138 | Chr14 | 1.6 | 82423894 |
| NODE_463359 | 131 | Chr8 | 27.3 | 58365531 |
| NODE_469290 | 100 | Chr5 | 14.7 | 2864231 |
| NODE_518716 | 100 | Chr18 | 20.4 | 12912086 |
| NODE_559864 | 100 | Chr12 | 2.2 | 2903843 |
*The "estimated deletions" were calculated by extending each identified novel contig by 10 kb on both sides (using the HuRef genome as the reference), aligning the extended contigs to GRCh37, and measuring the distance between the two separately aligned parts of each extended contig.
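The footnote's distance measurement can be sketched as follows, assuming each extended contig aligns to GRCh37 as exactly two pieces on the same chromosome and that coordinates are 1-based and inclusive. The `Hit` type and `gap_between_split_hits` function are hypothetical illustrations, not the authors' code, and the coordinates in the example are made up rather than taken from the table.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One piece of a split alignment on the reference (1-based, inclusive)."""
    chrom: str
    start: int
    end: int


def gap_between_split_hits(left: Hit, right: Hit) -> int:
    """Number of reference bases lying strictly between the two aligned pieces."""
    if left.chrom != right.chrom:
        raise ValueError("split pieces map to different chromosomes")
    # Order the pieces by reference position so the result is order-independent.
    first, second = sorted((left, right), key=lambda h: h.start)
    gap = second.start - first.end - 1
    if gap < 0:
        raise ValueError("pieces overlap on the reference")
    return gap


# Made-up coordinates: flanks end at 5,999 and resume at 8,500 -> 2,500 bp gap.
gap_bp = gap_between_split_hits(Hit("chr1", 1_000, 5_999), Hit("chr1", 8_500, 13_499))
print(gap_bp, gap_bp / 1_000)  # size in bp and in kb, the table's unit
```

Real split alignments can produce more than two pieces, inversions, or pieces on different chromosomes; this sketch rejects the last two cases instead of guessing how the original analysis handled them.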
Figure 3. RT-PCR validation of conserved novel transcript contigs. The expression of six conserved novel transcript contigs was validated in three different types of normal human cells. Because gene expression usually exhibits temporal and spatial specificity, not all of these novel transcript contigs were validated in every type of normal human cell. MCF10A: normal human breast cell; hFOB: human fetal osteoblast; 293T: human embryonic kidney cell; β-ACTIN: positive control; Luciferase: negative control; Marker: sm0331 DNA Ladder Mix.