E Michael Gertz, Kundan Sengupta, Michael J Difilippantonio, Thomas Ried, Alejandro A Schäffer.
Abstract
BACKGROUND: While attempting to reanalyze published data from Agilent 4 x 44 human expression chips, we found that some of the 60-mer oligonucleotide features could not be interpreted as representing single human genes. For example, some of the oligonucleotides align with the transcripts of more than one gene. We decided to check the annotations for all autosomes and the X chromosome systematically using bioinformatics methods.
Year: 2009 PMID: 19948035 PMCID: PMC2791105 DOI: 10.1186/1471-2164-10-566
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Reporters divided into the five categories discussed in Results
| Fully Valid | RefSeq RNA Valid | Other Gene Valid | Possibly Valid | Invalid | Total |
|---|---|---|---|---|---|
| 25505 | 1859 | 2187 | 3168 | 9964 | 42683 |
The far right column shows the total number of eligible reporters, and the other entries show the total number of reporters in each category.
Table 2. Results of aligning all eligible reporters to the database of human RefSeq RNAs
| Eligible | Rev. Comp. | Multiple Genes | Unique Gene | No Alignment |
|---|---|---|---|---|
| 42683 | 2009 | 2463 | 27364 | 10847 |
The first column shows the total number of eligible reporters. The second column shows the number of reporters eliminated because they align to the reverse complement of a RefSeq RNA. The third column shows the number of reporters eliminated because they align to the transcripts of more than one gene. The fourth column lists the number of reporters that align to the transcripts of a single gene. The fifth column shows the number of reporters that did not have a sufficiently high-scoring alignment to a RefSeq RNA. Reporters that could be counted in more than one column of the third through fifth columns are counted in the leftmost column.
Table 3. Results of placing the reporters that align with the RefSeq RNA transcripts of a single gene on the chromosome
| Eligible | Fully Valid | No Placement | Multiple Genes | Overlapping Genes | Wrong Gene |
|---|---|---|---|---|---|
| 27364 | 25505 | 177 | 1429 | 238 | 15 |
The first column is the total number of eligible reporters, which is the same data as the fourth column of Table 2. The second column shows the number of reporters that have a valid placement in a unique gene; this is the same data as the first column of Table 1. The third column shows the number of reporters that did not have a high-quality placement. The fourth column shows the number of reporters that have multiple placements and are placed in the location of more than one gene. The fifth column shows the number of reporters with a single placement, but a placement within overlapping genes. The sixth column shows the number of reporters placed in a single gene, but not the same gene as found by alignment to RNA. The sum of the four rightmost columns (177 + 1429 + 238 + 15) gives the 1859 RefSeq RNA valid reporters in Table 1.
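The relationships among the first three tables can be checked with simple arithmetic. The following is a minimal sketch that copies the counts from the tables above; the dictionary names and the `sum_except` helper are illustrative choices and are not part of the original analysis.

```python
# Counts copied from the three tables above (column header -> reporter count).
table1 = {"Fully Valid": 25505, "RefSeq RNA Valid": 1859, "Other Gene Valid": 2187,
          "Possibly Valid": 3168, "Invalid": 9964, "Total": 42683}
table2 = {"Eligible": 42683, "Rev. Comp.": 2009, "Multiple Genes": 2463,
          "Unique Gene": 27364, "No Alignment": 10847}
table3 = {"Eligible": 27364, "Fully Valid": 25505, "No Placement": 177,
          "Multiple Genes": 1429, "Overlapping Genes": 238, "Wrong Gene": 15}


def sum_except(table, *skip):
    """Sum every column of a table except the named ones (hypothetical helper)."""
    return sum(count for name, count in table.items() if name not in skip)


checks = {
    "Table 1 categories add up to the total":
        sum_except(table1, "Total") == table1["Total"],
    "Table 2 outcomes add up to the eligible reporters":
        sum_except(table2, "Eligible") == table2["Eligible"],
    "Table 3 eligible equals Table 2 unique gene":
        table3["Eligible"] == table2["Unique Gene"],
    "Table 3 fully valid equals Table 1 fully valid":
        table3["Fully Valid"] == table1["Fully Valid"],
    "Table 3 four rightmost columns equal Table 1 RefSeq RNA valid":
        sum_except(table3, "Eligible", "Fully Valid") == table1["RefSeq RNA Valid"],
}

for description, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {description}")
```

Each check compares a count stated in one table with the count it is said to match in another, so a failure would point to a transcription error in the tables rather than in the analysis itself.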
Table 4. Counts of reporters associated with a putative transcript not in RefSeq
| Eligible | NoID | Suppressed | No Alignment | Valid ID |
|---|---|---|---|---|
| 10847 | 2114 | 845 | 285 | 7603 |
The first column shows the number of reporters that did not have an alignment with a RefSeq RNA; these data are also shown as the fifth column of Table 2. The second column shows the number of reporters that had no discernible identifier in the annotation file. The third column shows the number of reporters for which the annotated identifier had been suppressed or removed in Entrez Nucleotide. The fourth column shows the number of reporters that did not align with the annotated sequence. The fifth column shows the number of reporters that could be associated with a putative transcript. Reporters that could be counted in more than one of the second through fourth columns are counted in the leftmost column.
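The leftmost-column rule stated in the note above amounts to testing the failure conditions in column order and stopping at the first one that applies. The sketch below illustrates that ordering only; the `Reporter` record, its field names, and the `table4_column` function are hypothetical and do not describe the authors' software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reporter:
    """Hypothetical record holding the three facts needed for this table."""
    annotated_id: Optional[str]   # identifier from the annotation file, None if not discernible
    id_suppressed: bool           # identifier suppressed or removed in Entrez Nucleotide
    aligns_to_annotated: bool     # reporter aligns with the annotated sequence


def table4_column(reporter: Reporter) -> str:
    """Return the column a reporter is counted in, testing columns left to right."""
    if reporter.annotated_id is None:
        return "NoID"
    if reporter.id_suppressed:
        return "Suppressed"
    if not reporter.aligns_to_annotated:
        return "No Alignment"
    return "Valid ID"


# A reporter with a suppressed identifier that also fails to align is counted
# once, under the leftmost applicable column.
example = Reporter(annotated_id="EXAMPLE_ID", id_suppressed=True, aligns_to_annotated=False)
print(table4_column(example))
```

Because each reporter is assigned to exactly one column, the four outcome columns partition the eligible reporters and their counts add up to the first column.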
Table 5. Results of placing the reporters that do not align with a RefSeq RNA transcript
| Eligible | No Placement | Multiple Genes | Overlapping Genes | Wrong Chromosome | Other Gene Valid | Possibly Valid |
|---|---|---|---|---|---|---|
| 7603 | 1662 | 389 | 91 | 83 | 2187 | 3168 |
The first column is the total number of eligible reporters, which is the same data as the fifth column of Table 4. The second column shows the number of reporters that could not be placed. The third column shows the number of reporters that have multiple placements and are placed in the location of more than one gene. The fourth column shows the number of reporters with a single placement, but a placement within overlapping genes. The fifth column shows the number of reporters with a unique placement on an unexpected chromosome. The sixth column shows the number of reporters that could be associated with a unique gene and thus are considered positionally valid. The seventh column shows the number of reporters that could be associated with a unique placement, not on a gene. The sixth and seventh columns of this table are the same as the third and fourth columns of Table 1. Reporters that could be counted in more than one column of the second through fifth columns are counted in the leftmost column.
Figure 1. Flowchart showing the process, described in Methods, used to place a reporter in one of the top two categories, invalidate it, or declare it eligible for validation by position. The process for validating by position is shown in Figure 2.
Figure 2. Flowchart showing the process, described in Methods, used to validate a reporter by position, placing it into one of three categories: other gene valid, possibly valid, or invalid.