Wei Zhou, Neil R Smalheiser, Clement Yu.
Abstract
This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. A basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. This knowledge is also needed to follow current research efforts in biomedical information retrieval and text mining, which are developing new systems not only for finding documents on a given topic, but also for extracting and integrating knowledge across documents.
Year: 2006 PMID: 16722601 PMCID: PMC1459215 DOI: 10.1186/1747-5333-1-2
Source DB: PubMed Journal: J Biomed Discov Collab ISSN: 1747-5333
A citation record that shows the MEDLINE fields. Fukagawa T, Nogami M, Yoshikawa M, Ikeno M, Okazaki T, Takami Y, Nakayama T, Oshimura M. Dicer is essential for formation of the heterochromatin structure in vertebrate cells. Nat Cell Biol. 2004 Aug;6(8):784-91.
| PMID | 15247924 |
| OWN | NLM |
| STAT | MEDLINE |
| DA | 20040810 |
| DCOM | 20040827 |
| LR | 20051122 |
| PUBM | Print-Electronic |
| IS | 1465–7392 (Print) |
| VI | 6 |
| IP | 8 |
| DP | 2004 Aug |
| TI | Dicer is essential for formation of the heterochromatin structure in vertebrate cells. |
| PG | 784-91 |
| AB | RNA interference is an evolutionarily conserved gene-silencing pathway in which the nuclease Dicer cleaves double-stranded RNA into small interfering RNAs. The biological function of the RNAi-related pathway in vertebrate cells is not fully understood. Here, we report the generation of a conditional loss-of-function Dicer mutant in a chicken-human hybrid DT40 cell line that contains human chromosome 21. We show that loss of Dicer results in cell death with the accumulation of abnormal mitotic cells that show premature sister chromatid separation. Aberrant accumulation of transcripts from alpha-satellite sequences, which consist of human centromeric repeat DNAs, was detected in Dicer-deficient cells. Immunocytochemical analysis revealed abnormalities in the localization of two heterochromatin proteins, Rad21 cohesin protein and BubR1 checkpoint protein, but the localization of core kinetochore proteins such as centromere protein (CENP)-A and -C was normal. We conclude that Dicer-related RNA interference machinery is involved in the formation of the heterochromatin structure in higher vertebrate cells. |
| AD | Precursory Research for Embryonic Science and Technology of Japan Science and Technology Agency, National Institute of Genetics and The Graduate University for Advanced Studies, Mishima, Shizuoka 411-8540, Japan. |
| FAU | Fukagawa, Tatsuo |
| AU | Fukagawa T |
| FAU | Nogami, Masahiro |
| AU | Nogami M |
| FAU | Yoshikawa, Mitsuko |
| AU | Yoshikawa M |
| FAU | Ikeno, Masashi |
| AU | Ikeno M |
| FAU | Okazaki, Tuneko |
| AU | Okazaki T |
| FAU | Takami, Yasunari |
| AU | Takami Y |
| FAU | Nakayama, Tatsuo |
| AU | Nakayama T |
| FAU | Oshimura, Mitsuo |
| AU | Oshimura M |
| LA | eng |
| PT | Journal Article |
| DEP | 20040711 |
| PL | England |
| TA | Nat Cell Biol |
| JT | Nature cell biology. |
| JID | 100890575 |
| RN | 0 (Cell Cycle Proteins) |
| RN | 0 (Heterochromatin) |
| RN | 0 (Nuclear Proteins) |
| RN | 0 (Phosphoproteins) |
| RN | 0 (RAD21 protein, human) |
| RN | EC 2.7.1.- (Bub1 spindle checkpoint protein) |
| RN | EC 2.7.1.37 (Protein Kinases) |
| RN | EC 3.1.- (Endoribonucleases) |
| SB | IM |
| CIN | Nat Cell Biol. 2004 Aug;6(8):696-7. PMID: 15303098 |
| MH | Animals |
| MH | Blotting, Western |
| MH | Cell Cycle Proteins/genetics/metabolism |
| MH | Cell Death/genetics |
| MH | Cell Line |
| MH | Cell Survival |
| MH | Centromere/chemistry |
| MH | Chickens |
| MH | Chromosomes, Human, Pair 21 |
| MH | Endoribonucleases/deficiency/*genetics/*physiology |
| MH | Gene Silencing |
| MH | Heterochromatin/*chemistry/genetics/*metabolism |
| MH | Humans |
| MH | Immunohistochemistry |
| MH | In Situ Hybridization, Fluorescence |
| MH | Models, Biological |
| MH | Mutation |
| MH | Nuclear Proteins/genetics/metabolism |
| MH | Phosphoproteins/genetics/metabolism |
| MH | Protein Kinases/genetics/metabolism |
| MH | RNA Interference |
| MH | Research Support, Non-U.S. Gov't |
| MH | Restriction Mapping |
| MH | Transgenes |
| EDAT | 7/13/2004 5:00 |
| MHDA | 8/28/2004 5:00 |
| PHST | 2004/05/29 [received] |
| PHST | 2004/06/29 [accepted] |
| PHST | 2004/07/11 [aheadofprint] |
| AID | 10.1038/ncb1155 [doi] |
| AID | ncb1155 [pii] |
| PST | ppublish |
| SO | Nat Cell Biol. 2004 Aug;6(8):784-91. Epub 2004 Jul 11. |
Figure 1. Venn diagram visualization of a PubMed search. 37,600 documents were retrieved when "propranolol" was searched in PubMed and 244,225 for "hypertension". The overlap, 4,155 documents, is the set of documents having both "propranolol" and "hypertension".
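The overlap in the Venn diagram corresponds to a Boolean AND, which can be sketched as a set intersection over an inverted index. This is a minimal illustration only: the toy documents, the whitespace tokenizer, and the function names `build_inverted_index` and `boolean_and` are assumptions made here, not a description of how PubMed is implemented.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Return the IDs of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical toy collection (not real PubMed records)
docs = {
    1: "propranolol lowers blood pressure in hypertension",
    2: "hypertension risk factors",
    3: "propranolol for migraine",
}
index = build_inverted_index(docs)
both = boolean_and(index, "propranolol", "hypertension")
print(both)  # {1}
```

Only document 1 contains both terms, so it alone falls in the overlap region.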
The Google PageRank algorithm.
| PageRank is defined as follows [8]: |
| We assume web page A has pages T1...Tn which link to it. The parameter d is a damping factor which can be set between 0 and 1 (usually set to 0.85). Also, C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows: |
| PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) |
| PageRanks form a probability distribution over web pages. The PageRank value of a web page reflects the frequency of encountering that page by a Web user who surfs across the web following links randomly. |
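The PageRank formula above can be evaluated by repeated substitution until the values settle. The sketch below is one straightforward way to do that, not Google's implementation; the three-page graph, the starting value of 1.0, the fixed iteration count, and the name `pagerank` are assumptions chosen for illustration. It also assumes each page lists a given target at most once.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # C(T): number of links going out of each page
    out_count = {p: len(links.get(p, [])) for p in pages}
    pr = {p: 1.0 for p in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            incoming = sum(
                pr[src] / out_count[src]
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph, page C receives links from both A and B and ends up with the highest score, while B, which is linked only from A, ends up with the lowest.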
Term weighting and normalization in the vector space model.
| A typical term weighting strategy combines the term frequency (TF) and the inverse document frequency (IDF). |
| The idea of IDF is that the fewer documents having the term, the more useful the term is in discriminating those documents having it from those not having it. On the other hand, if a term occurs many times in a document, then it is likely that the term is significant in representing the contents of the document. With this weighting strategy, the highest weight is accorded to terms that occur frequently in a document but infrequently elsewhere. |
| With very large collections, not all terms in the document are used for indexing. Some terms have to be removed. This is usually accomplished through the elimination of |
| The most common approach to relevance ranking in VSM is to give each document a score based on the sum of the weights of terms common to the document and query. Terms in documents typically derive their weight from the TF*IDF. Then the similarity between each document and the query is computed with the formula: |
| One problem with TF*IDF weighting is that longer documents accumulate more weight in queries simply because they have more words. As such, some approaches "normalize" the weight of a document. The most common approach is |
| A variety of other variations to the basic VSM have been developed. For example, |
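The weighting and ranking ideas in this box can be sketched in a few lines. The exact formulas are not reproduced in this copy, so the code assumes one common variant: a weight of tf × log(N / df), unit-length (cosine) normalization of each document vector, and a score equal to the sum of the weights of terms shared by query and document. The toy documents and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df: number of documents in which each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine_normalize(vec):
    """Scale a weight vector to unit length so long documents gain no advantage."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def score(query_terms, doc_vec):
    """Sum the weights of terms shared by the query and the document."""
    return sum(doc_vec.get(t, 0.0) for t in set(query_terms))

# Hypothetical toy collection
docs = [
    "dicer cleaves double-stranded rna",
    "rna interference silences genes",
    "heterochromatin structure in vertebrate cells",
]
vectors = [cosine_normalize(v) for v in tfidf_vectors(docs)]
scores = [score(["dicer", "rna"], v) for v in vectors]
```

For the query "dicer rna", the first document scores highest because it contains the rare term "dicer"; the second matches only "rna", which occurs in two of the three documents and so carries a lower weight; the third matches nothing and scores zero.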
11-point precision diagram. This example shows a query that is submitted to two different IR systems (IR1 and IR2), which are based on the same collection of 20 documents. Both IR1 and IR2 rank all 20 documents, of which 10 are relevant. However, IR1 ranks the relevant documents higher on average than does IR2. The mean average precision for IR1 = 0.79 and for IR2 = 0.40. The recall and precision curves for IR1 and IR2 are shown in figure 2.
| Ranking by IR1 | | | | | Ranking by IR2 | | | | |
| Ranking | Doc | Relevant | Recall | Precision | Ranking | Doc | Relevant | Recall | Precision |
| | | | | | 1 | d1 | no | 0.00 | 0.00 |
| | | | | | 2 | d2 | no | 0.00 | 0.00 |
| | | | | | 3 | d3 | no | 0.00 | 0.00 |
| 4 | d4 | no | 0.30 | 0.75 | 4 | d4 | no | 0.00 | 0.00 |
| | | | | | 5 | d5 | no | 0.00 | 0.00 |
| 6 | d6 | no | 0.40 | 0.67 | | | | | |
| 8 | d8 | no | 0.50 | 0.63 | | | | | |
| | | | | | 9 | d9 | no | 0.30 | 0.33 |
| 10 | d10 | no | 0.60 | 0.60 | | | | | |
| | | | | | 11 | d11 | no | 0.40 | 0.36 |
| | | | | | 13 | d13 | no | 0.50 | 0.38 |
| 14 | d14 | no | 0.90 | 0.64 | | | | | |
| | | | | | 15 | d15 | no | 0.60 | 0.40 |
| 16 | d16 | no | 1.00 | 0.63 | | | | | |
| 17 | d17 | no | 1.00 | 0.59 | | | | | |
| 18 | d18 | no | 1.00 | 0.56 | | | | | |
| 19 | d19 | no | 1.00 | 0.52 | 19 | d19 | no | 0.90 | 0.47 |
| 20 | d20 | no | 1.00 | 0.50 | | | | | |
Figure 2. Curve of precision vs. recall for the two IR systems shown in Table 4. IR systems typically show a trade-off between recall and precision, so that the more documents that are retrieved, the more irrelevant documents will be included. On the other hand, it can be seen that system IR1 performs uniformly better than system IR2 since it has higher precision values at every recall level.
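The recall, precision, and average precision figures for the two rankings can be recomputed with a short script. The ranks of the relevant documents are not listed explicitly in this copy of the table; the sets used below are an assumption inferred from the surviving recall and precision values (10 relevant documents out of 20 in each ranking). The function names are hypothetical.

```python
def precision_recall_at_ranks(relevance, total_relevant):
    """Return (recall, precision) after each rank of a ranked result list."""
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        points.append((hits / total_relevant, hits / rank))
    return points

def average_precision(relevance, total_relevant):
    """Average the precision values observed at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / total_relevant

def ranking(relevant_ranks, size=20):
    """Build a list of booleans: True where the document at that rank is relevant."""
    return [rank in relevant_ranks for rank in range(1, size + 1)]

# Assumed relevant ranks, inferred from the recall/precision values in the table
ir1 = ranking({1, 2, 3, 5, 7, 9, 11, 12, 13, 15})
ir2 = ranking({6, 7, 8, 10, 12, 14, 16, 17, 18, 20})

ap_ir1 = average_precision(ir1, 10)
ap_ir2 = average_precision(ir2, 10)
curve_ir1 = precision_recall_at_ranks(ir1, 10)
curve_ir2 = precision_recall_at_ranks(ir2, 10)
```

Under these assumptions the script gives an average precision of about 0.78 for IR1 and 0.40 for IR2, close to the reported 0.79 and 0.40, and `curve_ir1[3]` reproduces the table's recall of 0.30 and precision of 0.75 at rank 4.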