Natalia Grabar, Sonia Krivine, Marie-Christine Jaulent.
Abstract
Distinguishing expert from non-expert health documents can help users select the information best suited to them, depending on whether they are familiar with medical terminology. This issue is particularly important in information retrieval. We address it through stylistic corpus analysis and the application of machine learning algorithms. Our hypothesis is that this distinction can be made on the basis of a small number of features, and that such features can be language- and domain-independent. The features were acquired on a source corpus (Russian language, diabetes topic) and then tested on both a target corpus (French language, pneumology topic) and the source corpus. These cross-language features achieve 90% precision and 93% recall for non-expert documents in the source language, and 85% precision and 74% recall for expert documents in the target language.
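The abstract reports precision and recall separately for each class (non-expert documents in the source language, expert documents in the target language). The following is a minimal sketch of how such per-class figures are computed, assuming exactly two labels, "expert" and "non-expert". The function name `precision_recall` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Per-class precision and recall, treating `positive` as the target class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: documents of the class that the classifier labeled correctly.
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    # False positives: documents of the other class wrongly assigned to this class.
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    # False negatives: documents of this class that the classifier missed.
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels for five documents (not the paper's corpora).
gold = ["expert", "expert", "expert", "non-expert", "non-expert"]
pred = ["expert", "expert", "non-expert", "expert", "non-expert"]

for cls in ("expert", "non-expert"):
    p, r = precision_recall(gold, pred, cls)
    print(f"{cls}: precision={p:.2f} recall={r:.2f}")
```

Because precision and recall are computed per class, a classifier can score well on one class (as reported for non-expert documents) while scoring differently on the other, which is why the abstract gives separate figures for each.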
Year: 2007 PMID: 18693843 PMCID: PMC2655811
Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076