Tome Eftimov (1,2), Barbara Koroušić Seljak (1), Peter Korošec (1,3)
Abstract
Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessible in order to help dietitians keep up with the new knowledge that arrives daily with newly published scientific reports. Various named-entity recognition (NER) methods have previously been introduced to extract useful information from the biomedical literature. They focus, for example, on extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge, this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases: the first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. The evaluation showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations.
Year: 2017 PMID: 28644863 PMCID: PMC5482438 DOI: 10.1371/journal.pone.0179488
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. DrNER workflow diagram.
USAS categories.
| Entity | USAS category | USAS subcategory |
|---|---|---|
| Nutrient entity | ||
| Food entity | ||
Default chunking result for one sentence.
| Token id | Tokens | Chunk tokens | Token id | Tokens | Chunk tokens |
|---|---|---|---|---|---|
| 1 | The | B-NP | 26 | while | |
| 2 | recommended | I-NP | 27 | for | B-PP |
| 3 | intake | I-NP | 28 | men | B-NP |
| 4 | for | B-PP | 29 | and | I-NP |
| 5 | total | B-NP | 30 | women | I-NP |
| 6 | fiber | I-NP | 31 | over | B-PP |
| 7 | for | B-PP | 32 | 50 | B-NP |
| 8 | adults | B-NP | 33 | it | B-NP |
| 9 | 50 | B-NP | 34 | is | B-VP |
| 10 | years | I-NP | 35 | 30 | B-NP |
| 11 | and | O | 36 | g | I-NP |
| 12 | younger | B-NP | 37 | and | O |
| 13 | is | B-VP | 38 | 21 | B-NP |
| 14 | set | I-VP | 39 | g | I-NP |
| 15 | at | B-PP | 40 | per | B-PP |
| 16 | 38 | B-NP | 41 | day | B-NP |
| 17 | g | I-NP | 42 | , | O |
| 18 | for | B-PP | 43 | respectively | |
| 19 | men | B-NP | 44 | , | O |
| 20 | and | O | 45 | due | B-PP |
| 21 | 25 | B-NP | 46 | to | I-PP |
| 22 | g | I-NP | 47 | decreased | B-NP |
| 23 | for | B-PP | 48 | food | I-NP |
| 24 | women | B-NP | 49 | consumptions | I-NP |
| 25 | , | O | 50 | . | O |
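The chunk tags in this table follow the IOB convention: B-XX begins a chunk of type XX (NP, VP, PP), I-XX continues it, and O marks a token outside any chunk. A minimal Python sketch of how such tags can be turned back into chunks, using tokens 1 to 17 of the sentence; the `group_iob` name and the handling of blank tags are our assumptions, not part of drNER:

```python
from typing import List, Optional, Tuple

# A chunk is (chunk type, tokens); the type is None for tokens tagged "O".
Chunk = Tuple[Optional[str], List[str]]


def group_iob(tokens: List[str], tags: List[str]) -> List[Chunk]:
    """Group tokens into chunks from IOB tags.

    B-XX starts a chunk of type XX, I-XX continues the current chunk if it has
    type XX (otherwise it starts a new one), and O tokens stand alone.
    Blank tags, as in two cells of the table, are treated like O (an assumption).
    """
    chunks: List[Chunk] = []
    for token, tag in zip(tokens, tags):
        if not tag or tag == "O":
            chunks.append((None, [token]))
            continue
        prefix, _, ctype = tag.partition("-")
        if prefix == "I" and chunks and chunks[-1][0] == ctype:
            chunks[-1][1].append(token)
        else:
            chunks.append((ctype, [token]))
    return chunks


# Tokens 1-17 of the example sentence and their default chunk tags from the table.
tokens = ("The recommended intake for total fiber for adults 50 years "
          "and younger is set at 38 g").split()
tags = ["B-NP", "I-NP", "I-NP", "B-PP", "B-NP", "I-NP", "B-PP", "B-NP", "B-NP",
        "I-NP", "O", "B-NP", "B-VP", "I-VP", "B-PP", "B-NP", "I-NP"]

for ctype, chunk_tokens in group_iob(tokens, tags):
    print(f"[{ctype or 'O'}] {' '.join(chunk_tokens)}")
```

Treating a stray I-XX as the start of a new chunk is one common convention for malformed tag sequences; a stricter reader could reject such sequences instead.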
The Boolean function for the first post-hoc chunking.
| A | B | f |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Fig 2. Karnaugh map of the Boolean function for the first post-hoc chunking.
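The truth table has f = 0 only for A = B = 1. Grouping the 1-cells of the Karnaugh map into the A = 0 pair and the B = 0 pair gives the minimal form f = ¬A ∨ ¬B, which by De Morgan's law equals ¬(A ∧ B), i.e. a NAND. What A and B denote is not defined in this section, so the short Python check below (all names are ours) only confirms the algebra against the table:

```python
from itertools import product

# Truth table copied from the table above: (A, B) -> f
TRUTH_TABLE = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def f_simplified(a: int, b: int) -> int:
    """Minimal sum-of-products form read off the Karnaugh map: f = (not A) or (not B)."""
    return int((not a) or (not b))


def f_nand(a: int, b: int) -> int:
    """Equivalent NAND form: f = not (A and B)."""
    return int(not (a and b))


# Both forms must reproduce every row of the truth table.
for a, b in product((0, 1), repeat=2):
    assert f_simplified(a, b) == f_nand(a, b) == TRUTH_TABLE[(a, b)]
    print(a, b, f_simplified(a, b))
```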
Example of first post-hoc chunking.
| Tokens | POS tags | Chunk tokens | Food | Nutrient | Quantity/Unit | Chunk¹ |
|---|---|---|---|---|---|---|
| People | NNS | B-NP | 0 | 0 | 0 | |
| of | IN | B-PP | 0 | 0 | 0 | |
| any | DT | B-NP | 0 | 0 | 0 | |
| age | NN | I-NP | 0 | 0 | 0 | |
| who | WP | B-NP | 0 | 0 | 0 | B-NP |
| are | VBP | B-VP | 0 | 0 | 0 | B-VP |
| African | JJ | B-NP | 0 | 0 | 0 | B-NP |
| Americans | NNP | I-NP | 0 | 0 | 0 | I-NP |
| should | MD | B-VP | 0 | 0 | 0 | B-VP |
| further | RBR | I-VP | 0 | 0 | 0 | I-VP |
| reduce | VB | I-VP | 0 | 0 | 0 | I-VP |
| sodium | NN | B-NP | 0 | 0 | B-NP | |
| intake | NN | I-NP | 0 | 0 | 0 | I-NP |
| to | TO | B-PP | 0 | 0 | 0 | B-PP |
| 300 | CD | B-NP | 0 | 0 | 0 | |
| mg | NN | I-NP | 0 | 0 | ||
| per | IN | B-PP | 0 | 0 | 0 | |
| day | NN | B-NP | 0 | 0 | 0 | |
| . | . | O | 0 | 0 | 0 | O |
¹ indicates the result of the first post-hoc chunking
* indicates the beginning of a new chunk
Entities.
| Chunks | Chunk tokens | Food | Nutrient | Quantity/Unit |
|---|---|---|---|---|
| People | B-NP | 0 | 0 | 0 |
| of | B-PP | 0 | 0 | 0 |
| any age | B-NP | 0 | 0 | 0 |
| who | B-NP | 0 | 0 | 0 |
| are | B-VP | 0 | 0 | 0 |
| African Americans | B-NP | 0 | 0 | 0 |
| should further reduce | B-VP | 0 | 0 | 0 |
| sodium intake | B-NP | 0 | 0 | |
| to | B-PP | 0 | 0 | 0 |
| 300 mg | B-NP | 0 | 0 | |
| per | B-PP | 0 | 0 | 0 |
| day | B-NP | 0 | 0 | 0 |
| . | O | 0 | 0 | 0 |
* indicates the beginning of a new chunk
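The table above collapses the token rows of the previous example into chunks, each with Food, Nutrient and Quantity/Unit columns. This section does not say how token-level flags become chunk-level flags, and several flag cells in the tables are blank, so the Python sketch below assumes a logical OR over the tokens of each chunk. The function name, the row format and the flag values in the example (sodium as a nutrient, 300 mg as a quantity) are illustrative assumptions, not drNER's published code:

```python
from typing import Dict, List, Tuple

FLAGS = ("Food", "Nutrient", "Quantity/Unit")
# Input row: (token, default chunk tag, {flag name: 0 or 1}).
TokenRow = Tuple[str, str, Dict[str, int]]


def aggregate_chunks(rows: List[TokenRow]) -> List[list]:
    """Collapse token rows into chunk rows [chunk text, chunk tag, flags].

    A chunk's flag is the logical OR of its tokens' flags (an assumption made
    for this sketch). I-XX continues the previous chunk only if it is of type XX.
    """
    chunks: List[list] = []
    for token, tag, flags in rows:
        token_flags = {name: flags.get(name, 0) for name in FLAGS}
        continues = bool(
            tag.startswith("I-") and chunks and chunks[-1][1] == "B-" + tag[2:]
        )
        if continues:
            chunks[-1][0] += " " + token
            for name in FLAGS:
                chunks[-1][2][name] |= token_flags[name]
        else:
            chunk_tag = "O" if tag == "O" else "B-" + tag[2:]
            chunks.append([token, chunk_tag, token_flags])
    return chunks


# Tail of the example sentence; the flag values are assumed for illustration.
rows = [
    ("sodium", "B-NP", {"Nutrient": 1}),
    ("intake", "I-NP", {}),
    ("to", "B-PP", {}),
    ("300", "B-NP", {"Quantity/Unit": 1}),
    ("mg", "I-NP", {"Quantity/Unit": 1}),
    ("per", "B-PP", {}),
    ("day", "B-NP", {}),
    (".", "O", {}),
]
for text, tag, flags in aggregate_chunks(rows):
    print(f"{text!r:16} {tag:5} {flags}")
```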
The first phase of the drNER method for Φ1.
| Tokens | POS tags | Chunk tokens | Food | Nutrient | Quantity/Unit | Chunk¹ | Chunk² | Chunk³ |
|---|---|---|---|---|---|---|---|---|
| People | NNS | B-NP | 0 | 0 | 0 | B-NP | B-NP | |
| of | IN | B-PP | 0 | 0 | 0 | I-NP | I-NP | |
| any | DT | B-NP | 0 | 0 | 0 | I-NP | I-NP | |
| age | NN | I-NP | 0 | 0 | 0 | I-NP | I-NP | |
| who | WP | B-NP | 0 | 0 | 0 | B-NP | B-NP | |
| are | VBP | B-VP | 0 | 0 | 0 | B-VP | I-NP | |
| African | JJ | B-NP | 0 | 0 | 0 | B-NP | I-NP | |
| Americans | NNP | I-NP | 0 | 0 | 0 | I-NP | I-NP | |
| should | MD | B-VP | 0 | 0 | 0 | B-VP | B-VP | B-VP |
| further | RBR | I-VP | 0 | 0 | 0 | I-VP | I-VP | I-VP |
| reduce | VB | I-VP | 0 | 0 | 0 | I-VP | I-VP | I-VP |
| sodium | NN | B-NP | 0 | 0 | B-NP | B-NP | B-NP | |
| intake | NN | I-NP | 0 | 0 | 0 | I-NP | I-NP | I-NP |
| to | TO | B-PP | 0 | 0 | 0 | B-PP | B-PP | B-PP |
| 300 | CD | B-NP | 0 | 0 | 0 | B-NP | B-NP | |
| mg | NN | I-NP | 0 | 0 | I-NP | I-NP | ||
| per | IN | B-PP | 0 | 0 | 0 | I-NP | I-NP | |
| day | NN | B-NP | 0 | 0 | 0 | I-NP | I-NP | |
| . | . | O | 0 | 0 | 0 | O | O | O |
¹ indicates the result of the first post-hoc chunking
² indicates the result of the second post-hoc chunking
³ indicates the result of the third post-hoc chunking
* indicates the beginning of a new chunk
Fig 3. Graphic representation of the recommendation Φ1.
Fig 4. Parse tree of Φ1.
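The post-hoc chunking rules are not spelled out in words in this section. In the Φ1 table, "People of any age" and "300 mg per day" end up as single noun phrases, while "sodium intake", "to" and "300 mg" stay apart. One hypothetical reading that reproduces exactly these merges is to fold NP + PP + NP into one NP only when the Boolean function from the truth table gives 1, taking A and B to mean "the left NP carries an entity" and "the right NP carries an entity". That interpretation of A and B, the function names and the entity flags below are our assumptions; the sketch is not the authors' rule set:

```python
from typing import List, Tuple

# (chunk type, chunk text, has_entity); has_entity = 1 if the chunk carries a
# Food, Nutrient or Quantity/Unit flag.
Chunk = Tuple[str, str, int]


def nand(a: int, b: int) -> int:
    """f(A, B) = not (A and B), the function from the truth table."""
    return int(not (a and b))


def merge_np_pp_np(chunks: List[Chunk]) -> List[Chunk]:
    """Hypothetical rule: fold NP + PP + NP into one NP unless both NPs carry an entity."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        if out and i + 1 < len(chunks):
            left, prep, right = out[-1], chunks[i], chunks[i + 1]
            if (left[0] == "NP" and prep[0] == "PP" and right[0] == "NP"
                    and nand(left[2], right[2])):
                merged_text = f"{left[1]} {prep[1]} {right[1]}"
                out[-1] = ("NP", merged_text, int(bool(left[2] or right[2])))
                i += 2  # skip the PP and the right NP, now part of the merged NP
                continue
        out.append(chunks[i])
        i += 1
    return out


# Chunks of Φ1 after default chunking; the entity flags are assumed for illustration.
phi1 = [
    ("NP", "People", 0), ("PP", "of", 0), ("NP", "any age", 0),
    ("NP", "who", 0), ("VP", "are", 0), ("NP", "African Americans", 0),
    ("VP", "should further reduce", 0), ("NP", "sodium intake", 1),
    ("PP", "to", 0), ("NP", "300 mg", 1), ("PP", "per", 0), ("NP", "day", 0),
    ("O", ".", 0),
]
for ctype, text, _ in merge_np_pp_np(phi1):
    print(f"[{ctype}] {text}")
```

The rule does not cover the merge of "who are African Americans" into one noun phrase, which the table also shows; that step would need a separate rule.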
Knowledge extraction of 15 dietary recommendations.
| No. | Recommendation | Group | Action | Food | Nutrient | Quantity/Unit |
|---|---|---|---|---|---|---|
| 1. | Good sources of magnesium are: fruits or vegetables, nuts, peas and beans, soy products, whole grains and milk. | - | are ( | fruits or vegetables, nuts, peas and beans ( | Good sources of magnesium ( | - |
| 2. | The RDAs for Mg are 300 mg for young women and 350 mg for young men. | - | are ( | - | The RDAs for Mg ( | 300 mg for young women ( |
| 3. | Increase potassium by ordering a salad, extra steamed or roasted vegetables, bean-based dishes fruit salads, and low-fat milk instead of soda. | - | - | salad ( | Increase potassium ( | - |
| 4. | Babies need protein about 10 g a day. | Babies ( | need ( | - | protein ( | 10 g a day ( |
| 5. | 1 teaspoon of table salt contains 2300 mg of sodium. | - | contains ( | table salt ( | sodium ( | 1 teaspoon ( |
| 6. | Milk, cheese, yogurt and other dairy products are good sources of calcium and protein, plus many other vitamins and minerals. | - | are ( | Milk, cheese, yogurt and other dairy products ( | good sources of calcium and protein ( | - |
| 7. | Breast milk provides sufficient zinc, 2 mg/day for the first 4-6 months of life. | - | provides ( | Breast milk ( | sufficient zinc ( | 2 mg/day for the first 4-6 months of life ( |
| 8. | If you’re trying to get more omega-3, you might choose salmon, tuna, or eggs enriched with omega-3. | you ( | ’re trying to get ( | salmon, tuna, ( | more omega-3 ( | - |
| 9. | If you need to get more fiber, look to beans, vegetables, nuts and legumes. | You ( | need to get ( | beans, vegetables, nuts, and legumes ( | more fiber ( | - |
| 10. | Eating foods high in vitamin C and iron can reduce the absorption of ingested nickel. | - | can reduce ( | Eating foods ( | vitamin C and iron( | - |
| 11. | The body of a 76 kg man contains about 12 kg of protein. | - | contains ( | - | protein ( | The body of a 76 kg man ( |
| 12. | Excellent sources of alpha-linolenic acid, ALA, include flaxseeds and walnuts. | - | include ( | flaxseeds and walnuts ( | Excellent sources of alpha-linolenic acid ( | - |
| 13. | The recommended intake for total fiber for adults 50 years and younger is set at 38 g for men and 25 g for women, while for men and women over 50 it is 30 g and 21 g per day, respectively, due to decreased food consumption. | 50 years ( | is set ( | decreased food consumption( | The recommended intake for total fiber for adults ( | 38 g for men ( |
| 14. | I’m good at tennis. | - | - | - | - | - |
| 15. | Your hat looks very nice. | - | - | - | - | - |
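Each row of this table maps one sentence to five fields, with "-" for an empty field, and the non-dietary sentences 14 and 15 receive no entities at all. A small Python record type can hold such output. The class and field names, and the rule that a sentence counts as dietary when at least one Food, Nutrient or Quantity/Unit entity is found, are our assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DietaryRecord:
    """One row of the knowledge-extraction table; None stands for '-'."""
    recommendation: str
    group: Optional[str] = None
    action: Optional[str] = None
    food: Optional[str] = None
    nutrient: Optional[str] = None
    quantity_unit: Optional[str] = None

    def is_dietary(self) -> bool:
        """True if at least one Food, Nutrient or Quantity/Unit entity was extracted."""
        return any((self.food, self.nutrient, self.quantity_unit))


# Rows 4 and 14 of the table above.
rec4 = DietaryRecord("Babies need protein about 10 g a day.",
                     group="Babies", action="need",
                     nutrient="protein", quantity_unit="10 g a day")
rec14 = DietaryRecord("I'm good at tennis.")

for rec in (rec4, rec14):
    print(rec.recommendation, "->", "dietary" if rec.is_dietary() else "not dietary")
```

Keeping Group and Action out of `is_dietary` means a sentence with only a verb phrase, such as "are", is not counted as a dietary recommendation.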
Fig 5. Distribution of documents per institution.
Fig 6. Distribution of the number of sentences per document for documents from scientifically validated web sites.
Fig 7. Distribution of the number of sentences per document for abstracts of scientific publications.
Evaluation results.
| Corpus | Food TP | Food FP | Food FN | Nutrient TP | Nutrient FP | Nutrient FN | Quantity/Unit TP | Quantity/Unit FP | Quantity/Unit FN |
|---|---|---|---|---|---|---|---|---|---|
| Scientifically validated web sites | 326 | 5 | 22 | 243 | 0 | 13 | 47 | 0 | 2 |
| Scientific publications | 213 | 0 | 3 | 314 | 2 | 4 | 39 | 0 | 9 |
| Test corpora (both sources combined) | 539 | 5 | 25 | 557 | 2 | 17 | 86 | 0 | 11 |
TP indicates true positives
FP indicates false positives
FN indicates false negatives
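The table reports raw counts only. The usual summary metrics are precision TP/(TP + FP), recall TP/(TP + FN) and F1 = 2TP/(2TP + FP + FN), the harmonic mean of precision and recall. The Python sketch below derives them from the counts in the table (the values are computed here, not quoted from the paper) and also checks that the combined row is the sum of the two source rows; all names are ours:

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN)

# Counts copied from the evaluation table.
WEB_SITES: Dict[str, Counts] = {"Food": (326, 5, 22), "Nutrient": (243, 0, 13),
                                "Quantity/Unit": (47, 0, 2)}
PUBLICATIONS: Dict[str, Counts] = {"Food": (213, 0, 3), "Nutrient": (314, 2, 4),
                                   "Quantity/Unit": (39, 0, 9)}
TEST_CORPORA: Dict[str, Counts] = {"Food": (539, 5, 25), "Nutrient": (557, 2, 17),
                                   "Quantity/Unit": (86, 0, 11)}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# The combined row is the element-wise sum of the two source rows.
for entity, total in TEST_CORPORA.items():
    summed = tuple(w + p for w, p in zip(WEB_SITES[entity], PUBLICATIONS[entity]))
    assert summed == total, entity

for entity, counts in TEST_CORPORA.items():
    p, r, f1 = prf(*counts)
    print(f"{entity:<14} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Computing F1 directly from the counts avoids a division by zero when precision and recall are both 0, and it equals 2PR/(P + R) whenever TP > 0.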