Antonio Jimeno Yepes, Elise Prieur-Gaston, Aurélie Névéol.
Abstract
BACKGROUND: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain.
Year: 2013 PMID: 23631733 PMCID: PMC3651320 DOI: 10.1186/1471-2105-14-146
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Availability of foreign language resources in MEDLINE as of October 28, 2011 (top 10 languages in number of citations)
| German | 756,385 | 220,426 | 59,506 |
| Russian | 639,845 | 196,865 | 418 |
| French | 624,878 | 176,988 | 17,270 |
| Japanese | 383,419 | 124,707 | 1,525 |
| Italian | 269,185 | 58,033 | 564 |
| Spanish | 265,410 | 92,655 | 5,439 |
| Chinese | 178,146 | 132,070 | 958 |
| Polish | 160,916 | 43,486 | 461 |
| Czech | 81,064 | 16,080 | 0 |
| Portuguese | 75,973 | 32,607 | 2,543 |
Description of parallel corpora obtained from MEDLINE data
| | | | | |
|---|---|---|---|---|
| Number of citations | 14,815 | 14,817 | 3,371 | 3,371 |
| Number of sentences | 137,938 | 130,692 | 33,167 | 32,085 |
| Number of words | 2,699,851 | 2,863,638 | 676,092 | 760,863 |
Excerpts from parallel corpora obtained from MEDLINE data
| ENFR – PMID 9750586 | |
| ABEN-Partial avulsion of the middle turbinate is an unusual complication of nasotracheal intubation, while minor nasal mucosal trauma is common. We report a case in a 25 year-old healthy woman, diagnosed four years after nasotracheal intubation for removal of wisdom teeth under general anaesthesia, consisting in a unilateral nasal obstruction related to partial avulsion of the middle turbinate. | ABFR-L'avulsion partielle du cornet moyen est une complication inhabituelle de l'intubation nasotrachéale, alors que le traumatisme de la muqueuse nasale est plus fréquent. Nous rapportons le cas d'une patiente de 25 ans, sans antécédent particulier, qui après une intubation nasotrachéale pour extractions dentaires, a présenté une obstruction nasale unilatérale en rapport avec un arrachement du cornet moyen sur toute sa longueur, avec bascule en arrière bloquant la choane. |
| ENES – PMID 19447450 | |
| ABEN-Bacterial vaginosis is a widely spread health problem with multiple connotations. It has been the subject of many studies and work during decades and it still remains a polemic entity, with contradictory finding. The polymicrobian etiology, unsolved epidemiology, obstetrico-gynecological complications and high recurrence rate following treatment, make this infection a target for researchers. It is not an inflammatory process -yet an immune response exists. In this disorder, vaginal discharge increases, and it is associated with a high risk of developing sexually transmitted diseases. | ABES-La vaginosis bacteriana es un problema de salud ampliamente difundido, con múltiples connotaciones. Ha sido objeto de gran cantidad de estudios y trabajos desde hace décadas y aun en la actualidad sigue siendo una entidad polémica y de resultados contradictorios. La etiología polimicrobiana, la epidemiología no aclarada, las complicaciones obstetroginecológicas y la alta frecuencia de recurrencias tras el tratamiento hacen de esta infección un objetivo para los investigadores. No es un proceso inflamatorio, pero existe una respuesta inmunitaria, cursa con un aumento de flujo vaginal y está asociada a un aumento del riesgo de adquisición de enfermedades de transmisión sexual. |
Results of systematic corpus evaluation using edit distance
| TIFR | 14,817 | 5.82 | 325 |
| TIEN | 14,815 | 5.99 | 347 |
| ABEN | 14,089 | 8.20 | 1,180 |
| ABFR | 14,153 | - | - |
| TIES | 3,371 | 6.82 | 70 |
| TIEN | 3,371 | 4.39 | 14 |
| ABEN | 2,961 | 7.50 | 148 |
| ABES | 2,968 | - | - |
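The evaluation above relies on edit distance, and the error analysis refers to differences between the MEDLINE and publisher versions of a field. A minimal sketch of that kind of check follows, assuming character-level Levenshtein distance with unit costs. The function names, the record layout, and the threshold are hypothetical and are not taken from the source.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (unit-cost insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def flag_suspect_pairs(records: Iterable[Tuple[str, str, str]],
                       threshold: int) -> List[Tuple[str, int]]:
    """Return (identifier, distance) for records whose two versions differ by more than threshold.

    Each record is (identifier, medline_text, publisher_text); this layout is an
    assumption made for illustration.
    """
    flagged = []
    for identifier, medline_text, publisher_text in records:
        distance = edit_distance(medline_text, publisher_text)
        if distance > threshold:
            flagged.append((identifier, distance))
    return flagged
```

Whether the distance should be computed over characters or words, and whether it should be normalized by length, depends on the evaluation protocol, which this page does not spell out.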
Analysis of error causes in extracted data
| Correct | 15 | No TIFR in MEDLINE | 6 |
| | | TIFR difference in MEDLINE vs. publisher | 9 |
| Incorrect | 90 | Inverted EN/FR | 60 |
| | | Incomplete title extraction | 18 |
| | | Keyword extraction instead of title | 8 |
| | | Incomplete abstract extraction | 4 |
| Correct | 59 | Title difference in MEDLINE vs. publisher | 9 |
| | | No TIES in MEDLINE | 49 |
| | | Abstract difference in MEDLINE vs. publisher | 1 |
| Incorrect | 174 | Title difference in MEDLINE vs. publisher | 7 |
| | | Incomplete abstract extraction | 66 |
| | | Erroneous ABEN extraction | 101 |
Manual evaluation of hunalign alignments
| | | | | |
|---|---|---|---|---|
| Sentences are unrelated | 2 | −0.05094 | 7 | 0.103413 |
| Some common content, and additional content | 1 | −0.14269 | 9 | 0.284214 |
| ES (resp. FR) has content not covered in EN | 11 | 0.297399 | 16 | 0.303225 |
| EN has content not covered in ES (resp. FR) | 14 | 0.571879 | 14 | 0.402056 |
| Sentences are aligned | 70 | 0.623229 | 54 | 0.497464 |
Alignment of abstract sentences using hunalign
| | | | | |
|---|---|---|---|---|
| 137,938 | 130,692 | 33,167 | 32,085 | |
| 86,645 | 86,645 | 23,316 | 23,316 | |
| 52,094 | 52,094 | 16,120 | 16,120 | |
Example of discarded sentences based on their hunalign score
| The results of prior endoscopic analysis were normal. ~~~ The presence of multiple fundic gland polyps was detected as was their disappearance 6 months after treatment cessation. | No suelen asociar componente displásico.Se describen 4 casos de pacientes en tratamiento crónico con IBP, con endoscopia previa normal, en los que se detectó la presencia de múltiples pólipos de glándulas fúndicas, y se constató su desaparición a los 6 meses tras la supresión del tratamiento. |
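Discarding pairs by alignment score, as in the example above, can be sketched as follows. The sketch assumes the aligner output is available as tab-separated lines of source sentence, target sentence, and score, and that `~~~` marks sentences merged on one side, as in the excerpt. The function name, the `one_to_one_only` option, and any threshold value are hypothetical; the source does not state the cutoff that was used.

```python
from typing import Iterable, List, Tuple

MERGE_MARKER = "~~~"  # assumed to mark sentences merged on one side of a pair


def filter_aligned_pairs(lines: Iterable[str],
                         min_score: float,
                         one_to_one_only: bool = False) -> List[Tuple[str, str, float]]:
    """Keep (source, target, score) triples whose score is at least min_score."""
    kept = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        source, target, raw_score = parts
        try:
            # Normalize a typographic minus sign before parsing the score.
            score = float(raw_score.replace("\u2212", "-"))
        except ValueError:
            continue
        if not source.strip() or not target.strip():
            continue  # one side is empty, so there is nothing to pair
        if one_to_one_only and (MERGE_MARKER in source or MERGE_MARKER in target):
            continue
        if score >= min_score:
            kept.append((source, target, score))
    return kept
```

A stricter threshold trades corpus size for alignment quality, so the cutoff is best chosen by inspecting a manually evaluated sample of pairs.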
Example of a sentence pair properly aligned based on its hunalign score
| Background and objective: Detection of asymptomatic peripheral arterial disease increases the risk of vascular morbibity and mortality. | Fundamento y objetivo: la detección de arteriopatía periférica silente mediante el índice tobillo-brazo (ITB) incrementa el riesgo de enfermedad y muerte vasculares. |
Distribution of corpus sentences in translation experiments
| 458,543 | 57,317 | 57,317 | |
| 28,882 | 28,882 | 28,881 | |
| 17,351 | 17,365 | 28,881 | |
| 198,512 | 24,814 | 24,814 | |
| 7,772 | 7,772 | 7,772 | |
| 5,403 | 5,418 | 7,772 | |
BLEU scores for decoding with models trained on the title corpus and on the title + abstract sentences corpus
| newstest2011 | 14.09 | 15.40 | 20.94 | 21.19 | |
| | 12.00 | 12.82 | 18.43 | 18.77 | |
| | 14.19 | 15.19 | 19.59 | 20.29 | |
| | | ||||
| 47.39 | 47.93 | 49.93 | 50.63 | ||
| 16.53 | 18.28 | 23.36 | 24.03 | ||
| 19.29 | 21.12 | 25.00 | 25.59 | ||
| | | ||||
| 47.01 | 48.05 | 49.82 | 50.58 | ||
| 20.81 | 22.54 | 28.24 | 28.15 | ||
| 24.25 | 25.78 | 29.98 | 30.40 |
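For reference, the standard definition of BLEU is given below. This page does not say which implementation or maximum n-gram order was used, so N = 4 with uniform weights is an assumption (it is the common default), and the scores above appear to be reported on a 0 to 100 scale.

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Here p_n is the modified n-gram precision of the system output against the reference, w_n = 1/N, c is the total length of the system output, and r is the effective reference length.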
Fluency and adequacy values for the manual evaluation of the translations
| 2.96 | 3.42 | 3.33 | 3.46 | |
| 4.21 | 4.11 | 3.83 | 3.98 |
Fluency and adequacy examples
| Dès lors que la gpa ne contredit aucun de nos droits fondamentaux, on ne peut que souhaiter qu'elle puisse devenir une indication médicale de fiv. | Methods: theoretical sampling and advantages. | |
| De plus, les individus anxieux manifesteraient une tendance à l’inhibition du partage social des émotions (r=0,26; p=0,05). | L’imagerie montrait un nodule de 1,7 cm du corps pancréatique. | |
| L’imagerie montrait un nodule de 1,7 cm du corps pancréatique. | Cette lésion géodique qui ne semble pas être aussi rare au niveau du carpe, peut être découverte par hasard ou rarement par des douleurs du poignet, exceptionnellement par une fracture. | |
| En el presente artículo se describen el diseño y los principales objetivos de un ensayo clínico para evaluar la eficacia y la seguridad del losartán . | Dans les pays industrialisés, la gastroentérite aiguë pédiatrique à rotavirus (geapr) est responsable d’une morbidité élevée. | |
| L’évolution a été favorable dans 90% des cas. | L’évolution a été favorable dans 90% des cas. | |
This table shows examples of scored sentences (in bold) with the corresponding original source sentences. Portions of inadequate text (for fluency) or missing/untranslated content (for adequacy) are underlined. The examples for Adequacy 1 and Fluency 4 are taken from the ENES corpus, while all other examples are from the ENFR corpus.
Excerpts of abstract sentences translated based on the model trained on title and abstract sentences