Arash Hajikhani, Lukas Pukelis, Arho Suominen, Sajad Ashouri, Torben Schubert, Ad Notten, Scott W Cunningham.
Abstract
This paper demonstrates a method for transforming textual information scraped from companies' websites and linking it to the scientific body of knowledge. The method illustrates how Natural Language Processing (NLP) can link established economic classification systems to the novel and agile constructs that new data sources enable. We tested the method on the European classification of economic activities (NACE) at the sectoral and company levels, using companies' website content to establish a connection with the hierarchical topic modeling of the Microsoft Academic Graph (MAG). Central to the operationalization of the method are a web scraping process, NLP, and a data transformation and linkage procedure. The method comprises three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources.
Keywords: Economic classification scheme; Knowledge transformation; Natural language processing; Web scraping
Year: 2022 PMID: 35284247 PMCID: PMC8914545 DOI: 10.1016/j.mex.2022.101650
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
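
The three steps named in the abstract (data source identification, raw data retrieval, and data preparation and transformation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `SAMPLE_PAGE`, the stopword list, and the helper names `html_to_text` and `bigram_keywords` are assumptions made for illustration, and retrieval is simulated with an HTML string rather than a live request.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Step 1 (data source identification): a hypothetical page standing in
# for a company website that would normally be fetched over HTTP.
SAMPLE_PAGE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<p>Screen sharing for hybrid learning.</p>"
    "<script>var x = 1;</script>"
    "<p>Hybrid learning and screen sharing tools.</p>"
    "</body></html>"
)

# Assumed, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "for", "to", "in", "our", "with", "is"}
)


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Step 2 (raw data retrieval): reduce raw HTML to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def bigram_keywords(text, top_n=5):
    """Step 3 (preparation): count adjacent word pairs free of stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    return dict(pairs.most_common(top_n))
```

Calling `bigram_keywords(html_to_text(SAMPLE_PAGE))` yields a keyword-to-count dictionary of the same general shape as the "Company's website keywords" column in the FOS tagging examples.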
Fig. 1. Methodological process for linking the scraped content of companies' websites to the scientific literature.
MAG data description.
| MAG entity | Count |
|---|---|
| Publications | 268,618,709 |
| Authors | 279,177,391 |
| Fields of Study (FOS) | 714,595 |
| Conferences | 4,550 |
| Journals | 49,062 |
| Institutions | 27,057 |
Fig. 2. Web scraping stages.

| Specification | Value |
|---|---|
| Subject area | Economics and Finance |
| More specific subject area | |
| Method name | |
| Name and reference of original method | |
| Resource availability | |
Examples of FOS tagging.
| Text (from a publication abstract) | FOS tags with normalized percentage-scale ratio of dominance (0: not similar, 100: similar) |
|---|---|
| Retinitis pigmentosa (RP) is an inherited retinal dystrophy caused by the loss of photoreceptors and characterized by retinal pigment deposits visible on fundus examination. Prevalence of non syndromic RP is approximately 1/4000. The most common form of RP is a rod-cone dystrophy, in which the first symptom is night blindness, followed by the progressive loss in the peripheral visual field in daylight, and eventually leading to blindness after several decades. Some extreme cases may have a rapid evolution over two decades or a slow progression that never leads to blindness. In some cases, the clinical presentation is a cone-rod dystrophy, in which the decrease in visual acuity predominates over the visual field loss. RP is usually non syndromic but there are also many syndromic forms, the most frequent being Usher syndrome. To date, 45 causative genes/loci have been identified in non syndromic RP (for the autosomal dominant, autosomal recessive, X-linked, and digenic forms). Clinical diagnosis is based on the presence of night blindness and peripheral visual field defects, lesions in the fundus, hypovolted electroretinogram traces, and progressive worsening of these signs. Molecular diagnosis can be made for some genes, but is not usually performed due to the tremendous genetic heterogeneity of the disease. Genetic counseling is always advised. Currently, there is no therapy that stops the evolution of the disease or restores the vision, so the visual prognosis is poor. The therapeutic approach is restricted to slowing down the degenerative process by sunlight protection and vitaminotherapy, treating the complications (cataract and macular edema), and helping patients to cope with the social and psychological impact of blindness. However, new therapeutic strategies are emerging from intensive research (gene therapy, neuroprotection, retinal prosthesis). | {“Progressive visual loss”: 32, “Visual field loss”: 28, “Cone-Rod Dystrophy”: 28, “Complete Blindness”: 28, “Dystrophy”: 27, “Visual field”: 27, “Meridian (perimetry, visual field)”: 26, “Functional visual loss”: 26, “Visual field test”: 26, “Total blindness”: 26, “Visual field testing”: 25, “Blindness”: 25, “Quadrantanopia”: 24, “Visual Disturbance”: 24, “Visual defects”: 24, “Sudden visual loss”: 24, “Hemianopsia”: 24, “Anterior Visual Pathway”: 23, “Visual Physiology”: 23, “Visual deficit”: 23, “Visual changes”: 23, “Retinal Dystrophies”: 22, “Goldmann perimetry”: 22, “Visual Disorders”: 22, “Visual prosthesis”: 22, “Transient blindness”: 22, “Cone dystrophy”: 21, “Retinitis pigmentosa”: 21, “Visual structure”: 21, “Visual research”: 21, “Visual symptoms”: 21, “Gene therapy of the human retina”: 21, “Visual system”: 21, “Automated static perimetry”: 21, “Visual abnormalities”: 21, “Visual approach”: 20, “Visual phenomena”: 20, “Visual rhetoric”: 20, “Visual Manifestations”: 20, “Gaze-contingency paradigm”: 20, “Peripheral vision”: 20, “Optic disk pallor”: 20, “Visual space”: 20, “Visual processing”: 20, “Central vision”: 20, “Vision for perception and vision for action”: 20, “Visual rehabilitation”: 20, “Macular dystrophy”: 19, “Cortical blindness”: 19, “Visual capture”: 19, “Biased Competition Theory”: 19, “Visual behavior”: 19, “Visual control”: 19, “Visual sensory”: 19, “Visual threshold”: 19, “Visual language”: 19, “VISUAL TRAINING”: 19, “Nyctalopia”: 19, “Visual technology”: 19, “Central scotoma”: 19, “Slow progression”: 19, “Visual communication”: 19, “Visual snow”: 19, “Visual sociology”: 18, “Leber's congenital amaurosis”: 18, “N2pc”: 18, “Change blindness”: 18, “Tunnel vision”: 18, “Visual hierarchy”: 18, “Neuro-ophthalmology”: 18, “Blindsight”: 18, “Humphrey visual field”: 18, “Leber congenital amaurosis”: 18, “Optic neuropathy”: 18, “Goldmann perimeter”: 18, “Visual testing”: 18, “Posterior ischemic optic neuropathy”: 18, “Visual N1”: 17, “Ischemic optic neuropathy”: 17, “Visual thinking”: 17, “Visual culture”: 17, “Tangent screen”: 17, “Visual Suppression”: 17, “Bilateral blindness”: 17, “Retinal degeneration”: 17, “Visual Objects”: 17, “Legal blindness”: 17, “Visual reasoning”: 17, “Vision therapy”: 17, “Visual estimation”: 17, “Visual Ergonomics”: 17, “Visual angle”: 17, “Visual impairment”: 16, “Homonymous hemianopsia”: 16, “PDE6B”: 16, “RPE65”: 16, “Visual phototransduction”: 16, “Molecular therapy”: 16, “Visual appearance”: 16, “Visual contrast”: 16} |

| Company name and website | NACE codes | Company's website keywords | FOS tags with normalized percentage-scale ratio of dominance (0: not similar, 100: similar) |
|---|---|---|---|
| “WOLFVISION GMBH”, “www.wolfvision.at” | NACE Rev. 2 main section: C-Manufacturing | {“hybrid learning”: 125, “screen sharing”: 108, “collaborative learning”: 85, “learning collaborative”: 68, “on-screen display”: 68, “remote management”: 64, “educational institution”: 50, “student education”: 50, “county court”: 38, “imaging quality”: 37, “web conferencing”: 36, “co working”: 33, “public research”: 31, “class collaboration”: 28, “firmware version”: 28, “image storage”: 28, “single center”: 27, “core product”: 26, “additional feature”: 24, “lecture capture”: 22, “online teaching”: 21, “matrix solution”: 21, “light system”: 21, “teaching staff”: 20, “health science”: 20, “classroom teaching”: 19, “core system”: 18, “provide access”: 16, “administration of justice”: 16, “bring your own device”: 15} | {“Collaboration tool”: 22, “Presentation logic”: 19, “Mass collaboration”: 18, “Virtual collaboration”: 18, “Social collaboration”: 17, “Distributed collaboration”: 17, “Meeting Request”: 17, “Variable presentation”: 16, “Electronic meeting system”: 15, “Meeting Reports”: 15, “Collaborative software”: 14, “Hybrid system”: 14, “Online document”: 14, “Presentation Manager”: 14, “Mobile collaboration”: 13, “Whiteboard”: 13, “Fixed wireless”: 13, “Face Presentation”: 13, “Presentation layer”: 13, “Technical Presentation”: 13, “Wireless site survey”: 13, “Sales presentation”: 13, “Scientific collaboration network”: 13, “Collaboration”: 12, “Meeting Abstracts”: 12, “Wi-Fi array”: 12, “Wireless Internet Protocol”: 12, “Disease Presentation”: 12, “Wireless network interface controller”: 12, “Hybrid material”: 12, “Bring your own device”: 11, “Wireless LAN controller”: 11, “Wireless intrusion prevention system”: 11, “Wireless grid”: 11, “Multimedia”: 11, “Hybrid”: 11, “Online presence management”: 11, “Transverse presentation”: 11, “Motorola Canopy”: 11, “Authoring system”: 11, “Online participation”: 11, “Online learning community”: 11, “Content creation”: 11, “CAPWAP”: 11, “Content management”: 11, “Web annotation”: 11, “Hybrid intelligent system”: 11, “Meeting Material”: 11, “Online research methods”: 11, “Municipal wireless network”: 11, “Computer-supported collaborative learning”: 11, “Computer-supported cooperative work”: 11, “Asynchronous learning”: 10, “Team meeting”: 10, “Online discussion”: 10, “Wireless security”: 10, “Document engineering”: 10, “Document management system”: 10, “Wireless Application Protocol”: 10, “Web content management system”: 10, “Vision document”: 10, “Web document”: 10, “Learning Management”: 10, “Application sharing”: 10, “Computer-mediated communication”: 10, “Quality documents”: 10, “Online computer”: 10, “Web content”: 10, “Web 2.0”: 10, “Cross-presentation”: 10, “Online community”: 10, “Collaborative learning”: 10, “Online help”: 10, “Educational technology”: 10, “Media space”: 10, “Certified Wireless Network Administrator”: 10, “Wireless”: 10, “Summary (document)”: 10, “Vertex Presentation”: 10, “Synchronous learning”: 10, “Blended learning”: 10, “Wireless USB”: 10, “Hybrid power”: 10, “Supply chain collaboration”: 10, “Virtual learning environment”: 10, “Wireless WAN”: 10, “Collaborative editing”: 10, “Teleconference”: 10, “Computer-assisted web interviewing”: 10, “Classroom management”: 10, “Collaborative engineering”: 10, “Online identity”: 10, “Hybrid Bond”: 9, “Source document”: 9, “Hybrid zone”: 9, “Living document”: 9, “Fetal Presentation”: 9, “Online degree”: 9, “Open classroom”: 9, “Wi-Fi”: 9} |
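
This record does not state how the FOS scores in the examples are computed. Purely as an illustration of mapping weighted website keywords to FOS labels on a 0 to 100 scale, the sketch below uses a stand-in measure: the keyword-weight-averaged Jaccard overlap between keyword tokens and label tokens. The function name `score_fos_labels`, the `min_score` cutoff, and the similarity measure itself are assumptions, not the authors' method.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a phrase, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _jaccard(left, right):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def score_fos_labels(keywords, fos_labels, min_score=1):
    """Score each FOS label against weighted keywords on a 0-100 scale.

    keywords:   {phrase: weight}, e.g. bigram counts from a website.
    fos_labels: iterable of FOS label strings.
    The score is an assumed stand-in: 100 times the weight-averaged
    Jaccard overlap between each keyword and the label.
    """
    total_weight = sum(keywords.values())
    if total_weight <= 0:
        return {}
    keyword_tokens = {phrase: _tokens(phrase) for phrase in keywords}
    scores = {}
    for label in fos_labels:
        label_tokens = _tokens(label)
        weighted = sum(
            weight * _jaccard(keyword_tokens[phrase], label_tokens)
            for phrase, weight in keywords.items()
        )
        score = round(100 * weighted / total_weight)
        if score >= min_score:
            scores[label] = score
    # Highest-scoring labels first, mirroring the ordering in the table.
    return dict(sorted(scores.items(), key=lambda item: -item[1]))


example_keywords = {
    "hybrid learning": 125,
    "screen sharing": 108,
    "collaborative learning": 85,
}
example_labels = ["Blended learning", "Collaborative learning", "Wireless USB"]
example_scores = score_fos_labels(example_keywords, example_labels)
```

With these three keywords taken from the WOLFVISION row, `example_scores` ranks “Collaborative learning” above “Blended learning” and drops “Wireless USB”, which shares no tokens with any keyword. The absolute values are not comparable to those in the table, since the scoring function here is only a placeholder.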