Viktor Sebestyén, Endre Domokos, János Abonyi.
Abstract
The proposed multilayer network-based comparative document analysis (MUNCoDA) method supports the identification of the common points of a set of documents that deal with the same subject area. As documents are transformed into networks of informative word pairs, the collection of documents forms a multilayer network that allows the comparative evaluation of the texts. The multilayer network can be visualized and analyzed to highlight how the texts are structured. The topics of the documents can be clustered based on the developed similarity measures. By exploring the network centralities, topic importance values can be assigned. The method is fully automated by KNIME preprocessing tools and MATLAB/Octave code.
• Networks can be formed based on informative word pairs of multiple documents.
• The analysis of the proposed multilayer networks provides information for multi-document summarization.
• Words and documents can be clustered based on node similarity and edge overlap measures.
Keywords: Document clustering; Multi-document summarization; Network similarity; Text-mining
Year: 2020 PMID: 32426247 PMCID: PMC7226890 DOI: 10.1016/j.mex.2020.100902
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 1. The process of the proposed multilayer network-based comparative document analysis method.
Fig. 2. The preprocessing workflow developed in KNIME.
Fig. 3. The dendrogram-based clustering of the analyzed countries.
Fig. 4. Similarity-based 2D visualization of countries around the world based on published VNRs.
| Subject Area: | Computer Science |
| More specific subject area: | |
| Method name: | |
| Name and reference of original method: | |
| Resource availability: |
[Pseudocode listing: three for loops and an if ... end if block; the loop conditions and statements were not preserved.]
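% L_sim contains the pairwise similarity measures of the M documents (assumed to be an M-by-M matrix)
CK = 1 - L_sim;                     % convert similarities to dissimilarities
ZCK = linkage(CK, 'complete');      % complete-linkage hierarchical clustering
figure                              % dendrogram of the documents (countries)
[H, T, outperm_ck] = dendrogram(ZCK, 0, 'Orientation', 'left', 'Labels', listcountries);
figure                              % similarity-based 2D map
clf
Y = mdscale(L_sim, 2);              % optional arguments: 'Start','random','criterion','sammon'
plot(Y(:,1), Y(:,2), 'w.')          % white markers set the axis limits; the labels below mark the points
hold on
text(Y(:,1), Y(:,2), listcountries, 'Fontsize', 8);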
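The clustering code above takes the similarity matrix L_sim as given. As a minimal sketch of how such a matrix could be built from edge overlap, the MATLAB/Octave fragment below compares the edge sets of the document layers pairwise. The Jaccard index used here is an assumption standing in for the similarity measures of the method, and the array A with its toy values is hypothetical, not data from the study.

```matlab
% Hypothetical input: A(:,:,m) is the N-by-N 0/1 adjacency matrix of the
% word-pair network (layer) of document m. Only the upper triangle is read,
% so each undirected edge is stored once as A(u,v,m) with u < v.
N = 4; M = 3;                              % toy sizes: 4 words, 3 documents
A = zeros(N, N, M);
A(1,2,1) = 1; A(2,3,1) = 1;                % document 1: edges 1-2, 2-3
A(1,2,2) = 1; A(2,3,2) = 1; A(3,4,2) = 1;  % document 2: edges 1-2, 2-3, 3-4
A(3,4,3) = 1;                              % document 3: edge 3-4

upper = triu(true(N), 1);                  % mask selecting each word pair once
L_sim = eye(M);                            % every layer is identical to itself

for i = 1:M-1
  Ai = A(:,:,i);
  ei = Ai(upper) > 0;                      % edge indicator vector of layer i
  for j = i+1:M
    Aj = A(:,:,j);
    ej = Aj(upper) > 0;                    % edge indicator vector of layer j
    shared = sum(ei & ej);                 % edges present in both layers
    total = sum(ei | ej);                  % edges present in at least one layer
    if total > 0
      L_sim(i,j) = shared / total;         % Jaccard edge overlap (assumed measure)
      L_sim(j,i) = L_sim(i,j);             % keep the matrix symmetric
    end
  end
end

disp(L_sim)
```

A matrix built this way is symmetric with ones on the diagonal, so CK = 1 - L_sim has a zero diagonal and can be passed to the clustering and scaling steps shown above.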