Gema Bello-Orgaz, Jason J Jung, David Camacho.
Abstract
Big data has become an important issue for a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of big data frameworks such as Apache Hadoop and, more recently, Spark, for massive data processing based on the MapReduce paradigm has allowed for the efficient utilisation of data mining methods and machine learning algorithms in different domains. A number of libraries such as Mahout and Spark MLlib have been designed to develop new efficient applications based on machine learning algorithms. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in other areas such as social media and social networks. These new challenges are focused mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualising and tracking data, among others. In this paper, we present a review of the new methodologies that are designed to allow for efficient data mining and information fusion from social media, and of the new applications and frameworks that are currently appearing under the "umbrella" of the social networks, social media and big data paradigms.
Keywords: Big data; Data mining; Social media; Social networks; Social-based frameworks and applications
Year: 2015 PMID: 32288689 PMCID: PMC7106299 DOI: 10.1016/j.inffus.2015.08.005
Source DB: PubMed Journal: Inf Fusion ISSN: 1566-2535 Impact factor: 12.975
Fig. 1. The conceptual map of Social BigData.
Fig. 2. The MapReduce processes for counting words in a text.
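The word-counting process named in the Fig. 2 caption can be illustrated with a minimal single-process sketch. This is not Hadoop or Spark code and is not taken from the paper; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the values collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data mining"]))
```

In a distributed setting, each line (or block of lines) would be mapped on a different worker and the grouped keys would be partitioned across reducers; the sketch only shows the data flow between the three phases.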
Basic features related to social big data applications in the marketing area.
| Authors | Summary | Methods |
|---|---|---|
| Trattner and Kappe | Targeted advertising on Facebook | Real-time measures to detect the most valuable users |
| Jansen et al. | Twitter as eWOM advertising mechanism | Sentiment analysis |
| Asur et al. | Using Twitter to forecast box-office revenues for movies | Topic detection, sentiment analysis |
| Ma et al. | Viral marketing in social networks | Social network analysis, information diffusion models |
Basic features related to social big data applications in the crime analysis area.
| Authors | Summary | Methods |
|---|---|---|
| Phillips and Lee | Decision support system (DSS) for analysing crime trends to help catch suspects | NLP, similarity measures, classification |
| Ku and Leroy | Technique to discover geospatial co-distribution relations among crime incidents | Network analysis |
| Chainey et al. | Comparative assessment of mapping techniques to predict where crimes may happen | Spatial analysis, mapping methods |
| Gerber | Identify discussion topics across a city in the United States to predict crimes | Linguistic analysis, statistical topic modelling |
| Kirkos et al. | Identification of fraudulent financial statements | Classification (decision trees, neural networks and Bayesian belief networks) |
| Quah and Sriganesh | Detect fraud in real-time credit card transactions | Neural network learning, association rules |
| Li et al. | Identify the signs of fraudulent accounts and the patterns of fraudulent transactions | Bayesian classification, association rules |
Basic features related to social big data applications in the health care area.
| Authors | Summary | Methods |
|---|---|---|
| Culotta | Track and predict outbreaks using Twitter | Classification (regression models) |
| Aramaki et al. | Classify tweets related to influenza | Classification |
| Bodnar and Salathé | Assess disease outbreaks from tweets | Regression methods |
| Fisichella et al. | Detect public health events | Modelling trajectory distributions |
| GPHIN | Identify information about disease outbreaks and other events related to public healthcare | Classification of documents by relevance |
| BioCaster | Monitoring online media data related to diseases, viruses, bacteria, locations and symptoms | Topic classification, named entity recognition, event recognition |
| HealthMap | Global disease alert map | Mapping techniques |
| EpiSpider | Human and animal disease alert map | Topic and location detection |
Basic features related to social big data applications in user experience-based visualisation.
| Authors | Summary | Methods |
|---|---|---|
| GGobi | Visualisation program for exploring high-dimensional data | Supervised classification, unsupervised classification, inference |
| MIMO | Visualisation framework for real-time decision making in a multi-input multi-output system | Bayesian causal network, decision-making tools |
| Insense | Collecting user experiences into a continually growing and adapting multimedia diary | Classification of patterns in sensor readings from a camera, microphone, and accelerometers |
| Many Eyes | Creating visualisations in a collaborative environment from uploaded data sets | Visualisation layout algorithms |
| TweetPulse | Building social pulse by aggregating identical user experiences | Visualising temporal dynamics of the thematic events |