STSM Organization of knowledge in large datasets: clustering algorithms and strategies

July 3, 2016 - July 13, 2016

COST STSM Reference Number: COST-STSM-TD1210-33910
Period: 2016-07-03 to 2016-07-13
COST Action: TD1210
STSM type: Regular (from Italy to Poland)
STSM Applicant: Prof Giulia Rotundo, Faculty of Economics, Sapienza University of Rome, Rome (IT)
STSM Topic: Organization of knowledge in large datasets: clustering algorithms and strategies
Host: Piotr Wezyk, Faculty of Forestry, University of Agriculture in Krakow, Cracow (PL)

The aim of the STSM is twofold:
1) to analyse data mining techniques with a view to applying them to the detection of communities in complex networks;
2) conversely, to explore how methods for detecting clusters in complex networks can be applied for data mining purposes.
On 1), algorithms currently used for big data will be considered for application to complex networks. The starting
point is the exploration of Velickov et al. and Vieira et al. In these papers, the data mining phase applies techniques
that extract patterns of interest for the effective production of knowledge. A specific algorithm, named J48, is
used. J48 is an implementation of C4.5 (Quinlan 1992), a mainstream decision tree induction algorithm that extends the
earlier ID3 algorithm. At each step it chooses the attribute with the highest normalized information gain (gain ratio)
and splits the data into subsets on that attribute. The algorithm repeats this procedure on each subset until all
instances in the subset belong to the same class. The result is a decision tree, which reduces search and classification
time. Subsets whose instances all belong to the same class form the leaf nodes of the tree. The analysis will then
continue by examining the algorithms used by the host group, which needs them to organize the new free satellite data
from ESA (Sentinel-2).
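
The attribute selection and recursive splitting described above can be sketched as follows. This is a minimal
illustration only, not J48 itself: it handles nominal attributes only and omits the pruning, continuous-attribute and
missing-value handling that C4.5 provides. The function names (entropy, gain_ratio, build_tree) and the toy land-cover
records are hypothetical and do not come from the STSM data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Normalized information gain obtained by splitting on a nominal attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Recursively split until every subset holds a single class (no pruning)."""
    if len(set(labels)) == 1:
        return labels[0]  # pure subset: becomes a leaf
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:
        return majority  # nothing left to split on
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute separates the classes any further
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        tree[best][value] = build_tree(sub_rows, sub_labels, remaining)
    return tree


# Hypothetical toy records, invented for illustration only.
rows = [
    {"cover": "dense", "wet": "yes"},
    {"cover": "dense", "wet": "no"},
    {"cover": "sparse", "wet": "yes"},
    {"cover": "sparse", "wet": "no"},
]
labels = ["forest", "forest", "marsh", "field"]

tree = build_tree(rows, labels, ["cover", "wet"])
print(tree)
```

On this toy input the attribute "cover" has the higher gain ratio and is therefore chosen at the root; the "dense" subset
is already pure and becomes a leaf, while the "sparse" subset is split again on "wet".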





