STSM report of Sarven Capadisli on Statistical Linked Dataspaces

I (Sarven Capadisli) am a statistical Linked Data researcher at the E-Government-Institut at Bern University of Applied Sciences, and I’m also working on my PhD under Prof. Sören Auer’s supervision at the University of Bonn. In March I visited the eHumanities group, part of the Royal Netherlands Academy of Arts and Sciences, in The Netherlands for a Short Term Scientific Mission (STSM).

The purpose of the STSM was twofold: to exchange state-of-the-art techniques for publishing and consuming statistical Linked Data on the Web, and to investigate areas where the combined knowledge of both parties, myself and the specialists from the Data2Semantics project (VU Amsterdam, UvA) and the CEDAR project (DANS-KNAW, eHumanities group), can lead to research and development of semantic statistical tooling, as well as to joint papers.

The initial days of the scientific exchange were spent revisiting my work on statistical linked dataspaces and analysis, and the work of Albert Meroño Peñuela (PhD student at VU Amsterdam, and part of the CEDAR project) on statistical concept drift. More specifically, discussions revolved around linked statistical concept versioning and provenance (the use of concept versions in statistical analysis, and the visualisation of concepts with time series, e.g., how and when concepts are created, mutated, and resolved), and around attempts to evaluate the degree of comparability of statistical concept schemes and concepts across agencies (or within one agency/organization) and across time.

Parallel discussions on the publishing, visualisation, analysis and preservation of cultural data took place with Dr. Christophe Guéret, post-doctoral researcher at VU Amsterdam and KNOWeSCAPE’s WG4 leader, as well as with Dr. Andrea Scharnhorst, Head of e-research at Data Archiving and Networked Services (DANS).

Albert and I also explored the feasibility of developing context-aware time series plots by enriching the base data. The idea was to bring in layered views (e.g., wars/conflicts, climate change, finances) which can be used in time series analysis. We created a working pilot based on wars and conflicts, which will be integrated into the Linked Statistical Data Analysis platform.
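To make the layering idea a bit more concrete, here is a minimal sketch of how base observations could be enriched with context events. It is not the pilot's actual code: the `ContextEvent` type, the `annotate` function, and the observation values are all hypothetical, and the pilot itself works on Linked Data rather than on in-memory tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEvent:
    """A hypothetical context event belonging to one layer (e.g. 'conflict')."""
    layer: str
    label: str
    start_year: int
    end_year: int  # inclusive


def annotate(series, events):
    """Attach to each (year, value) observation the events that cover its year.

    Events are grouped by layer so that a plot can switch layers on and off.
    """
    annotated = []
    for year, value in series:
        context = {}
        for event in events:
            if event.start_year <= year <= event.end_year:
                context.setdefault(event.layer, []).append(event.label)
        annotated.append({"year": year, "value": value, "context": context})
    return annotated


# Made-up observation values, for illustration only.
series = [(1913, 100.0), (1916, 82.5), (1920, 95.0)]
events = [ContextEvent("conflict", "World War I", 1914, 1918)]

annotated = annotate(series, events)
for row in annotated:
    print(row["year"], row["value"], row["context"])
```

Keeping each layer as a separate key means further layers (climate, finances) could be added without touching the base series.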

I gave a presentation at the eHumanities research meeting on my 270a.info work, which included a walk-through of some of the “interesting bits” (e.g., transformations, provenance, comparability, interlinking, URI design, interfaces) based on actual implementations of statistical Linked Dataspaces from 2010 to present. It also covered the design and realization of a Linked Analysis service at stats.270a.info, which is intended to work towards citizen-centric interfaces, as well as the path to creating large amounts of statistical linked analyses that are discoverable and both human- and machine-friendly. The presentation was wrapped up with discussions and lessons learned, both from my experience and from that of the attendees at the meeting.

By the end of the week, Albert and I tried to identify the gaps in our respective research areas and, where possible, where they overlap. The purpose was to bring together the knowledge obtained during the week and to plan future work. We concluded that two research papers could be materialized. One is on predicting concept change across datasets, with applications to policy making, determining concept complexity, and the maintenance of concepts in systems, and, if possible, a deterministic way of detecting concept change. The other is to investigate the relationship between strongly correlated indicators and the semantic similarity of the statistical concepts within those indicators.
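As a rough illustration of the second paper idea, the sketch below pairs up indicators, keeps the pairs whose values are strongly correlated, and reports a similarity score for the concepts they use. Everything here is an assumption made for the example: the indicator names and values are made up, the 0.8 threshold is arbitrary, and Jaccard overlap of concept labels is only a simple stand-in for whatever semantic similarity measure the actual study would use.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def jaccard(a, b):
    """Jaccard overlap of two concept sets (stand-in for semantic similarity)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def strong_pairs(indicators, threshold=0.8):
    """Return (name1, name2, r, similarity) for pairs with |r| >= threshold."""
    rows = []
    for (n1, i1), (n2, i2) in combinations(sorted(indicators.items()), 2):
        r = pearson(i1["values"], i2["values"])
        if abs(r) >= threshold:
            rows.append((n1, n2, r, jaccard(i1["concepts"], i2["concepts"])))
    return rows


# Hypothetical indicators with made-up values and concept labels.
indicators = {
    "ind_a": {"values": [1.0, 2.0, 3.0, 4.0], "concepts": {"population", "age"}},
    "ind_b": {"values": [2.0, 4.0, 6.0, 8.0], "concepts": {"population", "age", "sex"}},
    "ind_c": {"values": [3.0, 1.0, 4.0, 2.0], "concepts": {"trade"}},
}

pairs = strong_pairs(indicators)
for n1, n2, r, sim in pairs:
    print(f"{n1} ~ {n2}: r={r:.2f}, concept similarity={sim:.2f}")
```

With real data, the interesting question is whether the similarity scores of strongly correlated pairs differ systematically from those of weakly correlated ones.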

One additional outcome of the visit: given the overlap between my work and KNOWeSCAPE’s, I accepted Dr. Andrea Scharnhorst’s invitation to join KNOWeSCAPE’s WG3 and WG4.

Sarven
http://csarven.ca/#i