After “Comparing algorithms for topic detection in science” – a workshop report by Andrea Scharnhorst

The workshop took place on April 20-21. Over these two days, the group continued its collaborative work on comparing algorithms. Since the last meeting in September 2014, all participating groups have worked on a shared dataset, labeled the Berlin dataset: a set of publications and their citations in the field of astrophysics. Prior to the workshop, cluster solutions had been shared and first steps had been taken to compare them. Do they produce the same number of clusters? Do they contain the same sets of documents? Are they stable within the method used to produce them? Which organization of research do they reveal?

The bottom line is that there is neither a single cluster solution nor one best cluster solution. Methods based on direct citations, co-citations, or lexical similarity each offer a different perspective on how scientific fields are ordered. Which one to apply also depends on the purpose of the clustering: evaluation, the study of the dynamics of science, science policy advice, or information retrieval.

At ISSI 2015, the results of this subgroup of WG 2 will be discussed with the wider expert community of bibliometricians. All participants agreed that opening up the black box of clustering and starting a discourse on the validity of methods can only benefit the further professionalization of the field. We will continue to report on this.

As the new kid on the block, a joint OCLC-DANS team joined the group and presented a visual interface that shows all cluster solutions side by side (see Figure below; see also Koopman, R., Wang, S., & Scharnhorst, A. (2015). Contextualization of topics – browsing through terms, authors, journals and cluster allocations. arXiv: Digital Libraries (cs.DL); Information Retrieval (cs.IR)).

Participants of our workshops have submitted papers to ISSI 2015, which will be published in the proceedings. A special session, “Same data ‐ different results?”, will be held on Tuesday, June 30, 2015.

Amsterdam, May 15, 2015

Andrea Scharnhorst

