The definition of what constitutes a “topic” in science is the baseline for many follow-up questions, such as how to define research diversity, how to measure interdisciplinarity, or how to identify breakthrough research. As reported elsewhere, Jochen Gläser and colleagues have for six years been organizing a small-scale workshop in Berlin with bibliometric experts. Its aim is to explore how quantitative means can help address this fundamental problem for understanding the science system.
Quite uniquely, this summer's workshop, “Measuring the Diversity of Research” (August 29-30, 2014), engaged the participants in applying their specific methods of clustering and mapping to a shared, cleaned dataset of publications from astrophysics. The topical landscape of a field or subfield is not easy to determine automatically. Depending on which information signals are used, e.g. lexical elements or references, we get a different mirror image of the field. Science is made up of a dense fabric of thoughts and results, and putting them into disjoint classes always misses certain aspects. What counts as an optimal compromise in delineating a topic also depends very much on the question the measurement is meant to answer.
The workshop brought together the application of so-called hybrid methods, in which bibliographic clustering is combined with lexical/textual analysis (Wolfgang Glänzel), direct citation methods (Nees Jan van Eck, Theresa Velden, Frank Havemann), and co-citation methods (Kevin Boyack).
At their methodological core, the workshop contributions concern clustering algorithms that run over complex networks, and the design of these algorithms is a research problem in itself. But networks are also central to other projects in the Computational Humanities programme, such as Elite Network Shifts. Another commonality links Digital Humanities and bibliometrics: the problem of how to allocate features or characteristics of objects, events, or persons to a classification system or, in other words, how to define relevant dimensions in networked information spaces. Because of this underlying information-theoretical problem, such scientometric exercises have a meaning beyond bibliometrics and research evaluation.
The Berlin workshop also proved that working together on one dataset is an excellent way to test methods and to reflect on their meaning. While this may seem epistemologically evident, it does not happen that often in scientific practice. This is why data sharing, and equally the organization of hands-on activities on shared data, is so important. Trust is the decisive ingredient for such a change in practice. Combined with hospitality, friendliness, and a focused but unhurried atmosphere of discussion, this makes a workshop like the past one a treat for knowledge workers, for which we all thank the local organizers Jochen, Michael, and Frank!
For KnoweScape, this workshop contributes to the wider shared interest in information spaces arising from scholarly communication and their implications for research evaluation and impact measurement. The Berlin meeting is also an example of how ongoing national and international research projects trigger the emergence of a COST Action. It also shows how, through this COST Action, national research groups are brought together, new alliances are formed, and ideas for future research are born. Last April in Leiden, members of our TD1210 community who are interested in this particular application area of knowledge maps had already met. In our second year, further events build on this, e.g. in Zurich in February 2015. New ideas emerged from the Berlin workshop, and they will be discussed at the Second Annual KnoweScape Conference in Thessaloniki in November to disseminate ideas and results beyond the TD1210 community.
Andrea Scharnhorst