Report of the Budapest Workshop “Identification, location and temporal evolution of topics” by Sandor Soos

Identification, location and temporal evolution of topics

Data and algorithm – comparison of approaches

Library and Information Centre of the Hungarian Academy of Sciences, 29-30 August 2016

 

Conference summary

The timely problem

From science studies to research evaluation to science policy, there is an increasing need for trustworthy information on how the science system is organized, how it evolves, where research fronts are located, and so on. The branch of scientometrics called science mapping has developed a wide variety of methods to address such issues. In fact, it has reached a point where a next generation of questions has naturally arisen: How to identify the most suitable methods? What benchmarks to use for validating the results of topic detection and for delineating fields of science? How should field experts and their expertise be engaged? How, and to what extent, can research evaluation or science policy utilize, or even build upon, the results of science mapping? The workshop in Budapest, co-funded by the KNOWeSCAPE COST Action and the IMPACT EV FP7 project, followed a series of workshops in Berlin, Amsterdam and Istanbul (at ISSI 2015). It was organized to address these problems, which the title of the corresponding special issue of Scientometrics sums up as “Same data, different results”.

 

Bibliometric advancements and competing methodologies

Demonstrating the core concept of the workshop, Theresa Velden set out the fundamental challenge stemming from the rich variety of bibliometric methods available for scientific topic detection. Using a large-scale publication dataset on astrophysics, expert groups worldwide (CWTS, ECOOM, SciTech Inc, OCLC, etc.) applied both citation-based and text mining solutions in a joint exercise, and the results were compared. A systematic comparison of the methods and the resulting topical structures for the field of astrophysics revealed that both the choice of data model (using citation links as direct citations, or for measuring bibliographic coupling or co-citation) and the choice of extraction (clustering) algorithm significantly affect the topical landscape. This points to the importance of selecting the method that best fits the research or policy question at hand, which is probably both the solution and the main challenge behind topic identification. Beyond testing up-to-date variants of now-conventional methods acting on metadata, a substantial share of the talks elaborated on (full-)text mining approaches in bibliometric settings. Wolfgang Glänzel proposed statistically re(de)fined methods for mining the topical composition of scholarly corpora, borrowed from quantitative linguistics and tagged as “nano-level” scientometrics for evaluative purposes. Haluk Bingol focused on citation analysis that is sensitive to the textual context of citations, while George Kampis presented a “blindfolded” solution for uncovering topical dynamics within large-scale online textual data. Bringing the two strands together, Edgar Schiebel presented a sophisticated hybrid workflow that combines citation- and text-based methods to detect research fronts, building on various recent developments.
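To illustrate why the choice of data model matters, the following is a minimal sketch, not taken from any of the presented studies, of how the three citation-based data models mentioned above link documents. The toy reference lists and the function names are hypothetical, chosen only for illustration; the point is that the same citation data yields a different set of document pairs under each model.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
references = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"D"},
    "D": set(),
    "E": {"A", "B"},
}

def direct_citation(refs):
    """Undirected links between a citing paper and each paper it cites."""
    return {
        frozenset((citing, cited))
        for citing, cited_set in refs.items()
        for cited in cited_set
    }

def bibliographic_coupling(refs):
    """Pairs of papers weighted by the number of references they share."""
    weights = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            weights[frozenset((p, q))] = shared
    return weights

def co_citation(refs):
    """Pairs of papers weighted by how many papers cite both of them."""
    weights = defaultdict(int)
    for cited_set in refs.values():
        for p, q in combinations(sorted(cited_set), 2):
            weights[frozenset((p, q))] += 1
    return dict(weights)

if __name__ == "__main__":
    print("direct citation:", sorted(map(sorted, direct_citation(references))))
    print("bibliographic coupling:", bibliographic_coupling(references))
    print("co-citation:", co_citation(references))
```

In this toy example, papers A and B never cite each other, so direct citation leaves them unlinked, whereas bibliographic coupling and co-citation both connect them. Any clustering algorithm applied afterwards therefore starts from a different network depending on the data model.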

 

Algorithms: The physics of bibliometrics

Beyond data models (link- or text-based) and the associated information science methods, another salient direction of the two-day discourse was the interplay and methodological overlap between bibliometrics and other scientific domains with regard to topic detection. Most prominently, experts from physics and from the study of complex systems and complex networks presented valuable insights on how advances in network science could be better utilized in science mapping. Tim Evans introduced a rather unconventional approach that remodels document citation networks within the framework of space-time geometry (“netometry”) to uncover topics and their evolution in a natural way. At the heart of Péter Pollner’s approach lay the successful “cfinder” algorithm, developed for complex networks to uncover overlapping communities (hence, topics) and their relations; it also provides the grounds for identifying the changing roles of publications throughout their citation history. Gergely Tibély, from the same Hungarian research group, continued with a set of models tailored to detecting hierarchies in complex networks, which were used to construct a science map of the organization of disciplines by hierarchically ordering scientific journals according to their citation relations.

 

Outreach: Interfaces with science policy

Given their outstanding importance, the issues and methods of mapping the science system (e.g. the delineation of fields) as a science policy tool played an important role in the workshop. Kevin Boyack triggered great interest by highlighting the findings of his group's recent research on hitherto neglected factors behind the research focus of nations, namely altruistic vs. economic motives; the study made use of their proposed high-precision global science map. Petra Ahrweiler introduced a new project that uses knowledge mapping techniques and visual analytics to reveal the relations between societal expectations and European policies (such as New and Emerging Technologies, NEST, and Responsible Research and Innovation, RRI). The interplay between science policy and science mapping was articulated by Sándor Soós, who presented the work done under the IMPACT EV FP7 project, which focuses on the impact of European SSH research. Science mapping, in this case, served as a tool for comparing the evolution and aspects of multidisciplinarity within the social vs. the natural sciences, in order to inform research evaluation practices targeting the outcome of EU-funded SSH projects.

 

Lessons to learn

Complemented by a series of theme-oriented discussions and author panels, the workshop offered quite a lot to learn, in terms of both novel technical solutions and long-needed conceptual insights. A fundamental consensus emerged from the various discussions (including an author panel on the upcoming special issue of Scientometrics entitled “Same data, different results”, and a roundtable discussion on validation methods and future challenges, led by Andrea Scharnhorst, Jochen Gläser and Theresa Velden): bibliometrics is a fast-evolving field that draws on diverse methods, analytic frameworks and techniques from various scientific domains (cf. the theory of complex networks), and therefore smoother and more fruitful communication should take place between these domains. This is necessary to avoid the “black box” effect of transdisciplinary applications (as Jochen Gläser put it), that is, to gain full awareness of the built-in assumptions and scope of methods, and of what is artifactual vs. real in mapping results. Better communication would also ensure that state-of-the-art methods find their way into applications sooner. Synergies between the workshop and the IMPACT EV project were also discussed, with a view to assisting the characterization of SSH research with the aid of science mapping.

 

Sandor Soos

PS: For the latest version of the programme, see: http://www.mtakszi.iif.hu/KNOWeSCAPE.php