Report on the STSM “Concept Shifts in Knowledge Classification”
Venue: eHumanities DANS/KNAW, Amsterdam - The Hague, 4-10 May 2014
The purpose of this Short Term Scientific Mission (STSM) was to discuss and plan further research, papers and events dealing with the presentation of the evolution of knowledge and shifts in knowledge organization over time. The idea behind this collaboration is to use the Universal Decimal Classification (UDC) as a testbed (UDC Summary http://www.udcc.org/udcsummary/; the complete UDC content online http://www.udc-hub.com/). The UDC is a knowledge organization system covering all areas of knowledge, and it has been widely used in bibliographic collections for over a hundred years (read more: http://en.wikipedia.org/wiki/Universal_Decimal_Classification). The UDC itself is available in well-structured data formats but, most importantly, UDC notations are present in the metadata of many bibliographic collections.
In planning this STSM we identified the following areas of research as fitting very well into the KNOWeSCAPE framework: a) proposing and testing a solution for exposing UDC and its historical data as linked data; b) exploring how UDC linked data, and particularly the historical data, can be used to support library linked data scenarios in cross-collection and cross-language searching; c) exploring how either the linked data or the UDC data format as it stands now can be used in visualizing complex knowledge schemes, and in particular what such visualization would reveal in terms of knowledge discovery once we analyse the use of UDC in living bibliographic collections (scientific articles, books, journals).
To learn more about the historical UDC data in the UDC archive at the Royal Library in The Hague, I met with Dr. Gerhard Riesthuis (UDC editor), whose experience in managing UDC changes spans forty years. The expression ‘historical data’ refers to the changes in the UDC scheme that have been made over time as a result of changes in knowledge (for an illustration see, for instance, an overview of recent revisions at http://www.udcc.org/index.php/site/page?view=major_revisions). We established that digitizing the historical data held in the UDC archive would require long-term planning because of its volume, but once this is done it would be possible to explore shifts in concepts and knowledge from the beginning of the 20th century to date. The general conclusion was that such research should start with a specific area: for example, a scientific field in which we can easily trace changes in the approach to the classification of plants and animals, or concepts of place, which would show the geopolitical changes in Europe since 1900.
The meeting with Christophe Guéret (eHumanities DANS KNAW) in Amsterdam was an opportunity to look into some options and problems in publishing historical UDC data as linked data. The main issue is to determine a way of formulating URIs that enables machine-to-machine (m2m) connections between library collections and the UDC scheme once it is published as linked data. Here we encounter a problem caused by changes to the scheme: the meaning of a UDC code as used in a library collection may differ from the meaning of the same code in the latest version of the UDC scheme. Christophe’s experience with the CEDAR project (http://www.cedar-project.nl/) informed part of the solution we think we may implement in publishing versions of UDC. He suggested that porting UDC data to XML/RDF would make it much easier to process and access the data and would open many opportunities for data presentation that would be more difficult to achieve using a conventional relational database structure.
We also looked into a typology of the changes that are constantly made to long-lived knowledge schemes and how these changes affect meaning. We agreed that a write-up of our findings would be useful to a wider audience, as the problem described here concerns all knowledge organization systems and terminological tools.
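The URI question raised in the discussion with Christophe Guéret can be illustrated with a minimal sketch. It is not the solution agreed at the meeting: the namespace http://example.org/udc, the URI pattern (edition year followed by the notation), the function names and the sample history are all assumptions made for illustration only. The sketch shows one possible way of giving each version of a class its own URI and of linking a library record to the version that was current when the record was classified.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote

# Hypothetical namespace; not an official UDC URI.
BASE = "http://example.org/udc"


@dataclass(frozen=True)
class ClassVersion:
    notation: str  # UDC notation as found in library metadata
    edition: int   # year of the scheme release in which this meaning applies
    caption: str   # meaning of the notation in that release


def version_uri(notation: str, edition: int) -> str:
    """Build a version-specific URI (assumed pattern: BASE/edition/notation)."""
    # UDC notations may contain characters such as ( ) : / that need escaping.
    return f"{BASE}/{edition}/{quote(notation, safe='')}"


def resolve(notation: str, record_year: int,
            history: List[ClassVersion]) -> Optional[ClassVersion]:
    """Return the latest version of `notation` released on or before record_year."""
    candidates = [v for v in history
                  if v.notation == notation and v.edition <= record_year]
    return max(candidates, key=lambda v: v.edition) if candidates else None


# Placeholder history: the notation, years and captions are invented,
# not taken from the UDC archive.
HISTORY = [
    ClassVersion("123.4", 1950, "older meaning (placeholder)"),
    ClassVersion("123.4", 2000, "revised meaning (placeholder)"),
]

if __name__ == "__main__":
    version = resolve("123.4", 1975, HISTORY)
    if version is not None:
        print(version_uri(version.notation, version.edition), version.caption)
```

With URIs of this kind, a record classified in 1975 would point to the 1950 meaning of the code rather than to the latest one, which is the mismatch described above. Whether the edition belongs in the URI path or in separate version metadata is a design choice that remains open.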
During the meeting with Andrea Scharnhorst in Amsterdam we investigated more concrete steps for putting the initial research ideas into action. We agreed that the visualization of UDC may be an interesting research area and that the available data and the work done so far may be put to better use. The research on visualization of UDC is actually two-fold. Firstly, we face the challenge of visualizing complex semantic relationships between knowledge areas in UDC as it stands now. Secondly, we have the option of exploring the visualization of knowledge evolution. For this kind of research we may be able to make more progress if we follow Christophe’s plans for exposing historical data as XML/RDF. We also discussed the option of exploring the patterns that would emerge if we established links between concepts/classes in library collections, scientific papers or research projects and those used by major publishers, namely the Book Industry Communication (BIC) categories (http://www.bic.org.uk/7/BIC-Standard-Subject-Categories/) and the BISAC Subject Headings (https://www.bisg.org/complete-bisac-subject-headings-2013-edition). For this kind of research we would have to look into the mapping of UDC to other classifications: BIC/BISAC or even some patent classifications. Existing mappings of UDC can be used for this purpose, and we can also classify research projects using UDC to see whether this line of research is worth following.
On the final day we met in The Hague to draw up a more detailed plan for the subsequent steps. Christophe explained his ideas about managing scheme versioning, and some concrete steps were agreed concerning visualization. Following Andrea’s suggestions on visualization, we discussed the best ways of presenting hierarchical and associative relationships in classification, and especially the difficulty of presenting the distribution of the same concept as it is studied in various fields of knowledge. It was agreed that we should look into visualization solutions such as D3 (Data-Driven Documents, http://d3js.org/) or eyePlorer (Andrea demonstrated its application on Wikipedia http://en.vionto.com/show/me/eyePlorer.com). We decided to look into the current interface of the UDC Online (http://www.udc-hub.com/index.php) and the UDC Summary, both for inspiration and as a possible place to implement visualization solutions. We made some preliminary plans for a journal paper that would create more awareness of the importance of publishing historical data for knowledge discovery in library collections. The Second Annual KNOWeSCAPE conference in Thessaloniki and the subsequent WG 1 meeting, Evolution of classification systems, currently provisionally planned for the beginning of March 2015, would be opportunities to present the results of the research. We may also report on some of this work at the NKOS workshop in London in September 2014 (at the TPDL conference http://www.dl2014.org/).
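As a first step towards the D3 option mentioned above, the hierarchical part of the scheme can be turned into nested data. The following is a minimal sketch under stated assumptions, not an agreed design: it assumes that the hierarchy can be read from notation prefixes, which holds for simple main numbers but not for notations with auxiliaries or special signs, and the "name"/"children" keys follow the nested layout commonly fed to d3.hierarchy. The function name is hypothetical, and the sample captions follow the top-level classes of the UDC Summary and should be checked against it.

```python
import json
from typing import Dict

# Sample classes; captions should be verified against the UDC Summary.
CLASSES = {
    "5": "Mathematics. Natural sciences",
    "51": "Mathematics",
    "53": "Physics",
    "58": "Botany",
    "59": "Zoology",
}


def build_tree(classes: Dict[str, str]) -> dict:
    """Nest classes under their longest existing notation prefix."""
    nodes = {n: {"name": f"{n} {caption}", "children": []}
             for n, caption in classes.items()}
    root = {"name": "UDC", "children": []}
    for notation in sorted(nodes):
        parent = root
        # Try progressively shorter prefixes until a known class is found.
        for cut in range(len(notation) - 1, 0, -1):
            if notation[:cut] in nodes:
                parent = nodes[notation[:cut]]
                break
        parent["children"].append(nodes[notation])
    return root


if __name__ == "__main__":
    # The JSON output can be loaded by a D3 tree or treemap layout.
    print(json.dumps(build_tree(CLASSES), indent=2))
```

Associative relationships and the occurrence of the same concept in several fields do not fit into such a tree and would need a separate list of links between nodes, which is the harder part of the visualization problem discussed at the meeting.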
During my stay in Amsterdam I also attended a regular research meeting at eHumanities. At this meeting, Lora Aroyo from Vrije Universiteit Amsterdam presented her research on human-assisted computing for understanding events in cultural heritage, which can enhance information discovery and search expansion. The talk, entitled “Interpretation CrowdTruth for digital hermeneutics”, was followed by an interesting discussion on issues related to event-based interpretation (http://www.slideshare.net/laroyo/crowdtruth-for-digital-hermeneutics).