Pilot 3: A data journal for the Arts and Humanities
The goal of the pilot study was to define a framework for a data journal in the Humanities and to provide a related action plan. The pilot investigated how quality assessment and (open) peer review can be applied to research data in the Humanities.
In a first phase, current practices and requirements of the Humanities data community were analysed, and the Humanities data field was mapped based on existing research. In a second phase, these findings were complemented with insights from two OpenUP workshops. Special attention was paid to infrastructure and technological requirements as well as to the scholarly communication processes involved.
Based on existing e-Infrastructures and practices of Humanities research groups, the pilot analysed and demonstrated the feasibility of a basic workflow that combines the publication of data with commenting and reviewing systems. The research setting was provided by DARIAH, its extended network of Humanities research groups, and the research groups related to the Campus Labor at the University of Göttingen. The study builds on desk research based on reports and surveys conducted by DARIAH projects. Results from our workshops provided further input.
Researchers encounter numerous barriers in data exchange processes. One of the main obstacles is the generally closed world of scientific discourse in the Humanities. Much of the work in European Humanities research is not visible. The dispersed research communities often fail to connect to one another because of language barriers. Humanities scholars very often publish in their national languages, and the trend is to continue doing so in the future. Europe lacks an integrated database of published journals in various national languages. A database of this kind could be a sort of ‘who’s who’ within a particular field of research.
Another barrier relates to the research data itself. Because standards and common guidelines for data management are lacking, it is very difficult to connect data. There are initiatives at the EU level that work toward a more unified Humanities research landscape. CLARIN, the Common Language Resources and Technology Infrastructure, focuses on integrating language data across Europe. DARIAH, the Digital Research Infrastructure for the Arts and the Humanities, focuses more on increasing the European-level visibility of national research related to cultural heritage, digital arts, etc. These two projects point in a positive direction for the field. Both try to fill the gaps where no data exists and to connect data that does exist but sits isolated and unconnected.
Within the context of this pilot, we examined projects that can serve as best practices for publishing data and building an infrastructure for advancing data sharing. Some initiatives focus on widening access to data by developing digital archives that are reusable in an open access framework. Since the EC supports research infrastructure (RI) developments in the Humanities, with special attention to the field of Digital Humanities, there are several projects, such as the DARIAH ERIC, DIGILAB, and KPLEX projects, whose agenda is to create RIs. This includes developing networks of facilities, resources, and services offered to research communities to support their work.
The development of a data journal framework involves a description of the communication flow and a breakdown of this process into single steps. The data journal framework should include the following attributes:
- the assignment of persistent identifiers (PIDs) to datasets,
- peer review of data,
- metadata information and technical check,
- links to related outputs (journal articles),
- facilitation of data citation,
- standards compliance,
- discoverability (indexing of the data).
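The attribute list above can be read as a checklist applied to each submitted dataset. The following is a minimal sketch of that idea, not a description of any workflow the pilot implemented. All names in it (`DatasetSubmission`, `REQUIRED_METADATA`, `missing_attributes`, `format_citation`), the required metadata fields, and the citation layout are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Assumption: a small set of required metadata fields, chosen for illustration.
REQUIRED_METADATA = ("description", "license", "format")


@dataclass
class DatasetSubmission:
    """Hypothetical record for one dataset submitted to a data journal."""
    title: str
    creators: list[str]
    year: int
    pid: str = ""                                   # persistent identifier
    metadata: dict[str, str] = field(default_factory=dict)
    technical_check_passed: bool = False
    review_reports: list[str] = field(default_factory=list)
    related_outputs: list[str] = field(default_factory=list)  # e.g. article PIDs
    standards: list[str] = field(default_factory=list)
    indexed_in: list[str] = field(default_factory=list)


def missing_attributes(s: DatasetSubmission) -> list[str]:
    """Return the framework attributes that the submission does not yet satisfy."""
    checks = {
        "persistent identifier": bool(s.pid),
        "peer review": bool(s.review_reports),
        "metadata and technical check": s.technical_check_passed
        and all(s.metadata.get(key) for key in REQUIRED_METADATA),
        "links to related outputs": bool(s.related_outputs),
        # A dataset is treated as citable once it has a PID, creators and a title.
        "data citation": bool(s.pid and s.creators and s.title),
        "standards compliance": bool(s.standards),
        "discoverability": bool(s.indexed_in),
    }
    return [name for name, ok in checks.items() if not ok]


def format_citation(s: DatasetSubmission) -> str:
    """Build a citation string in an assumed 'Creators (Year). Title. PID' layout."""
    return f"{'; '.join(s.creators)} ({s.year}). {s.title}. {s.pid}"


# Placeholder values only; none of these refer to a real dataset.
draft = DatasetSubmission(title="Example corpus", creators=["Doe, J."], year=2018)
print(missing_attributes(draft))   # every attribute is still open

draft.pid = "hdl:EXAMPLE/0001"
print(format_citation(draft))
```

A checklist of this kind keeps the editorial steps explicit: a submission moves forward only when `missing_attributes` returns an empty list, and each remaining entry names the step that still needs attention.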
During the small group discussion section of the workshop, the topic of data sharing and data availability was examined in more depth. Participants were given a poster on which they could record examples of good practice, barriers and challenges to implementation, and (based on these barriers) what actions should be taken and by whom. The groups rotated so that each group moved on to evaluate and validate the findings of the other groups. They had the option of adding any points they felt were not covered by the previous group. We received valuable input from researchers and publishers.
Participants listed the following challenges and barriers that hinder the uptake of data sharing and data publishing:
- some disciplines are less willing to share materials than others,
- unclear intellectual property rules and licensing,
- data ownership issues,
- technical aspects of linking research outputs,
- lack of incentives to do the extra work (reformatting, anonymizing, making datasets platform ready).
Workshop participants recommended the following actions to address these issues:
- raising awareness of licensing options, data ownership issues, and intellectual property issues,
- developing and implementing data documentation processes,
- including steps on data curation in the regular research workflow.
Current practices demonstrate the lack of standardized workflows for data curation, sharing and publishing. Humanities data management practices at the University of Göttingen present a varied picture, with varying degrees of openness in archiving and sharing data within the research groups and with external researchers. Humanities projects and departments could take advantage of the institutional repository infrastructure or the developing DARIAH data repository services, where standardized data templates, workflows and added quality assurance tools could provide a more consistent view of data publishing across the different disciplines in the Humanities. Implementing standards and guidelines for managing research data would support a more common view of data sharing and data availability within Humanities projects. In many cases the tools for data publishing are already available (e.g. psycholinguists use a data analysis platform that allows the description of a data set, a data paper, to be published at the push of a button); however, awareness of the benefits and value of sharing research data is not yet part of researchers' workflow. Humanities data publishing will become more prominent as researchers' awareness of data management and data discoverability issues increases.
Read the full evaluation report here.