Pilot 2: Open Peer Review for Research Data


The goal of the pilot study was to investigate the applicability of (open) peer review to research data in disciplines related to the Social Sciences.

In a first phase, we analysed current dataset management and diffusion practices in this field. In particular, we examined:

  • types of datasets,
  • data providers,
  • publication/diffusion and
  • validation modes.

In a second phase, we conducted interviews and a survey, taking into account the perspectives of both data providers/journal publishers and data users. The aim was to gain additional insight into good practices supporting data validation or quality assessment.

Key Results

We interviewed managers and country specialists of the Human Mortality Database (HMD), and conducted a survey of HMD users. The interviews provided us with important information on the procedures performed to assess the quality of data in the pre-publishing phase, as well as on the strong and weak points connected with data sharing. The interviews explored the origin, motivations and organisational features of HMD, the goal and main features of the database, its data quality assessment process, and the interviewees’ opinion on Open Access to data.

The results of the survey with HMD users help in understanding their practices in data access and use, which can be considered a proxy indicator of the post-publishing appreciation of the quality of the database. The survey specifically focused on HMD users’ practices and attitudes in data access and use. In particular, we asked HMD users how often and for how long they have used HMD data; for which countries; how they download data and which types of datasets; how they use the datasets; and how they perceive HMD compared to other sources of information.

The analysis of the interviews yielded some important indications that can drive the adoption of data quality assessment, and hence peer review, as well as some principles that can incentivise other scientific communities to share their research data. As stated by the HMD interviewees, the guiding principles in creating an open access database were comparability, flexibility, accessibility and reproducibility. Comparability was reached by using a uniform, scientific methodology to calculate the various statistics of the 39 countries included in the database. Flexibility was achieved by applying a uniform set of procedures to each population while giving significant attention to each population's history and socio-political development. This is also reflected in the available formats of the output data series. It relies on the experience and knowledge of the country specialists (CSs), that is, the persons in charge of collecting data from a specific number of countries, who interact with statistical offices, check data consistency and provide population statistics together with a country report that explains the specificity and motivation of the analysis. Accessibility was guaranteed from the beginning by free-of-charge access to the data, as well as by the provision of data in an open, non-proprietary format. Reproducibility is provided by the reconstruction of the data lifecycle, which includes the availability of the raw data, the method applied, the related results and the explanatory documentation.

One of the main successful features of HMD is its transparent way of managing and sharing data, which has two central phases of data validation. The first is carried out by the country specialists (CSs), who analyse the raw data according to a common predefined checklist that verifies the consistency and plausibility of the data. The second is carried out collaboratively within the HMD team, which validates the statistics before their publication each time the database is updated.

Another successful component of HMD was its collaborative approach, which is based on a strong scientific interest in the field as well as on trust among the involved community, which only recently formally signed a Memorandum of Understanding. The interviews also confirmed some concerns already mentioned in other surveys. Interviewees stressed the importance of a strong commitment by the organisation to supporting the development of data infrastructures. This pertains to different aspects: long-term financial support (beyond the project duration), policy endorsement of open data, and formal recognition of scientists for their efforts in data curation and quality assurance.

The results of the survey show that users confirm the main strengths of HMD, in particular the accurate and well-documented data quality assessment, which makes the process transparent and facilitates the reproducibility of the analysis. They do not point out evident weak points; rather, they suggest improvements mainly related to the provision of tools that facilitate the import of data into statistical packages. This may also be related to the simple style of the interface, where some links could be better highlighted. One user’s comment summarises this aspect well: “the format of the website could be more aesthetically appealing, but as it is the site is very functional and suits the needs of the users”. Moreover, the variety of user profiles, which span the research field as well as the private sector and reflect different user needs, indicates the importance of data sharing and reinforces Open Science principles. From an open peer review (OPR) perspective, a straightforward transposition of the procedures adopted for scientific journals seems hard to apply. However, some traits of OPR, such as transparency in the quality assessment process, are features that should be promoted at a larger scale for open data. The same holds for the trait of open participation, which in the case of open data implies a more common use of data citations by end-users as well as the implementation of additional tools to track data re-use. Further research is needed to explore data sharing and management practices, not only in the Social Sciences, in order to take the necessary steps to support and improve high-quality data sharing.

Read the full evaluation report here.

Contact: Daniela Luzi, CNR-IRPPS National Research Council – Institute for Research on Population and Social Policies, d.luzi@irpps.cnr.it