Challenges of social science replication

The results of the large-scale replication project suggest that challenges in reproducibility remain, even in studies published in the most prestigious journals in science

How replicable are experiments in the social sciences? Teams from five universities around the world — including an NUS team led by Senior Deputy President and Provost Professor Ho Teck Hua — have collaborated to answer this question.

The teams tried to replicate one main finding from each of 21 experimental social science papers published between 2010 and 2015 in Science and Nature, two of the most prestigious journals in science. The 21 papers were chosen based on three criteria: the study described in the paper tested for an experimental treatment effect; the study tested at least one clear hypothesis with a statistically significant finding; and the study was performed on accessible subject pools, such as students.

To ensure that the replications could detect support for a finding even if the effect was only half the size of the original result, the study was designed with very high statistical power: replication sample sizes were, on average, five times larger than those of the original studies.
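The link between effect size and sample size can be illustrated with a rough calculation. The sketch below is not the project's actual power analysis. It assumes a simple two-group comparison of means, a two-sided 5 per cent significance level, 90 per cent power and a normal approximation; the function name and the example effect size of 0.5 are hypothetical. Under those assumptions, halving the effect to be detected roughly quadruples the required sample.

```python
from statistics import NormalDist


def approx_n_per_group(effect_size, alpha=0.05, power=0.90):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2,
    where d is the standardised effect size. Illustrative only.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2


original_effect = 0.5  # hypothetical standardised effect size, not from the project
n_full = approx_n_per_group(original_effect)
n_half = approx_n_per_group(original_effect / 2)

print(f"n per group for the full effect: {n_full:.0f}")
print(f"n per group for half the effect: {n_half:.0f}")
print(f"ratio: {n_half / n_full:.1f}")
```

Because the required sample grows with the inverse square of the effect size, a replication that aims to detect a much smaller effect than the original needs a far larger sample.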

“In the project, we led by example, involving a global team of researchers. The team followed the highest standards of rigour and transparency to test the reproducibility and robustness of studies in our field,” commented Prof Ho. All studies were preregistered on the Open Science Framework (OSF), and all data and materials were made publicly accessible with the OSF registrations to facilitate the review and reproduction of the replication studies.

The team found that only 13 of the 21 replications (62 per cent) showed significant evidence consistent with the original hypothesis. Additionally, the effect sizes obtained in the replications were about 50 per cent smaller than those reported in the original studies. These results suggest that reproducibility is imperfect even among studies published in the most prestigious journals in science, and that findings, no matter where they are published, need to be carefully interpreted.
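The headline replication rate follows directly from the counts reported above. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def replication_rate(successes, total):
    """Return the share of replications that succeeded, as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


# Counts reported in the article: 13 successful replications out of 21.
rate = replication_rate(13, 21)
print(f"{rate:.1%}")  # 61.9%, which rounds to the reported 62 per cent
```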

Before conducting the replications, the team set up prediction markets where fellow researchers could bet on which of the findings would replicate. The markets were found to be highly accurate, correctly predicting outcomes for 18 of the 21 replications. This accuracy opens up the possibility that prediction markets could be used to prioritise replication efforts for studies with highly important findings but relatively uncertain or weak likelihoods of replication success.
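How such accuracy might be scored can be sketched in a few lines. The example below is only an illustration: it assumes that a final market price above 0.5 is read as a prediction that the finding will replicate, and the five prices and outcomes are hypothetical, not data from the project. Only the 18-of-21 figure comes from the article.

```python
def market_accuracy(prices, outcomes, threshold=0.5):
    """Share of studies where the market's implied prediction matched the outcome.

    prices: final market prices in [0, 1], read as replication probabilities
    outcomes: True where the replication succeeded
    threshold: price above which the market is taken to predict success (assumed)
    """
    if len(prices) != len(outcomes) or not prices:
        raise ValueError("prices and outcomes must be non-empty and equal in length")
    correct = sum((p > threshold) == o for p, o in zip(prices, outcomes))
    return correct / len(prices)


# Hypothetical prices and outcomes for five studies (not project data).
example_prices = [0.85, 0.30, 0.65, 0.20, 0.55]
example_outcomes = [True, False, True, False, False]
example_accuracy = market_accuracy(example_prices, example_outcomes)
print(f"example accuracy: {example_accuracy:.0%}")

# Figure reported in the article: 18 of 21 outcomes correctly predicted.
reported_accuracy = 18 / 21
print(f"reported accuracy: {reported_accuracy:.1%}")
```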

The success of the prediction markets is a major positive outcome of the project. It shows that, after publication, peers have a good sense of which results are strong and which are weak. The team therefore believes that it may be good for researchers to open their research to peer review before publication to ensure that results are robust.

They added that the replication failures do not indicate that the original findings were false, as errors in the replications or differences between the original and replication studies could still be responsible for some failures to replicate.

This large-scale project, described in a paper recently published in Nature Human Behaviour, is part of an ongoing reformation of research practices to drive research culture towards greater openness, rigour and reproducibility. “With these reforms, we should be able to increase the speed of finding cures, solutions, and new knowledge,” said University of Virginia Professor Brian Nosek, one of the co-authors of the paper.