In their article ‘Self-correction in science: The diagnostic and integrative motives for replication’, David Peterson and Aaron Panofsky (2021) distinguish two kinds of replication: diagnostic and integrative replication. In this blog post we discuss how well these categories apply to the studies in our sample, and we propose to distinguish a third kind of replication study: the context-exploratory replication.
According to Peterson and Panofsky (P&P), diagnostic replication starts from scepticism about the original result and tries to reproduce it using the same procedure. The aim is scientific self-correction. Integrative replication, on the other hand, is based on trust in the original result and aims to extend it by integrating it into the replicating researcher’s own work. The ultimate goal here, according to P&P, is to ‘get it to work’: to use the original results for your own purposes.
Interviews P&P conducted with 60 members of the board of reviewing editors of the journal Science seemed to show that diagnostic replication, despite being the focus of the scientific reform movement of the last ten years, is actually quite rare. The editors P&P spoke with hardly ever did replication studies to test the truth of the original claims: they simply assumed that the original researchers were trustworthy and the results were solid. Instead, they wanted to build on the original study, for example by using its methods in their own research, and they typically did so in a flexible, pragmatic manner.
If we look at our own interviews with this distinction between diagnostic and integrative replication in mind, we notice that in our sample diagnostic replication is certainly not rare. There may be several reasons for the difference between our findings and P&P’s results. First, the funding structure behind our sample was explicitly focused on diagnostic replications: they are what the Dutch research funder NWO wants to fund. Second, of the 23 replication studies in our pool, the vast majority (18 studies) are from the social sciences. P&P, on the other hand, interviewed only eight social scientists, against 36 life scientists and 16 physical scientists – perhaps reflecting underlying editorial biases at the journal Science. While their sample is skewed away from social science, ours is skewed towards it.
In our sample, we do not have a single case of ‘integrative replication’ as described by P&P, though some scientists told us that failure to ‘get it to work’ in such integrative replication work was the reason they became interested in performing the diagnostic replication that was eventually funded by NWO – and which then became part of our sample of studies. P&P also note that integrative replications can have diagnostic value as a by-product: if you just can’t get the original researchers’ method or technique to work in your own research project, you start to suspect that there might be something wrong with the original study. If you try to build on the work of others, you sometimes find out that the foundation is not solid. (And the consequences for your career can be dramatic.) This is indeed what we see in our sample.
Conversely, in our interviews with psychologists we see that the intention behind the replication study is often not merely diagnostic. The replicators do want to test the validity of the original results, but a different result does not have to mean complete rejection. One replicator emphasized that even if he found a much smaller effect size, the effect might still be meaningful in practice: an intervention that makes only a small difference for one person can have a large aggregate effect when it affects many people.
Connected with this is another difference from what P&P report. In their sample, in the rare cases when diagnostic replications are conducted, the original procedure is followed as closely as possible. We found, however, that some of the psychologists who replicate older studies for primarily diagnostic purposes also try to improve on the original study – because better techniques are now available, or because parts of the original procedure ‘just don’t make sense’. One replicator, for example, objected to the demand that participants in a psychological experiment be tested in groups, as the original researchers said was necessary. He claimed there was no theoretical reason for this, and during the corona pandemic it was difficult to work with groups. But the original researchers wouldn’t budge: participants had to be tested in groups. Everything had to be the same as in the original experiment.
Another noteworthy result from our interviews is that when a replication is done for primarily diagnostic reasons, the replicators are often at pains to be as neutral as possible. This fits very well with P&P’s concept of diagnostic replication. The goal of diagnostic replication, they write, “is to faithfully reproduce the means while remaining agnostic about the ends” (Peterson & Panofsky, 2021, p. 587). The replicators we spoke with try to keep an open mind. Whether they are sceptical about the original study or not, they go to great lengths to make sure that their own prejudices cannot influence the result of their study. Preregistration of the study design and analysis is a common measure to prevent themselves from unwittingly influencing the outcome.
The most striking result from our work so far, however, is that there appears to be a third kind of replication, which we propose to call the ‘context-exploratory replication study’. In this type of replication, the original study is trusted, but it is not replicated in order to integrate the work into one’s own research goals. Instead, the original study is intentionally replicated in other contexts and with other means, with a much broader diagnostic function. Importantly, these researchers assume that the original results are context-sensitive. For example, they want to know whether the original findings hold in other populations, or whether they hold when a newer technique or newer insights are used – and how the newer findings then compare to those obtained with the older techniques or insights. In many cases, they ultimately test the generalizability and robustness of the original claim. But they can also aim to fine-tune the original results, put them into an even broader context, or enable future generations to make even better use of the underlying findings and/or data.
Peterson, D., & Panofsky, A. (2021). Self-correction in science: The diagnostic and integrative motives for replication. Social Studies of Science, 51(4), 583–605.