A Proposal To Improve Peer Review: A Unified Peer-to-Peer Review Platform (part 1.5)

Late again. As I was working on part 2 of this blog series, in which I present our proposal to improve scholarly communication through its peer review element, I came across a rather scathing review of our proposal by David Crotty.

Since I don’t see much point in continuing with part 2 while elements of our proposal are under criticism, I’m going to take a short break and respond to that criticism first.

First things first: my co-author on the working paper no longer works at Erasmus University Rotterdam; he just hasn’t updated his information yet. As for me, I currently have no affiliations (none relevant to this working paper, anyway). So that’s that. I wouldn’t exactly call myself mysterious, though; I do have a LinkedIn page listing my educational background. But let’s focus on the actual comments.

Their system is designed to begin in open access preprint repositories and then potentially spread into use in traditional journals.

By default, the design should allow journal publishers and editors to take advantage of the system, but that’s as far as it goes. This part doesn’t change at all, whether or not the peer-to-peer review model grows.

The proposal is full of gaping holes, including a need for a magical automated mechanism that will somehow select qualified reviewers for papers while eliminating conflicts of interest,

Okay. First of all, I don’t consider a recommendation system that matches manuscripts with suitable peer reviewers to be magical. In the Discussion & Conclusion section of our paper, we go over the potential strengths and weaknesses of our peer-to-peer review model. In its “Potential Weaknesses” subsection, we state the following:

A key requirement of the peer-to-peer review model is that the automated manuscript assignment system has to be effective. Since it is essentially a type of recommendation algorithm, it should be technically and functionally feasible to find suitable manuscripts for scholars available for peer review. We identify two issues that remain for now. The first is how to verify whether there is a conflict of interest without making the real identities public. The ability to verify this would improve the answerability of this model significantly. Technically and functionally, filtering certain matches should be feasible, but it would significantly rely on the information that scholars provide. Perhaps allowing authors to indicate manually which authors (edit: scholars is probably a better term to use here) they do not want for peer review might help address this issue. The manual element can be done anonymously, making it only accessible to the automated manuscript selection algorithms. Ideally, we would be able to rely on the automated selection algorithms for this issue as much as possible. Creating a system that can compare paper abstracts, keywords, scholarly affiliations and future research projects to determine whether there is reason to believe there is a conflict of interest is a critical success factor.

To imply that we depend completely (and magically!) on the manuscript selection element being fully automated and working perfectly, including its ability to detect and reject matches with a conflict of interest, is highly inaccurate. In fact, on page 15, in the “On Peer-to-Peer Answerability” section of our paper, we spend five paragraphs addressing this exact issue, the second of which begins as follows:

Manual approaches should additionally be implemented in the event the recommendation algorithms are unable to detect conflicts of interests. For example, scholars can manually prevent certain scholars from peer reviewing their manuscripts. The number of scholars they can prevent from peer reviewing can be based on the total number of suitable scholars. Furthermore, scholars should be given the opportunity and encouragement to mark both manuscripts and papers for which they are “Proficient” to peer review. If this statement is checked, additional statements are presented, such as the “No conflict of interest” statement and whether the scholars are “Interested”, “Very Interested” or “Not Interested” in peer reviewing the respective manuscripts and papers.

(SNIP: To the next paragraph)

After a manuscript has a certain amount of such “compatibility” statements checked by a number of scholars, a short overview with the titles and abstracts of the respective manuscripts can be added to the real profile pages of these scholars.

You can read the rest for yourself; I won’t do it justice unless I quote the whole thing, and I still have other points to address in this post. One can certainly question how efficient this model can be, relatively speaking, with these manual measures (an issue we have also acknowledged and discussed), but to suggest that it relies magically on automated manuscript selection is highly inaccurate.
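To show how the automated and manual filters described in the quoted passages could work together, here is a minimal Python sketch. It is not the algorithm from the working paper: the profile fields, the keyword-overlap (Jaccard) relevance score, the shared-affiliation conflict rule, and the `min_score` threshold are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scholar:
    # Hypothetical profile fields; not a schema from the working paper.
    scholar_id: str
    keywords: set[str]
    affiliations: set[str]
    available: bool = True


@dataclass
class Manuscript:
    manuscript_id: str
    author_ids: set[str]
    keywords: set[str]
    affiliations: set[str]
    # Manual exclusions entered anonymously by the authors; only the matcher sees them.
    excluded_reviewers: set[str] = field(default_factory=set)


def keyword_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity in [0, 1]; a crude stand-in for a real relevance model."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def has_conflict(s: Scholar, m: Manuscript) -> bool:
    """Assumed automated check: the scholar is an author or shares an affiliation."""
    return s.scholar_id in m.author_ids or bool(s.affiliations & m.affiliations)


def recommend_reviewers(m: Manuscript, pool: list[Scholar],
                        k: int = 2, min_score: float = 0.2) -> list[str]:
    candidates = []
    for s in pool:
        if not s.available or s.scholar_id in m.excluded_reviewers:
            continue  # unavailable, or blocked manually by the authors
        if has_conflict(s, m):
            continue  # blocked by the automated conflict-of-interest filter
        score = keyword_overlap(s.keywords, m.keywords)
        if score >= min_score:
            candidates.append((score, s.scholar_id))
    # Most relevant first; ties broken by id so the output is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [scholar_id for _, scholar_id in candidates[:k]]


pool = [
    Scholar("alice", {"peer review", "bibliometrics"}, {"Univ A"}),
    Scholar("bob", {"peer review", "open access"}, {"Univ B"}),
    Scholar("carol", {"peer review", "open access"}, {"Univ C"}),
    Scholar("dave", {"genomics"}, {"Univ D"}),
]
m = Manuscript("m1", {"erin"}, {"peer review", "open access"}, {"Univ A"},
               excluded_reviewers={"bob"})
print(recommend_reviewers(m, pool))  # ['carol']
```

Here alice is dropped by the automated check (shared affiliation), bob by the authors’ manual exclusion, and dave for lack of relevance. Either layer alone is enough to remove a candidate, which is the point of implementing manual approaches in addition to the recommendation algorithm.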

an over-reliance on citation as the only metric for measuring impact,

I’m not entirely sure what he’s referring to here. It’s true that we consider a paper’s citation count an important factor in determining its impact. And? I can imagine the number of views, downloads, ratings, comments, blog posts and so on being significant as well. In fact, we have already factored in comments and ratings as things that can influence a manuscript’s impact, and I’m sure we can consider the others later.

and a wide set of means that one could readily use to game the system.

Well, we spend a large part of the paper addressing such issues. Did we identify every exploit? I doubt it. Did we devise perfect measures to close the potential exploits? I doubt that too. I’d like to think that in the design phase, which is where we are, we can discuss such issues openly. I, for one, am very interested in hearing about these ‘means that one could readily use to game the system’.

The proposal doesn’t seem to solve any of the noted problems with traditional peer-review

“Solved” is a big word. Our message has been to try to “improve on the current situation.” Take the lack of incentives for reviewers (few to none), for one, and the lack of accountability, for another. We also want to offer (relative) insight into peer review quality, and a higher utility for each individual peer review by making it accessible to the relevant parties, such as other journals and the other peer reviewers of the same manuscript.

as it seems just as open to as much bias and subjectivity as what we have now.

Well, we do provide tools that let scholars at least track such offenses and, in very extreme cases, call them out publicly. In other cases, the offending reviewers’ work simply won’t “count” toward their publicly visible “Reviewer Impact.” How is that just as open to bias and subjectivity as what we have now?

It’s filled with potential waste and delays as reviewers can apparently endlessly stall the process

What? No. From pages 8 and 9:

Each peer review assignment is constrained by predetermined time limits. The default time limit for an entire process is one month after two peer reviewers have accepted the peer review assignment. Peer reviewers can agree to change the default time limit during the acceptance phase. Any reviewer who has not “signed off” by then will have Reviewer Credits extracted until the reviewers of the reports sign off or when the application for a peer review is terminated. This measure is to prevent a process going on for a far longer time than agreed to beforehand, which is not desirable for any party. An example of how the termination can work: a termination can happen when no new deadline, agreed by the authors and peer reviewers in question, has been set two weeks after it has passed the original deadline. In the case of termination one or more peer reviewers will have to be assigned to the peer review session to achieve the minimum of two peer reviews per manuscript.

Not exactly what I’d call the ability to stall endlessly.
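The timing rules quoted above can be sketched as a small state function for a reviewer who has not yet signed off. This is an illustration under assumptions, not code from the paper: “one month” is taken as 30 days, the quoted passage gives no rate for extracting Reviewer Credits so the sketch uses a hypothetical one credit per overdue day, and it ignores when a renegotiated deadline was agreed.

```python
from datetime import date, timedelta
from typing import Optional

# All constants are assumptions made for this sketch.
DEFAULT_LIMIT = timedelta(days=30)  # "one month", approximated as 30 days
GRACE_PERIOD = timedelta(days=14)   # "two weeks after it has passed the original deadline"
CREDITS_PER_DAY = 1                 # hypothetical extraction rate
MIN_REVIEWS = 2                     # minimum of two peer reviews per manuscript


def deadline(second_acceptance: date, agreed_limit: Optional[timedelta] = None) -> date:
    """The clock starts once the second reviewer has accepted the assignment."""
    return second_acceptance + (agreed_limit or DEFAULT_LIMIT)


def pending_reviewer_state(today: date, due: date,
                           new_deadline: Optional[date] = None) -> tuple[str, int]:
    """State of a reviewer who has not signed off yet, plus credits extracted so far."""
    if new_deadline is not None:
        # A new deadline agreed by the authors and reviewers replaces the old one.
        due = max(due, new_deadline)
    if today <= due:
        return "open", 0
    overdue = today - due
    if overdue > GRACE_PERIOD:
        # Terminated: extraction stops here and the assignment goes to someone else.
        return "terminated", GRACE_PERIOD.days * CREDITS_PER_DAY
    return "overdue", overdue.days * CREDITS_PER_DAY


def reviewers_to_assign(completed_reviews: int) -> int:
    """After a termination, top the session back up to the minimum number of reviews."""
    return max(0, MIN_REVIEWS - completed_reviews)


due = deadline(date(2010, 10, 1))
print(due)                                              # 2010-10-31
print(pending_reviewer_state(date(2010, 11, 5), due))   # ('overdue', 5)
print(pending_reviewer_state(date(2010, 11, 20), due))  # ('terminated', 14)
print(pending_reviewer_state(date(2010, 11, 20), due,
                             new_deadline=date(2010, 12, 15)))  # ('open', 0)
print(reviewers_to_assign(1))                           # 1
```

Capping extraction at the grace period reflects the rule that credits are extracted only until the reviewer signs off or the assignment is terminated.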

and authors can repeatedly demand new reviews if they’re unhappy with the ones they’ve received.

As they already can now, you mean? Actually, we have something a little different in mind. See page 9:

When authors are not content after having gone through a peer review process, they can leave manuscripts “open” for others peer reviewers to start a new peer review session. The newer peer reviewers will have access to peer review reports of previous sessions, creating an additional layer of accountability. Concerning the consequences of multiple peer review sessions for the same manuscripts; in the traditional system the latest peer reviews before a manuscript is accepted for publication are the ones that count. In our peer-to-peer review model, the manuscript score is based on what the peer reviewers of the newest session have assigned to them. This is regardless of whether the scores are higher or lower than the previous manuscript scores. A possible alternative to this is to let the authors decide which results to attach to the manuscript rating. A disadvantage of authors selecting which set of grades to use is that it could likely weaken the importance of the earlier peer review sessions. To improve accountability and efficiency, previous reviews are not hidden from any future peer reviewers. The reviews will still count and the peer reviewers who have submitted them maintain the Reviewer Credits awarded to them. Regardless of how and which sets of grades are utilized, those specific grades are to be reflected in the rankings and returned search results.

So yes, authors can request new reviews if they’re unhappy with the ones they’ve received. But other scholars can see how many times an author has already done so, through the grades (and sometimes more, depending on the grades) of the existing peer reviews of that manuscript, and decide for themselves whether it’s worth their time to review it again. Again, you can question the effectiveness of this added layer of accountability, but you cannot say that authors can “abuse” the system by requesting peer reviews as many times as they want. They can’t, certainly not compared to what they already can, and generally do, in the current publishing system. Also, the section “Crediting Reviewer Impact” (starting on page 11) covers additional “penalties” for authors who repeatedly request new peer reviews.
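The default scoring rule from page 9 (the newest session determines the manuscript score, while earlier sessions stay visible) can be sketched in a few lines of Python. The class names are hypothetical, and averaging a session’s grades is a simplification of this sketch, since the quoted passage does not say how individual reviewers’ grades are combined.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class ReviewSession:
    grades: list[float]  # one grade per peer reviewer in this session


@dataclass
class ManuscriptRecord:
    # Every session is kept; nothing is hidden from future reviewers.
    sessions: list[ReviewSession] = field(default_factory=list)

    def add_session(self, grades: list[float]) -> None:
        self.sessions.append(ReviewSession(grades))

    def score(self, chosen: Optional[int] = None) -> float:
        """Default rule: the newest session counts, whether higher or lower.

        Passing `chosen` models the alternative in which authors pick the session.
        """
        if not self.sessions:
            raise ValueError("manuscript has not been reviewed yet")
        session = self.sessions[-1] if chosen is None else self.sessions[chosen]
        return mean(session.grades)  # averaging is an assumption of this sketch

    def history(self) -> list[list[float]]:
        """What a new reviewer sees: the grades of all earlier sessions."""
        return [s.grades for s in self.sessions]


record = ManuscriptRecord()
record.add_session([6.0, 7.0])
record.add_session([5.0, 5.0])  # the authors asked for a second session
print(record.score())           # 5.0, the newest session counts
print(record.score(chosen=0))   # 6.5, the author-choice alternative
print(len(record.history()))    # 2, earlier reviews stay visible
```

Appending sessions rather than overwriting them is what lets a prospective reviewer see how many rounds a manuscript has already been through before deciding whether to take it on.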

Reviewers are asked to do a tremendous amount of additional work beyond their current responsibilities, including reviewing the reviews of other reviewers, and taking on jobs normally done by editors. If one of the problems of the current system is the difficulty in finding reviewers with time to do a thorough job, then massively increasing that workload is not a solution.

A legitimate concern. But here’s the thing: we’re not sure it holds. Sure, we ask peer reviewers to additionally evaluate and score the other reviewers’ reports, and we’ll classify that as a chore. That label isn’t entirely justified, though: we’ve heard more than once that scholars actually enjoy having access to the other peer reviews of manuscripts they have reviewed themselves, out of curiosity or to learn something. Aren’t they evaluating those reviews when they read them? We’re simply proposing to give scholars who want to do this the tools to do it effectively. But fine, we’ll consider it a chore.

But what if we achieve our intended objectives? What if the average quality of peer reviews (and reviewers) goes up as a result? What if the average number of peer reviews needed per manuscript goes down, thanks to the instruments that can hold peer reviewers and authors alike more accountable for untimely or low-quality work? If you have to review a manuscript that has been peer reviewed before but hasn’t been revised, what if you can save time by having access to the previous reviews? What if, by reviewing well, you improve the odds of your own manuscripts being noticed, read, reviewed and cited (more on this later, or you can read about it in the working paper)? What about a more efficient allocation, on a global platform, of the available peer reviewers, peer reviews, authors and manuscripts? A more objective understanding of your peer review proficiency, and its impact, relative to other scholars? Open Access to scrutinized research literature? Would it all still be a tremendous waste of your time, or might the benefits actually be worth it? Focusing on the “chores” alone without weighing the potential benefits, both of which we’ve written about extensively, is not a very accurate way of evaluating a proposal, IMO.

There’s a reason that editors are paid to do their jobs — it’s because scientists don’t want to spend their time doing those things. Scientists are more interested in doing actual research.

And they can do that better when they don’t have to keep reviewing unrevised manuscripts that have already been peer reviewed, and when they have better access to scrutinized research literature.

Like the PubCred proposal, it fails to address the uneven availability of expertise, and assumes all reviewers are equally qualified.

Actually, the whole point of creating a peer review proficiency metric is to measure the differences in proficiency among scholars more objectively. By giving scholars the instruments to assess this systematically, I’d like to think we can obtain that kind of information. As for the first issue, the uneven availability of expertise, I’m not entirely sure what he means. Scholars aren’t punished for not peer reviewing. They can still submit their papers, and if those papers are interesting enough, surely some scholars will want to review them, which they can do without penalty.

Also like PubCred, the authors’ suggestions for paying for the system seem unrealistic. In this case, they’re suggesting a subscription model, which seems to argue against the very open access nature of the repositories themselves, limiting functionality and access to tools for those unwilling to pay.

The purpose of Open Access preprint repositories is to provide access to preprints, and that doesn’t change at all: everybody can still submit and access preprints in OA preprint repositories. What users might want to pay for is more advanced search instruments for peer-to-peer reviewed manuscripts (“postprints”). What we propose here can no longer be classified as an “open access repository.” It’s a peer-to-peer review model with its own database of peer reviews, and possibly its own database of revised papers if the repositories “providing” the manuscripts can’t accommodate them, that provides open access to scrutinized preprints (“postprints”). Surely paying for the scrutiny of manuscripts doesn’t go against the nature of scholarly communication.

The authors spend several pages going into fetishistic detail about every aspect of the measurement, but just as in the proposed Scientific Reputation Ranking program suggested in The Scientist, they fail to answer key questions:

Who cares? To whom will these metrics matter? What is being measured and why should that have an impact on the things that really matter to a career in science? Why would a funding agency or a hiring committee accept these metrics as meaningful?


If you’re hoping to provide a powerful incentive toward participation, you must offer some real world benefit, something meaningful toward career advancement.

And with this, my failure to come up with a good title for this blog post is exposed: it’s not just about peer review. As the title of our working paper suggests, it’s also about scholarly communication. Who cares? Scholars who want to read scrutinized research literature might. Authors who want to see their papers scrutinized might. People who care about Open Access might. Scholars who want to peer review properly might. Scholars who do peer review properly and want to be rewarded with better odds of having their own work noticed, read, reviewed and cited might. Scholars who care about a more efficient allocation of peer reviewers and their reviews might. You can argue against the validity of these “incentives,” but you can’t simply disregard them without consideration and tell people “there’s nothing for them to gain.” I find that a very incomplete way of evaluating proposals.

Look, there are plenty of legitimate concerns about our proposed model, and we spent quite a bit of time on how those concerns can be tackled. We could use advice on how to improve our proposed solutions, or even criticism of why they don’t work. What we don’t need is people completely ignoring our proposed solutions when they review our proposal. That doesn’t help us, and I don’t see how it can help you. And that’s all I have to say for now. Back to working on part 2.

  1. October 5, 2010 at 5:25 PM

    Hi Chao,

    I didn’t really have space to go into a detailed critique of your proposal in the Scholarly Kitchen post. The point there was to talk about the futility of social reputation scoring systems as motivators for participation in a professional market. That’s where the question “who cares” is an important one. No one questions the desire to improve the peer review process, improve science publishing, improve communication and speed progress. If your system accomplishes those sorts of things, then isn’t that motivation enough? Adding a scoring system does not seem to be a big motivator, particularly, as noted in the blog, because it’s irrelevant outside of your system. If I create an awesome new system to get the plant research community to count the number of leaves on trees, it’s unlikely they’ll be motivated to join in if I include a “Plant Factor” system that gives them higher scores for the more leaves they count and the diversity of trees sampled. What are they supposed to do with that score? If I’m up for tenure and I haven’t published any papers or secured any grants, will having a good Reviewer Impact score make any difference to my institution? If I’m a grant officer for the American Heart Association, I’m looking to fund researchers who can come up with results that will help cure disease, not researchers who are good at interpreting the work of other researchers or who are popular in the community. Why would I care about a grant applicant’s Reviewer Impact score?

    For any system to be adopted, it has to have clear utility and superiority on its own. An artificial ranking system does not add any motivation for participation. The one benefit offered by your Reviewer Impact score is more visibility for one’s own papers. That seems to be the opposite of what you’d want out of any paper filtering system. You want to highlight the best papers, the most meaningful results, not the papers from the best reviewers. If a scientist does spectacular work but is a bad reviewer, that work will be buried by your system in favor of mediocre work by a good reviewer.

    That said, I’m happy to expand my comments on your proposal. As a working paper, it deserves scrutiny and hopefully constructive criticism to improve the proposal.

    I call the automated selection program “magical” because it does not exist, and I don’t think it’s technologically capable of existing, at least if it’s expected to perform as well as the current editor-driven system. Your conflict of interest prevention system relies entirely on reviewers being completely fair and honest. One of the common complaints about the current system is that reviewers with conflicts deliberately delay, or spike a qualified publication. If those reviewers are so unethical that they’re willing to accept a review request from an editor, despite knowing their conflicts, why do you think they’d recuse themselves in your system? Isn’t landing a big grant or being the first to publish a big result going to be more important to them than scoring higher on an artificial metric? Note that authors can already request that certain reviewers be excluded in the current system, and yet conflicts of interest still happen.

    But there’s much more to selecting good reviewers than just avoiding conflicts of interest. Your system relies on reviewers accurately portraying their own level of expertise and accurately selecting only papers that they are qualified to review. One of the other big complaints about the current system is when reviewers don’t have the correct understanding of a field or a technique to do a fair review. A skilled editor finds the right reviewers for a paper, not just random people who are in the same field. When an editor fails to do their job properly, you get unqualified reviewers. In your system, this would be massively multiplied as there’s a seemingly random selection of who would be invited to review once you get into a particular field. Your system seems to have a mechanism built in where a reviewer can only reject a limited number of peer review offers. After that, he must peer review manuscripts to remain part of the system. That puts pressure on reviewers to accept papers where they may not be qualified.

    Expertise is not democratically distributed. You want papers reviewed by the most qualified reviewers possible, not just someone who saw the title and abstract and thought it might be interesting or because they ran out of rejection opportunities allowed by the system.

    Citation: you can imagine other factors being added in, but from my recall, citation was the factor that was mentioned over and over again as being used to score a reviewer’s performance. I agree that citation is an incredibly important metric, but it’s a flawed one as well. It’s impossible to separate out a citation in a subsequent paper that lauds an earlier discovery versus one that proves it to be untrue. Fraudulent and incorrect papers get cited lots. Citation is a very slow metric as well, as you note in your proposal. If Reviewer Impact is indeed important to a career, it may not fit into the necessary timeline for someone up for tenure or funding. And citation is certainly an area where one could game the system, by deliberately citing all of the papers one reviewed. If the Reviewer Impact score is somehow decided to be important, you could choose papers relevant to your own work, give them a good review then cite them, thus pumping up your own score for judging science.

    Another example of a place for gaming is in reviewing the other reviewer on your paper. As I understand it, there are a certain number of “reviewer credits” given for each peer review session. If those are divided among the paper’s reviewers based on their performance, isn’t there an advantage in always ranking the other reviewer poorly so you garner more of the credits?

    Delays: one month is a lot longer than my current employer gives reviewers (2 weeks). Furthermore, as you note, the time limit can be changed by the reviewers. Let’s say they both want to sit on a competitor’s paper, so they move the time limit out to 6 months. Problem solved. And again, the punishment for such a delay would be a bad score in a metric that has less effect on one’s career than getting scooped would.

    Additional Reviews: again, punishment based on a meaningless metric is the penalty. But one should also consider the double-edged sword of a system where reviews are permanently attached to a paper. If a paper gets a bad reviewer who unfairly trashes it, should that paper be permanently tarnished by having that review read by every subsequent reviewer? Wouldn’t it be better if they gave the paper a fair chance, a blank slate? Clearly I’m not alone in thinking this, as the uptake levels for systems like the Neuroscience Peer Review Consortium are microscopic (1-2% of authors). And this science blogger puts it well:

    “Personally, I can say unequivocally that after I’ve had a paper rejected by a journal the last thing I want is to have to show the next journal to which I submit my manuscript the crappy reviews that I got the first time around. Why on earth would anyone want that? I want a fresh start; that’s why I resubmit the manuscript in the first place! Peer reviews from a journal that rejected my manuscript are not baggage I want to keep attached to the new manuscript as I submit it to another journal.”

    Additional work: there’s a huge difference between reading through the other reviewer comments on a paper and in writing up and doing a formal review of the quality of their work. If one is to take such a task seriously, then it’s a timesink. There seems to be a whole raft of negotiations involved and extra duties, extra rounds of review. The proposal itself is highly complicated, filled with all sorts of if/then sorts of contingencies. You’ve certainly put a lot of thought into it, but it’s way too complicated, too hard to explain to the participants. The ideal improvement to the system would be a streamlining, not the addition of more tasks, more negotiation, more hoops through which one must jump. Time is the most valuable commodity that most scientists are having to ration. Saving time and effort should be a major focus of any improved system.

    I do think you have some interesting ideas here, and I look forward to seeing future iterations.

  1. October 6, 2010 at 2:44 PM
  2. October 9, 2010 at 9:26 PM
  3. October 19, 2010 at 5:56 PM
