With the rapid pace of scientific progress and the large number of students choosing STEM careers, the number of scientific papers produced each year has skyrocketed. Each published paper presents its own view of a problem, along with detailed research and hypotheses. Reviewing this flood of submissions has become a daunting task for the scientific community, and every year many scientists lament having their work rejected on the basis of "half-hearted" or "incomplete" reviews.
A group of scientists – Weizhe Yuan, Pengfei Liu, and Graham Neubig of Carnegie Mellon University in Pittsburgh, Pennsylvania – came up with the ambitious idea of using artificial intelligence and machine learning to automate the reviewing of submitted scientific papers. Their model reads each submitted paper and generates a review: an overview of what the paper is about and a brief summary of its contents. The model can also assess papers along dimensions such as credibility and completeness.
The Carnegie Mellon team began by establishing a few yardsticks. They studied a variety of reviews from major machine learning conferences such as ICML, NeurIPS, and ICLR and distilled the characteristics of a well-written review, arriving at the following standards:
- Decisiveness: A review should take a clear stance on the paper and clearly state the basis for its recommendation.
- Comprehensiveness: A review should be detailed, well organized, and begin with a summary of the paper and its contributions to the community.
- Justification: A review should present legitimate evidence and arguments that fully support its assessment.
- Accuracy: Any factual statement made in a review must be correct.
- Kindness: A review should be written in polite, accessible language that is easy to read.
After setting these standards, the team collected a dataset called ASAP-Review (Aspect-enhanced Peer Review), built from machine learning papers and their reviews submitted to ICLR and NeurIPS between 2016 and 2020. With the dataset in place, the team framed review generation as aspect-based summarization: the system summarizes a scientific paper along a fixed set of aspects.
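To make the idea of aspect annotations concrete, a single aspect-enhanced review record might be modeled roughly as below. This is an illustrative sketch only; the field names and helper function are hypothetical and do not reflect the actual ASAP-Review schema.

```python
# Illustrative sketch of one aspect-annotated peer-review record,
# loosely modeled on the ASAP-Review idea. Field names are hypothetical,
# not the dataset's actual schema.

def make_review_record(paper_id, review_text, aspect_spans):
    """Bundle a review with its aspect annotations.

    aspect_spans: list of (start, end, aspect, sentiment) tuples marking
    which character span of the review discusses which aspect.
    """
    for start, end, aspect, sentiment in aspect_spans:
        assert 0 <= start < end <= len(review_text)  # spans must be in range
    return {
        "paper_id": paper_id,
        "review": review_text,
        "aspects": [
            {"span": review_text[s:e], "aspect": a, "sentiment": sent}
            for s, e, a, sent in aspect_spans
        ],
    }

record = make_review_record(
    "iclr2020_0001",
    "The method is novel. However, comparisons to prior work are missing.",
    [(0, 20, "originality", "positive"),
     (21, 68, "meaningful_comparison", "negative")],
)
print(record["aspects"][0]["aspect"])  # originality
```

Annotations of this shape let an evaluation script check not only whether a generated review reads well, but which aspects it covers and with what sentiment.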
Following the review guidelines of the Association for Computational Linguistics (ACL), the team identified eight aspects under which papers are reviewed. These aspect annotations are fed into the system to enable better and more fine-grained evaluation. The eight aspects are as follows:
- Summary
- Motivation/Impact
- Originality
- Soundness/Correctness
- Substance
- Replicability
- Meaningful comparison
- Clarity
With the standards and evaluation aspects established, the team used a pre-trained sequence-to-sequence model called BART to generate the reviews. To identify potential biases and discrepancies in the generated reviews, the researchers also defined a basic aspect score, which measures how often each desired aspect is mentioned positively in a review.
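The aspect-score idea can be sketched as a simple coverage metric: count what fraction of the target aspects a generated review actually touches on. The snippet below is a minimal, simplified reading of such a metric, not the authors' exact formula.

```python
# Minimal sketch of an aspect coverage score: the fraction of target
# aspects that an aspect-tagged review actually touches on. This is a
# simplified illustration, not the authors' exact metric.

TARGET_ASPECTS = {
    "summary", "motivation", "originality", "soundness",
    "substance", "replicability", "meaningful_comparison", "clarity",
}

def aspect_coverage(tagged_aspects):
    """tagged_aspects: set of aspect labels found in a generated review."""
    covered = TARGET_ASPECTS & set(tagged_aspects)
    return len(covered) / len(TARGET_ASPECTS)

# "novelty" is not one of the eight target labels, so only 3 of 8 count.
score = aspect_coverage({"summary", "clarity", "soundness", "novelty"})
print(score)  # 0.375
```

A score like this makes it easy to flag generated reviews that, for example, summarize a paper well but never address soundness or replicability.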
Once the system was set up, the Carnegie Mellon team submitted their own paper for review through the automated process, and the model generated the following extract:
"This paper presents an approach for evaluating the quality of reviews generated by an automatic scientific paper review system. The authors build a review dataset called ASAP-Review from the machine learning domain and create fine-grained aspect annotations for each review, which make it possible to evaluate generated reviews more comprehensively. They train a summarization model to generate reviews from academic papers and evaluate the results using the evaluation metrics described above."
The team concluded that the generated reviews are relatively comprehensive and can summarize a paper's most important ideas, although in their current state they cannot fully replace manual reviews. The generated reviews sometimes make false claims; despite this deficiency, they also reference important statements from the paper, making it easier for a human reviewer to spot critical information at a glance.
The model's shortcomings, however, are significant. The team itself has admitted that analyzing the merits and intricacies of scientific contributions is complex, and that an automated review system is nowhere near as reliable as a human reviewer. Even so, the system can help reviewers considerably in sifting through the many submissions they receive, and the authors therefore suggest that it could already serve as an assistive tool in a machine-supported review process. The researchers are confident that the tools, metrics, and models presented in their paper will go a long way toward automating the review process.
Source: Weizhe Yuan, Pengfei Liu, Graham Neubig, "Can We Automate Scientific Reviewing?", arXiv preprint arXiv:2102.00176 (2021).