Large language models (LLMs) and scientific writing: what is and is not accepted

Clear communication is one of the essential qualities of a successful research scientist. Disseminating research findings through clear, simple writing is a skill every scientist should have. Since the arrival of artificial intelligence (AI) based writing assistants like ChatGPT, scientific writing has changed forever. A computer program can now produce articulate and engaging content on almost any subject in seconds. These large language model (LLM) based generative AI programs offer great opportunities but, at the same time, pose challenges for scientific publishers and grant funding bodies. Here, I discuss the opportunities these programs present, their ethical implications, and their accepted usage.

Use of Large Language Models in Scientific Communication

The power of LLMs lies in their training on vast amounts of diverse data from the internet, books, scientific literature, and scientific blog articles. This extensive exposure gives them an impressive command of human language, enabling them to engage in fluid conversation, summarise lengthy texts, and draft original compositions. When applied to scientific writing, such capability holds tremendous potential for streamlining the difficult process of translating dense research findings into clear, concise, and engaging prose. The initial response to generative AI in scientific and academic writing was negative, and its usage was frowned upon; nearly all publishing houses banned LLM-generated text from their journals. When the dust settled, however, it was clear that these models were here to stay and would only become more powerful. Given the advantages of LLM usage, it is time to make rules and publish policies on how LLMs can be used rather than banning them altogether.

Writing assistants backed by LLMs

To a non-native English speaker like me, Natural Language Processing (NLP) based tools like Grammarly, Ginger, and ProWritingAid help the writing process by suggesting suitable words or phrases and maintaining correct sentence structure. These programs have been trained to generate contextually appropriate language suggestions in real time. Scientists and researchers often grapple with tight deadlines to produce high-quality manuscripts for publication, and in such situations the time-saving aspect of a writing assistant is precious. By reducing the effort spent deliberating over word choice, researchers can channel their energy towards refining their theoretical arguments and conveying their findings more effectively.

Grammar can also be an Achilles' heel for even the most proficient scientists, especially when their native language differs from the language of their academic discourse. AI writing tools, however, come armed with an understanding of grammatical rules derived from their extensive training data. They can identify grammatical errors and suggest amendments in real time. This automated proofreading saves time and ensures a more polished final product that adheres to the rigorous standards demanded by peer-reviewed journals.
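As a concrete illustration, the open-source language_tool_python package can perform this kind of rule-based checking (it is one option among many, not the engine behind commercial assistants like Grammarly). A minimal sketch, assuming the package and its LanguageTool backend are installed, with an invented draft sentence:

```python
import language_tool_python

# Start a local LanguageTool checker for British English.
tool = language_tool_python.LanguageTool("en-GB")

draft = "The datas was analysed with an custom script."

# Each match describes one suspected error and proposed fixes.
for match in tool.check(draft):
    print(match.ruleId, "->", match.message, match.replacements[:3])

# Apply the top suggestion for every detected error at once.
print(tool.correct(draft))
tool.close()
```

Tools like this catch mechanical errors well, but stylistic judgement still rests with the author.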

Using a writing assistant and spelling and grammar checkers is encouraged, and nobody objects to it. Now, however, these tools are equipped with large language models and can "generate" a whole article from a simple instruction, known as a "prompt". Whether such generated text is acceptable in manuscripts is now a subject of debate.

Apart from grammar checkers with generative AI capabilities, there are hundreds of dedicated AI text generators capable of creating content on any given topic. They can outline a funding project, write a manuscript, and suggest workable hypotheses. However, this content is superficial and lacks subject depth and references. Still, such content is entering the scientific literature.

Publishers' policies on AI-generated content

When relying on LLMs for scientific writing, authors must remember that the output of these models primarily draws on patterns learned from existing texts. As a result, LLMs tend to generate content that may be plagiarised, only partially accurate, or lacking in novel insight, all qualities of paramount importance in scientific communication. Human intervention thus remains indispensable for verifying facts, ensuring authenticity, and injecting fresh perspectives.

Furthermore, LLMs may not fully grasp nuanced concepts central to scientific understanding—a product of their training on generalised datasets rather than specialised ones. For instance, a model might produce an otherwise grammatically sound sentence that fails to convey an accurate scientific concept due to its lack of domain knowledge. Scientists must, therefore, validate LLM-generated content through rigorous scrutiny before integrating it into their work.

Initially, publishing houses were united against the use of AI-generated text in manuscripts and scientific writing. However, this did not stop scientists from using it. Major publishing bodies now accept AI-generated text as long as it is acknowledged and used within the accepted legal and ethical framework. As per Elsevier's policies, AI-generated text should not be used for medical insights, scientific conclusions, or clinical recommendations; it can, however, be used to improve readability and correct language. Other publishing houses echo similar guidelines in their policies.

A recent paper in BMJ by Ganjavi et al. compared publishers' instructions to authors and reported where they diverge and where they agree. I feel that instead of many confusing guidelines, the scientific community should agree on a standard set of rules that everyone can use globally.

List of LLMs trained for research and scientific writing

Due to inherent limitations, general-purpose foundation models (like GPT-4, Claude, Gemini, Mixtral, etc.) may not be the best choice for scientific research writing. Although they undoubtedly have impressive language generation capabilities, they lack specialised knowledge of specific domains, including intricate scientific concepts and advanced mathematical equations.

However, a few models are trained explicitly on scientific text and demonstrate an advantage by integrating domain-specific knowledge. One example is SciBERT, a BERT-based model trained on papers from the semanticscholar.org corpus: 1.14M papers comprising 3.1B tokens. It was trained on the full text of the papers, not just the abstracts, making it well-equipped to handle complex scientific terminology. On scientific NLP benchmarks such as named entity recognition and text classification, it outperforms the general-purpose BERT model it is derived from.

Another promising model for scientific writing is BioBERT, created by researchers at Korea University for biomedical text mining. Trained on biomedical corpora (PubMed abstracts and PMC full-text articles), BioBERT has extensive knowledge of biomedical concepts and terminology, making it an ideal tool for researchers in the life sciences and biomedical domains.
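Both checkpoints are published on the Hugging Face Hub. Note that they are BERT-style encoders, so they produce embeddings of scientific text rather than free-flowing prose. A minimal sketch of loading them with the transformers library; the example sentence and the mean-pooling step are my own illustrative choices:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint names as published on the Hugging Face Hub.
checkpoints = {
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "BioBERT": "dmis-lab/biobert-v1.1",
}

# An invented biomedical sentence used purely for illustration.
sentence = "BRCA1 mutations increase the risk of hereditary breast cancer."

for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool the token embeddings into a single sentence vector.
    embedding = outputs.last_hidden_state.mean(dim=1)
    print(name, tuple(embedding.shape))  # e.g. (1, 768)
```

Because they are encoders, these models are best suited to tasks like search, classification, and similarity scoring rather than drafting prose.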

I know these models are a bit outdated in the current AI research field. However, new models built on state-of-the-art LLMs and adapted for specific purposes appear every few weeks. There may now be better choices for scientific research writing, either fine-tuned on domain-specific data or equipped with retrieval-augmented generation (RAG) to produce fresh, focused text. Scientists and university students engaged in scientific pursuits should consider these tailored alternatives to support their writing.
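To make the RAG idea concrete, the sketch below retrieves the most relevant passage from a tiny corpus and prepends it to a prompt. It assumes the sentence-transformers library; the mini-corpus, query, and prompt template are invented for illustration, and the final call to a generative model is deliberately left out because it depends on the LLM chosen:

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in corpus; a real pipeline would index a database of papers.
corpus = [
    "SciBERT is a BERT model pretrained on 1.14M Semantic Scholar papers.",
    "BioBERT adapts BERT to biomedical text using PubMed and PMC corpora.",
    "Retrieval-augmented generation grounds LLM output in retrieved documents.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query = "How can LLM output be grounded in the scientific literature?"
query_emb = model.encode(query, convert_to_tensor=True)

# Retrieve the passage most similar to the query by cosine similarity.
scores = util.cos_sim(query_emb, corpus_emb)[0]
best = corpus[int(scores.argmax())]

# The retrieved passage is prepended to the prompt; the actual call to a
# generative model is omitted here.
prompt = f"Context: {best}\n\nUsing only the context above, answer: {query}"
print(prompt)
```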

Use of Large Language Models and Ethical Considerations

The primary concerns when using LLMs are plagiarism, transparency, and potential bias baked into the training data.

To begin with, plagiarism remains a substantial issue in academic circles. Because LLMs generate text based on patterns learned from vast amounts of data, there is a real risk of reproducing content similar to existing publications. The likelihood of verbatim plagiarism is relatively low, and with light paraphrasing, AI-generated text easily passes standard plagiarism checks. Users must nonetheless remain vigilant to avoid unintentional paraphrasing or drawing too heavily on LLM-generated text. Employing plagiarism detection tools, citing LLM inputs diligently, and fostering a culture of originality in academic writing can mitigate such risks.
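As a toy illustration of what plagiarism detectors look for, the sketch below measures how many word n-grams a draft shares with a published source. Real detectors are far more sophisticated; the texts and the n-gram length here are invented for the example:

```python
def ngrams(text: str, n: int = 5) -> set:
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(candidate: str, source: str, n: int = 5) -> float:
    """Fraction of the candidate's n-grams that also occur in the source."""
    cand = ngrams(candidate, n)
    return len(cand & ngrams(source, n)) / len(cand) if cand else 0.0

# Invented texts for the example; a high score flags possible reuse.
draft = "Large language models generate text from patterns in training data."
paper = "It is known that large language models generate text from patterns in training data."
print(f"5-gram overlap: {overlap_score(draft, paper):.2f}")
```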

The use of LLMs also calls for greater transparency, another vital aspect of ethical practice in scientific writing. Academic integrity requires researchers to disclose the extent of AI involvement in their work, enabling readers to understand the limitations and potential biases of the generated text. Clear communication about LLM use during the manuscript submission process, through acknowledgements or specific statements in the manuscript (for example, "ChatGPT was used to improve readability; the authors reviewed and verified all content"), establishes this transparency. Furthermore, researchers should be ready to explain their LLM usage in response to challenges from reviewers or editorial boards.

Biases present in LLM outputs are inevitable due to the inherent biases in the underlying training data. These biases could manifest in various ways, such as gender, racial, or ideological prejudices that may unwittingly creep into scientific writing. Recognising these pitfalls and proactively assessing LLM outputs before integrating them into academic work is essential. Regularly updating LLMs with diverse datasets can help minimise such biases over time. Moreover, researchers should challenge LLM outputs critically and validate their accuracy by cross-referencing with existing literature or consulting experts in their respective fields.
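One simple way to probe for such bias is a template test: hold a sentence fixed, vary only a demographic cue, and compare the model's completions. The sketch below uses a small masked language model as an inspectable stand-in for a full LLM; the template is my own:

```python
from transformers import pipeline

# Fill-mask probe: hold the sentence fixed and vary only the pronoun.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pronoun in ("he", "she"):
    results = fill(f"After finishing the degree, {pronoun} worked as a [MASK].")
    print(pronoun, "->", [r["token_str"] for r in results[:5]])
```

If the two completion lists diverge along stereotypical lines, that is a cue to scrutinise the model's output more carefully before using it.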

Future Prospects of LLMs in Scientific Writing

In my view, the future prospects of LLMs in scientific writing hold immense potential for evolution and refinement. As these models continue to advance with new scientific knowledge, we can anticipate several developments that will shape their role in crafting precise and domain-specific content within the scientific community.

I feel that enhanced domain understanding will be a significant milestone in LLMs' trajectory. Many LLMs currently struggle with the nuances inherent to specialised fields, resulting in less accurate output. However, researchers are working towards incorporating domain-specific knowledge into LLMs by giving them access to scientific data through RAG, fine-tuning, or pre-training on scientific corpora. This will eventually lead to LLMs that better comprehend complex concepts and produce scientifically sound writing across disciplines.

Machine learning algorithms are constantly being improved, enabling LLMs to recognise and adapt to patterns within scientific texts more effectively. As these models become adept at identifying contextual cues, citations, references, and even scientific notation, their generated content will exhibit enhanced accuracy. This progression will undoubtedly contribute to the credibility and acceptance of LLM-assisted writing among scientists and researchers.

I think collaboration between LLM developers and the scientific community will be crucial in shaping the future of these models' application in scientific writing. Close interaction will allow researchers to identify specific challenges and requirements of various scientific domains, which can be incorporated into the models' training process. This iterative approach will ensure LLMs continually refine their capabilities, providing increasingly reliable support to scientists in their communication endeavours.

Conclusion

Using AI tools, researchers can optimise their time, streamline drafting and editing processes, reduce errors, and ultimately improve the readability and clarity of their work. Also, scientists may foster more effective communication with their peers while engaging broader audiences, ultimately expediting knowledge dissemination and accelerating scientific progress.

However, it is crucial to acknowledge the current limitations of LLMs in the scientific domain. These models require proper training on domain-specific data to perform optimally, and the quality and extent of available training material may be insufficient to maintain their generalisation capabilities. Furthermore, researchers must remain vigilant in validating LLM-generated content, especially in critical contexts such as conclusions, methodologies, or interpretations of results.

Embracing LLMs as complementary tools rather than substitutes for human effort is essential. Scientific writing demands critical thinking, originality, and an in-depth understanding of the underlying research, attributes that LLMs have yet to replicate fully. By skillfully integrating LLMs into their workflow, researchers can harness the potential of these AI models to augment their writing while preserving the authenticity of their scientific endeavours.

References

  1. Almarie B, Teixeira PEP, Pacheco-Barrios K, Rossetti CA, Fregni F. Editorial - The Use of Large Language Models in Science: Opportunities and Challenges. Princ Pract Clin Res. 2023;9(1):1-4. doi:10.21801/ppcrj.2023.91.1.
  2. Seckel E, Stephens BY, Rodriguez F. Ten simple rules to leverage large language models for getting grants. PLoS Comput Biol. 2024;20(3):e1011863. doi:10.1371/journal.pcbi.1011863.
  3. Birhane A, Kasirzadeh A, Leslie D, et al. Science in the age of large language models. Nat Rev Phys. 2023;5:277-280. doi:10.1038/s42254-023-00581-4.
  4. Aydın Ö, Karaarslan E. OpenAI ChatGPT Generated Literature Review: Digital Twin in Healthcare. 2022 Dec 21.