Since their mass introduction in late 2022, AI chatbots based on Large Language Models (LLMs), such as ChatGPT and its later versions running on GPT-4, have continued to attract press attention. These systems can generate texts, CVs, translations and audio transcripts. Their writing is reportedly so advanced that they can produce abstracts coherent enough that even experts have been unable to tell they were written by a machine. Their proposed applications are diverse; it has been suggested, for example, that they could be used to predict the early stages of Alzheimer’s. However, many have insisted that these systems lack any genuine capacity to understand the texts they process (“reading” or “writing”), which is why they have been characterized as “stochastic parrots”. Other problems have also been highlighted, such as the lack of transparency about training data, privacy, biases, and the so-called “hallucinations” and falsehoods they produce.
While the interest is real, it cannot yet be stated with certainty that this technology has been widely adopted in formal work processes, nor that its use has spread beyond experimentation, leisure or the satisfaction of curiosity. This is undoubtedly a matter that serious empirical studies need to clarify.
AI chatbots as assistants
This article will leave aside the technical aspects and focus on some epistemological and, more broadly, philosophical aspects of automating reading and writing in scientific research. Given the scope of this article, and with no claim to being exhaustive, we will limit ourselves to a rather specific topic: the use of LLMs to automate the bibliographical research and literature review that usually precede any academic research.
Let us take as an example Elicit, one of these tools, which seeks to optimize the workflow of academic research. The company that develops it describes it on its website as follows:
“Elicit is a research assistant using language models like GPT-3 to automate parts of researchers’ workflows. Currently, the main workflow in Elicit is Literature Review. If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table.”
According to Elicit’s creators, the system takes a user’s question, finds the 400 articles most “semantically” related to it, ranks them, and returns key information from the eight most relevant ones (the outcome measured, the intervention and the sample size).
In every academic and scientific-technical field, bibliographical research into the state of the art usually precedes any research. It is an instrument we use to find out what has been said about a topic, what other views or theories exist, what gaps remain to be filled, and so on; in short, it serves to lay the foundations of our own research.
But is bibliographical research a mere instrument that we could optimize using a tool like Elicit?
To answer this question, we must first address a more general, preliminary question about the way science is practised, to which we now turn.
Intensive science and extensive science
To tackle the question, and we hope without oversimplifying the matter, we can draw a parallel with agriculture and the cattle industry and distinguish between “intensive” and “extensive” science.(1)
Intensive science is science that is “successful” in terms of quantitative results, understood as the mass publication of papers and the maximization of every score valued by research quality assessment agencies. Intensive practices enable researchers to survive and be promoted. It is also a model of vertical specialization and, to continue the agricultural parallel, one of single-crop farming. In short, intensive science seeks the greatest possible yield from the time and other resources available, maximizing tangible benefits at the expense of less tangible or outright intangible ones.
The practice of intensive science is common in the field of AI, where, for example, there are myriad articles on systems presented as able to “detect emotions”, even though such systems cannot detect emotions in any strict sense. However, to comply with the imperatives of the productive machinery of intensive technoscience,(2) the enormous complexity of human emotions is reduced to whatever an AI system can measure, even at the cost of establishing false universal categories and removing from the algorithmic model all references to the body, context and culture.
So, if a researcher’s main interest is “producing” an article that will be cited and improve indicators of academic “quality”, even if it adds little intrinsic value, the preceding bibliographical research will probably be nothing more than an instrument rather than an end in itself. The aim will be to compile and catalogue articles, reading at most their keywords, titles and abstracts, and thus to increase productivity.
Can this be automated?
Yes. In this sense, tools like Elicit, setting aside some technical shortcomings that may be remedied in the future, can be useful for this type of intensive science, which is reductionist, oriented towards obtaining solutions, and whose principal purpose is to maximize the production of articles.
When we tested Elicit (see figure 1), the software returned as evidence articles from computer science journals but offered no results relating to theories of emotion from psychology, which would immediately expose the reductionism of AI-driven emotion recognition systems as outdated at best. However, this does not matter in the intensive science model, where research units specialize in the monoculture of articles and where interdisciplinarity may threaten the very meaning of their task.(3)
On the other hand, if we are guided by extensive science, if we are truly interested in detecting emotions and wish to understand the complexity of the matter in depth, beginning with the multidimensionality of the concept itself rather than settling for banal simplifications, tools such as Elicit will be far less useful. At best, Elicit will provide partial help with a broader and more far-reaching task.
This gives us a different answer to the question posed above. Within the frame of extensive science, the answer is no: bibliographical research is not a mere instrument that can or should be automated. Aristotle said that the practice of medicine did not consist of cutting or not cutting, or of prescribing a remedy, but of doing so in a certain way. We can say the same about bibliographical research: it is not merely about obtaining a result (pages of text with references to other articles), but about obtaining that result in a certain way.
In short, we have two preliminary answers: for intensive science, bibliographical research can indeed be automated, while for extensive science it can be automated only partially. This point warrants closer examination, which we attempt below.
Intrinsic ends and instrumental ends
Philosophy tends to distinguish between two types of values (or ends): intrinsic and instrumental. Intrinsic refers to everything that has value in itself, such as friendship, health, fun and justice. Instrumental refers to things whose value depends on their relation to something else that is valuable, because they help us obtain or preserve it. Instrumental values are important too.
For example, a drill has no intrinsic value, but it is useful for making holes in a wall so that we can hang pictures and enjoy them. On the other hand, the aesthetic pleasure derived from contemplating said pictures does have intrinsic value. We would certainly find little meaning in the question of why we want to obtain aesthetic pleasure, given that it is something desirable in itself.
In some situations, a single thing can combine both types of value: that is, it can be useful for something and simultaneously have value in itself. Take, for example, friendship. A friend can help us to get a better job or move house. At the same time, friendship has value in itself, and the value of having friends does not depend on a friend being useful for something. If we compare the value of friendship with that of the aforementioned drill, the difference is obvious: a common drill has little to no value beyond helping us to make holes in walls or wood.
The case of mathematics can help illustrate this difference further. Mathematics is usually divided into pure and applied. The former includes areas such as algebra, geometry and mathematical analysis. People who work in pure mathematics say that it is of interest in se and is done per se: in itself and for itself, not as a means to an end. In other words, someone researching “finite groups” does so out of intrinsic interest, for example to abstract relevant properties, regardless of whether these discoveries prove useful for some later practical application. Naturally, a discovery or theory, however abstract, may well turn out to be useful for something. Nevertheless, the crux here is the way the study is approached: it does not depend on serving some later purpose. Applied mathematics, by contrast, is for something. In industrial mathematics, for example, the focus is on solving industrial problems, such as optimizing energy efficiency in buildings through numerical simulations.
This distinction between pure and applied mathematics is an idealization; in practice, mathematicians working in applied mathematics are also interested in doing mathematics as an end in itself.(4) Solving a problem involving a geothermal heating system may have a practical application, but the task of solving the problem, and the intellectual activity of handling complex ideas, making abstractions and formalizing solutions, are worth doing in themselves and give meaning to mathematics as a profession.
The intrinsic value of bibliographical research
Now that we have clarified the difference between intrinsic and instrumental ends, let us return to bibliographical research. Conducting bibliographical research goes beyond the instrumental dimension because it helps us acquire and enrich a vocabulary with which to develop our own ideas, thus increasing our capacity for critical thinking and creativity, which have intrinsic value. Above all, good bibliographical research places our work within an ongoing dialogue with our peers and with those who preceded us. It also enables us to find role models and position ourselves within an academic current, which can and does affect our identity and the values and beliefs we adopt as individuals. In other words, while some aspects of bibliographical research are instrumental (that is, useful for something), many others matter independently of any application or immediate practical utility and are intrinsic to the practice of science in its extensive form.
Just as applied mathematics is never solely instrumental, something as apparently modest as bibliographical research turns out to be just as deeply connected to both types of value: it is useful for something and it also has value in itself.
This raises a question: should we automate tasks that are connected to the deepest parts of our profession, that even give it meaning and help constitute it?
The answer does not depend on whether these research chatbots work well. In our experience, the current version of Elicit does not offer better results than Google Scholar, but this is not the central issue, because the quality of these results may improve in the future. The important question here concerns academic-scientific practice itself. What happens to its intrinsic values and ends when it is automated? What happens to the meaning of these human practices when instrumental values displace and erode mindsets, norms and activities that have intrinsic value? Let us remember that, when conducting bibliographical research, researchers learn and shape their thinking, question themselves, and engage in a dialogue of agreement and dispute with those who preceded them, consolidating existing skills and developing new ones.
We can go a step further. What happens when the intrinsic value obtained from bibliographical research is replaced by an answer from a kind of oracle rather than from those who make up the practice? Let us imagine the best-case scenario: an answer without errors or “hallucinations”, compiled from current and reputable sources. Suppose, then, that these systems can at best provide the eight most relevant articles. Relevant for whom? Does it make sense to talk about relevance as a neutral, universal notion? If we separate researchers from this search process, from the dialogue with the various traditions that takes place during bibliographical research, what conceptual tools remain for developing a notion of relevance? How can we talk about relevance without a subject who imprints meaning and commitment on this stage of research?
Bibliographical research is part of the process of building a conceptual and theoretical scaffold for thinking about the topics we study. These scaffolds, in turn, are not islands; they are connected to those of the many researchers, past and present, with whom we directly and indirectly share academic-scientific practices and to whom we relate through vocabularies, traditions, methods, consensus and disputes. A crucial part of bibliographical research is precisely deciding which sources to include or discard. On what criteria and evaluations do we base that decision? This task is key to constructing a hypothesis or theory, and it cannot be automated by a chatbot operating on the basis of statistical regularities. Delegating it to a machine is an enormous loss for research itself, because it empties of meaning the task of judging a source’s relevance and deciding whether to include it.
Should we, then, not automate anything? Is it impossible to optimize the process and make it more efficient? Of course, we can automate and optimize. We automate things such as the generation of a list of references using a bibliographical manager such as Zotero, and we delegate grammatical checking to the word processor. But for each activity that we automate, some things are lost and others gained; in these cases, we lose certain skills. Manually preparing a list in APA format or correcting grammar requires more knowledge than using Zotero or Word. However, we can defend these automations because they optimize the process and increase efficiency without fundamentally eroding the intrinsic aspects of the extensive practice of science.
This essay is not a tirade against automation, but a reflection on which activities and tasks can be automated and which should not be delegated to machines. Does it make sense to automate the generation of hypotheses, literature review, the design of experiments or the discussion of results? Can we automate certain parts of these processes? Which parts, and to what extent?
To respond adequately, we must be clear on what we lose through automation and what we gain in return. Are there skills and activities that it is worth leaving aside in exchange for something more valuable? Or, on the contrary, do we sacrifice intrinsically valuable things in exchange for the instrumental value of efficiency?
The answers to these questions cannot be left in the hands of engineers with simplistic, “solutionist” views of scientific practices, views that pare those practices down to a minimum so they can be forced through the input and output ports of an algorithmic system. Nor can they be left in the hands of academic managers interested only in “quality indicators”. These answers must come mainly from the people who care about the intrinsic value of scientific-academic practices, not from people guided by merely instrumental values. We are well aware that all of this also entails radically questioning how we communicate and evaluate scientific output.
To conclude, let us return to agriculture and the cattle industry. Fertilizers and milking robots are not mere tools but technological devices that shape a particular kind of practice. Technologies are not neutral instruments; they are ways of putting certain visions into practice while excluding others. Raising calves in cages is at odds with extensive grazing, and single-crop farming is opposed to sustainable, resilient agriculture. Something similar may happen with AI systems such as ChatGPT and Elicit. Their design and functionality seem to fit the instrumental values of intensive science better than the intrinsic ends of extensive science. For this reason, thinking about the possible adoption and use of these systems is a good opportunity to think also about the direction we want our academic-scientific practices to take. Do we want a science of texts (partially) automated by machines, which other machines then process and summarize, and which no one reads? Do we want an intensive, instrumental science optimized for quantity and efficiency or, on the contrary, do we prefer to imagine and pursue an extensive science that seeks quality and depth and is not just an instrument for something but an end in itself?
The authors would like to thank Txetxu Ausin and César Astudillo for their valuable comments on a draft of this article.
1. Intensive agriculture seeks to maximize the yield and profit of a crop using, for example, industrial machinery, fertilizers or abundant irrigation, which has a greater environmental impact. Extensive agriculture, on the other hand, is more respectful of the environment and seeks a more sustainable use of the land. Something similar occurs in the cattle industry, with the added and important matter of animal welfare. This aspect is ignored in intensive livestock farming, which applies manufacturing processes to living beings that are born in industrial sheds and raised on compound feed, while extensive livestock farming tends to seek better conditions through care for habitats, species and ecosystems, for example through seasonal free grazing.
2. Understood as a hyper-technologized and accelerated scientific practice in which technology is no longer solely the result of the application of scientific knowledge but rather transforms scientific practice itself at every level, from processes to values.
3. For reasons of focus and space, we will leave aside for now the epistemological-ontological discussion of the nature of this intensive scientific knowledge. For the discussion in hand, we will assume that intensive science does produce knowledge in the same way that intensive agriculture and cattle-keeping do produce meat, eggs and vegetables.
Recommended citation: GUERSENZVAIG, Ariel; SÁNCHEZ MONEDERO, Javier. Nobody writing and nobody reading: artificial intelligence chatbots and the science we want. Mosaic [online], June 2023, no. 199. ISSN: 1696-3296. DOI: https://doi.org/10.7238/m.n199.2309