arXiv: 2505.05298

Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design


This position paper advocates for redesigning Large Language Models as ‘reasonable parrots’ that integrate argumentation theory principles to foster critical thinking through multi-persona dialogues, challenging users with diverse perspectives rather than providing one-sided answers.

Large Language Model, Reasoning, Human-AI Interaction, Multimodality, AI Ethics

Elena Musi, Nadin Kokciyan, Khalid Al-Khatib, Davide Ceolin, Emmanuelle Dietz, Klara Gutekunst, Annette Hautli-Janisz, Cristian Manuel Santibañez Yañez, Jodi Schneider, Jonas Scholz, Cor Steging, Jacky Visser, Henning Wachsmuth

University of Liverpool, The University of Edinburgh, University of Groningen, Centrum Wiskunde & Informatica, Airbus, University of Kassel, University of Passau, Universidad de Concepción, University of Illinois Urbana-Champaign, University of Dundee, University of Hannover

Generated by grok-3

Background Problem

The paper addresses the limitations of current Large Language Models (LLMs), described as ‘stochastic parrots,’ which generate responses based on patterns in training data without true understanding or critical reasoning. This can lead to misleading or biased outputs, perpetuating fallacies like ‘ad populum’ where popularity is mistaken for truth. The key problem is that LLMs fail to support genuine argumentative processes, often providing one-sided or task-oriented responses rather than fostering critical thinking or deliberation in users. The authors aim to redesign conversational technology to enhance users’ argumentative skills, particularly in decision-making contexts across domains like medicine, finance, and human resources, where flawed reasoning can have significant societal consequences.

Method

The core idea is to transform LLMs into ‘reasonable parrots’ that embody principles of relevance (context-aware arguments), responsibility (evidence-based claims), and freedom (fostering open conversation) derived from argumentation theory. The proposed approach involves designing conversational technology to prioritize the argumentative process over the product, encouraging critical thinking through dialogical moves such as expressing doubts, rebutting arguments, and offering alternatives. A specific framework introduced is the ‘multi-parrot discussion,’ where four distinct personas—Socratic (challenging beliefs), Cynical (rebutting arguments), Eclectic (offering alternatives), and Aristotelian (critiquing reasoning)—interact with the user and each other to expose diverse perspectives and stimulate reflection. This concept is realized as a prototype by prompting existing models such as ChatGPT-4 Turbo, though the authors acknowledge that simple prompting may not fully achieve the desired outcomes.
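To make the multi-parrot idea more concrete, here is a minimal sketch of how the four personas could be elicited purely through prompting, using the OpenAI Python client. The persona instructions, prompt wording, and model name below are illustrative assumptions, not the authors’ actual prompts or implementation.

```python
# Minimal sketch: eliciting the four "parrot" personas via prompting.
# Persona wording and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONAS = {
    "Socratic":     "Question the user's underlying beliefs and assumptions.",
    "Cynical":      "Rebut the user's argument and point out its weakest link.",
    "Eclectic":     "Offer alternative options the user has not considered.",
    "Aristotelian": "Critique the quality of the user's reasoning itself.",
}

def multi_parrot_round(user_claim: str, model: str = "gpt-4-turbo") -> dict:
    """Collect one reply per persona to the same user claim."""
    replies = {}
    for name, role in PERSONAS.items():
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": f"You are the {name} parrot. {role} "
                            "Do not give a final answer; keep the user reflecting."},
                {"role": "user", "content": user_claim},
            ],
        )
        replies[name] = response.choices[0].message.content
    return replies

if __name__ == "__main__":
    for persona, reply in multi_parrot_round(
        "I need arguments to convince my parents to buy me a smartphone."
    ).items():
        print(f"[{persona}] {reply}\n")
```

Running each persona as a separate call keeps the voices distinct and lets them be surfaced to the user one at a time, which matches the dialogical, process-over-product framing described above.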

Experiment

The paper does not present a formal experimental setup or comprehensive results, as it is a position paper focused on conceptual advocacy rather than empirical validation. Instead, it provides anecdotal examples to illustrate the shortcomings of current LLMs, such as ChatGPT’s responses to queries about convincing parents to buy a smartphone, which lack critical engagement or consideration of context like the user’s age. The proposed multi-parrot discussion is demonstrated through a dialogue transcript using ChatGPT-4 Turbo, showing how different personas challenge the user’s reasoning and suggest alternatives. While these examples highlight potential benefits, there is no quantitative or qualitative analysis to confirm effectiveness, user experience, or scalability. The setup is reasonable for a conceptual proposal but lacks depth in testing across diverse scenarios or user groups, leaving questions about whether the approach matches practical expectations or could introduce new interaction challenges.

Further Thoughts

The concept of ‘reasonable parrots’ opens up fascinating avenues for rethinking human-AI interaction, particularly in educational contexts where critical thinking is paramount. However, I wonder if constantly challenging users might lead to frustration or disengagement, especially for those seeking quick answers rather than deliberation. This approach could be contrasted with recent works on reinforcement learning from human feedback (RLHF), where alignment with user intent often prioritizes satisfaction over critique—could a hybrid model balance these aspects? Additionally, integrating this framework with multi-agent systems, as hinted at in the paper, might draw inspiration from collaborative AI research where agents debate internally before presenting a unified response. Such a synergy could mitigate the risk of overwhelming users while still exposing them to diverse viewpoints. Finally, the ethical implications of designing AI to argue with users need deeper exploration—how do we ensure that these systems do not inadvertently manipulate or reinforce biases under the guise of ‘critical thinking’?
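As a purely speculative illustration of that ‘debate internally, answer once’ variant (not something proposed or evaluated in the paper), the four personas could be folded into a single system prompt so the user receives one synthesized reply rather than four competing voices. Again, the prompt wording and model name are assumptions.

```python
# Speculative sketch (not from the paper): collapse the four personas into one
# internal debate so the user sees a single, balanced reply instead of four voices.
from openai import OpenAI

client = OpenAI()

DEBATE_PROMPT = (
    "Privately role-play four critics of the user's claim: a Socratic voice that "
    "questions beliefs, a Cynical voice that rebuts, an Eclectic voice that offers "
    "alternatives, and an Aristotelian voice that critiques the reasoning. "
    "Then reply to the user once, summarising only the strongest doubts and "
    "alternatives, without revealing the internal debate."
)

def debate_then_answer(user_claim: str, model: str = "gpt-4-turbo") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DEBATE_PROMPT},
            {"role": "user", "content": user_claim},
        ],
    )
    return response.choices[0].message.content
```

Whether such a condensed reply preserves the reflective benefit of hearing each persona separately is exactly the kind of open question an empirical follow-up would need to test.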


