LLMs as Research Partners in the Humanities
- Başak Pırıl Gökayaz

- Sep 3
Large language models aren’t oracles—and they’re not replacements for scholars. But used deliberately, they can act like junior research partners: fast at surfacing patterns, proposing alternative framings, tidying messy corpora, or sketching argument maps—yet prone to fabrication, bias, and cultural flattening if unsupervised. Treating prompts as methods, pinning model versions, and verifying every quote or claim against primary sources turns ad-hoc tinkering into an auditable practice.
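What that auditable practice can look like in code is sketched below: every model call appends a record of the pinned model identifier, the exact prompt, the sampling parameters, and the raw output to a log file that can be revisited later. The function, file name, and model string are illustrative assumptions, not a prescribed tool.

```python
import hashlib
import json
import time
from pathlib import Path

LOG_PATH = Path("llm_audit_log.jsonl")  # illustrative file name

def log_llm_call(model: str, prompt: str, params: dict, output: str) -> None:
    """Append one auditable record per model call: when, which model, which settings, what was said."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model": model,                      # pin an exact version string, not "latest"
        "params": params,                    # temperature, max tokens, etc.
        "prompt": prompt,
        "output": output,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage with a placeholder response; in practice the output would come from a real client call.
prompt = "Summarize the recurring themes in the attached diary excerpt."
output = "(model response would go here)"
log_llm_call("gpt-4-0613", prompt, {"temperature": 0.2}, output)
```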
This piece outlines where LLMs genuinely help across the humanities research cycle, where they fail, and the ethical guardrails—disclosure, reproducibility, and community-aware data practices—that keep meaning, method, and responsibility firmly in human hands.
What Are LLMs and Why Do Humanists Care?
Large language models (LLMs) like GPT‑4 and its successors are computer programs trained on vast collections of text. They can answer questions, generate summaries and translations, write code and poetry, and even converse in natural language. Because they are pre‑trained on data from many disciplines, they can easily switch between tasks. It is this general‑purpose capability that makes them attractive to scholars outside the sciences. The same model that can draft an email for a graduate student might also help a historian classify themes in diaries or support a linguist in translating dialects. Yet their broad applicability also raises questions about when and how they should be used in research.
The humanities have long relied on careful reading, interpretation, and contextualization. Early digital humanities projects focused on building corpora and performing computational analysis.
LLMs add a new dimension: they can generate prose, summarize complex arguments, and even offer interpretive suggestions. Institutions such as Cambridge University Press note that LLMs “are revolutionizing the landscape of humanities research” because they scale up data analytics and automate some qualitative tasks.
Workshops and training sessions at universities now highlight how these tools enhance data analysis, refine research methodologies, enable data‑driven decisions, unlock new data sources, and encourage interdisciplinary collaboration. To make the most of these opportunities, researchers must understand both the potential and the limitations of these models.
Opportunities: How LLMs Help Humanities Scholars

The opportunities fall into three broad areas: scaling analysis and discovery, qualitative coding and thematic analysis, and support for low‑resource languages and cultural heritage.
Scaling Analysis and Discovery
One promise of LLMs is the ability to process large amounts of text quickly. Humanities projects often involve sifting through thousands of pages of archives, newspapers or social‑media posts.
A call for papers from the journal Computational Humanities Research invites studies on systematic frameworks for integrating LLMs into humanities methodologies, case studies from disciplines such as history or literature, and comparative studies of LLM performance against traditional methods. This suggests a shift from using digital tools merely for counting words to using them as partners in interpretation.
The University of Gothenburg notes that advanced models can enhance data analysis, refine research methods, and unlock access to novel data sources. With the right prompts, an LLM can summarize themes, suggest classifications, or flag unusual phrases that warrant closer reading.
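As a rough sketch, the snippet below asks a chat model to tag a few archival excerpts with candidate themes and to flag phrases worth a closer look. It assumes the official openai Python package (v1 or later) with an API key in the environment; the model name, prompt wording, and excerpts are illustrative, not a prescribed method.

```python
from openai import OpenAI  # assumes the openai package, v1+, and OPENAI_API_KEY set in the environment

client = OpenAI()

# Illustrative excerpts standing in for digitized archival material.
excerpts = [
    "Bread queues stretched past the tram stop again this morning.",
    "The new schoolmistress insists the children write in the Latin script.",
]

prompt = (
    "You are assisting a historian. For each numbered excerpt, suggest one or two "
    "candidate themes and flag any phrase unusual enough to merit close reading.\n\n"
    + "\n".join(f"{i + 1}. {text}" for i, text in enumerate(excerpts))
)

response = client.chat.completions.create(
    model="gpt-4o",          # pin whichever model version your project has settled on
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
)
print(response.choices[0].message.content)  # a starting point for human review, not a finding
```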
Qualitative Coding and Thematic Analysis
LLMs are particularly useful for qualitative research.
Petre Breazu of the University of Cambridge describes experiments where GPT‑4 performed thematic analysis on YouTube comments. In one experiment, the model followed Braun and Clarke’s six‑step procedure, moving from reading the data and coding key phrases to identifying overarching themes and describing each theme. Human evaluators then compared the model’s themes with those identified by an expert qualitative researcher.
Breazu’s team found that GPT‑4 could generate useful initial categorizations, reducing the time needed to process large datasets. However, depth and specificity often came from the researcher’s contextual knowledge, and the model occasionally misclassified comments because it lacked socio‑political awareness.
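One way to stage such an analysis is to give each analytic phase its own prompt and keep the researcher’s review between phases. The templates below are a hedged sketch of that staging; the wording is illustrative and not drawn from Breazu’s study.

```python
# Illustrative prompt templates for staging a Braun-and-Clarke-style analysis.
# {data} is the comment batch; {codes} is the researcher-reviewed output of the coding phase.
PHASE_PROMPTS = {
    "initial_coding": (
        "Read the following comments and propose short descriptive codes for recurring ideas, "
        "one code per idea, with an example quote for each:\n\n{data}"
    ),
    "theme_search": (
        "Group these codes into candidate overarching themes and name each theme:\n\n{codes}"
    ),
    "theme_description": (
        "For each theme, write a two-sentence description and note which codes support it:\n\n{codes}"
    ),
}

def build_prompt(phase: str, **kwargs) -> str:
    """Fill one phase's template; the researcher reviews each output before moving to the next phase."""
    return PHASE_PROMPTS[phase].format(**kwargs)

print(build_prompt("initial_coding", data="(batch of YouTube comments)"))
```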
Supporting Low‑Resource Languages and Cultural Heritage
For scholars working with endangered or historical languages, LLMs offer both opportunities and challenges. An arXiv paper on low‑resource languages argues that LLMs can mitigate data scarcity through techniques such as few‑shot learning, meta‑learning, and unsupervised pre‑training.
Promising strategies include transfer learning (pre‑training on high‑resource languages and fine‑tuning on related low‑resource ones), synthetic data generation (creating artificial corpora that replicate linguistic features), and collaborative annotation with native speakers. These approaches, combined with community involvement and advanced computational methods, hold the potential to unlock linguistic treasures and preserve cultural diversity.
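Few‑shot prompting is the lightest‑weight of these strategies: a handful of vetted translation pairs, ideally supplied and checked by native‑speaker collaborators, are placed ahead of the sentence to be translated. In the sketch below the language and example pairs are placeholders, not real data.

```python
# A minimal few-shot translation prompt; the example pairs are placeholders and would
# in practice come from a glossary vetted with native-speaker collaborators.
few_shot_pairs = [
    ("(source-language sentence 1)", "(English translation 1)"),
    ("(source-language sentence 2)", "(English translation 2)"),
    ("(source-language sentence 3)", "(English translation 3)"),
]

def build_few_shot_prompt(new_sentence: str) -> str:
    """Assemble example pairs followed by the sentence the model should translate."""
    lines = ["Translate from the source language into English, following the examples."]
    for src, tgt in few_shot_pairs:
        lines.append(f"Source: {src}\nEnglish: {tgt}")
    lines.append(f"Source: {new_sentence}\nEnglish:")
    return "\n\n".join(lines)

print(build_few_shot_prompt("(sentence to translate)"))
```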
Limitations and Challenges
LLMs come with real constraints, edge cases, and trade-offs. These limits don’t cancel their value, but they set the terms of responsible use. We’ll name the key challenges next.
Biases and Misrepresentation
LLMs are trained on large datasets that reflect societal prejudices. Breazu warns that models can reproduce stereotypes or amplify existing biases. This is particularly problematic when analyzing sensitive topics such as race or gender.
LLMs may also generate content that appears factual but is fabricated, making them potential tools for spreading misinformation. Researchers must be vigilant in identifying and correcting these biases and should use LLMs as aids rather than authoritative sources.
Misalignment with Human Expectations
An MIT News report notes that the wide applicability of LLMs makes them difficult to evaluate systematically. Because humans decide when to deploy these models, evaluating them requires understanding how people form beliefs about their capabilities.
MIT researchers introduced a human generalization function to measure how people update their beliefs about an LLM after interacting with it. They found that when an LLM is misaligned with a user’s expectations, users can be over‑ or under‑confident about the tasks it can handle, leading to failures in high‑stakes situations. This underscores the need for training and guidelines to help researchers choose appropriate tasks for LLMs.
Synthetic Data and Epistemological Questions
In social science, LLMs are increasingly used to generate synthetic data for data augmentation, prototyping, or even direct analysis where models act as proxies for human subjects. However, such synthetic data rests on different epistemological assumptions than traditional methods.
An essay in the journal Sociologica challenges assumptions about LLM‑generated data and highlights the methodological issues social sciences and humanities must address before adopting these tools. Using synthetic data without careful scrutiny may lead to false conclusions or reinforce existing biases.
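One concrete form that scrutiny can take is a simple agreement check: before synthetic labels or responses are used at scale, compare them with a small human‑coded sample and report the overlap. The sketch below assumes parallel human and model labels already exist for the same items; the labels and threshold are illustrative.

```python
# Illustrative check: agreement between model-assigned and human-assigned labels
# on a small validation sample, before any synthetic data is used at scale.
human_labels = ["skeptical", "supportive", "supportive", "neutral", "skeptical"]
model_labels = ["skeptical", "supportive", "neutral", "neutral", "supportive"]

agreement = sum(h == m for h, m in zip(human_labels, model_labels)) / len(human_labels)
print(f"Raw agreement with human coders: {agreement:.0%}")

# An arbitrary illustrative threshold; the right bar depends on the research question.
if agreement < 0.8:
    print("Agreement is low; revisit the prompt or set the synthetic data aside for this task.")
```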
Responsible Use of AI in Academic Writing
As LLMs become easier to access, scholars are using them to brainstorm ideas and even draft parts of manuscripts. A 2024 ethical framework developed by researchers from Oxford, Cambridge, and other institutions emphasizes that human oversight is essential.
The guidelines propose three criteria to maximize the benefits of LLMs while maintaining integrity:
Human vetting to guarantee accuracy and integrity;
Ensuring substantial human contribution to the work;
Appropriate acknowledgment and transparency of LLM use.
The authors provide a template acknowledgment so that researchers can disclose AI assistance. Philosopher Julian Savulescu likens LLMs to “Pandora’s Box” for research, warning that they could eliminate academic independence yet also enable co‑creation and productivity. Co‑author Brian Earp stresses that ethical guidelines are about maximizing potential benefits while reducing risks.
Keeping Humans in the Loop
The Cambridge experiments show that combining human expertise with AI capabilities produces the best outcomes. Researchers should treat LLM outputs as drafts or suggestions to be evaluated critically. Clear, theory‑driven prompts improve the alignment between model outputs and human analysis. Training models with context‑specific information and feedback mechanisms can enhance their interpretative abilities. In all cases, human oversight is required to interpret ambiguous or sensitive content.
Transparency and Attribution
When using LLMs as partners, transparency about their role is crucial. Proper citations should distinguish between human analysis and AI‑generated text. Acknowledging AI assistance respects the contributions of both human and machine and helps readers evaluate the reliability of the findings. Ethical guidelines emphasize that disclosure fosters a more inclusive environment where human ingenuity and machine intelligence can jointly enhance scholarly work.
Towards Collaborative Intelligence
Large language models offer humanities scholars powerful new tools to analyze texts, explore patterns, and even translate endangered languages. They can accelerate qualitative coding, unlock novel research questions, and foster interdisciplinary collaboration. Yet their weaknesses—biases, misalignment with human expectations, and the epistemological challenges of synthetic data—mean they cannot replace human judgment.
Responsible use requires transparency, ethical guidelines, and a commitment to keeping humans in the loop. By viewing LLMs as research partners rather than oracles, humanities scholars can harness computational power while preserving the critical, contextual approaches that define their disciplines.
Sources:
Bail, Christopher A. “Can Generative AI Improve Social Science?” Proceedings of the National Academy of Sciences, vol. 121, no. 21, 2024, e2314021121. https://doi.org/10.1073/pnas.2314021121.
“Expanding the Toolkit: Large Language Models in Humanities Research (Call for Papers).” Computational Humanities Research, Cambridge University Press, 1 Aug. 2024–3 Feb. 2025, https://www.cambridge.org/core/journals/computational-humanities-research/announcements/call-for-papers/expanding-the-toolkit-large-language-models-in-humanities-research.
“New Ethical Framework to Help Navigate Use of AI in Academic Research.” Humanities Division, University of Oxford, 13 Nov. 2024, https://www.ox.ac.uk/news/2024-11-13-new-ethical-framework-help-navigate-use-ai-academic-research.
Rossi, Luca, Katherine Harrison, and Irina Shklovski. “The Problems of LLM-Generated Data in Social Science Research.” Sociologica, vol. 18, no. 2, 2024, pp. 145–168. https://doi.org/10.6092/issn.1971-8853/19576.
Zewe, Adam. “Large Language Models Don’t Behave like People, Even Though We May Expect Them To.” MIT News, 23 July 2024, https://news.mit.edu/2024/large-language-models-dont-behave-like-people-0723.
Zhong, Tianyang, et al. “Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research.” arXiv, 30 Nov. 2024, https://arxiv.org/abs/2412.04497.


