One of the hottest debates in user research right now is whether large language models (LLMs) can do qualitative analysis, and if so, how. There’s no clear consensus yet. The space is experimental, guided mostly by broad principles like keeping a “human in the loop.”
At a surface level, the case seems pretty obvious. Upload your interview transcripts, ask for themes, get a structured summary in seconds. Compared to hours of coding and synthesis, that feels like a genuine breakthrough.
But there are harder questions to consider underneath that efficiency: if the model is doing the analysis, who is actually doing the thinking? And what does that mean for your research?
This post isn’t an argument against using AI in qualitative analysis. It’s an argument for using it deliberately with a clear understanding of what it can do, and what it can’t.
The transcript is not the interview
One thing that often gets missed in these conversations is that a transcript is a limited artefact. It is a written record of words, but it is not the interview.
The interview is a lived interaction. It includes the hesitation before a difficult answer. The shift in energy when a topic touches something personal or difficult. The moment a participant evades a question or contradicts themselves and does not seem to notice. The probe you chose not to ask because the room told you it was not the right moment. None of that critical information lives in the transcript.
When a researcher analyses qualitative data, they are not just reading the transcript text. They are drawing on everything they experienced in the room, everything they noticed before and after the recording started, and everything they brought to that conversation through their own experiences and skills. They are also exercising critical judgement, noting and probing on details they feel are important to the study throughout the exchange. That is not a flaw in the method. It is the method.
When you upload a transcript to an LLM, you are giving it a shadow of the original interaction. The model will process that text competently. It will surface patterns. But it has no access to those crucial details that the transcript cannot capture, and it has no way of knowing what it is missing.
Qualitative analysis is reflexive, not mechanical
There is a common misconception that qualitative analysis is about finding themes (or, worse, ‘user needs’) that are hiding in the data, waiting to be discovered. It is not. It is about constructing a defensible interpretation of complex and subjective data. The themes you identify are not neutral observations. They reflect choices you have made about what matters, what to foreground, and how to frame meaning. That is why two experienced researchers working on the same dataset can produce different, equally valid analyses.
This is not a weakness of qualitative work. It is what makes it powerful. Quantitative methods answer “how many” and “how often”. Qualitative methods answer “why” and “what does this mean”. The human researcher is not a variable to be controlled for; they are the instrument through which meaning gets made.
This has important implications for how we think about AI assistance. An LLM can identify patterns in text. However, it cannot exercise the kind of reflexive judgement that good qualitative analysis requires. It does not know what shaped your line of questioning. It cannot weigh what a participant seemed reluctant to say against what they eventually did say. It cannot identify a contradiction in the way an experienced researcher can.
The complexity of the data matters
A researcher should think carefully about the complexity of the data and the object of study before deciding whether to use an LLM for analysis. As a general rule, the more complex and diffuse the study data, the more context and reflexivity is required for analysis.
With simple open-ended survey responses, or short written feedback on a product or feature, the context is largely embedded in the question itself. This means the data is constrained and the interpretive leap required is relatively small. Using an LLM to cluster this kind of data and surface patterns is a pragmatic, low-risk choice. The efficiency gains are significant and the risks are manageable.
A series of ninety-minute depth interviews is a different matter. So are ethnographic conversations, sessions exploring sensitive topics, or any research where adaptive questioning or behavioural responses shaped the direction of the exchange. Here, the meaning of what was said depends on the full arc of the conversation, on what came before and after, on decisions made in the moment by a skilled interviewer. The transcript alone cannot adequately carry and convey that complexity.
Beware the outsourcing of meaning making
The threat here is not that AI will do qualitative analysis. It is that researchers will stop doing it themselves.
When a researcher lets an LLM generate primary themes before forming their own interpretation, several things go wrong. Context and nuance get flattened: what the participant meant beyond the words recorded in the transcript is lost. Coherence gets imposed where the data is actually ambiguous. The model ends up shaping what gets foregrounded and, ultimately, the findings of the study. And if the prompt is too vague, fabricated quotes can appear in support of tenuous or misleading findings, often sounding plausible enough to slip past even experienced researchers.
Most importantly, the analytical authority shifts. The model becomes the analyst, the human becomes the reviewer. That reversal really matters. If you cannot explain how your interpretations were reached, your findings cannot be defended.
The credibility of your research depends on a chain of reasoning that connects your data to your claims. Outsource that chain and you may have outputs, but you do not have analysis.
How to use LLMs well in qualitative work
My position here is not anti-AI. It is anti-uncritical-outsourcing. Used well, LLMs offer real value. I am advocating that researchers create a clear analysis workflow and sequence their use of LLMs sensibly.
With any complex qualitative study you must do your own analysis first. Conduct your sessions, take structured notes immediately after, debrief with observers, re-immerse yourself in the data. Then write down your ideas and early themes before you open any AI tool. Articulate how you define your findings, what you consider to be important and where you feel uncertain. You must form your own analytic position.
Only then use the model for bounded tasks: extracting quotes aligned to your themes, checking for disconfirming evidence, comparing patterns across participant groups, flagging things you might have overlooked. In this configuration, the model is extending your analysis rather than replacing it. It is a fast, useful collaborator with a clearly defined supporting role.
Before using AI for analysis, ask whether someone who was not present in your sessions could fully grasp the nuance of your data from the transcript alone. If the answer is no, you need to lead the interpretation yourself. The AI can assist once you have done that work.
Human-led (not ‘human-in-the-loop’)
Keeping a “human in the loop” is often offered as the responsible middle ground.
Be careful with that framing. It positions the human as a checkpoint in a process the AI is running. Someone who reviews, approves, and moves on. That is not qualitative analysis. That is quality control on someone else’s work.
Human-led is different. It means the researcher sets the direction, forms the interpretation, and carries accountability for the conclusions. The AI does not hand you themes to validate. You develop themes, then use AI to stress-test, retrieve evidence, and check for what you might have missed. The model works inside the frame you built, rather than the other way around.
This distinction really matters. If a stakeholder or colleague challenges your findings, you need to be able to explain your interpretation. Where did this theme come from? What data supports it? Why did you frame it this way rather than another?
Used in a human-led way, AI can genuinely make researchers better: faster at retrieving evidence, more systematic about disconfirming cases, more transparent about how analysis was done. These benefits only materialise when the researcher stays in the driving seat throughout.
If we lose sight of who is responsible for meaning, we risk losing the thing that makes qualitative inquiry worth doing in the first place: a genuine attempt to understand what something means to the people who lived it. Stories told and understood with human judgement, not predicted by a language model.
A practical protocol for complex data analysis with an LLM
If you are working with highly complex qualitative data, follow a clear workflow that puts your own analysis first.
1. Run your sessions
Conduct research as you normally would. Have observers present and prime them to capture both key quotes and behavioural observations. Take structured notes. Capture immediate impressions.
This means you are not limited to transcripts as the source data for deciphering meaning.
2. Capture your own interpretation immediately
As moderator, record:
- Significant observations
- Emotional inflection points
- Emerging patterns
- Surprises
This preserves context that will not appear clearly in text alone.
3. Debrief with observers
Compare notes. Identify the key observations and any points of disagreement.
This human analysis is the first layer of rigour. It should precede any AI involvement.
4. Re-immerse yourself in the data
Reread transcripts. Rewatch key clips if possible. Reconnect with participants’ stories and the main points relevant to your objectives.
Only after you have revisited the context should you move towards structured analysis.
5. Articulate your themes before consulting AI
This step is critical.
Write down:
- Your top themes and findings
- How you define them
- Where you feel uncertain
- Key quotes or participants that validate your findings
Form your own analytic position first. Without this step, you are not leading the analysis.
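If you intend to feed your analytic position into later, bounded LLM tasks, it can help to capture it in a structured form. A minimal sketch in Python (the field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class Theme:
    """One researcher-defined theme, written down before any AI involvement."""
    name: str
    definition: str                # how you define the theme
    uncertainty: str = ""          # where you feel unsure about it
    supporting_quotes: list = field(default_factory=list)  # quotes you found yourself

@dataclass
class AnalyticPosition:
    """The researcher's position, formed before consulting an LLM."""
    study: str
    themes: list = field(default_factory=list)

    def summary(self) -> str:
        """Render the position as plain text, ready to paste into a prompt."""
        lines = [f"Study: {self.study}"]
        for t in self.themes:
            lines.append(f"- {t.name}: {t.definition}")
            if t.uncertainty:
                lines.append(f"  Uncertain about: {t.uncertainty}")
        return "\n".join(lines)
```

The tooling is beside the point. What matters is that the themes, definitions, and uncertainties exist in writing, authored by you, before any model sees the data.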
6. Use the LLM for bounded tasks
Now the model becomes useful.
You can ask it to:
- Extract verbatim quotes aligned to your themes
- Identify disconfirming evidence
- Compare themes across segments
- Highlight overlooked details
- Summarise specific subsets of data
This way the model extends your analysis rather than replacing it. Make sure to use explicit constraints in LLM prompts to guard against fabricated citations.
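As an illustration of what “explicit constraints” can look like, here is a sketch of a prompt builder for the quote-extraction task. The constraint wording is an example, not a guaranteed safeguard, so verify quotes manually regardless:

```python
def build_bounded_prompt(task: str, themes: list, transcript: str) -> str:
    """Compose a bounded-task prompt with explicit anti-fabrication constraints.

    The rule wording below is illustrative; adapt it to your own tooling.
    """
    constraints = (
        "Rules:\n"
        "- Quote ONLY text that appears verbatim in the transcript below.\n"
        "- If no quote supports a theme, say 'no supporting quote found'.\n"
        "- Do not infer, paraphrase, or merge quotes.\n"
        "- Label each quote with the theme it supports.\n"
    )
    theme_list = "\n".join(f"- {t}" for t in themes)
    return (
        f"Task: {task}\n\n"
        f"Researcher-defined themes:\n{theme_list}\n\n"
        f"{constraints}\n"
        f"Transcript:\n{transcript}"
    )
```

Because the themes come from your own prior analysis, the model is retrieving evidence inside your frame rather than defining the frame itself.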
7. Guard against distortion
Several practical safeguards are necessary:
- Verify important quotes manually
- Check for fabricated citations
- Submit large datasets in batches to ensure you do not exhaust the context window
- Be explicit in your instructions to LLMs
- Treat outputs as drafts, not findings
Remember that models have context window limits. Large uploads can degrade performance silently: an LLM will not tell you when it has exhausted its context window.
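Two of these safeguards, quote verification and batching, are mechanical enough to script. A rough sketch, assuming plain-text transcripts (the character budget is a crude stand-in for a real token limit, which varies by model):

```python
import re

def _normalise(text: str) -> str:
    """Collapse whitespace and lowercase, so line-wrapping differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()

def verify_quote(quote: str, transcripts: list) -> bool:
    """Return True only if the quote appears verbatim in at least one transcript."""
    q = _normalise(quote)
    return any(q in _normalise(t) for t in transcripts)

def batch_transcripts(transcripts: list, max_chars: int = 40_000) -> list:
    """Group whole transcripts into batches under a rough character budget.

    max_chars is an assumed, conservative budget; check your model's actual limit.
    """
    batches, current, size = [], [], 0
    for t in transcripts:
        if current and size + len(t) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(t)
        size += len(t)
    if current:
        batches.append(current)
    return batches
```

Any model-returned quote that fails verification should be treated as fabricated and discarded, not paraphrase-matched back into the data.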
A simple decision test
Before using an LLM for analysis, ask yourself:
- How complex is the data?
- How much does interpretation depend on lived context?
- Did adaptive questioning shape meaning significantly?
Would someone who was not present grasp the nuance fully just by reading a transcript? If the answer is no, you must lead the interpretation yourself.
About the author
With over 18 years in user and market research, Nick helps organisations understand people through mixed methods research and human-centred design practice. Now focused on the intersection of AI and qualitative inquiry, his work is about developing practical protocols and workflows that help research teams use AI safely and effectively, without losing the human judgement that makes qualitative work meaningful.
A note on AI use in creating this blog
I used AI tools, including Claude.ai and ChatGPT, to help write this post.
Every blog post starts with a theory or an idea I want to explore, which often begins as a long-form written summary including all main points and arguments. From there, I use LLMs to help with secondary research, structuring thinking, and editing drafts.
All the ideas, arguments, and conclusions here are my own. The work is human-led, AI-assisted.