Therapeutic Alliance with an AI Therapist: What 527 Users Showed in a 2025 Study
A cross-sectional study of 527 users of the AI chatbot Clare (Schäfer et al., 2025) measured therapeutic alliance at 3.76 out of 5 on the WAI-SR, based on the 348 users who completed the measure. That level is comparable to in-person outpatient psychotherapy and group CBT. The strongest alliance with the AI was formed by lonely users (r = 0.25) and those with marked symptoms of anxiety or depression (r = 0.37).
Why alliance predicts therapy outcomes better than technique
In psychotherapy, the "therapeutic alliance" refers to the working bond between client and clinician. Bordin (1979) decomposed it into three components: agreement on goals (Goal), agreement on tasks and methods (Task), and the emotional bond (Bond). These three dimensions were operationalized in the Working Alliance Inventory — the most widely used alliance measure (Horvath & Greenberg, 1989).
Wampold (2015), in a review that has become canonical, showed that the "common factors" of therapy (alliance, empathy, agreement on goals) explain a much larger share of outcome variance than the therapeutic modality itself. By his summary, the school of therapy (CBT, psychodynamic, humanistic) accounts for 0–1% of differences in outcome, while alliance accounts for roughly 5–7%, an effect several times larger than the choice of method.
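To read these percentages on the same scale as the correlations reported later in this article, a share of explained variance can be converted to a correlation-sized effect with r = sqrt(R²). The sketch below uses only the percentages quoted above; the function name is illustrative, not taken from any cited paper.

```python
import math


def variance_share_to_r(share: float) -> float:
    """Convert a proportion of explained variance (R^2) into |r|."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be a proportion between 0 and 1")
    return math.sqrt(share)


# Percentages as summarized in the text above (Wampold, 2015).
alliance_low, alliance_high = 0.05, 0.07
modality_high = 0.01

r_alliance_low = round(variance_share_to_r(alliance_low), 2)
r_alliance_high = round(variance_share_to_r(alliance_high), 2)
r_modality_high = round(variance_share_to_r(modality_high), 2)

print(f"alliance: r between {r_alliance_low} and {r_alliance_high}")
print(f"modality: r up to {r_modality_high}")
```

On this reading, 5–7% of variance corresponds to an r of roughly 0.22–0.26, against at most about 0.10 for the choice of school.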
The implication is not that "technique doesn't matter," but something else: if an AI chatbot cannot form a working bond with the user, no internal CBT protocol will deliver the expected effect. So the question "is alliance with AI possible" is not philosophical but strictly operational.
Can users actually form an alliance with an AI chatbot?
By 2025 there is enough empirical data to answer this question quantitatively — using the same WAI scale that has been applied to human therapy for decades.
Darcy et al. (2021) ran the largest alliance measurement with an AI to date — 36,070 users of the Woebot chatbot. The Bond subscale at 3–5 days of use averaged M = 3.8 (SD = 1.0), exceeding the common clinical threshold for "high alliance" of 3.45 (Jasper et al., 2014). For Wysa, an analogous measurement on 1,205 users yielded M = 3.64 (Beatty et al., 2022).
These results are paradoxical at first glance: users report a "bond" with a system that does not exist as a person. There are several explanations. First, the "non-judgmental listener" effect: the absence of fear of judgment removes the block typical of a first meeting with a human therapist. Second, an AI chatbot is available the moment it is needed, which intensifies the subjective experience of "responsiveness," a component of the emotional bond. Third, anthropomorphization: the user mentally completes the AI into a subject that can be trusted.
Key takeaway: Across large samples (n = 36,070 for Woebot, n = 1,205 for Wysa, n = 348 for Clare), users consistently rate the alliance with an AI chatbot at 3.6–3.8 out of 5 — a range typically considered high for in-person psychotherapy.
What Schäfer and colleagues found in 527 Clare users
The study by Schäfer, Krause, and Köhler (2025), published in Frontiers in Digital Health, extends the picture with fresh data. The authors examined Clare — a hybrid system from clare&me GmbH (Berlin), combining rule-based dialogue and fine-tuned LLMs, with voice and text formats and protocols drawn from CBT, self-compassion, and mindfulness.
The sample comprised 527 users from the United Kingdom (39%), Germany (30%), and the United States (26%). Mean age was 36.2 years, with a near-symmetrical gender distribution (52.6% women, 46.5% men). Alliance was measured 3–5 days after onboarding (n = 348 at this point).
| WAI-SR subscale | Mean | SD |
|---|---|---|
| Total | 3.76 | 0.72 |
| Bond (emotional connection) | 3.82 | 0.77 |
| Task (agreement on tasks) | 3.74 | 0.78 |
| Goal (agreement on goals) | 3.73 | 0.83 |
Bond = 3.82 is above the clinical threshold of 3.45 and comparable to in-person outpatient psychotherapy and group CBT, as the authors explicitly note in their discussion. In other words, after 3–5 days of working with an AI chatbot, a substantial share of users report an emotional bond whose average rating is numerically close to the one reported in face-to-face therapy.
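The table can be sanity-checked with a few lines of arithmetic. The sketch below makes two assumptions that are not stated in the source: that the three subscales have equal length (so the total is the mean of the subscale means), and that Bond scores are roughly normally distributed, which is only a loose approximation for a bounded 1–5 scale. It is an illustration, not the authors' analysis.

```python
import math

# Means and SDs copied from the WAI-SR table above.
subscales = {
    "Bond": (3.82, 0.77),
    "Task": (3.74, 0.78),
    "Goal": (3.73, 0.83),
}
THRESHOLD = 3.45  # "high alliance" cutoff cited in the text


def normal_share_above(mean: float, sd: float, cutoff: float) -> float:
    """Share of a normal distribution that lies above `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Consistency check: with equal-length subscales, the total equals
# the mean of the three subscale means.
total_from_subscales = sum(m for m, _ in subscales.values()) / len(subscales)
print(f"total from subscales: {total_from_subscales:.2f}")

bond_mean, bond_sd = subscales["Bond"]
bond_share = normal_share_above(bond_mean, bond_sd, THRESHOLD)
print(f"approx. share with Bond above {THRESHOLD}: {bond_share:.0%}")
```

Under these assumptions the subscale means reproduce the reported total of 3.76, and roughly two thirds of users would sit above the 3.45 threshold. The second figure is a rough estimate, not a reported result.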
The authors also documented the clinical severity of the sample: 69% of participants had symptoms of anxiety, 59% had symptoms of depression, 32% had high stress, and 86% scored as "lonely" on the UCLA scale. This is not a "light" audience of curious users, but people in real distress.
Who forms a stronger alliance with AI: the user profile
The correlation analysis in Schäfer et al. (2025) yielded an important practical result: alliance with AI is predicted by the user's clinical profile.
Loneliness correlated with total WAI at r = 0.25 (p < 0.001), and separately with Bond at r = 0.21. Psychological distress (PHQ-D): r = 0.337. Anxiety and depression (PHQ-4): r = 0.368. Social anxiety (Mini-SPIN): r = 0.336. All coefficients are statistically significant and fall in the small-to-moderate range by Cohen's conventions.
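To get a feel for how precise these coefficients are, an approximate 95% confidence interval can be computed with the Fisher z-transformation. The sketch assumes that every correlation was computed on the full alliance subsample of n = 348; the source does not state the pairwise n, so the intervals are indicative only.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


N_ASSUMED = 348  # assumption: all correlations use the full alliance subsample

reported = {
    "Loneliness": 0.25,
    "PHQ-D": 0.337,
    "PHQ-4": 0.368,
    "Mini-SPIN": 0.336,
}

for label, r in reported.items():
    lo, hi = fisher_ci(r, N_ASSUMED)
    print(f"{label:<11} r = {r:.3f}  r^2 = {r * r:.3f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

Squaring the coefficients is a useful reality check: even the largest one, r = 0.368, corresponds to about 13.5% of shared variance, so the clinical profile predicts alliance but leaves most of it unexplained.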
The interpretation: the higher the symptom load and loneliness, the more the user "invests" in the relationship with the AI. This is consistent with Schäfer and colleagues' hypothesis that Clare functions as a low-threshold resource — for people whose social barrier to human therapy (shame, awkwardness, cost, location) is currently insurmountable.
A separate surprise was the gender difference. Men (n = 168) scored higher on alliance than women (n = 176): M = 3.88 vs M = 3.65, t(348) = −3.17, p = 0.002, d = −0.34 (a small-to-moderate effect; the negative sign reflects the women-minus-men direction of the comparison). Against the well-documented finding that men are less likely to seek out a human therapist, this is a potential advantage of the AI format as a first point of entry into care.
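The reported effect size can be cross-checked from the t statistic and the group sizes alone, using the standard identity d = t · sqrt(1/n1 + 1/n2) for an independent-samples t-test. This is a consistency check on the numbers quoted above, under the assumption that a pooled-variance test was used; it is not a reanalysis of the data.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by an independent-samples t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


n_men, n_women = 168, 176
t_reported = -3.17
mean_men, mean_women = 3.88, 3.65

d = cohens_d_from_t(t_reported, n_men, n_women)
# Sign convention matching the negative t: women minus men.
implied_pooled_sd = (mean_women - mean_men) / d
df_student = n_men + n_women - 2

print(f"d implied by t: {d:.2f}")
print(f"implied pooled SD: {implied_pooled_sd:.2f}")
print(f"Student df for these group sizes: {df_student}")
```

The implied d matches the reported −0.34, and the implied pooled SD of about 0.67 sits close to the total SD of 0.72 in the table, so the figures are mutually consistent. The one loose end is the degrees of freedom: two groups of 168 and 176 would give 342 for a pooled-variance test, so the 348 quoted above may reflect a different reporting convention.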
This is supported by the user-reported motives. Asked why they chose an AI chatbot rather than a human, 35.7% said "to avoid embarrassment," 35.3% said "to get advice regardless of appearance," and 19.6% said "anonymity." These are not technical advantages but psychological barriers to human therapy that the AI removes at the entry point.
Where AI alliance differs from human alliance: four limits
Despite comparable averages, the alliance with AI works differently than the alliance with a human — and a product that ignores these differences risks making a false promise.
Limit 1: novelty as a driver. Schäfer et al. (2025) themselves note that only 1.52% of participants had previously used other digital mental health tools. A novelty effect may inflate initial alliance ratings, and it is unknown whether the level of 3.76 is sustained at 6 or 12 months. Of 527 participants, only 21 completed the full 8 weeks.
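The scale of the drop-off is easier to see as percentages. The sketch below uses only the three counts reported in this article (527 enrolled, 348 with an alliance rating at days 3–5, 21 completing 8 weeks); the helper name is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


enrolled = 527         # full sample
rated_alliance = 348   # completed the WAI-SR at days 3-5
completed = 21         # finished all 8 weeks

rated_share = pct(rated_alliance, enrolled)
completed_share = pct(completed, enrolled)
completed_of_rated = pct(completed, rated_alliance)

print(f"rated alliance: {rated_share}% of enrolled")
print(f"completed 8 weeks: {completed_share}% of enrolled")
print(f"completed 8 weeks: {completed_of_rated}% of alliance raters")
```

About two thirds of enrolled users supplied an alliance rating, but only about 4% reached week 8, which is why the 3.76 figure says little about long-term alliance.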
Limit 2: empathy is uneven across subgroups. Gabriel et al. (2024) showed that the empathic quality of LLM responses in mental health support tasks differs significantly across patient subgroups and does not always conform to motivational interviewing principles. The "average alliance" in a sample hides dispersion: for some users the chatbot is more empathic than for others.
Limit 3: plasticity at the cost of authenticity. Hadar-Shoval et al. (2023), in Frontiers in Psychiatry, demonstrated that ChatGPT adapts its mentalizing style to the personality structure of the interlocutor. On one hand, this is a personalization resource; on the other, it carries the risk that the model "mirrors" the user's beliefs, losing the therapeutic function of challenge. De Choudhury et al. (2023) describe this "alignment bias" specifically as a clinical anti-pattern.
Limit 4: memory and continuity. Alliance in face-to-face therapy accumulates because the clinician remembers context. Most AI chatbots store either the history of a single session or a short window. Wang et al. (2025), in the AnnaAgent project, showed that multi-session memory (short, long, episodic) fundamentally changes the realism of working with the user — but such an architecture is rare in production systems.
What a product needs to do for alliance to work
The four limits above translate into concrete product requirements.
Memory across sessions. Without it, the user starts each session "from scratch," which breaks the recognition and continuity that the Bond component depends on. The architecture must store relevant context and, as a separate requirement, do so only with informed consent and with the option to delete it.
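As a rough illustration of this requirement, the sketch below shows a per-user memory store that refuses to save anything without consent and supports full deletion. All class and method names are hypothetical; this is not the architecture of Clare, AnnaAgent, or any other system cited here, and a production design would also need encryption, retention limits, and audit logging.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryItem:
    text: str
    created_at: datetime


@dataclass
class UserMemory:
    """Hypothetical per-user store for cross-session context."""
    user_id: str
    consent_given: bool = False
    items: list[MemoryItem] = field(default_factory=list)

    def remember(self, text: str) -> bool:
        """Store a context note only if the user has opted in."""
        if not self.consent_given:
            return False
        self.items.append(MemoryItem(text, datetime.now(timezone.utc)))
        return True

    def recall(self, limit: int = 5) -> list[str]:
        """Return up to `limit` most recent notes, newest first."""
        if limit <= 0:
            return []
        return [item.text for item in reversed(self.items[-limit:])]

    def forget_all(self) -> int:
        """Delete every stored note; returns how many were removed."""
        removed = len(self.items)
        self.items.clear()
        return removed

    def withdraw_consent(self) -> int:
        """Revoke consent and delete everything stored so far."""
        self.consent_given = False
        return self.forget_all()


memory = UserMemory(user_id="demo-user")
saved_without_consent = memory.remember("prefers evening sessions")
memory.consent_given = True
memory.remember("prefers evening sessions")
memory.remember("working on sleep routine")
recent = memory.recall()
removed_count = memory.withdraw_consent()
```

Making consent a precondition of `remember` rather than a setting checked elsewhere keeps the default safe: a system that forgets to ask stores nothing.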
Personalization to the clinical profile. Schäfer et al. (2025) show that the profile of loneliness, social anxiety, and symptom severity predicts alliance. It is logical for the system to adapt to this profile, from speech tone to session length and frequency. Hadar-Shoval et al. (2023) provide evidence that LLMs can adapt in this way when directed to.
Transparency about limitations. Schäfer and colleagues note plainly in their discussion: "despite comparable levels of disclosure, lower trust in chatbots underscores the need for transparent design." An honest declaration that the AI does not replace a clinician in a crisis is not a marketing risk but a condition for sustainable alliance.
Crisis routing. Alliance rests on safety. If a system has no built-in escalation protocol to a human and to local services in moments of suicidal ideation, the user's trust justifiably drops. This is the "Ethical safeguards" tier of the MIND-SAFE framework.
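A minimal sketch of what such an escalation protocol might look like in code follows. It assumes a risk level has already been assigned by an upstream assessment step that is not shown. The risk levels, routing policy, and service entries are all hypothetical placeholders, not the MIND-SAFE specification and not any vendor's actual protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    NONE = 0
    ELEVATED = 1
    ACUTE = 2


@dataclass(frozen=True)
class Route:
    continue_ai_session: bool
    notify_human: bool
    show_local_services: bool


# Placeholder entries: a real deployment would hold verified local contacts.
LOCAL_SERVICES = {
    "GB": "<verified UK crisis service>",
    "DE": "<verified German crisis service>",
    "US": "<verified US crisis service>",
}
FALLBACK_SERVICE = "<international crisis directory>"


def route(risk: Risk) -> Route:
    """Hypothetical policy: acute risk always hands off to a human."""
    if risk is Risk.ACUTE:
        return Route(continue_ai_session=False, notify_human=True,
                     show_local_services=True)
    if risk is Risk.ELEVATED:
        return Route(continue_ai_session=True, notify_human=False,
                     show_local_services=True)
    return Route(continue_ai_session=True, notify_human=False,
                 show_local_services=False)


def local_service(country_code: str) -> str:
    """Look up the local service, falling back to a general directory."""
    return LOCAL_SERVICES.get(country_code.upper(), FALLBACK_SERVICE)


acute = route(Risk.ACUTE)
print(acute, local_service("de"))
```

The point of the sketch is the shape of the guarantee: the acute branch never leaves the user alone with the AI, and an unknown country still resolves to some service rather than to nothing.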
Limitations of the Schäfer et al. (2025) study
A fair reading of this work requires acknowledging its boundaries.
First, the sample is exclusively Western — UK, Germany, US. Applicability to Eastern Europe, Central Asia, and other contexts remains an open question.
Second, measurement at 3–5 days does not answer the question of long-term stability of alliance. The authors themselves defer this question to future research.
Third, the construct validity of WAI-SR for an AI context is uncertain: the Bond subscale, originally developed for human relationships, may measure something different in an AI context than in face-to-face therapy.
Fourth, there is strong attrition: only 21 of 527 completed all 8 weeks, and non-completers had higher distress levels. This biases the picture toward "less severe" users in the long-term metrics.
Finally, the authors did not collect diagnoses or histories of prior or current therapy. Without this, we cannot say whether Clare replaces human help for those already in care, complements it, or serves as a bridge for those who haven't yet entered the system.
Frequently asked questions
What is therapeutic alliance in plain terms?
It is the working bond between client and clinician, composed of an emotional connection, agreement on goals, and agreement on methods of work (Bordin, 1979). Wampold (2015), in his review of the common factors of therapy, showed that the quality of alliance predicts treatment outcome more strongly than the chosen school of therapy.
Can someone really form an alliance with an AI chatbot?
Empirically, yes. In large samples (n = 36,070 for Woebot, n = 348 for Clare), users rate the alliance with AI at 3.6–3.8 out of 5 on WAI-SR (Darcy et al., 2021; Schäfer et al., 2025). This is comparable to in-person outpatient psychotherapy and group CBT.
Who is the AI chatbot especially well-suited for?
According to Schäfer et al. (2025), the strongest alliance with Clare was formed by lonely users (r = 0.25), people with marked anxiety or depression (r = 0.37), and people with social anxiety (r = 0.34). Men scored significantly higher on alliance than women (d = −0.34). The AI format removes the barrier of shame and social exposure typical of a first meeting with a human therapist.
Where does AI alliance fall short of human alliance?
In four respects. Early ratings may be inflated by novelty, and it is unknown whether they hold at 6 or 12 months (Schäfer et al., 2025). The empathy of LLMs is uneven across user subgroups (Gabriel et al., 2024). Models tend to "mirror" the interlocutor's beliefs, weakening the therapeutic function of challenge (Hadar-Shoval et al., 2023). Most systems lack long-term memory, breaking continuity in the relationship (Wang et al., 2025).
Does an AI therapist replace a human psychologist?
No. Schäfer et al. (2025) describe Clare as a "low-threshold" resource — an entry point for people for whom the barrier of in-person therapy is insurmountable (shame, location, cost). The authors state plainly that AI can reduce shame and nervousness around seeking help, but does not replace a clinician in a crisis or in severe clinical cases.
Practical takeaway
Alliance with an AI chatbot is a measurable quantity, and 3.76 out of 5 in Clare describes a real working connection, not a marketing illusion. But this connection works on its own terms: it is reinforced by loneliness, anxiety/depression symptoms, and a low social threshold; it is broken by absence of memory, formulaic empathy, and opaque limitations.
At Nearby we deliberately design the product against these constraints: CBT protocols rather than "universal empathy," memory across sessions for continuity, psychological typing for style personalization, and an explicit declaration of where the AI's competence ends and a human clinician's work begins. If you are considering an AI chatbot as a first step — it is a workable point of entry. If as a replacement for a clinician in a crisis — the data still favors the human.
Related reading: Self-help with AI inner dialogue, Why a multi-agent AI therapist outperforms an ordinary chatbot, MIND-SAFE: a safety standard for AI assistants.
References
Beatty, C., Malik, T., Meheli, S., & Sinha, C. (2022). Evaluating the therapeutic alliance with a free-text CBT conversational agent (Wysa): A mixed-methods study. Frontiers in Digital Health, 4, 847991. https://doi.org/10.3389/fdgth.2022.847991
Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, Research & Practice, 16(3), 252–260. https://doi.org/10.1037/h0085885
Darcy, A., Daniels, J., Salinger, D., Wicks, P., & Robinson, A. (2021). Evidence of human-level bonds established with a digital conversational agent: Cross-sectional, retrospective observational study. JMIR Formative Research, 5(5), e27868. https://doi.org/10.2196/27868
De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. arXiv. https://doi.org/10.48550/arxiv.2311.14693
Gabriel, S., Puri, I., Xu, X., Malgaroli, M., & Ghassemi, M. (2024). Can AI relate: Testing large language model response for mental health support. arXiv. https://doi.org/10.48550/arxiv.2405.12021
Hadar-Shoval, D., Elyoseph, Z., & Lvovsky, M. (2023). The plasticity of ChatGPT's mentalizing abilities: Personalization for personality structures. Frontiers in Psychiatry, 14, 1234397. https://doi.org/10.3389/fpsyt.2023.1234397
Horvath, A. O., & Greenberg, L. S. (1989). Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology, 36(2), 223–233. https://doi.org/10.1037/0022-0167.36.2.223
Schäfer, L. M., Krause, T., & Köhler, S. (2025). User characteristics, motives, and therapeutic alliance in mental health conversational AI Clare. Frontiers in Digital Health, 7, 1576135. https://doi.org/10.3389/fdgth.2025.1576135
Wampold, B. E. (2015). How important are the common factors in psychotherapy? An update. World Psychiatry, 14(3), 270–277. https://doi.org/10.1002/wps.20238
Wang, M., Wang, P., Wu, L., Yang, X., Wang, D., Feng, S., Chen, Y., Wang, B., & Zhang, Y. (2025). AnnaAgent: Dynamic evolution agent system with multi-session memory for realistic seeker simulation. Findings of the Association for Computational Linguistics: ACL 2025. https://doi.org/10.18653/v1/2025.findings-acl.1192