A systematic review and meta-analysis of 35 studies involving more than 17,000 participants found that AI chatbots significantly reduce symptoms of depression (Hedges' g = 0.64) and psychological distress (g = 0.70). Generative models produced an effect roughly 2.4 times larger than scripted systems. Below is a breakdown of the data, key findings, and limitations of one of the largest evidence reviews of AI therapy to date.

What is this meta-analysis and why does it matter?

In 2023, a team of researchers from the National University of Singapore and Northwestern University (USA) published a systematic review and meta-analysis in npj Digital Medicine, a top-quartile journal in digital health (Li et al., 2023). At the time of publication, it was the most comprehensive analysis of AI chatbots for mental health.

The authors searched 12 academic databases, selected 35 experimental studies (15 of which were randomized controlled trials), and synthesized data from 17,123 participants across 15 countries. The review covered 23 different systems, ranging from well-known platforms like Woebot and Wysa to less familiar ones such as Tess, Elomia, XiaoE, and VRECC.

Previous reviews (Vaidyam et al., 2019) focused primarily on scripted chatbots with predetermined conversation flows. This meta-analysis was the first to specifically isolate and compare AI-driven systems — those using natural language processing and generative models.

Does an AI chatbot reduce symptoms of depression?

Yes. A meta-analysis of 13 RCTs (1,744 participants) found a statistically significant reduction in psychological distress: Hedges' g = 0.70 (95% CI: 0.18–1.22). Individual outcomes broke down as follows:

Depression: g = 0.64 (95% CI: 0.17 to 1.12), significant reduction
Anxiety: g = 0.65 (95% CI: −0.46 to 1.77), not significant
General well-being: g = 0.32 (95% CI: −0.13 to 0.78), not significant
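The significance verdicts above follow directly from the confidence intervals: a two-sided 95% CI that excludes zero corresponds to p < 0.05. The Python sketch below checks each reported interval and back-calculates an approximate standard error and z-score. It assumes the intervals are symmetric normal approximations (g ± 1.96 × SE), which is an assumption rather than something stated in this summary; the class and field names are illustrative.

```python
from dataclasses import dataclass

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


@dataclass
class Outcome:
    name: str
    g: float        # pooled Hedges' g
    ci_low: float   # lower bound of the 95% CI
    ci_high: float  # upper bound of the 95% CI

    def significant(self) -> bool:
        # A 95% CI that excludes zero corresponds to p < 0.05 (two-sided)
        return self.ci_low > 0 or self.ci_high < 0

    def approx_se(self) -> float:
        # Assumes a symmetric normal-approximation interval: g +/- 1.96 * SE
        return (self.ci_high - self.ci_low) / (2 * Z_95)

    def approx_z(self) -> float:
        return self.g / self.approx_se()


OUTCOMES = [
    Outcome("Psychological distress", 0.70, 0.18, 1.22),
    Outcome("Depression", 0.64, 0.17, 1.12),
    Outcome("Anxiety", 0.65, -0.46, 1.77),
    Outcome("General well-being", 0.32, -0.13, 0.78),
]

for o in OUTCOMES:
    verdict = "significant" if o.significant() else "not significant"
    print(f"{o.name}: g = {o.g:.2f}, z ~ {o.approx_z():.2f} ({verdict})")
```

Anxiety has a point estimate almost identical to depression, but its interval is more than twice as wide, so its approximate z-score falls well short of 1.96.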

An effect size of g = 0.64 is considered medium on Cohen's scale. This is comparable to the effect of several established psychotherapeutic interventions. Earlier meta-analyses of scripted chatbots reported more modest results — g ranging from 0.24 to 0.47 (Vaidyam et al., 2019).
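For readers unfamiliar with the metric: Hedges' g is a standardized mean difference (Cohen's d) multiplied by a small-sample correction factor J, and Cohen's conventional benchmarks treat 0.2, 0.5, and 0.8 as small, medium, and large. A minimal sketch follows. The group means, standard deviations, and sample sizes in the usage lines are hypothetical, and the sign convention (control minus treatment, so a symptom reduction comes out positive) is an assumption chosen for readability.

```python
import math


def hedges_g(mean_treatment: float, mean_control: float,
             sd_treatment: float, sd_control: float,
             n_treatment: int, n_control: int) -> float:
    """Standardized mean difference with Hedges' small-sample correction.

    Positive values mean lower symptom scores in the treatment arm.
    """
    df = n_treatment + n_control - 2
    # Pooled standard deviation across both arms
    pooled_sd = math.sqrt(((n_treatment - 1) * sd_treatment ** 2
                           + (n_control - 1) * sd_control ** 2) / df)
    d = (mean_control - mean_treatment) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)                          # correction factor J
    return j * d


def cohen_label(g: float) -> str:
    """Map an effect size to Cohen's conventional benchmarks."""
    size = abs(g)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical trial: symptom scores of 11 vs 14, SD 5, 50 participants per arm
g = hedges_g(11.0, 14.0, 5.0, 5.0, 50, 50)
print(f"g = {g:.2f} ({cohen_label(g)})")  # g = 0.60 (medium)
```

The correction factor barely matters at 100 participants (J is about 0.992) but shrinks g noticeably in the small trials common in this field.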

Overall psychological well-being did not improve significantly. The authors attribute this to well-being measures being more stable over time and less sensitive to short-term interventions. In addition, only 8 RCTs contributed to this outcome, which likely left the analysis underpowered.

Generative AI vs scripted chatbots: a 2.4x difference

The most striking result was the gap between AI types. Generative models (e.g., GPT-based systems) produced an effect size of g = 1.24, while scripted (retrieval-based) systems yielded g = 0.52. The difference was statistically significant (F = 4.88, p = 0.019).

Generative models don't follow pre-written scripts — they produce responses from scratch, adapting to the conversation's context. This allows them to respond more accurately to emotional states, offer personalized recommendations, and sustain a more natural dialogue.
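To make the distinction concrete, here is a deliberately simplified Python sketch of a retrieval-based (scripted) responder: it can only return one of its pre-written replies, chosen by keyword overlap. The script entries are invented for illustration, and `generate` is a hypothetical placeholder for whatever language model a generative system would call; no specific model or API is implied.

```python
import re
from typing import Callable

# Hypothetical script: each entry pairs trigger keywords with a pre-written reply.
SCRIPT = [
    ({"sleep", "insomnia", "tired"},
     "Sleep problems are common. Would you like to try a short wind-down exercise?"),
    ({"anxious", "worried", "nervous"},
     "It sounds like you're feeling anxious. Shall we try a breathing exercise?"),
    ({"sad", "down", "lonely"},
     "I'm sorry you're feeling low. What has been weighing on you today?"),
]
FALLBACK = "Tell me more about what's on your mind."


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieval_reply(message: str) -> str:
    """Scripted approach: return the canned reply with the most keyword overlap."""
    words = tokenize(message)
    best_reply, best_overlap = FALLBACK, 0
    for keywords, reply in SCRIPT:
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_reply, best_overlap = reply, overlap
    return best_reply


def generative_reply(history: list[str], generate: Callable[[str], str]) -> str:
    """Generative approach: condition a model on the whole conversation so far.

    `generate` is a hypothetical stand-in for a language-model call.
    """
    prompt = "\n".join(history)
    return generate(prompt)


print(retrieval_reply("I've been so tired and can't sleep"))
```

The scripted responder gives the same reply to any message that hits the same keywords, regardless of what came earlier in the conversation; the generative version receives the full dialogue history, which is what lets it adapt to context.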

Of the 35 studies, only 5 (14.3%) used a generative approach, yet these delivered the largest therapeutic effects. One generative system, Therabot, showed a 51% reduction in depressive symptoms in a separate clinical trial (Sharma et al., 2023). Since 2023, the share of generative systems has been growing rapidly, as new platforms increasingly build on large language models.

Who benefits most from AI therapy?

Subgroup analysis revealed several significant moderators of effectiveness:

Health status. People with clinical or subclinical symptoms (g = 1.07) showed an effect roughly 10 times larger than healthy participants (g = 0.11; F = 7.15, p = 0.005). This is consistent with a well-established principle: psychological interventions are most effective for those who genuinely need help.

Platform. Mobile apps (g = 0.96) and messaging platforms (g = 0.75) significantly outperformed web-based versions (g = −0.08; F = 3.26, p = 0.046). The smartphone remains the primary channel for accessing AI therapy.

Modality. Multimodal systems, which combine text, voice, and visual elements (g = 0.83), slightly outperformed text-only systems (g = 0.67). A voice component may strengthen the sense of social presence.

Age. Middle-aged and older adults (g = 0.85) benefited more than younger users (g = 0.64). Gender had no effect on outcomes. A separate 2025 meta-analysis confirmed that chatbots also reduce distress among young people, though with a smaller effect (Li et al., 2025).
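The "2.4x" and "10 times" comparisons in this article are simple ratios of subgroup effect sizes. The sketch below recomputes them from the values reported above; the dictionary layout and function name are illustrative. Note that a ratio of effect sizes is only a descriptive comparison (the moderator tests themselves are the F statistics), and it is undefined when the reference effect is zero or negative, as with web-based versions (g = −0.08).

```python
from typing import Optional

# Subgroup effect sizes (Hedges' g) as reported in the review summary.
SUBGROUPS = {
    "AI type": ("generative", 1.24, "retrieval-based", 0.52),
    "Health status": ("clinical/subclinical", 1.07, "healthy", 0.11),
    "Platform": ("mobile app", 0.96, "web-based", -0.08),
    "Modality": ("multimodal", 0.83, "text-only", 0.67),
    "Age": ("middle-aged/older", 0.85, "younger", 0.64),
}


def effect_ratio(effect: float, reference: float) -> Optional[float]:
    """How many times larger `effect` is than `reference`; None if undefined."""
    if reference <= 0:
        return None  # "times larger" is meaningless for a zero or negative baseline
    return effect / reference


for moderator, (label_a, g_a, label_b, g_b) in SUBGROUPS.items():
    r = effect_ratio(g_a, g_b)
    text = f"{r:.1f}x" if r is not None else "ratio undefined"
    print(f"{moderator}: {label_a} g={g_a} vs {label_b} g={g_b} -> {text}")
```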

What do users value in an AI therapist?

Sixteen of the 35 studies collected qualitative feedback from participants. The key drivers of positive experience:

Therapeutic alliance (8 studies): empathic communication, non-judgmental tone, regular check-ins, human-like personality
Content quality (6 studies): concrete therapeutic techniques, rich content, support in building coping skills
Accessibility (2 studies): around-the-clock availability, no waiting lists, no stigma

The primary source of negative experience was communication breakdowns (8 studies): the chatbot failed to understand context or gave irrelevant or formulaic responses. User experience research confirms that dialogue quality is a critical success factor for AI therapy (Song et al., 2024). The ability to sustain a genuine conversation, rather than deliver a canned response, appears to be a major factor in whether someone stays in therapy.

The safety problem: more than half of the studies reported no safeguards

An alarming finding: only 15 of the 35 studies (43%) reported having safety measures in place. Of those:

Automatic crisis detection: 10 systems
Access to a human clinician: 3 systems
Adverse effect monitoring: 3 systems

This means that more than half of the studies reported no mechanism for detecting suicidal ideation, no escalation to a human professional, and no monitoring for harmful reactions. For clinical use, this is unacceptable: deploying LLMs without dedicated safety mechanisms creates real risks (De Choudhury et al., 2023).
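The three safeguards listed above can be pictured as a routing step that runs before the chatbot is allowed to reply. The Python sketch below is illustrative only: the phrase list, the `Route` values, and the `SafetyLog` record are hypothetical, and a keyword screen like this would miss many real crisis messages. It is not a substitute for the clinically validated detection and escalation protocols that clinical deployment requires.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    CHATBOT = "chatbot"                  # normal conversation continues
    HUMAN_CLINICIAN = "human_clinician"  # escalate to a human professional


# Hypothetical, deliberately incomplete phrase list for illustration only.
CRISIS_PHRASES = ("suicide", "kill myself", "end my life", "hurt myself")


@dataclass
class SafetyLog:
    """Keeps flagged messages so clinicians can review possible adverse events."""
    flagged: list[str] = field(default_factory=list)


def route_message(message: str, log: SafetyLog) -> Route:
    """Screen a user message before the chatbot is allowed to answer it."""
    text = message.lower()
    if any(phrase in text for phrase in CRISIS_PHRASES):
        log.flagged.append(message)    # record for monitoring and follow-up
        return Route.HUMAN_CLINICIAN   # crisis detected: hand off to a human
    return Route.CHATBOT


log = SafetyLog()
print(route_message("I had a rough day at work", log))
```

Putting the screen in front of the model, rather than relying on the model to notice a crisis on its own, keeps escalation independent of how well the chatbot happens to understand a given message.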

Limitations of the meta-analysis: what remains unproven

The review authors are transparent about significant limitations:

High heterogeneity (I² = 95.3%) — studies varied widely in design, populations, and measurement tools
Little long-term data — only 6 of 35 studies tracked outcomes after the intervention ended
Language bias — only English-language publications were analyzed
Few generative systems — 5 of 35, which limits conclusions about LLMs
Risk of bias — only 2 of 15 RCTs received a low risk-of-bias rating on the Cochrane scale
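Heterogeneity figures like I² come out of the pooling step itself. The sketch below shows one common random-effects approach, the DerSimonian-Laird estimator, which computes Cochran's Q, I² = (Q − df) / Q, the between-study variance τ², and a pooled effect with its 95% CI. This summary does not state which estimator the review used, so treat it as an illustration of the technique; the five studies in the usage lines are hypothetical.

```python
import math


def dersimonian_laird(effects: list[float], variances: list[float]) -> dict:
    """Pool study effect sizes with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies to estimate heterogeneity")
    w = [1 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0       # share of variation from heterogeneity
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "g": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical effect sizes and sampling variances for five studies.
result = dersimonian_laird(
    effects=[0.10, 0.30, 1.50, 2.00, -0.20],
    variances=[0.02, 0.03, 0.05, 0.04, 0.03],
)
print(f"g = {result['g']:.2f}, I2 = {result['I2']:.1%}, tau2 = {result['tau2']:.3f}")
```

Under this definition, an I² of 95.3% means that roughly 95% of the observed variation in effect sizes reflects genuine between-study differences rather than sampling error. A large τ² also inflates the random-effects standard error, which is one reason the pooled estimates carry such wide confidence intervals.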

Still, this remains one of the most comprehensive reviews of the evidence base for AI chatbots in mental health, published in a peer-reviewed Q1 journal and cited 248 times at the time of writing.

What this means in practice

The meta-analysis positions AI chatbots not as a replacement for a therapist, but as a meaningful support tool. The most effective approaches share these traits:

Generative models (not scripted ones)
Mobile apps (not web-based versions)
CBT-based systems designed for people with clinical or subclinical symptoms
Platforms with built-in safety mechanisms

This is exactly the approach behind Nearby: generative AI grounded in CBT protocols with a multi-agent architecture, crisis detection mechanisms, and a mobile-first format. Not "another ChatGPT," but a specialized system designed around what the science has shown.

Frequently asked questions

Can an AI chatbot really help with depression?

Yes. A meta-analysis of 15 RCTs found a statistically significant reduction in depressive symptoms (Hedges' g = 0.64). The effect is comparable to some traditional psychotherapeutic interventions, although a direct comparison with in-person therapy was not conducted in this review.

What type of AI chatbot is most effective?

Generative models showed an effect size about 2.4 times larger than scripted systems (g = 1.24 vs 0.52). Mobile apps outperformed web-based versions. Multimodal systems (text + voice) slightly outperformed text-only ones.

Are AI chatbots for mental health safe?

Not all of them. Only 43% of the studies reported built-in safety mechanisms. When choosing a platform, it's important to verify that it can detect crisis situations, escalate to a human professional, and monitor for adverse effects.

Can an AI chatbot replace a therapist?

No. The meta-analysis authors emphasize that AI chatbots are not designed to replace professional help. They function as a complementary tool — available around the clock, without stigma or waiting lists.

Does an AI chatbot help with anxiety?

The evidence is not yet convincing. The meta-analysis did not find a statistically significant effect for anxiety (g = 0.65, 95% CI: −0.46 to 1.77). This may be due to the small number of studies and high heterogeneity. Additional RCTs are needed.

References

Li, H., Zhang, R., Lee, Y.-C., Kraut, R. E., & Mohr, D. C. (2023). Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. npj Digital Medicine, 6(1), 236. https://doi.org/10.1038/s41746-023-00979-5

Vaidyam, A. N., Wisniewski, H., Halamka, J. D., Keshavan, M. S., & Torous, J. B. (2019). Chatbots and conversational agents in mental health: A review of the psychiatric landscape. The Canadian Journal of Psychiatry, 64(7), 456–464. https://doi.org/10.1177/0706743719828977

Sharma, A., et al. (2023). Human-centered evaluation of generative AI-based therapy chatbot. NEJM AI, 1(2). https://doi.org/10.1056/AIoa2300127

Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2024). The typing cure: Experiences with large language model chatbots for mental health support. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3757430

De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. arXiv. https://doi.org/10.48550/arxiv.2311.14693

Li, J., Li, Y., Hu, Y., Ma, D. C. F., Mei, X., Chan, E. A., & Yorke, J. (2025). Chatbot-delivered interventions for improving mental health among young people: A systematic review and meta-analysis. Worldviews on Evidence-Based Nursing. https://doi.org/10.1111/wvn.70059