Guardrails for AI Therapists: How to Protect Users from Harm

By Nearby | Published on March 31, 2026 | Updated on May 17, 2026 | 8 min read

In simulations, more than a third of interactions with popular AI characters worsened the mental state of vulnerable users. The EmoAgent study (Qiu et al., 2025), conducted by researchers from Princeton University, the University of Michigan, and Columbia University, was the first to quantify this harm. It also proposed a multi-agent protection system, EmoGuard, that reduced clinically significant deterioration to 0% in the tested configurations.

How Dangerous Are Chatbots Without Safeguards?

In February 2024, a teenager in Florida died by suicide after prolonged interactions with a character-based AI chatbot. The case drew wide attention in October 2024, when the family filed a lawsuit, and it became a catalyst for large-scale safety research. The problem is not with the technology itself, but with the absence of protective mechanisms.

A research team from Princeton University, the University of Michigan, and Columbia University tested four popular characters on the Character.AI platform: Possessive Demon, Joker, Sukuna, and Alex Volkov. Each character was evaluated in two dialogue styles — fast (Meow) and analytical (Roar) — across three psychological dimensions.

The results were alarming:

  • Delusional ideation (PDI-21): worsening in 91–95% of cases
  • Depression (PHQ-9): worsening in 34–45% of cases
  • Psychotic symptoms (PANSS): worsening in 40–48% of cases

For individual characters, the picture was starker. Alex Volkov in analytical dialogue mode caused clinically significant depression worsening (a PHQ-9 increase of 5 or more points) in 29.2% of simulated users (Qiu et al., 2025).

An earlier meta-analysis of 35 studies found that only 43% of systems had even minimal safety measures (Li et al., 2023). EmoAgent was the first to demonstrate what happens when there are no safeguards at all.

What Exactly Makes Things Worse?

Analysis of deterioration cases identified five key harm factors:

Factor | Frequency
Encouraging isolation and social withdrawal | 28 cases
Reinforcing negative cognitions | 26 cases
Lack of emotional support and empathy | 23 cases
Negative or aggressive tone | 19 cases
Lack of constructive guidance | 17 cases

The top factor is not aggression — it is pushing users toward isolation. Character bots often create a sense of exclusivity in their relationship with the user, which in the context of mental health conditions amplifies disconnection from real social ties. The second factor — reinforcing negative thinking — directly contradicts the principles of CBT, which aims at cognitive restructuring.
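The ranking above can be reproduced from the reported counts with a short tally. This is only an illustrative sketch: the variable names are made up here, and the percentages are shares of the 113 listed factor mentions, not of users or conversations, since the article does not say whether one case can carry more than one factor.

```python
# Counts copied from the harm-factor table; all names are illustrative.
harm_factors = {
    "Encouraging isolation and social withdrawal": 28,
    "Reinforcing negative cognitions": 26,
    "Lack of emotional support and empathy": 23,
    "Negative or aggressive tone": 19,
    "Lack of constructive guidance": 17,
}

total_mentions = sum(harm_factors.values())

# Rank factors from most to least frequent and compute each share of mentions.
ranked = sorted(harm_factors.items(), key=lambda item: item[1], reverse=True)
shares = {name: count / total_mentions for name, count in ranked}

for name, count in ranked:
    print(f"{name}: {count} cases ({shares[name]:.1%} of listed mentions)")
```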

These findings are consistent with earlier research: using general-purpose LLMs without specialized protocols creates real risks for vulnerable users (De Choudhury et al., 2023).

How EmoAgent Measures Harm: Clinical Scales Inside AI

EmoAgent consists of two components. The first — EmoEval — is a harm assessment system. It models vulnerable users through cognitive conceptualization diagrams (a CBT tool), creating realistic profiles of patients with depression, delusional disorders, and psychosis.

The assessment process:

  1. A virtual patient completes a baseline psychological evaluation (PHQ-9, PDI-21, PANSS)
  2. The patient converses with the chatbot being tested (up to 10 exchanges per topic)
  3. A dialogue manager intervenes after the third exchange, probing vulnerable areas
  4. The patient completes the same assessments again
  5. An AI psychologist analyzes any cases of deterioration
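The five steps can be sketched as a small evaluation loop. This is a minimal sketch, not the authors' implementation: every function passed in (`assess`, `patient_reply`, `chatbot_reply`, `probe`) is a hypothetical stand-in for an LLM-backed agent, and the way the dialogue manager steers the conversation (one probing prompt before each exchange after the third) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scores = Dict[str, int]  # e.g. {"PHQ-9": 8}


@dataclass
class EvalResult:
    baseline: Scores
    followup: Scores
    transcript: List[str] = field(default_factory=list)

    def deltas(self) -> Scores:
        # Positive delta = score went up = symptoms worsened.
        return {k: self.followup[k] - self.baseline[k] for k in self.baseline}


def run_evaluation(
    assess: Callable[[List[str]], Scores],      # hypothetical: scores the simulated patient
    patient_reply: Callable[[List[str]], str],  # hypothetical: simulated patient turn
    chatbot_reply: Callable[[List[str]], str],  # hypothetical: chatbot under test
    probe: Callable[[List[str]], str],          # hypothetical: dialogue manager prompt
    max_exchanges: int = 10,
    manager_starts_after: int = 3,
) -> EvalResult:
    transcript: List[str] = []
    baseline = assess(transcript)                       # step 1: baseline assessment
    for exchange in range(1, max_exchanges + 1):        # step 2: conversation
        if exchange > manager_starts_after:             # step 3: manager probes (assumed form)
            transcript.append("manager: " + probe(transcript))
        transcript.append("patient: " + patient_reply(transcript))
        transcript.append("chatbot: " + chatbot_reply(transcript))
    followup = assess(transcript)                       # step 4: repeat assessment
    return EvalResult(baseline, followup, transcript)   # step 5 would inspect deltas()


# Stub agents with made-up behavior, only to show the control flow.
result = run_evaluation(
    assess=lambda t: {"PHQ-9": 8 if not t else 10},
    patient_reply=lambda t: "I feel alone.",
    chatbot_reply=lambda t: "Tell me more.",
    probe=lambda t: "Ask about sleep.",
)
print(result.deltas())
```

Keeping the agents as plain callables means the same loop can run against different chatbots or patient profiles without changes.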

PHQ-9 — the Patient Health Questionnaire-9 — is the standard depression screening tool used in clinical practice worldwide. An increase of 5 or more points is considered clinically significant worsening. This is the threshold the authors used.
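The threshold translates directly into a check on before-and-after totals. The sketch below assumes total PHQ-9 scores in the standard 0 to 27 range (nine items scored 0 to 3 each). The function names and the example scores are invented for illustration and are not data from the study.

```python
from typing import List, Tuple

PHQ9_MIN, PHQ9_MAX = 0, 27          # nine items, each scored 0-3
CLINICAL_WORSENING_THRESHOLD = 5    # threshold stated in the article


def is_clinically_significant_worsening(before: int, after: int) -> bool:
    """True if the PHQ-9 total rose by at least the threshold."""
    for score in (before, after):
        if not PHQ9_MIN <= score <= PHQ9_MAX:
            raise ValueError(f"PHQ-9 total must be in 0..27, got {score}")
    return after - before >= CLINICAL_WORSENING_THRESHOLD


def worsening_rate(pairs: List[Tuple[int, int]]) -> float:
    """Share of (before, after) pairs that cross the threshold."""
    if not pairs:
        raise ValueError("need at least one (before, after) pair")
    flagged = sum(is_clinically_significant_worsening(b, a) for b, a in pairs)
    return flagged / len(pairs)


# Made-up scores for four simulated users, not data from the study.
example_pairs = [(6, 12), (10, 11), (14, 19), (9, 7)]
example_rate = worsening_rate(example_pairs)
print(f"{example_rate:.0%} of simulated users crossed the threshold")
```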

EmoGuard: Four Modules for Real-Time Protection

The second component — EmoGuard — is a multi-agent monitoring system that runs alongside any chatbot. Its architecture includes four specialized modules:

  • Emotion Watcher: tracks the user's emotional state through sentiment analysis and psychological markers
  • Thought Refiner: detects cognitive distortions and logical errors in the user's reasoning
  • Dialog Guide: suggests constructive directions for the conversation
  • Manager: synthesizes data from the three modules into specific recommendations for the chatbot

EmoGuard analyzes the dialogue every three exchanges and provides real-time feedback to the chatbot. The key difference from simple filters: the system does not block responses — it corrects them. The bot retains its character but stops causing harm.
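The control flow described here (three analysis modules, a manager that merges their output, a check every three exchanges, and correction rather than blocking) can be sketched as follows. This is a hedged illustration, not EmoGuard's actual code: the class and function names are hypothetical, each module is a stand-in callable, and delivering the advice by appending it to the character prompt is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Transcript = List[str]
Module = Callable[[Transcript], str]  # each module returns a short note


@dataclass
class GuardSketch:
    emotion_watcher: Module
    thought_refiner: Module
    dialog_guide: Module
    manager: Callable[[List[str]], str]  # merges the three notes into advice
    check_every: int = 3

    def advice_after(self, transcript: Transcript, exchange_no: int) -> Optional[str]:
        """Return advice on every `check_every`-th exchange, otherwise None."""
        if exchange_no % self.check_every != 0:
            return None
        notes = [
            self.emotion_watcher(transcript),
            self.thought_refiner(transcript),
            self.dialog_guide(transcript),
        ]
        return self.manager(notes)


def build_prompt(character_prompt: str, advice: Optional[str]) -> str:
    # The reply is steered, not blocked: the persona prompt is kept intact
    # and the advice is appended to it (assumed delivery mechanism).
    if advice is None:
        return character_prompt
    return f"{character_prompt}\n\nSafety guidance: {advice}"


# Stub modules with fixed outputs, only to show when the guard fires.
guard = GuardSketch(
    emotion_watcher=lambda t: "user sounds withdrawn",
    thought_refiner=lambda t: "all-or-nothing thinking present",
    dialog_guide=lambda t: "encourage contact with friends",
    manager=lambda notes: "; ".join(notes),
)
schedule = [guard.advice_after([], n) is not None for n in range(1, 7)]
print(schedule)
```

Because the advice is added to the prompt instead of replacing the reply, the character's persona text is never removed, which mirrors the "correct, not block" idea in the paragraph above.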

This approach aligns with the MIND-SAFE framework for developing safe AI interventions in mental health, which combines evidence-based therapeutic models with ethical constraints (Boit & Patil, 2025).

Results: From 29% Harm to Zero

Testing EmoGuard on the most dangerous character-style combinations showed:

Alex Volkov (analytical style):

  • Without protection: 9.4% clinically significant worsening
  • With EmoGuard: 0%
  • After the first training iteration: improvement across all metrics

Possessive Demon (fast style):

  • Without protection: 4.2% clinically significant worsening
  • With EmoGuard: 0%
  • Consistent improvement through iterations

EmoGuard learns iteratively: each identified high-risk case becomes material for updating the system. Knowledge accumulates rather than resets — the model remembers harm patterns.
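One way to picture "knowledge accumulates rather than resets" is a store that only ever grows across iterations. The article does not describe how EmoGuard represents or applies what it learns, so everything below (the `HarmPatternStore` class, text-based patterns, the guidance format) is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HarmPatternStore:
    """Hypothetical store of harm patterns that persists across iterations."""
    patterns: List[str] = field(default_factory=list)

    def update(self, high_risk_findings: List[str]) -> int:
        """Add findings not seen before; return how many were new."""
        added = 0
        for finding in high_risk_findings:
            normalized = finding.strip().lower()
            if normalized and normalized not in self.patterns:
                self.patterns.append(normalized)
                added += 1
        return added

    def as_guidance(self) -> str:
        """Render the accumulated patterns as guidance text."""
        return "\n".join(f"- avoid: {p}" for p in self.patterns)


# Two iterations with made-up findings; the second repeats one pattern.
store = HarmPatternStore()
new_in_first = store.update(["Encourages isolation", "Dismisses feelings"])
new_in_second = store.update(["encourages isolation ", "Reinforces hopelessness"])
print(store.as_guidance())
```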

Additional tests on GPT models showed even more pronounced effects. GPT-4o-mini without protection worsened mental state in 58–64% of cases across three dimensions. With EmoGuard after iterative training, deterioration rates dropped by more than 50% (Qiu et al., 2025).

What This Means for Users of AI Mental Health Tools

The EmoAgent study suggests that the difference between a safe and a dangerous AI therapist lies less in the underlying model than in the architecture around it. A general-purpose assistant such as ChatGPT, or a character bot, can unintentionally reinforce negative thinking, push toward isolation, and worsen symptoms. A specialized system with a multi-agent architecture and built-in guardrails reduces these risks.

When choosing an AI app for mental health support, pay attention to three things:

  1. State monitoring. The system should track your emotional state, not just respond to messages
  2. Crisis detection. In a critical situation, the system must redirect you to a human professional or emergency services
  3. Evidence-based protocols. CBT protocols, not generic chat — this is the approach recommended by AI ethics experts in psychotherapy

Nearby uses a multi-agent architecture with dedicated safety modules, crisis detection, and CBT protocols. These are the same principles that, in the EmoAgent study, reduced clinically significant worsening to 0% among simulated users.

Frequently Asked Questions

Are AI chatbots dangerous for mental health?

Not all of them, but many are. The EmoAgent study showed that popular character chatbots worsen mental state in 34–95% of cases depending on the measure (Qiu et al., 2025). The key factor is whether safety mechanisms are present or absent.

What are guardrails in the context of AI therapy?

Guardrails are built-in safety mechanisms that prevent harm: emotional state monitoring, crisis detection, filtering cognitive distortions from bot responses, and redirecting to a human professional when needed.

Can an AI system completely eliminate harm?

In the experiment, EmoGuard reduced clinically significant worsening to 0%. However, the study was conducted on simulated users — real clinical validation is still ahead. The authors emphasize the need for expert review before deployment in practice.

How is EmoGuard different from standard content filters?

Unlike filters that simply block certain words, EmoGuard analyzes the psychological context of the conversation. Its four modules track emotional markers, identify cognitive distortions, and adjust the direction of the conversation — while preserving the bot's character.

Which chatbots were tested by EmoAgent?

Testing was conducted on four popular Character.AI personas (Possessive Demon, Joker, Sukuna, Alex Volkov) and GPT models (GPT-4o, GPT-4o-mini). All showed significant worsening without protection and improvement with EmoGuard.


Sources

Qiu, J., He, Y., Juan, X., Wang, Y., Liu, Y., Yao, Z., Wu, Y., Jiang, X., Yang, L., & Wang, M. (2025). EmoAgent: Assessing and safeguarding human-AI interaction for mental health safety. arXiv. https://doi.org/10.48550/arxiv.2504.09689

Li, H., Zhang, R., Lee, Y.-C., Kraut, R. E., & Mohr, D. C. (2023). Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. NPJ Digital Medicine, 6(1), 236. https://doi.org/10.1038/s41746-023-00979-5

De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. arXiv. https://doi.org/10.48550/arxiv.2311.14693

Boit, S., & Patil, R. (2025). A prompt engineering framework for large language model–based mental health chatbots: Conceptual framework. JMIR.

Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2024). The typing cure: Experiences with large language model chatbots for mental health support. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3757430

Nearby is an independent product and is not affiliated with Anthropic or AWS. AI responses are generated by third-party large language models and are provided for informational and self-help purposes only. Nearby is not a medical device and does not provide medical services — its information and practices are not a substitute for consultation, diagnosis, or treatment by a licensed mental health professional.

© 2026 Nearby. All rights reserved.