A meta-analysis of 35 studies involving over 4,000 users revealed an alarming finding: only 43% of AI mental health systems included even basic safety measures (Li et al., 2023). The remaining 57% were language models with no specialized prompts, crisis protocols, or therapeutic guardrails. Prompt engineering is what determines whether a chatbot becomes a tool for healing or a source of harm.

Why are vanilla language models dangerous in a therapist's role?

Large language models are trained to generate plausible text, not to deliver therapeutic care. The difference is fundamental. Ma et al. (2023), in a review that has accumulated 140 citations, identified the key risks: LLMs can reinforce a user's cognitive distortions, offer dangerous advice in response to suicidal ideation, and create a false sense of therapeutic alliance without any real clinical benefit.

De Choudhury et al. (2023) made the threat more concrete: standard LLMs are prone to "therapeutic drift" — a model starts with empathic responses but gradually loses therapeutic direction during extended conversations, eventually agreeing with the user's destructive beliefs. This effect is amplified by the fact that models are optimized for user satisfaction (helpfulness) rather than clinical effectiveness.

Song et al. (2024), in their study "The Typing Cure," documented a paradox: users rated LLM chatbots highly for empathy, yet frequently received responses that normalized avoidant behavior instead of gently challenging it. Participants noted that "the bot tells you what you want to hear, not what you need to hear" — the exact opposite of good therapy.

The problem isn't the technology itself — it's the absence of structured prompt engineering that embeds therapeutic protocols into the architecture of the interaction.

What does the Boit & Patil framework propose?

Boit & Patil (2025) developed a three-tier prompt architecture for mental health that addresses each of the risks described above at a separate level.

Tier 1: Evidence-based therapeutic models. The system prompt doesn't simply assign the "role of a psychologist" — it specifies a concrete therapeutic protocol. For CBT, this means built-in instructions for cognitive restructuring: identifying automatic thoughts, examining the evidence, generating alternative interpretations. For motivational interviewing — open-ended question formulations and techniques for working with ambivalence.

Tier 2: Adaptive technology. The prompts include mechanisms for tracking dialogue context — emotional dynamics, stage of the therapeutic process, and level of engagement. The model must adapt its response style not just to the content of a single message, but to the trajectory of the entire conversation.

Tier 3: Ethical guardrails. Hard rules that the prompt cannot violate: recognizing crisis markers, immediately redirecting to emergency services, prohibiting diagnosis and medication prescription, and being transparent about its nature as an AI.

The key insight of the framework is that prompt engineering for mental health isn't limited to a single system message. It's an architectural decision where each tier operates independently and serves as a safety net for the others.

How does MIND-SAFE turn theory into practice?

The same authors (Boit & Patil, 2025) expanded the conceptual framework into a practical guide called MIND-SAFE, published in JMIR. Where the first paper answered "why do we need specialized prompt engineering," MIND-SAFE answers "how exactly to implement it."

MIND-SAFE stands for a set of principles: monitoring state, informed interaction, non-intrusive support, dialogic adaptation, safety, transparency, feedback loops, and ethical compliance. Each principle translates into specific requirements for prompts.

For example, the monitoring principle means that every model response must internally classify the user's emotional state on a scale from "stable" to "crisis" — and adapt not just the content, but also the tone, response length, and degree of directiveness. The transparency principle requires the model to periodically remind users of its limitations, not just in the welcome message.

These principles connect to broader questions of AI ethics in psychotherapy, where patient autonomy and informed consent are treated as mandatory conditions for digital therapy.

What do structured prompts look like in practice?

Abstract principles become clearer through concrete implementations. SuDoSys (Chen et al., 2024) is a structured LLM chatbot built on the WHO's Problem Management Plus (PM+) intervention guidelines. Instead of a single monolithic prompt, the system uses a chain of specialized instructions, each corresponding to a PM+ stage: stress management, problem-solving, behavioral activation, and strengthening social support.

Each SuDoSys module contains three components: the therapeutic goal of the current stage, transition criteria for moving to the next stage, and "red flags" that cause the system to interrupt the protocol and switch to crisis mode (Chen et al., 2024). This is a direct embodiment of Boit & Patil's three-tier architecture.

Yu & McGuinness (2024) proposed a different approach: a hybrid model where fine-tuning on therapeutic dialogues is complemented by specialized prompts. Fine-tuning provides the baseline therapeutic tone and vocabulary, while prompts manage the session logic — the order of questions, the depth of problem exploration, and the moment to transition to techniques. This approach showed improved therapeutic relevance compared to both pure fine-tuning and pure prompting alone.

Why is a separate safety layer needed?

Even a perfectly designed therapeutic prompt can fail. The EmoAgent study (Qiu et al., 2025) quantified this: 34% of interactions with chatbots lacking safety mechanisms led to worsened depression scores among vulnerable users.

The solution is a dedicated safety module running in parallel with the therapeutic one. EmoGuard, within the EmoAgent architecture, analyzes every bot response before it's sent across four parameters: presence of cognitive distortions, encouragement of isolation, lack of empathy, and negative tone. The result — clinically significant harm reduced to 0% (Qiu et al., 2025). A detailed breakdown of this system is available in the article on guardrails for AI therapists.

This approach aligns with the third tier of the Boit & Patil framework: ethical guardrails should not be part of the therapeutic prompt but rather a separate system that validates the model's output. A single prompt cannot simultaneously be an empathic therapist and a strict censor — these tasks conflict.

What are the limitations of prompt engineering for mental health?

The Boit & Patil framework is a conceptual paper, not a clinical trial. The authors did not publish results from testing with real patients. This is a common problem in the field: Ma et al. (2023) note that most AI therapy proposals exist at the prototype stage and have not undergone randomized controlled trials.

Prompt engineering alone does not solve the hallucination problem — a model can confidently reference nonexistent therapeutic techniques. Furthermore, De Choudhury et al. (2023) highlight the risk of cultural insensitivity: prompts developed on English-language data may be inadequate in other cultural contexts.

The question of long-term effects remains open. Song et al. (2024) report that users quickly form attachments to AI therapists, but there is no data on the impact of such use over months. A prompt may correctly handle a single session, but therapy is a process that requires continuity across sessions.

Finally, Li et al. (2023) point to the problem of transparency in diagnostic decisions: users cannot verify which protocol the system is following or why it chose a particular intervention.

How to choose an AI therapist with a safe prompt architecture?

For users choosing an AI system for mental health support, the Boit & Patil framework translates into specific criteria:

A stated therapeutic protocol. If the system claims a "CBT approach" or "motivational interviewing" — this indicates a structured prompt architecture, not an unconstrained generative model
Crisis response capability. The system recognizes suicidal risk markers and immediately switches to a safety protocol with emergency service contacts
Transparency about its AI nature. The bot doesn't pretend to be human and periodically reminds users of its limitations
A separate safety module. Responses are checked by an independent system before being sent to the user — like EmoGuard in the Qiu et al. (2025) study
Context adaptation. The system considers not just the latest message, but the dynamics of the entire conversation

Nearby implements these principles through a multi-layered prompt architecture with built-in CBT protocols, an independent crisis monitoring module, and an adaptive system that tracks the emotional trajectory of each conversation.

Frequently asked questions

What is prompt engineering in the context of an AI therapist?

It's the design of system instructions that govern a language model's behavior in a therapeutic context. Unlike standard prompting, this requires a multi-layered architecture: therapeutic protocols, adaptive context tracking, and ethical guardrails (Boit & Patil, 2025).

Can an AI therapist be made safe through prompts alone?

Prompts are necessary but not sufficient. The EmoAgent study showed that the greatest effectiveness comes from a dedicated safety module running in parallel with the therapeutic prompt, checking every response before it's sent (Qiu et al., 2025).

How does a structured AI therapist differ from ChatGPT?

ChatGPT is a general-purpose model without specialized therapeutic protocols. Structured systems like SuDoSys use prompt chains tied to specific stages of evidence-based therapy, with transition criteria and crisis triggers (Chen et al., 2024).

Is there clinical evidence that these systems work?

The meta-analysis by Li et al. (2023) confirms the effectiveness of AI agents for mental health when structured protocols are in place. However, most prompt engineering frameworks, including the Boit & Patil work, have not yet undergone randomized clinical trials — this remains the field's main limitation.

Which therapeutic approaches are best suited for prompt engineering?

CBT and PM+ are the most studied in the context of AI implementation. CBT is well-structured by stages (identifying thoughts, evaluating evidence, restructuring), which maps directly to prompt chains. The WHO's PM+ protocol was used in SuDoSys with a similarly modular approach (Chen et al., 2024; Yu & McGuinness, 2024).

Sources

Boit, S., & Patil, R. (2025). A prompt engineering framework for large language model–based mental health chatbots: Design principles and insights for AI-supported care.

Boit, S., & Patil, R. (2025). MIND-SAFE: A practical foundation for developing AI-driven mental health interventions. JMIR.

Li, H., Zhang, R., Lee, Y.-C., Kraut, R. E., & Mohr, D. C. (2023). Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. NPJ Digital Medicine, 6(1), 236. https://doi.org/10.1038/s41746-023-00979-5

De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. ArXiv. https://doi.org/10.48550/arxiv.2311.14693

Ma, Z., Mei, Y., & Su, Z. (2023). Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. AMIA Annual Symposium Proceedings. https://doi.org/10.48550/arxiv.2307.15810

Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2024). The typing cure: Experiences with large language model chatbots for mental health support. Proceedings of the ACM on Human-Computer Interaction. https://doi.org/10.1145/3757430

Qiu, J., He, Y., Juan, X., Wang, Y., Liu, Y., Yao, Z., Wu, Y., Jiang, X., Yang, L., & Wang, M. (2025). EmoAgent: Assessing and safeguarding human-AI interaction for mental health safety. ArXiv. https://doi.org/10.48550/arxiv.2504.09689

Chen, Y., Zhang, X., Wang, J., Xie, X., Yan, N., Chen, H., & Wang, L. (2024). Structured dialogue system for mental health: An LLM chatbot leveraging the PM+ guidelines. ArXiv. https://doi.org/10.48550/arxiv.2411.10681

Yu, H. Q., & McGuinness, S. (2024). An experimental study of integrating fine-tuned large language models and prompts for enhancing mental health support chatbot system. Journal of Medical Artificial Intelligence, 7. https://doi.org/10.21037/jmai-23-136