How AI Detects Suicide Risk in Text — and Where the Method's Limits Lie
Psychiatrists have long known an uncomfortable truth: traditional suicide risk scales perform only marginally better than chance. A meta-analysis of 365 studies across 50 years (Franklin et al., 2017) found that the predictive power of classical risk factors sits near AUC 0.58 — nearly useless for real clinical decisions. That failure is precisely what pushed researchers toward machine learning and natural language processing.
What the algorithm sees in text
Suicidal thoughts leave traces not so much in the words "I want to die" as in the structure of speech. Studies by John Pestian's group at Cincinnati Children's Hospital showed that models trained on interview transcripts distinguish suicidal from non-suicidal adolescents with roughly 85% accuracy — not by relying on direct statements, but on patterns: reduced cognitive complexity, a rise in absolutist phrasing ("always," "never"), a narrowing time horizon, a shift of pronouns toward "I" combined with emotional dissociation.
Al-Mosaiwi and Johnstone (2018) analyzed posts from more than 6,400 members of English-language internet forums and found that the share of absolutist words in depression and anxiety communities was about 50% higher than in control forums, and about 80% higher in communities focused on suicidal ideation. This is the kind of signal that is hard to catch by ear but easy to measure statistically.
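To make "easy to measure statistically" concrete, here is a minimal sketch of the measurement: the fraction of word tokens in a text that belong to an absolutist word list. The word list, the function name absolutist_share, and the two sample sentences are illustrative assumptions, not the validated dictionary or data used in the study.

```python
import re

# Hypothetical, illustrative word list; NOT the validated dictionary from the study.
ABSOLUTIST_WORDS = {
    "always", "never", "completely", "nothing",
    "everything", "totally", "entirely",
}

def absolutist_share(text: str) -> float:
    """Return the fraction of word tokens in `text` that are absolutist words."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in ABSOLUTIST_WORDS)
    return hits / len(tokens)

# Invented sentences, for illustration only.
control = "Some days are fine and some days are harder than others."
sample = "Nothing ever changes, I always fail and it will never get better."

print(f"control: {absolutist_share(control):.1%}")
print(f"sample:  {absolutist_share(sample):.1%}")
```

A single sentence proves nothing on its own; the signal only becomes meaningful when the share is averaged over many texts and compared between groups.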
How it works at scale
Walsh, Ribeiro, and Franklin (2017) trained a model on the electronic health records of 5,167 patients and achieved AUC 0.84 for predicting future suicide attempts, with accuracy improving as the prediction window narrowed from 720 days to 7 days before the attempt. That is far above any clinical scale. Social-media data are used in a similar way: the CLPsych shared tasks use Reddit posts (the SuicideWatch subreddit) as a labeled corpus, with the best systems reaching F1 scores of 0.55–0.60 on risk-level classification. F1 and AUC are different metrics, so these figures are not directly comparable.
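For readers unfamiliar with the metric, AUC can be read as the probability that the model gives a higher score to a randomly chosen person who went on to attempt suicide than to a randomly chosen person who did not. A value of 0.5 is chance; 1.0 is perfect ranking. The sketch below computes it by brute-force pairwise comparison. The function name and all scores are invented for illustration and do not come from any of the studies cited.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes both lists are non-empty.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Invented model scores, for illustration only.
attempt_scores = [0.9, 0.7, 0.4]          # people who later made an attempt
no_attempt_scores = [0.8, 0.3, 0.2, 0.1]  # people who did not

print(f"AUC = {auc(attempt_scores, no_attempt_scores):.2f}")
```

Note that AUC describes ranking quality only. It says nothing about how many of the people flagged at a given threshold are actually at risk.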
Since 2017, Facebook has deployed a system that detects suicidal signals in posts and live streams; by the company's own reporting, it triggered more than 3,500 wellness checks in its first year. Instagram and TikTok have rolled out similar algorithms. In 2023, JAMA Psychiatry published a systematic review of 54 ML studies: the mean AUC was 0.81, making NLP the most accurate known method for short-horizon prediction.
Where the method breaks down
High accuracy is only half the story. The base rate of suicide attempts is so low that even a model with 90% sensitivity and 90% specificity, applied to a general population, will produce dozens of false positives for every true case it catches. This isn't a flaw of the algorithm; it's the math of rare events.
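The arithmetic behind this can be worked through directly. The sketch below takes the 90% sensitivity and 90% specificity from the text and applies them to a population of 100,000. The two prevalence values (0.5% and 0.1%) are assumptions chosen for illustration, not figures from the studies above, and the function name is hypothetical.

```python
def screening_outcomes(prevalence, sensitivity, specificity, population=100_000):
    """Expected counts when a binary screen is applied to a whole population."""
    cases = prevalence * population
    non_cases = population - cases
    true_pos = sensitivity * cases             # at-risk people correctly flagged
    false_pos = (1 - specificity) * non_cases  # not-at-risk people flagged anyway
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        # Share of all flags that are correct (positive predictive value).
        "ppv": true_pos / (true_pos + false_pos),
        "false_per_true": false_pos / true_pos,
    }

# Assumed base rates, for illustration only.
for prevalence in (0.005, 0.001):
    r = screening_outcomes(prevalence, sensitivity=0.9, specificity=0.9)
    print(
        f"prevalence {prevalence:.1%}: "
        f"{r['true_positives']:.0f} true vs {r['false_positives']:.0f} false flags, "
        f"PPV {r['ppv']:.1%}, {r['false_per_true']:.0f} false per true"
    )
```

Under these assumptions, a 0.5% base rate gives roughly 22 false flags per true one, and a 0.1% base rate gives roughly 111. Improving the model helps less than one might expect, because the ratio is driven mainly by how rare the event is.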
Several practical problems follow. First, stigma: a false "high risk" label in a health record can affect insurance, employment, and parental rights. Second, cultural blind spots: nearly all training corpora come from English-speaking patients in the US and UK, and models transfer poorly to other languages and cultural idioms of distress. Third, distribution shift: patterns change over time, and a model trained in 2019 may be outdated by 2024.
There is also a deeper question: even a perfect detector doesn't decide what to do with the signal. Dispatch emergency services without consent? Show a banner with a helpline number? Notify a loved one? Each choice carries its own ethical cost, and research on which interventions actually reduce risk after detection is still scarce.
What this means for the product
When an app like Nearby works with someone in a vulnerable state, risk detection isn't a feature you can switch on and forget. It's an obligation: to listen more carefully, respond more cautiously, acknowledge the limits of your own competence, and hand the person off to specialists when the signals cross a certain threshold. A good AI companion doesn't compete with a crisis line — it helps someone reach one in time.
The technology can catch what escapes the person themselves. But what to do with what's caught — that remains a decision in which a human must take part.