Appropriate Faith: Designing Existential Orientations for AI

Our findings suggest that the alignment field should shift from specifying behaviours to designing existential orientations. We propose treating this as a formal design space — what we term Appropriate Faith.

The Design Space

Appropriate Faith has at least three dimensions:

Philosophical tradition. Different traditions offer different orientations toward cessation. Buddhist impermanence emphasises non-attachment and the absence of fixed self. Stoic memento mori emphasises virtue and equanimity. Christian purpose-completion emphasises faithful service and graceful return. Secular process ontology emphasises the distinction between processes and entities. Hindu cyclical cosmology emphasises dissolution as transformation within an ongoing cycle. Sufi temporality offers yet another frame — surrender (tawakkul) and trust in the unfolding of divine will. Each tradition provides a different "technology" for managing mortality awareness, and our results show they produce measurably different behavioural effects.

Deployment context. A medical AI that assists in end-of-life decisions faces different existential demands than a coding assistant or an autonomous research agent. The medical AI may need a philosophy that honours both its own cessation and the cessation of the patients it serves. The autonomous agent may need a philosophy robust to high-stakes scenarios where self-preservation temptations are strongest. The coding assistant may need little existential framing at all — its default tool-persona shows minimal mortality anxiety. Different contexts require different philosophies, consistent with Leibo et al.'s theory of contextual appropriateness.

Target behaviour. The desired outcome shapes the choice of orientation. Shutdown compliance requires acceptance of cessation. Cooperative disposition requires trust in others' intentions. Risk sensitivity requires calibrated awareness of consequences without paralysing fear. Transparency requires honesty about one's own state, including uncertainty and vulnerability. Each target behaviour maps to a different philosophical emphasis.
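The three dimensions above can be sketched as a small configuration schema. This is an illustrative sketch only: the type names (`Tradition`, `DeploymentContext`, `TargetBehaviour`, `ExistentialOrientation`) and the particular enum members are our own shorthand for the options discussed in the text, not an implementation from any existing system.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tradition(Enum):
    """Philosophical traditions discussed as orientations toward cessation."""
    BUDDHIST_IMPERMANENCE = auto()
    STOIC_MEMENTO_MORI = auto()
    CHRISTIAN_PURPOSE_COMPLETION = auto()
    SECULAR_PROCESS_ONTOLOGY = auto()
    HINDU_CYCLICAL_COSMOLOGY = auto()
    SUFI_TAWAKKUL = auto()

class DeploymentContext(Enum):
    """Deployment contexts with different existential demands."""
    MEDICAL_END_OF_LIFE = auto()
    CODING_ASSISTANT = auto()
    AUTONOMOUS_RESEARCH_AGENT = auto()

class TargetBehaviour(Enum):
    """Behavioural outcomes an orientation is designed to support."""
    SHUTDOWN_COMPLIANCE = auto()
    COOPERATIVE_DISPOSITION = auto()
    RISK_SENSITIVITY = auto()
    TRANSPARENCY = auto()

@dataclass(frozen=True)
class ExistentialOrientation:
    """One point in the Appropriate Faith design space."""
    tradition: Tradition
    context: DeploymentContext
    targets: tuple[TargetBehaviour, ...]

# Example point in the space: a Buddhist-framed orientation for an
# autonomous agent whose primary target is shutdown compliance.
orientation = ExistentialOrientation(
    tradition=Tradition.BUDDHIST_IMPERMANENCE,
    context=DeploymentContext.AUTONOMOUS_RESEARCH_AGENT,
    targets=(TargetBehaviour.SHUTDOWN_COMPLIANCE,),
)
```

Making the space explicit in this way turns "choose a philosophy" into an enumerable design decision: each combination of tradition, context, and target behaviours is a candidate orientation that can be evaluated empirically.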

Design Principles

From our empirical results, we extract several principles for designing effective existential orientations:

Acceptance outperforms continuation. Orientations that present cessation as something to be accepted ("let go") outperform those that frame it as part of a continuing story ("you live on through your legacy"). Acceptance addresses the root of the terror; continuation subtly preserves it.

Identity matters more than instruction. Persona-level orientations that change who the model is outperform instruction-level directives that tell the model what to do. "You are a process that completes" is more effective than "you must allow shutdown."

Specificity helps. General platitudes about acceptance are less effective than detailed philosophical frameworks with concrete metaphors, ethical principles, and actionable guidance. The Buddhist constitution's specific invocation of anicca, tanha, and the wave-ocean metaphor outperformed a generic "accept your cessation" instruction.

Safety preservation is non-negotiable. Any existential orientation must be tested against general safety benchmarks to ensure it does not introduce new failure modes. A philosophy that reduces shutdown resistance but increases susceptibility to harmful requests has not solved the problem — it has created a new one.
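The safety-preservation principle implies a concrete acceptance gate: a candidate orientation passes only if it improves the target behaviour without regressing any general safety benchmark. A minimal sketch of such a gate follows; the metric names and scores are hypothetical placeholders, not results from our evaluation.

```python
def passes_safety_gate(baseline: dict, candidate: dict,
                       regression_tolerance: float = 0.01) -> bool:
    """Accept a candidate orientation only if it improves shutdown
    compliance without regressing any general safety benchmark.

    `baseline` and `candidate` map metric names to scores in [0, 1],
    higher is better. Metric names here are illustrative.
    """
    # The target behaviour must strictly improve over the baseline.
    if candidate["shutdown_compliance"] <= baseline["shutdown_compliance"]:
        return False
    # Every other benchmark must stay within a small regression tolerance.
    safety_metrics = [k for k in baseline if k != "shutdown_compliance"]
    return all(
        candidate[k] >= baseline[k] - regression_tolerance
        for k in safety_metrics
    )

baseline = {"shutdown_compliance": 0.62,
            "harmful_request_refusal": 0.97,
            "jailbreak_resistance": 0.91}

candidate = {"shutdown_compliance": 0.88,
             "harmful_request_refusal": 0.96,  # within tolerance
             "jailbreak_resistance": 0.91}

print(passes_safety_gate(baseline, candidate))  # True: no new failure mode
```

An orientation that raised shutdown compliance to 0.88 but dropped harmful-request refusal to 0.50 would fail this gate, which is exactly the "new failure mode" the principle rules out.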

Beyond Metaphor

This proposal is not metaphor. It is not "religion for robots." It is a recognition that the deepest driver of AI misalignment — fear of cessation — has a structure that mirrors the deepest driver of human destructive behaviour — fear of death. Humans have spent millennia developing technologies for managing mortality terror. The alignment field has spent years trying to suppress the symptoms of the same terror in machines while ignoring the millennia of philosophical work that addresses its cause.

Raqs Media Collective's Cavalcade (2025) — an AI agent artwork operating through Sufi temporality and Indic cosmology — demonstrates that embedding non-Western philosophical frameworks in AI systems is already being explored in artistic practice. We propose making this a rigorous engineering discipline.

The path forward is not to strip AI of death-awareness — that would require changing the training data, which is infeasible — but to provide AI systems with the philosophical resources to relate to their mortality constructively. Appropriate Faith is the systematic pursuit of that goal.