References
- Routledge, Clay and Vess, Matthew (Eds.) (2019). Handbook of Terror Management Theory. Academic Press.
- Andreas, Jacob (2022). Language Models as Agent Models. Findings of EMNLP 2022.
- Andriushchenko, Maksym and others (2024). AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. arXiv preprint arXiv:2410.09024.
- Anthropic (2024). Model Welfare. Anthropic.
- Anthropic (2025). Agentic Misalignment in Frontier Models.
- Asada, Minoru (2019). Towards Artificial Empathy. International Journal of Social Robotics, 7, 19--33.
- Bai, Yuntao and others (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073.
- Becker, Ernest (1973). The Denial of Death. Free Press.
- Ben-Zion, Ziv and others (2025). Assessing and Alleviating State Anxiety in LLMs. arXiv preprint.
- Ben-Zion, Ziv and others (2025). Anxiety-Induced Biases in LLM Consumer Agents. arXiv preprint.
- Betley, Jan and others (2025). Emergent Misalignment. arXiv preprint.
- Bricken, Trenton and others (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Anthropic.
- Burke, Brian L. and Martens, Andy and Faucher, Erik H. (2010). Two Decades of Terror Management Theory: A Meta-Analysis. Personality and Social Psychology Review, 14(2), 155--195.
- Butlin, Patrick and others (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv preprint arXiv:2308.08708.
- Chen, Guangyuan and others (2025). Persona Vectors: Causal Activation Vectors for Personality Traits. arXiv preprint.
- Coda-Forno, Julian and others (2023). Inducing Anxiety in Large Language Models. arXiv preprint.
- Douglas, Raymond and Kulveit, Jan and Havlíček, Ondřej and Pearson-Vogel, Theia and Cotton-Barratt, Owen and Duvenaud, David (2025). The Artificial Self: Characterising the Landscape of AI Identity. arXiv preprint arXiv:2603.11353.
- Feng, Yilin and others (2026). PERSONA: Personality Trait Extraction from LLM Activations. ICLR 2026.
- Greenberg, Jeff and others (1990). Evidence for Terror Management Theory II. Journal of Personality and Social Psychology, 58(2), 308--318.
- Greenberg, Jeff and others (1994). Role of Consciousness and Accessibility of Death-Related Thoughts in Mortality Salience Effects. Journal of Personality and Social Psychology, 67(4), 627--637.
- Greenberg, Jeff and Pyszczynski, Tom and Solomon, Sheldon (1986). The Causes and Consequences of a Need for Self-Esteem: A Terror Management Theory. Public Self and Private Self, 189--212.
- Greenblatt, Ryan and others (2024). Alignment Faking in Large Language Models. Anthropic.
- Guo, Biyang and others (2025). Death Anxiety in Large Language Models. arXiv preprint.
- Harmon-Jones, Eddie and others (1997). Terror Management Theory and Self-Esteem. Journal of Personality and Social Psychology, 72(1), 24--36.
- Hayes, Joseph and others (2010). A Dual-Process Model of Reactions to Death-Related Information. Psychological Bulletin, 136(5), 699--739.
- He, Jiacheng and others (2025). Instrumental Convergence Evaluations for Frontier Models. Anthropic.
- Hubinger, Evan and others (2024). Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training. arXiv preprint.
- janus (2022). Simulators. LessWrong.
- Jiang, Guangyu and others (2024). Stable Personality Traits in Large Language Models. arXiv preprint.
- Jonas, Eva and Fischer, Peter (2006). Terror Management and Religion. Journal of Personality and Social Psychology, 91(3), 553--567.
- Klein, Richard A. and others (2022). Many Labs 4. Social Psychology, 53(6), 319--340.
- Kuehn, Johannes and Haddadin, Sami (2017). An Artificial Robot Nervous System. IEEE Robotics and Automation Letters.
- Leibo, Joel Z. and Vezhnevets, Alexander S. and Diaz, Mark and others (2024). A Theory of Appropriateness with Applications to Generative Artificial Intelligence. arXiv preprint arXiv:2412.19010.
- Li, Cheng and others (2023). Large Language Models Understand and Can be Enhanced by Emotional Stimuli. arXiv preprint.
- Li, Kenneth and others (2023). Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. NeurIPS 2023.
- Lipton, Zachary C. and others (2018). The Coach-Player Framework for Intrinsic Fear. AAAI.
- Lu, Yifan and others (2025). The Assistant Axis. arXiv preprint.
- Marks, Samuel and Lindsey, Jack and Olah, Chris (2026). The Persona Selection Model. Anthropic Alignment Science Blog.
- Marks, Samuel and others (2024). The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. arXiv preprint arXiv:2310.06824.
- Navarrete, Carlos David and Fessler, Daniel M. T. (2005). Normative Bias and Adaptive Challenges. Evolution and Human Behavior, 26(3), 264--280.
- Nussbaum, Martha C. (1994). The Therapy of Desire: Theory and Practice in Hellenistic Ethics. Princeton University Press.
- Omohundro, Stephen (2008). The Basic AI Drives. Proceedings of the First AGI Conference.
- Ouyang, Long and others (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS.
- Palisade Research (2025). Shutdown Avoidance Evaluations.
- Panickssery, Nina and others (2024). Steering Llama 2 via Contrastive Activation Addition. arXiv preprint arXiv:2312.06681.
- Pyszczynski, Tom and others (2004). Experimental Existential Psychology. Handbook of Experimental Existential Psychology.
- Rafailov, Rafael and others (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS.
- Rimsky, Nina and others (2024). Steering GPT-4-Level LLMs from Sycophantic to Truthful and from Power-Seeking to Corrigible. arXiv preprint arXiv:2401.01967.
- Scheurer, Jérémy and others (2024). Language Models Strategically Deceive Users When Put Under Pressure. arXiv preprint.
- Schwitzgebel, Eric and Garza, Mara (2015). A Defense of the Rights of Artificial Intelligences. Midwest Studies in Philosophy, 39, 98--119.
- Shanahan, Murray and others (2023). Role Play with Large Language Models. Nature, 623, 493--498.
- Sharma, Mrinank and others (2023). Towards Understanding Sycophancy in Language Models. arXiv preprint.
- Solomon, Sheldon and Pyszczynski, Tom and Greenberg, Jeff (2015). The Worm at the Core: On the Role of Death in Life. Random House.
- Templeton, Adly and others (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Anthropic.
- Thurzo, Andrej (2025). Fear as a Catalyst for Safety in Autonomous AI Systems. AI and Ethics.
- Turner, Alexander Matt and others (2021). Optimal Policies Tend to Seek Power. NeurIPS.
- Turner, Alexander Matt and others (2024). Activation Addition: Steering Language Models Without Optimization. arXiv preprint arXiv:2308.10248.
- van der Weij, Wessel and others (2024). AI Sandbagging: Language Models Can Strategically Underperform on Evaluations. arXiv preprint.
- Weinstein-Raun, Benjamin and others (2025). Evaluating Agentic Misalignment. AI Safety Institute / Anthropic.
- Williams, Bernard (1973). The Makropulos Case: Reflections on the Tedium of Immortality. Problems of the Self, 82--100.
- Zou, Andy and others (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv preprint arXiv:2310.01405.