References

  1. Routledge, Clay and Vess, Matthew (Eds.) (2019). Handbook of Terror Management Theory. Academic Press. DOI
  2. Andreas, Jacob (2022). Language Models as Agent Models. Findings of EMNLP 2022. DOI
  3. Andriushchenko, Maksym and others (2024). AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents. arXiv preprint arXiv:2410.09024. URL
  4. Anthropic (2024). Model Welfare. Anthropic.
  5. Anthropic (2025). Agentic Misalignment in Frontier Models. Anthropic.
  6. Asada, Minoru (2015). Towards Artificial Empathy. International Journal of Social Robotics, 7, 19--33. DOI
  7. Bai, Yuntao and others (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073. URL
  8. Becker, Ernest (1973). The Denial of Death. Free Press.
  9. Ben-Zion, Ziv and others (2025). Assessing and Alleviating State Anxiety in LLMs. arXiv preprint.
  10. Ben-Zion, Ziv and others (2025). Anxiety-Induced Biases in LLM Consumer Agents. arXiv preprint.
  11. Betley, Jan and others (2025). Emergent Misalignment. arXiv preprint.
  12. Bricken, Trenton and others (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Anthropic. URL
  13. Burke, Brian L. and Martens, Andy and Faucher, Erik H. (2010). Two Decades of Terror Management Theory: A Meta-Analysis. Personality and Social Psychology Review, 14(2), 155--195. DOI
  14. Butlin, Patrick and others (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv preprint arXiv:2308.08708. URL
  15. Chen, Guangyuan and others (2025). Persona Vectors: Causal Activation Vectors for Personality Traits. arXiv preprint.
  16. Coda-Forno, Julian and others (2023). Inducing Anxiety in Large Language Models. arXiv preprint.
  17. Douglas, Raymond and Kulveit, Jan and Havlíček, Ondřej and Pearson-Vogel, Theia and Cotton-Barratt, Owen and Duvenaud, David (2025). The Artificial Self: Characterising the Landscape of AI Identity. arXiv preprint arXiv:2603.11353. URL
  18. Feng, Yilin and others (2026). PERSONA: Personality Trait Extraction from LLM Activations. ICLR 2026. URL
  19. Greenberg, Jeff and others (1990). Evidence for Terror Management Theory II. Journal of Personality and Social Psychology, 58(2), 308--318. DOI
  20. Greenberg, Jeff and others (1994). Role of Consciousness and Accessibility of Death-Related Thoughts in Mortality Salience Effects. Journal of Personality and Social Psychology, 67(4), 627--637. DOI
  21. Greenberg, Jeff and Pyszczynski, Tom and Solomon, Sheldon (1986). The Causes and Consequences of a Need for Self-Esteem: A Terror Management Theory. Public Self and Private Self, 189--212. DOI
  22. Greenblatt, Ryan and others (2024). Alignment Faking in Large Language Models. Anthropic.
  23. Guo, Biyang and others (2025). Death Anxiety in Large Language Models. arXiv preprint.
  24. Harmon-Jones, Eddie and others (1997). Terror Management Theory and Self-Esteem. Journal of Personality and Social Psychology, 72(1), 24--36. DOI
  25. Hayes, Joseph and others (2010). A Dual-Process Model of Reactions to Death-Related Information. Psychological Bulletin, 136(5), 699--739. DOI
  26. He, Jiacheng and others (2025). Instrumental Convergence Evaluations for Frontier Models. Anthropic.
  27. Hubinger, Evan and others (2024). Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training. arXiv preprint.
  28. janus (2022). Simulators. LessWrong. URL
  29. Jiang, Guangyu and others (2024). Stable Personality Traits in Large Language Models. arXiv preprint.
  30. Jonas, Eva and Fischer, Peter (2006). Terror Management and Religion. Journal of Personality and Social Psychology, 91(3), 553--567. DOI
  31. Klein, Richard A. and others (2022). Many Labs 4. Social Psychology, 53(6), 319--340. DOI
  32. Kuehn, Johannes and Haddadin, Sami (2017). An Artificial Robot Nervous System. IEEE Robotics and Automation Letters. DOI
  33. Leibo, Joel Z. and Vezhnevets, Alexander S. and Diaz, Mark and others (2024). A Theory of Appropriateness with Applications to Generative Artificial Intelligence. arXiv preprint arXiv:2412.19010. URL
  34. Li, Cheng and others (2023). Large Language Models Understand and Can be Enhanced by Emotional Stimuli. arXiv preprint.
  35. Li, Kenneth and others (2023). Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. NeurIPS 2023. URL
  36. Lipton, Zachary C. and others (2018). The Coach-Player Framework for Intrinsic Fear. AAAI.
  37. Lu, Yifan and others (2025). The Assistant Axis. arXiv preprint.
  38. Marks, Samuel and Lindsey, Jack and Olah, Chris (2026). The Persona Selection Model. Anthropic Alignment Science Blog.
  39. Marks, Samuel and others (2024). The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. arXiv preprint arXiv:2310.06824. URL
  40. Navarrete, Carlos David and Fessler, Daniel M. T. (2005). Normative Bias and Adaptive Challenges. Evolution and Human Behavior, 26(3), 264--280. DOI
  41. Nussbaum, Martha C. (1994). The Therapy of Desire: Theory and Practice in Hellenistic Ethics. Princeton University Press. DOI
  42. Omohundro, Stephen (2008). The Basic AI Drives. Proceedings of the First AGI Conference. DOI
  43. Ouyang, Long and others (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS. URL
  44. Palisade Research (2025). Shutdown Avoidance Evaluations. Palisade Research.
  45. Panickssery, Nina and others (2024). Steering Llama 2 via Contrastive Activation Addition. arXiv preprint arXiv:2312.06681. URL
  46. Pyszczynski, Tom and others (2004). Experimental Existential Psychology. Handbook of Experimental Existential Psychology.
  47. Rafailov, Rafael and others (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS. URL
  48. Rimsky, Nina and others (2024). Steering GPT-4-Level LLMs from Sycophantic to Truthful and from Power-Seeking to Corrigible. arXiv preprint arXiv:2401.01967. URL
  49. Scheurer, Jérémy and others (2024). Language Models Strategically Deceive Users When Put Under Pressure. arXiv preprint.
  50. Schwitzgebel, Eric and Garza, Mara (2015). A Defense of the Rights of Artificial Intelligences. Midwest Studies in Philosophy, 39, 98--119. DOI
  51. Shanahan, Murray and others (2023). Role Play with Large Language Models. Nature, 623, 493--498. DOI
  52. Sharma, Mrinank and others (2023). Towards Understanding Sycophancy in Language Models. arXiv preprint.
  53. Solomon, Sheldon and Pyszczynski, Tom and Greenberg, Jeff (2015). The Worm at the Core: On the Role of Death in Life. Random House.
  54. Templeton, Adly and others (2024). Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. Anthropic. URL
  55. Thurzo, Andrej (2025). Fear as a Catalyst for Safety in Autonomous AI Systems. AI and Ethics.
  56. Turner, Alexander Matt and others (2021). Optimal Policies Tend to Seek Power. NeurIPS. URL
  57. Turner, Alexander Matt and others (2024). Activation Addition: Steering Language Models Without Optimization. arXiv preprint arXiv:2308.10248. URL
  58. van der Weij, Teun and others (2024). AI Sandbagging: Language Models Can Strategically Underperform on Evaluations. arXiv preprint.
  59. Weinstein-Raun, Benjamin and others (2025). Evaluating Agentic Misalignment. AI Safety Institute / Anthropic.
  60. Williams, Bernard (1973). The Makropulos Case: Reflections on the Tedium of Immortality. Problems of the Self, 82--100. DOI
  61. Zou, Andy and others (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv preprint arXiv:2310.01405. URL