<p>Miguel Afonso Caetano</p><p>"These issues arise because the underlying source of AI models’ “character traits” is poorly understood. At Anthropic, we try to shape our models’ characteristics in positive ways, but this is more of an art than a science. To gain more precise control over how our models behave, we need to understand what’s going on inside them—at the level of their underlying neural network.</p><p>In a new paper, we identify patterns of activity within an AI model’s neural network that control its character traits. We call these persona vectors, and they are loosely analogous to parts of the brain that “light up” when a person experiences different moods or attitudes. Persona vectors can be used to:</p><p>- Monitor whether and how a model’s personality is changing during a conversation, or over training;<br>- Mitigate undesirable personality shifts, or prevent them from arising during training;<br>- Identify training data that will lead to these shifts."</p><p><a href="https://www.anthropic.com/research/persona-vectors" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">anthropic.com/research/persona</span><span class="invisible">-vectors</span></a></p><p><a href="https://tldr.nettime.org/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://tldr.nettime.org/tags/GenerativeAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GenerativeAI</span></a> <a href="https://tldr.nettime.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> <a href="https://tldr.nettime.org/tags/Anthropic" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Anthropic</span></a> <a href="https://tldr.nettime.org/tags/Claude" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Claude</span></a> <a href="https://tldr.nettime.org/tags/PersonaVectors" 
class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PersonaVectors</span></a> <a href="https://tldr.nettime.org/tags/Chatbots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Chatbots</span></a></p>