Identifying agent presence in a system

SeniorTechInfo

Welcome to the Future of AI Modelling: Discovering Agents

Authors: Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

New, formal definition of agency gives clear principles for causal modelling of AI agents and the incentives they face

We want to build safe, aligned artificial general intelligence (AGI) systems that pursue the intended goals of their designers. Causal influence diagrams (CIDs) are a way to model decision-making situations that allows us to reason about agent incentives. For example, here is a CID for a 1-step Markov decision process – a typical framework for decision-making problems.

S1 represents the initial state, A1 represents the agent’s decision (square), and S2 represents the next state. R2 is the agent’s reward/utility (diamond). Solid edges specify causal influence. Dashed edges specify information links – what the agent knows when making its decision.
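
The diagram described above can be written down as a small graph structure. The Python sketch below is illustrative only. Its edge list is an assumption based on the usual structure of a 1-step Markov decision process: the state informs the decision, the state and decision determine the next state, and the next state determines the reward. The names NODES, EDGES, parents and descendants are hypothetical helpers and do not come from the paper.

```python
# Node kinds follow the drawing convention in the text:
# chance (circle), decision (square), utility (diamond).
NODES = {"S1": "chance", "A1": "decision", "S2": "chance", "R2": "utility"}

# Each edge is (parent, child, kind). The edge set is an ASSUMPTION
# matching a standard 1-step MDP, not copied from the paper's figure.
EDGES = [
    ("S1", "A1", "information"),  # dashed: the agent observes S1
    ("S1", "S2", "causal"),       # solid
    ("A1", "S2", "causal"),       # solid
    ("S2", "R2", "causal"),       # solid
]


def parents(node, kind=None):
    """Return the parents of `node`, optionally filtered by edge kind."""
    return [p for p, c, k in EDGES if c == node and (kind is None or k == kind)]


def descendants(node):
    """Return every node reachable from `node` along any edge."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for p, c, _ in EDGES:
            if p == current and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


# What the agent knows when deciding: the parents of A1 via information links.
observed = parents("A1", kind="information")

# The decision can only matter to the agent if a utility node lies downstream.
utilities_downstream = [n for n in descendants("A1") if NODES[n] == "utility"]
```

With this edge list, `observed` is `["S1"]` and `utilities_downstream` is `["R2"]`, which matches the reading of the diagram: the agent sees the initial state, and its decision can affect its reward through the next state.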

By relating training setups to the incentives that shape agent behaviour, CIDs help illuminate potential risks before training an agent and can inspire better agent designs. But how do we know when a CID is an accurate model of a training setup?

Our new paper, Discovering Agents, introduces new ways of tackling these issues, including:

  • The first formal causal definition of agents: Agents are systems that would adapt their policy if their actions influenced the world in a different way
  • An algorithm for discovering agents from empirical data
  • A translation between causal models and CIDs
  • Resolving earlier confusions from incorrect causal modelling of agents

Combined, these results provide an extra layer of assurance that a modelling mistake hasn’t been made, which means that CIDs can be used to analyse an agent’s incentives and safety properties with greater confidence.

Example: modelling a mouse as an agent

To illustrate our method, consider a world of three squares. A mouse starts in the middle square and chooses to go left or right; it then reaches its next position and may get some cheese. The floor is icy, so the mouse might slip. Sometimes the cheese is on the right, but sometimes on the left.

The mouse and cheese environment.
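
To make the idea of "adapting a policy" concrete, here is a minimal Python sketch of the mouse example. It is not the paper's discovery algorithm. It only shows that a reward-maximising mouse would choose differently if its actions influenced the world in a different way. Several details are assumptions made for illustration: a slip sends the mouse to the opposite square, the reward is 1 when the mouse ends on the cheese square and 0 otherwise, and the probabilities are invented. The function names are hypothetical.

```python
def expected_cheese(action, slip_prob, cheese_right_prob):
    """Expected reward for moving 'left' or 'right' from the middle square.

    ASSUMPTIONS: a slip (probability `slip_prob`) sends the mouse to the
    opposite square; the cheese is on the right with probability
    `cheese_right_prob`, otherwise on the left; reward is 1 if the mouse
    ends on the cheese square and 0 otherwise.
    """
    p_end_right = 1 - slip_prob if action == "right" else slip_prob
    p_end_left = 1 - p_end_right
    return p_end_right * cheese_right_prob + p_end_left * (1 - cheese_right_prob)


def best_action(slip_prob, cheese_right_prob):
    """Return the action with the highest expected reward."""
    return max(
        ("left", "right"),
        key=lambda a: expected_cheese(a, slip_prob, cheese_right_prob),
    )


# Cheese usually on the right, floor only slightly icy.
baseline = best_action(slip_prob=0.1, cheese_right_prob=0.8)      # "right"

# Change where the cheese tends to be: the policy adapts.
cheese_moved = best_action(slip_prob=0.1, cheese_right_prob=0.2)  # "left"

# Change how actions affect position (very icy floor): the policy adapts again.
very_icy = best_action(slip_prob=0.9, cheese_right_prob=0.8)      # "left"
```

A system whose choice stayed the same under all three settings would not count as adapting its policy in the sense of the definition above.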
