See http://formalizingboundaries.ai

What are agent boundaries?

A few examples:

  • A bacterium uses its membrane to protect its internal processes from external influences.

  • A nation maintains its sovereignty by defending its borders.

  • A human protects their mental integrity by selectively filtering the information that comes in and out of their mind.

…a natural abstraction for safety?

Agent boundaries seem to be a natural abstraction representing the safety and autonomy of agents.

  • A bacterium survives only if its membrane is preserved.

  • A nation maintains its sovereignty only if its borders aren’t invaded.

  • A human mind maintains mental integrity only if it can hold off informational manipulation.

Maybe the safety of agents could be largely formalized as the preservation of their membranes.

These boundaries can then be formalized via Markov blankets.

Boundaries are also cool because they show a way to respect agents without needing to talk about their preferences or utility functions. Andrew Critch has said the following about this idea:

my goal is to treat boundaries as more fundamental than preferences, rather than as merely a feature of them.  In other words, I think boundaries are probably better able to carve reality at the joints than either preferences or utility functions, for the purpose of creating a good working relationship between humanity and AI technology («Boundaries» Sequence, Part 3b)

For instance, respecting the boundary of a bacterium would probably mean “preserving or not disrupting its membrane” (as opposed to knowing its preferences and satisfying them).

Protecting agents and infrastructure

By formalizing and preserving the important boundaries in the world, we could be in a better position to protect humanity from AI threats.

For example, critical computing infrastructure could be secured by creating strong boundaries around them. This can be enforced by cryptography and formal methods such that only the subprocesses that need to have read and/or write access to a particular resource (like memory) have the encryption keys to do so. Related: Object-capability model, Principle of least privilege, Evan Miyazono’s Atlas Computing, Davidad’s Open Agency Architecture.

And it may also be possible to do something similar with physical property rights.

See http://formalizingboundaries.ai

User's avatar

Subscribe to Formalizing Boundaries for AI Safety Updates

http://formalizingboundaries.ai/