The three pillars of behavioral AI

AI characters are a backbone of games today. Roughly half of all games feature them in one form or another, and many genres simply wouldn't work without them (including RPGs, adventure games, first-person shooters and pet games).
A scene from "Free Guy" (Disney 2021)
It's no secret that today's AI characters are disappointing to a majority of players. The field of "game AI" has largely abandoned the effort of advancing them. This has created a vacuum which is currently being filled by two exciting and complementary developments: behavioral AI and conversational AI.

Behavioral versus conversational AI

Simply put, conversational AI is about what a character says, whereas behavioral AI is about what the character does. "Talking" is a narrow, well-defined activity, while "doing" is extremely broad and requires a bit of explanation. Real animals, including humans, do things by moving parts of their bodies with their muscles. AI characters are no different, as it is through their bodies that they fully inhabit virtual worlds, move around and interact with things or each other.
Ernie is "talking" in one scene, and "doing" in both!
Behavioral AI is the science and art of making virtual characters move their bodies in a way that sustains the illusion of life for users that interact with them.
Behavioral AI enables characters to inhabit virtual worlds, move around, interact with objects, with each other and with human users, and altogether act as if they are living creatures inside a believable world.
A classic rule of thumb holds that 93% of communication is non-verbal and only 7% verbal. This is illustrated by the fact that body language reveals far more than words when you try to tell whether someone is lying. Furthermore, humans spend less than 8% of their time conversing, whereas both humans and animals spend 100% of their time behaving! This reveals a fundamental asymmetry between behavior and conversation.
Asymmetry between behavior and conversation. Conversation is a form of behavior, not the other way around. Some AI characters talk some of the time, but all characters behave all of the time.
Believable behavior controlled by AI is therefore an absolute necessity for games and virtual worlds that aim to be immersive.

Zeroing in on behavioral AI

What is the overarching goal for a system that controls the behavior of AI characters? A common intuition says that making AI characters feel alive is about giving them the ability to make smart decisions. This intuition is understandable but deeply flawed. Intelligence is only loosely related to behavior, and decision-making happens only at discrete moments in time. Behavior streams, by contrast, are continuous, and calling them 'smart' is often wrong or simply beside the point.
Over more than 10 years of R&D, we have found that behavioral AI can instead be divided into three very distinct problems. Solving them well is key to making AI characters feel alive, interactive and engaging.

Problem 1: Virtual cognition

Agents that inhabit a world, whether real or virtual, need to understand it in order to live in it. This world comprises objects and other agents, and these may move or display behavior of their own.
KuteEngine addresses these dynamic changes via two connected systems.
  1. Firstly, our proprietary navigation system supports fully dynamic spatial cognition. By running experiments involving millions of dynamic changes, we have found that our system is more than 100x faster than those built into standard game engines such as Unity and Unreal. Changing the position of a dynamic obstacle and pathfinding through the updated navmesh typically takes 0.5-2 milliseconds. Thanks to this performance, there is no longer any need to bake the navmesh ahead of time: it is constructed rapidly at runtime and automatically kept up to date.
  2. Furthermore, we have built a feature-rich interaction system that tracks dynamic objects and their properties, making it easy for agents to discover them through sight or sound, to interact with them, and even to change them. Interactivity is facilitated via object affordances that tell agents how an object can be dealt with. This may include grabbing, nudging, activation, ingestion or any other conceivable form of action and response.
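To make the idea of affordances concrete, here is a minimal sketch in Python. The `Affordance` and `InteractiveObject` names are invented for illustration and are not KuteEngine's actual API; the point is only that objects advertise the actions agents may take on them.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Affordance:
    name: str               # e.g. "grab", "nudge", "activate", "ingest"
    handler: Callable       # what happens when an agent uses this affordance

@dataclass
class InteractiveObject:
    name: str
    affordances: dict = field(default_factory=dict)

    def add_affordance(self, affordance):
        self.affordances[affordance.name] = affordance

    def interact(self, agent, action):
        # Agents can only perform actions the object actually affords.
        if action not in self.affordances:
            return f"{agent} cannot {action} {self.name}"
        return self.affordances[action].handler(agent, self)

ball = InteractiveObject("ball")
ball.add_affordance(Affordance("grab", lambda agent, obj: f"{agent} grabs the {obj.name}"))

print(ball.interact("Tropico", "grab"))    # Tropico grabs the ball
print(ball.interact("Tropico", "ingest"))  # Tropico cannot ingest ball
```

The design choice worth noting is that affordances live on the object, not the agent: any agent that discovers the object immediately knows what it can do with it.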

Problem 2: Behavior synthesis

When we say that agents 'behave', we mean for the most part that they use their muscles to make their various body parts move. As we mentioned earlier, these movements need to obey the twelve laws of behavior, which makes them incredibly complex. Stylized or cartoonish characters also need to respect Disney's twelve principles of animation.
The process of moving the body parts of an AI character in order to generate believable behavior is called behavior synthesis. Why 'synthesis'? Because the end result is always a single, unified behavior stream.
As an example, check out Tropico, our little bird of paradise in the GIF above. He is a very cuddly fellow, but he doesn't like being touched on the tail. When the player teases him there anyway (using virtual hands on a VR headset), Tropico reacts not only with his entire body (hopping around to face the offender), but also adjusts his gaze and emotes with his beak. Perhaps less obviously, there are additional adjustments happening (breathing, posture change, tail movement, and so on).
All these movements and adjustments occur on individual timelines, because smaller body parts (such as the eyes or mouth) generally respond more quickly than larger ones, and more central body parts respond faster than those further from the center of mass (tail, crest, hands, ...).
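A tiny sketch of this per-channel scheduling idea: each body channel reacts to the same stimulus on its own timeline, with smaller and more central parts responding sooner. The channel names and latencies below are invented for illustration, not KuteEngine values.

```python
# Illustrative reaction delays (seconds after the stimulus), ordered from
# quickest (small, central parts) to slowest (large, peripheral parts).
REACTION_DELAY = {
    "eyes": 0.05,
    "beak": 0.10,
    "head": 0.15,
    "torso": 0.25,
    "tail": 0.40,
}

def schedule_reaction(stimulus_time, channels=REACTION_DELAY):
    """Return (channel, start_time) pairs for one stimulus, earliest first."""
    events = [(ch, stimulus_time + delay) for ch, delay in channels.items()]
    return sorted(events, key=lambda event: event[1])

for channel, start in schedule_reaction(2.0):
    print(f"{start:.2f}s  {channel}")
```

The synthesis step then blends these per-channel responses into the single behavior stream described above.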
KuteEngine uses a patent-pending architecture called Behavior Composition™ to perform behavior synthesis. The key advantage of Behavior Composition is that it supports all 24 principles mentioned above while letting behavior designers stay in charge. Designers decide exactly how much control they want over interactive behaviors versus how much they want to delegate to the various subsystems of KuteEngine in order to make their lives easier. This encourages a high-level, top-down design approach in which unique forms of behavior and interactivity can easily be assembled like LEGO® bricks.

Problem 3: Procedural animation

The last of the three problems is probably the most straightforward one. Once agents have understood the dynamic environment that they're currently in and figured out how to use their various body parts in order to act in it, these action intentions need to be translated into actual movements of their limbs. This is the problem of procedural animation.
Procedural animation is a strange problem in that it is simultaneously easy and hard. The easy part is making the virtual skeleton of an AI character assume a pose in order to interact with the world, for example keeping Tropico's claws attached to a virtual hand while he tries to maintain the position of his head in world-space like a real birb.
This is the challenge of inverse kinematics (IK), and well-known algorithms such as CCD or FABRIK solve it with arbitrary precision. The problem with these off-the-shelf solutions is that they are iterative, hence slow. At VIRTUAL BEINGS, our goal has always been to create AI characters with a minimal performance footprint, so that developers can run many of them at the same time directly on-device (including on low-end devices such as mobile phones or XR headsets).
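To see why iterative solvers cost multiple passes, here is a minimal CCD (Cyclic Coordinate Descent) sketch for a planar chain. This is illustrative only: production solvers work on full 3D skeletons with joint limits, but the repeated sweep over all bones per pass is the same.

```python
import math

def ccd_ik(joints, target, iterations=20, tol=1e-3):
    """Cyclic Coordinate Descent IK for a planar joint chain.

    joints: list of [x, y] positions from root to end effector (mutated in
    place). Each pass rotates every bone, from the tip inward, so that the
    end effector swings toward the target.
    """
    for _ in range(iterations):
        for i in range(len(joints) - 2, -1, -1):
            jx, jy = joints[i]
            ex, ey = joints[-1]
            # Angle that rotates the joint->effector ray onto the joint->target ray.
            rot = (math.atan2(target[1] - jy, target[0] - jx)
                   - math.atan2(ey - jy, ex - jx))
            c, s = math.cos(rot), math.sin(rot)
            # Rotate every joint downstream of joint i around joint i.
            for k in range(i + 1, len(joints)):
                dx, dy = joints[k][0] - jx, joints[k][1] - jy
                joints[k] = [jx + c * dx - s * dy, jy + s * dx + c * dy]
        ex, ey = joints[-1]
        if math.hypot(target[0] - ex, target[1] - ey) < tol:
            break  # close enough; note it took multiple full passes to get here
    return joints

# A three-bone chain (total length 3) reaching for a point within range.
chain = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
ccd_ik(chain, (1.5, 1.5))
```

Each pass touches every bone, and tight tolerances need many passes, which is exactly the per-frame cost that makes iterative IK expensive at scale.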
For this reason, KuteEngine uses a different, relatively complicated but extremely performant approach called one-shot IK. This approach offers a non-iterative solution for fast full-body IK (FFBIK). Without sacrificing quality, the pose can be computed roughly 10 times faster than a typical solution cycle using CCD or FABRIK. In practice, this means that it only takes KuteEngine about 120-150 microseconds to solve the entire skeleton of a human or a quadruped on a single core of a standard laptop.
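KuteEngine's one-shot IK itself is proprietary, but the general idea of non-iterative IK can be illustrated with the classic closed-form two-bone solution, which computes the pose directly from the law of cosines instead of refining it over many passes. This is a sketch of that well-known technique, not KuteEngine's actual solver.

```python
import math

def two_bone_ik(l1, l2, tx, ty):
    """Closed-form planar two-bone IK using the law of cosines.

    Returns (shoulder, elbow) angles in radians so that a chain with bone
    lengths l1 and l2, rooted at the origin, reaches the point (tx, ty).
    Targets outside the reachable annulus are clamped to its boundary.
    """
    d = math.hypot(tx, ty)
    d = max(abs(l1 - l2), min(l1 + l2, d))  # clamp to reachable distances
    # Interior elbow bend from the law of cosines (clamped for float safety).
    cos_elbow = (d * d - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
    # Aim the shoulder at the target, then correct for the bent elbow.
    shoulder = (math.atan2(ty, tx)
                - math.atan2(l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)))
    return shoulder, elbow

# Reaching (1, 1) with two unit-length bones bends the elbow by 90 degrees.
s, e = two_bone_ik(1.0, 1.0, 1.0, 1.0)
```

No loop, no convergence check: the cost is a fixed handful of trig operations, which is what makes non-iterative approaches attractive for per-frame, on-device use.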
A second aspect that makes procedural animation challenging is time. IK can only compute poses, but movements unfold along timelines, and they are often emotionally expressive. For example, IK can place a virtual hand on a virtual door handle, but it cannot by itself decide the path or the speed at which the hand advances. The hand's movement can express a multitude of emotions, from hesitation to aggression.
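A small sketch of how timing alone carries expression: the two easing curves below (invented for illustration) move a hand along the same path to the same handle, yet one reads as hesitant and the other as aggressive.

```python
# The endpoints are identical; only the timing curve differs.
def hesitant(t):
    # Slow start, late commit: most of the distance is covered at the end.
    return t * t * t

def aggressive(t):
    # Fast attack, easing out just before contact.
    return 1 - (1 - t) ** 3

def hand_position(start, end, t, curve):
    """Interpolate along the straight path, reparameterized by a timing curve."""
    u = curve(t)
    return tuple(a + (b - a) * u for a, b in zip(start, end))

# Halfway through the gesture the two styles are in very different places:
print(hand_position((0, 0), (1, 0), 0.5, hesitant))    # (0.125, 0.0)
print(hand_position((0, 0), (1, 0), 0.5, aggressive))  # (0.875, 0.0)
```

IK still decides where the hand ends up; the timing curve decides how getting there feels.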
KuteEngine supports such expressive movements via tight integration between procedural animation, authored animation and behavior synthesis. Authored animation can be handcrafted by animators, motion-captured or result from other systems, such as synthesis via machine learning (ML).
All in all, this illustrates that behavioral AI is about addressing disparate challenges in a tightly integrated technology stack that controls the entire lifecycle of an AI character.