How Gestura Works
The easiest way to understand Gestura is to think of it as an intent pipeline. You provide input through voice, chat, CLI, or gestures; Gestura normalizes that into a shared intent, resolves it inside the current session, and then returns output through the surface you are using.
The six stages users should understand
- Capture: voice, text, GUI actions, CLI commands, hotkeys, and optional gesture-based triggers provide input in their native form.
- Normalization: Gestura converts each modality into a shared intent shape with common action, context, and confidence metadata.
- Session & context: the current session supplies history, knowledge, durable memory, approvals, and relevant state.
- Resolve & trust: permissions, policy, and tool availability determine what can happen automatically and what needs confirmation.
- Execute & verify: local tools, shell access, file operations, Git, web search, and MCP servers run through the same core loop, with optional verification when work becomes complex.
- Respond: results come back through the GUI, CLI, logs, sounds, Haptic Harmony feedback, and optional reflection-guided improvement.
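The six stages above can be sketched as one handler. This is an illustrative sketch only: the names (`Intent`, `normalize`, `resolve`, `execute`, `handle`) and the simple confidence and permission values are assumptions, not Gestura's actual API.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    action: str        # what the user wants done
    context: dict      # session-supplied state
    confidence: float  # how sure the normalizer is
    source: str        # "voice", "text", "cli", "gesture", ...

def normalize(raw: str, source: str) -> Intent:
    """Normalization: every modality collapses into the same intent shape."""
    return Intent(action=raw.strip().lower(), context={}, confidence=0.9, source=source)

def resolve(intent: Intent, allowed: set[str]) -> bool:
    """Resolve & trust: policy decides whether the action runs automatically."""
    return intent.action in allowed

def execute(intent: Intent) -> str:
    """Execute & verify: run the task through the core loop."""
    return f"done: {intent.action}"

def handle(raw: str, source: str, allowed: set[str]) -> str:
    intent = normalize(raw, source)      # capture -> normalization
    if not resolve(intent, allowed):     # resolve & trust
        return f"confirm required: {intent.action}"
    return execute(intent)               # execute -> respond
```

The point of the sketch is the shape, not the details: every surface feeds the same `handle` loop, and the only branch is whether policy lets the action run without confirmation.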
What happens when you ask for something
- You ask Gestura to do something by speaking, typing, triggering a gesture, or starting a CLI action.
- Gestura normalizes that input into a shared intent inside the current session.
- It decides whether the task can be answered directly or whether it needs knowledge, prior memory, tools, or MCP integrations.
- If the action is sensitive, Gestura asks for confirmation based on the current permission level.
- It executes the task, reports progress, and may use reflection or verification loops to improve weak turns before returning a response.
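The per-request decision described above can be summarized in a few lines. This is a hedged sketch: the sensitive-action set, the `permission_level` values, and the function name `plan` are illustrative assumptions, not Gestura's real policy model.

```python
# Actions treated as sensitive in this sketch (assumed, not Gestura's real list).
SENSITIVE = {"delete_files", "record_screen", "edit_permissions"}

def plan(action: str, needs_tools: bool, permission_level: str) -> str:
    """Decide how a normalized intent is handled for one turn."""
    # Sensitive actions need confirmation unless the permission level allows them.
    if action in SENSITIVE and permission_level != "trusted":
        return "ask_confirmation"
    # Otherwise the split is: answer directly, or go through tools/MCP.
    if needs_tools:
        return "run_tools"
    return "answer_directly"
```

In other words, confirmation is checked before execution, so a sensitive request never reaches tools silently at a restricted permission level.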
Where the GUI and CLI fit
The desktop app and the CLI are two interfaces to the same system. Use the desktop app when you want richer visibility, approval prompts, or live voice/chat interaction. Use the CLI when you need repeatable commands such as initialization, MCP setup, config checking, or quick listening and chat loops.
How privacy and permissions work together
Gestura is designed to keep important decisions visible. Voice processing defaults to a local provider. Tool access is controlled by global defaults and session-level overrides. Sensitive capabilities such as screenshots, screen recording, permission editing, or dynamic MCP management are off or restricted by default.
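One way to picture "global defaults plus session-level overrides" is a two-layer lookup where anything unknown is denied. The tool names, permission values, and merge rule here are assumptions for illustration, not Gestura's actual configuration schema.

```python
# Hypothetical global defaults; sensitive capabilities start denied.
GLOBAL_DEFAULTS = {
    "file_read": "allow",
    "shell": "confirm",
    "screenshot": "deny",       # sensitive: off by default
    "screen_record": "deny",    # sensitive: off by default
}

def effective_permission(tool: str, session_overrides: dict) -> str:
    """Session-level overrides win; unknown tools fall through to deny."""
    return session_overrides.get(tool, GLOBAL_DEFAULTS.get(tool, "deny"))
```

The useful property of this shape is that loosening access is always an explicit, per-session act: nothing becomes available just because it exists.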
How to reason about Gestura day to day
- Session: where your context lives.
- Knowledge & memory: what prior expertise or durable context can be retrieved.
- Permissions: what the agent is allowed to do.
- Tools/MCP: what capabilities are available.
- Model settings: how responses are generated.
- Reflection: whether low-quality turns can self-correct and preserve lessons.
- Feedback: how you know what happened and what needs attention.
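The mental model above can be held as a single structure: one session view with a slot for each concern. The field names below are illustrative, not Gestura's schema.

```python
from dataclasses import dataclass, field

@dataclass
class SessionView:
    history: list = field(default_factory=list)         # session: where your context lives
    memory: dict = field(default_factory=dict)          # knowledge & durable memory to retrieve
    permissions: dict = field(default_factory=dict)     # what the agent is allowed to do
    tools: list = field(default_factory=list)           # tools/MCP capabilities available
    model_settings: dict = field(default_factory=dict)  # how responses are generated
    reflection_enabled: bool = False                    # can low-quality turns self-correct?
    events: list = field(default_factory=list)          # feedback: what happened, what needs attention
```

Reading a session this way makes troubleshooting mechanical: if a turn went wrong, check which slot was missing or misconfigured rather than re-reading the whole transcript.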