Teach by recording

Teaching by recording is Agivar's most distinctive ability: demonstrate a workflow once, and the AI can learn it, remember it, and follow it later.

A picture beats a thousand words; a demo beats a thousand pictures. Instead of laboriously describing "click here, then click there", record a screen demo: how the mouse moved, what was typed on the keyboard, and how the screen changed are all captured precisely, so the AI doesn't have to guess.

What problem it solves

Teaching a GUI workflow with text usually gets stuck in two places:

  1. Can't write it precisely — "click that submit button" — which one? what color? in which area? Text easily leaves ambiguity.
  2. Can't write it completely — small steps the demonstrator takes for granted (click the field first to activate it, scroll to the bottom of the page…) often get left out.

A recording solves both at once:

  • A precise action sequence — while recording, mouse clicks/drags/scrolls, keyboard input, and hotkeys are captured in sync, with a timeline. Nothing is missed.
  • The matching frames — after recording, the system extracts keyframes automatically: the frame just before every action that visibly changes the screen (click, double-click, drag, pressing Enter to submit…) is kept, plus periodic samples during quiet stretches. The AI can both "see" what the screen looked like before an action and compare what it became after.
  • Your own explanations: press Alt+I anytime during recording to add a text note (see below); these notes are merged into the recording's timeline.
  • Distilled into reusable know-how — the recording is processed into a structured "operation description" (Overview / Initial state / Step-by-step / a "stage result" after each step), which goes straight into the memory base; the entry also records the ID and key clips of the "demo recording", so Task Mode can later ask targeted questions about details in it.
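The keyframe rule described above (keep the frame just before each screen-changing action, plus periodic samples during quiet stretches) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Agivar's actual implementation: the `Action` type, the `SCREEN_CHANGING` set, the `select_keyframes` function, and the 2-second sampling interval are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of action kinds that visibly change the screen.
SCREEN_CHANGING = {"click", "double_click", "drag", "enter"}


@dataclass
class Action:
    t: float    # seconds since the recording started
    kind: str   # e.g. "click", "scroll", "key"


def select_keyframes(frame_times, actions, quiet_interval=2.0):
    """Return the timestamps of the frames to keep, in order.

    frame_times: sorted capture timestamps of all recorded frames.
    actions: recorded actions, in any order.
    quiet_interval: assumed sampling period for stretches without keyframes.
    """
    keep = set()
    # 1. Keep the last frame captured strictly before each
    #    screen-changing action (the "pre-action" screen).
    for action in actions:
        if action.kind not in SCREEN_CHANGING:
            continue
        before = [t for t in frame_times if t < action.t]
        if before:
            keep.add(before[-1])

    # 2. Add periodic samples: keep a frame whenever nothing has been
    #    kept for at least quiet_interval seconds (the first frame is
    #    always kept as the starting sample).
    result = []
    last_kept = None
    for t in frame_times:
        if t in keep or last_kept is None or t - last_kept >= quiet_interval:
            result.append(t)
            last_kept = t
    return result
```

A longer pause before each action gives step 1 more frames to choose from, so the frame it keeps is more likely to show a settled screen. That is why the recording tips recommend slowing down.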

The full flow

Step 1: Start recording

  1. Start a new conversation and turn on Teach Mode (toggle Teach in the menu next to +).
  2. Click + in the input bar → Record Screen.
  3. A floating recorder control bar appears on screen: semi-transparent, draggable, always on top, with a timer, Stop, and ✕ Cancel.

The control bar isn't recorded

This control bar uses Windows' "exclude from screen capture" feature — the captured footage contains whatever is behind it, so it won't appear in the final recording. No need to drag it into a corner.

📷 Screenshot

Place a screenshot of the "floating recorder control bar" here (static/img/recorder-bar.png).

Step 2: Demonstrate (press Alt+I to explain while recording)

Just go do the operations you want to demonstrate. Recommendations:

  • Slow down a bit — leave a small pause between steps so keyframes can reliably catch the pre-action screen.
  • Demonstrate one complete workflow at a time — from a clear starting point (e.g. "browser is open") to a clear endpoint ("coins given successfully"). Don't record several unrelated things together.

When you want to add a text note for a step, press Alt+I. A small explain dialog pops up at the bottom-center of the screen:

  • Type your note, e.g. "wait for the page to fully load before clicking here, otherwise the button won't respond".
  • Enter sends, Shift+Enter inserts a newline, Esc cancels.
  • The note is merged into the recording's action timeline at the moment it happened.
  • While the explain dialog is open, what you type into it is not recorded as an action (the triggering Alt+I keystroke, and the trailing Alt release afterward, are filtered out too).
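The merging and filtering behavior described above can be illustrated with a small Python sketch. It assumes a hypothetical `Event` record and represents each explain dialog as a `(start, end, note_text)` tuple, where `start` is the moment Alt went down for the hotkey, `end` is the moment the dialog closed, and `note_text` is `None` if the dialog was cancelled with Esc. None of these names come from Agivar itself.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float          # seconds since the recording started
    kind: str         # "click", "key_down", "key_up", "note", ...
    detail: str = ""


def merge_notes(events, dialogs):
    """Merge Alt+I notes into the action timeline.

    events: raw input events.
    dialogs: list of (start, end, note_text) tuples (hypothetical shape).
    """
    # Drop everything typed between the hotkey press and the dialog
    # closing: the Alt+I keystroke, the Alt release, and the note text.
    kept = [
        e for e in events
        if not any(start <= e.t <= end for start, end, _ in dialogs)
    ]
    # Insert each submitted note at the moment the dialog was triggered.
    notes = [
        Event(start, "note", text)
        for start, _, text in dialogs
        if text is not None
    ]
    return sorted(kept + notes, key=lambda e: e.t)
```

In this sketch a cancelled dialog still hides its keystrokes from the timeline but contributes no note.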

When is Alt+I worth it? — for anything "you can't tell from the footage": why you waited, why you picked this and not that, what the step is for, what the gotcha is.

📷 Screenshot

Place a screenshot of the "Alt+I explain dialog" here (static/img/explain-dialog.png).

Step 3: Stop recording

Click Stop on the control bar. (Clicking ✕ Cancel discards this recording — nothing is kept.)

After stopping, the system packages what you recorded and starts background processing. This step requires a network connection, and its duration depends on the recording length, typically tens of seconds to a few minutes. During processing:

  • you can keep typing in the input bar and add other attachments;
  • but the Send button is temporarily disabled until processing finishes, so you can't send the message (with the recording) yet;
  • the recording attachment shows elapsed time / estimated time remaining.

📷 Screenshot

Place a screenshot of the "recording attachment card while processing" here (static/img/recording-processing.png).

Step 4: Preview (optional)

Either before or after processing finishes, you can open the recording preview dialog to play back what you just recorded and confirm the demo is complete:

  • ▶ Start / ⏸ Stop / ⟲ Replay.
  • Not happy with it? Delete the attachment and re-record.

📷 Screenshot

Place a screenshot of the "recording preview dialog" here (static/img/recording-preview.png).

Step 5: Send it — over to the Teach Agent

Once processing is done and the Send button is back, send the message (you can also type a few notes alongside it). When the Teach Agent receives it:

  • it first sees a summary of the recording and the structured operation description — so it knows the whole workflow without any extra lookups;
  • if it needs more detail from inside the recording, it can ask targeted questions about the recording content (internally via a dedicated "video Q&A" capability, e.g. "what does the popup in that frame say");
  • when something is incomplete or ambiguous, it asks you to confirm.

Step 6: Distilling into a memory entry

The Teach Agent organizes the workflow into the memory base (filed as platform/topic). The entry looks roughly like:

# Overview
This demonstrates the user opening the "非十科技" video on Bilibili and giving it two coins.

# Initial state
The browser is open.

# Steps

## Step 1: Open a new tab and navigate to Bilibili
1. Click the "+" button to the right of the browser tab bar to open a new tab.
2. Click the address bar at the top of the browser to activate input.
3. Type bilibili.com in the address bar.
4. Press Enter to navigate.
Stage result: successfully reached the bilibili.com homepage, showing the recommended video list and the top search bar.

## Step 2: …

Stage result: …

# Related demo recording
Recording ID: 00046; summary: …; key clips: …

A few details to note:

  • The workflow is split into several high-level steps, each with its concrete operations underneath and each ending with a "stage result"; when Task Mode follows along, it can use this to judge whether the current stage is complete.
  • On-screen targets are written unambiguously — "click the blue 'Confirm' button at the bottom-right of the popup", "click the magnifier icon to the right of the search box", rather than just "click confirm".
  • The entry records the demo recording's ID at the end, so later, when Task Mode runs a related task, it can pull up this recording and ask about details.
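To make the entry structure concrete, here is a minimal Python sketch that parses an entry shaped like the example above into a dictionary. It assumes the exact headings and the "Stage result:" and "Recording ID:" prefixes shown in that example; real entries only look "roughly" like it, and `parse_entry` is a hypothetical helper, not part of Agivar.

```python
import re


def parse_entry(text):
    """Parse an operation description into a dict (format assumed from the example)."""
    entry = {"overview": "", "initial_state": "", "steps": [], "recording_id": None}
    section = None   # current top-level "# ..." section, lowercased
    step = None      # current "## Step ..." being filled in
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            step = {"title": line[3:], "operations": [], "stage_result": None}
            entry["steps"].append(step)
        elif line.startswith("# "):
            section = line[2:].lower()
            step = None
        elif section == "overview":
            entry["overview"] += line
        elif section == "initial state":
            entry["initial_state"] += line
        elif section == "steps" and step is not None:
            if line.startswith("Stage result:"):
                step["stage_result"] = line[len("Stage result:"):].strip()
            else:
                # Strip the leading "1. " numbering from each operation.
                step["operations"].append(re.sub(r"^\d+\.\s*", "", line))
        elif section == "related demo recording":
            match = re.search(r"Recording ID:\s*(\w+)", line)
            if match:
                entry["recording_id"] = match.group(1)
    return entry
```

With the entry in this form, a task runner could compare each step's `stage_result` against the current screen before moving on, and use `recording_id` to look up the original demo when it needs more detail.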

What actually happens during processing

For reference, "background processing" consists of roughly four stages (no action is needed from you):

  1. Upload keyframes — the keyframes picked from the recording are uploaded to the cloud.
  2. Annotate frame by frame — the AI looks at these keyframes, combines them with the mouse/keyboard action timeline, and produces a detailed description with frame references.
  3. Generate a summary — condenses the detailed description into a short video summary.
  4. Generate the operation description — produces, in the "Overview / Initial state / Step-by-step / stage result" structure, the "operation description" that can go straight into a memory entry.
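The four stages above form a simple linear pipeline, sketched below in Python. The stage functions are passed in as parameters because the real upload and AI steps are not documented here; `process_recording` and all of its argument names are hypothetical. The sketch also assumes that the operation description is generated from the detailed description, which the text above does not state.

```python
def process_recording(keyframes, timeline, *, upload, annotate,
                      summarize, describe, on_stage=print):
    """Run the four processing stages in order and return the results.

    upload, annotate, summarize, describe: caller-supplied callables
    standing in for the real (undocumented) stage implementations.
    on_stage: callback invoked with each stage name, e.g. for progress UI.
    """
    on_stage("upload keyframes")
    frame_refs = upload(keyframes)

    on_stage("annotate frame by frame")
    # Combines the keyframes with the mouse/keyboard action timeline.
    detailed = annotate(frame_refs, timeline)

    on_stage("generate summary")
    summary = summarize(detailed)

    on_stage("generate operation description")
    # Assumption: built from the detailed description, not the summary.
    description = describe(detailed)

    return {"summary": summary, "operation_description": description}
```

Because each stage depends on the output of an earlier one, the stages cannot be reordered, which fits the Send button staying disabled until the whole chain finishes.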

The recording's raw frames and mouse/keyboard events stay locally under ~/.agivar/; the processed result (keyframes, summary, operation description) is stored in the cloud, tied to your account.

Best practices for teaching by recording

  • One workflow per recording, with clear start and end.
  • Slow down, leave a pause between steps.
  • Press Alt+I at key moments to explain the "why" — that's the information the recording can't give and only you can.
  • Set up the environment before demonstrating (log in / open what's needed beforehand), and state it clearly in the "Initial state".
  • Preview once before sending.
  • ⚠️ Don't record sensitive screens like CAPTCHAs or passwords — when a step like that is truly needed, pause there, use Alt+I to say "user logs in here", and skip the actual input.
  • ⚠️ With multiple monitors, the primary monitor is recorded — do the demo on the primary screen.

Next

  • The memory base — see what a recording-taught workflow ends up stored as
  • Task Mode — how this know-how gets reused when running tasks
  • FAQ