How Virtual Run Works
Virtual Run is the feature that makes the Glass Box principle real. It’s the system that lets you preview exactly what a workflow will do — with real data, realistic generated content, and accurate cost estimates — before anything actually executes.
I touched on Virtual Run briefly in the launch post. Here I want to go deeper into how it actually works under the hood: the execution model, the content preview system, and the cost estimation.
The core idea: read live, simulate writes
When you click Preview on a workflow, Virtual Run walks through every step in order, but it treats different types of operations differently.
READ operations execute live. If your workflow fetches emails from Gmail, pulls data from a Google Sheet, or reads issues from Linear, Virtual Run makes those actual API calls. You see real data — your actual emails, your actual spreadsheet values, your actual issues. This matters because it means the rest of the simulation operates on genuine inputs, not placeholder data.
WRITE operations get simulated. If your workflow sends a Slack message, posts a tweet, creates a HubSpot deal, or sends an email, Virtual Run intercepts the action before it reaches the external service. Instead of executing it, the system generates a realistic preview of what would be sent. For content-heavy steps — emails, social posts, Slack messages, Notion pages — an LLM generates the actual text or message based on the workflow’s instructions and the real data flowing in from earlier steps.
TRANSFORM operations execute live. LLM calls that analyse, summarise, or classify data run for real during the preview. If your workflow uses AI to score a lead based on enrichment data, the scoring happens during the preview with the actual model and the actual enrichment results. You see the real score, not a simulated one.
This three-way split means the preview is as close to a real run as possible without causing any side effects. You see real inputs flowing through real processing into previewed outputs.
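As a rough illustration of the three-way split, here's a minimal classifier sketch. The prefix lists and action names are illustrative assumptions, not the real rule set — as noted later in this post, the production classifier is pattern-based and continuously expanded.

```python
# Illustrative prefixes only; the real pattern lists are larger.
READ_PREFIXES = ("FETCH", "GET", "LIST", "READ", "SEARCH")
WRITE_PREFIXES = ("SEND", "POST", "CREATE", "DELETE", "UPDATE")

def classify(action: str) -> str:
    """Classify an action like GMAIL_FETCH_EMAILS as READ, WRITE, or TRANSFORM."""
    # Drop the service prefix (GMAIL_, SLACK_, ...) to get the verb.
    verb = action.split("_", 1)[1] if "_" in action else action
    if verb.startswith(READ_PREFIXES):
        return "READ"       # executes live against the real API
    if verb.startswith(WRITE_PREFIXES):
        return "WRITE"      # intercepted and simulated
    return "TRANSFORM"      # LLM analysis/scoring runs for real
```

The point of the sketch: the decision is made per step, from the action name, before anything touches an external service.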
What happens when you click Preview
Here’s the sequence, step by step.
1. Connection readiness check. Before simulation begins, Virtual Run scans the workflow for every service it needs — Gmail, Slack, Google Sheets, whatever — and checks whether you have an active OAuth connection for each one. If Twitter needs to be connected, you’ll see that immediately, with a direct link to set up the connection. No more discovering a missing connection mid-run.
The system identifies required services automatically by inspecting each step’s action type. If a step calls SLACK_SEND_MESSAGE, it knows Slack is required. If a step calls GMAIL_FETCH_EMAILS, it knows Gmail is required. You get a clear count: “4 of 5 connections ready” with the missing one highlighted.
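Because action names follow a SERVICE_ACTION convention, deriving the required services is essentially a prefix scan. A sketch, assuming that naming convention holds for every step:

```python
def required_services(steps: list[dict]) -> set[str]:
    """Collect the set of services a workflow needs from its action names.
    Assumes every action follows the SERVICE_ACTION naming convention."""
    return {step["action"].split("_", 1)[0].lower() for step in steps}

steps = [
    {"action": "GMAIL_FETCH_EMAILS"},
    {"action": "SLACK_SEND_MESSAGE"},
]
# required_services(steps) → {"gmail", "slack"}
```

Comparing that set against your active OAuth connections yields the "4 of 5 connections ready" readout.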
2. Dependency resolution. The runner analyses the workflow structure to determine execution order. Steps with no dependencies can run in parallel. Steps that depend on outputs from earlier steps wait for those outputs. Conditional branches are evaluated to determine which path to take. This is the same ordering logic the real execution engine uses — the preview follows the same path the actual run would.
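The ordering described above is a standard topological sort over the step dependency graph. Here's a sketch using Python's standard library; the dependency map is a hypothetical example, not a real workflow:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: step -> steps whose outputs it needs.
deps = {
    "fetch_leads": set(),
    "enrich": {"fetch_leads"},
    "score": {"enrich"},
    "notify": {"score"},
}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    # Everything returned by get_ready() has no unmet dependencies,
    # so each wave could run in parallel.
    ready = list(ts.get_ready())
    waves.append(ready)
    ts.done(*ready)
# waves → [["fetch_leads"], ["enrich"], ["score"], ["notify"]]
```

A workflow with two independent fetch steps would see them land in the same wave, which is exactly the parallelism the real engine exploits.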
3. Step-by-step simulation. For each step, in execution order:
- The runner resolves any variable references. If a step uses {{step_2.output.email_subject}}, the runner substitutes the actual value from step 2's result.
- It classifies the operation as READ, WRITE, or TRANSFORM.
- READ and TRANSFORM operations execute against real APIs and models.
- WRITE operations are intercepted. For content generation steps (emails, social posts, messages), the system’s ContentGenerator creates realistic preview content using an LLM that considers the platform, the audience, and the data flowing into the step.
- The step result — whether live or simulated — is recorded in the execution manifest and made available to subsequent steps as input.
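Variable resolution is the glue between steps. A minimal sketch of the {{step_N.output.field}} substitution, assuming a flat results dictionary keyed by step id (the real resolver handles deeper paths):

```python
import re

# Matches references like {{step_2.output.email_subject}}.
VAR = re.compile(r"\{\{(\w+)\.output\.(\w+)\}\}")

def resolve(template: str, results: dict) -> str:
    """Substitute step output references with recorded step results."""
    return VAR.sub(
        lambda m: str(results[m.group(1)]["output"][m.group(2)]),
        template,
    )

results = {"step_2": {"output": {"email_subject": "Q3 pipeline review"}}}
resolve("Re: {{step_2.output.email_subject}}", results)
# → "Re: Q3 pipeline review"
```

Because simulated WRITE results are recorded in the same shape as live results, later steps can reference them without knowing whether the value was real or previewed.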
4. Cost calculation. Throughout the simulation, a CostManager tracks every resource consumed. LLM calls are tracked via OpenRouter’s exact usage cost. Service calls — Composio actions, E2B sandbox time, web searches, image generation — are tracked at their known rates. The total is presented as both our underlying cost and the user price (which is a transparent 2x markup on what we pay).
5. Manifest assembly. When all steps are complete, the results are compiled into an ExecutionManifest — a structured record of everything that happened. This includes the outcome of every step (success, would-fail, skipped, or conditional-branch), all generated content, all costs, all warnings, and all connection statuses.
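To make the manifest concrete, here's an illustrative shape. The field names are my assumptions for this sketch, not the real schema:

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    step_id: str
    outcome: str        # "success", "would-fail", "skipped", "conditional-branch"
    simulated: bool     # True for intercepted WRITE operations
    output: dict
    cost_usd: float = 0.0

@dataclass
class ExecutionManifest:
    steps: list[StepResult] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    connections: dict[str, bool] = field(default_factory=dict)

    @property
    def total_cost_usd(self) -> float:
        return round(sum(s.cost_usd for s in self.steps), 6)
```

One structured record means the preview UI, the cost summary, and the exportable report all read from the same source of truth.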
The content preview system
For write operations, the quality of the preview matters. If the preview shows you a vague “would send an email,” that’s not very useful. You need to see the actual email.
Virtual Run uses a ContentGenerator that produces realistic content for each platform. When your workflow would send a Gmail, the preview generates the full email — subject line, body, formatting — based on the workflow’s instructions and the data available at that step. When it would post a tweet, you see the tweet text, including character count.
The generator knows platform-specific conventions. It uses different tones for Slack (casual, emoji-friendly) versus LinkedIn (professional, longer-form) versus Twitter (concise, punchy). It respects character limits and formatting constraints.
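You can think of these conventions as per-platform profiles the generator consults. A hypothetical sketch (the tone strings come from the description above; the structure and the `check_limit` helper are my illustration):

```python
# Hypothetical platform profiles; the real generator's rules are richer.
PLATFORM_PROFILES = {
    "slack":    {"tone": "casual, emoji-friendly",    "max_chars": None},
    "twitter":  {"tone": "concise, punchy",           "max_chars": 280},
    "linkedin": {"tone": "professional, longer-form", "max_chars": 3000},
}

def check_limit(platform: str, text: str) -> bool:
    """True if the generated text fits the platform's character limit."""
    limit = PLATFORM_PROFILES[platform]["max_chars"]
    return limit is None or len(text) <= limit
```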
For image generation steps, the preview actually generates the image. If your workflow creates a social media post with an accompanying image, the Virtual Run produces the image so you can see exactly what would be published. The system automatically selects the right aspect ratio for the target platform — 16:9 for Twitter, 1:1 for Instagram, 1.91:1 for LinkedIn.
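The aspect-ratio selection is a simple lookup. The ratios are the ones stated above; the lookup function itself is an illustrative sketch:

```python
# Ratios from the post; the fallback default is an assumption.
ASPECT_RATIOS = {"twitter": "16:9", "instagram": "1:1", "linkedin": "1.91:1"}

def ratio_for(platform: str, default: str = "1:1") -> str:
    """Pick the image aspect ratio for the target platform."""
    return ASPECT_RATIOS.get(platform.lower(), default)
```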
Cost estimation in practice
The cost preview is one of the most asked-about features, so let me explain how it works in some detail.
During a Virtual Run, every operation that costs money gets tracked:
- LLM calls: Tracked using OpenRouter’s reported usage.cost, which reflects the actual token consumption and the model’s current pricing. No estimates — we use the real cost from the provider.
- Integration actions: Each Composio action is tracked at $0.002 per call.
- Image generation: Imagen Fast at $0.02 per image, Imagen Ultra at $0.06.
- Code execution: E2B sandbox time at $0.005/minute (standard) or $0.02/minute (large).
- Web search: Serper at $0.001 per query, Tavily at $0.008 per query.
- Browser automation: BrightData at $8.00/GB with a minimum of $0.002 per operation.
All of these are our underlying costs. The price you pay is 2x that amount. Both figures are visible in the preview, so you can see exactly what drives the cost.
For a typical workflow — say, fetch data from an API, analyse it with an LLM, generate a summary, and post it to Slack — the preview might show something like: LLM analysis $0.003, LLM summary $0.002, Composio Slack action $0.002, total underlying cost $0.007, your cost $0.014. For most workflows, we’re talking pennies per run.
The preview cost is an estimate — the actual run may vary slightly because LLM token usage isn’t perfectly deterministic — but it’s accurate enough to make informed decisions. If a workflow is going to cost $2 per run, you’ll know before you commit.
A walkthrough: the Sales Lead Qualification demo
Let me make this concrete with one of our demo workflows.
The Sales Lead Qualification Pipeline triggers via webhook when a new lead arrives. Here’s what a Virtual Run looks like:
Step 1 — Trigger (webhook). The preview uses test data: a simulated lead with name, email, company, and source. In a real run, this would be the actual webhook payload.
Step 2 — Enrich with Clearbit. This is a READ operation, so in the preview it would execute live against the Clearbit API (if connected) and return real enrichment data — company size, industry, job title. If Clearbit isn’t connected, the system simulates a realistic response.
Step 3 — AI Score Lead. This is a TRANSFORM operation. The LLM receives the enrichment data and scores the lead 1-100 with reasoning. This executes for real in the preview, so you see the actual score the model assigns.
Step 4 — Conditional Branch. Score > 70? The preview evaluates this based on the real score from step 3 and follows the appropriate path.
Step 5a (high score) — Create HubSpot Deal. WRITE operation. The preview shows exactly what deal would be created: name, pipeline, stage, amount. All populated from real data.
Step 5b (high score) — Slack Alert. WRITE operation. You see the actual Slack message that would be sent to the sales channel, including the lead’s details and score.
Step 5c (low score) — Add to Mailchimp Nurture. WRITE operation. You see which list the lead would be added to and what tags would be applied.
Step 6 — Log to Airtable. WRITE operation. You see the exact row that would be created.
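The branch decision in step 4 is the only control-flow point in this workflow, and the preview evaluates it with the real score from step 3. A sketch, with hypothetical step names standing in for steps 5a-5c:

```python
def pick_branch(score: int, threshold: int = 70) -> list[str]:
    """Evaluate the score > 70 branch: high scores go to the sales path,
    low scores go to nurture. Step names here are illustrative."""
    if score > threshold:
        return ["create_hubspot_deal", "slack_alert"]
    return ["add_to_mailchimp_nurture"]
```

Because the score comes from a live TRANSFORM step, the preview follows the same path the real run would for this lead.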
At the bottom of the preview, you see the total cost estimate, which steps succeeded, which connections are ready, and any warnings (like “this workflow creates new records on each run — be careful if scheduling it”).
The preview report
After a Virtual Run, you can export the entire manifest. The report includes every step’s outcome, all generated content, full cost breakdown, connection status, and any warnings or suggestions. It’s designed to be shareable — useful for getting approval from a manager, documenting a workflow for a team, or just keeping a record of what you tested.
Honest limitations
The Virtual Run system works well, but I want to be transparent about its boundaries.
The READ/WRITE classification is pattern-based. The system maintains lists of destructive patterns (DELETE, SEND, POST, CREATE, etc.) and uses them to classify operations. This covers the vast majority of cases correctly, but there may be edge cases with unusual integrations where a read is misclassified as a write or vice versa. We’re continuously expanding the classification rules as we encounter new services.
Cost estimates are close but not exact. LLM calls can vary in token usage between runs depending on the model’s output length. The estimate will be in the right ballpark, but don’t treat it as a guarantee to the penny.
Preview content quality depends on the data available. If earlier steps in the workflow didn’t return rich data (perhaps because a connection wasn’t set up), the generated preview content will be less specific.
Despite these limitations, the Virtual Run consistently provides something that no competitor offers: a genuine preview of your workflow’s behaviour, built from real data, with accurate cost projections, before you spend a thing.
That’s the Glass Box in action.
Questions about Virtual Run? Feedback on the preview system? I’d love to hear from you at gloriamundo.com.