Agent Launch Day Checklist.

The exact 20-point checklist we use before deploying every agent to production. Covers testing, security, monitoring, and go-live.

This is the same checklist AlphaForge uses internally for every client deployment.

Phase 1

Pre-Build Validation

1. Workflow mapping complete

Document every step the agent will handle. Map inputs, outputs, decision points, and handoffs. Missing steps mean missed edge cases that surface in production.

2. Success criteria defined

What does "working" look like? Define measurable outcomes before you build. Without clear targets, you cannot distinguish a functioning agent from a broken one.

3. Integration credentials verified

Test API keys, email access, CRM permissions, and database connections. Do not discover auth issues on launch day. Verify every credential against live endpoints.

4. Sample data prepared

Create 10-20 realistic test inputs covering normal cases, edge cases, and error scenarios. Include malformed inputs, missing fields, and boundary values.

5. Escalation path documented

Define exactly when the agent should hand off to a human, and how. Specify the channel, the format, and the information included in every escalation.
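
One way to pin the escalation path down is to write it as data plus a rule. This is a minimal sketch: the field names, the `should_escalate` helper, and the 0.7 confidence and 2-attempt thresholds are all hypothetical and should be replaced with values from your own workflow.

```python
from dataclasses import dataclass, asdict


@dataclass
class Escalation:
    """Hypothetical handoff payload sent to a human."""
    channel: str          # where the handoff goes, e.g. "email" or "slack"
    reason: str           # why the agent stopped
    original_input: str   # what triggered the task
    agent_summary: str    # what the agent did before handing off


def should_escalate(confidence: float, attempts: int,
                    min_confidence: float = 0.7, max_attempts: int = 2) -> bool:
    """Hand off when the agent is unsure or has already retried too often."""
    return confidence < min_confidence or attempts > max_attempts
```

Calling `asdict(...)` on an `Escalation` gives a plain dictionary that can be posted to whichever channel you choose.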

Phase 2

Build & Configuration

6. System prompt tested against edge cases

Run adversarial prompts. Try to break it. Feed it contradictory instructions, ambiguous inputs, and out-of-scope requests. Fix what breaks before it reaches users.

7. Tool permissions scoped to minimum required

Agents should access only what they need. No admin credentials, no broad filesystem access, no unnecessary write permissions. The principle of least privilege applies.

8. Rate limits configured

Prevent runaway API costs. Set daily and hourly caps on LLM calls, external API requests, and email sends. A misconfigured loop can burn through a monthly budget in hours.
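
A cap like this can be sketched as a sliding-window counter. The `CallBudget` class below is a hypothetical helper, not part of any library, and the cap values are placeholders. It keeps state in memory only, so a real deployment would likely need shared storage.

```python
import time
from collections import deque


class CallBudget:
    """Sliding-window cap on calls per hour and per day (hypothetical helper)."""

    def __init__(self, hourly_cap: int, daily_cap: int, clock=time.monotonic):
        self.hourly_cap = hourly_cap
        self.daily_cap = daily_cap
        self.clock = clock
        self.calls = deque()  # timestamps of calls made in the last 24 hours

    def allow(self) -> bool:
        """Record and permit a call only if both caps still have room."""
        now = self.clock()
        # Drop timestamps older than one day.
        while self.calls and now - self.calls[0] >= 86_400:
            self.calls.popleft()
        last_hour = sum(1 for t in self.calls if now - t < 3_600)
        if len(self.calls) >= self.daily_cap or last_hour >= self.hourly_cap:
            return False
        self.calls.append(now)
        return True
```

Call `allow()` before every LLM call, external API request, or email send, and skip or queue the action when it returns `False`.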

9. Error handling for every integration

What happens when the CRM is down? When the email API times out? When the LLM returns an empty response? Handle every failure mode gracefully with retries and fallbacks.
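
The retry-and-fallback pattern can be sketched as a small wrapper. `call_with_retry` is a hypothetical name, and the retry count, delays, and choice of exceptions are assumptions to adapt per integration.

```python
import time


def call_with_retry(fn, retries=3, base_delay=0.5, fallback=None, sleep=time.sleep):
    """Call fn(); retry on exceptions or empty results, then use the fallback."""
    for attempt in range(retries):
        try:
            result = fn()
            if result:  # treat None or "" as a failure, e.g. an empty LLM response
                return result
        except (ConnectionError, TimeoutError):
            pass  # transient failure: fall through to the backoff below
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
    return fallback
```

The fallback might be a canned reply, a queued retry, or a trigger for the escalation path.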

10. Memory and context limits set

Configure conversation history limits to prevent context window overflow. Define compaction thresholds, token budgets, and what gets pruned first when limits are reached.
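
A simple pruning rule is "keep the system message, then drop the oldest turns first". The sketch below assumes messages are dictionaries with `role` and `content` keys. It counts whitespace-separated words as a stand-in for a real tokenizer, so the numbers are only approximate.

```python
def prune_history(messages, token_budget, count_tokens=lambda text: len(text.split())):
    """Keep the system message plus the newest turns that fit in token_budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest so the oldest turns are pruned first.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Pass your model's own token counter as `count_tokens` to make the budget accurate.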

Phase 3

Security & Compliance

11. Sensitive data access restricted

Confirm that the agent cannot read .env files, SSH keys, or credentials outside its designated scope. Audit every file path and environment variable the agent can touch.
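
A path check along these lines can back up the audit. It is a sketch only: `is_path_allowed` and the blocked names are illustrative assumptions, the list is not exhaustive, and the check does not replace OS-level permissions.

```python
from pathlib import Path

BLOCKED_NAMES = {".env", "id_rsa", "id_ed25519"}  # illustrative, not exhaustive
BLOCKED_DIRS = {".ssh", ".aws"}


def is_path_allowed(path: str, workspace: str) -> bool:
    """Allow only non-sensitive files that resolve inside the workspace."""
    root = Path(workspace).resolve()
    target = (root / path).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):  # requires Python 3.9+
        return False
    relative = target.relative_to(root)
    if relative.name in BLOCKED_NAMES or relative.name.startswith(".env."):
        return False
    return not any(part in BLOCKED_DIRS for part in relative.parts)
```

Resolving the path before checking it is what stops `../` traversal and absolute paths from escaping the workspace.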

12. Command execution governance

Dangerous commands like sudo, rm -rf, and chmod require human approval. Define an allowlist of safe commands and block everything else by default.
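
A default-deny gate can be sketched in a few lines. The allowlist contents and the `review_command` name are hypothetical. The operator check is deliberately conservative and will also block harmless commands that contain those characters inside quotes.

```python
import shlex

ALLOWED = {"ls", "cat", "grep", "git"}    # illustrative allowlist
NEEDS_APPROVAL = {"sudo", "rm", "chmod"}  # routed to a human


def review_command(command: str) -> str:
    """Return "allow", "approve" (human sign-off needed), or "block"."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return "block"
    # Shell operators could chain a second command, so refuse them outright.
    if not tokens or any(ch in command for ch in ";|&`$><\n"):
        return "block"
    program = tokens[0]
    if program in NEEDS_APPROVAL:
        return "approve"
    return "allow" if program in ALLOWED else "block"
```

Anything not explicitly listed returns `"block"`, which is the default-deny behavior described above.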

13. Prompt injection defenses active

Test for injection attacks across every input surface. Ensure the agent ignores malicious instructions embedded in emails, form fields, and external data sources.

14. Audit logging enabled

Every action the agent takes should be logged with a timestamp, the triggering input, and the resulting output. Logs must be immutable and retained for compliance review.
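
Application code cannot make a log truly immutable on its own, but a hash chain makes tampering detectable. The sketch below is one such approach under that assumption. The function names and entry fields are illustrative, and real immutability would still need write-once storage.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_entry(log: list, trigger: str, output: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "trigger": trigger,
        "output": output,
        "prev_hash": log[-1]["hash"] if log else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if entry["prev_hash"] != prev or entry["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = entry["hash"]
    return True
```

Running `verify` during a compliance review shows whether any entry was changed after it was written.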

15. Data retention policy configured

How long does the agent store conversation data? Define retention windows, purge schedules, and data handling procedures that comply with your privacy requirements.
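
A purge job reduces to a cutoff comparison. This sketch assumes each record carries a timezone-aware `created_at` datetime. The `purge_expired` name and record shape are hypothetical, and the retention window must come from your own privacy requirements.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records, retention_days, now=None):
    """Return only the records still inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["created_at"] >= cutoff]
```

Run it on a schedule and write the result back to storage so expired conversations are actually deleted, not just hidden.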

Phase 4

Go-Live & Monitoring

16. Monitoring and alerting configured

Set up alerts for errors, high latency, token usage spikes, and unusual request patterns. If the agent fails at 2 AM, you need to know before your client does.
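
At its simplest, alerting is a comparison of current metrics against thresholds. The metric names and limits below are placeholders, and `check_alerts` is a hypothetical helper. Most teams would configure the same rules in their monitoring tool instead.

```python
def check_alerts(metrics: dict, thresholds: dict) -> list:
    """Return an alert message for every metric above its threshold."""
    return [
        f"{name}={metrics[name]} exceeds {limit}"
        for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    ]
```

A non-empty result is the signal to page whoever is on call.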

17. Rollback plan documented

If something goes wrong, how do you turn it off? Document the kill switch, the rollback procedure, and who is responsible for executing it.
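
One low-tech kill switch is a flag file that the agent checks before every action. The `KillSwitch` class and the file-based design are assumptions for illustration. A feature flag or a disabled API key would serve the same purpose.

```python
from pathlib import Path


class KillSwitch:
    """File-based off switch: the agent stops while the flag file exists."""

    def __init__(self, flag_path: str):
        self.flag = Path(flag_path)

    def engage(self, reason: str) -> None:
        self.flag.write_text(reason)  # record why the agent was stopped

    def release(self) -> None:
        self.flag.unlink(missing_ok=True)  # requires Python 3.8+

    def is_engaged(self) -> bool:
        return self.flag.exists()
```

The agent loop would call `is_engaged()` before each action and exit cleanly when it returns `True`.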

18. Client walkthrough completed

Demo 3-5 real scenarios with the client before going live. Walk through normal operations, edge cases, and the escalation flow so nothing is a surprise.

19. Usage dashboard accessible

Confirm that the client can see what the agent is doing: how many tasks it has processed, its error rates, and its results. Transparency builds trust and surfaces issues early.

20. First 48-hour review scheduled

Plan to review agent performance after 48 hours and tune as needed. Check accuracy, response times, escalation rates, and cost per action against your success criteria.

Skip the checklist

We handle it all.

Every AlphaForge deployment follows this checklist automatically. Our team handles testing, security, monitoring, and go-live so you can focus on results.