Agent Launch Day Checklist.
The exact 20-point checklist we use before deploying every agent to production. Covers testing, security, monitoring, and go-live.
This is the same checklist AlphaForge uses internally for every client deployment.
Phase 1
Pre-Build Validation
1.Workflow mapping complete
Document every step the agent will handle. Map inputs, outputs, decision points, and handoffs. Missing steps mean missed edge cases that surface in production.
2.Success criteria defined
What does "working" look like? Define measurable outcomes before you build. Without clear targets, you cannot distinguish a functioning agent from a broken one.
3.Integration credentials verified
Test API keys, email access, CRM permissions, and database connections. Do not discover auth issues on launch day. Verify every credential against live endpoints.
4.Sample data prepared
Create 10-20 realistic test inputs covering normal cases, edge cases, and error scenarios. Include malformed inputs, missing fields, and boundary values.
5.Escalation path documented
Define exactly when the agent should hand off to a human, and how. Specify the channel, the format, and the information included in every escalation.
Phase 2
Build & Configuration
6.System prompt tested against edge cases
Run adversarial prompts. Try to break it. Feed it contradictory instructions, ambiguous inputs, and out-of-scope requests. Fix what breaks before it reaches users.
7.Tool permissions scoped to minimum required
Agents should only access what they need. No admin credentials, no broad filesystem access, no unnecessary write permissions. Principle of least privilege applies.
8.Rate limits configured
Prevent runaway API costs. Set daily and hourly caps on LLM calls, external API requests, and email sends. A misconfigured loop can burn through a monthly budget in hours.
9.Error handling for every integration
What happens when the CRM is down? When the email API times out? When the LLM returns an empty response? Handle every failure mode gracefully with retries and fallbacks.
10.Memory and context limits set
Configure conversation history limits to prevent context window overflow. Define compaction thresholds, token budgets, and what gets pruned first when limits are reached.
Phase 3
Security & Compliance
11.Sensitive data access restricted
The agent cannot read .env files, SSH keys, or credentials outside its designated scope. Audit every file path and environment variable the agent can touch.
12.Command execution governance
Dangerous commands like sudo, rm -rf, and chmod require human approval. Define an allowlist of safe commands and block everything else by default.
13.Prompt injection defenses active
Test for injection attacks across every input surface. Ensure the agent ignores malicious instructions embedded in emails, form fields, and external data sources.
14.Audit logging enabled
Every action the agent takes should be logged with a timestamp, the triggering input, and the resulting output. Logs must be immutable and retained for compliance review.
15.Data retention policy configured
How long does the agent store conversation data? Define retention windows, purge schedules, and data handling procedures that comply with your privacy requirements.
Phase 4
Go-Live & Monitoring
16.Monitoring and alerting configured
Set up alerts for errors, high latency, token budget spikes, and unusual request patterns. If the agent fails at 2 AM, you need to know before your client does.
17.Rollback plan documented
If something goes wrong, how do you turn it off? Document the kill switch, the rollback procedure, and who is responsible for executing it.
18.Client walkthrough completed
Demo 3-5 real scenarios with the client before going live. Walk through normal operations, edge cases, and the escalation flow so nothing is a surprise.
19.Usage dashboard accessible
The client can see what the agent is doing, how many tasks it has processed, error rates, and results. Transparency builds trust and surfaces issues early.
20.First 48-hour review scheduled
Plan to review agent performance after 48 hours and tune as needed. Check accuracy, response times, escalation rates, and cost per action against your success criteria.
We handle it all.
Every AlphaForge deployment follows this checklist automatically. Our team handles testing, security, monitoring, and go-live so you can focus on results.