🚀 Executive Summary
TL;DR: AI-driven E2E tests often crawl in CI because the agent makes a high volume of chatty, synchronous Playwright calls, and the network latency of all those round trips accumulates. A shared Redis cache for these repetitive commands cut execution time from over 15 minutes to 50 milliseconds for subsequent runs in the pipeline.
🎯 Key Takeaways
- AI test agents can suffer from severe performance bottlenecks in CI due to the sheer volume of synchronous Playwright calls, where each small action (e.g., `isVisible()`, `click()`, `type()`) incurs a separate network round trip.
- Implementing a shared Redis cache for Playwright calls, with intelligent invalidation based on the `git commit SHA`, effectively transforms slow E2E tests into blindingly fast ones (e.g., 15 minutes to 50 milliseconds) for subsequent runs on the same codebase.
- For the most extreme performance gains, ‘Command Batching’ allows the AI agent to generate and send a script of multiple Playwright commands for single-request execution, eliminating chatter but requiring a more sophisticated agent architecture.
By caching the verbose, repetitive Playwright calls an AI test agent makes, we slashed our E2E test execution time from over 15 minutes to an astonishing 50 milliseconds in our CI pipeline.
We Nuked Our E2E Test Times to 50ms. Here’s the Dirty Little Caching Trick We Used.
I remember it like it was yesterday. It was 11 PM on a Thursday, release night. We were trying to push a critical hotfix, but the CI pipeline for `prod-release-candidate-v2.1.4` was glowing red. The culprit? The brand-new, “intelligent” E2E test suite. A single, flaky test was taking 12 minutes to run, timing out, and blocking the entire deployment. The dev team was getting antsy, my manager was pinging me on Slack, and all I could think was, “This AI was supposed to make our lives *easier*.” That night, I swore I’d figure out why these “next-gen” tests felt so sluggish and brittle.
The Real Bottleneck Isn’t the AI, It’s the Chatter
After a lot of digging and logging, we realized the problem wasn’t the AI model’s thinking time or even the browser rendering. The AI agent, designed to mimic a human user, was communicating with the Playwright browser driver over a network socket. The issue was the sheer *volume* of that communication. Every tiny action—find an element, get its text, click it, type a single character—was a separate, synchronous network round trip. It’s death by a thousand paper cuts.
Imagine this simplified flow for logging in:
| AI Agent Command | Playwright Action | Time (ms) |
| --- | --- | --- |
| “Is the username field visible?” | `page.locator('#username').isVisible()` | ~15 |
| “Okay, click it.” | `page.locator('#username').click()` | ~20 |
| “Type ‘d’.” | `page.locator('#username').type('d')` | ~10 |
| “Type ‘a’.” | `page.locator('#username').type('a')` | ~10 |
| …and so on… | …for every single character… | … |
Individually, these calls are fast. But when an AI needs to make hundreds or thousands of them to explore a page and complete a task, the math turns ugly: at ~15 ms per round trip, 4,000 calls is a full minute of pure network waiting, and a suite of flows like that adds up to the double-digit minutes we were seeing. The accumulated latency brings your CI pipeline to its knees. The “thinking” was fast; the execution was slow.
Three Ways to Slay the Latency Dragon
We attacked this problem from a few angles. Depending on your team’s needs and timeline, one of these might be right for you.
1. The Quick Fix: The In-Memory “Duct Tape” Cache
This is the hacky-but-effective solution we implemented that Thursday night to get the release out. We added a simple dictionary or hash map right inside the test agent’s process. The key was the Playwright command (e.g., `page.locator('#username').isVisible()`), and the value was the result (e.g., `True`).
The logic is brutally simple: before making a real Playwright call, check if the exact same call is in our cache. If yes, return the cached result instantly. If no, make the real call, and store the result in the cache before returning.
# Super simple Python pseudocode
playwright_call_cache = {}

def call_playwright_with_cache(command, args):
    cache_key = f"{command}:{str(args)}"
    if cache_key in playwright_call_cache:
        # Cache hit! Return the stored result instantly.
        print(f"CACHE HIT: {cache_key}")
        return playwright_call_cache[cache_key]
    else:
        # Cache miss. Make the real, slow call.
        print(f"CACHE MISS: {cache_key}")
        result = real_playwright_call(command, args)
        playwright_call_cache[cache_key] = result
        return result
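To see the wrapper in action, here’s a toy demo with `real_playwright_call` stubbed out as a hypothetical slow dispatch; in the real agent it forwards the command over the socket to the Playwright driver:

# Hypothetical stub standing in for the real socket call
import time

def real_playwright_call(command, args):
    time.sleep(0.015)  # simulate a ~15 ms network round trip
    return True        # pretend the element was visible

call_playwright_with_cache("isVisible", ("#username",))  # CACHE MISS, ~15 ms
call_playwright_with_cache("isVisible", ("#username",))  # CACHE HIT, microseconds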
Warning: This is a dirty fix. The cache is stateful and lives only as long as the test runner process. If the UI changes in the middle of a test run (which it shouldn’t, but hey, things happen), this cache will return stale data and cause your tests to fail in very confusing ways. Use with caution.
2. The Permanent Fix: A Shared Redis Cache
Once the fire was out, we implemented a real solution. We stood up a dedicated Redis instance (`redis-cache-cluster-01`) for our CI runners. This approach professionalizes the quick fix.
Instead of a local dictionary, the agent checks a shared Redis cache. This has several massive advantages:
- Stateless Runners: Your test runners remain stateless. You can spin up ten parallel runners (`ci-runner-prod-01` through `10`), and they’ll all benefit from the same shared cache.
- Intelligent Invalidation: The cache can be much smarter. We set the cache key to include the git commit SHA. When a new commit is pushed, it naturally uses a new set of keys, instantly invalidating the old cache. No more stale data.
- Persistence: The cache can survive beyond a single test run, speeding up subsequent pipeline runs for the same commit (e.g., re-running a failed job).
# Still pseudocode, but now with Redis
import json
import redis

redis_client = redis.Redis(host='redis-cache-cluster-01', port=6379)
COMMIT_SHA = get_current_commit_sha()  # function to get the git SHA; see the sketch below

serialize = json.dumps    # results must be JSON-serializable
deserialize = json.loads  # redis-py returns bytes; json.loads handles them

def call_playwright_with_redis(command, args):
    cache_key = f"{COMMIT_SHA}:{command}:{str(args)}"
    cached_result = redis_client.get(cache_key)
    if cached_result is not None:
        # Cache hit: another runner (or an earlier run) already did this work.
        return deserialize(cached_result)
    else:
        result = real_playwright_call(command, args)
        redis_client.set(cache_key, serialize(result), ex=3600)  # cache for 1 hour
        return result
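For completeness, here’s one plausible shape for `get_current_commit_sha`; a sketch, not our exact helper. Most CI systems already expose the SHA as an environment variable (GitHub Actions sets `GITHUB_SHA`, GitLab CI sets `CI_COMMIT_SHA`), with `git rev-parse` as a local fallback:

import os
import subprocess

def get_current_commit_sha():
    # Prefer the CI-provided SHA; fall back to asking git directly.
    for var in ("GITHUB_SHA", "CI_COMMIT_SHA"):
        if os.environ.get(var):
            return os.environ[var]
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()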
3. The ‘Nuclear’ Option: Command Batching
This solution attacks the root cause differently. Instead of trying to make each chatty call faster, it eliminates the chatter altogether. The idea is to have the AI agent generate a whole *script* of Playwright commands and then send that entire script to be executed in one go.
Instead of this:
- Agent -> Playwright: “Click #username”
- Playwright -> Agent: “OK”
- Agent -> Playwright: “Type ‘testuser’”
- Playwright -> Agent: “OK”
You do this:
- Agent -> Playwright: “Here’s a script: `page.locator('#username').click(); page.locator('#username').fill('testuser');` Now run it and tell me the final result.”
- Playwright -> Agent: “OK, all done.”
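As a rough sketch of what the executor can look like, using Playwright’s sync API: the step format and the `run_batch` name here are made up for illustration, and the transport that delivers the batch in a single request is left out.

# Minimal batching sketch: replay a whole list of steps next to the browser
from playwright.sync_api import Page

def run_batch(page: Page, steps):
    # Each step is (selector, method, args). Because this loop runs in the
    # same process as the Playwright connection, the agent pays one round
    # trip for the entire batch instead of one per command.
    results = []
    for selector, method, args in steps:
        locator = page.locator(selector)
        results.append(getattr(locator, method)(*args))
    return results

login_steps = [
    ("#username", "click", ()),
    ("#username", "fill", ("testuser",)),
    ("#password", "fill", ("s3cret",)),
    ("button[type=submit]", "click", ()),
]
# run_batch(page, login_steps)  # one request for the whole login flow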
Pro Tip: This is a significant architectural change and requires a much more sophisticated agent that can think ahead and bundle instructions. It eliminates the latency problem almost entirely but adds complexity to your AI agent’s logic. This is the path to true sub-second performance, but it’s a much bigger project.
So, Was It Worth It?
Absolutely. For our CI pipeline, where 90% of the test runs are on code that hasn’t changed the UI, the Redis caching approach was the perfect sweet spot. The first run on a new commit is still slow as it populates the cache. But any subsequent run—a retry, a parallel job, a teammate’s identical run—is blindingly fast. We’re talking 15 minutes down to 50 milliseconds for a fully cached E2E flow. It unblocked our developers, stabilized our deployments, and let me get a full night’s sleep. And you can’t put a price on that.
🤖 Frequently Asked Questions
❓ Why are my AI-driven E2E tests so slow in CI, even with fast AI models?
AI-driven E2E tests are often slow not due to the AI model’s thinking time, but because the agent makes a high volume of synchronous, fine-grained Playwright calls over a network socket, causing accumulated latency from many small round trips.
❓ How does caching Playwright calls compare to other methods for speeding up AI-driven E2E tests?
Caching Playwright calls, especially with a shared Redis instance, provides a significant speedup for repetitive UI interactions with minimal architectural changes. Command batching, while more complex, offers even greater performance by eliminating chatter entirely through script execution, but requires a more sophisticated AI agent.
❓ What’s a common implementation pitfall when caching AI-driven Playwright calls, and how can it be avoided?
A common pitfall is using a simple in-memory cache, which can return stale data if the UI changes during a test run. This is avoided by using a shared, persistent cache like Redis with an intelligent invalidation strategy, such as incorporating the `git commit SHA` into the cache key to ensure fresh data for new code.