🚀 Executive Summary
TL;DR: Manually tracking competitor client additions is inefficient and error-prone. Automate this process using scripts and cloud services, leveraging HTML content hashing or visual regression with headless browsers to reliably detect website changes and send instant alerts.
🎯 Key Takeaways
- HTML content hashing provides a cost-effective method for detecting changes on competitor websites, using tools like `curl` and `md5sum` for basic cron jobs, or `requests` and `BeautifulSoup4` within serverless functions for enhanced reliability.
- A robust competitor monitoring system can be built using a serverless architecture (e.g., AWS Lambda, EventBridge, S3, SNS) to schedule checks, store state, and deliver notifications without manual infrastructure management.
- For dynamic, client-side rendered content or subtle visual changes, headless browsers (e.g., Puppeteer, Playwright) combined with pixel-comparison libraries offer a powerful, albeit more complex, solution for visual regression testing.
Learn how to automate competitor client monitoring using simple scripts and cloud services. We’ll cover everything from quick cron jobs to robust serverless solutions for tracking new client logos on competitor websites.
Automating Competitor Intelligence: How We Track When They Land a New Client
I still remember the Slack message from Brenda in Marketing. It was a Monday morning, and the request was simple enough: “Hey Darian, can you check CompetitorXYZ’s partner page every day this week and let me know if they add anyone new? The CEO is asking.” Simple, yes. A good use of my time? Absolutely not. That one-off request became a weekly task, then a daily one. It was manual, error-prone, and exactly the kind of soul-crushing toil that automation was invented to destroy. So, we automated it. Here’s how.
The Root of the Problem: Manual Toil is a Bug
The core issue isn’t the business request; competitive intelligence is critical. The problem is the process. Manually checking a website is unreliable. People forget, go on vacation, or miss subtle changes. The competitor’s marketing team might change the page layout, breaking your mental model of where to look. What we needed was a system that was reliable, cheap to run, and fired off an alert to the right people without any human intervention. We’re engineers; this is a classic change-detection problem that’s just begging for a script.
Here are three ways to tackle this, from the “get it done now” approach to the “build it to last” architecture.
Solution 1: The Quick & Dirty Fix (The Cron Job Special)
This is the classic “I need this done by lunch” solution. You log into a utility server that’s already running other tasks (we all have a util-prod-01 somewhere, don’t lie) and hammer out a quick shell script. The goal is to detect any change within a specific part of the competitor’s webpage.
The logic is simple:
- Use `curl` to download the raw HTML of the page.
- Pipe it through a tool like `pup` (a command-line HTML parser) or even `grep` to isolate the specific div containing the client logos.
- Generate an MD5 hash of that specific HTML block.
- Compare that hash to the one you stored the last time the script ran.
- If the hashes are different, a change has occurred. Send an email or a Slack webhook notification.
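```bash
#!/bin/bash
# A very simple competitor monitor script
# Exit on errors, unset variables, and failures anywhere in a pipeline
set -euo pipefail

URL="https://competitor-xyz.com/clients"
# Use browser dev tools to find a unique CSS selector for the logo area
CSS_SELECTOR="#client-logo-grid"
HASH_FILE="/var/tmp/competitor_hash.md5"

# Fetch the page and isolate the HTML of the logo block. Hashing the HTML rather
# than only its text means a swapped <img> tag also counts as a change.
# Assumes 'pup' is installed: https://github.com/ericchiang/pup
# (-f makes curl fail on HTTP errors instead of handing an error page to pup)
FRAGMENT=$(curl -fsS "$URL" | pup "$CSS_SELECTOR")

# An empty fragment means the selector no longer matches: fail loudly
if [ -z "$FRAGMENT" ]; then
    echo "Error: selector '$CSS_SELECTOR' matched nothing on $URL" >&2
    exit 1
fi

CURRENT_HASH=$(printf '%s' "$FRAGMENT" | md5sum | cut -d' ' -f1)

# First run: record a baseline and exit
if [ ! -f "$HASH_FILE" ]; then
    echo "No previous hash found. Creating one now."
    echo "$CURRENT_HASH" > "$HASH_FILE"
    exit 0
fi

PREVIOUS_HASH=$(cat "$HASH_FILE")
if [ "$CURRENT_HASH" != "$PREVIOUS_HASH" ]; then
    echo "Change detected on $URL !"
    # Add your notification logic here (e.g., curl to a Slack webhook)
    # Update the hash file for the next run
    echo "$CURRENT_HASH" > "$HASH_FILE"
else
    echo "No change detected."
fi
```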
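The script leaves the notification step as a placeholder. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field, so the alert can be as small as the sketch below. It assumes you have already created an incoming webhook and exported its URL as `SLACK_WEBHOOK_URL` (a hypothetical variable name); the request is built separately from the send so it can be checked without touching the network.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_slack(text: str) -> int:
    """Send the alert and return the HTTP status code."""
    # SLACK_WEBHOOK_URL is a hypothetical name; use wherever you keep the secret.
    request = build_slack_request(os.environ["SLACK_WEBHOOK_URL"], text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

From the bash script itself, the equivalent is a single `curl -X POST -H 'Content-type: application/json' --data '{"text":"..."}'` call against the same webhook URL.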
Warning: This method is incredibly brittle. A simple class name change by the competitor’s front-end dev will break your selector, and the script might fail silently. It’s a quick win, but it’s not a long-term solution. It also depends on that one utility server being up and running.
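One cheap partial fix for the silent-failure problem is to refuse to hash an empty extraction, because an empty fragment almost always means the selector stopped matching. Below is a minimal Python sketch of the same compare-and-update cycle as the script, assuming the fragment has already been fetched and extracted; `fingerprint` and `check_for_change` are hypothetical helper names, and the state file plays the role of `/var/tmp/competitor_hash.md5`.

```python
import hashlib
from pathlib import Path


def fingerprint(fragment: str) -> str:
    """Hash an extracted HTML fragment, refusing to hash an empty extraction."""
    # Hashing "" would just look like a change; raise so the breakage is visible.
    if not fragment.strip():
        raise ValueError("extracted fragment is empty; the selector may be broken")
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()


def check_for_change(fragment: str, state_file: Path) -> bool:
    """Return True if the fragment changed since the last run.

    The first run only records a baseline and returns False.
    """
    current = fingerprint(fragment)
    if not state_file.exists():
        state_file.write_text(current)
        return False
    previous = state_file.read_text().strip()
    if current != previous:
        state_file.write_text(current)  # new baseline for the next run
        return True
    return False
```

Left uncaught, the `ValueError` ends the run with a traceback on stderr, which cron mails to the job's owner (or `MAILTO`) when mail is configured, so a broken selector gets noticed instead of being quietly hashed.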
Solution 2: The “We Should Probably Do This Right” Fix (The Serverless Approach)
This is my preferred solution and what we ultimately implemented. It's more resilient, it doesn't depend on any server of our own, and at one invocation per hour the cost is effectively zero. We leverage a basic serverless architecture using standard cloud provider services (I'll use AWS as the example).
| Component | Purpose |
|---|---|
| Amazon EventBridge | A serverless scheduler. We set up a rule to trigger our function once every hour. No cron maintenance needed. |
| AWS Lambda | The brains of the operation. A small Python function that contains our checking logic. |
| Amazon S3 | A simple object storage bucket to hold our state file (the last known hash). This decouples state from the execution environment. |
| Amazon SNS | A notification service. If a change is detected, the Lambda function publishes a message to an SNS topic. |
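The flow is nearly identical to the script, but each piece is a managed service. The Lambda function uses `requests` to fetch the page and `BeautifulSoup4` to parse the HTML, which is more forgiving than command-line tools. Neither library ships with the Lambda Python runtime (only boto3 does), so package them with the function or in a layer.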
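```python
# A simplified Python Lambda function example
import hashlib
import os

import boto3
import requests
from bs4 import BeautifulSoup

S3_BUCKET = os.environ['S3_BUCKET']
S3_KEY = 'hashes/competitor_hash.txt'
SNS_TOPIC_ARN = os.environ['SNS_TOPIC_ARN']
URL_TO_CHECK = "https://competitor-xyz.com/clients"
CSS_SELECTOR = "div#client-logo-grid"

# Created once per container, so warm invocations reuse the clients
s3 = boto3.client('s3')
sns = boto3.client('sns')


def get_previous_hash():
    """Return the last stored hash, or None on the very first run."""
    try:
        obj = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
        return obj['Body'].read().decode('utf-8').strip()
    except s3.exceptions.NoSuchKey:
        # S3 only reports NoSuchKey if the role has s3:ListBucket on the bucket;
        # without it, a missing object surfaces as AccessDenied instead.
        print("Hash file not found in S3. Will create a new one.")
        return None


def notify(subject, message):
    sns.publish(TopicArn=SNS_TOPIC_ARN, Subject=subject, Message=message)


def lambda_handler(event, context):
    # Time out instead of hanging until Lambda kills the invocation, and raise
    # on 4xx/5xx so an error page is never hashed as if it were content.
    response = requests.get(URL_TO_CHECK, timeout=15)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')
    logo_div = soup.select_one(CSS_SELECTOR)
    if logo_div is None:
        # A selector that stops matching is a breakage, not a quiet no-op
        print(f"Error: Could not find element with selector '{CSS_SELECTOR}'")
        notify("Competitor monitor needs attention",
               f"Selector '{CSS_SELECTOR}' matched nothing on {URL_TO_CHECK}")
        return {'status': 'selector_not_found'}

    current_hash = hashlib.md5(str(logo_div).encode('utf-8')).hexdigest()
    previous_hash = get_previous_hash()

    if previous_hash is None:
        # First run: store a baseline without alerting
        s3.put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=current_hash)
    elif current_hash != previous_hash:
        print(f"Change detected! New hash: {current_hash}")
        notify("Competitor Client Page Changed!",
               f"A change was detected on the client page: {URL_TO_CHECK}")
        # Update the hash in S3 for the next run
        s3.put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=current_hash)
    else:
        print("No change detected.")
    return {'status': 'success'}
```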
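The function also needs an execution role that can read and write the state object and publish to the topic. The sketch below builds a least-privilege policy document as a Python dict; the bucket name, key prefix, and topic ARN in the usage line are placeholders. It includes `s3:ListBucket` on purpose: without that permission, S3 answers a `GetObject` for a missing key with 403 AccessDenied rather than 404, so the `NoSuchKey` first-run branch in `get_previous_hash` would never fire. CloudWatch logging permissions are assumed to come separately, for example from the `AWSLambdaBasicExecutionRole` managed policy.

```python
import json


def build_lambda_policy(bucket: str, key_prefix: str, topic_arn: str) -> dict:
    """Least-privilege IAM policy for the checker function (a sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the previous hash and overwrite it after a change
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key_prefix}*",
            },
            {
                # Lets a missing state object surface as NoSuchKey, not AccessDenied
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_lambda_policy(
        "my-monitor-bucket",  # placeholder
        "hashes/",
        "arn:aws:sns:us-east-1:123456789012:competitor-alerts",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```

The hourly trigger itself is just an EventBridge rule with the schedule expression `rate(1 hour)` targeting the function.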
Pro Tip: By subscribing a Slack channel's email address to the SNS topic, or relaying the topic through a small function that posts to a Slack incoming webhook, Marketing gets instant, automated alerts right where they work. This is a massive win for cross-team collaboration.
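A hash only says *that* something changed. If Marketing wants to know *which* client appeared, one hypothetical variation is to store the set of logo labels instead of a hash and report the difference. The sketch below uses the standard library's `html.parser` on the extracted fragment (for example `str(logo_div)`); it assumes each logo is an `<img>` whose `alt` text names the client, falling back to the image URL, and the helper names are illustrative.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _LogoCollector(HTMLParser):
    """Collect a label for every <img> inside an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.logos: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # Prefer alt text (often the client name); fall back to the image URL
            label = attributes.get("alt") or attributes.get("src")
            if label:
                self.logos.add(label.strip())


def extract_logos(fragment_html: str) -> set[str]:
    collector = _LogoCollector()
    collector.feed(fragment_html)
    collector.close()
    return collector.logos


def describe_changes(previous: set[str], current: set[str]) -> str:
    """Human-readable summary suitable for the SNS message body."""
    added = sorted(current - previous)
    removed = sorted(previous - current)
    lines = []
    if added:
        lines.append("Added: " + ", ".join(added))
    if removed:
        lines.append("Removed: " + ", ".join(removed))
    return "\n".join(lines) or "No logo changes."
```

In the Lambda, the S3 state would then hold something like `json.dumps(sorted(logos))` in place of the hash, and `describe_changes(...)` would become the SNS `Message`.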
Solution 3: The “Overkill but Awesome” Option (Visual Regression)
What if the competitor’s site is a single-page app built with React or Vue, where the client logos are rendered client-side by JavaScript? A simple curl or requests call won’t see them. Or what if they just change the image file of a logo without altering the HTML? For that, you need to see what a user sees.
This approach uses a headless browser like Puppeteer or Playwright to render the page fully, just like Chrome would.
The logic becomes:
- Launch a headless browser instance (this can be done in Lambda using a container image).
- Navigate to the URL.
- Wait for the specific logo grid element to be visible on the page, so the client-side JavaScript has had a chance to render it.
- Take a screenshot of just that element.
- Use a pixel-comparison library (like `pixelmatch`) to diff the new screenshot against a baseline image stored in S3.
- If the number of different pixels exceeds a set threshold, you've found a change. Fire the alert and update the baseline image in S3.
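The steps above can be sketched in Python, assuming Playwright's Python bindings and Pillow are installed (both are imported lazily, so the pure comparison functions run without them). The per-pixel test is a deliberately crude stand-in for `pixelmatch`, which uses a perceptual color metric and anti-aliasing detection; the `tolerance` and `threshold` defaults are arbitrary placeholders rather than tuned values, and Lambda-specific Chromium launch flags are not shown.

```python
from __future__ import annotations

import io


def capture_element_png(url: str, selector: str) -> bytes:
    """Render the page headlessly and screenshot just the matching element."""
    from playwright.sync_api import sync_playwright  # lazy: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            element = page.locator(selector)
            element.wait_for(state="visible")  # wait for client-side rendering
            return element.screenshot()  # PNG bytes when no path is given
        finally:
            browser.close()


def png_to_rgba(png: bytes) -> tuple[tuple[int, int], bytes]:
    """Decode a PNG into ((width, height), raw RGBA bytes)."""
    from PIL import Image  # lazy: optional dependency

    with Image.open(io.BytesIO(png)) as image:
        rgba = image.convert("RGBA")
        return rgba.size, rgba.tobytes()


def changed_pixel_ratio(baseline, current, tolerance: int = 16) -> float:
    """Fraction of pixels whose largest RGBA channel difference exceeds tolerance.

    Both arguments are ((width, height), raw RGBA bytes) pairs.
    """
    (base_size, base_px), (cur_size, cur_px) = baseline, current
    if base_size != cur_size:
        # A resized element (e.g. the grid gained a row) counts as fully changed
        return 1.0
    total = len(base_px) // 4
    if total == 0:
        return 0.0
    changed = 0
    for i in range(0, len(base_px), 4):
        if max(abs(base_px[i + c] - cur_px[i + c]) for c in range(4)) > tolerance:
            changed += 1
    return changed / total


def has_visual_change(baseline, current, tolerance: int = 16,
                      threshold: float = 0.001) -> bool:
    return changed_pixel_ratio(baseline, current, tolerance) > threshold
```

A run would decode today's capture with `png_to_rgba(capture_element_png(url, "#client-logo-grid"))`, decode the baseline PNG fetched from S3 the same way, and alert (then replace the baseline) when `has_visual_change` returns True.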
This is the most robust method against site structure changes, but it’s also the most complex and expensive to set up and run. For 95% of use cases, it’s overkill. But for that tricky 5%, it’s an incredibly powerful tool to have in your back pocket. We haven’t needed to deploy this yet, but it’s our nuclear option if the serverless hash check starts failing.
Conclusion
That simple request from Brenda in Marketing turned into a fun little automation project that delivered real business value. We went with Solution 2, and it’s been running flawlessly for over a year, costing pennies per month. It’s a perfect example of how a small investment in DevOps automation can eliminate manual toil and create a more reliable intelligence stream for the entire company.
🤖 Frequently Asked Questions
❓ How can I monitor competitor client changes automatically?
Automate competitor client monitoring by regularly fetching webpage content, isolating relevant sections (e.g., client logo grids), and comparing a hash of that content against a previously stored baseline. If hashes differ, a change is detected, triggering an alert.
❓ How does this compare to alternatives?
Manual checking is unreliable and inefficient. Basic cron jobs are quick but brittle due to CSS selector changes. Serverless solutions offer scalability, resilience, and cost-effectiveness for HTML-based changes. Visual regression with headless browsers is the most robust for dynamic content but adds complexity and cost.
❓ What is a common implementation pitfall when setting up competitor monitoring?
A common pitfall is relying on brittle CSS selectors for content extraction, which can break silently if the competitor's website layout or class names change. To mitigate this, treat a selector that matches nothing as an alert condition rather than a no-op, parse with a real HTML library such as `BeautifulSoup4` instead of `grep`, or opt for visual regression testing with headless browsers.