🚀 Executive Summary

TL;DR: Users often struggle to combine Screaming Frog data from various tabs into a single custom report. This guide provides three battle-tested methods: in-app custom extraction, Python with Pandas for post-export merging, and database integration for large-scale, continuous analysis.

🎯 Key Takeaways

  • Custom Extraction allows in-app data combination using XPath or CSS selectors, but adding multiple extractors increases crawl time and memory usage.
  • Python with Pandas provides a robust, scriptable solution for merging exported CSVs, typically using a ‘left’ join on the ‘Address’ (URL) column to retain all primary URLs.
  • For massive crawls or continuous SEO monitoring, storing Screaming Frog data directly into a database enables powerful SQL-based custom reporting and integration with visualization tools.

Is it possible to combine data from different tabs/reports into a single custom table before exporting in Screaming Frog?

Struggling to merge data from different Screaming Frog tabs? I’ll show you three battle-tested methods, from in-app hacks to robust database solutions, to create the unified custom report you actually need.

Wrangling the Frog: A DevOps Approach to Merging Screaming Frog Reports

I remember one night, around 2 AM, when a critical service was flapping. The logs on the load balancer showed clean 200s, the app server logs were spitting out cryptic Java stack traces, and the database on prod-db-01 was just… quiet. None of them told the whole story. It took manually stitching together log files with timestamps and trace IDs to finally see the single, malformed user query that was poisoning the connection pool. That feeling—of having all the necessary data scattered across different silos—is exactly what I see in that Reddit thread about combining Screaming Frog reports. You’ve got your crawl data, your H1s, your directives, all in separate tabs, and you just want one clean view. Let’s fix that.

First, Why Is It Like This?

Before we dive into the solutions, it’s important to understand the “why.” Screaming Frog isn’t a business intelligence tool or a database. It’s a crawler, optimized for speed and memory efficiency. It processes specific chunks of data and files them away into dedicated tabs. Each tab is like a pre-built, indexed report. Asking the UI to perform real-time, cross-tabular joins on a million-page crawl would grind it to a halt. The separation is a feature for performance, not a bug. But once you understand that, you can work around it.

The Three Levels of Combining Data

I see this as a tiered problem, just like we handle system alerts. There’s the quick fix to stop the bleeding, the permanent fix to prevent it from happening again, and the architectural change for massive scale. Let’s break them down.

Solution 1: The “In-App” Hack with Custom Extraction

This is your quick and dirty, “I need this report for a meeting in 10 minutes” solution. You don’t leave the Screaming Frog UI. Instead, you force one tab to pull data that normally lives in another. The tool for this is Custom Extraction.

Let’s say you want a simple table with URL, Status Code, Word Count, and the H1 tag. The first three are on the ‘Internal’ tab, but the H1 is on its own. Here’s the hack:

  1. Go to Configuration > Custom > Extraction.
  2. Click ‘Add’ and use XPath or CSS selectors to grab the data you need.
  3. Set the ‘Extractor’ to ‘Extract Text’ to get the clean H1 content.

Here’s the XPath you’d use to grab the first H1 tag on a page (note the parentheses: plain //h1[1] matches the first H1 within each parent element, not necessarily the first one on the page):

(//h1)[1]

And for the meta description:

//meta[@name='description']/@content

Now, when you re-run your crawl, you’ll have new columns right on your main ‘Internal’ tab containing the H1 and Meta Description. You can export that single tab and you’re done.

Warning: This is a hack for a reason. Each custom extraction you add increases the crawl time and memory usage because the crawler has to do extra parsing on every page. Use it for a handful of elements, not for twenty.

Solution 2: The “Proper” Fix with Python & Pandas

This is the standard DevOps approach. It’s repeatable, scriptable, and reliable. The philosophy is simple: let the crawler do its job, export the raw data, and then use a dedicated tool to join it. My weapon of choice here is always a simple Python script using the Pandas library.

The Workflow:

  1. Run your crawl in Screaming Frog.
  2. Export the reports you need (e.g., internal_all.csv, h1_all.csv, meta_description_all.csv).
  3. Run a script to merge them based on the ‘Address’ (URL) column.

Here’s a basic script that gets the job done. It’s self-contained and you can run it every time you need an updated report.

import pandas as pd

# Load the raw CSVs from Screaming Frog
internal_df = pd.read_csv('internal_all.csv', low_memory=False)
h1_df = pd.read_csv('h1_all.csv', low_memory=False)

# We only need the Address and the H1-1 columns from the h1 report
h1_subset = h1_df[['Address', 'H1-1']]

# Merge the two dataframes using the URL as the key
# A 'left' join ensures we keep all URLs from the main internal_all report
final_report = pd.merge(internal_df, h1_subset, on='Address', how='left')

# Save the shiny new combined report
final_report.to_csv('combined_seo_report.csv', index=False)

print("Combined report 'combined_seo_report.csv' has been created.")

Pro Tip: Always use a how='left' merge. This ensures that every URL from your primary file (internal_all.csv) is kept, even if a corresponding H1 wasn’t found. An ‘inner’ join would drop pages that are missing an H1, which could hide issues.
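If you want to verify the merge actually did what you expect, pandas’ indicator=True option tags each row with where it came from, which makes it trivial to count pages that had no matching H1 row. A small sketch using inline stand-in data rather than the real CSVs (the URLs are made up for illustration):

```python
import pandas as pd

# Stand-ins for the real Screaming Frog exports
internal_df = pd.DataFrame({
    'Address': ['https://a.com/', 'https://a.com/about', 'https://a.com/contact'],
    'Status Code': [200, 200, 404],
})
h1_df = pd.DataFrame({
    'Address': ['https://a.com/', 'https://a.com/about'],
    'H1-1': ['Home', 'About Us'],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only'
merged = pd.merge(internal_df, h1_df, on='Address', how='left', indicator=True)

# 'left_only' rows are URLs the h1 export had nothing for
missing_h1 = merged[merged['_merge'] == 'left_only']
print(f"{len(missing_h1)} URL(s) with no matching H1 row")
print(missing_h1['Address'].tolist())
```

Those ‘left_only’ rows are often the most interesting part of the report: they’re the pages an ‘inner’ join would have silently hidden from you.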

Solution 3: The “Scalable” Option with a Database

Sometimes you’re dealing with a crawl so massive (millions of URLs) that CSVs become unwieldy. Or, you need to run analysis on this data daily. This is when you bring in the heavy machinery. Screaming Frog can store its crawl data directly into a database (Mode > Storage > Database).

Once you’ve configured it and run a crawl, Screaming Frog creates a set of tables in your database (e.g., `internal`, `h1`, `meta_description`) that mirror the tabs in the UI. Now, you can write a simple SQL query to build any report you can imagine.

For example, to get your URL, status code, word count, H1, and meta description, you’d connect to your database and run:

SELECT
  i.address,
  i.status_code,
  i.word_count,
  h.h1_1,
  md.meta_description_1
FROM
  internal i
LEFT JOIN
  h1 h ON i.address = h.address
LEFT JOIN
  meta_description md ON i.address = md.address
WHERE
  i.status_code = 200;

This is the “nuclear” option, but for large-scale, continuous SEO monitoring and analysis, it’s the only sane way to operate. You can hook this up to visualization tools like Metabase or Tableau and have live dashboards. It’s how we handle our largest clients at TechResolve.

The Right Tool for the Job

At the end of the day, there’s no single “best” way. If you need a one-off report, the Custom Extraction hack is fast and effective. If this is a report you’ll need every month, write a script and automate it. And if you’re managing a site the size of a small country, use a database. Stop fighting your tools and start building a workflow that solves the actual problem: getting the right data, combined in the right way, so you can make a decision and get back to work.


Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How can I combine data from different Screaming Frog tabs into one report?

You can combine data using three methods: in-app Custom Extraction for quick, limited merges; Python with Pandas for scriptable, repeatable merging of exported CSVs; or database integration for large-scale, continuous analysis via SQL queries.

❓ What are the trade-offs between the different methods for combining Screaming Frog data?

Custom Extraction is quick for one-off, limited merges but increases crawl time. Python/Pandas is repeatable and reliable for exported CSVs. Database integration is highly scalable for millions of URLs and continuous analysis but requires database setup and SQL knowledge.

❓ What is a common pitfall when merging Screaming Frog data with Python/Pandas and how can it be avoided?

A common pitfall is using an ‘inner’ join, which would drop URLs that don’t have corresponding data in all merged files. This can be avoided by always using a `how='left'` merge to ensure all URLs from your primary file are retained.
