🚀 Executive Summary

TL;DR: The article provides a Python script to automate the migration of articles from an aging Joomla database to Jekyll Markdown files. This solution converts Joomla’s HTML content into Jekyll-compatible Markdown, including front matter, significantly reducing manual effort for static site modernization.

🎯 Key Takeaways

  • A Python script utilizes `mysql-connector-python`, `html2text`, and `python-dotenv` to connect to a Joomla database, convert HTML content to Markdown, and securely manage credentials.
  • The migration process combines Joomla’s `introtext` and `fulltext` fields, generates Jekyll front matter with metadata like title and date, and formats filenames as `YYYY-MM-DD-alias.md`.
  • Two setup steps are critical: correctly identifying the `DB_TABLE_PREFIX` from the Joomla `configuration.php` file so the database queries hit the right tables, and manually creating the `_posts` output directory before running the script.

Migrate Joomla Articles to Markdown for Jekyll


Hey there, Darian here. Let’s talk about modernizing a web stack. A while back, a client’s marketing site, built on an aging Joomla version, landed on my desk. The content was solid, but performance was lagging and the content workflow was a pain. Instead of a massive CMS overhaul, we opted for a much leaner approach: a static site generated by Jekyll. The biggest challenge was liberating years of articles from the Joomla database. This guide walks you through the Python script I built to automate that process. It turned a week of potential manual work into a ten-minute coffee break.

Prerequisites

Before we dive in, make sure you have the following ready to go:

  • Direct read-access to your Joomla MySQL or MariaDB database.
  • A Python 3 environment.
  • A basic understanding of SQL queries.
  • A Jekyll project structure ready to receive the posts.

The Step-by-Step Guide

Step 1: Setting Up Your Workspace

First things first, let’s get our environment sorted. I always start by creating a new project directory to keep things clean. I’ll skip the standard virtualenv setup steps since you probably have a preferred workflow for that. The important part is to install the necessary Python libraries. For this script, we’ll need three main packages:

  • mysql-connector-python: To talk to the Joomla database.
  • html2text: The magic ingredient that converts HTML to clean Markdown.
  • python-dotenv: For securely managing our database credentials.

You can get these installed in your active environment using pip. Once that’s done, create two files in your project directory: migrate.py for our script and config.env for our database credentials.

Your config.env file should look something like this. Be sure to replace the placeholder values with your actual Joomla database details.

DB_HOST=localhost
DB_USER=your_joomla_db_user
DB_PASSWORD=your_super_secret_password
DB_NAME=your_joomla_db_name
DB_TABLE_PREFIX=jos_
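
If one of these keys is missing, os.getenv returns None and the failure tends to show up later as a confusing connection error. Here is a minimal sketch of an early check, assuming the five key names above; the helper name missing_settings is my own and not part of any library.

```python
import os

# Keys the migration script reads from config.env.
REQUIRED_KEYS = ("DB_HOST", "DB_USER", "DB_PASSWORD", "DB_NAME", "DB_TABLE_PREFIX")


def missing_settings(env=os.environ):
    """Return the required keys that are not defined in the given mapping.

    An empty value (for example a blank password on a local database)
    is treated as defined; only absent keys are reported.
    """
    return [key for key in REQUIRED_KEYS if key not in env]
```

Call it right after load_dotenv('config.env') and stop the script if the returned list is not empty.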

Pro Tip: That DB_TABLE_PREFIX is crucial. Joomla lets you set a custom prefix for its tables during installation. Older releases defaulted to jos_, while newer installers usually generate a random prefix, so yours is likely different. You can find the correct value in the $dbprefix setting of your Joomla site’s configuration.php file.
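
If you have a copy of configuration.php on hand, you can pull the prefix out with a few lines of Python instead of eyeballing it. This is a rough sketch, assuming the file assigns the prefix to a $dbprefix property with a quoted string; the function name read_table_prefix and the sample file contents are made up for illustration.

```python
import re

# Matches assignments such as `public $dbprefix = 'abc_';` or `var $dbprefix = "jos_";`.
DBPREFIX_PATTERN = re.compile(r"""\$dbprefix\s*=\s*(['"])(.*?)\1""")


def read_table_prefix(config_php_text):
    """Return the value assigned to $dbprefix, or None if no assignment is found."""
    match = DBPREFIX_PATTERN.search(config_php_text)
    return match.group(2) if match else None


# Hypothetical excerpt standing in for a real configuration.php.
sample = """<?php
class JConfig {
    public $db = 'your_joomla_db_name';
    public $dbprefix = 'jos_';
}
"""
print(read_table_prefix(sample))
```

In practice you would read the real file with open(...).read() and copy the result into config.env.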

Step 2: The Logic Explained

Before I drop the full script on you, let’s walk through the plan. Our script will perform a sequence of logical steps:

  1. Load Configuration: It starts by loading the database credentials securely from our config.env file.
  2. Connect to Database: Using those credentials, it establishes a connection to the Joomla database.
  3. Fetch Articles: It runs a specific SQL query to select all published articles. We’ll grab the title, content, creation date, and alias (the URL-friendly version of the title).
  4. Process Each Article: The script will then loop through every article it fetched.
    • It combines Joomla’s introtext and fulltext fields into a single block of HTML.
    • It converts that HTML into Markdown using html2text.
    • It constructs the Jekyll “front matter”—the YAML block at the top of a Markdown file that contains metadata like the title, layout, and date.
    • It creates a filename in the Jekyll-required format: YYYY-MM-DD-alias.md.
  5. Write to File: Finally, it writes the combined front matter and Markdown content into a new file inside an _posts directory.

This process ensures every article becomes a well-formatted, Jekyll-compatible Markdown file.
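
To make the filename and front matter part of step 4 concrete, here is a small standalone sketch that needs no database. The function names and the sample article are hypothetical, and escaping quotes in the title is my assumption about what a double-quoted YAML value needs.

```python
from datetime import datetime


def post_filename(created, alias):
    """Build a Jekyll post filename in the form YYYY-MM-DD-alias.md."""
    return f"{created.strftime('%Y-%m-%d')}-{alias}.md"


def front_matter(title, created, layout="post"):
    """Build the YAML front matter block for one post."""
    # Escape backslashes and double quotes so the title stays valid
    # inside a double-quoted YAML string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "---\n"
        f"layout: {layout}\n"
        f'title: "{safe_title}"\n'
        f"date: {created.strftime('%Y-%m-%d %H:%M:%S')}\n"
        "---\n\n"
    )


# Hypothetical article row, standing in for one result from the database query.
article = {
    "title": 'Our "New" Office',
    "alias": "our-new-office",
    "created": datetime(2015, 3, 9, 14, 30, 0),
}
print(post_filename(article["created"], article["alias"]))
print(front_matter(article["title"], article["created"]))
```

Keeping these two pieces as pure functions makes it easy to check the output format before pointing anything at a real database.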

Step 3: The Full Python Script

Alright, here is the complete script. Save this as migrate.py. I’ve added comments throughout to explain what each part does. Make sure you have an empty directory named _posts in the same folder where you run this script.


import os
import mysql.connector
import html2text
from dotenv import load_dotenv
from datetime import datetime

def migrate_articles():
    """
    Connects to a Joomla database, fetches articles, converts them to Markdown,
    and saves them as Jekyll-compatible post files.
    """
    load_dotenv('config.env')

    # --- Load configuration from config.env file ---
    db_host = os.getenv('DB_HOST')
    db_user = os.getenv('DB_USER')
    db_password = os.getenv('DB_PASSWORD')
    db_name = os.getenv('DB_NAME')
    table_prefix = os.getenv('DB_TABLE_PREFIX')

    # --- Check that the output directory exists ---
    output_dir = '_posts'
    if not os.path.exists(output_dir):
        # The script does not create the directory itself.
        # Create a directory named '_posts' manually before running it.
        print(f"Please create the '{output_dir}' directory and run the script again.")
        return

    try:
        # --- Connect to the MySQL database ---
        print("Connecting to the database...")
        conn = mysql.connector.connect(
            host=db_host,
            user=db_user,
            password=db_password,
            database=db_name
        )
        cursor = conn.cursor(dictionary=True)
        print("Connection successful.")

        # --- SQL query to fetch published articles ---
        # We only want articles that are published (state=1) and not in the trash.
        content_table = f"{table_prefix}content"
        query = f"""
            SELECT title, introtext, `fulltext`, created, alias
            FROM {content_table}
            WHERE state = 1
        """

        print("Fetching articles...")
        cursor.execute(query)
        articles = cursor.fetchall()
        print(f"Found {len(articles)} articles to migrate.")

        # --- Initialize the HTML to Markdown converter ---
        h = html2text.HTML2Text()
        h.body_width = 0  # Prevents line wrapping

        # --- Process each article ---
        for article in articles:
            title = article['title']
            created_date = article['created']
            alias = article['alias']
            
            # Combine intro and full text (treat a missing field as an empty string)
            full_html_content = (article['introtext'] or '') + (article['fulltext'] or '')
            
            # Convert HTML to Markdown
            markdown_content = h.handle(full_html_content)
            
            # Format date for Jekyll filename and front matter
            jekyll_date_str = created_date.strftime('%Y-%m-%d')
            front_matter_date = created_date.strftime('%Y-%m-%d %H:%M:%S')

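            # Escape backslashes and double quotes so the title stays valid
            # inside the double-quoted YAML string below
            safe_title = title.replace('\\', '\\\\').replace('"', '\\"')

            # Create the Jekyll front matter
            front_matter = f"""---
layout: post
title: "{safe_title}"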
date: {front_matter_date}
---

"""
            
            # Create the full file content
            final_content = front_matter + markdown_content
            
            # Create the filename
            filename = f"{jekyll_date_str}-{alias}.md"
            filepath = os.path.join(output_dir, filename)
            
            # Write the content to the file
            with open(filepath, 'w', encoding='utf-8') as f:
                f.write(final_content)
        
        print(f"\nMigration complete! All {len(articles)} articles have been saved to the '{output_dir}' directory.")

    except mysql.connector.Error as err:
        print(f"Error: {err}")
        return
    finally:
        if 'conn' in locals() and conn.is_connected():
            cursor.close()
            conn.close()
            print("Database connection closed.")

if __name__ == '__main__':
    migrate_articles()

To run it, execute the script from your terminal (for example, python migrate.py) in the directory that contains config.env and _posts. It will connect, fetch, convert, and save everything for you.

Common Pitfalls (Where I Usually Mess Up)

  • Incorrect Table Prefix: This is the number one issue. If the script fails with a “table doesn’t exist” error, or reports finding 0 articles when you know they exist, double-check the DB_TABLE_PREFIX in your config.env file against your Joomla configuration.php.
  • Character Encoding: Databases can have tricky encoding. I’ve written the script to write files in UTF-8, which is standard, but if you see garbled characters in your output, your source database might have a different encoding that needs to be handled during the connection.
  • Image Paths: This script converts text content but does not migrate images. Your old Joomla image paths (e.g., images/my-photo.jpg) will remain in the Markdown. You’ll need to manually copy your Joomla /images directory into your Jekyll project and ensure the paths still work.
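
For the image path issue, a small post-processing pass over the generated Markdown can save some manual editing. The sketch below assumes your images appear in standard Markdown image syntax with a relative images/ prefix and that you want them root-relative in Jekyll; the function name and the default target prefix are my own choices, so adjust them to your layout.

```python
import re

# Markdown image syntax whose target starts with the relative Joomla folder "images/".
RELATIVE_IMAGE = re.compile(r"(!\[[^\]]*\]\()images/")


def make_image_paths_absolute(markdown, new_prefix="/images/"):
    """Rewrite ![alt](images/...) targets so they start with new_prefix."""
    # A function replacement avoids any backslash handling in new_prefix.
    return RELATIVE_IMAGE.sub(lambda m: m.group(1) + new_prefix, markdown)


sample = "Intro text ![Team photo](images/my-photo.jpg) and a [link](images/not-an-image.pdf)."
print(make_image_paths_absolute(sample))
```

Only image references are touched; ordinary links that happen to point into images/ are left alone, so check those separately.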

Pro Tip: When testing, I recommend adding LIMIT 5 to the end of the SQL query. This lets you run the script on a small batch of articles to check the formatting and file output without processing thousands of posts at once. Once you’re happy with the result, you can remove the limit and run it on the entire database.
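
One way to make that limit easy to toggle is to build the query in a helper with an optional limit argument. This is a sketch under my own assumptions: the helper name build_article_query is hypothetical, and the prefix check is an extra safety step I added because the prefix is interpolated directly into the SQL text.

```python
import re


def build_article_query(table_prefix, limit=None):
    """Build the article SELECT, optionally capped at `limit` rows for a test run."""
    # The prefix becomes part of the SQL text, so restrict it to safe characters.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_prefix):
        raise ValueError(f"Unexpected table prefix: {table_prefix!r}")
    query = (
        "SELECT title, introtext, `fulltext`, created, alias "
        f"FROM {table_prefix}content "
        "WHERE state = 1"
    )
    if limit is not None:
        # int() ensures only a whole number ends up in the SQL.
        query += f" LIMIT {int(limit)}"
    return query


print(build_article_query("jos_", limit=5))
```

Pass limit=5 while testing, then drop the argument for the full run.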

Conclusion

And that’s it. With this workflow, you can extract your valuable content from a legacy system and move it into a modern, fast, and version-controllable static site structure. It’s a huge step toward reducing technical debt and simplifying your content management. In my production setups, this has been a massive time-saver, and I hope it helps you out as well. Happy migrating!

Darian Vance - Lead Cloud Architect

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ How can I efficiently migrate Joomla articles to Jekyll Markdown?

Utilize the provided Python script, which connects to your Joomla MySQL database, fetches published articles, converts their HTML content to Markdown using `html2text`, and saves them as Jekyll-compatible files with appropriate front matter.

❓ How does this automated migration compare to manual methods or full CMS overhauls?

This script automates a process that would otherwise take about a week of manual effort, offering a lean, fast, and version-controllable static site solution. It avoids a complete CMS overhaul, focusing solely on content liberation for better performance and simplified content management.

❓ What is a common implementation pitfall when migrating Joomla articles with this script?

The most common pitfall is an incorrect `DB_TABLE_PREFIX` in the `config.env` file. Verify this prefix against your Joomla site’s `configuration.php` to ensure the script can correctly query your database tables.
