🚀 Executive Summary

TL;DR: OneNote’s proprietary format and clunky export options often lead to data lock-in. This guide provides a Python-based solution to programmatically migrate OneNote notebooks to Obsidian, converting content to future-proof Markdown files using the Microsoft Graph API.

🎯 Key Takeaways

  • Secure API access is established via an Azure AD Application, requiring `Notes.Read.All` delegated permission and admin consent in the Azure Portal.
  • The Python script uses `msal` for OAuth2 authentication, `requests` for Graph API calls, `python-dotenv` for secure credential management, and `BeautifulSoup4` for HTML to Markdown conversion.
  • The migration process involves programmatically traversing OneNote notebooks, sections, and pages using the Graph API, fetching HTML content, and converting it into `.md` files within the specified Obsidian vault.
  • Critical security practices include storing credentials in `config.env` and adding it to `.gitignore`, while `time.sleep` helps prevent Microsoft Graph API throttling.
  • The `convert_html_to_markdown` function provides basic conversion for text, headers, and lists; complex OneNote content like tables or embedded files may require post-migration manual cleanup.

Migrate OneNote Notebooks to Obsidian (Markdown)

Hey team, Darian here. For the longest time, my notes were locked away in OneNote. It’s a solid tool, but it’s a walled garden. I finally hit a breaking point when I realized I was spending way too much time hunting for siloed information and fighting with its clunky export options. Migrating everything to Obsidian, a system built on plain Markdown files, was a total game-changer. Now my notes are future-proof, easily version-controlled with Git, and link together beautifully. This process saved me from that proprietary lock-in, and I want to walk you through how I did it.

Prerequisites

Before we dive in, make sure you have the following ready to go. This will make the process much smoother.

  • A Microsoft 365 Account with the OneNote notebooks you want to migrate.
  • Python 3.8+ installed on your machine.
  • An Azure AD Application set up to access the Microsoft Graph API. This is the key to letting our script talk to your Microsoft account securely.
  • Your destination Obsidian Vault created and ready on your local machine.

The Guide: From OneNote to Your Vault

Step 1: Configure API Access in Azure

First, we need to give our script permission to read your OneNote data. This is the most admin-heavy part, but you only do it once.

  1. Log in to the Azure Portal and navigate to “Microsoft Entra ID” (formerly “Azure Active Directory”).
  2. Go to “App registrations” and click “New registration”. Give it a clear name like `onenote-obsidian-migrator`.
  3. Once created, note down the Application (client) ID and Directory (tenant) ID. You’ll need these.
  4. Next, go to “Certificates & secrets”. Create a “New client secret”. Copy the Value immediately—it disappears after you navigate away. Treat this like a password.
  5. Finally, head to “API permissions”. Click “Add a permission”, select “Microsoft Graph”, then “Delegated permissions”. Search for and add `Notes.Read.All`. Click the “Grant admin consent” button for your directory. This is a critical step.

Step 2: Set Up Your Python Environment

I’ll skip the standard virtual environment setup since you likely have your own workflow for that. The important part is getting the right libraries installed. In your activated environment, install them with `pip`:

pip install requests msal beautifulsoup4 python-dotenv

Next, create a project directory. Inside, create your main Python script (e.g., `migrate.py`) and a configuration file named `config.env`. This file keeps your credentials out of the source code so you don’t hardcode them.

Your `config.env` file should look like this:


CLIENT_ID=your-application-client-id
CLIENT_SECRET=your-client-secret-value
TENANT_ID=your-directory-tenant-id
OBSIDIAN_VAULT_PATH=/path/to/your/obsidian/vault

Pro Tip: Always add your `config.env` file to `.gitignore`. In my production setups, we never, ever commit credentials to source control. This is non-negotiable for security.
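A blank or misspelled setting tends to surface later as a confusing authentication error, so it can help to check the config up front. Here is a minimal sketch, assuming the four variable names from `config.env` above. `find_missing_settings` and `REQUIRED_SETTINGS` are hypothetical names for illustration, not part of the script in this guide.

```python
import os

# Assumption: these are the four keys shown in config.env above.
REQUIRED_SETTINGS = ("CLIENT_ID", "CLIENT_SECRET", "TENANT_ID", "OBSIDIAN_VAULT_PATH")

def find_missing_settings(env=None):
    """Return the names of required settings that are absent or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]
```

You could call `find_missing_settings()` right after `load_dotenv('config.env')` and stop with a clear message if the returned list is not empty.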

Step 3: The Python Script – Authentication & Fetching Data

Now for the fun part. We’ll use the MSAL library to handle the OAuth2 flow and get an access token for Microsoft Graph from Azure AD. This token proves our script is authorized to read your notes.

Here’s the core logic. We’ll read the config, authenticate, and then write a function to make API calls.


import os
import requests
import msal
from dotenv import load_dotenv
from bs4 import BeautifulSoup
import time
import re

# Load environment variables from config.env
load_dotenv('config.env')

CLIENT_ID = os.getenv('CLIENT_ID')
CLIENT_SECRET = os.getenv('CLIENT_SECRET')
TENANT_ID = os.getenv('TENANT_ID')
VAULT_PATH = os.getenv('OBSIDIAN_VAULT_PATH')

AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}"
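SCOPE = ["Notes.Read.All"]  # the delegated permission added in Step 1
GRAPH_API_ENDPOINT = 'https://graph.microsoft.com/v1.0/me/onenote'

def get_access_token():
    """Signs the user in with the device code flow and returns an access token."""
    # The /me endpoint needs a signed-in user (delegated auth). A token obtained
    # with only the client secret has no user, so Graph rejects /me requests.
    # Note: this flow needs "Allow public client flows" enabled under
    # "Authentication" in the app registration; the client secret is not used.
    app = msal.PublicClientApplication(CLIENT_ID, authority=AUTHORITY)
    flow = app.initiate_device_flow(scopes=SCOPE)
    if "user_code" not in flow:
        print(f"Could not start device flow: {flow.get('error_description')}")
        return None
    # Prints the URL and one-time code to enter in a browser
    print(flow["message"])
    result = app.acquire_token_by_device_flow(flow)
    if "access_token" in result:
        return result['access_token']
    else:
        print(f"Error acquiring token: {result.get('error_description')}")
        return None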

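def make_api_call(endpoint, token):
    """Makes a GET request to the specified Graph API endpoint.

    Returns parsed JSON for JSON responses and raw text otherwise
    (page content comes back as HTML), or None if the request fails.
    """
    headers = {'Authorization': f'Bearer {token}'}
    try:
        response = requests.get(endpoint, headers=headers, timeout=30)
        response.raise_for_status()
    except requests.exceptions.RequestException as err:
        print(f"Request failed: {err}")
        return None
    # Calling .json() on an HTML page body would raise, so check the type first
    if 'application/json' in response.headers.get('Content-Type', ''):
        return response.json()
    return response.text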

# --- We will add the migration logic below this line ---

This code sets up the foundation. The `get_access_token` function handles the handshake with Azure AD, and `make_api_call` is a reusable helper for querying the Graph API.
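One thing to watch: Graph can split a list across several responses and points to the next batch with an `@odata.nextLink` property, so a single call may not return every page in a large section. Here is a minimal sketch of following those links. `iter_collection` is a hypothetical helper, and `fetch_json` stands in for any callable that takes a URL and returns the parsed JSON body (or `None` on failure).

```python
def iter_collection(first_url, fetch_json):
    """Yield every item from a paged Graph collection.

    fetch_json: callable taking a URL and returning the parsed JSON body,
    or None if the request failed.
    """
    url = first_url
    while url:
        data = fetch_json(url)
        if not data:
            break
        yield from data.get("value", [])
        # @odata.nextLink is only present when more results remain
        url = data.get("@odata.nextLink")
```

With the helpers above, usage might look like `for page in iter_collection(section['pagesUrl'], lambda url: make_api_call(url, token)):`.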

Step 4: The Core Migration Logic

Now we’ll build out the functions to loop through notebooks, sections, and pages. The OneNote API returns page content as HTML, so we’ll need to convert it.

Here’s the heart of the script. Add this to your `migrate.py` file.


def sanitize_filename(name):
    """Removes invalid filename characters; falls back to 'Untitled' if nothing is left."""
    cleaned = re.sub(r'[\\/*?:"<>|]', "", name or "").strip()
    return cleaned or "Untitled"

def convert_html_to_markdown(html_content, page_title):
    """A simple HTML to Markdown converter."""
    soup = BeautifulSoup(html_content, 'html.parser')
    
    # Start with the page title as the main header
    markdown = f"# {page_title}\n\n"
    in_list = False

    for element in soup.find_all(['p', 'h1', 'h2', 'h3', 'h4', 'li', 'pre']):
        # A <p> inside an <li> is already covered by the <li> branch
        if element.name == 'p' and element.find_parent('li'):
            continue
        # Close a finished list with a blank line so the next block renders correctly
        if in_list and element.name != 'li':
            markdown += "\n"
        in_list = element.name == 'li'

        if element.name == 'p':
            markdown += element.get_text() + "\n\n"
        elif element.name.startswith('h'):
            level = int(element.name[1])
            markdown += '#' * level + ' ' + element.get_text() + "\n\n"
        elif element.name == 'li':
            # This is a simplified list handling
            markdown += f"- {element.get_text().strip()}\n"
        elif element.name == 'pre':
            # Handle code blocks
            markdown += f"```\n{element.get_text()}\n```\n\n"
            
    return markdown

def start_migration():
    """Main function to run the migration process."""
    token = get_access_token()
    if not token:
        print("Failed to get access token. Aborting.")
        return

    print("Successfully authenticated. Fetching notebooks...")
    notebooks_data = make_api_call(f"{GRAPH_API_ENDPOINT}/notebooks", token)
    if not notebooks_data or 'value' not in notebooks_data:
        print("Could not fetch notebooks.")
        return

    for notebook in notebooks_data['value']:
        notebook_name = sanitize_filename(notebook['displayName'])
        notebook_path = os.path.join(VAULT_PATH, notebook_name)
        if not os.path.exists(notebook_path):
            os.makedirs(notebook_path)
        print(f"\nProcessing Notebook: {notebook_name}")

        sections_data = make_api_call(notebook['sectionsUrl'], token)
        if not sections_data or 'value' not in sections_data:
            continue

        for section in sections_data['value']:
            section_name = sanitize_filename(section['displayName'])
            # One folder per section, so pages with the same title in
            # different sections do not overwrite each other
            section_path = os.path.join(notebook_path, section_name)
            os.makedirs(section_path, exist_ok=True)
            print(f"  - Section: {section_name}")
            
            pages_data = make_api_call(section['pagesUrl'], token)
            if not pages_data or 'value' not in pages_data:
                continue
                
            for page in pages_data['value']:
                # Untitled pages come back with an empty title
                raw_title = page.get('title') or 'Untitled'
                page_title = sanitize_filename(raw_title)
                print(f"    - Page: {page_title}")
                
                # Fetch page content, which is HTML
                content_url = page['contentUrl']
                page_content_response = make_api_call(content_url, token)
                
                if page_content_response:
                    # The content is HTML, so we pass it to the converter
                    markdown_content = convert_html_to_markdown(page_content_response, raw_title)
                    file_path = os.path.join(section_path, f"{page_title}.md")
                    
                    with open(file_path, 'w', encoding='utf-8') as f:
                        f.write(markdown_content)
                
                # Be a good API citizen and avoid throttling
                time.sleep(0.5)

if __name__ == "__main__":
    start_migration()

Pro Tip: OneNote’s HTML for things like tables and embedded files can be incredibly complex. This script’s converter is basic and focuses on text, headers, and lists. For complex pages, expect to do some manual cleanup after the migration. The goal here is to automate 90% of the work, not 100%.
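If tables make up a large share of your cleanup, a small converter can take some of the load. Here is a rough sketch, not a complete solution: it uses only the standard library's `html.parser` so it can be tried without extra dependencies, it treats the first row as the header (an assumption), and it ignores merged cells and nested tables. `TableToMarkdown` and `html_table_to_markdown` are hypothetical names.

```python
from html.parser import HTMLParser

class TableToMarkdown(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text chunks of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace and escape pipes, which delimit cells
            text = " ".join("".join(self._cell).split())
            self._row.append(text.replace("|", "\\|"))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None

def html_table_to_markdown(html):
    """Convert the table rows found in an HTML fragment to a Markdown table."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    width = max(len(row) for row in parser.rows)
    padded = [row + [""] * (width - len(row)) for row in parser.rows]
    lines = ["| " + " | ".join(padded[0]) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in padded[1:]]
    return "\n".join(lines) + "\n"
```

One way to use it is to run it on each `<table>` fragment separately and paste the result where the table sat in the note.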

Common Pitfalls

Here’s where I’ve stumbled in the past, so you can avoid it:

  • API Permissions Not Consented: If you get authorization errors, double-check that you clicked “Grant admin consent” in Azure AD for the `Notes.Read.All` permission. It’s easy to miss.
  • API Throttling: If you have thousands of notes, the script might fail with a `429 Too Many Requests` error. The `time.sleep(0.5)` call in the loop helps prevent this, but you might need to increase the delay if you have a massive library.
  • Filename Sanitization: Note titles can contain characters that are invalid for filenames (`/`, `\`, `:`). The `sanitize_filename` function is crucial to prevent the script from crashing when it tries to save a file.
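For the throttling case, a fixed delay only lowers the odds of a `429`. Graph sends a `Retry-After` header with throttled responses, so a retry wrapper can wait for as long as the server asks. Here is a minimal sketch: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable that returns a response with `.status_code` and `.headers`, and the fallback backoff values are arbitrary assumptions.

```python
import time

def call_with_retry(send, max_attempts=5, default_wait=2.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429 or attempt == max_attempts - 1:
            break
        # Prefer the server's hint; otherwise back off exponentially
        try:
            wait = float(response.headers.get("Retry-After"))
        except (TypeError, ValueError):
            wait = default_wait * (2 ** attempt)
        sleep(wait)
    return response
```

Inside a helper like `make_api_call`, the request could be wrapped as `call_with_retry(lambda: requests.get(endpoint, headers=headers))` before calling `raise_for_status()`.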

Conclusion

And that’s the core of it. This script provides a solid foundation for liberating your notes from OneNote and moving them into a flexible, plain-text system like Obsidian. You now own your data in a way you didn’t before. It might take a bit of tweaking based on your specific notes, but the heavy lifting is done. You’ve just built a reusable pipeline for your most valuable information.

Happy migrating!

– Darian

Darian Vance

Lead Cloud Architect & DevOps Strategist

With over 12 years in system architecture and automation, Darian specializes in simplifying complex cloud infrastructures. An advocate for open-source solutions, he founded TechResolve to provide engineers with actionable, battle-tested troubleshooting guides and robust software alternatives.


🤖 Frequently Asked Questions

❓ What are the essential prerequisites for migrating OneNote notebooks to Obsidian using this method?

Key prerequisites include a Microsoft 365 Account, Python 3.8+, an Azure AD Application with `Notes.Read.All` delegated permission granted in Azure AD, and an initialized Obsidian Vault.

❓ How does this programmatic migration compare to OneNote’s native export features?

This method offers a robust, automated pipeline using the Microsoft Graph API to extract notes directly, converting them to future-proof Markdown. OneNote’s native export options are often clunky and less flexible, typically exporting to proprietary formats or static PDFs.

❓ What are common technical challenges during the migration and how are they addressed?

Common challenges include unconsented API permissions (requiring ‘Grant admin consent’ for `Notes.Read.All`), API throttling (mitigated by `time.sleep` delays), and invalid filename characters (handled by the `sanitize_filename` function).
