🚀 Executive Summary
TL;DR: Relying solely on a single product identifier like UPC often leads to critical system failures due to the diverse nature of real-world product data. Solutions range from quick ‘Identifier Alias’ tables to permanent schema evolution using JSONB, or even a dedicated ‘Product Master Service’ for systemic architectural overhauls.
🎯 Key Takeaways
- A single product identifier (e.g., UPC) is insufficient for modern commerce due to varied standards like MPN, ISBN, EAN, and vendor-specific SKUs, leading to rigid data models and potential system failures.
- The ‘Identifier Alias’ table is a quick, tactical patch that links various identifier strings to a canonical `product_id`, useful for immediate crisis resolution but introduces significant technical debt.
- The ‘Schema Evolution’ approach, particularly using a JSONB column in the main `products` table, offers a permanent, flexible solution to store multiple identifiers (UPC, MPN, EAN) as first-class citizens, simplifying application logic and future extensibility.
- For widespread, systemic product identity issues across multiple enterprise systems, a ‘Product Master Service’ provides a dedicated microservice as the single source of truth, requiring a major architectural undertaking.
- Implementing a GIN index on a JSONB column (e.g., `idx_products_identifiers ON products USING GIN (identifiers)`) is crucial for fast lookups when using the schema evolution approach.
Struggling with product identification beyond the standard UPC? Learn how to add extra unique identifiers like MPN or custom SKUs to your system with three real-world solutions, from a quick database patch to a full architectural overhaul.
Beyond the Barcode: Adding More Unique Product Identifiers When UPC Isn’t Enough
I still remember the 2 AM PagerDuty alert. A critical data import job, one that fed our entire pricing engine, was failing with a cascade of foreign key violations. After a frantic half-hour of digging through logs on `prod-etl-worker-03`, I found the culprit. One of our biggest suppliers had decided, without warning, to start sending us Manufacturer Part Numbers (MPNs) in the same field they’d been sending UPCs for five years. Our entire system, built on the sacred assumption that `product.upc` was the one true unique key, was grinding to a halt. We were losing money with every minute of downtime. That night, hunched over a shell on `prod-db-01`, I learned a hard lesson: relying on a single product identifier is a time bomb waiting to go off.
The “Why”: The Myth of the Single Identifier
We all start out with good intentions. When you’re first building a system, using a single, well-known standard like a UPC as your primary key seems clean, simple, and correct. The problem is that the real world of commerce is anything but clean. You’ve got:
- Suppliers who use their own internal SKU system.
- Manufacturers who use MPNs.
- Books that use ISBNs.
- European partners who use EANs.
- Products sold in “packs” or “cases” that have their own barcode, different from the individual item inside.
The root cause of this problem isn’t bad data; it’s a rigid data model that fails to reflect the complexity of the real world. Your schema assumes one identifier to rule them all, but reality is a messy federation of different standards.
The Solutions: From Duct Tape to a New Foundation
So, you’re in this mess. How do you get out? I’ve seen this movie before, and here are the three main ways to fix it, ranging from “get me through the night” to “let’s fix this for good.”
Solution 1: The Quick Fix (The ‘Identifier Alias’ Table)
This is the “stop the bleeding” approach. It’s not elegant, but it will get your system back online fast, and you can implement it without a massive code deployment. The idea is to offload the complexity into a separate lookup table.
You create a new table, let’s call it product_identifiers, that links various identifier strings back to your canonical product_id.
-- Your existing, problematic table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
upc VARCHAR(12) UNIQUE NOT NULL, -- The source of our pain!
name VARCHAR(255),
...
);
-- The quick-fix solution table
CREATE TABLE product_identifiers (
id SERIAL PRIMARY KEY,
product_id INTEGER NOT NULL REFERENCES products(id),
identifier_type VARCHAR(20) NOT NULL, -- e.g., 'UPC', 'MPN', 'EAN', 'VENDOR_SKU'
identifier_value VARCHAR(100) NOT NULL,
UNIQUE (identifier_type, identifier_value) -- Critical for lookups!
);
How it works: You modify your data import logic. When a new identifier comes in, instead of trying to match it against products.upc, you query product_identifiers. If you find a match, you get the correct internal product_id. If not, you handle it as a new product. It’s a tactical patch that isolates the problem.
Warning: This is tech debt. You’re adding a layer of indirection and complexity. Your application code now needs to be aware of this second table for all product lookups. Use this to get out of a crisis, but have a plan to move to a more permanent solution.
Solution 2: The Permanent Fix (Schema Evolution)
This is the “eat your vegetables” approach. It’s about fixing the core data model to properly reflect reality. Instead of treating the UPC as a special snowflake, you treat all identifiers as first-class citizens. My preferred way to do this, especially if you’re using a database like PostgreSQL, is with a flexible JSONB column.
You alter your main products table to store a collection of identifiers.
-- Step 1: Add the new flexible column
ALTER TABLE products ADD COLUMN identifiers JSONB;
-- Step 2: (Carefully, in a migration) Backfill the new column from the old one
UPDATE products SET identifiers = jsonb_build_object('UPC', upc);
-- Step 3: Create a GIN index for fast lookups inside the JSONB
CREATE INDEX idx_products_identifiers ON products USING GIN (identifiers);
-- Step 4: (Eventually) Drop the old, rigid column
ALTER TABLE products DROP COLUMN upc;
A product’s data would now look something like this in the identifiers column:
{
"UPC": "123456789012",
"MPN": "ABC-XYZ-123",
"EAN": "9876543210987",
"VENDOR_SKU_12": "WIDGET-BLUE"
}
This is a much cleaner, more future-proof model. Your application logic is simplified because all identifiers live in one place on the product record. Adding a new type of identifier in the future doesn’t require another schema change.
| Aspect | Quick Fix (Alias Table) | Permanent Fix (JSONB) |
|---|---|---|
| Speed to Implement | High (Can be done in hours) | Low (Requires careful migration & code changes) |
| Tech Debt | High | Low |
| Query Complexity | Increases (Requires JOINs) | Decreases (Querying a single table) |
| Flexibility | Moderate | High |
Solution 3: The ‘Nuclear’ Option (A Dedicated Product Master Service)
Sometimes, the problem isn’t just in one database; it’s systemic. Your e-commerce site, your warehouse management system, and your finance platform all have their own slightly different versions of the “product.” In this scenario, patching one table isn’t enough. You need to pull product data out into its own dedicated microservice.
This “Product Master Service” becomes the single source of truth for all product information in the entire company. Its sole job is to ingest product data from all sources, de-duplicate it, and provide a clean, canonical API for every other service to use.
This approach involves:
- Building a new, standalone service with its own database.
- Defining a clear API (e.g.,
GET /products?mpn=ABC-XYZ-123) for looking up products by any identifier. - Migrating all other services in your ecosystem to stop using their local product tables and call this new service instead.
Pro Tip: This is a massive architectural undertaking. Don’t even think about this unless the problem of product identity is causing widespread, repeated, and expensive problems across multiple teams. This isn’t a weekend project; it’s a multi-quarter strategic initiative. But when you need it, it’s the only thing that will truly solve the problem at scale.
At the end of the day, that 2 AM failure was a gift. It forced us to confront a flawed assumption we’d been living with for years. We started with Solution 1 to get through the week, but we immediately planned and executed Solution 2 over the next sprint. It was a pain, but it made our system more resilient, and I haven’t had a PagerDuty alert for that import job since.
🤖 Frequently Asked Questions
âť“ Why is relying on a single product identifier problematic in e-commerce systems?
Relying on a single identifier like UPC is problematic because real-world commerce involves diverse identifier types such as Manufacturer Part Numbers (MPNs), International Standard Book Numbers (ISBNs), European Article Numbers (EANs), and vendor-specific SKUs, leading to data import failures and system rigidity.
âť“ How do the ‘Identifier Alias’ table and ‘Schema Evolution’ (JSONB) solutions compare?
The ‘Identifier Alias’ table is a quick fix with high implementation speed but high technical debt, increasing query complexity. ‘Schema Evolution’ using JSONB is a permanent fix with lower implementation speed (due to migration) but low technical debt, decreasing query complexity by storing all identifiers in a single, flexible column on the product record.
âť“ What is a common implementation pitfall when using the ‘Identifier Alias’ table solution?
A common pitfall is accumulating significant technical debt. This solution adds a layer of indirection, requiring application code to be aware of the separate `product_identifiers` table for all product lookups, rather than addressing the fundamental flaw in the core data model.
Leave a Reply