StackShift II Technical Overview

StackShift II is a semantic-object-first publishing infrastructure built by WebriQ that inverts the traditional CMS data model. Rather than treating web pages as the unit of truth, the system treats canonical semantic knowledge objects stored in Supabase PostgreSQL as the durable source of record. All outputs — web pages, APIs, vector embeddings, machine-readable feeds, and LLM-accessible formats — are generated on-demand as ephemeral, regenerable render targets. The architecture is organized across six strictly bounded layers, supports dual-track (human and machine) output generation, integrates with PIM and ERP systems via unidirectional data flow, and is production-ready as of June 2026.

Executive Overview

StackShift II is a semantic-object-first publishing infrastructure developed by WebriQ and production-ready as of June 2026. It inverts the traditional CMS data model by treating canonical semantic knowledge objects — stored in Supabase PostgreSQL — as the durable source of record. All outputs, including web pages, APIs, vector embeddings, machine-readable feeds, and LLM-accessible formats, are generated on-demand as ephemeral, regenerable render targets.

The defining architectural principle is that no layer owns another layer's data. The database owns content truth, a PIM owns product truth, and an ERP owns transactional truth. These boundaries are enforced in code, not merely as organizational policy.

Core Architecture: Six-Layer Reference Model

StackShift II organizes all publishing operations across six strictly bounded layers. Each layer has a single responsibility and no authority over any other layer's data.

Layer 1: Canonical Datastore (Supabase + pgvector)

Technology: PostgreSQL 15+ with the pgvector extension
Data Model: Semantic relational schema optimized for object-graph queries
Access Pattern: Read-write via GraphQL for ingestion and enrichment; read-only from the publishing layer
Security Model: Row-level security (RLS) enforced at the database layer

What lives in the canonical datastore:

Semantic Content Objects — Articles, guides, case studies, narratives, FAQs, structured documentation
Entity Records — Products, companies, people, concepts, locations, technical specifications
Factual Relationships — Semantic edges expressing relationships such as "product X solves problem Y" or "solution A complements solution B"
Vector Embeddings — pgvector-native embeddings for semantic search, generated by the AI enrichment layer
Metadata & Governance — Timestamps, authorship, approval state, and publishing status (draft, approved, active, archived)
Taxonomy Assignments — Industry classification, solution category mapping, SKU grouping, capability tags

Critical constraint: The publishing layer (Layer 3) reads from this layer but never writes back to it. All data mutations flow through designated ingestion pipelines with validation gates.

Performance characteristics:

Database query latency: < 50ms at the 95th percentile
pgvector similarity search latency: < 100ms at the 95th percentile, scaling to millions of embeddings
Transactional ACID properties maintained for all semantic object mutations
Real-time event delivery via PostgreSQL LISTEN/NOTIFY
SOC 2 Type II audit logging on all writes; hourly snapshots with point-in-time recovery

Layer 2: Domain Authority Systems (PIM, ERP, Master Data)

Architecture Pattern: Read-only consumption of specialized domain systems
Integration Method: Webhook-triggered and scheduled polling ingestion pipelines
Data Flow Direction: Domain system → StackShift II (strictly unidirectional)

Product Information Management (PIM) is the canonical source for all product-centric truth: pricing, specifications, SKU hierarchies, availability, product relationships, compliance metadata, and category taxonomy.

Integration mechanics:

PublishForge polls the PIM API on configurable intervals (5-minute, 15-minute, or hourly)
Webhooks from the PIM trigger immediate change processing
A hybrid pattern (webhooks for real-time, polling for reconciliation) is the recommended default
Conflict resolution: PIM data always wins over any divergence in the publishing layer
Sync failures move the affected object to the needs_review state, which excludes it from publishing until the error is resolved
Retry logic: exponential backoff at 30s, 2m, 10m, and 1h for transient failures
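Here is a minimal Python sketch of the sync rules above. It assumes PIM-owned fields arrive as a flat dictionary and that transient failures surface as an exception. `SemanticObject`, `TransientSyncError`, `reconcile`, and `sync_with_retry` are hypothetical names, not PublishForge APIs. The sketch does three things:

- applies "PIM always wins" as a field-level overwrite;
- walks the stated 30s/2m/10m/1h retry schedule;
- moves the object to `needs_review` once retries are exhausted.

```python
from dataclasses import dataclass
from typing import Callable

# Retry delays in seconds, from the stated schedule: 30s, 2m, 10m, 1h.
RETRY_DELAYS_S = [30, 120, 600, 3600]


class TransientSyncError(Exception):
    """Hypothetical error for retryable failures (timeouts, 5xx responses)."""


@dataclass
class SemanticObject:
    object_id: str
    fields: dict
    state: str = "active"


def reconcile(local: SemanticObject, pim_record: dict) -> SemanticObject:
    # PIM always wins: every PIM-supplied field overwrites the local value.
    return SemanticObject(local.object_id, {**local.fields, **pim_record}, local.state)


def sync_with_retry(
    obj: SemanticObject,
    fetch: Callable[[str], dict],
    sleep: Callable[[float], None],
) -> SemanticObject:
    # One initial attempt plus one retry per scheduled delay.
    for attempt in range(len(RETRY_DELAYS_S) + 1):
        try:
            return reconcile(obj, fetch(obj.object_id))
        except TransientSyncError:
            if attempt < len(RETRY_DELAYS_S):
                sleep(RETRY_DELAYS_S[attempt])
    # Retries exhausted: quarantine the object so it is excluded from publishing.
    return SemanticObject(obj.object_id, obj.fields, "needs_review")
```

Injecting `sleep` keeps the retry schedule testable without real waits. A production worker would more likely re-enqueue the job with a delay than block a thread for an hour.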

Enterprise Resource Planning (ERP) follows the identical domain-authority pattern for transactional and operational data: inventory levels, pricing overrides, fulfillment status, customer-specific terms, and compliance certifications. ERP changes trigger the same downstream regeneration loop.

Example ERP price-update workflow timeline:

Time	Event
T+0s	Price changed in ERP
T+2s	Webhook received; validation begins
T+3s	Normalized and inserted into canonical datastore
T+4s	Change event published
T+5s	PublishForge begins regenerating product page, JSON-LD schema, embeddings, GraphQL API, LLM feeds
T+8–15s	All regeneration jobs complete; outputs staged
T+16s	Atomic promotion to production
T+17s	End users and AI systems see updated price globally

Master Data Management (MDM) systems (Informatica, Talend, SAP MDM) follow the same pattern: MDM feeds the semantic graph as an authoritative domain plane, subject to the same validation, conflict detection, and governance rules.

Layer 3: PublishForge (AI Orchestration Engine)

Architecture Type: Event-driven orchestration engine; never a source of record
Execution Model: Continuous streaming with event-based triggers and scheduled batches
Scale: Handles 10,000+ SKUs, 1,000+ content pieces, and 50+ domains without infrastructure changes

Core responsibilities:

3.1 Render Intents

PublishForge reads canonical semantic objects and generates render intents — structured JSON instructions specifying what outputs should be created, for which audiences, and with what priority and freshness requirements. Render intents decouple what knowledge exists from how it is expressed, enabling fine-grained control over output strategy independently of assembly.
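The document does not publish the render-intent schema, so the following Python sketch is illustrative only. The field names (`output_format`, `audience`, `priority`, `max_staleness_s`) and the `intents_for_product` helper are assumptions. The sketch shows how one product object can fan out into several intents, serialized as the structured JSON the text describes.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Literal


@dataclass(frozen=True)
class RenderIntent:
    object_id: str
    output_format: str                      # e.g. "html_page", "json_ld", "embedding"
    audience: Literal["human", "machine"]
    priority: int                           # lower number renders first
    max_staleness_s: int                    # freshness requirement in seconds


def intents_for_product(object_id: str) -> List[RenderIntent]:
    # One canonical object fans out into several independently scheduled outputs.
    return [
        RenderIntent(object_id, "html_page", "human", 1, 60),
        RenderIntent(object_id, "json_ld", "machine", 1, 60),
        RenderIntent(object_id, "embedding", "machine", 2, 300),
    ]


def to_json(intents: List[RenderIntent]) -> str:
    # Stable sort by priority so the most urgent outputs come first.
    ordered = sorted(intents, key=lambda i: i.priority)
    return json.dumps([asdict(i) for i in ordered], sort_keys=True)
```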

3.2 Dual-Track Output Generation

Every publish operation generates two output streams from the same semantic objects simultaneously:

Human Track: HTML pages rendered via Next.js, responsive design, design tokens, images, navigation, internal linking, WCAG AA accessibility compliance
Machine Track: JSON-LD structured data (Product, FAQPage, Article, Organization), LLM-readable text formats, vector embeddings, GraphQL API endpoints, Atom/RSS feeds, MCP (Model Context Protocol) endpoints for AI agent access, sitemaps and robots.txt

Both tracks are generated from identical source objects in the same orchestration cycle. This enforces consistency and prevents divergence between human and machine versions.
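The following Python sketch illustrates the single-source idea. It renders one hypothetical product object into two outputs in the same call:

- an escaped HTML fragment (human track);
- a schema.org `Product` JSON-LD document (machine track).

The object's field names and the helper functions are assumptions. Real human-track templates would be Next.js components rather than string formatting.

```python
import html
import json


def render_human(obj: dict) -> str:
    # Escape text fields so product data cannot inject markup into the page.
    return (
        f"<article><h1>{html.escape(obj['name'])}</h1>"
        f"<p>{html.escape(obj['description'])}</p>"
        f"<p>Price: {obj['price']} {html.escape(obj['currency'])}</p></article>"
    )


def render_machine(obj: dict) -> str:
    # schema.org Product with a nested Offer, serialized as JSON-LD.
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": obj["name"],
        "description": obj["description"],
        "sku": obj["sku"],
        "offers": {
            "@type": "Offer",
            "price": str(obj["price"]),
            "priceCurrency": obj["currency"],
        },
    })


def publish_both(obj: dict) -> dict:
    # Both tracks come from the same input in the same call, so any field
    # rendered by both cannot diverge between the human and machine versions.
    return {"human": render_human(obj), "machine": render_machine(obj)}
```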

3.3 Event-Driven Continuous Regeneration

Regeneration is event-driven, not scheduled:

A semantic object mutation fires a semantic_object_mutated event
The event is published to a message queue (Redis Streams or AWS SQS)
PublishForge consumers identify all dependent outputs from the object graph
Regeneration jobs execute in parallel across concurrent workers
Outputs are staged, validated, then atomically promoted to production
Typical end-to-end time: 5–30 seconds
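The third step, identifying every output that depends on a mutated object, can be modeled as a breadth-first traversal of reverse dependencies. The sketch below assumes the object graph is available as two hypothetical mappings, `dependents` and `outputs_of`. How PublishForge actually stores these edges is not described here.

```python
from collections import deque
from typing import Dict, List, Set


def dependent_outputs(
    mutated_id: str,
    dependents: Dict[str, List[str]],   # object_id -> objects that reference it
    outputs_of: Dict[str, List[str]],   # object_id -> outputs rendered from it
) -> Set[str]:
    seen = {mutated_id}
    queue = deque([mutated_id])
    affected: Set[str] = set()
    while queue:
        current = queue.popleft()
        affected.update(outputs_of.get(current, []))
        for parent in dependents.get(current, []):
            if parent not in seen:  # guards against cycles in the object graph
                seen.add(parent)
                queue.append(parent)
    return affected
```

The affected set would then be fanned out to parallel workers as individual regeneration jobs, as described in the fourth step.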

Why event-driven rather than scheduled cron jobs: A cron schedule introduces a staleness window as long as its interval (for example, up to 60 minutes for an hourly job). Event-driven regeneration brings every dependent output up to date within seconds of an upstream change (typically 5–30 seconds), with no manual republishing required.

3.4 Orchestration Guarantees

No Overwrites: PublishForge reads from domain systems but never writes back to them
Idempotent Operations: Regenerating the same object twice produces identical outputs
Atomic Promotion: All outputs for an object go live together or not at all
Failure Isolation: One failing output does not block others; failures are logged and retried with exponential backoff
Rollback Capability: Prior output versions are retained and can be restored
Audit Trail: All regenerations logged with source event, start/end time, worker ID, and outcome
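Three of these guarantees (idempotency, atomic promotion, and rollback) can be illustrated with a small in-memory Python model. `OutputStore` and `fingerprint` are hypothetical names. In this single-process sketch, "atomic" means the whole output set is replaced by one reference swap. A real deployment would rely on its CDN or storage layer for the equivalent guarantee.

```python
import hashlib
import json
from typing import Dict, List, Optional


class OutputStore:
    """In-memory stand-in for a versioned output store (hypothetical)."""

    def __init__(self) -> None:
        self.live: Dict[str, Dict[str, str]] = {}           # object_id -> live outputs
        self.history: Dict[str, List[Dict[str, str]]] = {}  # prior live versions

    def promote(self, object_id: str, staged: Dict[str, Optional[str]]) -> None:
        # Validate the full staged set first: all outputs go live or none do.
        if any(content is None for content in staged.values()):
            raise ValueError("staged output failed validation; nothing promoted")
        if object_id in self.live:
            self.history.setdefault(object_id, []).append(self.live[object_id])
        # Single reference swap replaces the whole output set at once.
        self.live[object_id] = dict(staged)

    def rollback(self, object_id: str) -> None:
        # Restore the most recently retained prior version.
        self.live[object_id] = self.history[object_id].pop()


def fingerprint(outputs: Dict[str, str]) -> str:
    # Idempotency check: regenerating from identical inputs must give the same digest.
    canonical = json.dumps(outputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```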

Layer 4: Next.js / Vercel (Human Rendering Layer)

Framework: Next.js 15+ (App Router)
Deployment: Vercel Edge Network (globally distributed)
Rendering Strategy: Hybrid (Static Site Generation, Server-Side Rendering, and Incremental Static Regeneration)
Performance Targets: < 100ms First Contentful Paint, < 2s Largest Contentful Paint, Core Web Vitals green

Key architectural properties:

Stateless Rendering: All rendering logic is a pure function of input data. No state persists in the rendering layer.
Regenerability: Any page can be regenerated instantly if upstream data changes. Pages are not canonical — the knowledge graph is.
Global Distribution: Statically generated pages are pre-rendered to HTML and served from Vercel's global CDN, avoiding a round trip to an origin server for each request.
Security Headers: Strict-Transport-Security, Content-Security-Policy, and X-Frame-Options are automatically applied.
Accessibility: WCAG AA compliance is validated in every render; missing alt text or incorrect semantic HTML is flagged as a build error.
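The text does not say how accessibility validation is implemented. The Python sketch below is one narrow, hypothetical example of a check that could fail a build. It uses the standard library's `html.parser` to find `<img>` elements with no `alt` attribute. An empty `alt=""` is allowed, because WCAG treats it as marking a decorative image. Full WCAG AA validation covers far more than this single rule.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class AltTextChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.violations.append(attributes.get("src") or "<no src>")


def check_alt_text(page_html: str) -> None:
    # Raise so a build step calling this fails on any violation.
    checker = AltTextChecker()
    checker.feed(page_html)
    checker.close()
    if checker.violations:
        raise ValueError(f"images missing alt text: {checker.violations}")
```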

Layer 5: pgvector (Semantic Retrieval Layer)

Technology: pgvector open-source PostgreSQL extension
Embedding Model Support: Model-agnostic; works with embeddings from OpenAI, Cohere, and open-source models
Vector Index Types: IVFFlat (recommended for < 100K vectors) or HNSW (recommended for 100K–10M vectors)
Latency Target: < 100ms for similarity search at the 95th percentile

Capabilities:

Semantic Search: k-nearest neighbor search for site search, product discovery, and related content recommendations
LLM Context Injection: External LLM-based assistants (ChatGPT, Claude, Gemini) retrieve relevant products, specifications, and case studies from the pgvector index, through the machine-track API and MCP endpoints, as context for answer generation and citation
Recommendation Engine: Product embeddings are clustered; nearest neighbors power "customers also viewed" and related-product surfaces for both human UI and machine API feeds
RAG for Internal AI Agents: Internal chatbots and automation agents retrieve the most relevant documents via pgvector, inject them into the agent's context window, and cite or summarize with confidence
Compliance: Embeddings contain no PII; embedding generation is logged and auditable; embeddings can be regenerated if the model changes; GDPR/CCPA deletion supported
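The Python below is an exact (brute-force) k-nearest-neighbor search using cosine distance. This is the same quantity pgvector's `<=>` operator returns, so it shows what the index computes. In SQL, the equivalent query would look like `SELECT id FROM items ORDER BY embedding <=> $1 LIMIT 5`; the `items` table and `embedding` column are hypothetical. IVFFlat and HNSW indexes approximate this exact ranking in exchange for much lower latency at scale.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity, matching the definition behind pgvector's <=>.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query: Sequence[float], rows: List[Tuple[str, Sequence[float]]], k: int) -> List[str]:
    """Exact k-nearest neighbors; rows are (object_id, embedding) pairs."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [object_id for object_id, _ in ranked[:k]]
```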

Layer 6: AI Agents (Intelligence Layer)

Architecture: Agentic orchestration with mandatory human-in-the-loop governance
Models: Multi-model support (Claude, GPT-4, domain-specific fine-tuned models)
Execution Model: Event-triggered with optional scheduled batch runs
Governance: Every agent output requires human review and approval before entering the publishing pipeline

Agent capabilities:

6.1 Semantic Extraction: Parses raw inputs (PDFs, documents, webpages) and extracts structured semantic objects — entities, facts, taxonomies — with confidence scores and completeness validation
6.2 Enrichment & Linking: Adds cross-references, taxonomic assignments, derived relationships, and contextual metadata to extracted objects
6.3 Generation & Variation: Produces alternative text summaries, comparison tables, FAQ sets, SEO meta descriptions, and schema markup variations from canonical objects
6.4 Optimization: Monitors page engagement, search ranking, conversion rate, and AI visibility; analyzes underperforming content and stages human-reviewed optimization suggestions

Governance pipeline:

AI Generates → Human Reviews (cannot be bypassed) → Human Approves (cannot be bypassed) → PublishForge Publishes

Quality gates before human review include: completeness checks on extracted entities, defined relationship types, approved-vocabulary taxonomy assignments, confidence scores on fact claims, readability standards, and JSON-LD schema validation.

After human review, designated approvers (content manager, product manager, or subject matter expert) grant final approval. Approved objects transition to active state and participate in output generation. All approvals are time-stamped and audited.
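The governance rules can be enforced as a small state machine. The Python sketch below is an assumption about how such enforcement might look, not WebriQ's implementation. `GovernedObject`, `GovernanceError`, and the `gates_passed` flag are hypothetical, and the states reuse the statuses listed for Layer 1 (draft, approved, active, archived). Approval is rejected unless a human review has been recorded, and every action is appended to an audit log with a UTC timestamp.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Only these state changes are permitted; anything else is a bypass attempt.
ALLOWED = {("draft", "approved"), ("approved", "active"), ("active", "archived")}


class GovernanceError(Exception):
    pass


class GovernedObject:
    def __init__(self, object_id: str) -> None:
        self.object_id = object_id
        self.state = "draft"  # AI-generated objects start here
        self.reviewed_by: Optional[str] = None
        self.audit: List[Tuple[str, str, str]] = []  # (UTC timestamp, actor, action)

    def _log(self, actor: str, action: str) -> None:
        self.audit.append((datetime.now(timezone.utc).isoformat(), actor, action))

    def _transition(self, new_state: str, actor: str) -> None:
        if (self.state, new_state) not in ALLOWED:
            raise GovernanceError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self._log(actor, new_state)

    def review(self, reviewer: str, gates_passed: bool) -> None:
        # Automated quality gates must pass before a human review is recorded.
        if not gates_passed:
            raise GovernanceError("quality gates failed; object stays in draft")
        self.reviewed_by = reviewer
        self._log(reviewer, "reviewed")

    def approve(self, approver: str) -> None:
        # Approval without a recorded human review is rejected.
        if self.reviewed_by is None:
            raise GovernanceError("human review required before approval")
        self._transition("approved", approver)

    def activate(self, actor: str = "publishforge") -> None:
        self._transition("active", actor)
```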

PipelineForge Integration: Prospecting & Sales Intelligence

PipelineForge is a bi-directional integration between the StackShift II knowledge graph and a prospecting engine.

Data flow:

StackShift II knowledge graph (narrative, value props, case studies, product specs) → PipelineForge (personalizes outreach)
PipelineForge (prospect records, engagement signals) → StackShift II canonical datastore (stored as prospect entity records)

Key capabilities:

Searches a 150M+ company database weekly against a defined ideal customer profile (ICP)
Generates personalized email sequences informed by the semantic knowledge graph, sent with SPF/DKIM/DMARC authentication
Inbox delivery rate > 95%, attributed to genuine personalization rather than templated messaging
AI categorizes inbound replies; high-intent replies are escalated to human sales with full context
Closed-loop feedback: engagement data (replies, meetings booked, pipeline stage) flows back to the semantic graph and informs optimization agents

Seven-Step Data Pipeline

Every piece of knowledge in StackShift II follows this pipeline without exception:

Ingest — Raw sources (documents, product data, media) arrive via APIs, webhooks, or uploads
Parse & Normalize — Unstructured content is converted to structured form; relationships are identified
Extract & Enrich — AI agents extract entities, facts, and relationships; enrichment adds context
Store as Semantic Objects — Canonical objects persisted to Supabase; embeddings generated
Generate Render Intents — AI determines what outputs to create, in what formats, for which audiences
Assemble Dual Tracks — Human (pages) and machine (APIs, feeds, embeddings) outputs generated in parallel
Publish & Serve — Outputs deployed to CDN, APIs, and feeds; served globally to humans and AI systems
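Every item must pass through all seven steps, so the ordering can be enforced by a runner that refuses to proceed when any stage is missing. In this Python sketch, the stage names and the `run_pipeline` helper are hypothetical. Each handler is assumed to take and return a plain dictionary.

```python
from typing import Callable, Dict, List, Tuple

# The seven steps, in the order the pipeline requires.
STAGES = (
    "ingest", "parse_normalize", "extract_enrich", "store",
    "render_intents", "assemble_dual_tracks", "publish",
)


def run_pipeline(
    item: dict, handlers: Dict[str, Callable[[dict], dict]]
) -> Tuple[dict, List[str]]:
    """Run every stage in order; a missing handler is an error, so no stage is skipped."""
    trace: List[str] = []
    for stage in STAGES:
        if stage not in handlers:
            raise KeyError(f"no handler registered for stage {stage!r}")
        item = handlers[stage](item)
        trace.append(stage)
    return item, trace
```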

Performance Characteristics and Guarantees

Dimension	Target
Database query latency	< 50ms (p95)
Semantic search (pgvector)	< 100ms (p95)
Page render (Next.js server time)	< 200ms
End-to-end input → published	30–60 seconds
Semantic object mutation throughput	1,000+ per minute
Concurrent output regeneration jobs	100+
API throughput	10,000+ req/sec globally
Uptime SLA	99.9%
SKU scale	10,000+ without infrastructure changes
Content pieces	1,000+
Domains	50+
Vector index scale	Millions of embeddings

Compliance: SOC 2 Type II audit logging; GDPR & CCPA data controls; row-level security; TLS in transit and encryption at rest.

Technology Stack

Layer	Technology	Purpose
Canonical Datastore	Supabase (PostgreSQL 15+)	Semantic object storage with pgvector
API Layer	GraphQL	Query semantic objects; real-time subscriptions
Orchestration	PublishForge (WebriQ)	Custom-built event-driven publish coordination
Human Rendering	Next.js 15+ (React)	Web page rendering on Vercel globally
Styling	Tailwind CSS / CSS Modules	Design tokens and responsive layout
Static Assets	Vercel CDN	Global content delivery with automatic optimization
Embeddings	pgvector	Model-agnostic vector search (OpenAI, Cohere, open-source models)
AI Models	Claude, GPT-4, fine-tuned models	Extraction, enrichment, generation
Message Queue	Redis Streams / AWS SQS	Event distribution and regeneration triggers
Infrastructure	AWS or GCP	Region-specific compute and storage
Monitoring	DataDog / New Relic	Real-time performance and health dashboards
Logging	CloudWatch / ELK	Audit trail; all changes queryable

Architecture Decision Records (ADRs)

ADR-1: Semantic Objects as Canonical Source

Decision: Semantic objects in the database are canonical; pages and APIs are ephemeral render targets.
Rationale: A single source of truth simplifies consistency; regenerable outputs eliminate data duplication; dual-audience publishing is enabled without separate workflows; the model aligns with AI-first publishing, where machines consume structured data.
Consequences: All updates must flow through the semantic layer; no direct page edits are possible; schema errors propagate to all outputs.

ADR-2: Domain System Boundaries

Decision: PublishForge reads from domain systems but never writes back to them. Domain ownership is enforced at the code level.
Rationale: Prevents data corruption in authoritative systems; clear separation of concerns; domain systems remain sources of record.
Consequences: All PIM/ERP updates flow through designated sync pipelines with validation gates; direct data fixes must be made at the source system.

ADR-3: Event-Driven Over Scheduled Regeneration

Decision: Output regeneration is event-triggered, not scheduled via cron.
Rationale: Eliminates staleness windows; critical updates (pricing, availability) propagate immediately; more efficient and better-scaling than heavy scheduled jobs.
Consequences: Requires event infrastructure (Redis Streams or AWS SQS); asynchronous failures require robust retry logic and alerting.

ADR-4: Dual-Track Output Generation

Decision: Human and machine outputs are generated simultaneously from identical semantic objects.
Rationale: Ensures consistency between human and machine versions; both audiences served by the same content update; AI discoverability is a default output, not an afterthought.
Consequences: Rendering templates required for both human and machine formats; output validation must cover both tracks.

Implementation Timeline

Weeks 1–3: Discovery, knowledge inventory, source mapping, governance rules
Weeks 4–8: Knowledge graph construction, semantic object ingestion, relationship mapping, embedding generation
Weeks 9–11: Output configuration (templates, formats), testing, performance validation
Week 12: Go-live and continuous operation

Post-launch responsibilities:

Content Team: Update objects, add content, manage taxonomy, approve AI-generated variations
WebriQ Infrastructure Team: Monitor pipeline, manage integrations, optimize performance, handle alerts
Client IT/Security Team: Maintain PIM, ERP, and domain systems; ensure webhook security and data validation

Summary for Technical Evaluators

StackShift II inverts the traditional CMS model by making semantic knowledge objects canonical and all outputs (pages, APIs, feeds, embeddings) regenerable render targets. The key architectural outcomes are:

Dual-audience publishing — Human and machine outputs generated simultaneously from a single source
Continuous freshness — Event-driven regeneration keeps all outputs within 30–60 seconds of upstream data changes
Domain authority preservation — PIM, ERP, and master data systems remain authoritative; StackShift II only consumes their data
Operational efficiency — No developer tickets for content updates; infrastructure handles regeneration automatically
AI discoverability — JSON-LD structured data, vector embeddings, LLM-readable feeds, and MCP endpoints are default outputs
Enterprise scale — Handles 10,000+ SKUs and 1,000+ content pieces with consistent architecture and a 99.9% uptime SLA