StackShift II Technical Overview
StackShift II is a semantic-object-first publishing infrastructure built by WebriQ that inverts the traditional CMS data model. Rather than treating web pages as the unit of truth, the system treats canonical semantic knowledge objects stored in Supabase PostgreSQL as the durable source of record. All outputs — web pages, APIs, vector embeddings, machine-readable feeds, and LLM-accessible formats — are generated on-demand as ephemeral, regenerable render targets. The architecture is organized across six strictly bounded layers, supports dual-track (human and machine) output generation, integrates with PIM and ERP systems via unidirectional data flow, and is production-ready as of June 2026.
Executive Overview
StackShift II is a semantic-object-first publishing infrastructure developed by WebriQ and production-ready as of June 2026. It inverts the traditional CMS data model by treating canonical semantic knowledge objects — stored in Supabase PostgreSQL — as the durable source of record. All outputs, including web pages, APIs, vector embeddings, machine-readable feeds, and LLM-accessible formats, are generated on-demand as ephemeral, regenerable render targets.
The defining architectural principle is: No layer owns another layer's data. The database owns content truth; a PIM owns product truth; an ERP owns transactional truth. These boundaries are enforced at the code level, not merely as organizational policy.
Core Architecture: Six-Layer Reference Model
StackShift II organizes all publishing operations across six strictly bounded layers. Each layer has a single responsibility and no authority over any other layer's data.
Layer 1: Canonical Datastore (Supabase + pgvector)
- Technology: PostgreSQL 15+ with the pgvector extension
- Data Model: Semantic relational schema optimized for object-graph queries
- Access Pattern: Read-write via GraphQL for ingestion and enrichment; read-only from the publishing layer
- Security Model: Row-level security (RLS) enforced at the database layer
What lives in the canonical datastore:
- Semantic Content Objects — Articles, guides, case studies, narratives, FAQs, structured documentation
- Entity Records — Products, companies, people, concepts, locations, technical specifications
- Factual Relationships — Semantic edges expressing relationships such as "product X solves problem Y" or "solution A complements solution B"
- Vector Embeddings — pgvector-native embeddings for semantic search, generated by the AI enrichment layer
- Metadata & Governance — Timestamps, authorship, approval state, and publishing status (draft, approved, active, archived)
- Taxonomy Assignments — Industry classification, solution category mapping, SKU grouping, capability tags
Critical constraint: The publishing layer (Layer 3) reads from this layer but never writes back to it. All data mutations flow through designated ingestion pipelines with validation gates.
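The read-only relationship between the publishing layer and the datastore can be illustrated with a minimal sketch. The class and function names here (SemanticObject, snapshot_for_publishing) are hypothetical and not part of StackShift II; the sketch only assumes that the publishing layer receives an immutable snapshot of each object, using the four publishing statuses listed above.

```python
from dataclasses import dataclass
from enum import Enum
from types import MappingProxyType


class PublishStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    ACTIVE = "active"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class SemanticObject:
    """Hypothetical immutable snapshot handed to the publishing layer."""
    object_id: str
    kind: str                 # e.g. "article" or "product"
    body: MappingProxyType    # read-only view of the object's fields
    status: PublishStatus


def snapshot_for_publishing(object_id: str, kind: str, fields: dict, status: str) -> SemanticObject:
    # Copy the fields, then wrap the copy so readers cannot mutate it.
    return SemanticObject(object_id, kind, MappingProxyType(dict(fields)), PublishStatus(status))


obj = snapshot_for_publishing("sku-123", "product", {"name": "Widget"}, "active")
```

Any attempt to assign to obj.body["name"] raises a TypeError, so a write has to go back through an ingestion pipeline rather than through the snapshot.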
Performance characteristics:
- Database query latency: < 50ms at the 95th percentile
- pgvector similarity search latency: < 100ms at the 95th percentile, scaling to millions of embeddings
- Transactional ACID properties maintained for all semantic object mutations
- Real-time event delivery via PostgreSQL LISTEN/NOTIFY
- SOC 2 Type II audit logging on all writes; hourly snapshots with point-in-time recovery
Layer 2: Domain Authority Systems (PIM, ERP, Master Data)
- Architecture Pattern: Read-only consumption of specialized domain systems
- Integration Method: Webhook-triggered and scheduled polling ingestion pipelines
- Data Flow Direction: Domain system → StackShift II (strictly unidirectional)
Product Information Management (PIM) is the canonical source for all product-centric truth: pricing, specifications, SKU hierarchies, availability, product relationships, compliance metadata, and category taxonomy.
Integration mechanics:
- PublishForge polls the PIM API on configurable intervals (5-minute, 15-minute, or hourly)
- Webhooks from the PIM trigger immediate change processing
- A hybrid pattern (webhooks for real-time, polling for reconciliation) is the recommended default
- Conflict resolution: PIM data always wins over any divergence in the publishing layer
- Sync failures move the affected object to the needs_review state; it does not participate in publishing until the error is resolved
- Retry logic: exponential backoff at 30s, 2m, 10m, and 1h for transient failures
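The retry schedule and the needs_review fallback can be sketched as follows. This is an illustration under stated assumptions, not the PublishForge implementation: TransientSyncError and sync_with_retry are hypothetical names, and the sleep function is injected so the example runs without actually waiting.

```python
from typing import Callable, Optional, Tuple

# Delays taken from the retry schedule described above: 30s, 2m, 10m, 1h.
RETRY_DELAYS_SECONDS = (30, 120, 600, 3600)


class TransientSyncError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def sync_with_retry(
    fetch: Callable[[], dict],
    sleep: Callable[[float], None],
) -> Tuple[str, Optional[dict]]:
    """Try once immediately, then once after each delay. Returns (state, payload)."""
    for delay in (0, *RETRY_DELAYS_SECONDS):
        if delay:
            sleep(delay)
        try:
            return "synced", fetch()
        except TransientSyncError:
            continue
    # Retries exhausted: park the object so it stays out of publishing.
    return "needs_review", None


def always_fail() -> dict:
    raise TransientSyncError("PIM unavailable")


waits = []  # records the requested delays instead of sleeping
state, payload = sync_with_retry(always_fail, waits.append)
```

In a real worker the sleep argument would be time.sleep or a delayed re-enqueue on the message queue; recording the delays keeps the sketch testable.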
Enterprise Resource Planning (ERP) follows the identical domain-authority pattern for transactional and operational data: inventory levels, pricing overrides, fulfillment status, customer-specific terms, and compliance certifications. ERP changes trigger the same downstream regeneration loop.
Example ERP price-update workflow timeline:
| Time | Event |
|---|---|
| T+0s | Price changed in ERP |
| T+2s | Webhook received; validation begins |
| T+3s | Normalized and inserted into canonical datastore |
| T+4s | Change event published |
| T+5s | PublishForge begins regenerating product page, JSON-LD schema, embeddings, GraphQL API, LLM feeds |
| T+8–15s | All regeneration jobs complete; outputs staged |
| T+16s | Atomic promotion to production |
| T+17s | End users and AI systems see updated price globally |
Master Data Management (MDM) systems (Informatica, Talend, SAP MDM) follow the same pattern: MDM feeds the semantic graph as an authoritative domain plane, subject to the same validation, conflict detection, and governance rules.
Layer 3: PublishForge (AI Orchestration Engine)
- Architecture Type: Event-driven orchestration engine; never a source of record
- Execution Model: Continuous streaming with event-based triggers and scheduled batches
- Scale: Handles 10,000+ SKUs, 1,000+ content pieces, and 50+ domains without infrastructure changes
Core responsibilities:
3.1 Render Intents
PublishForge reads canonical semantic objects and generates render intents — structured JSON instructions specifying what outputs should be created, for which audiences, and with what priority and freshness requirements. Render intents decouple what knowledge exists from how it is expressed, enabling fine-grained control over output strategy independently of assembly.
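A render intent might look like the following. The field names (output, audience, priority, max_age_seconds) and the OUTPUTS_BY_KIND mapping are assumptions made for illustration; the source specifies only that intents are structured JSON describing outputs, audiences, priority, and freshness.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RenderIntent:
    object_id: str
    output: str            # e.g. "html_page", "json_ld"
    audience: str          # "human" or "machine"
    priority: int          # assumption: lower number means more urgent
    max_age_seconds: int   # freshness requirement for this output


# Hypothetical mapping from object kind to (output, audience, priority, max age).
OUTPUTS_BY_KIND = {
    "product": [
        ("html_page", "human", 1, 60),
        ("json_ld", "machine", 1, 60),
        ("embedding", "machine", 2, 3600),
    ],
    "article": [
        ("html_page", "human", 2, 300),
        ("atom_feed", "machine", 3, 900),
    ],
}


def render_intents(object_id: str, kind: str) -> list:
    """Describe what should be rendered without rendering anything."""
    intents = [RenderIntent(object_id, *spec) for spec in OUTPUTS_BY_KIND.get(kind, [])]
    return sorted(intents, key=lambda intent: intent.priority)


payload = json.dumps([asdict(i) for i in render_intents("sku-123", "product")], indent=2)
```

Because the intent carries no markup or template logic, the output strategy can change (for example, adding a new feed type) without touching the assembly code.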
3.2 Dual-Track Output Generation
Every publish operation generates two output streams from the same semantic objects simultaneously:
- Human Track: HTML pages rendered via Next.js, responsive design, design tokens, images, navigation, internal linking, WCAG AA accessibility compliance
- Machine Track: JSON-LD structured data (Product, FAQPage, Article, Organization), LLM-readable text formats, vector embeddings, GraphQL API endpoints, Atom/RSS feeds, MCP (Model Context Protocol) endpoints for AI agent access, sitemaps and robots.txt
Both tracks are generated from identical source objects in the same orchestration cycle. This enforces consistency and prevents divergence between human and machine versions.
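The single-source property of dual-track generation can be sketched with two renderers that read the same object in one cycle. The function names and the product fields are hypothetical; the JSON-LD uses the schema.org Product and Offer types, and real human-track rendering would go through Next.js rather than a Python string template.

```python
import html
import json


def render_human(product: dict) -> str:
    """Human track: a minimal HTML fragment (stand-in for a Next.js page)."""
    name = html.escape(product["name"])
    price = html.escape(f'{product["price"]:.2f} {product["currency"]}')
    return f"<article><h1>{name}</h1><p>{price}</p></article>"


def render_machine(product: dict) -> str:
    """Machine track: schema.org Product markup as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "offers": {
            "@type": "Offer",
            "price": f'{product["price"]:.2f}',
            "priceCurrency": product["currency"],
        },
    }
    return json.dumps(doc, sort_keys=True)


def publish_cycle(product: dict) -> dict:
    # Both tracks read the same object in the same cycle, so they cannot drift apart.
    return {"human": render_human(product), "machine": render_machine(product)}


outputs = publish_cycle({"name": "Widget & Co", "price": 19.5, "currency": "USD"})
```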
3.3 Event-Driven Continuous Regeneration
Regeneration is event-driven, not scheduled:
- A semantic object mutation fires a semantic_object_mutated event
- The event is published to a message queue (Redis Streams or AWS SQS)
- PublishForge consumers identify all dependent outputs from the object graph
- Regeneration jobs execute in parallel across concurrent workers
- Outputs are staged, validated, then atomically promoted to production
- Typical time from mutation event to promoted outputs: 5–30 seconds
Why event-driven rather than cron-scheduled: Cron jobs introduce staleness windows as long as the cron interval (e.g., 60 minutes). Event-driven regeneration keeps every output within seconds of its upstream object, with no manual republishing required.
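The step in which consumers identify dependent outputs from the object graph can be sketched as a breadth-first walk. The DEPENDENTS table and the identifier scheme (product:42, page:/products/42) are invented for illustration, and an in-process function call stands in for the Redis Streams or SQS consumer.

```python
from collections import deque

# Hypothetical dependency edges: node -> things to rebuild when it changes.
DEPENDENTS = {
    "product:42": ["page:/products/42", "jsonld:product:42", "category:tools"],
    "category:tools": ["page:/categories/tools", "feed:atom"],
}


def outputs_to_regenerate(mutated_id: str) -> list:
    """Breadth-first walk; returns render targets reachable from the mutated object."""
    seen = {mutated_id}
    targets = []
    queue = deque([mutated_id])
    while queue:
        node = queue.popleft()
        for dep in DEPENDENTS.get(node, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in DEPENDENTS:
                queue.append(dep)    # another object: keep walking
            else:
                targets.append(dep)  # leaf: an output to rebuild
    return targets


def handle_event(event: dict) -> list:
    # Ignore anything that is not a mutation event.
    if event.get("type") != "semantic_object_mutated":
        return []
    return outputs_to_regenerate(event["object_id"])


jobs = handle_event({"type": "semantic_object_mutated", "object_id": "product:42"})
```

Each entry in jobs would then become one regeneration job for the worker pool; the seen set keeps a shared dependency from being rebuilt twice for the same event.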
3.4 Orchestration Guarantees
- No Overwrites: PublishForge reads from domain systems but never writes back to them
- Idempotent Operations: Regenerating the same object twice produces identical outputs
- Atomic Promotion: All outputs for an object go live together or not at all
- Failure Isolation: One failing output does not block others; failures are logged and retried with exponential backoff
- Rollback Capability: Prior output versions are retained and can be restored
- Audit Trail: All regenerations logged with source event, start/end time, worker ID, and outcome
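Three of these guarantees (idempotent rendering, atomic promotion, and rollback) can be shown in a small in-memory sketch. OutputStore, render_output, and regenerate are hypothetical names; a production system would swap a deployment alias or pointer rather than a Python dict, but the stage, validate, then promote order is the same.

```python
import json


def render_output(obj: dict, target: str) -> str:
    # Deterministic rendering: the same object and target always give the same text.
    return json.dumps({"target": target, "data": obj}, sort_keys=True)


class OutputStore:
    def __init__(self) -> None:
        self.live = {}       # target -> content currently served
        self.history = []    # prior live snapshots, kept for rollback

    def promote(self, staged: dict) -> None:
        # Build the new mapping first, then swap the reference in one step.
        self.history.append(self.live)
        self.live = {**self.live, **staged}

    def rollback(self) -> None:
        self.live = self.history.pop()


def regenerate(store: OutputStore, obj: dict, targets: list, validate) -> bool:
    staged = {target: render_output(obj, target) for target in targets}
    if not all(validate(content) for content in staged.values()):
        return False         # nothing is promoted if any output fails validation
    store.promote(staged)
    return True


store = OutputStore()
promoted = regenerate(store, {"name": "Widget", "price": 10}, ["page", "json_ld"], lambda c: True)
```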
Layer 4: Next.js / Vercel (Human Rendering Layer)
- Framework: Next.js 15+ (App Router)
- Deployment: Vercel Edge Network (globally distributed)
- Rendering Strategy: Hybrid (Static Site Generation, Server-Side Rendering, and Incremental Static Regeneration)
- Performance Targets: < 100ms First Contentful Paint, < 2s Largest Contentful Paint, Core Web Vitals green
Key architectural properties:
- Stateless Rendering: All rendering logic is a pure function of input data. No state persists in the rendering layer.
- Regenerability: Any page can be regenerated instantly if upstream data changes. Pages are not canonical — the knowledge graph is.
- Global Distribution: Pages are pre-rendered to static HTML and served from Vercel's global CDN, eliminating origin server latency.
- Security Headers: Strict-Transport-Security, Content-Security-Policy, and X-Frame-Options are automatically applied.
- Accessibility: WCAG AA compliance is validated in every render; missing alt text or incorrect semantic HTML is flagged as a build error.
Layer 5: pgvector (Semantic Retrieval Layer)
- Technology: pgvector open-source PostgreSQL extension
- Embedding Model Support: OpenAI, Anthropic, Cohere, and open-source models
- Vector Index Types: IVFFlat (recommended for < 100K vectors) or HNSW (recommended for 100K–10M vectors)
- Latency Target: < 100ms for similarity search at the 95th percentile
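The index recommendation above can be turned into a small helper that emits pgvector index DDL. The table and column names are placeholders, the helper is not part of StackShift II, and the lists value follows the starting point suggested in the pgvector documentation (rows / 1000 for up to 1M rows). The source gives no recommendation above 10M vectors, so the sketch simply keeps HNSW there.

```python
def recommended_index(vector_count: int) -> str:
    """Index type per the thresholds listed above."""
    return "ivfflat" if vector_count < 100_000 else "hnsw"


def index_ddl(table: str, column: str, vector_count: int) -> str:
    # table and column are interpolated directly, so they must come from trusted config.
    if recommended_index(vector_count) == "ivfflat":
        lists = max(vector_count // 1000, 1)
        return (
            f"CREATE INDEX ON {table} USING ivfflat "
            f"({column} vector_cosine_ops) WITH (lists = {lists});"
        )
    return f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops);"


ddl = index_ddl("semantic_objects", "embedding", 50_000)
```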
Capabilities:
- Semantic Search: k-nearest neighbor search for site search, product discovery, and related content recommendations
- LLM Context Injection: External LLMs (ChatGPT, Claude, Gemini) query the pgvector index to retrieve relevant products, specifications, and case studies as context for answer generation and citation
- Recommendation Engine: Product embeddings are clustered; nearest neighbors power "customers also viewed" and related-product surfaces for both human UI and machine API feeds
- RAG for Internal AI Agents: Internal chatbots and automation agents retrieve the most relevant documents via pgvector, inject them into the agent's context window, and cite or summarize with confidence
- Compliance: Embeddings contain no PII; embedding generation is logged and auditable; embeddings can be regenerated if the model changes; GDPR/CCPA deletion supported
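The k-nearest-neighbor search behind these capabilities reduces to ranking stored vectors by similarity to a query vector. The pure-Python sketch below shows the computation on toy two-dimensional vectors; in the database the equivalent ranking is done by ordering on pgvector's cosine distance operator (<=>) with a LIMIT. The document IDs and vectors are made up.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def k_nearest(query: list, corpus: dict, k: int) -> list:
    """Return the ids of the k vectors most similar to the query."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


corpus = {
    "case-study-a": [1.0, 0.0],
    "faq-b": [0.0, 1.0],
    "product-c": [0.9, 0.1],
}
neighbors = k_nearest([1.0, 0.0], corpus, k=2)
```

This exhaustive scan is linear in the number of vectors; the IVFFlat and HNSW indexes exist to avoid that scan by trading a small amount of recall for speed.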
Layer 6: AI Agents (Intelligence Layer)
- Architecture: Agentic orchestration with mandatory human-in-the-loop governance
- Models: Multi-model support (Claude, GPT-4, domain-specific fine-tuned models)
- Execution Model: Event-triggered with optional scheduled batch runs
- Governance: Every agent output requires human review and approval before entering the publishing pipeline
Agent capabilities:
- 6.1 Semantic Extraction: Parses raw inputs (PDFs, documents, webpages) and extracts structured semantic objects — entities, facts, taxonomies — with confidence scores and completeness validation
- 6.2 Enrichment & Linking: Adds cross-references, taxonomic assignments, derived relationships, and contextual metadata to extracted objects
- 6.3 Generation & Variation: Produces alternative text summaries, comparison tables, FAQ sets, SEO meta descriptions, and schema markup variations from canonical objects
- 6.4 Optimization: Monitors page engagement, search ranking, conversion rate, and AI visibility; analyzes underperforming content and stages human-reviewed optimization suggestions
Governance pipeline:
AI Generates → Human Reviews (cannot bypass) → Human Approves (cannot bypass) → PublishForge Publishes
Quality gates before human review include: completeness checks on extracted entities, defined relationship types, approved-vocabulary taxonomy assignments, confidence scores on fact claims, readability standards, and JSON-LD schema validation.
After human review, designated approvers (content manager, product manager, or subject matter expert) grant final approval. Approved objects transition to active state and participate in output generation. All approvals are time-stamped and audited.
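The non-bypassable gates can be modeled as a small state machine. This is a sketch under assumptions: GovernedObject and its method names are hypothetical, the states reuse the publishing statuses named earlier (draft, approved, active), and the audit log is an in-memory list rather than a database table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class GovernedObject:
    object_id: str
    state: str = "draft"
    reviewed_by: Optional[str] = None
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def _log(self, action: str, actor: str) -> None:
        # Every transition is time-stamped (UTC) and attributed to an actor.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, action, actor))

    def review(self, reviewer: str) -> None:
        self.reviewed_by = reviewer
        self._log("reviewed", reviewer)

    def approve(self, approver: str) -> None:
        if self.state != "draft":
            raise ValueError(f"cannot approve from state {self.state!r}")
        if self.reviewed_by is None:
            raise PermissionError("human review is required before approval")
        self.state = "approved"
        self._log("approved", approver)

    def activate(self) -> None:
        if self.state != "approved":
            raise PermissionError("only approved objects may be published")
        self.state = "active"
        self._log("activated", "publishforge")


faq = GovernedObject("faq-7")
faq.review("editor@example.com")
faq.approve("pm@example.com")
faq.activate()
```

Calling approve before review, or activate before approve, raises PermissionError, which is the code-level equivalent of the two gates in the pipeline.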
PipelineForge Integration: Prospecting & Sales Intelligence
PipelineForge is a bi-directional integration between the StackShift II knowledge graph and a prospecting engine.
Data flow:
- StackShift II knowledge graph (narrative, value props, case studies, product specs) → PipelineForge (personalizes outreach)
- PipelineForge (prospect records, engagement signals) → StackShift II canonical datastore (stored as prospect entity records)
Key capabilities:
- Searches a 150M+ company database weekly against a defined ideal customer profile (ICP)
- Generates personalized email sequences informed by the semantic knowledge graph, sent with SPF/DKIM/DMARC authentication
- Inbox delivery rate > 95% due to genuine personalization rather than templated messaging
- AI categorizes inbound replies; high-intent replies are escalated to human sales with full context
- Closed-loop feedback: engagement data (replies, meetings booked, pipeline stage) flows back to the semantic graph and informs optimization agents
Seven-Step Data Pipeline
Every piece of knowledge in StackShift II follows this pipeline without exception:
1. Ingest: Raw sources (documents, product data, media) arrive via APIs, webhooks, or uploads
2. Parse & Normalize: Unstructured content is converted to structured form; relationships are identified
3. Extract & Enrich: AI agents extract entities, facts, and relationships; enrichment adds context
4. Store as Semantic Objects: Canonical objects persisted to Supabase; embeddings generated
5. Generate Render Intents: AI determines what outputs to create, in what formats, for which audiences
6. Assemble Dual Tracks: Human (pages) and machine (APIs, feeds, embeddings) outputs generated in parallel
7. Publish & Serve: Outputs deployed to CDN, APIs, and feeds; served globally to humans and AI systems
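The fixed ordering of the pipeline can be sketched as a chain of functions, each taking the state produced by the previous step. Every stage body here is a toy stand-in (capitalized words as "entities", a content hash as the object ID) chosen only so the example runs; none of it reflects how the real stages work internally.

```python
import hashlib
from functools import reduce


def ingest(raw: str) -> dict:
    return {"raw": raw}


def parse_and_normalize(s: dict) -> dict:
    return {**s, "text": " ".join(s["raw"].split())}


def extract_and_enrich(s: dict) -> dict:
    # Toy extraction: treat capitalized words as entities.
    words = s["text"].split()
    return {**s, "entities": sorted({w.strip(".,") for w in words if w[0].isupper()})}


def store(s: dict) -> dict:
    # Hypothetical content-derived id standing in for a database key.
    return {**s, "object_id": hashlib.sha256(s["text"].encode()).hexdigest()[:12]}


def generate_render_intents(s: dict) -> dict:
    return {**s, "intents": ["html_page", "json_ld"]}


def assemble_dual_tracks(s: dict) -> dict:
    return {**s, "outputs": {i: f"{i}:{s['object_id']}" for i in s["intents"]}}


def publish_and_serve(s: dict) -> dict:
    return {**s, "published": sorted(s["outputs"])}


STEPS = [ingest, parse_and_normalize, extract_and_enrich, store,
         generate_render_intents, assemble_dual_tracks, publish_and_serve]


def run_pipeline(raw: str) -> dict:
    # Each step receives the previous step's result; no step can be skipped.
    return reduce(lambda state, step: step(state), STEPS, raw)


result = run_pipeline("  StackShift   publishes to Vercel. ")
```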
Performance Characteristics and Guarantees
| Dimension | Target |
|---|---|
| Database query latency | < 50ms (p95) |
| Semantic search (pgvector) | < 100ms (p95) |
| Page render (Next.js server time) | < 200ms |
| End-to-end input → published | 30–60 seconds |
| Concurrent semantic object mutations | 1,000+ per minute |
| Concurrent output regeneration jobs | 100+ |
| API throughput | 10,000+ req/sec globally |
| Uptime SLA | 99.9% |
| SKU scale | 10,000+ without infrastructure changes |
| Content pieces | 1,000+ |
| Domains | 50+ |
| Vector index scale | Millions of embeddings |
Compliance: SOC 2 Type II audit logging; GDPR & CCPA data controls; row-level security; TLS in transit and encryption at rest.
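Because the latency targets above are stated at the 95th percentile, checking them means computing a percentile rather than an average. The sketch below uses the nearest-rank method, which is one common definition among several; the sample latencies are invented, and monitoring tools such as those named in this document may use a slightly different interpolation.

```python
import math


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_target(samples_ms: list, target_ms: float, pct: int = 95) -> bool:
    return percentile(samples_ms, pct) < target_ms


# 95 fast queries and 5 slow outliers: the p95 target tolerates the outliers.
latencies_ms = [12] * 95 + [400] * 5
ok = meets_target(latencies_ms, target_ms=50)
```

With these samples the mean is 31.4 ms and the p95 is 12 ms, so the 50 ms target is met even though individual queries took 400 ms; a sixth slow query would push the p95 to 400 ms and fail the check.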
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Canonical Datastore | Supabase (PostgreSQL 15+) | Semantic object storage with pgvector |
| API Layer | GraphQL | Query semantic objects; real-time subscriptions |
| Orchestration | PublishForge (WebriQ) | Custom-built event-driven publish coordination |
| Human Rendering | Next.js 15+ (React) | Web page rendering on Vercel globally |
| Styling | Tailwind CSS / CSS Modules | Design tokens and responsive layout |
| Static Assets | Vercel CDN | Global content delivery with automatic optimization |
| Embeddings | pgvector | Model-agnostic vector search (OpenAI, Claude, Cohere) |
| AI Models | Claude, GPT-4, fine-tuned models | Extraction, enrichment, generation |
| Message Queue | Redis Streams / AWS SQS | Event distribution and regeneration triggers |
| Infrastructure | AWS or GCP | Region-specific compute and storage |
| Monitoring | DataDog / New Relic | Real-time performance and health dashboards |
| Logging | CloudWatch / ELK | Audit trail; all changes queryable |
Architecture Decision Records (ADRs)
ADR-1: Semantic Objects as Canonical Source
- Decision: Semantic objects in the database are canonical; pages and APIs are ephemeral render targets.
- Rationale: Single source of truth simplifies consistency; regenerable outputs eliminate data duplication; dual-audience publishing is enabled without separate workflows; aligns with AI-first publishing where machines consume structured data.
- Consequences: All updates must flow through the semantic layer; no direct page edits are possible; schema errors propagate to all outputs.
ADR-2: Domain System Boundaries
- Decision: PublishForge reads from domain systems but never writes back to them. Domain ownership is enforced at the code level.
- Rationale: Prevents data corruption in authoritative systems; clear separation of concerns; domain systems remain sources of record.
- Consequences: All PIM/ERP updates flow through designated sync pipelines with validation gates; direct data fixes must be made at the source system.
ADR-3: Event-Driven Over Scheduled Regeneration
- Decision: Output regeneration is event-triggered, not scheduled via cron.
- Rationale: Eliminates staleness windows; critical updates (pricing, availability) propagate immediately; more efficient and better-scaling than heavy scheduled jobs.
- Consequences: Requires event infrastructure (Redis Streams or AWS SQS); asynchronous failures require robust retry logic and alerting.
ADR-4: Dual-Track Output Generation
- Decision: Human and machine outputs are generated simultaneously from identical semantic objects.
- Rationale: Ensures consistency between human and machine versions; both audiences served by the same content update; AI discoverability is a default output, not an afterthought.
- Consequences: Rendering templates required for both human and machine formats; output validation must cover both tracks.
Implementation Timeline
- Weeks 1–3: Discovery, knowledge inventory, source mapping, governance rules
- Weeks 4–8: Knowledge graph construction, semantic object ingestion, relationship mapping, embedding generation
- Weeks 9–11: Output configuration (templates, formats), testing, performance validation
- Week 12: Go-live and continuous operation
Post-launch responsibilities:
- Content Team: Update objects, add content, manage taxonomy, approve AI-generated variations
- WebriQ Infrastructure Team: Monitor pipeline, manage integrations, optimize performance, handle alerts
- Client IT/Security Team: Maintain PIM, ERP, and domain systems; ensure webhook security and data validation
Summary for Technical Evaluators
StackShift II inverts the traditional CMS model by making semantic knowledge objects canonical and all outputs (pages, APIs, feeds, embeddings) regenerable render targets. The key architectural outcomes are:
- Dual-audience publishing — Human and machine outputs generated simultaneously from a single source
- Continuous freshness — Event-driven regeneration keeps all outputs within 30–60 seconds of upstream data changes
- Domain authority preservation — PIM, ERP, and master data systems remain authoritative; StackShift II only consumes their data
- Operational efficiency — No developer tickets for content updates; infrastructure handles regeneration automatically
- AI discoverability — JSON-LD structured data, vector embeddings, LLM-readable feeds, and MCP endpoints are default outputs
- Enterprise scale — Handles 10,000+ SKUs and 1,000+ content pieces with consistent architecture and a 99.9% uptime SLA