Recommended PostgreSQL or MySQL schema if you send site analytics through BatchPipe into your own database.
To create pipes, Postgres destinations, and limits from your own backend, use the management automation contract (narrative + OpenAPI) and optionally the JavaScript SDK API reference.
Yes, this split is very common. A pageview is one row with full context: URL, IP, user, session, device, browser vs document language, acquisition parameters, and anything else you slice reports by.
An event is usually something that happens after that load on the same page: clicks, scroll depth, form steps, checkout steps, and so on. You normally expect many events per pageview, all pointing at the same pageview_id.
Product analytics (page + event stream), ad tech (impression + downstream events), and tools like Google Analytics (page / screen vs custom events) follow variants of this idea. Events with no page context can leave pageview_id null.
Use a partition key such as workspace_id. The example DDL uses nullable TEXT so you can shape it however you want (single workspace, per-site, account-wide, or hierarchical identifiers). It does not need to reference BatchPipe unless you add a foreign key.
BatchPipe’s website analytics model is five tables: two append-only fact tables for the page + event stream, and three funnel tables for CMS-managed definitions and idempotent daily step counts (see Funnel metadata and daily stats).
pageview — one row per page (or screen); every dimension you keep is a typed column (no JSON catch-all).page_event — one row per downstream action; usually pageview_id points at the pageview.funnel — one row per funnel per workspace_id; scope, counting mode, active flag.funnel_step — ordered steps (pageview match or page_event_type match, optional match_extra).funnel_daily_step_stats — one row per funnel step per calendar day: distinct actors through each step; your job fills this from facts + steps (idempotent upsert or replace per day).pageview
Table name is pageview; primary key pageview_id. Dimension columns use the page_ prefix so stays readable without repeating “website_impression”.
page_url
The URL for this pageview: what was actually loaded (often window.location.href), including query string and hash if you keep them here. Use page_path / page_query_string for reporting if you strip tracking noise server-side.
A separate canonical URL (<link rel="canonical">, CMS “primary” URL) is only worth a column if you actually group reports by that value to merge duplicate URL variants. Many teams dedupe using normalized path or rules in the pipeline instead; this guide’s DDL omits page_canonical_url—add page_canonical_url TEXT when you need it.
That is the same thing as the first pageview in a session: the page_url on the row with the earliest page_occurred_ts for a given page_session_id (per workspace_id if sessions are scoped that way). You do not need a separate landing URL column unless you want denormalization (same value copied onto every pageview in the visit so simple queries and exports do not need a window function or a join). This guide’s DDL omits that column; add page_landing_page_url yourself if your stack benefits from the duplicate.
On the web these fields are often inferred (probabilistic models, enrichment vendors, signed-in profile you already store) rather than asked on every pageview. prefer_not_to_say belongs in explicit surveys, not in an inference pipeline—when you have no usable signal, use unknown or leave the column null. Use the canonical string values below so reports stay comparable; add a CHECK constraint or application validation if you want to enforce the lists. Inference and profiling still need a lawful basis and clear disclosure—follow your privacy policy and applicable law.
page_age_bracket — one of: under_18, 18_24, 25_34, 35_44, 45_54, 55_64, 65_plus, unknown (no confident bracket). These bands match the usual digital-audience / census-style groupings. If users declare age in a form, map into the same brackets or add separate declared fields—do not overload inference with survey-only tokens.page_gender — for coarse inference or legacy ad-tech style signals, use only: female, male, unknown. Inferred pipelines rarely justify non_binary or other; those are meaningful when the user explicitly chooses them—if you collect declared gender elsewhere, use a different column or enum and keep this one for the coarse three-way signal (or leave it null).page_referrer (prior document URL when the browser sends it—often stripped for cross-origin). Host and path are derivable from that string; store them separately only if you want simpler GROUP BY or indexes without parsing in every query (common for “traffic by referring domain”).page_user_id, anonymous id, session id; optional page_age_bracket and page_gender when you infer or copy from profile (canonical values above).X-Forwarded-For), optional request/correlation id.page_browser_languages (browser’s preferred languages, e.g. Accept-Language or navigator.languages joined), page_document_language (declared language of the document, e.g. <html lang>), timezone, page host, protocol.page_time_to_first_byte_ms, page_dom_content_loaded_ms, page_load_complete_ms (from PerformanceNavigationTiming). Core Web Vitals–style snapshots: page_lcp_ms (largest contentful paint), page_cls (cumulative layout shift sum for the page so far), page_inp_ms (interaction to next paint—usually filled from your own observer or on send-beacon, not at first paint). Counters: page_js_error_count, page_resource_error_count (failed scripts/images from error on window / PerformanceObserver for resource failures). For stack traces and one-off failures, still emit page_event rows (e.g. js_error). Optional extras many teams add later: page_first_byte_ms-style server-side marks, paint times (first-paint), or connection hints (navigator.connection)—omit from the minimal DDL if you do not need them.page_session_id unless you add a denormalized landing column.page_* columns (experiment variant, etc.) when you need them—see DDL examples below.
page_occurred_ts is when the view happened. For IP, consider PostgreSQL INET instead of TEXT if you validate addresses; in MySQL VARBINARY(16) or a validated string type is common.
page_event
One row per interaction—clicks, conversions, errors, scroll milestones, checkout steps, and so on. pageview_id points at the page when the event happened there; leave it null only when there is no page (API, worker). Session and other visit context live on the linked pageview row—see “IDs, pageview_id, and relationships” below.
page_event_type is the primary classifier (for example button_click, ecommerce.purchase). Use optional page_event_label for a human-readable title. Everything else about that occurrence is typed columns on the same row: monetary value, item line, click target, form field, scroll milestone, duration, errors, and so on.
The DDL below includes common optional columns; drop what you do not use and add columns per event shape as your product evolves. If something is not a page-level dimension and not a separate event row, it probably does not belong in the model.
page_eventpageview_id. Ask: Is this an action, outcome, or step that does not require a whole new screen row? If yes, it is usually a page event.pageview_id unless you deliberately emit a virtual pageview per step. Conversion funnels in SQL are usually ordered sequences of both pageviews and events, not pageviews alone.The old wording “step in a journey report” sounds like every funnel step must be a pageview; that is misleading. Use pageviews for screen boundaries you want in “pages” reports; use events for steps and outcomes on those screens, including most revenue and form-progress signals.
A SPA does not mean “everything is only a page event.” You still create pageviews when the routable screen changes—after client-side navigation (pushState / popstate), or when your product defines a new “main content” surface. That is often called a virtual pageview (new row in pageview with the updated URL/path/title).
You do not normally emit a pageview on every click—that inflates “pages” and blurs screen boundaries. Clicks, toggles, and widget interactions stay page events tied to the current virtual pageview. If a click only opens a modal or expands a section and you do not treat that as a new screen, it stays an event.
page_event
page_event_item_id, page_event_item_name, page_event_item_category, and page_event_item_quantity are for commerce-style moments tied to a specific action, not to “the page in general.” Example: on a product listing, one pageview describes the list; each add_to_cart or view_item row carries which SKU and how many. One screen can produce many such rows. Putting a single item on pageview would be wrong for multi-item pages.
page_event_link_url is for outbound or navigational clicks you measure as events first (affiliate exit, download href, “next” CTA target)—the URL is part of what was clicked at that instant. If the click only changes the client route and you count that as a new screen, you also emit a new pageview for the destination; the event can still record the target for attribution.
Below is how the example DDL lines up with “screen context” vs “something happened on the screen.”
pageview — fits the litmus: address and document (page_url, path, query, title), inbound navigation (page_referrer), identity snapshot (page_user_id, page_session_id, optional inferred page_age_bracket / page_gender, …), request/client environment (IP, UA, languages, viewport at paint, device/OS/browser), optional load timing and error counters (page_load_complete_ms, page_lcp_ms, page_js_error_count, …), geo at delivery, and load timestamp. These describe the screen as loaded, not a specific click.pageview — nuance: utm_* is often session acquisition, not something that should change on every SPA virtual screen. Common pattern: fill UTM only on the first pageview of page_session_id, or only when the URL for this navigation actually carried campaign params; leave null on later client-only transitions.page_event — fits the litmus: page_event_type + time; optional page_event_label; commerce line (page_event_item_*); interaction target (page_event_element_*, page_event_link_url, form/field); engagement samples (page_event_scroll_depth_pct, page_event_duration_ms); errors; monetary page_event_value / page_event_currency when the event is the economic fact (purchase, donation).page_event — nuance: page_event_duration_ms is for event-scoped duration (e.g. video segment, time on checkout step), not a substitute for “time on page” in the abstract—derive time-on-page from the next pageview or a dedicated exit/hidden event if you need it.pageview; emitting a pageview for every micro-click; or duplicating a second “category” column when page_event_type already names the event (use a structured type string or enums instead).
A conversion funnel is an ordered list of steps, each step a predicate on pageview or page_event rows. Facts stay in those two tables; funnel and funnel_step in the complete DDL script store CMS-managed definitions keyed by workspace_id. funnel_daily_step_stats holds pre-aggregated counts per day so dashboards stay fast; your job computes counts from facts + steps, then writes stats with an idempotent replace or upsert per stat_date.
pageview (e.g. page_path / page_url patterns) or a page_event (page_event_type, plus optional match_extra JSON validated in app code).page_occurred_ts and page_event_occurred_ts; strict funnels require step 1 before step 2 in time, within max_window_seconds for user-scoped funnels when set.pageview on pageview_id, group by page_session_id from the pageview row.page_user_id or page_anonymous_id with the funnel’s window and count_mode (unique_actors vs event_sequences).funnel_step rows per funnel (simplest); no shared step table required.
Example (conceptual): Step 1 — page_path like /product/%; Step 2 — page_event_type = add_to_cart; Step 3 — checkout screen or begin_checkout; Step 4 — purchase. Full DDL (facts + funnels) is in the copy-paste block below. If you already have pageview / page_event, copy only the funnel half of that script—from the comment Funnel definitions & daily rollups through the last funnel index (PostgreSQL) or the matching funnel section (MySQL).
pageview_id, and relationshipspageview_id? You do, when the pageview is recorded: the browser or server creates a new UUID version 7 (RFC 9562)—time-ordered for friendlier B-tree inserts than v4, still a normal 36-character hyphenated string. The bundled helper uses batchpipe_website_analytics.new_uuid() (v7 via crypto.getRandomValues). Every page_event that belongs to that screen repeats the same id in pageview_id. For server-only inserts, PostgreSQL gen_random_uuid() is still a common default (v4); MySQL’s UUID() in the example follows the server’s usual version (often v1). Mint UUID v7 in application code everywhere if you want a single variant and time-ordered keys on the server too.pageview / page_event with no REFERENCES between them (common for warehouses). funnel_step and funnel_daily_step_stats reference funnel. Add foreign keys from page_event to pageview in your database if you want the database to enforce them.pageview only: page_session_id belongs on the pageview row. For events, get the session by joining page_event to pageview on pageview_id. Do not duplicate page_session_id on page_event unless you have a strong reason to denormalize.
Choose PostgreSQL or MySQL below; the block is one script (facts, indexes, then funnel tables). The bundled JS helper mints UUID v7 (RFC 9562) for new pageview_id / page_event_id values—same column types, better insert locality than v4. Drop columns you do not collect; add CHECK constraints or enums when you want to enforce allowed values for page_device_type, page_age_bracket, page_gender, and similar fields. Timing and error columns are often null on first paint; send the pageview on load or visibilitychange / beacon when you need full metrics. MySQL uses VARCHAR(255) for workspace_id on keyed columns so indexes stay within InnoDB limits—widen if your partition key is longer. Funnel semantics and idempotency patterns are in Funnel metadata and daily stats.
-- Website analytics: pageview, page_event, funnel, funnel_step, funnel_daily_step_stats (PostgreSQL)
-- Recommended: UUID v7 (RFC 9562) from the browser for pageview_id / page_event_id—time-ordered, same string shape as v4.
-- Funnel tables below use DEFAULT gen_random_uuid() (v4) for rare server-created rows; mint v7 in app instead if you prefer.
CREATE TABLE pageview (
-- Set in collector (e.g. batchpipe_website_analytics.new_uuid) or DEFAULT gen_random_uuid() on server insert
pageview_id UUID PRIMARY KEY,
-- Optional partition key (string; can be hierarchical if you want)
workspace_id TEXT,
-- This pageview: actual loaded URL (e.g. location.href)
page_url TEXT NOT NULL,
-- Parsed pieces of page_url (optional denormalization for reporting)
page_path TEXT,
page_query_string TEXT,
page_hash TEXT,
page_title TEXT,
-- Prior URL when available (document.referrer); host/path = parse or add columns if you group by domain a lot
page_referrer TEXT,
-- Who / which visit (snapshot at this screen)
page_user_id TEXT,
page_anonymous_id TEXT,
page_session_id TEXT,
-- Demographics (inferred or from profile); canonical values in guide prose
page_age_bracket TEXT,
page_gender TEXT,
-- Request / edge
page_client_ip TEXT,
page_forwarded_for TEXT,
page_request_id TEXT,
-- Client environment at paint
page_user_agent TEXT,
page_browser_languages TEXT,
page_document_language TEXT,
page_timezone TEXT,
page_host TEXT,
page_protocol TEXT,
-- Geo (often enriched server-side)
page_country_code TEXT,
page_region TEXT,
page_city TEXT,
page_postal_code TEXT,
page_screen_width INT,
page_screen_height INT,
page_viewport_width INT,
page_viewport_height INT,
page_color_depth INT,
page_pixel_ratio NUMERIC(6,3),
page_device_type TEXT,
page_os_family TEXT,
page_os_version TEXT,
page_browser_family TEXT,
page_browser_version TEXT,
-- Navigation timing (ms; PerformanceNavigationTiming vs navigation start)
page_time_to_first_byte_ms INT,
page_dom_content_loaded_ms INT,
page_load_complete_ms INT,
-- Core Web Vitals–style snapshots (often null until late in page life)
page_lcp_ms INT,
page_cls NUMERIC(8,4),
page_inp_ms INT,
-- Client-side counters for this screen
page_js_error_count INT,
page_resource_error_count INT,
-- Acquisition: often first touch only per session (see schema review)
utm_source TEXT,
utm_medium TEXT,
utm_campaign TEXT,
utm_term TEXT,
utm_content TEXT,
page_experiment_variant TEXT,
page_occurred_ts TIMESTAMPTZ NOT NULL,
page_created_ts TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE page_event (
page_event_id UUID PRIMARY KEY,
workspace_id TEXT,
-- Logical link to pageview; null when the event has no page (API, worker)
pageview_id UUID,
-- What happened (primary classifier; use dotted names e.g. ecommerce.add_to_cart)
page_event_type TEXT NOT NULL,
page_event_label TEXT,
page_event_occurred_ts TIMESTAMPTZ NOT NULL,
-- Monetary outcome for this event when applicable
page_event_value NUMERIC(20,6),
page_event_currency TEXT,
-- Commerce line (null unless type is item-level: add_to_cart, purchase, …)
page_event_item_id TEXT,
page_event_item_name TEXT,
page_event_item_category TEXT,
page_event_item_quantity INT,
-- Interaction target
page_event_link_url TEXT,
page_event_element_id TEXT,
page_event_element_selector TEXT,
page_event_element_text TEXT,
page_event_form_id TEXT,
page_event_field_name TEXT,
-- Engagement samples (not “whole page” metrics)
page_event_scroll_depth_pct INT,
page_event_duration_ms INT,
page_event_error_code TEXT,
page_event_error_message TEXT,
page_event_created_ts TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_pageview_workspace_occurred ON pageview (workspace_id, page_occurred_ts DESC);
CREATE INDEX idx_pageview_workspace_path ON pageview (workspace_id, page_path)
WHERE page_path IS NOT NULL;
CREATE INDEX idx_pageview_workspace_session ON pageview (workspace_id, page_session_id)
WHERE page_session_id IS NOT NULL;
CREATE INDEX idx_pageview_workspace_user ON pageview (workspace_id, page_user_id)
WHERE page_user_id IS NOT NULL;
CREATE INDEX idx_page_event_workspace_occurred ON page_event (workspace_id, page_event_occurred_ts DESC);
CREATE INDEX idx_page_event_pageview ON page_event (pageview_id)
WHERE pageview_id IS NOT NULL;
CREATE INDEX idx_page_event_type ON page_event (page_event_type);
-- Funnel definitions & daily rollups (FKs on funnel tables; facts stay logical-key-only above)
CREATE TABLE funnel (
funnel_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
scope TEXT NOT NULL CHECK (scope IN ('session', 'user_anonymous', 'user_known')),
max_window_seconds INTEGER CHECK (max_window_seconds IS NULL OR max_window_seconds > 0),
count_mode TEXT NOT NULL DEFAULT 'unique_actors'
CHECK (count_mode IN ('unique_actors', 'event_sequences')),
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (workspace_id, name)
);
CREATE TABLE funnel_step (
funnel_step_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
funnel_id UUID NOT NULL REFERENCES funnel (funnel_id) ON DELETE CASCADE,
step_order INTEGER NOT NULL CHECK (step_order >= 1),
step_kind TEXT NOT NULL CHECK (step_kind IN ('pageview', 'page_event')),
step_label TEXT,
page_event_type TEXT,
page_path_like TEXT,
page_url_like TEXT,
page_title_like TEXT,
match_extra JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (funnel_id, step_order),
CHECK (
(step_kind = 'page_event' AND page_event_type IS NOT NULL
AND page_path_like IS NULL AND page_url_like IS NULL AND page_title_like IS NULL)
OR
(step_kind = 'pageview' AND (
page_path_like IS NOT NULL OR page_url_like IS NOT NULL OR page_title_like IS NOT NULL
))
)
);
CREATE TABLE funnel_daily_step_stats (
stat_date DATE NOT NULL,
workspace_id TEXT NOT NULL,
funnel_id UUID NOT NULL REFERENCES funnel (funnel_id) ON DELETE CASCADE,
step_order INTEGER NOT NULL CHECK (step_order >= 1),
actor_count BIGINT NOT NULL CHECK (actor_count >= 0),
computed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (stat_date, workspace_id, funnel_id, step_order)
);
CREATE INDEX idx_funnel_workspace_active ON funnel (workspace_id) WHERE is_active;
CREATE INDEX idx_funnel_step_funnel ON funnel_step (funnel_id, step_order);
CREATE INDEX idx_funnel_daily_stats_funnel_date ON funnel_daily_step_stats (funnel_id, stat_date DESC);
CMS-managed funnel definitions reference workspace_id like the fact tables. funnel_daily_step_stats holds one row per funnel step per calendar day: actor_count is the number of distinct actors (sessions or users per scope) who completed the funnel through that step in order on that day—your job defines the exact anchor (e.g. first step occurred on stat_date). Idempotency: primary key (stat_date, workspace_id, funnel_id, step_order) lets you delete for a day and funnel set then re-insert, or upsert: in PostgreSQL INSERT … ON CONFLICT (…) DO UPDATE; in MySQL INSERT … ON DUPLICATE KEY UPDATE so reruns overwrite the same logical rows.
Use the helper file directly: /js/website-analytics.js. It exports batchpipe_website_analytics with configurable app-specific defaults (workspace_id, user ids, session id, optional page_age_bracket / page_gender), and it always extracts browser context for pageviews by default (URL, referrer, language, timezone, viewport, screen, and related fields).
When you call pageview_row after the page has loaded (for example from a load handler, visibilitychange, or a send-beacon on exit), it also fills navigation timing (page_time_to_first_byte_ms, DOM ready, load complete) and best-effort page_lcp_ms / page_cls from the Performance APIs where available. Pass page_inp_ms yourself if you measure INP. The script maintains window.batchpipe_runtime_counts (js_error_count, resource_error_count): it increments those on uncaught errors, unhandled promise rejections, and failed script/image/stylesheet/media loads, and pageview_row copies them into the row (you can still override or bump the object manually).
Global: batchpipe_website_analytics —
configure,
browser_context,
pageview_row,
page_event_row,
new_uuid (RFC 9562 v7).
batchpipe_website_analytics.configure({
workspace_id: function () { return window.app_workspace_id; },
page_session_id: function () { return window.app_session_id; },
page_user_id: function () { return window.current_user_id; },
page_anonymous_id: function () { return window.anonymous_id; }
});
var pageview = batchpipe_website_analytics.pageview_row({
page_occurred_ts: new Date().toISOString()
});
var event_row = batchpipe_website_analytics.page_event_row({
page_event_type: 'button_click'
});