BigQuery is a serverless cloud data warehouse for running fast SQL on raw, event-level data. In web analytics, it's where you land hits such as pageviews, clicks, and purchases, then compute metrics beyond UI limits: custom funnels, cohorts, and precise conversion rates. You don't manage servers or indexes; you design a schema and write SQL. With BigQuery, you can stitch a user across devices, rebuild a session model, or calculate channel LTV using clean event logs and joins.
Why analysts use it
- Granularity without sampling. Keep raw events for accurate event counts, retention, and attribution.
- SQL-first workflow. If it fits SQL, it fits BigQuery: funnels from hit→pageview→purchase, channel mixes, user stitching.
- Elastic scale. Terabytes to petabytes with the same endpoint.
- Ecosystem-friendly. Easy pipelines from your data layer or a server-side container, plus BI tools on top.
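To make the SQL-first point concrete, here is a sketch of a pageview→purchase funnel over a month of events. The dataset, table, and column names (`analytics.events`, `event_ts`, `event_name`, `user_id`) are illustrative assumptions, not a specific export schema:

```sql
-- Per user, find the first pageview and first purchase in the window,
-- then count how many viewers later converted.
WITH steps AS (
  SELECT
    user_id,
    MIN(IF(event_name = 'pageview', event_ts, NULL)) AS first_view,
    MIN(IF(event_name = 'purchase', event_ts, NULL)) AS first_purchase
  FROM analytics.events
  WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'  -- partition filter
  GROUP BY user_id
)
SELECT
  COUNT(first_view) AS viewers,
  COUNTIF(first_purchase > first_view) AS converters
FROM steps;
```

Because the funnel is just a query, adding a step or changing the window is a one-line edit rather than a UI limitation.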
Core concepts (quick map)
Concept | Web analytics use | Tip
---|---|---
Dataset / Table | Store events like `events_*` | One dataset per product/domain
Partitioning | Split by event date | Filter by date to cut scanned bytes
Clustering | Order by `user_id`, `source`, etc. | Speeds user- or channel-centric queries
Materialized views | Cache KPIs/funnels | Great for daily dashboards
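The partitioning and clustering rows above translate directly into table DDL. A minimal sketch, with an assumed dataset name (`analytics`) and the event columns used throughout this piece:

```sql
-- Partition by event date so date filters prune whole partitions;
-- cluster by user and channel so user-/channel-centric queries read fewer blocks.
CREATE TABLE analytics.events (
  event_date DATE,
  user_id    STRING,
  session_id STRING,
  event_name STRING,
  value      NUMERIC,
  source     STRING
)
PARTITION BY event_date
CLUSTER BY user_id, source;
```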
Cost model (mental model)
You pay for storage and bytes scanned.
- Query (on-demand):
Cost ≈ (Bytes_scanned / 1 TB) × price_per_TB
- Storage:
Monthly_cost ≈ GB_stored × price_per_GB
Two easy wins: `SELECT` only the columns you need, and always filter on the partition date.
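Both wins show up directly in the bytes-scanned estimate. A contrast sketch, again assuming an `analytics.events` table partitioned by `event_date`:

```sql
-- Bad: scans every column of every partition.
SELECT * FROM analytics.events;

-- Good: scans two columns within a single partition,
-- so Bytes_scanned (and therefore cost) drops sharply.
SELECT user_id, event_name
FROM analytics.events
WHERE event_date = '2024-01-01';
```

BigQuery shows the estimated bytes scanned before you run a query, so you can check the effect of these changes without spending anything.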
Mini example
Event schema sketch: `event_date DATE, user_id STRING, session_id STRING, event_name STRING, value NUMERIC, source STRING`.
With this, you can compute pages-per-session, channel LTV, or deduplicate by user without sampling.
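For instance, pages-per-session over a week is a short query against that schema (the `analytics` dataset name is an assumption):

```sql
-- Count pageviews per session, then average across sessions.
WITH per_session AS (
  SELECT
    session_id,
    COUNTIF(event_name = 'pageview') AS pages
  FROM analytics.events
  WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07'  -- partition filter
  GROUP BY session_id
)
SELECT AVG(pages) AS pages_per_session
FROM per_session;
```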