BigQuery

BigQuery is a serverless cloud data warehouse for running fast SQL on raw, event-level data. In web analytics, it’s where you land hits like pageview, clicks, and purchases, then compute metrics beyond UI limits—think custom funnels, cohorts, and precise conversion rate. You don’t manage servers or indexes; you design a schema and write SQL. With BigQuery, you can stitch a user across devices, rebuild a session model, or calculate channel LTV using clean event logs and joins.

Why analysts use it

  • Granularity without sampling. Keep raw events for accurate event count, retention, and attribution.
  • SQL-first workflow. If it fits SQL, it fits BigQuery—funnels from hit→pageview→purchase, channel mixes, user stitching.
  • Elastic scale. Terabytes to petabytes with the same endpoint.
  • Ecosystem-friendly. Easy pipelines from your datalayer or a server-side container, plus BI tools on top.

Core concepts (quick map)

ConceptWeb analytics useTip
Dataset / TableStore events like events_*One dataset per product/domain
PartitioningSplit by event dateFilter by date to cut scanned bytes
ClusteringOrder by user_id, source, etc.Speeds user- or channel-centric queries
Materialized ViewsCache KPIs/funnelsGreat for daily dashboards

Cost model (mental model)

You pay for storage and bytes scanned.

  • Query (on-demand): Cost ≈ (Bytes_scanned / 1 TB) × price_per_TB
  • Storage: Monthly_cost ≈ GB_stored × price_per_GB

Two easy wins: SELECT only needed columns and always filter by partitioned dates.

Mini example

Event schema sketch:
event_date DATE, user_id STRING, session_id STRING, event_name STRING, value NUMERIC, source STRING.

With this, you can compute pages-per-session, channel LTV, or deduplicate by user without sampling.