Most people encounter Looker for the first time thinking it is another Tableau or another Looker Studio. They open it, look for a drag-and-drop chart builder, and feel confused. That confusion is the first sign they are dealing with something fundamentally different.
Looker is not a visualization tool. It is a semantic layer platform. The distinction matters enormously in practice. A visualization tool lets you connect data and build charts. A semantic layer platform lets you define what your data means — and then enforces that meaning across every chart, every analyst, and every business question, forever.
This article is the foundation of a series on using Looker to reconcile GA4, SFCC, and offline order data. Before touching any join or any dashboard, you need to understand what Looker actually is and how it thinks about data.
Looker vs Looker Studio: Stop Using These Names Interchangeably
Google owns both products. The naming is a genuine source of confusion even inside Google. Here is the definitive comparison:
| Dimension | Looker Studio (free) | Looker (paid, enterprise) |
|---|---|---|
| Formerly known as | Google Data Studio | Looker (acquired by Google in 2020 for $2.6B) |
| Core purpose | Reporting and visualization | Semantic modeling + BI platform |
| Data modeling layer | None — you connect raw data directly | LookML — defines business logic in code |
| Join capability | “Data blending” — limited join configuration, field-level, fragile | Full SQL join semantics — inner, left, right, full outer, cross |
| Metric definitions | Defined per report — diverges across dashboards | Defined once in LookML — consistent everywhere |
| Version control | None | Native Git integration — every model change is a commit |
| Row-level security | Not available | Access filters at the Explore level, user attributes |
| Scheduling and alerts | Basic email delivery | Scheduled deliveries, threshold alerts, Slack/webhook integration |
| API access | Limited read API | Full REST API — run queries, manage users, embed dashboards |
| Embedded analytics | iFrame embedding only | SSO embedding, white-label, signed embed URLs |
| Target user | Individual analyst or small team | Data team serving an organisation |
The fundamental architectural difference: Looker Studio puts business logic in dashboards. Looker puts business logic in a model. This sounds like a minor implementation detail until you have 50 dashboards and someone asks you to change the definition of “revenue”.
The Full Looker Architecture
LookML: The Language at the Heart of Looker
LookML (Looker Modeling Language) is a YAML-like domain-specific language that describes your data model. It is the single most important thing to understand about Looker. Everything else — dashboards, Explores, scheduled reports — is downstream of the LookML model.
A LookML project is a collection of files stored in Git. It typically contains:
- View files (.view.lkml) — describe tables and their fields
- Model files (.model.lkml) — define database connections and Explores
- Explore files — sometimes embedded in model files, define joins
- Manifest file — project-level configuration, imported models
Projects and Models
A project maps to a Git repository. A model defines a database connection and the set of Explores available from that connection. One project can contain multiple models, useful when you need to expose different views to different teams from the same data.
```lookml
# File: reconciliation.model.lkml
connection: "bigquery_production"

include: "/views/*.view.lkml"  # Include all view files

explore: order_reconciliation {
  label: "GA4 + SFCC + Offline Reconciliation"
  description: "Compare GA4 reported revenue against SFCC billed revenue across all channels"

  join: sfcc_orders {
    type: left_outer
    sql_on: LOWER(TRIM(${order_reconciliation.transaction_id}))
            = LOWER(TRIM(${sfcc_orders.order_id})) ;;
    relationship: one_to_one
  }

  join: offline_orders {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${offline_orders.reference_id} ;;
    relationship: one_to_many
  }
}
```
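The LOWER(TRIM(...)) in the sql_on above is doing real work: GA4 transaction IDs and SFCC order IDs frequently differ only by case or stray whitespace, so raw equality silently drops matches. A minimal Python sketch of the same normalisation (the sample IDs are invented for illustration):

```python
def normalise_id(raw: str) -> str:
    """Mirror the LOWER(TRIM(...)) normalisation used in the sql_on join."""
    return raw.strip().lower()

# Invented sample IDs: raw equality fails, normalised equality succeeds
ga4_transaction_id = "  ORD-10042 "
sfcc_order_id = "ord-10042"

assert ga4_transaction_id != sfcc_order_id
assert normalise_id(ga4_transaction_id) == normalise_id(sfcc_order_id)
print("IDs match after normalisation")
```

The same reasoning explains why the series' example tables are named `normalised_*`: normalising once upstream is cheaper than wrapping every join condition in functions.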
Views: The Table Abstraction Layer
A view maps to a database table, a SQL query, or a derived table. It is where you define every field available to end users. A well-written view file is self-documenting — any analyst opening it should understand what the table is, what each field means, and what edge cases to be aware of.
```lookml
# File: sfcc_orders.view.lkml
view: sfcc_orders {
  sql_table_name: `project.reconciliation.normalised_sfcc_orders` ;;

  # --- Primary key (always hidden) ---
  dimension: order_id {
    type: string
    sql: ${TABLE}.normalised_order_id ;;
    primary_key: yes
    hidden: yes
    description: "Normalised SFCC order ID (LOWER + TRIM applied). Use for joins."
  }

  # --- Time dimensions ---
  dimension_group: created {
    type: time
    timeframes: [raw, time, date, week, month, quarter, year]
    sql: ${TABLE}.created_at ;;
    datatype: timestamp
    convert_tz: yes
    description: "Timestamp when the order was confirmed in SFCC."
  }

  # --- Categorical dimensions ---
  dimension: channel {
    label: "Order Channel"
    type: string
    sql: ${TABLE}.channel ;;
    description: "Channel recorded in SFCC at order creation. Values: web, store, phone, api, b2b."
  }

  dimension: customer_type {
    label: "Customer Type"
    type: string
    sql: ${TABLE}.customer_type ;;
    description: "B2C vs B2B customer classification from SFCC."
  }

  dimension: order_status {
    label: "Order Status"
    type: string
    sql: ${TABLE}.status ;;
    description: "Current order status. Values: confirmed, shipped, cancelled, refunded, pending."
  }

  dimension: product_category {
    label: "Product Category"
    type: string
    sql: ${TABLE}.primary_category ;;
  }

  # --- Numeric dimensions (not aggregated) ---
  dimension: order_value_raw {
    type: number
    sql: ${TABLE}.order_value ;;
    hidden: yes  # Use measures for aggregation, not this
  }

  # --- Calculated classification ---
  dimension: order_size_tier {
    label: "Order Size Tier"
    type: string
    sql: CASE
           WHEN ${TABLE}.order_value < 50 THEN 'small'
           WHEN ${TABLE}.order_value < 200 THEN 'medium'
           WHEN ${TABLE}.order_value < 1000 THEN 'large'
           ELSE 'enterprise'
         END ;;
    description: "Order size tier based on order value. Small: <€50, Medium: €50–200, Large: €200–1000, Enterprise: ≥€1000."
  }

  # --- Measures ---
  measure: order_count {
    label: "Orders"
    type: count_distinct
    sql: ${order_id} ;;
    description: "Count of distinct SFCC orders. Includes all statuses unless filtered."
  }

  measure: confirmed_order_count {
    label: "Confirmed Orders"
    type: count_distinct
    sql: CASE WHEN ${order_status} NOT IN ('cancelled', 'fraud') THEN ${order_id} END ;;
    description: "Count of orders excluding cancelled and fraud status."
  }

  measure: gross_revenue {
    label: "Gross Revenue (SFCC)"
    type: sum
    sql: ${TABLE}.order_value ;;
    value_format_name: decimal_2
    description: "Total order value before refunds. Includes shipping and taxes as recorded in SFCC."
  }

  measure: net_revenue {
    label: "Net Revenue (SFCC)"
    type: sum
    sql: ${TABLE}.order_value - COALESCE(${TABLE}.refund_amount, 0) ;;
    value_format_name: decimal_2
    description: "Gross revenue minus processed refunds. Does not include pending refunds."
    filters: [order_status: "-cancelled,-fraud"]
  }

  measure: average_order_value {
    label: "Average Order Value"
    type: number
    sql: ${net_revenue} / NULLIF(${confirmed_order_count}, 0) ;;
    value_format_name: decimal_2
    description: "Net revenue divided by confirmed order count."
  }

  measure: refund_rate {
    label: "Refund Rate"
    type: number
    sql: COUNT(DISTINCT CASE WHEN ${TABLE}.refund_amount > 0 THEN ${order_id} END)
         / NULLIF(${confirmed_order_count}, 0) ;;
    value_format_name: percent_1
    description: "Share of confirmed orders that received at least a partial refund."
  }
}
```
Explores: Pre-Built Joins for Self-Service Analytics
An Explore is the interface end users interact with to build queries. It is essentially a pre-configured SQL join made available through Looker’s UI. When you open Looker and click “Explore”, you are selecting an Explore defined in LookML.
The critical insight: joins are defined once in the Explore, not in each dashboard. Every user who opens that Explore gets the same join logic. When you update a join, every saved report using that Explore immediately reflects the change.
Join Types and Relationship Declarations
LookML join declarations include a relationship parameter that is critical for correctness. Declaring this wrong leads to measure fan-out — a class of silent data corruption where aggregated measures return inflated values because rows are being duplicated before aggregation.
```lookml
explore: ga4_sessions {
  label: "GA4 Sessions + Order Reconciliation"

  # one_to_one: each GA4 session matches at most one SFCC order
  join: sfcc_orders {
    type: left_outer
    sql_on: LOWER(TRIM(${ga4_sessions.transaction_id}))
            = LOWER(TRIM(${sfcc_orders.order_id})) ;;
    relationship: one_to_one
    # Looker will use symmetric aggregates to prevent fan-out
  }

  # one_to_many: multiple line items per order
  join: order_line_items {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${order_line_items.order_id} ;;
    relationship: one_to_many
    # Without symmetric aggregates this would inflate sfcc_orders measures
  }

  # Limit which fields are exposed from joined views
  join: offline_orders {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${offline_orders.sfcc_reference} ;;
    relationship: one_to_one
    fields: [offline_orders.channel, offline_orders.store_code, offline_orders.sales_rep]
  }
}
```
Symmetric Aggregates: How Looker Prevents Fan-Out
Fan-out is a classic SQL problem: when you join a one-to-many relationship and then apply SUM to the “one” side, each row gets duplicated for every matching row on the “many” side, inflating the sum. Most BI tools require you to handle this manually with subqueries or CTEs.
Looker handles fan-out automatically using symmetric aggregates — it rewrites the SQL to use a COUNT DISTINCT or SUM DISTINCT pattern that correctly deduplicates. This only works when:
- The view has a declared primary_key
- The relationship is correctly declared in the join
```sql
-- Without symmetric aggregates (naive SQL):
--   SELECT SUM(orders.order_value) ...
--   is inflated whenever orders joins one_to_many to line_items

-- With symmetric aggregates (sketch of the Looker-generated SQL):
--   SELECT SUM(DISTINCT
--            CAST(FLOOR(COALESCE(orders.order_value, 0) * (1000000 * 1.0)) AS FLOAT64)
--            + CAST(orders.order_id AS FLOAT64) * 0.000001) ...
-- This is how Looker prevents double-counting -- it is generated automatically
```
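You can see the mechanics of fan-out without a database. The sketch below (pure Python, invented data) joins one order to three line items, shows the naive sum triple-counting, and then deduplicates on the primary key before summing — the essence of what symmetric aggregates do:

```python
# One order worth 100.0, three line items -> the join yields 3 rows
order = {"order_id": "o1", "order_value": 100.0}
line_items = [{"order_id": "o1", "item": i} for i in range(3)]

joined = [{**order, **li} for li in line_items if li["order_id"] == order["order_id"]]

# Naive aggregation: the order value is repeated once per line item
naive_sum = sum(row["order_value"] for row in joined)  # inflated

# Symmetric-aggregate idea: sum each primary key's value exactly once
deduped = {row["order_id"]: row["order_value"] for row in joined}
correct_sum = sum(deduped.values())  # correct

print(naive_sum, correct_sum)  # 300.0 100.0
```

This is also why the primary_key declaration matters: without a key to deduplicate on, Looker has no safe way to undo the row multiplication the join introduced.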
Calculated Metrics: The Governance Payoff
The real value of LookML becomes clear when you build calculated metrics that span multiple sources. These metrics are defined once, documented inline, and available to every user instantly. There is no copy-paste, no version drift, no “which formula is correct” debate.
Simple measures
```lookml
measure: conversion_rate {
  label: "Web Conversion Rate (SFCC/GA4)"
  type: number
  sql: ${sfcc_orders.confirmed_order_count}
       / NULLIF(${ga4_sessions.session_count}, 0) ;;
  value_format_name: percent_2
  description: "Confirmed SFCC web orders divided by GA4 sessions with a purchase intent signal.
    Note: this is based on matched records only. Unmatched orders inflate the SFCC numerator."
}

measure: revenue_discrepancy {
  label: "Revenue Gap SFCC vs GA4 (€)"
  type: number
  sql: ${sfcc_orders.net_revenue} - ${ga4_sessions.reported_revenue} ;;
  value_format_name: decimal_2
  description: "Positive = SFCC reports more than GA4 (normal). Negative = GA4 over-reports (check refund events)."
}

measure: revenue_discrepancy_pct {
  label: "Revenue Gap %"
  type: number
  sql: (${sfcc_orders.net_revenue} - ${ga4_sessions.reported_revenue})
       / NULLIF(${sfcc_orders.net_revenue}, 0) ;;
  value_format_name: percent_1
}
```
Filtered measures
LookML measures can include inline filters, creating pre-segmented metrics that are always consistent:
```lookml
measure: web_order_count {
  label: "Web Orders"
  type: count_distinct
  sql: ${sfcc_orders.order_id} ;;
  filters: [sfcc_orders.channel: "web"]
  description: "SFCC orders from the web storefront channel only."
}

measure: b2b_revenue {
  label: "B2B Revenue"
  type: sum
  sql: ${sfcc_orders.order_value} ;;
  filters: [sfcc_orders.customer_type: "b2b"]
  value_format_name: decimal_2
  description: "Revenue from B2B customer segments. Excluded from digital attribution analysis."
}

measure: high_value_order_rate {
  label: "High-Value Order Rate"
  type: number
  sql: COUNT(DISTINCT CASE WHEN ${sfcc_orders.order_value} > 500
             THEN ${sfcc_orders.order_id} END)
       / NULLIF(${sfcc_orders.confirmed_order_count}, 0) ;;
  value_format_name: percent_1
  description: "Share of orders with order value above €500."
}
```
Derived Tables: Virtual Tables for Complex Transformations
When your transformation logic cannot be expressed as a simple field calculation, Looker supports derived tables: virtual tables defined by a SQL query within LookML. They behave like real database tables in the model — you can join them, expose their fields in Explores, and build measures on top of them.
Native derived tables
```lookml
view: reconciliation_base {
  derived_table: {
    sql:
      SELECT
        s.order_id,
        s.created_at AS order_date,
        s.channel,
        s.customer_type,
        s.order_value AS sfcc_revenue,
        COALESCE(s.refund_amount, 0) AS refund_amount,
        s.order_value - COALESCE(s.refund_amount, 0) AS sfcc_net_revenue,
        g.session_id,
        g.reported_revenue AS ga4_revenue,
        g.session_source,
        g.session_medium,
        g.session_campaign,
        CASE
          WHEN g.session_id IS NOT NULL THEN 'tracked_web'
          WHEN s.channel IN ('store', 'pos', 'kiosk') THEN 'offline_store'
          WHEN s.channel IN ('phone', 'callcenter') THEN 'offline_callcenter'
          WHEN s.customer_type IN ('b2b', 'wholesale') THEN 'b2b_wholesale'
          WHEN s.channel IN ('web', 'storefront') THEN 'dark_traffic_web'
          WHEN s.order_type = 'subscription' THEN 'subscription'
          ELSE 'unknown'
        END AS order_origin,
        s.order_value - COALESCE(g.reported_revenue, 0) AS revenue_delta
      FROM `project.reconciliation.normalised_sfcc_orders` s
      LEFT JOIN `project.reconciliation.normalised_ga4_purchases` g
        ON s.normalised_order_id = g.normalised_transaction_id
      WHERE s.status NOT IN ('cancelled', 'fraud', 'test')
    ;;

    # Rebuild strategy: daily, triggered by date change
    sql_trigger_value: SELECT CURRENT_DATE() ;;

    # Optional: partition and cluster for performance
    partition_keys: ["order_date"]
    cluster_keys: ["channel", "customer_type"]
  }

  dimension: order_id { type: string sql: ${TABLE}.order_id ;; primary_key: yes hidden: yes }
  dimension: order_date { type: date sql: ${TABLE}.order_date ;; label: "Order Date" }
  dimension: channel { type: string sql: ${TABLE}.channel ;; label: "SFCC Channel" }
  dimension: order_origin { type: string sql: ${TABLE}.order_origin ;; label: "Order Origin" }
  dimension: session_source { type: string sql: ${TABLE}.session_source ;; label: "GA4 Source" }
  dimension: session_medium { type: string sql: ${TABLE}.session_medium ;; label: "GA4 Medium" }
  dimension: session_campaign { type: string sql: ${TABLE}.session_campaign ;; label: "GA4 Campaign" }

  measure: total_sfcc_net_revenue {
    label: "SFCC Net Revenue"
    type: sum
    sql: ${TABLE}.sfcc_net_revenue ;;
    value_format_name: decimal_2
  }

  measure: total_ga4_revenue {
    label: "GA4 Reported Revenue"
    type: sum
    sql: ${TABLE}.ga4_revenue ;;
    value_format_name: decimal_2
  }

  measure: avg_revenue_delta {
    label: "Avg Revenue Delta per Order"
    type: average
    sql: ${TABLE}.revenue_delta ;;
    value_format_name: decimal_2
  }
}
```
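The order_origin CASE above is order-sensitive: the first matching branch wins, so a matched GA4 session always beats the channel-based fallbacks, and a web order with no session lands in dark_traffic_web. A direct Python transcription (branch order preserved, field names as in the derived table) makes that precedence easy to unit-test:

```python
def classify_order_origin(session_id, channel, customer_type, order_type):
    """Python transcription of the order_origin CASE, same branch order."""
    if session_id is not None:
        return "tracked_web"
    if channel in ("store", "pos", "kiosk"):
        return "offline_store"
    if channel in ("phone", "callcenter"):
        return "offline_callcenter"
    if customer_type in ("b2b", "wholesale"):
        return "b2b_wholesale"
    if channel in ("web", "storefront"):
        return "dark_traffic_web"
    if order_type == "subscription":
        return "subscription"
    return "unknown"

# A matched GA4 session wins over every channel-based branch
print(classify_order_origin("s1", "store", "b2c", None))  # tracked_web
# A web order with no session is dark traffic
print(classify_order_origin(None, "web", "b2c", None))    # dark_traffic_web
```

Note one consequence of the branch order: a web subscription order without a session is classified dark_traffic_web, not subscription, because the channel branch fires first.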
Persistent derived tables (PDTs)
For large derived tables that are expensive to compute, Looker supports Persistent Derived Tables (PDTs) — the derived table is materialised in the database and cached until a rebuild trigger fires. This dramatically improves query performance for complex joins.
```lookml
view: reconciliation_pdt {
  derived_table: {
    sql: SELECT ... FROM `project.sfcc.orders` s
         LEFT JOIN `project.ga4.purchases` g ON ... ;;
    # Rebuild when the nightly_etl datagroup fires
    datagroup_trigger: nightly_etl
    # The materialised table lands in the scratch dataset configured
    # on the Looker connection (e.g. a dedicated looker_pdts dataset)
  }
}

# Define the datagroup in the model file. Note: datagroup has no cron
# parameter — use sql_trigger (rebuild when the query result changes)
# or interval_trigger.
datagroup: nightly_etl {
  # Fires once the nightly export lands; etl_runs is an illustrative log table
  sql_trigger: SELECT MAX(completed_at) FROM `project.ops.etl_runs` ;;
  max_cache_age: "24 hours"
}
```
Access Control: Row-Level and Field-Level Security
Looker has a sophisticated permission system built around user attributes — key-value pairs assigned to users or groups. These can be used to restrict what data users see without creating separate dashboards or reports.
Row-level security with access filters
```lookml
# In the model file: restrict which regions a user can see
explore: order_reconciliation {
  access_filter: {
    field: sfcc_orders.region
    user_attribute: allowed_region
  }
  # The attribute value is a Looker filter expression:
  # a user with allowed_region = "FR" only sees French orders;
  # a user with allowed_region = "%" (wildcard) sees everything
}
```
Field-level access with required access grants
# Define a named access grant
access_grant: finance_only {
user_attribute: finance_access
allowed_values: ["yes"]
}
# Apply to sensitive fields
measure: gross_margin {
type: number
sql: (${net_revenue} - ${cost_of_goods}) / NULLIF(${net_revenue}, 0) ;;
required_access_grants: [finance_only]
# Users without finance_access=yes cannot see this field
}
Scheduling, Alerts, and the REST API
Looker is not just a query tool. It is an operational platform.
Scheduled deliveries
Any Look or dashboard can be scheduled for delivery via email, Slack, Google Chat, S3, SFTP, or webhook. Deliveries can be filtered by user attribute — the same dashboard sent to ten regional managers will show each one only their region’s data.
Threshold alerts
Users can set alerts on any metric: “notify me via Slack when the web dark traffic rate exceeds 15%”. Looker polls the query on a schedule and fires the alert when the condition is met. No engineering required.
REST API
Looker exposes a full REST API that covers every action available in the UI: run queries, download data, manage users and groups, update user attributes, trigger PDT rebuilds, and embed dashboards. This enables Looker to be used as a data delivery engine, not just a dashboard tool.
```python
# Run a Looker query via the API (Python example)
import looker_sdk
from looker_sdk import models40 as models

sdk = looker_sdk.init40()  # reads credentials from looker.ini or environment variables

result = sdk.run_inline_query(
    result_format="json",
    body=models.WriteQuery(
        model="reconciliation",
        view="order_reconciliation",
        fields=["order_origin", "sfcc_orders.confirmed_order_count",
                "sfcc_orders.net_revenue", "revenue_discrepancy"],
        filters={"order_reconciliation.order_date": "last 30 days"},
        sorts=["sfcc_orders.net_revenue desc"],
        limit="500",  # the SDK expects limit as a string
    ),
)
# Returns a JSON string usable in any downstream system
```
Performance: Making Looker Fast at Scale
Looker generates SQL and sends it to your database. Its performance ceiling is therefore the performance of your warehouse — BigQuery in most modern stacks. A few patterns consistently make a difference:
- Use PDTs for expensive joins. A pre-materialised reconciliation table that BigQuery computes once nightly is orders of magnitude faster than recomputing the full GA4/SFCC join on every dashboard load.
- Partition and cluster your derived tables. Declaring partition keys (e.g. order date) and cluster keys (e.g. channel) in your PDT definition lets BigQuery skip entire partitions, reducing query cost dramatically.
- Use datagroups for cache control. Configure a datagroup tied to your ETL completion time so Looker serves cached results until new data is available, then invalidates the cache precisely when it should.
- Limit the Explore field picker. Use hidden: yes aggressively. An Explore with 300 fields is slower to render and harder to use than one with 60 well-chosen fields.
- Avoid unbounded date ranges. If a dashboard has no date filter and queries a full GA4 events table, BigQuery will scan everything. Always set default filters on date dimensions in your Explores.
What Looker Is Not Good For
Looker is powerful and expensive. It is not the right tool for every situation. Be honest about these limitations:
- Ad hoc exploration for a single analyst. If one person needs to explore data quickly, a Jupyter notebook or even Looker Studio is faster to set up.
- Real-time streaming data. Looker queries a database on demand. It is not a streaming analytics tool. For millisecond-latency requirements, you need a different architecture.
- Highly customised UI. Looker dashboards are functional but not as visually flexible as custom-built React dashboards. For pixel-perfect design requirements, use the embed API with a custom frontend.
- Small teams without a data engineer. LookML requires someone who knows SQL and can maintain a codebase. Without that person, the model becomes a liability.
What Comes Next in This Series
- Part 2: Why GA4, SFCC and offline orders will never match — the 5 structural causes and what to do about them
- Part 3: The transaction ID problem — every failure mode, the full SQL diagnostic sequence, and how to fix it in LookML
- Part 4: LookML metric governance — defining revenue, discrepancy, and conversion rate once, with full documentation and version control
- Part 5: Dark traffic and orphan orders — classifying the unmatched orders and building a complete reconciliation dashboard
Sources and Further Reading
- Looker official LookML reference documentation: cloud.google.com/looker/docs/reference/lookml-quick-reference
- Google Cloud blog on Looker and the semantic layer: cloud.google.com/blog/topics
- LookML best practices — Google documentation: cloud.google.com/looker/docs/best-practices
- Symmetric aggregates explanation: cloud.google.com/looker/docs — symmetric aggregates
- Persistent derived tables: cloud.google.com/looker/docs/derived-tables
- Looker API reference: developers.looker.com/api/explorer
- dbt + Looker integration guide: docs.getdbt.com
- GA4 BigQuery Export schema: support.google.com/analytics/answer/7029846