Most people encounter Looker for the first time thinking it is another Tableau or another Looker Studio. They open it, look for a drag-and-drop chart builder, and feel confused. That confusion is the first sign they are dealing with something fundamentally different.
Looker is not a visualization tool. It is a semantic layer platform. The distinction matters enormously in practice. A visualization tool lets you connect data and build charts. A semantic layer platform lets you define what your data means — and then enforces that meaning across every chart, every analyst, and every business question, forever.
This article is the foundation of a series on using Looker to reconcile GA4, SFCC, and offline order data. Before touching any join or any dashboard, you need to understand what Looker actually is and how it thinks about data.
Looker vs Looker Studio: Stop Using These Names Interchangeably
Google owns both products. The naming is a genuine source of confusion even inside Google. Here is the definitive comparison:
| Dimension | Looker Studio (free) | Looker (paid, enterprise) |
|---|---|---|
| Formerly known as | Google Data Studio | Looker (acquired by Google in 2020 for $2.6B) |
| Core purpose | Reporting and visualization | Semantic modeling + BI platform |
| Data modeling layer | None — you connect raw data directly | LookML — defines business logic in code |
| Join capability | “Data blending” — limited join configuration, field-level, fragile | Full SQL join semantics — inner, left, right, full outer, cross |
| Metric definitions | Defined per report — diverges across dashboards | Defined once in LookML — consistent everywhere |
| Version control | None | Native Git integration — every model change is a commit |
| Row-level security | Not available | Access filters at the Explore level, user attributes |
| Scheduling and alerts | Basic email delivery | Scheduled deliveries, threshold alerts, Slack/webhook integration |
| API access | Limited read API | Full REST API — run queries, manage users, embed dashboards |
| Embedded analytics | iFrame embedding only | SSO embedding, white-label, signed embed URLs |
| Target user | Individual analyst or small team | Data team serving an organisation |
The fundamental architectural difference: Looker Studio puts business logic in dashboards. Looker puts business logic in a model. This sounds like a minor implementation detail until you have 50 dashboards and someone asks you to change the definition of “revenue”.
The Full Looker Architecture
LookML: The Language at the Heart of Looker
LookML (Looker Modeling Language) is a YAML-like domain-specific language that describes your data model. It is the single most important thing to understand about Looker. Everything else — dashboards, Explores, scheduled reports — is downstream of the LookML model.
A LookML project is a collection of files stored in Git. It typically contains:
- View files (.view.lkml) — describe tables and their fields
- Model files (.model.lkml) — define database connections and Explores
- Explore files — sometimes embedded in model files, define joins
- Manifest file — project-level configuration, imported models
Projects and Models
A project maps to a Git repository. A model defines a database connection and the set of Explores available from that connection. One project can contain multiple models, useful when you need to expose different views to different teams from the same data.
```lookml
# File: reconciliation.model.lkml
connection: "bigquery_production"

include: "/views/*.view.lkml"  # Include all view files

explore: order_reconciliation {
  label: "GA4 + SFCC + Offline Reconciliation"
  description: "Compare GA4 reported revenue against SFCC billed revenue across all channels"

  join: sfcc_orders {
    type: left_outer
    sql_on: LOWER(TRIM(${order_reconciliation.transaction_id}))
            = LOWER(TRIM(${sfcc_orders.order_id})) ;;
    relationship: one_to_one
  }

  join: offline_orders {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${offline_orders.reference_id} ;;
    relationship: one_to_many
  }
}
```
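The LOWER(TRIM(...)) in the sql_on above is doing real work: GA4 transaction IDs and SFCC order IDs frequently differ only by case or stray whitespace, so raw equality silently drops matches. A minimal Python sketch of the same normalisation (the sample IDs are invented for illustration):

```python
def normalise_id(raw: str) -> str:
    """Mirror the LOWER(TRIM(...)) normalisation used in the sql_on join."""
    return raw.strip().lower()

# Invented sample IDs: raw equality fails, normalised equality succeeds
ga4_transaction_id = "  ORD-10042 "
sfcc_order_id = "ord-10042"

assert ga4_transaction_id != sfcc_order_id
assert normalise_id(ga4_transaction_id) == normalise_id(sfcc_order_id)
print("IDs match after normalisation")
```

The same reasoning explains why the series' example tables are named `normalised_*`: normalising once upstream is cheaper than wrapping every join condition in functions.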
Views: The Table Abstraction Layer
A view maps to a database table, a SQL query, or a derived table. It is where you define every field available to end users. A well-written view file is self-documenting — any analyst opening it should understand what the table is, what each field means, and what edge cases to be aware of.
```lookml
# File: sfcc_orders.view.lkml
view: sfcc_orders {
  sql_table_name: `project.reconciliation.normalised_sfcc_orders` ;;

  # --- Primary key (always hidden) ---
  dimension: order_id {
    type: string
    sql: ${TABLE}.normalised_order_id ;;
    primary_key: yes
    hidden: yes
    description: "Normalised SFCC order ID (LOWER + TRIM applied). Use for joins."
  }

  # --- Time dimensions ---
  dimension_group: created {
    type: time
    timeframes: [raw, time, date, week, month, quarter, year]
    sql: ${TABLE}.created_at ;;
    datatype: timestamp
    convert_tz: yes
    description: "Timestamp when the order was confirmed in SFCC."
  }

  # --- Categorical dimensions ---
  dimension: channel {
    label: "Order Channel"
    type: string
    sql: ${TABLE}.channel ;;
    description: "Channel recorded in SFCC at order creation. Values: web, store, phone, api, b2b."
  }

  dimension: customer_type {
    label: "Customer Type"
    type: string
    sql: ${TABLE}.customer_type ;;
    description: "B2C vs B2B customer classification from SFCC."
  }

  dimension: order_status {
    label: "Order Status"
    type: string
    sql: ${TABLE}.status ;;
    description: "Current order status. Values: confirmed, shipped, cancelled, refunded, pending."
  }

  dimension: product_category {
    label: "Product Category"
    type: string
    sql: ${TABLE}.primary_category ;;
  }

  # --- Numeric dimensions (not aggregated) ---
  dimension: order_value_raw {
    type: number
    sql: ${TABLE}.order_value ;;
    hidden: yes  # Use measures for aggregation, not this
  }

  # --- Calculated classification ---
  dimension: order_size_tier {
    label: "Order Size Tier"
    type: string
    sql: CASE
           WHEN ${TABLE}.order_value < 50 THEN 'small'
           WHEN ${TABLE}.order_value < 200 THEN 'medium'
           WHEN ${TABLE}.order_value < 1000 THEN 'large'
           ELSE 'enterprise'
         END ;;
    description: "Order size tier based on order value. Small: <€50, Medium: €50–200, Large: €200–1000, Enterprise: ≥€1000."
  }

  # --- Measures ---
  measure: order_count {
    label: "Orders"
    type: count_distinct
    sql: ${order_id} ;;
    description: "Count of distinct SFCC orders. Includes all statuses unless filtered."
  }

  measure: confirmed_order_count {
    label: "Confirmed Orders"
    type: count_distinct
    sql: CASE WHEN ${order_status} NOT IN ('cancelled', 'fraud') THEN ${order_id} END ;;
    description: "Count of orders excluding cancelled and fraud status."
  }

  measure: gross_revenue {
    label: "Gross Revenue (SFCC)"
    type: sum
    sql: ${TABLE}.order_value ;;
    value_format_name: decimal_2
    description: "Total order value before refunds. Includes shipping and taxes as recorded in SFCC."
  }

  measure: net_revenue {
    label: "Net Revenue (SFCC)"
    type: sum
    sql: ${TABLE}.order_value - COALESCE(${TABLE}.refund_amount, 0) ;;
    value_format_name: decimal_2
    description: "Gross revenue minus processed refunds. Does not include pending refunds."
    filters: [order_status: "-cancelled,-fraud"]
  }

  measure: average_order_value {
    label: "Average Order Value"
    type: number
    sql: ${net_revenue} / NULLIF(${confirmed_order_count}, 0) ;;
    value_format_name: decimal_2
    description: "Net revenue divided by confirmed order count."
  }

  measure: refund_rate {
    label: "Refund Rate"
    type: number
    sql: COUNT(DISTINCT CASE WHEN ${TABLE}.refund_amount > 0 THEN ${order_id} END)
         / NULLIF(${confirmed_order_count}, 0) ;;
    value_format_name: percent_1
    description: "Share of confirmed orders that received at least a partial refund."
  }
}
```
Explores: Pre-Built Joins for Self-Service Analytics
An Explore is the interface end users interact with to build queries. It is essentially a pre-configured SQL join made available through Looker’s UI. When you open Looker and click “Explore”, you are selecting an Explore defined in LookML.
The critical insight: joins are defined once in the Explore, not in each dashboard. Every user who opens that Explore gets the same join logic. When you update a join, every saved report using that Explore immediately reflects the change.
Join Types and Relationship Declarations
LookML join declarations include a relationship parameter that is critical for correctness. Declaring this wrong leads to measure fan-out — a class of silent data corruption where aggregated measures return inflated values because rows are being duplicated before aggregation.
```lookml
explore: ga4_sessions {
  label: "GA4 Sessions + Order Reconciliation"

  # one_to_one: each GA4 session matches at most one SFCC order
  join: sfcc_orders {
    type: left_outer
    sql_on: LOWER(TRIM(${ga4_sessions.transaction_id}))
            = LOWER(TRIM(${sfcc_orders.order_id})) ;;
    relationship: one_to_one
    # Looker will use symmetric aggregates to prevent fan-out
  }

  # one_to_many: multiple line items per order
  join: order_line_items {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${order_line_items.order_id} ;;
    relationship: one_to_many
    # Without symmetric aggregates this would inflate sfcc_orders measures
  }

  # Limit which fields are exposed from joined views
  join: offline_orders {
    type: left_outer
    sql_on: ${sfcc_orders.order_id} = ${offline_orders.sfcc_reference} ;;
    relationship: one_to_one
    fields: [offline_orders.channel, offline_orders.store_code, offline_orders.sales_rep]
  }
}
```
Symmetric Aggregates: How Looker Prevents Fan-Out
Fan-out is a classic SQL problem: when you join a one-to-many relationship and then apply SUM to the “one” side, each row gets duplicated for every matching row on the “many” side, inflating the sum. Most BI tools require you to handle this manually with subqueries or CTEs.
Looker handles fan-out automatically using symmetric aggregates — it rewrites the SQL to use a COUNT DISTINCT or SUM DISTINCT pattern that correctly deduplicates. This only works when:
- The view has a declared primary_key
- The relationship is correctly declared in the join
```sql
-- Without symmetric aggregates (naive SQL):
--   SELECT SUM(orders.order_value) ...
--   is inflated whenever orders joins one_to_many to line_items

-- With symmetric aggregates (sketch of the Looker-generated SQL):
--   SELECT SUM(DISTINCT
--            CAST(FLOOR(COALESCE(orders.order_value, 0) * (1000000 * 1.0)) AS FLOAT64)
--            + CAST(orders.order_id AS FLOAT64) * 0.000001) ...
-- This is how Looker prevents double-counting -- it is generated automatically
```
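You can see the mechanics of fan-out without a database. The sketch below (pure Python, invented data) joins one order to three line items, shows the naive sum triple-counting, and then deduplicates on the primary key before summing — the essence of what symmetric aggregates do:

```python
# One order worth 100.0, three line items -> the join yields 3 rows
order = {"order_id": "o1", "order_value": 100.0}
line_items = [{"order_id": "o1", "item": i} for i in range(3)]

joined = [{**order, **li} for li in line_items if li["order_id"] == order["order_id"]]

# Naive aggregation: the order value is repeated once per line item
naive_sum = sum(row["order_value"] for row in joined)  # inflated

# Symmetric-aggregate idea: sum each primary key's value exactly once
deduped = {row["order_id"]: row["order_value"] for row in joined}
correct_sum = sum(deduped.values())  # correct

print(naive_sum, correct_sum)  # 300.0 100.0
```

This is also why the primary_key declaration matters: without a key to deduplicate on, Looker has no safe way to undo the row multiplication the join introduced.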
Calculated Metrics: The Governance Payoff
The real value of LookML becomes clear when you build calculated metrics that span multiple sources. These metrics are defined once, documented inline, and available to every user instantly. There is no copy-paste, no version drift, no “which formula is correct” debate.
Simple measures
```lookml
measure: conversion_rate {
  label: "Web Conversion Rate (SFCC/GA4)"
  type: number
  sql: ${sfcc_orders.confirmed_order_count}
       / NULLIF(${ga4_sessions.session_count}, 0) ;;
  value_format_name: percent_2
  description: "Confirmed SFCC web orders divided by GA4 sessions with a purchase intent signal.
    Note: this is based on matched records only. Unmatched orders inflate the SFCC numerator."
}

measure: revenue_discrepancy {
  label: "Revenue Gap SFCC vs GA4 (€)"
  type: number
  sql: ${sfcc_orders.net_revenue} - ${ga4_sessions.reported_revenue} ;;
  value_format_name: decimal_2
  description: "Positive = SFCC reports more than GA4 (normal). Negative = GA4 over-reports (check refund events)."
}

measure: revenue_discrepancy_pct {
  label: "Revenue Gap %"
  type: number
  sql: (${sfcc_orders.net_revenue} - ${ga4_sessions.reported_revenue})
       / NULLIF(${sfcc_orders.net_revenue}, 0) ;;
  value_format_name: percent_1
}
```
Filtered measures
LookML measures can include inline filters, creating pre-segmented metrics that are always consistent:
```lookml
measure: web_order_count {
  label: "Web Orders"
  type: count_distinct
  sql: ${sfcc_orders.order_id} ;;
  filters: [sfcc_orders.channel: "web"]
  description: "SFCC orders from the web storefront channel only."
}

measure: b2b_revenue {
  label: "B2B Revenue"
  type: sum
  sql: ${sfcc_orders.order_value} ;;
  filters: [sfcc_orders.customer_type: "b2b"]
  value_format_name: decimal_2
  description: "Revenue from B2B customer segments. Excluded from digital attribution analysis."
}

measure: high_value_order_rate {
  label: "High-Value Order Rate"
  type: number
  sql: COUNT(DISTINCT CASE WHEN ${sfcc_orders.order_value} > 500
             THEN ${sfcc_orders.order_id} END)
       / NULLIF(${sfcc_orders.confirmed_order_count}, 0) ;;
  value_format_name: percent_1
  description: "Share of orders with order value above €500."
}
```
Derived Tables: Virtual Tables for Complex Transformations
When your transformation logic cannot be expressed as a simple field calculation, Looker supports derived tables: virtual tables defined by a SQL query within LookML. They behave like real database tables in the model — you can join them, expose their fields in Explores, and build measures on top of them.
Native derived tables
```lookml
view: reconciliation_base {
  derived_table: {
    sql:
      SELECT
        s.order_id,
        s.created_at AS order_date,
        s.channel,
        s.customer_type,
        s.order_value AS sfcc_revenue,
        COALESCE(s.refund_amount, 0) AS refund_amount,
        s.order_value - COALESCE(s.refund_amount, 0) AS sfcc_net_revenue,
        g.session_id,
        g.reported_revenue AS ga4_revenue,
        g.session_source,
        g.session_medium,
        g.session_campaign,
        CASE
          WHEN g.session_id IS NOT NULL THEN 'tracked_web'
          WHEN s.channel IN ('store', 'pos', 'kiosk') THEN 'offline_store'
          WHEN s.channel IN ('phone', 'callcenter') THEN 'offline_callcenter'
          WHEN s.customer_type IN ('b2b', 'wholesale') THEN 'b2b_wholesale'
          WHEN s.channel IN ('web', 'storefront') THEN 'dark_traffic_web'
          WHEN s.order_type = 'subscription' THEN 'subscription'
          ELSE 'unknown'
        END AS order_origin,
        s.order_value - COALESCE(g.reported_revenue, 0) AS revenue_delta
      FROM `project.reconciliation.normalised_sfcc_orders` s
      LEFT JOIN `project.reconciliation.normalised_ga4_purchases` g
        ON s.normalised_order_id = g.normalised_transaction_id
      WHERE s.status NOT IN ('cancelled', 'fraud', 'test')
    ;;

    # Rebuild strategy: daily, triggered by date change
    sql_trigger_value: SELECT CURRENT_DATE() ;;

    # Optional: partition and cluster for performance
    partition_keys: ["order_date"]
    cluster_keys: ["channel", "customer_type"]
  }

  dimension: order_id { type: string sql: ${TABLE}.order_id ;; primary_key: yes hidden: yes }
  dimension: order_date { type: date sql: ${TABLE}.order_date ;; label: "Order Date" }
  dimension: channel { type: string sql: ${TABLE}.channel ;; label: "SFCC Channel" }
  dimension: order_origin { type: string sql: ${TABLE}.order_origin ;; label: "Order Origin" }
  dimension: session_source { type: string sql: ${TABLE}.session_source ;; label: "GA4 Source" }
  dimension: session_medium { type: string sql: ${TABLE}.session_medium ;; label: "GA4 Medium" }
  dimension: session_campaign { type: string sql: ${TABLE}.session_campaign ;; label: "GA4 Campaign" }

  measure: total_sfcc_net_revenue {
    label: "SFCC Net Revenue"
    type: sum
    sql: ${TABLE}.sfcc_net_revenue ;;
    value_format_name: decimal_2
  }

  measure: total_ga4_revenue {
    label: "GA4 Reported Revenue"
    type: sum
    sql: ${TABLE}.ga4_revenue ;;
    value_format_name: decimal_2
  }

  measure: avg_revenue_delta {
    label: "Avg Revenue Delta per Order"
    type: average
    sql: ${TABLE}.revenue_delta ;;
    value_format_name: decimal_2
  }
}
```
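The order_origin CASE above is order-sensitive: the first matching branch wins, so a matched GA4 session always beats the channel-based fallbacks, and a web order with no session lands in dark_traffic_web. A direct Python transcription (branch order preserved, field names as in the derived table) makes that precedence easy to unit-test:

```python
def classify_order_origin(session_id, channel, customer_type, order_type):
    """Python transcription of the order_origin CASE, same branch order."""
    if session_id is not None:
        return "tracked_web"
    if channel in ("store", "pos", "kiosk"):
        return "offline_store"
    if channel in ("phone", "callcenter"):
        return "offline_callcenter"
    if customer_type in ("b2b", "wholesale"):
        return "b2b_wholesale"
    if channel in ("web", "storefront"):
        return "dark_traffic_web"
    if order_type == "subscription":
        return "subscription"
    return "unknown"

# A matched GA4 session wins over every channel-based branch
print(classify_order_origin("s1", "store", "b2c", None))  # tracked_web
# A web order with no session is dark traffic
print(classify_order_origin(None, "web", "b2c", None))    # dark_traffic_web
```

Note one consequence of the branch order: a web subscription order without a session is classified dark_traffic_web, not subscription, because the channel branch fires first.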
Persistent derived tables (PDTs)
For large derived tables that are expensive to compute, Looker supports Persistent Derived Tables (PDTs) — the derived table is materialised in the database and cached until a rebuild trigger fires. This dramatically improves query performance for complex joins.
```lookml
view: reconciliation_pdt {
  derived_table: {
    sql: SELECT ... FROM `project.sfcc.orders` s
         LEFT JOIN `project.ga4.purchases` g ON ... ;;
    # Rebuild when the nightly_etl datagroup fires
    datagroup_trigger: nightly_etl
    # The materialised table lands in the scratch dataset configured
    # on the Looker connection (e.g. a dedicated looker_pdts dataset)
  }
}

# Define the datagroup in the model file. Note: datagroup has no cron
# parameter — use sql_trigger (rebuild when the query result changes)
# or interval_trigger.
datagroup: nightly_etl {
  # Fires once the nightly export lands; etl_runs is an illustrative log table
  sql_trigger: SELECT MAX(completed_at) FROM `project.ops.etl_runs` ;;
  max_cache_age: "24 hours"
}
```
Access Control: Row-Level and Field-Level Security
Looker has a sophisticated permission system built around user attributes — key-value pairs assigned to users or groups. These can be used to restrict what data users see without creating separate dashboards or reports.
Row-level security with access filters
```lookml
# In the model file: restrict which regions a user can see
explore: order_reconciliation {
  access_filter: {
    field: sfcc_orders.region
    user_attribute: allowed_region
  }
  # The attribute value is a Looker filter expression:
  # a user with allowed_region = "FR" only sees French orders;
  # a user with allowed_region = "%" (wildcard) sees everything
}
```
Field-level access with required access grants
# Define a named access grant
access_grant: finance_only {
user_attribute: finance_access
allowed_values: ["yes"]
}
# Apply to sensitive fields
measure: gross_margin {
type: number
sql: (${net_revenue} - ${cost_of_goods}) / NULLIF(${net_revenue}, 0) ;;
required_access_grants: [finance_only]
# Users without finance_access=yes cannot see this field
}
Scheduling, Alerts, and the REST API
Looker is not just a query tool. It is an operational platform.
Scheduled deliveries
Any Look or dashboard can be scheduled for delivery via email, Slack, Google Chat, S3, SFTP, or webhook. Deliveries can be filtered by user attribute — the same dashboard sent to ten regional managers will show each one only their region’s data.
Threshold alerts
Users can set alerts on any metric: “notify me via Slack when the web dark traffic rate exceeds 15%”. Looker polls the query on a schedule and fires the alert when the condition is met. No engineering required.
REST API
Looker exposes a full REST API that covers every action available in the UI: run queries, download data, manage users and groups, update user attributes, trigger PDT rebuilds, and embed dashboards. This enables Looker to be used as a data delivery engine, not just a dashboard tool.
```python
# Run a Looker query via the API (Python example)
import looker_sdk
from looker_sdk import models40 as models

sdk = looker_sdk.init40()  # reads credentials from looker.ini or environment variables

result = sdk.run_inline_query(
    result_format="json",
    body=models.WriteQuery(
        model="reconciliation",
        view="order_reconciliation",
        fields=["order_origin", "sfcc_orders.confirmed_order_count",
                "sfcc_orders.net_revenue", "revenue_discrepancy"],
        filters={"order_reconciliation.order_date": "last 30 days"},
        sorts=["sfcc_orders.net_revenue desc"],
        limit="500",  # the SDK expects limit as a string
    ),
)
# Returns a JSON string usable in any downstream system
```
Performance: Making Looker Fast at Scale
Looker generates SQL and sends it to your database. Its performance ceiling is therefore the performance of your warehouse — BigQuery in most modern stacks. A few patterns consistently make a difference:
- Use PDTs for expensive joins. A pre-materialised reconciliation table that BigQuery computes once nightly is orders of magnitude faster than recomputing the full GA4/SFCC join on every dashboard load.
- Partition and cluster your derived tables. Declaring partition keys (e.g. order date) and cluster keys (e.g. channel) in your PDT definition lets BigQuery skip entire partitions, reducing query cost dramatically.
- Use datagroups for cache control. Configure a datagroup tied to your ETL completion time so Looker serves cached results until new data is available, then invalidates the cache precisely when it should.
- Limit the Explore field picker. Use hidden: yes aggressively. An Explore with 300 fields is slower to render and harder to use than one with 60 well-chosen fields.
- Avoid unbounded date ranges. If a dashboard has no date filter and queries a full GA4 events table, BigQuery will scan everything. Always set default filters on date dimensions in your Explores.
What Looker Is Not Good For
Looker is powerful and expensive. It is not the right tool for every situation. Be honest about these limitations:
- Ad hoc exploration for a single analyst. If one person needs to explore data quickly, a Jupyter notebook or even Looker Studio is faster to set up.
- Real-time streaming data. Looker queries a database on demand. It is not a streaming analytics tool. For millisecond-latency requirements, you need a different architecture.
- Highly customised UI. Looker dashboards are functional but not as visually flexible as custom-built React dashboards. For pixel-perfect design requirements, use the embed API with a custom frontend.
- Small teams without a data engineer. LookML requires someone who knows SQL and can maintain a codebase. Without that person, the model becomes a liability.
What Comes Next in This Series
- Part 2: Why GA4, SFCC and offline orders will never match — the 5 structural causes and what to do about them
- Part 3: The transaction ID problem — every failure mode, the full SQL diagnostic sequence, and how to fix it in LookML
- Part 4: LookML metric governance — defining revenue, discrepancy, and conversion rate once, with full documentation and version control
- Part 5: Dark traffic and orphan orders — classifying the unmatched orders and building a complete reconciliation dashboard
Sources and Further Reading
- Looker official LookML reference documentation: cloud.google.com/looker/docs/reference/lookml-quick-reference
- Google Cloud blog on Looker and the semantic layer: cloud.google.com/blog/topics
- LookML best practices — Google documentation: cloud.google.com/looker/docs/best-practices
- Symmetric aggregates explanation: cloud.google.com/looker/docs — symmetric aggregates
- Persistent derived tables: cloud.google.com/looker/docs/derived-tables
- Looker API reference: developers.looker.com/api/explorer
- dbt + Looker integration guide: docs.getdbt.com
- GA4 BigQuery Export schema: support.google.com/analytics/answer/7029846