Every e-commerce analytics project eventually reaches the same moment. Someone opens three tabs — GA4, the SFCC order export, and the offline file — and says: “These numbers don’t match. Which one is right?”

The answer that ends the confusion is also the one that opens up a much more interesting set of questions: all three are right, and the differences between them are not errors to be fixed — they are information to be extracted.

This article is the second in a series on building a GA4/SFCC/offline reconciliation layer in Looker. Before any pipeline, any BigQuery schema, any LookML model — you need to understand, precisely, why these sources diverge. This is not a technical problem. It is a conceptual one.


Three Systems, Three Realities

The core issue is simple to state but easy to underestimate: GA4, SFCC, and your offline data sources are not measuring the same event. They have different observation points, different capture mechanisms, and different definitions of what constitutes a transaction.

Three Systems, Three Observation Points GA4 Browser-side event capture ✔ Web purchases (cookie present) ✔ Session attribution paths ✔ Funnel & behaviour data ✗ Offline orders ✗ Safari ITP losses (7-day) ✗ Refunds (unless implemented) ✗ WebView / in-app orders ✗ B2B / back-office orders Fires at browser purchase event SFCC Order management system ✔ All web orders confirmed ✔ Back-office / agent orders ✔ Order lifecycle (cancel/refund) ✔ Gross + net revenue ✔ Shipping, taxes, discounts ✗ Attribution / channel source ✗ Session behaviour data ✗ Marketing touchpoints Records at order confirmation OFFLINE / ERP POS, call centre, wholesale ✔ In-store POS transactions ✔ Phone / call centre orders ✔ Wholesale / B2B orders ✔ ERP-imported transactions ✗ No GA4 session ever ✗ No digital attribution ✗ No browser touchpoints Syncs to SFCC or BigQuery Green = captures · Red = blind spot · Reconciliation joins all three in BigQuery
Figure 1 — Each system has a distinct observation window. The gaps between them are structural, not accidental. Reconciliation means understanding what lives in each column, not forcing them to agree.

GA4: What the Browser Decided to Tell You

GA4 is a client-side event collection system. Its entire picture of your business depends on what JavaScript code fires in a user’s browser, what the user’s browser permits to be sent, and whether a persistent cookie links that event to a prior session. Every one of these conditions can fail silently.

What GA4 captures well

What GA4 misses structurally

The revenue number in GA4 is therefore: the sum of purchase event values that successfully reached Google’s collection endpoint from a browser session where a cookie was present and tracking was permitted, as of the moment the event fired. That is a very specific and bounded definition of revenue.


SFCC: The Accounting Source of Truth

Salesforce Commerce Cloud is an order management system. Its job is to record every commercial transaction, track its lifecycle, and provide the data that feeds fulfilment, finance, and customer service. It does not care how the customer arrived. It cares that an order was placed, confirmed, and either fulfilled or reversed.

SFCC’s order lifecycle

Order lifecycle in SFCC:

created → payment_pending → payment_confirmed → exported (to WMS)
                                      ↓
                              shipped → delivered
                                      ↓
                              return_requested → return_received → refunded

Alternative paths:
created → payment_failed → (abandoned)
created → fraud_review → cancelled

GA4 fires its purchase event at the “payment_confirmed” step above — specifically, when the confirmation page loads in the browser. But the SFCC record persists through the entire lifecycle. This is why comparing GA4 “revenue” (a snapshot at confirmation time) with SFCC “net revenue” (which includes all subsequent lifecycle events) creates a structural discrepancy that is not a bug.

What SFCC knows that GA4 doesn’t

What SFCC doesn’t know

SFCC has no concept of a marketing channel. It records the order; it does not know whether that order came from a Google Ads click, an organic search, a direct visit, or a referral from an email. That attribution layer lives entirely in GA4. This is why you need both systems in your reconciliation model, and why neither one alone is sufficient for marketing analysis.


The Five Structural Causes of Discrepancy

1. Temporal Scope Mismatch

GA4 and SFCC do not timestamp events the same way. GA4 fires when the confirmation page loads in the browser. SFCC records when payment is confirmed in the backend — which happens after payment gateway validation, fraud check, and inventory reservation. This lag is typically between 0 and 30 seconds, but on edge cases (manual fraud review, delayed payment provider callback) it can be hours or even days.

On intra-month reporting windows, this lag is usually negligible. On period boundaries, it creates real discrepancies:

Real example:
User completes checkout at 23:59:57 on December 31st
  → GA4 fires purchase event: December (revenue lands in Dec)
  → Payment gateway confirms at 00:00:04 January 1st
  → SFCC records order: January (revenue lands in Jan)

Result: This order appears in GA4 December report but not SFCC December report.
On monthly comparison, the gap looks like a December underperformance in SFCC.

Diagnosis: Compare daily order counts between GA4 and SFCC over a 30-day period. If you see a consistent 1-2 day rolling offset in the delta (some days GA4 is higher, the next day SFCC catches up), you are looking at a timing issue, not a structural one. It typically resolves on monthly aggregation.

2. Revenue Definition Mismatch

This is the most commonly underdiagnosed cause of discrepancy. GA4 receives whatever revenue value your dataLayer pushes. SFCC records its own financial definition of order value. Unless these are explicitly aligned, they will diverge on every order that involves any of the following:

Revenue componentTypical GA4 behaviourTypical SFCC behaviourImpact if misaligned
Shipping costsDepends on dataLayer implementation — often excludedAlways included in order_valueGA4 understates revenue on every order with shipping
VAT / taxesOften excluded (HT) in European implementationsRecorded both gross and netGA4 and SFCC use different bases — up to 20% difference
Promotional discountsOften pre-discount value sent to dataLayerPost-discount amount is the order_valueGA4 overstates revenue on every discounted order
Loyalty point redemptionsRarely deductedReduces billed amountGA4 overstates for loyalty programme users
CurrencyDepends on dataLayer currency parameterStored in original currency + convertedMulti-currency sites can have systematic FX differences

Fix: Before any technical work, decide explicitly what “revenue” means: gross or net? Including or excluding shipping? Including or excluding VAT? Then audit your dataLayer implementation against SFCC’s order_value for a sample of 50 orders. If the values systematically differ by a fixed percentage, you have a definition problem to fix at source.

3. Cancelled and Refunded Orders

GA4 records revenue at the moment the purchase event fires. SFCC tracks the order through its entire lifecycle. If a customer requests a refund two weeks later, SFCC adjusts net revenue. GA4 will not, unless you explicitly send a refund event.

The GA4 refund event implementation requires either a server-side trigger (Measurement Protocol) or a GTM tag that fires when your order management system processes the refund. Most implementations do not have this. The result: GA4 permanently overstates revenue for any product category with a meaningful return rate.

-- Quantify the refund gap in BigQuery
SELECT
  DATE_TRUNC(created_at, MONTH)         AS month,
  SUM(order_value)                       AS sfcc_gross_revenue,
  SUM(order_value - COALESCE(refund_amount, 0)) AS sfcc_net_revenue,
  SUM(COALESCE(refund_amount, 0))        AS total_refunds,
  ROUND(
    SUM(COALESCE(refund_amount, 0))
    / NULLIF(SUM(order_value), 0) * 100, 1
  )                                      AS refund_rate_pct
FROM `project.sfcc.orders`
WHERE status IN ('refunded', 'partially_refunded')
  AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1

-- If refund_rate_pct is consistently above 5%, the GA4 vs SFCC gap
-- is largely explained by missing refund events in GA4.

4. Cookie Loss from ITP and Consent Mode

Apple’s Intelligent Tracking Prevention (ITP) erases first-party cookies after 7 days of inactivity on Safari. Consent Mode v2 means that users who decline analytics cookies generate modelled conversions rather than individual events. Both of these create a category of orders that exist in SFCC but have no individual purchase event in GA4’s raw event data.

The ITP impact depends on your user base:

5. Browserless Order Environments

A significant and growing share of e-commerce orders is placed in environments where GA4 tags either cannot fire or fire incompletely:


The Full Data Flow Architecture

Data Flow: From Source Systems to Reconciliation Layer Browser / GTM GA4 Collection(events + sessions) GA4 → BigQuery Export(daily, event-level) SFCC Storefront(web + back-office) SFCC Order DB(full lifecycle) SFCC → BigQuery(API or nightly export) POS / Call CentreERP / Wholesale SFCC sync / CSV(or ERP direct) Offline → BigQuery(CSV upload or pipeline) BigQuery Unified warehouse All sources joined Looker LookML model + dashboards Normalisation happens before Looker (in BigQuery views): LOWER(TRIM(transaction_id)) = LOWER(TRIM(order_id)) · Revenue definition aligned · Status filters applied
Figure 2 — Data flow from source systems to the reconciliation layer. The key principle: land everything in BigQuery first, normalise at the BigQuery layer, then let Looker handle modeling and presentation.

What the Gap Is Actually Telling You

Once you understand the structural causes, the gap between systems stops being a problem to solve and starts being a measurement instrument. Here is what each component of the gap tells you about your business:

Revenue invisibility rate

The share of SFCC revenue with no matching GA4 session tells you exactly how much of your actual business is invisible to your analytics tool. If this number is 30%, then every ROAS calculation, every channel attribution, every A/B test result you produce is based on 70% of your actual revenue. The invisible 30% is distributed across channels in an unknown way, which means all your efficiency metrics are systematically biased.

Channel under-valuation signal

If a significant share of your revenue comes from call-centre orders influenced by advertising, and those orders never appear in GA4, your paid media ROAS is understated. You may be cutting budgets on channels that are actually performing well, simply because their conversions route through an untracked channel.

Tracking implementation quality score

After accounting for structural offline causes (store orders, B2B API, call centre), the remaining SFCC web orders with no GA4 match represent real tracking failures: missing tags, consent mode misconfiguration, WebView failures, ad blocker suppression. Tracking this metric over time gives you a continuous implementation health score. A spike on any given day indicates a deployment broke something. A gradual increase indicates a systemic degradation.


The Four Business Decisions Before Any Technical Work

Before writing a single SQL query, you need answers to these four questions. They are business decisions, not technical ones. If they are not answered, any technical work you do will be built on a foundation of ambiguity.

Decision 1: What is revenue?

Write it down explicitly. Does “revenue” mean:

There is no universally correct answer. Different teams need different definitions. Finance wants net settled. Marketing wants gross billed. The reconciliation dashboard should expose all three, with the definition clearly labelled on every metric.

Decision 2: What is the join key?

Does the transaction_id in GA4 correspond to the order_id in SFCC? Always? After normalisation? Never? The next article in this series covers this in detail — it is the single most common point of failure in any reconciliation project.

Decision 3: Which orders are in scope?

Are you reconciling web-only orders? All channels? With or without B2B? With or without cancelled orders? The scope decision determines your denominator for every discrepancy metric. A 20% gap with only web orders in scope means something very different from a 20% gap with all channels included.

Decision 4: How do you handle refunds?

Comparing GA4 gross reported revenue (which includes future refunds that haven’t been sent back) to SFCC net revenue (which deducts refunds as they happen) is comparing incompatible things. You either need to align the refund treatment in GA4 (by implementing refund events), or build your reconciliation comparison on SFCC gross confirmed revenue to match GA4’s snapshot nature.


Practical Diagnostic: Your First BigQuery Queries

Once you have GA4 export and SFCC data in BigQuery, run this sequence before building anything in Looker. These queries establish your baseline and categorise the gap.

Query 1: Daily volume comparison (web orders only)

SELECT
  date,
  SUM(ga4_orders)                  AS ga4_orders,
  SUM(sfcc_orders)                 AS sfcc_orders,
  SUM(sfcc_orders) - SUM(ga4_orders) AS delta,
  ROUND(
    (SUM(sfcc_orders) - SUM(ga4_orders))
    / NULLIF(SUM(sfcc_orders), 0) * 100, 1
  )                                AS gap_pct
FROM (
  -- GA4 purchases
  SELECT
    DATE(event_date)                AS date,
    COUNT(DISTINCT transaction_id)  AS ga4_orders,
    0                               AS sfcc_orders
  FROM `project.analytics.events`
  WHERE event_name = 'purchase'
    AND DATE(event_date) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY 1

  UNION ALL

  -- SFCC web orders only
  SELECT
    DATE(created_at),
    0,
    COUNT(DISTINCT order_id)
  FROM `project.sfcc.orders`
  WHERE channel IN ('web', 'storefront')
    AND status NOT IN ('cancelled', 'fraud', 'test')
    AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY 1
)
GROUP BY date
ORDER BY date;

-- Interpretation:
-- Consistent gap: structural issue (tracking quality, consent mode)
-- Spiky gap: event-based issue (deployment broke tag on specific day)
-- Rolling 1-2 day offset: timing mismatch between systems

Query 2: Revenue gap with component breakdown

SELECT
  DATE_TRUNC(s.created_at, MONTH)              AS month,

  -- SFCC figures
  COUNT(DISTINCT s.order_id)                   AS sfcc_orders,
  SUM(s.order_value)                           AS sfcc_gross_revenue,
  SUM(s.order_value - COALESCE(s.refund_amount, 0)) AS sfcc_net_revenue,

  -- GA4 figures (matched only)
  COUNT(DISTINCT g.transaction_id)             AS ga4_matched_orders,
  SUM(g.reported_revenue)                      AS ga4_reported_revenue,

  -- Discrepancy
  SUM(s.order_value) - SUM(g.reported_revenue) AS gross_delta,
  SUM(s.order_value - COALESCE(s.refund_amount, 0))
    - SUM(g.reported_revenue)                  AS net_delta,

  -- Rates
  ROUND(
    COUNT(DISTINCT g.transaction_id)
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS match_rate_pct,
  ROUND(
    (COUNT(DISTINCT s.order_id) - COUNT(DISTINCT g.transaction_id))
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS unmatched_pct

FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
  ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.channel IN ('web', 'storefront')
  AND s.status NOT IN ('cancelled', 'fraud')
  AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1;

Query 3: Revenue gap by product category

SELECT
  s.product_category,
  COUNT(DISTINCT s.order_id)                   AS sfcc_orders,
  COUNT(DISTINCT g.transaction_id)             AS ga4_matched,
  ROUND(
    COUNT(DISTINCT g.transaction_id)
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS match_rate_pct,
  SUM(s.order_value)                           AS sfcc_revenue,
  SUM(g.reported_revenue)                      AS ga4_revenue,
  SUM(s.order_value) - SUM(COALESCE(g.reported_revenue, 0)) AS revenue_gap
FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
  ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.status NOT IN ('cancelled', 'fraud')
  AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY revenue_gap DESC;

-- Categories with a systematically low match rate may:
-- (a) have a broken tag on their specific product pages
-- (b) be sold primarily via offline / B2B channels
-- (c) have a different checkout flow with missing trigger

What Comes Next in This Series


Sources and Further Reading