Why GA4, SFCC and Offline Orders Will Never Match — And What That Gap Actually Tells You

Every e-commerce analytics project eventually reaches the same moment. Someone opens three tabs — GA4, the SFCC order export, and the offline file — and says: “These numbers don’t match. Which one is right?”

The answer that ends the confusion is also the one that opens up a much more interesting set of questions: all three are right, and the differences between them are not errors to be fixed — they are information to be extracted.

This article is the second in a series on building a GA4/SFCC/offline reconciliation layer in Looker. Before any pipeline, any BigQuery schema, any LookML model — you need to understand, precisely, why these sources diverge. This is not a technical problem. It is a conceptual one.

Three Systems, Three Realities

The core issue is simple to state but easy to underestimate: GA4, SFCC, and your offline data sources are not measuring the same event. They have different observation points, different capture mechanisms, and different definitions of what constitutes a transaction.

Figure 1 — Each system has a distinct observation window. The gaps between them are structural, not accidental. Reconciliation means understanding what lives in each column, not forcing them to agree.

GA4: What the Browser Decided to Tell You

GA4 is a client-side event collection system. Its entire picture of your business depends on what JavaScript code fires in a user’s browser, what the user’s browser permits to be sent, and whether a persistent cookie links that event to a prior session. Every one of these conditions can fail silently.

What GA4 captures well

Session-level behaviour: pageviews, scroll depth, click events, add-to-cart, checkout steps
Attribution paths: the channel, campaign, and source that brought a user to a session
Purchase events: revenue, items, transaction ID — when the purchase event fires correctly
User identity signals: client ID (cookie-based), user ID (if implemented), GCLID

What GA4 misses structurally

Any order placed outside a browser: phone, POS, API, back-office. GA4 never existed in those flows.
Orders where the confirmation page didn’t load: user closed the tab, payment provider timeout, slow 3G connection. SFCC confirmed the order; GA4 never saw it.
Orders from consenting-disabled sessions: with Consent Mode v2, if a user declines analytics cookies, GA4 relies on modelled conversions. These modelled events are estimates, not exact records, and do not have individual transaction IDs that can be joined to SFCC.
Refunded and cancelled orders: GA4 recorded the revenue at purchase time. Unless you implement a refund event (with a negative revenue value), GA4 never learns that the order was reversed.
Long-cycle journeys on Safari: ITP erases first-party cookies after 7 days of inactivity. A user with a 10-day consideration cycle is a new visitor every time they return. Their final purchase session exists in GA4, but the original acquisition attribution is permanently lost.

The revenue number in GA4 is therefore: the sum of purchase event values that successfully reached Google’s collection endpoint from a browser session where a cookie was present and tracking was permitted, as of the moment the event fired. That is a very specific and bounded definition of revenue.

SFCC: The Accounting Source of Truth

Salesforce Commerce Cloud is an order management system. Its job is to record every commercial transaction, track its lifecycle, and provide the data that feeds fulfilment, finance, and customer service. It does not care how the customer arrived. It cares that an order was placed, confirmed, and either fulfilled or reversed.

SFCC’s order lifecycle

Order lifecycle in SFCC:

created → payment_pending → payment_confirmed → exported (to WMS)
                                      ↓
                              shipped → delivered
                                      ↓
                              return_requested → return_received → refunded

Alternative paths:
created → payment_failed → (abandoned)
created → fraud_review → cancelled

GA4 fires its purchase event at the “payment_confirmed” step above — specifically, when the confirmation page loads in the browser. But the SFCC record persists through the entire lifecycle. This is why comparing GA4 “revenue” (a snapshot at confirmation time) with SFCC “net revenue” (which includes all subsequent lifecycle events) creates a structural discrepancy that is not a bug.

What SFCC knows that GA4 doesn’t

The final order status: whether it was fulfilled, returned, or refunded
The exact billed amount including taxes, discounts applied server-side, and loyalty point redemptions
The source system that created the order (web, agent, API, POS sync)
Every order regardless of how or where it was placed
Customer identity at the account level, not the anonymous cookie level

What SFCC doesn’t know

SFCC has no concept of a marketing channel. It records the order; it does not know whether that order came from a Google Ads click, an organic search, a direct visit, or a referral from an email. That attribution layer lives entirely in GA4. This is why you need both systems in your reconciliation model, and why neither one alone is sufficient for marketing analysis.

The Five Structural Causes of Discrepancy

1. Temporal Scope Mismatch

GA4 and SFCC do not timestamp events the same way. GA4 fires when the confirmation page loads in the browser. SFCC records when payment is confirmed in the backend — which happens after payment gateway validation, fraud check, and inventory reservation. This lag is typically between 0 and 30 seconds, but on edge cases (manual fraud review, delayed payment provider callback) it can be hours or even days.

On intra-month reporting windows, this lag is usually negligible. On period boundaries, it creates real discrepancies:

Real example:
User completes checkout at 23:59:57 on December 31st
  → GA4 fires purchase event: December (revenue lands in Dec)
  → Payment gateway confirms at 00:00:04 January 1st
  → SFCC records order: January (revenue lands in Jan)

Result: This order appears in GA4 December report but not SFCC December report.
On monthly comparison, the gap looks like a December underperformance in SFCC.

Diagnosis: Compare daily order counts between GA4 and SFCC over a 30-day period. If you see a consistent 1-2 day rolling offset in the delta (some days GA4 is higher, the next day SFCC catches up), you are looking at a timing issue, not a structural one. It typically resolves on monthly aggregation.

2. Revenue Definition Mismatch

This is the most commonly underdiagnosed cause of discrepancy. GA4 receives whatever revenue value your dataLayer pushes. SFCC records its own financial definition of order value. Unless these are explicitly aligned, they will diverge on every order that involves any of the following:

Revenue component	Typical GA4 behaviour	Typical SFCC behaviour	Impact if misaligned
Shipping costs	Depends on dataLayer implementation — often excluded	Always included in order_value	GA4 understates revenue on every order with shipping
VAT / taxes	Often excluded (HT) in European implementations	Recorded both gross and net	GA4 and SFCC use different bases — up to 20% difference
Promotional discounts	Often pre-discount value sent to dataLayer	Post-discount amount is the order_value	GA4 overstates revenue on every discounted order
Loyalty point redemptions	Rarely deducted	Reduces billed amount	GA4 overstates for loyalty programme users
Currency	Depends on dataLayer currency parameter	Stored in original currency + converted	Multi-currency sites can have systematic FX differences

Fix: Before any technical work, decide explicitly what “revenue” means: gross or net? Including or excluding shipping? Including or excluding VAT? Then audit your dataLayer implementation against SFCC’s order_value for a sample of 50 orders. If the values systematically differ by a fixed percentage, you have a definition problem to fix at source.

3. Cancelled and Refunded Orders

GA4 records revenue at the moment the purchase event fires. SFCC tracks the order through its entire lifecycle. If a customer requests a refund two weeks later, SFCC adjusts net revenue. GA4 will not, unless you explicitly send a refund event.

The GA4 refund event implementation requires either a server-side trigger (Measurement Protocol) or a GTM tag that fires when your order management system processes the refund. Most implementations do not have this. The result: GA4 permanently overstates revenue for any product category with a meaningful return rate.

-- Quantify the refund gap in BigQuery
SELECT
  DATE_TRUNC(created_at, MONTH)         AS month,
  SUM(order_value)                       AS sfcc_gross_revenue,
  SUM(order_value - COALESCE(refund_amount, 0)) AS sfcc_net_revenue,
  SUM(COALESCE(refund_amount, 0))        AS total_refunds,
  ROUND(
    SUM(COALESCE(refund_amount, 0))
    / NULLIF(SUM(order_value), 0) * 100, 1
  )                                      AS refund_rate_pct
FROM `project.sfcc.orders`
WHERE status IN ('refunded', 'partially_refunded')
  AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1

-- If refund_rate_pct is consistently above 5%, the GA4 vs SFCC gap
-- is largely explained by missing refund events in GA4.

4. Cookie Loss from ITP and Consent Mode

Apple’s Intelligent Tracking Prevention (ITP) erases first-party cookies after 7 days of inactivity on Safari. Consent Mode v2 means that users who decline analytics cookies generate modelled conversions rather than individual events. Both of these create a category of orders that exist in SFCC but have no individual purchase event in GA4’s raw event data.

The ITP impact depends on your user base:

If your average consideration cycle is less than 7 days, ITP has minimal impact
If you sell products where customers research for 1–4 weeks before buying (travel, electronics, luxury goods), ITP is a structural and permanent attribution problem
Safari market share in Western Europe is approximately 28–32% on desktop and 50–60% on mobile (2024 data). For mobile-heavy demographics, this is not an edge case.

5. Browserless Order Environments

A significant and growing share of e-commerce orders is placed in environments where GA4 tags either cannot fire or fire incompletely:

In-app WebViews (Instagram, TikTok, Gmail, WhatsApp): When a user opens a link within one of these apps, it renders in the app’s internal browser. Many WebViews block or restrict JavaScript, prevent cookies from persisting, or isolate the session from the user’s main browser. Orders placed in these contexts typically land in SFCC without a matching GA4 event. For businesses with active social advertising, WebView traffic can be 20–40% of mobile sessions.
Headless commerce architectures: If the storefront is decoupled from the server-side rendering layer, the tag injection depends on the frontend framework’s implementation. Missing the confirmation page trigger is common and often undiscovered.
Native mobile apps: Unless Firebase/GA4 event tracking was explicitly implemented in the iOS and Android apps, every purchase via the app lands in SFCC without a GA4 event. This is often a legacy implementation gap — the web was tracked, the app was not.
Call centre and back-office orders: Agents create orders directly in SFCC. There is no browser, no session, no tag. This is expected and correct behaviour. These orders should be classified as offline, not treated as tracking failures.

The Full Data Flow Architecture

Figure 2 — Data flow from source systems to the reconciliation layer. The key principle: land everything in BigQuery first, normalise at the BigQuery layer, then let Looker handle modeling and presentation.

What the Gap Is Actually Telling You

Once you understand the structural causes, the gap between systems stops being a problem to solve and starts being a measurement instrument. Here is what each component of the gap tells you about your business:

Revenue invisibility rate

The share of SFCC revenue with no matching GA4 session tells you exactly how much of your actual business is invisible to your analytics tool. If this number is 30%, then every ROAS calculation, every channel attribution, every A/B test result you produce is based on 70% of your actual revenue. The invisible 30% is distributed across channels in an unknown way, which means all your efficiency metrics are systematically biased.

Channel under-valuation signal

If a significant share of your revenue comes from call-centre orders influenced by advertising, and those orders never appear in GA4, your paid media ROAS is understated. You may be cutting budgets on channels that are actually performing well, simply because their conversions route through an untracked channel.

Tracking implementation quality score

After accounting for structural offline causes (store orders, B2B API, call centre), the remaining SFCC web orders with no GA4 match represent real tracking failures: missing tags, consent mode misconfiguration, WebView failures, ad blocker suppression. Tracking this metric over time gives you a continuous implementation health score. A spike on any given day indicates a deployment broke something. A gradual increase indicates a systemic degradation.

The Four Business Decisions Before Any Technical Work

Before writing a single SQL query, you need answers to these four questions. They are business decisions, not technical ones. If they are not answered, any technical work you do will be built on a foundation of ambiguity.

Decision 1: What is revenue?

Write it down explicitly. Does “revenue” mean:

Gross confirmed order value (including shipping, including VAT, before refunds)?
Net billed amount (excluding VAT, excluding shipping, after discounts)?
Net settled amount (after refunds, after chargebacks)?

There is no universally correct answer. Different teams need different definitions. Finance wants net settled. Marketing wants gross billed. The reconciliation dashboard should expose all three, with the definition clearly labelled on every metric.

Decision 2: What is the join key?

Does the transaction_id in GA4 correspond to the order_id in SFCC? Always? After normalisation? Never? The next article in this series covers this in detail — it is the single most common point of failure in any reconciliation project.

Decision 3: Which orders are in scope?

Are you reconciling web-only orders? All channels? With or without B2B? With or without cancelled orders? The scope decision determines your denominator for every discrepancy metric. A 20% gap with only web orders in scope means something very different from a 20% gap with all channels included.

Decision 4: How do you handle refunds?

Comparing GA4 gross reported revenue (which includes future refunds that haven’t been sent back) to SFCC net revenue (which deducts refunds as they happen) is comparing incompatible things. You either need to align the refund treatment in GA4 (by implementing refund events), or build your reconciliation comparison on SFCC gross confirmed revenue to match GA4’s snapshot nature.

Practical Diagnostic: Your First BigQuery Queries

Once you have GA4 export and SFCC data in BigQuery, run this sequence before building anything in Looker. These queries establish your baseline and categorise the gap.

Query 1: Daily volume comparison (web orders only)

SELECT
  date,
  SUM(ga4_orders)                  AS ga4_orders,
  SUM(sfcc_orders)                 AS sfcc_orders,
  SUM(sfcc_orders) - SUM(ga4_orders) AS delta,
  ROUND(
    (SUM(sfcc_orders) - SUM(ga4_orders))
    / NULLIF(SUM(sfcc_orders), 0) * 100, 1
  )                                AS gap_pct
FROM (
  -- GA4 purchases
  SELECT
    DATE(event_date)                AS date,
    COUNT(DISTINCT transaction_id)  AS ga4_orders,
    0                               AS sfcc_orders
  FROM `project.analytics.events`
  WHERE event_name = 'purchase'
    AND DATE(event_date) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY 1

  UNION ALL

  -- SFCC web orders only
  SELECT
    DATE(created_at),
    0,
    COUNT(DISTINCT order_id)
  FROM `project.sfcc.orders`
  WHERE channel IN ('web', 'storefront')
    AND status NOT IN ('cancelled', 'fraud', 'test')
    AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY 1
)
GROUP BY date
ORDER BY date;

-- Interpretation:
-- Consistent gap: structural issue (tracking quality, consent mode)
-- Spiky gap: event-based issue (deployment broke tag on specific day)
-- Rolling 1-2 day offset: timing mismatch between systems

Query 2: Revenue gap with component breakdown

SELECT
  DATE_TRUNC(s.created_at, MONTH)              AS month,

  -- SFCC figures
  COUNT(DISTINCT s.order_id)                   AS sfcc_orders,
  SUM(s.order_value)                           AS sfcc_gross_revenue,
  SUM(s.order_value - COALESCE(s.refund_amount, 0)) AS sfcc_net_revenue,

  -- GA4 figures (matched only)
  COUNT(DISTINCT g.transaction_id)             AS ga4_matched_orders,
  SUM(g.reported_revenue)                      AS ga4_reported_revenue,

  -- Discrepancy
  SUM(s.order_value) - SUM(g.reported_revenue) AS gross_delta,
  SUM(s.order_value - COALESCE(s.refund_amount, 0))
    - SUM(g.reported_revenue)                  AS net_delta,

  -- Rates
  ROUND(
    COUNT(DISTINCT g.transaction_id)
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS match_rate_pct,
  ROUND(
    (COUNT(DISTINCT s.order_id) - COUNT(DISTINCT g.transaction_id))
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS unmatched_pct

FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
  ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.channel IN ('web', 'storefront')
  AND s.status NOT IN ('cancelled', 'fraud')
  AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1;

Query 3: Revenue gap by product category

SELECT
  s.product_category,
  COUNT(DISTINCT s.order_id)                   AS sfcc_orders,
  COUNT(DISTINCT g.transaction_id)             AS ga4_matched,
  ROUND(
    COUNT(DISTINCT g.transaction_id)
    / NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
  )                                            AS match_rate_pct,
  SUM(s.order_value)                           AS sfcc_revenue,
  SUM(g.reported_revenue)                      AS ga4_revenue,
  SUM(s.order_value) - SUM(COALESCE(g.reported_revenue, 0)) AS revenue_gap
FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
  ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.status NOT IN ('cancelled', 'fraud')
  AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY revenue_gap DESC;

-- Categories with a systematically low match rate may:
-- (a) have a broken tag on their specific product pages
-- (b) be sold primarily via offline / B2B channels
-- (c) have a different checkout flow with missing trigger

What Comes Next in This Series

Part 3: The transaction ID problem — every failure mode and the complete SQL diagnostic sequence
Part 4: LookML metric governance — defining revenue and discrepancy metrics once and enforcing them everywhere
Part 5: Dark traffic and orphan orders — classifying the unmatched orders and building a complete Looker reconciliation dashboard

Sources and Further Reading

GA4 ecommerce event documentation: developers.google.com/analytics/devguides/collection/ga4/ecommerce
GA4 Measurement Protocol (for server-side refund events): developers.google.com/analytics/devguides/collection/protocol/ga4
Apple ITP documentation: webkit.org/tracking-prevention
Google Consent Mode v2 guide: developers.google.com/tag-platform/security/guides/consent
GA4 BigQuery Export schema reference: support.google.com/analytics/answer/7029846
Salesforce Commerce Cloud Order Management documentation: documentation.b2c.commercecloud.salesforce.com
ITP impact on analytics — Simo Ahava deep dive: simoahava.com — ITP and web analytics
Safari WebView tracking limitations: developers.google.com/tag-platform/tag-manager/web/webview