Every e-commerce analytics project eventually reaches the same moment. Someone opens three tabs — GA4, the SFCC order export, and the offline file — and says: “These numbers don’t match. Which one is right?”
The answer that ends the confusion is also the one that opens up a much more interesting set of questions: all three are right, and the differences between them are not errors to be fixed — they are information to be extracted.
This article is the second in a series on building a GA4/SFCC/offline reconciliation layer in Looker. Before any pipeline, any BigQuery schema, any LookML model — you need to understand, precisely, why these sources diverge. This is not a technical problem. It is a conceptual one.
Three Systems, Three Realities
The core issue is simple to state but easy to underestimate: GA4, SFCC, and your offline data sources are not measuring the same event. They have different observation points, different capture mechanisms, and different definitions of what constitutes a transaction.
GA4: What the Browser Decided to Tell You
GA4 is a client-side event collection system. Its entire picture of your business depends on what JavaScript code fires in a user’s browser, what the user’s browser permits to be sent, and whether a persistent cookie links that event to a prior session. Every one of these conditions can fail silently.
What GA4 captures well
- Session-level behaviour: pageviews, scroll depth, click events, add-to-cart, checkout steps
- Attribution paths: the channel, campaign, and source that brought a user to a session
- Purchase events: revenue, items, transaction ID — when the
purchaseevent fires correctly - User identity signals: client ID (cookie-based), user ID (if implemented), GCLID
What GA4 misses structurally
- Any order placed outside a browser: phone, POS, API, back-office. GA4 never existed in those flows.
- Orders where the confirmation page didn’t load: user closed the tab, payment provider timeout, slow 3G connection. SFCC confirmed the order; GA4 never saw it.
- Orders from consenting-disabled sessions: with Consent Mode v2, if a user declines analytics cookies, GA4 relies on modelled conversions. These modelled events are estimates, not exact records, and do not have individual transaction IDs that can be joined to SFCC.
- Refunded and cancelled orders: GA4 recorded the revenue at
purchasetime. Unless you implement arefundevent (with a negative revenue value), GA4 never learns that the order was reversed. - Long-cycle journeys on Safari: ITP erases first-party cookies after 7 days of inactivity. A user with a 10-day consideration cycle is a new visitor every time they return. Their final purchase session exists in GA4, but the original acquisition attribution is permanently lost.
The revenue number in GA4 is therefore: the sum of purchase event values that successfully reached Google’s collection endpoint from a browser session where a cookie was present and tracking was permitted, as of the moment the event fired. That is a very specific and bounded definition of revenue.
SFCC: The Accounting Source of Truth
Salesforce Commerce Cloud is an order management system. Its job is to record every commercial transaction, track its lifecycle, and provide the data that feeds fulfilment, finance, and customer service. It does not care how the customer arrived. It cares that an order was placed, confirmed, and either fulfilled or reversed.
SFCC’s order lifecycle
Order lifecycle in SFCC:
created → payment_pending → payment_confirmed → exported (to WMS)
↓
shipped → delivered
↓
return_requested → return_received → refunded
Alternative paths:
created → payment_failed → (abandoned)
created → fraud_review → cancelled
GA4 fires its purchase event at the “payment_confirmed” step above — specifically, when the confirmation page loads in the browser. But the SFCC record persists through the entire lifecycle. This is why comparing GA4 “revenue” (a snapshot at confirmation time) with SFCC “net revenue” (which includes all subsequent lifecycle events) creates a structural discrepancy that is not a bug.
What SFCC knows that GA4 doesn’t
- The final order status: whether it was fulfilled, returned, or refunded
- The exact billed amount including taxes, discounts applied server-side, and loyalty point redemptions
- The source system that created the order (web, agent, API, POS sync)
- Every order regardless of how or where it was placed
- Customer identity at the account level, not the anonymous cookie level
What SFCC doesn’t know
SFCC has no concept of a marketing channel. It records the order; it does not know whether that order came from a Google Ads click, an organic search, a direct visit, or a referral from an email. That attribution layer lives entirely in GA4. This is why you need both systems in your reconciliation model, and why neither one alone is sufficient for marketing analysis.
The Five Structural Causes of Discrepancy
1. Temporal Scope Mismatch
GA4 and SFCC do not timestamp events the same way. GA4 fires when the confirmation page loads in the browser. SFCC records when payment is confirmed in the backend — which happens after payment gateway validation, fraud check, and inventory reservation. This lag is typically between 0 and 30 seconds, but on edge cases (manual fraud review, delayed payment provider callback) it can be hours or even days.
On intra-month reporting windows, this lag is usually negligible. On period boundaries, it creates real discrepancies:
Real example:
User completes checkout at 23:59:57 on December 31st
→ GA4 fires purchase event: December (revenue lands in Dec)
→ Payment gateway confirms at 00:00:04 January 1st
→ SFCC records order: January (revenue lands in Jan)
Result: This order appears in GA4 December report but not SFCC December report.
On monthly comparison, the gap looks like a December underperformance in SFCC.
Diagnosis: Compare daily order counts between GA4 and SFCC over a 30-day period. If you see a consistent 1-2 day rolling offset in the delta (some days GA4 is higher, the next day SFCC catches up), you are looking at a timing issue, not a structural one. It typically resolves on monthly aggregation.
2. Revenue Definition Mismatch
This is the most commonly underdiagnosed cause of discrepancy. GA4 receives whatever revenue value your dataLayer pushes. SFCC records its own financial definition of order value. Unless these are explicitly aligned, they will diverge on every order that involves any of the following:
| Revenue component | Typical GA4 behaviour | Typical SFCC behaviour | Impact if misaligned |
|---|---|---|---|
| Shipping costs | Depends on dataLayer implementation — often excluded | Always included in order_value | GA4 understates revenue on every order with shipping |
| VAT / taxes | Often excluded (HT) in European implementations | Recorded both gross and net | GA4 and SFCC use different bases — up to 20% difference |
| Promotional discounts | Often pre-discount value sent to dataLayer | Post-discount amount is the order_value | GA4 overstates revenue on every discounted order |
| Loyalty point redemptions | Rarely deducted | Reduces billed amount | GA4 overstates for loyalty programme users |
| Currency | Depends on dataLayer currency parameter | Stored in original currency + converted | Multi-currency sites can have systematic FX differences |
Fix: Before any technical work, decide explicitly what “revenue” means: gross or net? Including or excluding shipping? Including or excluding VAT? Then audit your dataLayer implementation against SFCC’s order_value for a sample of 50 orders. If the values systematically differ by a fixed percentage, you have a definition problem to fix at source.
3. Cancelled and Refunded Orders
GA4 records revenue at the moment the purchase event fires. SFCC tracks the order through its entire lifecycle. If a customer requests a refund two weeks later, SFCC adjusts net revenue. GA4 will not, unless you explicitly send a refund event.
The GA4 refund event implementation requires either a server-side trigger (Measurement Protocol) or a GTM tag that fires when your order management system processes the refund. Most implementations do not have this. The result: GA4 permanently overstates revenue for any product category with a meaningful return rate.
-- Quantify the refund gap in BigQuery
SELECT
DATE_TRUNC(created_at, MONTH) AS month,
SUM(order_value) AS sfcc_gross_revenue,
SUM(order_value - COALESCE(refund_amount, 0)) AS sfcc_net_revenue,
SUM(COALESCE(refund_amount, 0)) AS total_refunds,
ROUND(
SUM(COALESCE(refund_amount, 0))
/ NULLIF(SUM(order_value), 0) * 100, 1
) AS refund_rate_pct
FROM `project.sfcc.orders`
WHERE status IN ('refunded', 'partially_refunded')
AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1
-- If refund_rate_pct is consistently above 5%, the GA4 vs SFCC gap
-- is largely explained by missing refund events in GA4.
4. Cookie Loss from ITP and Consent Mode
Apple’s Intelligent Tracking Prevention (ITP) erases first-party cookies after 7 days of inactivity on Safari. Consent Mode v2 means that users who decline analytics cookies generate modelled conversions rather than individual events. Both of these create a category of orders that exist in SFCC but have no individual purchase event in GA4’s raw event data.
The ITP impact depends on your user base:
- If your average consideration cycle is less than 7 days, ITP has minimal impact
- If you sell products where customers research for 1–4 weeks before buying (travel, electronics, luxury goods), ITP is a structural and permanent attribution problem
- Safari market share in Western Europe is approximately 28–32% on desktop and 50–60% on mobile (2024 data). For mobile-heavy demographics, this is not an edge case.
5. Browserless Order Environments
A significant and growing share of e-commerce orders is placed in environments where GA4 tags either cannot fire or fire incompletely:
- In-app WebViews (Instagram, TikTok, Gmail, WhatsApp): When a user opens a link within one of these apps, it renders in the app’s internal browser. Many WebViews block or restrict JavaScript, prevent cookies from persisting, or isolate the session from the user’s main browser. Orders placed in these contexts typically land in SFCC without a matching GA4 event. For businesses with active social advertising, WebView traffic can be 20–40% of mobile sessions.
- Headless commerce architectures: If the storefront is decoupled from the server-side rendering layer, the tag injection depends on the frontend framework’s implementation. Missing the confirmation page trigger is common and often undiscovered.
- Native mobile apps: Unless Firebase/GA4 event tracking was explicitly implemented in the iOS and Android apps, every purchase via the app lands in SFCC without a GA4 event. This is often a legacy implementation gap — the web was tracked, the app was not.
- Call centre and back-office orders: Agents create orders directly in SFCC. There is no browser, no session, no tag. This is expected and correct behaviour. These orders should be classified as offline, not treated as tracking failures.
The Full Data Flow Architecture
What the Gap Is Actually Telling You
Once you understand the structural causes, the gap between systems stops being a problem to solve and starts being a measurement instrument. Here is what each component of the gap tells you about your business:
Revenue invisibility rate
The share of SFCC revenue with no matching GA4 session tells you exactly how much of your actual business is invisible to your analytics tool. If this number is 30%, then every ROAS calculation, every channel attribution, every A/B test result you produce is based on 70% of your actual revenue. The invisible 30% is distributed across channels in an unknown way, which means all your efficiency metrics are systematically biased.
Channel under-valuation signal
If a significant share of your revenue comes from call-centre orders influenced by advertising, and those orders never appear in GA4, your paid media ROAS is understated. You may be cutting budgets on channels that are actually performing well, simply because their conversions route through an untracked channel.
Tracking implementation quality score
After accounting for structural offline causes (store orders, B2B API, call centre), the remaining SFCC web orders with no GA4 match represent real tracking failures: missing tags, consent mode misconfiguration, WebView failures, ad blocker suppression. Tracking this metric over time gives you a continuous implementation health score. A spike on any given day indicates a deployment broke something. A gradual increase indicates a systemic degradation.
The Four Business Decisions Before Any Technical Work
Before writing a single SQL query, you need answers to these four questions. They are business decisions, not technical ones. If they are not answered, any technical work you do will be built on a foundation of ambiguity.
Decision 1: What is revenue?
Write it down explicitly. Does “revenue” mean:
- Gross confirmed order value (including shipping, including VAT, before refunds)?
- Net billed amount (excluding VAT, excluding shipping, after discounts)?
- Net settled amount (after refunds, after chargebacks)?
There is no universally correct answer. Different teams need different definitions. Finance wants net settled. Marketing wants gross billed. The reconciliation dashboard should expose all three, with the definition clearly labelled on every metric.
Decision 2: What is the join key?
Does the transaction_id in GA4 correspond to the order_id in SFCC? Always? After normalisation? Never? The next article in this series covers this in detail — it is the single most common point of failure in any reconciliation project.
Decision 3: Which orders are in scope?
Are you reconciling web-only orders? All channels? With or without B2B? With or without cancelled orders? The scope decision determines your denominator for every discrepancy metric. A 20% gap with only web orders in scope means something very different from a 20% gap with all channels included.
Decision 4: How do you handle refunds?
Comparing GA4 gross reported revenue (which includes future refunds that haven’t been sent back) to SFCC net revenue (which deducts refunds as they happen) is comparing incompatible things. You either need to align the refund treatment in GA4 (by implementing refund events), or build your reconciliation comparison on SFCC gross confirmed revenue to match GA4’s snapshot nature.
Practical Diagnostic: Your First BigQuery Queries
Once you have GA4 export and SFCC data in BigQuery, run this sequence before building anything in Looker. These queries establish your baseline and categorise the gap.
Query 1: Daily volume comparison (web orders only)
SELECT
date,
SUM(ga4_orders) AS ga4_orders,
SUM(sfcc_orders) AS sfcc_orders,
SUM(sfcc_orders) - SUM(ga4_orders) AS delta,
ROUND(
(SUM(sfcc_orders) - SUM(ga4_orders))
/ NULLIF(SUM(sfcc_orders), 0) * 100, 1
) AS gap_pct
FROM (
-- GA4 purchases
SELECT
DATE(event_date) AS date,
COUNT(DISTINCT transaction_id) AS ga4_orders,
0 AS sfcc_orders
FROM `project.analytics.events`
WHERE event_name = 'purchase'
AND DATE(event_date) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
UNION ALL
-- SFCC web orders only
SELECT
DATE(created_at),
0,
COUNT(DISTINCT order_id)
FROM `project.sfcc.orders`
WHERE channel IN ('web', 'storefront')
AND status NOT IN ('cancelled', 'fraud', 'test')
AND DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
)
GROUP BY date
ORDER BY date;
-- Interpretation:
-- Consistent gap: structural issue (tracking quality, consent mode)
-- Spiky gap: event-based issue (deployment broke tag on specific day)
-- Rolling 1-2 day offset: timing mismatch between systems
Query 2: Revenue gap with component breakdown
SELECT
DATE_TRUNC(s.created_at, MONTH) AS month,
-- SFCC figures
COUNT(DISTINCT s.order_id) AS sfcc_orders,
SUM(s.order_value) AS sfcc_gross_revenue,
SUM(s.order_value - COALESCE(s.refund_amount, 0)) AS sfcc_net_revenue,
-- GA4 figures (matched only)
COUNT(DISTINCT g.transaction_id) AS ga4_matched_orders,
SUM(g.reported_revenue) AS ga4_reported_revenue,
-- Discrepancy
SUM(s.order_value) - SUM(g.reported_revenue) AS gross_delta,
SUM(s.order_value - COALESCE(s.refund_amount, 0))
- SUM(g.reported_revenue) AS net_delta,
-- Rates
ROUND(
COUNT(DISTINCT g.transaction_id)
/ NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
) AS match_rate_pct,
ROUND(
(COUNT(DISTINCT s.order_id) - COUNT(DISTINCT g.transaction_id))
/ NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
) AS unmatched_pct
FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.channel IN ('web', 'storefront')
AND s.status NOT IN ('cancelled', 'fraud')
AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH)
GROUP BY 1
ORDER BY 1;
Query 3: Revenue gap by product category
SELECT
s.product_category,
COUNT(DISTINCT s.order_id) AS sfcc_orders,
COUNT(DISTINCT g.transaction_id) AS ga4_matched,
ROUND(
COUNT(DISTINCT g.transaction_id)
/ NULLIF(COUNT(DISTINCT s.order_id), 0) * 100, 1
) AS match_rate_pct,
SUM(s.order_value) AS sfcc_revenue,
SUM(g.reported_revenue) AS ga4_revenue,
SUM(s.order_value) - SUM(COALESCE(g.reported_revenue, 0)) AS revenue_gap
FROM `project.sfcc.orders` s
LEFT JOIN `project.analytics.ga4_purchases` g
ON LOWER(TRIM(s.order_id)) = LOWER(TRIM(g.transaction_id))
WHERE s.status NOT IN ('cancelled', 'fraud')
AND DATE(s.created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY revenue_gap DESC;
-- Categories with a systematically low match rate may:
-- (a) have a broken tag on their specific product pages
-- (b) be sold primarily via offline / B2B channels
-- (c) have a different checkout flow with missing trigger
What Comes Next in This Series
- Part 3: The transaction ID problem — every failure mode and the complete SQL diagnostic sequence
- Part 4: LookML metric governance — defining revenue and discrepancy metrics once and enforcing them everywhere
- Part 5: Dark traffic and orphan orders — classifying the unmatched orders and building a complete Looker reconciliation dashboard
Sources and Further Reading
- GA4 ecommerce event documentation: developers.google.com/analytics/devguides/collection/ga4/ecommerce
- GA4 Measurement Protocol (for server-side refund events): developers.google.com/analytics/devguides/collection/protocol/ga4
- Apple ITP documentation: webkit.org/tracking-prevention
- Google Consent Mode v2 guide: developers.google.com/tag-platform/security/guides/consent
- GA4 BigQuery Export schema reference: support.google.com/analytics/answer/7029846
- Salesforce Commerce Cloud Order Management documentation: documentation.b2c.commercecloud.salesforce.com
- ITP impact on analytics — Simo Ahava deep dive: simoahava.com — ITP and web analytics
- Safari WebView tracking limitations: developers.google.com/tag-platform/tag-manager/web/webview