Google Analytics 4: Crawl Before You Walk

Welcome to the latest installment of my ongoing unofficial series that I’ve just spontaneously decided to name “Google Changes Everything”. Today, we’ll be discussing Google Analytics 4, which, as of its announcement on October 14, 2020, constitutes the biggest change that Google has ever made to the most important piece of marketing software that they (or anyone else) have ever created. I hope to give you a sense of the big changes that this new iteration has brought to this extremely popular web analytics platform, and to cover everything that I feel you need to know as you begin to use it.

First, a brief word about what this post will not be. It will not be a comprehensive translation guide between Universal Analytics and Google Analytics 4, informing you as to the many specific changes to nomenclature and report structure, and teaching you how to recreate your existing UA reports in the new language of GA4. There are two reasons why this post won’t be that.

The first reason is that I am not in a position to write that post yet. I know enough about the big changes to be able to speak intelligently about those, but the small changes are so numerous and dramatic that I think it’s going to take me (and most of the digital marketing world) the better part of 2021 to internalize them — not to mention that there is almost certain to be further refinement to those changes over the course of this year as bugs are found and the feature set expands. In full transparency, we only have one client at UpBuild who has launched a GA4 property as yet, so it will take a lot more messing around with it before my hands-on knowledge of the platform catches up to my theoretical knowledge.

The second reason is that I’m not so sure that post should be written at all. I think we stand to get a lot more out of GA4 if we ease into it in the way I will describe below, and allow its restructuring to give us a fundamentally new perspective on our data, rather than simply jumping in and rushing to create some kind of “UA emulator” inside it. Google wouldn’t shake up this tool so significantly unless part of the point was to shake up the way we think about our data. Meeting the platform on its new terms, clear-eyed, will mean allowing the old models (of hits, of sessions, of every piece of cruft built on top of the old Urchin model) to perish. I genuinely think the new foundation of events, users, and machine-learning-driven modeling that GA4 is built on will give us better insights, and ultimately help us better serve our customers or clients, but only once we have come to understand it.

Accordingly, this post exists to give you a sense of what GA4 is fundamentally about, and to help you take your very first steps into it. I’m going to let you behind the curtain a bit and lay out UpBuild’s officially sanctioned process for adopting GA4, which, as you’ll see, is a deliberately cautious and unhurried one.

The Big Changes

There are four monumental changes underlying the hundreds of little ones, and these are already fully defined and graspable. Let’s dig in.

Everything Is an Event

The most fundamental change that comes with GA4 is a total restructuring and renaming of the bedrock table of measurements that every report rests on (nbd). Where UA and its predecessors were constructed on the premise of “hits” — instances of site activity, of which page views, ecommerce transactions, and events were among the different types — GA4 has decided to classify all core instances of site activity as “events”.

There are two critical points to make about this change up front. First, the word “events” no longer has the meaning that it had in UA, so don’t go thinking that all measurements in GA4 speak in terms of Event Category, Event Action, and Event Label. That language has actually disappeared wholesale from GA4, as my next point will explain in more detail. “Events” in GA4 means more or less what “hits” used to mean in UA; it’s the new name for the fundamental data point whose collection powers the whole platform.

The second point is that this isn’t as trivial a change as it appears to be; they haven’t simply swapped the word “events” for “hits” in their definitions. They’ve actually transformed the underlying logic of the platform’s data collection.

In UA and before, the different hit types brought with them different standard dimensions and metrics. Page views came with dimensions like “URL”; user timing hits delivered dimensions like “time on page”; transactions came with dimensions like “product” and metrics like “price”, etc. In GA4, we no longer have distinct types of hits, each with its own native dimensions and metrics. We instead have events, which are all fundamentally alike, and “parameters”, which just means aspects of an event, and which is a large enough umbrella to cover everything we used to call a dimension or a metric.
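To make that contrast a little more concrete, here is a minimal Python sketch of the two data models. The class names and fields are my own illustrative assumptions, not GA's actual internal schema; the only real GA4 names used are the event names page_view and purchase and the parameters page_location, currency, and value.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


# UA-style model (illustrative only): every hit type has its own shape
# and carries its own built-in fields.
@dataclass
class PageviewHit:
    url: str


@dataclass
class TransactionHit:
    product: str
    price: float


# GA4-style model (illustrative only): one shape for everything.
# Whatever used to be a dimension or a metric becomes a parameter.
@dataclass
class Event:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


page_view = Event("page_view", {"page_location": "https://example.com/pricing"})
purchase = Event("purchase", {"currency": "USD", "value": 20.0})

# Both events share one structure; only the name and the parameters differ.
same_shape = type(page_view) is type(purchase)
```

The point of the sketch is simply that, in the second model, adding a new kind of measurement means choosing a new name and a new set of parameters rather than inventing a new type.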

The major events — the ones that are expected to be relevant to all (or at least the majority of) websites and apps — will be collected automatically by the base GA4 tracking code, as will the parameters that would be expected by default. These include familiar sights like page_view, and some exciting new ones like click, video_start, file_download, and scroll, all of which used to require custom tagging in Google Tag Manager plus custom event tracking in UA but which have now been baked into the universal install (there are also extra parameters available for many of these via an Enhanced Measurement option that requires no added code to activate).

Then, there are two more categories of events that can be optionally configured, via added code:

  • Recommended events, some of which are intended to apply to all sites with business objectives, while others are specific to certain verticals (mostly B2C ones), and
  • Custom events to be enabled either via hardcoding or Google Tag Manager, which follow the same basic rules as they did in UA, except that they must honor the new language of parameters, and of which you are now granted up to 500 per GA4 property.
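As a rough illustration of what "honoring the language of parameters" looks like, the sketch below builds a custom event as a name plus a dictionary of parameters, in the JSON shape that Google's GA4 Measurement Protocol accepts as I understand it. In practice you would more likely send the event through gtag.js or Google Tag Manager. The event name newsletter_signup, its parameters, and the client ID are hypothetical, and the naming rule in the regular expression is my reading of Google's documentation, so please verify it before relying on it.

```python
import json
import re
from typing import Any, Dict

# Naming rule as I understand it from Google's documentation (an assumption):
# start with a letter, then letters, digits, or underscores, 40 characters max.
EVENT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,39}")


def build_custom_event(client_id: str, name: str, params: Dict[str, Any]) -> str:
    """Return a JSON body holding one event: a name plus its parameters."""
    if not EVENT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid event name: {name!r}")
    body = {
        "client_id": client_id,
        "events": [{"name": name, "params": params}],
    }
    return json.dumps(body)


# Hypothetical custom event; nothing is sent anywhere in this sketch.
payload = build_custom_event(
    "555.1234",
    "newsletter_signup",
    {"form_location": "footer", "plan": "free"},
)
```

Notice that there is no Event Category, Action, or Label anywhere in the payload. Whatever context you want to attach to the event travels as a named parameter.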

You can see that the way these redefined, universalized events get represented in GA4 will still vary from one event type to the next and one report to the next, but the differences stop there. A page view, for example, will no longer be a fundamentally different data point from an event, but instead will be an event named page_view. And “URL” will no longer be a dimension, defined as such at root and bound to the page view hit type, but instead will be a parameter (named page_location in GA4) attached to the page_view event purely by convention. This is how the logic has been transformed, or, more precisely, simplified.

It’s the User that Matters, Not the Session

Universal Analytics and its predecessors were built around the idea of the website visit — the “session” — as the core phenomenon to be analyzed. All “hits” served to populate dimensions and metrics, and all dimensions and metrics were conceived to describe or measure aspects of the session. The session took on this colossal importance in web analytics because at the beginning of the technology, it was the only trustworthy container for the behavior of a known single user. But it retained that importance even after GA introduced a raft of user-level tracking measures as part of UA — dramatically increasing their power to make connections between sessions and to speculate about the users behind those connected sessions — because most analysts who came up in the earlier days had a hard time trusting Google’s ability to track a user as well as they trusted them to track a session. What’s more, our clients, customers, and bosses overwhelmingly felt the same way; they came up measuring success in terms of sessions and they didn’t want to have to re-benchmark and lose the calibration that they had achieved on this metric, so they encouraged us to keep using Sessions as the unit of measure in our organic traffic reporting rather than switching to Users. I say all this anecdotally of course, but it accurately reflects my experience.

But the fact is that between UA’s initial introduction of the User ID and Client ID tags, the arrival of the gtag.js tag in 2017, and Chrome’s ever-deepening world domination, GA’s user tracking has come a very long way indeed. More importantly, the way that we browse the web is now routinely multi-device, so you are likelier than ever to be missing critical information if you don’t connect each user’s sessions to one another, even if some details will still be missed some of the time. Nothing is ever perfect, and I actually encourage you to keep that in mind when you read the upcoming section about the new machine-learning modeling that GA4 will offer up. But the most important thing to know about GA4 in this light is that Google has forced the issue and driven us all to downplay the session in favor of the user, which is probably for the best.

It’s not that sessions don’t exist in GA4 (they do, though they’re now measured differently), but rather that the concept of the session-scoped dimension, metric, or segment is kaput. As Google’s documentation shows, while UA’s hit-scope dimensions and metrics have been redefined in terms of events and parameters, and UA’s user-scope dimensions and metrics are now reclassified as “user properties”, the session-scope measurements we used to rely on are now completely gone. Though it can easily get lost in the sea of information that this documentation contains, this is a major ideological choice on Google’s part. You can still see sessions as a measure of user activity, but you will not be permitted to frame events and their parameters in terms of sessions. The session has disappeared as a primary lens of analysis, having been absorbed by the user. The user is what (or rather whom) you’re meant to be looking at.
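Here is a toy Python sketch of what that shift in lens means for analysis. It is not how GA4 computes anything internally; the event log, the user IDs, and the session IDs are all made up for illustration. The events are grouped by user, and the session survives only as something you can count.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical event log: each row is (user_id, session_id, event_name).
events = [
    ("user_a", "s1", "page_view"),
    ("user_a", "s1", "scroll"),
    ("user_a", "s2", "page_view"),  # same user, later session (perhaps another device)
    ("user_b", "s3", "page_view"),
    ("user_a", "s2", "purchase"),
]

events_by_user: Dict[str, List[str]] = defaultdict(list)
sessions_by_user: Dict[str, Set[str]] = defaultdict(set)

for user_id, session_id, name in events:
    events_by_user[user_id].append(name)        # the user is the lens
    sessions_by_user[user_id].add(session_id)   # sessions remain only as a count

summary = {
    user: {"events": len(names), "sessions": len(sessions_by_user[user])}
    for user, names in events_by_user.items()
}
```

In this sketch, user_a's purchase belongs to the same story as the page view from an earlier session, which is exactly the connection a session-first view tends to hide.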

App + Web Tracking Are Natively Joined

Because the user is now seen as the core element to be studied, and because any company that runs a mobile app can expect users to engage with them via both the website and the app, Google has decided to permanently erase the boundary that once stood between web and app (Firebase) analytics. They’ve universalized the “App + Web” property type that they released to Beta in 2019, and all GA4 properties are now natively configurable to cover both website and app traffic. In fact, one of the problems solved by the platform’s decision to reorient all data collection around “events” is that Firebase analytics had already been running on that concept, while UA web analytics was still stuck on hits. Between that new alignment and the advent of cross-platform user tracking as described above, app and web tracking should be able to connect with a new kind of seamlessness in GA4.
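A small sketch of why a shared event model makes that join so much easier: once web and app activity arrive in the same shape, stitching them into one user journey is little more than a merge and a sort. The Event class, the user ID, and the timestamps below are hypothetical, and the sketch assumes the same user has already been identified on both platforms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    user_id: str
    platform: str   # "web" or "app"
    name: str
    timestamp: int  # seconds from an arbitrary starting point, for illustration


# Hypothetical events from the two data streams of a single property.
web_events = [
    Event("user_a", "web", "page_view", 10),
    Event("user_a", "web", "add_to_cart", 40),
]
app_events = [
    Event("user_a", "app", "screen_view", 25),
    Event("user_a", "app", "purchase", 90),
]

# Because both streams share one event shape, joining them is just a merge.
timeline: List[Event] = sorted(web_events + app_events, key=lambda e: e.timestamp)
journey = [f"{e.platform}:{e.name}" for e in timeline]
```

Under the old arrangement, the web half of this journey would have been hits and the app half would have been events, and that mismatch would have had to be reconciled before any merge could happen.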

New Modeling Driven by Machine Learning (ML)

To this point we’ve largely been discussing new factors of data collection in GA4. But the reporting in GA4 is set to be transformed as well, chiefly by a new capability to deliver ML-powered insights, which promises not only to detect anomalies in the data collected, but also to fill gaps in collection resulting from user privacy preferences or privacy laws, and to provide conversion optimization and customer retention tips based on user behavior patterns. This power is not a feature for Premium/360 accounts only, nor is it something to be earned once you cross a certain threshold of monthly users. It learns from all the data that it collects, on all the sites that run GA4, and the business intelligence it generates is a core feature of the platform. This is one reason why Google has been pounding the pavement for GA4 adoption, going so far as to make it the default selection for new properties created (you can still create a new UA property, but you have to look for the button that makes that possible): they need sites to run it so that the machines can learn. Accordingly, this feature might be worth taking with a grain of salt during the rollout phase, but as GA4 becomes the preferred install of GA, expect its insights to get sharper and sharper.
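If "detecting anomalies" sounds abstract, the toy Python sketch below shows the general idea using a plain z-score check. To be clear, this is a stand-in technique that I have chosen for illustration. It is not Google's model, which is far more sophisticated and is not something I have seen the inside of. The daily counts are made up.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(daily_counts: List[int], threshold: float = 2.0) -> List[int]:
    """Return the indexes of days whose count sits more than `threshold`
    sample standard deviations away from the mean of the series."""
    if len(daily_counts) < 2:
        return []  # a spread cannot be estimated from fewer than two points
    mu = mean(daily_counts)
    sigma = stdev(daily_counts)
    if sigma == 0:
        return []  # every day is identical, so nothing stands out
    return [
        i for i, count in enumerate(daily_counts)
        if abs(count - mu) / sigma > threshold
    ]


# Made-up daily user counts; the sixth day (index 5) is an obvious spike.
counts = [100, 102, 98, 101, 99, 300, 100, 97]
anomalies = flag_anomalies(counts)
```

A real system would account for seasonality, trends, and much more, but the basic promise is the same: the platform tells you which days deserve a second look, so you don't have to spot them yourself.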

I strongly encourage you to read Google’s official literature on the conceptual differences between UA and GA4, which includes tables to help you translate some of the key UA values to the GA4 vocabulary (or to explain why a translation isn’t possible wherever it isn’t). If you’re like me, though, you’re going to need to get your hands dirty with it before your intuition starts to grasp the differences, so: bookmark that page to use as a reference, and let’s move on to talking about how best to introduce your hands to that dirt.

How to Get Started

In light of just how much ground is shifting with this upgrade and how much of GA4 remains unknown and daunting, UpBuild’s official recommendation is to launch a GA4 property independent of your existing UA property, rather than to flip the upgrade switch, and to plan to run both instances of GA in parallel at least through 2021.

Not only is it likely to take that long for the platform to become stable and robust — along with it being a good idea, as mentioned earlier, to give the ML modeling a good year’s worth of common usage before we start investing a lot of trust in its projections — but I honestly think we’re going to need a year to learn how to read GA4, and the best way to learn to read it is to compare GA4 reports for a given site over a given date range to UA reports of the same site over the same date range. We are certain to find discrepancies in comparing the numbers themselves, as the literature points out in its descriptions of the contrasting forms of sessionization, and elsewhere. But in the more qualitative sense, UA is a language we already speak, and so it will be hugely valuable to continue running it not only as a fail-safe for GA4, but as a Rosetta Stone for it.
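One hedged way to put that side-by-side reading into practice is to pull the same metric from both properties for the same date range and look at the gap. The sketch below does only the arithmetic. The metric names and numbers are placeholders of mine, not real report data, and you would need to export the real figures from each property yourself.

```python
from typing import Dict


def percent_difference(ua_value: float, ga4_value: float) -> float:
    """Return how far the GA4 figure sits from the UA figure, as a percentage of UA."""
    if ua_value == 0:
        raise ValueError("UA value is zero; percent difference is undefined")
    return (ga4_value - ua_value) / ua_value * 100


# Placeholder numbers for illustration only, for one site over one date range.
ua_report: Dict[str, float] = {"users": 1000, "sessions": 1500}
ga4_report: Dict[str, float] = {"users": 980, "sessions": 1350}

comparison = {
    metric: round(percent_difference(ua_report[metric], ga4_report[metric]), 1)
    for metric in ua_report
}
```

A gap is not automatically a bug. With sessions in particular, some difference is expected because the two platforms sessionize differently, so the useful habit is to track how the gap behaves over time rather than to chase it down to zero.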

That’s all from me. What about you? How much of a hurry are you in to adopt GA4? Why or why not? What are you hearing from your clients, your marketing team, or your devs? I’d love to hear about it in the comments. Let’s share our thoughts and discoveries, in the interest of learning this radically revised platform faster and in greater depth. It’ll likely be just one of the many features of life in 2021 that we’ll do a better job at if we work together.
