Injecting CRM Data Into Google Analytics with Measurement Protocol

Disclaimer: This post explores an experimental approach to adding CRM data points (e.g., from Pardot, Marketo, HubSpot, Act-On) to Google Analytics. In a hypothetical scenario where one has absolute control of every technical system involved, this can work well; in practice (i.e., IRL) it’s a lot of work, is prone to breakage, and may provide little, if any, ROI relative to the time/resources invested. All have been warned.

You’ve tried the rest, now try the best* way to merge your Google Analytics and CRM datasets. 

Integrating Customer Relationship Management, or CRM, data with Google Analytics. It’s been referred to as the “Holy Grail” of web analytics. If you’re in the business of generating leads for your company or client, I don’t need to explain how valuable that could be. After all, if the only thing that your CMO cares about is the number of Sales Qualified Leads (SQLs) in the CRM, then who cares about a 20% lift in form fills from organic search if you can’t definitively say what type of lead they turned into? Nobody cares about your traffic lift because there’s no concrete connection between the 100 lead forms that got filled out and the 25 new SQLs in the CRM that month.

But what if we could see not only how much traffic came from organic search in a given month in Google Analytics, but how many SQLs were produced as a result of that channel’s performance and spend/investment? What if we could perform analysis on the user behavior of “raw prospects” vs. “SQLs” vs. “MQLs” side-by-side?

And what if that’s only scratching the surface? (Hint: it is only scratching the surface.)

In this post, we’re going to dive deep into how to make that dream a reality using the Measurement Protocol.

* Debatable 

What We’ve Seen So Far

The cool thing about Universal Analytics (the current version of Google Analytics, as opposed to “Classic”) is that it’s universal — it’s designed to accept data from any number of sources. What we’ve seen so far regarding GA and CRM integration centers on the platform’s ability to support the upload of CSV (Comma-separated values) documents via the Data Import function. In the innovative examples that we’ve seen to date (which I’ve referenced before), your CRM’s user ID (like someone’s Pardot user ID, for example) was passed into Google Analytics as a Custom Dimension.

That Pardot ID could later be used as a key to merge CRM data with GA data using the aforementioned Data Import function. The best thing about this approach is that (theoretically), when accomplished via Google Tag Manager, it doesn’t involve any development work: everything can be handled by creating a few JavaScript variables in GTM, updating the pageview tag, and exporting/importing data in GA/your CRM on a regular basis.

In our experience, we’ve found this solution to be lacking for a number of reasons:

  1. Your CRM might not produce user IDs that are available for capture with GTM.
  2. Manually exporting and importing those CSVs each week or month can be time-consuming.
  3. GA’s Data Import feature does not allow you to use a Client ID as a key (more on this later), so you need to have (or build) a way to get a unique user identifier into both platforms (GA and your CRM).

A Better Way

If you’re thinking, “there’s got to be a better way to get this CRM data into my analytics!”, you’re in luck: there is!


Is it more complex and does it require some development chops to implement? Sure.

But, is it also more scalable? Heck yes, it is!

Is it way more geeky and more fun? You better believe it.

This new method (which actually isn’t that new) hinges on using the Measurement Protocol. If you haven’t ever heard of that before, that’s okay. A few Builders here hadn’t heard of it either (my bad, everybody. I should have told you earlier!).

What is the Measurement Protocol?

The Measurement Protocol is a method by which someone can send raw data straight to Google Analytics’ servers. It’s similar to the original idea of uploading your own data to GA with a CSV file, except this is exponentially more powerful and customizable.

The idea behind it is relatively straightforward: You make an HTTP request to google-analytics.com and, along with that request, include some query-string parameters (called the “payload”). When Google Analytics receives that HTTP request, it processes the payload data and ::PRESTO:: you’ll have that data added to your Google Analytics account.

Of course, there’s a lot more to the Measurement Protocol than that and the applications and use-cases for it are nearly limitless. Imagine being able to use Google Analytics to track not just website and app usage, but Smart appliances usage, retail POS transactions, or real-world traffic captured via motion sensors. It’s all possible!

Intrigued? The rest of this post will walk through the process that we’re pioneering to enable our clients (and ourselves) to see CRM data right inside Google Analytics. Read on for more.

How It’s Done

I often find that the best way to approach a highly technical problem like this is to start by stating or restating the goal. So what are we trying to do here anyway?

The Goal: To take an identified user in Google Analytics, find out who that same user is in the CRM database, match up the two, and then add their associated CRM data into Google Analytics.

It should be noted that this is less about viewing the activities of one particular user in isolation within Google Analytics and more about analyzing website usage and behavior for segments of site visitors based on CRM attributes (such as Contact Status, Interest Level, etc.).

Let’s break the challenges down in reverse order:

  • Add Associated CRM Data into GA: This is what the Measurement Protocol was born to do! Even though we haven’t defined what data we’re going to be working with, we know that we have the ability to add that data into Google Analytics.
  • Match Up User Records From Each Dataset: To do this, we need a common piece of information that will be shared between the two datasets (the CRM and GA) for each user. This is referred to as the “key” and I’ll be using that term from here on out. So, for this step, we need a key and we’ll talk about that more in the next section (below).
  • Identifying a User in the CRM: In most CRM systems, a user is identified by their email address. Sometimes it’s a phone number. Sometimes a user can also have a unique ID that they’re assigned, but at UpBuild we’ve found that to not always be the case. This hypothetical key that we started thinking about in the bullet above could definitely serve this purpose.
  • Identifying a User in GA: Personally Identifiable Information (PII) cannot ever be sent to Google Analytics. It’s a violation of GA’s Terms of Service and breaking those terms can get your Analytics property wiped clean. But then, how does Google Analytics identify users in their platform? They assign them a completely unique ID called the Client ID (CID for short). The CID is a numeric ID that Google Analytics assigns to every website user to connect all their pageviews, interactions, and subsequent site visits into one user journey. This CID is then stored as a cookie on the user’s browser so that subsequent visits can be associated to the original CID.

Choosing a Key

At this point, I don’t think I need a [Spoiler Alert] to say that the CID from Google Analytics needs to become our key, but just to deepen our understanding let’s recap the importance of having a shared key. Our ability to merge elements of these two datasets (Google Analytics data and CRM data) hinges on the existence of a shared key — a unique piece of information that identifies a specific person in both platforms.

A few considerations for choosing a key:

  1. The key must be unique, now and in perpetuity. Otherwise, we risk overwriting or otherwise corrupting user data in GA.
  2. The key cannot be a piece of personally identifiable information (PII) such as an email address or phone number. I can’t mention enough that using any PII in Google Analytics in any way is a violation of Google’s Terms of Service and can result in Analytics property deletion.

So while we could theoretically come up with our own unique alphanumeric string that would serve as a key and inject that information into both platforms, it would be much easier to use an anonymous yet unique identifier that already exists within one of the two platforms. Good thing we have the Client ID! Using the CID means we don’t have to worry about getting it into Google Analytics; it’s already there.

All we need to worry about is getting the CID into the CRM. By capturing this when a user fills out a lead form and allowing the CID to pass into the CRM along with all their other lead record data, we’ll have the unique key we need moving forward.

Capturing the CID for Use in the CRM

All I’ve told you so far is that the Client ID exists somewhere and is accessible somehow. Let’s talk about how it comes into being and how we can get at it.

Here’s what the life of a CID looks like in simplified terms.

  1. Upon your first visit to a website, the Google Analytics tracking code loads up and checks to see if it’s seen you before.
  2. To do that, it looks for a browser cookie called “_ga”. This cookie, when it exists, contains your GA CID.
  3. If Google Analytics hasn’t seen you before (i.e., you have no “_ga” cookie), it generates a new CID to identify you and uses that to start tracking how you use the website.
  4. Simultaneously, your new CID is stored in a “_ga” cookie added to your browser.
  5. The process repeats for each successive visit.

So, how do we get at the CID? One way to do this would be to look at the cookie, but that has the potential to get messy. We can avoid having to deal with that broker and go straight to the source — the Google Analytics tracker object (discussing the tracker object is a huge can of worms, so just think of it as the thing that makes analytics data collection work).
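For the curious, here is roughly what the cookie route would involve. This is only a sketch: it assumes the usual Universal Analytics cookie format (something like GA1.2.381587225.1486051306, where the last two dot-separated parts make up the CID), and the function name is made up for illustration.

```javascript
// Sketch: extract the Client ID from a raw cookie string.
// Assumes the usual Universal Analytics "_ga" format: GA1.<n>.<random>.<timestamp>
function cidFromCookieString(cookieString) {
  // Match "_ga=" exactly so that "_gid" and "_gat" are not picked up.
  var match = /(?:^|;\s*)_ga=([^;]+)/.exec(cookieString);
  if (!match) return null; // no "_ga" cookie present
  var parts = match[1].split('.');
  if (parts.length < 4) return null; // unexpected format
  // The CID is the last two dot-separated fields.
  return parts.slice(-2).join('.');
}

// In a browser you would call: cidFromCookieString(document.cookie)
```

It works, but it depends on a cookie format you don't control, which is part of why asking the tracker object is the tidier option.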

On any page that is running the Google Analytics tracking code (via any implementation style), you can access and work with the tracker object with JavaScript. This allows us to get the CID and this will be the key (pun intended) to getting the CID into the CRM.

Modifying Forms & the CRM

The first step in enabling the capture of the CID is to prepare the CRM to receive the extra info for each new lead — we’ll need to add an Analytics CID field. This will be populated by a hidden field on each website form. A hidden field is just a form field that’s not visible to the user, but one that still submits data when a person clicks Submit.

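The HTML for the new field should look something like the tag below. The critical point is that the HTML must use the id of “cid”, since that is how the script finds the field. This id should only appear and be used in this context. The field also needs a name attribute, because browsers only submit form fields that have one.

<input type="text" id="cid" name="cid" style="display:none">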

Of course, the form’s functionality will also need to send this data point into the CRM with everything else. This input will become the Analytics CID value for each lead record.

If you were to go to our contact page right now and look at the form’s source code, you’d see that there’s an input field with name="cid". This is hidden (using style="display:none"), but still passes some data into our CRM.

Populating the Hidden Field

But, “How does any data get there?”, you might be asking. Let’s find out how to do that with JavaScript.

We want to do a few things here:

  1. Take a breath and wait until we can be reasonably sure that Google Analytics is fully loaded on the page
  2. Get the client ID from the tracker object
  3. Find the hidden CID field and set its value to the client ID

The script below does all of that as long as your hidden field has the ID attribute of “cid”. If you can’t make that happen, the code should be easy enough to modify. Also, note that you need to update the script to use the UA-ID that you want; the one shown is UpBuild’s sandbox analytics account.

See the Pen Get CID for a CRM by UpBuild (@upbuild) on CodePen.

How you deliver this script is up to you. It can either be put into the source by a developer or, as we prefer, you can add it via a Custom HTML tag in GTM. Note that it only needs to be present on pages that host a form.
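If it helps to see the three steps spelled out, here is a minimal sketch of how they could be written. It is not the CodePen script itself. It assumes analytics.js (Universal Analytics), a hidden field with the id “cid”, and a placeholder tracking ID of UA-XXXXX-Y; the function name and the retry numbers are made up for illustration.

```javascript
// Sketch: copy the GA Client ID into the hidden form field with id="cid".
// Assumes analytics.js (Universal Analytics) is what's running on the page.
function populateCidField(win, doc, trackingId, retriesLeft) {
  var field = doc.getElementById('cid');
  if (!field) return; // no lead form on this page

  if (typeof win.ga !== 'function') {
    // GA isn't on the page yet: wait half a second and try again.
    if (retriesLeft > 0) {
      win.setTimeout(function () {
        populateCidField(win, doc, trackingId, retriesLeft - 1);
      }, 500);
    }
    return;
  }

  // ga(callback) holds the callback until analytics.js has fully loaded.
  win.ga(function () {
    var trackers = win.ga.getAll();
    for (var i = 0; i < trackers.length; i++) {
      // Use the tracker that belongs to the property we care about.
      if (trackers[i].get('trackingId') === trackingId) {
        field.value = trackers[i].get('clientId');
        return;
      }
    }
  });
}

// Only run in a browser; 'UA-XXXXX-Y' is a placeholder tracking ID.
if (typeof window !== 'undefined') {
  populateCidField(window, document, 'UA-XXXXX-Y', 20);
}
```

Passing window and document in as arguments is just a convenience so the function can be exercised outside a browser; in a GTM Custom HTML tag you could reference them directly.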

Injecting CRM Data Into Google Analytics

This is where it gets tricky, or fun, depending on how you want to look at it. We’ll need to use the Measurement Protocol to manually or automatically send data to Google Analytics. That will merge our CRM data with the Google Analytics record for each user.

Preparing Google Analytics to Receive Data

Before we actually send that data, we need to give Google Analytics a heads up so that it’s expecting it. If GA doesn’t know what to do with our incoming data, it’s just going to drop it. It’s a powerful tool, but it can’t read your mind.

What we need to do is create a Custom Dimension (or many). In the language of Google Analytics, a “dimension” is any attribute of a user, a page, a product, etc. Configuring a Custom Dimension in your Google Analytics Admin panel allows you to add your own attributes to what you’re tracking. As you create these, you’ll want to make a note of their index number (a number between 1 and 20 used to reference these later).

Each Google Analytics property is allowed 20 Custom Dimensions, so you can bring in a lot of custom data yet you do want to use these slots wisely. Think carefully about how much data you’ll really need; it’s sometimes less than you think. For this example, we’re only using one — Lead Status.

Last thing here: Notice that the “Scope” is set to “User”. That’s because this attribute applies to the person, not just a single visit or a property of a page (like a publish date Custom Dimension).

Injecting CRM Data Into Google Analytics

Now that we’ve done all the ground work, we can get back to the Measurement Protocol…

The Measurement Protocol allows you to make an HTTP Request that will send a hit to the Google Analytics servers. It’s not quite the same as what happens when your basic analytics tracking script loads on any given page on your site, but it’s very similar and the outcome is nearly identical. A request is made to google-analytics.com, that request is received along with a query string payload, and BAM, data in your Google Analytics profile.

The crux of this is having a valid Measurement Protocol hit. Below is a basic example (or you can build and validate your own here).

https://www.google-analytics.com/collect?v=1
&t=pageview&tid=UA-62895838-4&cid=555
&dp=%2Fhome&dt=homepage

Let’s break this down a bit. All of these are required except the last one.

  • https://www.google-analytics.com/collect — The “end point” for your HTTP request
  • v — The Measurement Protocol version (as of today, there’s only Version 1)
  • t — The hit type. You could have a pageview, event, transaction, etc. (for the curious, there’s a whole lot more info here)
  • tid — The tracking ID of the Google Analytics property you’re sending data to.
  • cid — The Client ID for the hit. Every hit needs to be associated with a Client ID, no exceptions. You either need to create a new one or use the CID of an existing user that Google Analytics knows about.
  • dp & dt (optional)— The Document Path and Document Title, respectively. Document pretty much means “Page” and since we’re sending a pageview hit in this example, we’ll want to set these values.

If you’re curious, there’s a full reference sheet for all the parameters here: Measurement Protocol Parameter Reference.
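To make the anatomy of a hit concrete, here is a small sketch that assembles the same kind of URL from a plain object. The function name and the UA-XXXXX-Y tracking ID are placeholders of mine, not part of the Measurement Protocol; URLSearchParams handles the URL encoding.

```javascript
// Sketch: assemble a Measurement Protocol (v1) hit URL from a plain object.
var MP_ENDPOINT = 'https://www.google-analytics.com/collect';

function buildHitUrl(params) {
  // t, tid and cid must be supplied by the caller; v is added automatically.
  ['t', 'tid', 'cid'].forEach(function (name) {
    if (!params[name]) throw new Error('Missing required parameter: ' + name);
  });
  var query = new URLSearchParams({ v: '1' });
  Object.keys(params).forEach(function (name) {
    query.set(name, String(params[name]));
  });
  return MP_ENDPOINT + '?' + query.toString();
}

var url = buildHitUrl({
  t: 'pageview',
  tid: 'UA-XXXXX-Y', // placeholder tracking ID
  cid: '555',
  dp: '/home',
  dt: 'homepage'
});
// url is now:
// https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXXX-Y&cid=555&dp=%2Fhome&dt=homepage
```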

But we’re not just tracking simple pageviews here, we want to inject data from our CRM. What would that Measurement Protocol hit look like?

Behold!

https://www.google-analytics.com/collect?v=1&
t=event&tid=UA-62895838-4&cid=381587225.1486051306
&ec=measurement+protocol&ea=crm+data+injection&
el=lead+updated&cd6=sales+qualified&cd7=analytics+project+lead

Static values are shown in bold while dynamic values (ones that will change, namely the values of cid, el, cd6, and cd7) are shown in green.

The required fields from the first example above all appear again: the end point, Measurement Protocol version number, the hit type, tracking ID, and CID. You’ll notice, however, that I’ve changed the hit type to “event”. That’s because what we’re recording in GA is something other than a pageview, eCommerce transaction, or social action. The “event” is CRM data being added into the system.

To facilitate that, we need to provide some additional event-related data (the ec, ea, and el parameters above).

  • ec — Event Category, set to “measurement protocol”
  • ea — Event Action, set to “crm data injection”
  • el — Event Label, set to either “lead updated”, “lead added”, or “lead deleted”

Note: These values can be anything you want. What you see above is what makes sense for us to see in our Google Analytics.

The final parameters are for our custom dimensions. I’m using two here for purposes of illustration, but you may find yourself using just one or up to twenty!

  • cd{x} — {x} representing the index number of the Custom Dimension that you want to set. For our purposes, it’s number 6 and 7. For your implementation, it will likely be different.

Dynamically Setting MP Parameters

Again, those values in green will be dynamic (because they will be different with every hit). How do we know what to put there?

Well, this is all the information from our CRM.

  • cid — This is the shared key that we captured earlier. Any given person in our CRM should hereafter have an Analytics CID field set so that we can reference that same person in Google Analytics and merge their data.
  • cd{x} — To keep it easy, let’s pretend we only have one CRM data point that we care about, Lead Status. We’d want to populate cd1 (for example) with whatever this person’s status was, such as “sales qualified”, “marketing qualified”, “unqualified”, “closed”, etc.
  • el — Now this one won’t technically come from our CRM and, in fact, it’s not even necessary. I like to use it as a form of record keeping. That way, I can look at an event report in GA later down the line and see why CRM data was added. The value of this Event Label parameter would answer the question, “Why are we sending data to GA right now?” One of the answers below should be sufficient.
    • “lead updated”
    • “lead added”
    • “lead deleted”
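Putting those dynamic pieces together, a single CRM record could be turned into a payload along these lines. Treat it as a sketch: the record fields (analyticsCid, leadStatus), the option names, and the UA-XXXXX-Y tracking ID are all hypothetical stand-ins for whatever your CRM and GA property actually use.

```javascript
// Sketch: turn one CRM lead record into a Measurement Protocol event payload.
// The record shape and option names are hypothetical; adapt them to your CRM.
function buildCrmEventPayload(lead, options) {
  var query = new URLSearchParams({
    v: '1',
    t: 'event',
    tid: options.trackingId,
    cid: lead.analyticsCid, // the shared key captured by the lead form
    ec: 'measurement protocol',
    ea: 'crm data injection',
    el: options.label // "lead added", "lead updated" or "lead deleted"
  });
  // cd{x}: x is the index of the user-scoped Custom Dimension in GA.
  query.set('cd' + options.leadStatusIndex, lead.leadStatus);
  return query.toString();
}

var payload = buildCrmEventPayload(
  { analyticsCid: '381587225.1486051306', leadStatus: 'sales qualified' },
  { trackingId: 'UA-XXXXX-Y', label: 'lead updated', leadStatusIndex: 1 }
);
```

Because URLSearchParams does the encoding, spaces come out as “+” just like in the hand-written example.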

How to Make the HTTP Request to the Measurement Protocol

This is the part that I struggled to wrap my head around for a while. It’s worth doing your own research to educate yourself further about GET and POST HTTP requests and how that all works, but the idea is deceptively simple. Simply opening a new tab in Chrome and typing in a URL generates an HTTP request. You can literally copy and paste either of the two example Measurement Protocol hits in your own browser and send data into GA. That’s all it takes!

The issue is that you don’t want to be writing out new hit payloads and copying and pasting things into your URL bar manually. You want to automate it as much as possible or have some kind of tool to simplify your workflow.

There’s no “right way” to do this, and how you implement it will largely be up to your (or your developers’) tastes and preferences combined with the nature of your tech stack. One amazing thing you can do is lean on a tool like Postman. It allows you to specify the HTTP request you want to make, test it out to make sure it’s valid, and then copy the functional code in your language of choice.
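As one possible starting point, here is a sketch of sending a payload from JavaScript rather than the URL bar. It assumes a runtime with a global fetch (modern browsers, Node 18+); the function name and the options object are my own invention. The debug endpoint it can switch to is Google's hit validator, which returns JSON describing whether the hit is valid and doesn't record anything.

```javascript
// Sketch: POST a payload string to the Measurement Protocol.
// Assumes a global fetch is available (modern browsers, Node 18+).
var COLLECT_URL = 'https://www.google-analytics.com/collect';
var DEBUG_URL = 'https://www.google-analytics.com/debug/collect';

function sendHit(payload, options) {
  var opts = options || {};
  // A custom fetch can be passed in, which is handy for testing.
  var doFetch = opts.fetchImpl || fetch;
  // The debug endpoint validates the hit without recording it.
  var url = opts.debug ? DEBUG_URL : COLLECT_URL;
  return doFetch(url, { method: 'POST', body: payload });
}

// Example (not executed here):
// sendHit('v=1&t=event&tid=UA-XXXXX-Y&cid=555&ec=test&ea=test', { debug: true })
//   .then(function (res) { return res.json(); })
//   .then(function (report) { console.log(report); });
```

Worth knowing: the regular collect endpoint answers with a 2xx status even when the payload is malformed, so checking a hit against the debug endpoint first can save some head-scratching.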

You’ll need to decide what the right solution looks like for you, but at a conceptual level here’s what I’d call an ideal solution.

The Ideal Solution for Sending Measurement Protocol Hits

In order to maximize the efficiency of data sends, you’ll want to batch all updates to be sent to Google Analytics for processing at a set time each day. This can be done with a “cron job” or using any other method for scheduling a script to run, but the idea is to have a script that automatically runs at 3 AM Eastern time (or any other convenient time) every day.

The script will do the following:

For each CRM record,

  • If a given CRM record has changed (has been added or updated) in the last 24 hours:
    • get that record’s Analytics CID and the value of each CRM data point relevant to analytics;
    • also choose the relevant Event Label (“lead added”, “lead updated”, or “lead deleted”);
    • build (using concatenation) a Measurement Protocol request URL, i.e., payload;
    • using the payload, make an HTTP request;
    • go to the next CRM record.
  • If not,
    • do nothing;
    • go to the next CRM record.

The end result is that each night, the script will iterate over all CRM lead records, build a Measurement Protocol hit for each new or updated record, and send the data to Google Analytics.
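The selection-and-build part of that nightly job might look something like the sketch below. The lead fields (analyticsCid, leadStatus, createdAt, updatedAt as millisecond timestamps) are hypothetical, Custom Dimension index 1 is an assumption, and deleted leads are left out to keep it short.

```javascript
// Sketch of the nightly job: pick leads changed in the last 24 hours and
// build one payload per lead. Field names are hypothetical stand-ins for
// your CRM's schema; timestamps are milliseconds since the epoch.
var DAY_MS = 24 * 60 * 60 * 1000;

function buildNightlyPayloads(leads, trackingId, now) {
  return leads
    .filter(function (lead) {
      // Skip unchanged records and leads that never got a CID captured.
      return Boolean(lead.analyticsCid) && now - lead.updatedAt <= DAY_MS;
    })
    .map(function (lead) {
      var isNew = now - lead.createdAt <= DAY_MS;
      return new URLSearchParams({
        v: '1',
        t: 'event',
        tid: trackingId,
        cid: lead.analyticsCid,
        ec: 'measurement protocol',
        ea: 'crm data injection',
        el: isNew ? 'lead added' : 'lead updated',
        cd1: lead.leadStatus // Custom Dimension index 1 is an assumption
      }).toString();
    });
}
```

Each string it returns is ready to be sent as one HTTP request, and the function itself could be called from whatever scheduler you prefer.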

How We Do It At UpBuild

Writing this post is a little humorous because we don’t even use a proper CRM here at UpBuild; we track all our sales leads in a Trello board. Trello is what we use to manage all of our other projects anyhow and we have a pretty good system. The drawback as it relates to this is that we can’t open up Trello’s source code and write the script I outlined above.

We can still do the next best thing — build a tool. Our lead volume isn’t crazy (we’d have to update a few “CRM leads” each week, but not more than a handful) so this works well.

Since we’re capturing CIDs through our forms and sending that information into Trello, all someone needs to do is look at our active Trello lead cards, grab the CID and the lead status from the card’s description, and paste them into this simple form (screenshot above). Once you click “HIT IT”, the form POSTs the data to the page, we use PHP’s superglobal variables to build out a valid Measurement Protocol payload, and then run a simple cURL script in PHP to make that HTTP request to the Google Analytics servers. Easy peasy, right?

At the end of the day, this is one of the basic custom reports we can now access in our Google Analytics. Again, this is just the tip of the iceberg.

CRM Lead Status in GA

Disadvantages

The one downside of this method and much of web analytics, in general, is that cross-device browsing by a single user breaks the rules and exposes the flaws in the system. Since modern web analytics is cookie-based, any visits in a user’s journey that aren’t on the original browser with that original “_ga” cookie will be considered a new user visiting the site for the first time.

If you visited a site for the first time on Chrome on your Google Pixel, you’d get a new “_ga” cookie and be assigned a CID. If you visited the same site later that same day on your desktop using Firefox, GA would look for that “_ga” cookie, fail to find it, and assign you a new CID. You’d be one person construed as two unique users.

This problem applies both to this CRM<>GA solution and to web analytics as a whole. One day, someone will figure out a truly great answer to this million dollar question. Until then, we have what you could call the second best option for the time being: user ID override. You can read a good primer on it by Justin Cutroni, but the gist of it is that Google Analytics gives you the ability to override its CID with a unique identifier of your choosing. The catch, however, is that you can only assign that user ID once you’ve confirmed that a subsequent visit from a new device or browser is the same person who previously visited your site.

The most common scenario would be seen with sites that require a login because they can create and store a custom user ID and set that to override the CID each time you log in (with your email and password). Facebook, for example, would have no problem with this. Your site, on the other hand, likely doesn’t require that a user log in to have a satisfactory browsing session, so you’re still going to have fragmented users. Even for a website with a CRM or marketing automation platform, you can’t reasonably expect that a user is going to fill out a form, effectively logging in, every time they visit the site on a new device/browser and self-identify for you.

The point being: you can get what amounts to better data, but it’s an imperfect solution to an incredibly complex problem at best; at worst, it’s a waste of your marketing and dev resources.

What’s Next?

From here, it’s up to you to try this on your own and customize the solution to fit your needs.

Beyond you and me though, this whole exploration has me wondering “Why aren’t marketing automation and CRM SaaS companies baking this into their product?!”.

Just take Pardot, for example. Wouldn’t it be great if you could just go into your Pardot account settings and:

  1. Input your Google Analytics tracking ID;
  2. Map GA Custom Dimensions to Pardot Prospect properties;
    1. Slot 1 = Lead Score;
    2. Slot 2 = Company Name;
    3. Slot 3 = Lead Status;
  3. Have Pardot automatically assemble and send Measurement Protocol hits into your GA while you sleep?

There’s your next feature, Pardot. You’re welcome!

At UpBuild, we’re actively working on rolling out customized solutions based on this core idea for a handful of our clients. I’d love to hear if anyone reading this tries it and has some success of their own (or failure; that’s cool, too)!

Until next time, good luck and happy optimizing.

Written by
Mike founded UpBuild in 2015 and served as its CEO for seven years, before passing the torch to Ruth Burr Reedy. Mike remains with the company today as Head of Business Operations.
