Implementing Semantic Markup with Google Tag Manager

Why Semantic Markup & Structured Data?

It’s almost 2018. Do we still need to have this conversation? Structured data is no longer a “nice to have”; it’s as good as a requirement at this point. I’d go so far as to say that having some basic organization and maybe breadcrumb markup on your website is little more than table stakes — having it doesn’t really give you much of an advantage but not having it means you’re not really taking the (SEO) game seriously.

Making sure your website has descriptive yet concise title tags is SEO 101. No one would argue otherwise. I’d love to see basic semantically-organized, structured data get added to that checklist. It’s that foundational and essential.

Why?

SEO is in large part about minimizing ambiguity. It’s about getting specific. Specificity beats generality every time. SEO is about making sure nothing gets lost in translation between the web experiences we create and the programmatic applications that are built to digest and index them.

Making well-structured, semantic data available to search engines (and other web applications) ensures that there’s no ambiguity as to what your content is or what it’s about. Who wouldn’t want that level of control?

Rich Snippets, Rich Cards, and Entity Extraction

All philosophical arguments aside, semantic markup and structured data benefit you (and the web, if you care about that) in concrete ways. Major search engines are not just continuing to use structured data as the basis for rich snippets, but they’re expanding the scope and applications of that use. We’ve now moved beyond traditional snippets into Rich Cards. The latter are highly portable blocks of organized data that can be presented perfectly in the mobile context. Rich Cards also demonstrate how well this data can be organized for current and/or future use in “spoken” voice search results.

Vegan mashed potatoes portable Google result

And that’s not even to mention entity extraction: a process by which search engines such as Google can harvest data points pulled from all over the web to build out a rich entity.

Semantic markup and structured data give one the ability to control or otherwise influence how all of this goes down. It’s kind of a big deal.

What’s New in the World of Semantic Search

As with everything in SEO, semantic markup and the ways in which we can structure data are always evolving. Let’s take a quick moment to recap some of the exciting developments that we’ve seen in 2017.

Adventure Zone Podcast on Google Home

  • Schema.org Version 3.3 — This version saw a lot of updates made to the Schema.org vocabulary, but one of my favorite changes was the addition of HowTo. The HowTo entity type is something I’ve been waiting a while to see. It’s similar to the tried and true Recipe, but HowTo is a more inclusive and general definition that can be applied to pretty much any documented process. In addition to familiar properties like “prepTime” and “yield”, we now have great things like “estimatedCost”, “tool” (anything used in the process), and “supply” (anything consumed in the process).
    • It’s worth noting that Version 3.2 from earlier this year expanded the Course type and moved related properties from Pending to Schema.org Core. We also saw some great improvements to the Menu type with this release.
  • Job Postings & Salary Ranges — Stuff’s getting real! This is the kind of thing I get really excited about — a cutting-edge search experience powered by structured data! More information on Job Postings in Google Search.
  • Google’s Updated Merchant Center Recommendations — This post hasn’t yet touched on JSON-LD but this is a nice teaser. Google is now recommending that you use JSON-LD as your method for structuring product data for Merchant Center. This is huge! Too long have marketers and webmasters been hamstrung by having to deal with inline markup. It’s moves like these that have led the UpBuild team to completely abandon recommending inline semantic markup for our clients.
  • Podcast Structured Data — The UpBuild team loves podcasts so this one was super exciting to see. Google’s going to deliver rich data for Podcast series and episodes and this is notably geared toward Google Home. That’s pretty big and hints at why structured data is only going to get more and more important. What’s interesting here is that this requires your data to be part of an RSS feed (RSS 2.0 to be precise); not inline or even JSON-LD. Learn more about Podcast structured data here. This is a great example of why I go out of my way to use the term “structured data” and minimize how much I refer to “semantic markup” — there’s no markup here! This is 100% structured data completely decoupled from any presentation layer.
  • Paywalled Content Structured Data — I’ll refrain from ranting about my extreme distaste for paywalled content, but the new specification is super interesting to see. In Google’s guidelines on Subscription and Paywalled Content they provide a method for specifying not just that a webpage has a paywall, but for specifying which section(s) of the content is subject to the paywall.
  • Early Signs that Bing is Supporting JSON-LD — We are cautiously optimistic about Bing getting onboard with JSON-LD. One, because it’ll be great for the quality of their results and the web in general. Two, because now we really don’t need to deal with inline markup anymore.

Bonus: Stay up to date with the latest news in the world of structured data by bookmarking Aaron Bradley’s impressively comprehensive document tracking important developments in the space.

JSON-LD vs. Inline Markup

We’ve already touched on this a bit in this post, but let’s break it down a bit more to illustrate the differences between these two implementation methods.

JSON-LD Delivery

Basic JSON-LD Example

  • Self-contained <script>
  • Can contain any relevant data, even data not presented on page
  • Decoupled from presentation layer; content changes won’t break the data’s structure
  • Could theoretically be implemented without submitting any development requests
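
To make the “self-contained <script>” point concrete, here’s a minimal sketch of what a JSON-LD block boils down to: a plain data object serialized inside a script tag with the right type. The organization name and URL are placeholders, and wrapInScriptTag is a hypothetical helper name used only for illustration.

```javascript
// Placeholder data: the name and URL below are examples, not real values.
var organization = {
  "@context": "http://schema.org",
  "@type": "Organization",
  "name": "Example Co.",
  "url": "https://www.example.com/"
};

// Hypothetical helper: returns the text of a JSON-LD script tag for any data object.
function wrapInScriptTag(data) {
  return '<script type="application/ld+json">' +
    JSON.stringify(data, null, 2) +
    '<\/script>';
}

var jsonLdMarkup = wrapInScriptTag(organization);
```

Nothing in that block depends on the visible HTML of the page, which is why content changes don’t break it.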

Inline Delivery

Basic inline

  • Baked into the HTML source code of the page
  • Can only be applied to data presented on page (meta tags are one notable exception)
  • Intrinsically linked to the presentation layer; content changes will almost certainly break the data’s structure
  • Can’t be implemented without major development requests/code changes

Well, when you put it that way it seems like a pretty easy choice to make? JSON-LD all the way.

The unfortunate thing is that we’ve all been trained to use inline microdata; I think we can survive the switch though. The other consideration is that not all search engines are built to handle JSON-LD…yet.

However, if recent findings related to Bing’s JavaScript rendering are any indication this may soon become a non-issue. If you were motivated to cover all your bases, the safe bet throughout 2018 would probably be to implement both styles so that you had a fallback for the smaller engines that don’t yet support JSON-LD.

While inline markup is brittle, we do have a pair of powerful microdata attributes, itemref and itemid, that can help overcome a lot of the challenges. But at some point, we have to ask ourselves if bending over backward for smaller engines is worth the extra work compared to the minimal return we might get. That’s a decision that each marketer needs to make for themselves but our recommendation these days is to focus on JSON-LD only.

So let’s talk about how to implement JSON-LD.

How to Implement JSON-LD on Your Site

We’ll start by discussing how to implement static (i.e., unchanging) JSON-LD and then we’ll get into the fun part: implementing dynamic JSON-LD with Google Tag Manager. Throughout this example, we’ll use the event page for TechSEO Boost (view it here) in pursuit of an Event Rich Card in Google.

TechSEO Boost

Implementing Static JSON-LD

This is fairly easy to do, with or without Google Tag Manager. You have three main options.

  1. Create (or copy) your static JSON-LD script and manually place it in the <head> of each page that you want it to appear on.
  2. Use a plugin (e.g., Yoast SEO) that automatically generates and applies basic JSON-LD (supported types are limited) to key pages.
  3. Use Google Tag Manager to put the exact JSON-LD you need exactly where you need it.

To get started with writing your JSON-LD, head to Google’s Introduction to Structured Data or go straight to the source at Schema.org.

Once you have your JSON-LD ready to go in its static form, it’s time to get it on the site using GTM. You’ll need two GTM components: a tag and a trigger.

  • Tag — Type: Custom HTML — Name: {{Schema Type}} JSON-LD — Contents: Your JSON-LD object, wrapped in <script> tags
  • Trigger — Type: Page View — Name: {{Page Name/Type}} — Conditions: Page Path equals /{{your-page}}

That’s literally all there is to it. Test that out in Preview Mode to ensure it’s firing, Publish, and then validate using Google’s Structured Data Testing Tool.

Implementing Dynamic JSON-LD

Static implementation is all well and good for information that’s constant (i.e., static) but what about when your structured data needs to reflect information in flux? Take products on an eCommerce website or event detail pages for example. Prices can change at any time. New events will continually be added over time. That’s a lot of ongoing JSON-LD maintenance.

To avoid having to deal with that, we’d want to make our JSON-LD dynamic. We have four main options we can use to accomplish that.

  1. Develop CMS-driven JSON-LD so the same logic that drives the site (via the CMS) also drives the JSON-LD.
  2. Use Google Tag Manager to harvest on-page data points and roll them into a dynamically created JSON-LD script tag.
  3. Use Google Tag Manager as a proto-CMS that will store all our structured data points and use that to dynamically create a JSON-LD script tag.
  4. Have your CMS populate a dataLayer to provide supplemental (or complete) data points for GTM to use within a dynamic JSON-LD script tag.

We’ll spend the majority of this discussion on the #2 option, but we’ll also touch on the ideas behind #3 and #4 because they might be good options for some.

It all starts with taking the same basic JSON-LD template (from either Google’s Introduction to Structured Data or Schema.org) and turning it into a reusable shell. To conceptualize this, simply remove every value from the JSON-LD that relates to a specific entity instance. To create an Event entity shell, remove the event name, its date, the ticket price, speaker(s), etc. Anything that’s subject to change for different events. The resulting incomplete JSON-LD is the basis for our dynamic tag.

[code language=”javascript”]
{
"@context": "http://schema.org",
"@type": "Event",
"eventStatus": "http://schema.org/EventScheduled",
"name": " ",
"description": " ",
"image": " "
}
[/code]
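
One way to picture the shell idea is as a template object that gets merged with page-specific values. The sketch below is only an illustration: fillShell is a hypothetical helper and the sample values are placeholders. In the GTM setup described in this post, the values come from GTM Variables rather than from a hand-written object.

```javascript
// The reusable shell: only values shared by every event are filled in.
var eventShell = {
  "@context": "http://schema.org",
  "@type": "Event",
  "eventStatus": "http://schema.org/EventScheduled",
  "name": "",
  "description": "",
  "image": ""
};

// Hypothetical helper: copies the shell, then overwrites slots
// with the page-specific values. The shell itself is left untouched.
function fillShell(shell, values) {
  var filled = {};
  var key;
  for (key in shell) {
    if (Object.prototype.hasOwnProperty.call(shell, key)) {
      filled[key] = shell[key];
    }
  }
  for (key in values) {
    if (Object.prototype.hasOwnProperty.call(values, key)) {
      filled[key] = values[key];
    }
  }
  return filled;
}

// Placeholder values for illustration only.
var exampleEvent = fillShell(eventShell, {
  "name": "Example Event",
  "description": "A placeholder description.",
  "image": "https://www.example.com/hero.jpg"
});
```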

So how do we actually get values into this thing? We’ll harness the awesome power of GTM, of course! Each empty slot in the entity shell will need to be populated by a GTM Variable. This is by no means a complete list, but let’s use three data points as examples.

  • Event Image
  • Registration Page
  • Event Month

Getting the Event Image with a GTM Variable

GTM is a pretty nifty tool and sometimes this makes work of this sort blessedly simple. In this example, we want to be able to grab the image associated with an event while recognizing that it could be a different image from page to page. We can create a GTM Variable called “dataPoint — Event Image” and choose the DOM Element variable type. This uses GTM’s built-in ability to locate an element in a page’s Document Object Model and store information about that DOM element as the variable’s value. By choosing CSS Selector as the Selection Method, putting in a valid selector like “.section-hero img”, and specifying SRC as the Attribute Name, we get the event page’s main image every time.
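
For intuition, a DOM Element variable configured this way behaves roughly like the plain JavaScript below. This is a sketch of equivalent logic, not GTM’s internal implementation. getAttributeBySelector is a hypothetical helper, and the root is passed in as a parameter so the function can be tried outside a browser.

```javascript
// Hypothetical helper mirroring a DOM Element variable with
// Selection Method = CSS Selector and an Attribute Name set.
// `root` is normally `document`.
function getAttributeBySelector(root, selector, attributeName) {
  var element = root.querySelector(selector);
  // Return null when nothing matches, rather than throwing.
  return element ? element.getAttribute(attributeName) : null;
}

// In a browser, this reads the hero image URL from the current page.
if (typeof document !== "undefined") {
  var eventImage = getAttributeBySelector(document, ".section-hero img", "src");
}
```

One thing to keep in mind: getAttribute returns the attribute exactly as written in the HTML, so a relative path stays relative. If your images use relative paths, you may want to turn them into absolute URLs before they land in your JSON-LD.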

For help getting started with identifying the right CSS selector, read Laura’s great post, “Chrome Extensions We Love: jQuery Unique Selector”, paying special attention to Step #2.

Getting the Registration Page URL with a GTM Variable

In much the same way, we can use GTM’s DOM Element variable type to grab the destination URL of a button or call to action. This will be the data point we use to set the offer URL for the event in the JSON-LD script. To do this we’ll create a GTM Variable called “dataPoint — Register Page” and, again, select the DOM Element variable type. We then choose CSS Selector as the Selection Method, use a selector like “a[title='register-now']”, and specify the HREF attribute as what we want to get back.

Boom. The correct registration page URL every time (so long as the CTA has the title attribute “register-now”).

Getting the Event Month with a GTM Variable

Using the DOM Element variable type is just scratching the surface of what GTM can do. Imagine a scenario where the information you want isn’t a readily available attribute of a DOM element. Let’s take the event month as an example; we’ll need a full ISO date (YYYY-MM-DD) for our JSON-LD, but we’ll look at the month in isolation right now. The challenge is that we’re rarely going to have an event page with a date like 2017-11-30 printed on the surface for us to grab. We need some JavaScript magic.

To do that, we’ll create a new GTM variable but this time we’ll use the Custom JavaScript variable type. We’ll name the variable “dataPoint — Event Month”. Here’s where it gets fun.

With a GTM JavaScript variable, we can provide a custom JS function that executes complex JavaScript and returns a value that we can use. In the example below, we’re doing four things.

  1. We grab the text of the DOM element matching “.event-date” and split it into pieces by either “, ” (comma + space) or “ ” (space). The result for “Thursday, November 30, 2017” would be four pieces (i.e., an Array with four values).
  2. We take the second Array value (Index 1, since the first is Index 0) and trim it to just the first three letters (just to make it easier to work with later).
  3. We create an Object (used here as an associative array) that maps 3-letter month names to their numeric equivalents.
  4. We return (as the value of this GTM variable) the numeric value associated with the 3-letter month we look up in the “months” Object.

[code language=”javascript”]
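function () {
  // Assumes jQuery is loaded and the page shows a date like "Thursday, November 30, 2017".
  var dateArray = jQuery(".event-date").text().split(/,\s|\s/);
  // Keep only the first three letters of the month: "November" becomes "Nov".
  var monthStr = dateArray[1].substring(0, 3);
  // Map 3-letter month names to two-digit month numbers.
  var months = {
    'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
    'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
    'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'
  };
  return months[monthStr];
}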
[/code]

Rinse and repeat with clones of this variable to get the day and year (a bit simpler since we can just return dateArray[2] and dateArray[3], respectively). You could even get fancy and adapt the JavaScript above to write out the full ISO date: YYYY-MM-DD.
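
If you’d rather go straight to the “fancy” version, here’s a sketch of a single helper that produces the full ISO date. It assumes the page always prints dates in the “Weekday, Month D, YYYY” pattern used above; toIsoDate is a hypothetical name, and the function takes the date text as an argument so it can be tested without a page.

```javascript
// Hypothetical helper: converts text like "Thursday, November 30, 2017"
// into an ISO date string (YYYY-MM-DD). Returns undefined when the text
// does not match the expected "Weekday, Month D, YYYY" pattern.
function toIsoDate(dateText) {
  var months = {
    Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
    Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12"
  };
  // Same split as before: on ", " (comma + space) or " " (space).
  var parts = String(dateText).trim().split(/,\s|\s/);
  if (parts.length !== 4) {
    return undefined;
  }
  var month = months[parts[1].substring(0, 3)];
  // Pad single-digit days: "2" becomes "02".
  var day = parts[2].length === 1 ? "0" + parts[2] : parts[2];
  var year = parts[3];
  if (!month || !/^\d{2}$/.test(day) || !/^\d{4}$/.test(year)) {
    return undefined;
  }
  return year + "-" + month + "-" + day;
}

var isoDate = toIsoDate("Thursday, November 30, 2017");
```

Inside a Custom JavaScript variable, you’d define the helper within the variable’s anonymous function and finish with something like return toIsoDate(jQuery(".event-date").text());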

Putting It All Together

But it’s not enough to just drop our variable names into our JSON-LD entity shell and call it a day. If we did that, we’d find that GTM’s output doesn’t properly fill in the values. We’d see something like this in our JSON-LD tag:

[code language=”javascript”]
"image": google_tag_manager["GTM-M3M24C"].macro('gtm12')
[/code]

That’s not what we want! For the curious, this is explained in greater detail in Chris Goddard’s Moz post on this topic.

What we need to do in order to overcome this is to write a script that properly builds out and attaches a JSON-LD script tag. Here’s a basic example of what that looks like. I’ll break it down below.

[code language=”javascript”]
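(function(){

  // Build the data as a JavaScript Object; GTM swaps in the value of each {{Variable}}.
  var data = {
    "@context": "http://schema.org",
    "@type": "Event",
    "name": {{dataPoint — Event Name}},
    "description": {{dataPoint — Event Description}}
    // Add further properties here in the same way.
  };

  // Create the <script> element and mark it as JSON-LD.
  var script = document.createElement('script');
  script.type = "application/ld+json";

  // Serialize the Object into the contents of the script element.
  script.innerHTML = JSON.stringify(data);

  // Append the finished element to the end of the <head>.
  document.getElementsByTagName('head')[0].appendChild(script);

})(document);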
[/code]

  1. The first and last lines set off what is known as an IIFE (an Immediately-Invoked Function Expression).

    [code](function(){})(document)[/code]

    The main thing to understand is that this isolates everything we’re doing (so it doesn’t interfere with any other JS on the site) and immediately runs the code as soon as the <script> tag is ready.

  2. The second part (the entire block) is an Object, one which strongly resembles the JSON-LD shell we created. Used in a straightforward way like this, there’s not much difference between an Object and regular JSON-LD, but as you get into more advanced use-cases it’s very important to understand that you need to work with this like an actual JavaScript Object rather than a simple text string (it’s definitely not that).
    For example, if you ever need to add new properties to the model or modify existing ones, you’d use dot notation, e.g., assigning a value to data.location.address.streetAddress would let you add a streetAddress schema property that didn’t previously exist (provided data.location and data.location.address already exist as Objects). This is helpful if there are data points that need to be added in only if they exist on the page.
    If that last statement is Latin to you, congrats! You probably won’t have to worry about it much. 😉 The most important thing to understand at this stage is that we’re just building out our data structure and assigning variables (wrapped in double curly braces) as values.
  3. Next, our code creates a new <script> tag, i.e., a script element. Not much more to it at this stage.
  4. Next, we set the type attribute to “application/ld+json”. This will signal to Google and other engines that this is JSON-LD and not just any old JavaScript tag.
  5. Next (this is so awesome), we set the innerHTML, i.e., the contents of said script tag, to a fully-formed JSON-LD block using JSON.stringify.
  6. Finally, we select the <head> tag on our page and append our new JSON-LD script to the end of it.
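
To illustrate the point about working with a real JavaScript Object, here’s a sketch of adding an optional nested property only when a value is actually available. setNested is a hypothetical helper and the address is placeholder data. The detail worth noticing is that each intermediate Object (here, location and address) has to exist before a property can be assigned on it.

```javascript
var data = {
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Example Event"
};

// Hypothetical helper: walks the path, creating intermediate Objects
// as needed, and sets the final property. Empty values are skipped so
// the JSON-LD never ends up with blank properties.
function setNested(target, path, value) {
  if (value === undefined || value === null || value === "") {
    return target;
  }
  var current = target;
  for (var i = 0; i < path.length - 1; i++) {
    if (typeof current[path[i]] !== "object" || current[path[i]] === null) {
      current[path[i]] = {};
    }
    current = current[path[i]];
  }
  current[path[path.length - 1]] = value;
  return target;
}

// Placeholder values: the first is added, the second (empty) is skipped.
setNested(data, ["location", "address", "streetAddress"], "123 Example St.");
setNested(data, ["location", "address", "postalCode"], "");
```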

You know what we just did? We put JSON-LD on a site without having to ask a dev team for help and we got to be in control of the entire process. For a lot of marketers (both in-house and agency-side), that’s huge.

Yes, it really works.

Yes, it validates.

Yes, it’ll get you rich snippets, rich cards, and aid entity extraction.

Yes, it’ll show up in Google Search Console’s structured data reports.

What It Means for SEOs & Marketers

At the end of the day, a method like this opens up a world of possibilities. It provides us with infinite flexibility and unlimited power when it comes to providing structured data to search engines for SEO gain. Sure, there’s a considerable learning curve to be reckoned with, but once you get past that point you’re golden.

The Unlimited Power of GTM

Deciding How To Implement

For everything great about this method of structured data implementation, it’s not the end-all be-all. It might not be the best fit for you, especially if you have the ability to modify your website/CMS or you have easy access to development resources/time. Whether or not Google is your primary engine is another consideration — if a lot of your users are loyal to smaller engines, you might want to stay with old-fashioned inline markup.

The choice is yours, but I hope the flowchart below can help you make a good decision.

Structured Data Implementation Flow Chart

Analytics Meets Semantic Markup

One thing that we didn’t have enough time to talk about at TechSEO Boost is what this could mean for analytics. What if we wanted to track, within Google Analytics, which of our pages had which type of semantic entities on them? Or what if we just wanted to leverage this semantic data to improve our analytics implementation beyond the simple use-case of tracking semantic markup? This was the fundamental topic of my post on Semantic Analytics over on Moz. The catch is that I came up with that idea back when inline markup was the only game in town.

Using the GTM & JSON-LD tactics we’ve described so far, the same thing is going to be almost impossible to do. Why? What we’re doing with GTM is adding semantic markup after the fact because semantic markup was not something we had on the website in the first place. Google Tag Manager loads its container, grabs key data points and constructs our JSON-LD on the fly.

And that’s precisely the problem!

By the time that happens, we’ve already sent a pageview to Google Analytics. We want to track a page view right away to account for as close to 100% of our site users as possible; we don’t want to defer page view tracking until we create semantic JSON-LD or other markup, which is what would need to happen if we wanted to send schema.org information to GA along with that page view.

But if we have web development capabilities and/or resources, we can take a better approach. We can use a dataLayer to power both our semantic JSON-LD and our semantic analytics.

Consider this:

[code language=”javascript”]
dataLayer = [{
  mainEntity: "Article"
}];
[/code]

Our CMS (the application layer) would output properties of each page not into the user-facing HTML (the presentation layer), but into the intermediary dataLayer: essentially, a JavaScript object that users won’t see but that GTM is specifically designed to work with. In this way, we could specify that the main entity of the page is an “Article” without having to print that to the user experience.

From there, we can create a dataLayer variable right in GTM, give it the same name as the dataLayer property that we want to access in the code (e.g., “mainEntity”), and then we can work with that value in whatever way we want.

  • We can feed “mainEntity” to GA as a Custom Dimension,
  • we can use “mainEntity” as a parameter for Content Groupings, and
  • we can use the value of “mainEntity” to trigger the appropriate JSON-LD script.
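
As a rough mental model of that last bullet, the sketch below looks up “mainEntity” and picks which JSON-LD tag should fire. In practice GTM does this for you with a Data Layer Variable and trigger conditions, and GTM’s real data model is more involved than this. getDataLayerValue, pickJsonLdTag, and the tag names are all hypothetical.

```javascript
// Simplified stand-in for a Data Layer Variable: returns the most
// recently pushed value for a top-level key, or undefined if absent.
function getDataLayerValue(dataLayer, key) {
  for (var i = dataLayer.length - 1; i >= 0; i--) {
    var entry = dataLayer[i];
    if (entry && Object.prototype.hasOwnProperty.call(entry, key)) {
      return entry[key];
    }
  }
  return undefined;
}

// Hypothetical mapping from entity type to the name of the tag to fire.
var tagsByEntity = {
  "Article": "Article JSON-LD",
  "Event": "Event JSON-LD"
};

// Returns the tag name for the page's main entity, or null when
// there is no matching tag (so nothing fires).
function pickJsonLdTag(dataLayer) {
  var entity = getDataLayerValue(dataLayer, "mainEntity");
  return Object.prototype.hasOwnProperty.call(tagsByEntity, entity)
    ? tagsByEntity[entity]
    : null;
}

var exampleDataLayer = [{ mainEntity: "Article" }];
var tagToFire = pickJsonLdTag(exampleDataLayer);
```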

In this way, a relatively small development request makes GTM a much more powerful analytics tool and simultaneously allows GTM to become your de facto JSON-LD CMS.

You can, of course, extend this idea even further to output additional content properties to the dataLayer for easy access. That way, you won’t have to worry about as much (if any) DOM scraping with custom GTM variables.

[code language=”javascript”]
dataLayer = [{
  mainEntity: "Article",
  title: "How to JSON-LD with GTM",
  author: "Ruth Burr & Mike Arnesen",
  publishDate: "2017-11-30T13:20",
  featuredImage: "https://www.upbuild.io/images/gtm-json.png"
}];
[/code]

At that point, you just spin up more dataLayer-type GTM Variables to pull in those values from the dataLayer. Drop them into your dynamic JSON-LD tag and you’re good to go. Plus, you don’t have to get deep into JavaScript to scrape data off of your page.
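
Here’s a sketch of how those dataLayer values might map onto Article properties. buildArticleJsonLd is a hypothetical helper and the property mapping is one reasonable choice, not the only one. In a real GTM tag, each page.* reference would be a {{Variable}} instead.

```javascript
// Page data shaped like the dataLayer object shown above.
var pageData = {
  mainEntity: "Article",
  title: "How to JSON-LD with GTM",
  author: "Ruth Burr & Mike Arnesen",
  publishDate: "2017-11-30T13:20",
  featuredImage: "https://www.upbuild.io/images/gtm-json.png"
};

// Hypothetical helper: maps dataLayer-style values onto schema.org
// Article properties.
function buildArticleJsonLd(page) {
  return {
    "@context": "http://schema.org",
    "@type": page.mainEntity,
    "headline": page.title,
    // Kept as plain text for brevity; schema.org expects a Person
    // or Organization here.
    "author": page.author,
    "datePublished": page.publishDate,
    "image": page.featuredImage
  };
}

var articleJsonLd = JSON.stringify(buildArticleJsonLd(pageData));
```

From there, the resulting string can be dropped into a script element exactly as in the dynamic tag we built earlier.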

That’s a Wrap!

Got questions? Let us know in the comments below.

Oh, and Happy Optimizing!
