International SEO is a complex affair that demands reckoning with questions far beyond the scope of traditional SEO. First, there are the concrete matters of where your product or service is sold, and which languages your company offers customer support in. Beyond that lie more nuanced considerations like the level of locality at which you want to be competitive in search within each country, the extent to which your product (or at least your branding) differs from country to country, and the breadth of geographical distribution of your site’s servers.
These considerations, along with the others that spiral out from them, will inform your optimal web infrastructure, your messaging, and nearly everything else, and accordingly they’ll have to be hammered out carefully over the course of conversations with just about every department you’ve got. [If you’re starting down this road in earnest right now, do yourself a favor and look to this post from Aleyda Solis, by consensus the world’s foremost expert on the subject, on State of Digital, then move on to the checklist post she wrote for Moz as often as you need to.]
Once you’ve gotten those questions answered, and you’ve finally got a site (or network of sites) that covers your entire global customer base, there will be one last step, crucial but comparatively simple: setting up your hreflang tags. The hreflang tag, which Google has supported as a multilingual search signal since 2011, is a piece of HTML markup that webmasters can use to tell search engines what language and country a given page is intended for, and where (if anywhere) its translations or equivalents for other countries can be found. Failure to use these tags properly could result in Google serving up the wrong version of a page to a particular location, or placing translations of the same page in competition with one another for rankings within the same area. So while hreflang tags aren’t a panacea for international SEO, the efforts you make in addressing all the big-picture concerns from the above paragraphs will be fruitless without them. So, in the interest of keeping to my 10-minute promise, let’s assume that you’ve got all that stuff figured out already, and skip to this, your last step.
A Quick Nomenclature Clarification
Despite its name, the hreflang tag is not actually a new and specialized element of the kind introduced as part of HTML5. It’s merely a <link> tag into which a specialized attribute, called hreflang, is placed alongside the traditional href attribute that indicates the URL that you’re linking to. The <link> tag is already commonly used in other SEO scenarios, perhaps most often as a container for the rel="canonical" attribute. In much the same way that we colloquially refer to these kinds of <link> tags as “canonical tags”, so do we say “hreflang tags”. So I’m glad we’ve gotten that out of the way.
How to Construct Hreflang Tags
Now here’s how you construct them.
First, gather together all the pages across your sites that are translations of one another, or that are equivalent pages intended for different localities. For each group of related pages, make yourself a little list in which you note the URL of each one, along with the language it’s in, and (if applicable) the country that it’s intended for. [If every page on your site is offered in multiple languages and/or subject to geographical variance, you might want to make a spreadsheet that identifies each page by some master name and then indicates the URL and language of every version. This way, each row of the spreadsheet will indicate an array of hreflang tags that you have to make.]
Note that you have to distinguish the versions by language or else the tag won’t work (and why else would you be doing this?), but distinguishing by country is optional and subject to your site’s particular needs.
Once you have that list, visit your good friend Wikipedia and look up the internationally observed two-letter ISO codes for each page: the ISO 639-1 code for its language and, if you’re targeting by country, the ISO 3166-1 alpha-2 code for the country that it’s intended for. Do not skip this step! Do not assume that you will guess right. I once had to clean up a predecessor’s messy implementation in which he had chosen “uk” as the two-letter code for “United Kingdom”. If he’d bothered to look up his codes, he would have learned that “uk” is not the United Kingdom’s country code at all: it is the language code for Ukrainian (Ukraine, the country, is “ua”). The code for the country he wanted was actually “gb” (Great Britain). Look up your codes. It’s worth the fifteen seconds that it takes.
Then, armed with these two pieces of information, you can easily transform your list of equivalent pages into an array of hreflang tags by reference to the below model.
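The lookup step can be made mechanical. Here is a minimal Python sketch, assuming a hand-maintained table of codes: the two dictionaries are small illustrative subsets of ISO 639-1 (languages) and ISO 3166-1 alpha-2 (countries), and `hreflang_value` is a hypothetical helper name, not part of any library.

```python
# Illustrative subsets only. The full lists are ISO 639-1 (languages)
# and ISO 3166-1 alpha-2 (countries); look them up rather than guessing.
LANGUAGE_CODES = {
    "en": "English", "es": "Spanish", "fr": "French", "nl": "Dutch",
    "sv": "Swedish", "ja": "Japanese",
    "uk": "Ukrainian",  # a language code, not the United Kingdom
}
COUNTRY_CODES = {
    "us": "United States", "ca": "Canada", "mx": "Mexico",
    "gb": "United Kingdom", "fr": "France", "es": "Spain",
    "be": "Belgium", "se": "Sweden", "jp": "Japan", "au": "Australia",
    "ua": "Ukraine",
}


def hreflang_value(language, country=None):
    """Build an hreflang value: language alone, or language-country."""
    language = language.lower()
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {language!r}")
    if country is None:
        return language
    country = country.lower()
    if country not in COUNTRY_CODES:
        raise ValueError(f"unknown country code: {country!r}")
    return f"{language}-{country}"


print(hreflang_value("en", "gb"))  # en-gb
print(hreflang_value("ja"))        # ja
```

Calling `hreflang_value("en", "uk")` raises a `ValueError` here, which is exactly the mistake described above being caught before it ships.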
Anatomy of an Hreflang Tag
An hreflang <link> tag must contain three attributes in order to be recognized as such:
- href. This is where the URL of the version in question goes. It wouldn’t be a <link> tag without this attribute.
- hreflang. The namesake attribute! This is where your two-letter ISO codes go. If you’ve chosen to distinguish the pages purely by language, then all you need is your two-letter language code. If you’re adding the country designator as well, you’ll attach that to your language code with a hyphen.
- rel. This is the attribute that indicates the relationship between the page containing the tag and the page referenced in the href attribute. The good news about this one is that in hreflang world, it will always be the same: rel=”alternate”. That simple line is the critical datum telling Google that the URL in the href attribute is an alternate version of the page it’s currently on.
Let’s imagine a site called example.com, whose homepage is made available in three languages: English, Spanish, and Japanese. Let’s also assume that they don’t sell a product or otherwise have reason to concern themselves with the geographical location of their visitors, and thus can limit the scope of their hreflang tags to language. Assuming also that they’re segregating their languages by subdirectory, the hreflang tag array for these three versions of the homepage is going to look like this:
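As a minimal sketch, assuming the three versions live at /en/, /es/, and /ja/ on https://example.com (the subdirectory names, the https scheme, and the `link_tag` helper are illustrative), the array can be generated in Python, with the resulting markup shown in the comments:

```python
def link_tag(hreflang, url):
    """Render one hreflang <link> tag with its three required attributes."""
    return f'<link rel="alternate" hreflang="{hreflang}" href="{url}" />'


# Language code -> URL of that language's version of the homepage (assumed paths).
versions = {
    "en": "https://example.com/en/",
    "es": "https://example.com/es/",
    "ja": "https://example.com/ja/",
}

tags = [link_tag(code, url) for code, url in versions.items()]
print("\n".join(tags))
# Prints:
# <link rel="alternate" hreflang="en" href="https://example.com/en/" />
# <link rel="alternate" hreflang="es" href="https://example.com/es/" />
# <link rel="alternate" hreflang="ja" href="https://example.com/ja/" />
```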
And that’s it. The message to search engines is clear: these three pages are alternates of one another, and they differ by language, and here’s which language corresponds to which version.
Now, let’s imagine a bigger site — a retailer, say — that sells a given product in ten different countries, and needs Google to serve the geographically correct version of the product detail page to its search audience so that they will see the page not only in the correct language, but with the product’s price displayed in the correct currency. This is a perfect example of a case in which one absolutely would need to make use of both the language code and the country code in one’s hreflang attributes. Let’s imagine the ten countries are: the US, Canada, Mexico, the UK, France, Spain, Belgium, Sweden, Japan, and Australia. Let’s also assume that the subdirectories they use to separate their audiences have the clearest possible names. Here’s what the hreflang array would look like:
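A sketch of that array in Python follows. The languages offered in each country are my assumption for illustration (English and Spanish in the US, English and French in Canada, French and Dutch in Belgium, one language elsewhere), as are the "widget" page name and the subdirectory naming scheme.

```python
# Country code -> language codes offered there (assumed for illustration).
MARKETS = {
    "us": ["en", "es"],
    "ca": ["en", "fr"],
    "mx": ["es"],
    "gb": ["en"],
    "fr": ["fr"],
    "es": ["es"],
    "be": ["fr", "nl"],
    "se": ["sv"],
    "jp": ["ja"],
    "au": ["en"],
}


def product_tags(markets, base="https://example.com", page="widget"):
    """Emit one tag per language-country combination."""
    tags = []
    for country, languages in markets.items():
        for language in languages:
            value = f"{language}-{country}"   # language first, then country
            url = f"{base}/{value}/{page}"    # hypothetical subdirectory layout
            tags.append(f'<link rel="alternate" hreflang="{value}" href="{url}" />')
    return tags


tags = product_tags(MARKETS)
print("\n".join(tags))
# Prints 13 tags, one per combination:
# <link rel="alternate" hreflang="en-us" href="https://example.com/en-us/widget" />
# <link rel="alternate" hreflang="es-us" href="https://example.com/es-us/widget" />
# <link rel="alternate" hreflang="en-ca" href="https://example.com/en-ca/widget" />
# <link rel="alternate" hreflang="fr-ca" href="https://example.com/fr-ca/widget" />
# <link rel="alternate" hreflang="es-mx" href="https://example.com/es-mx/widget" />
# <link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/widget" />
# <link rel="alternate" hreflang="fr-fr" href="https://example.com/fr-fr/widget" />
# <link rel="alternate" hreflang="es-es" href="https://example.com/es-es/widget" />
# <link rel="alternate" hreflang="fr-be" href="https://example.com/fr-be/widget" />
# <link rel="alternate" hreflang="nl-be" href="https://example.com/nl-be/widget" />
# <link rel="alternate" hreflang="sv-se" href="https://example.com/sv-se/widget" />
# <link rel="alternate" hreflang="ja-jp" href="https://example.com/ja-jp/widget" />
# <link rel="alternate" hreflang="en-au" href="https://example.com/en-au/widget" />
```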
What did you notice there? We had to account for language and country as separate quantities, which means we ended up with several countries for which pages needed to be offered in more than one language, and several languages that applied to more than one country. So our hreflang array had to capture every applicable language-country combination.
What else did you notice? That a given language code is not necessarily going to be the same two letters as the country code for that language’s country of origin. Note that it’s “sv” for Swedish, the language, but “se” for Sweden, and “ja” vs. “jp” for Japanese/Japan. Just another reminder to look up your codes.
Where to Place Hreflang Tags
As it happens, there are three different places on your website where you can place your various hreflang tags, and as the effect seems to be equivalent regardless, you can choose the one that burdens your developers the least.
The three available locations for hreflang tags are:
- The <head> of each page in each array,
- Your XML sitemap,
- Your HTTP headers.
I’ll provide some brief examples of how each implementation looks, but first, a note of caution. Whichever location you choose, you’ll be obliged to satisfy one critically important condition: reciprocity! Your tags must be reciprocal, which means that every page in an array must bear tags that reference every page in the array, including itself. Recall the simpler of our two examples from above: an array of three tags, one each for the English, Spanish, and Japanese versions of the homepage.
If you were to choose the <head> of the three pages as your preferred location for placement, this array of three tags would have to appear in exactly this form in the <head> of all three of the pages that it references. This does indeed mean that one of the tags in every array will necessarily be self-referential. It’s supposed to be. Google wants to be able to crawl all three pages and have each one tell it the complete story, and in the exact same words. So it isn’t enough for one page’s array simply to reference the other pages in the array. Every one must reference every one.
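Reciprocity is easy to check by machine once you know which URLs each page’s tags reference. A minimal Python sketch, assuming you have already collected that mapping (by crawling, say); `missing_references` is a hypothetical helper and the URLs are the illustrative example.com ones:

```python
def missing_references(declared):
    """Find reciprocity gaps in one array of equivalent pages.

    `declared` maps each page URL to the collection of URLs that its
    hreflang tags reference. Returns {page: [urls it fails to reference]}
    for every page that does not reference the whole array, itself included.
    """
    cluster = set(declared)
    problems = {}
    for page, referenced in declared.items():
        missing = cluster - set(referenced)
        if missing:
            problems[page] = sorted(missing)
    return problems


EN = "https://example.com/en/"
ES = "https://example.com/es/"
JA = "https://example.com/ja/"

complete = {EN: [EN, ES, JA], ES: [EN, ES, JA], JA: [EN, ES, JA]}
print(missing_references(complete))  # {}

# The English page forgot its self-referential tag.
incomplete = {EN: [ES, JA], ES: [EN, ES, JA], JA: [EN, ES, JA]}
print(missing_references(incomplete))
# {'https://example.com/en/': ['https://example.com/en/']}
```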
How does this rule apply to an XML sitemap implementation? I’m glad you asked!
XML sitemaps identify URLs using a two-level tag structure: each page gets its own <url> tag, which in turn contains a <loc> tag holding that page’s address.
Happily, there is a convention that allows us to specify alternate pages for a given page within its <url> tag: alongside the <loc> tag, you add one <xhtml:link> tag per alternate, each carrying the same rel, hreflang, and href attributes described above.
Beware once again the reciprocity pitfall! Suppose your sitemap’s entry for the English version of the homepage lists all three versions as alternates, itself included. You might instinctively believe that this satisfies reciprocity because the entry contains a self-referential tag. But you’d only be half-right. To be completely reciprocal, all three pages’ <url> entries would need to bear this full three-page array of alternate tags. After all, in order to qualify as a complete XML sitemap, every URL on the site would need its own <url> entry with its own <loc> tag, right? By that same logic, a truly complete array takes three <url> entries, one per language version, each carrying all three alternate tags.
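Here is a minimal Python sketch that writes out the fully reciprocal sitemap for the three-language homepage. The URLs are the illustrative example.com ones, the helper names are hypothetical, and a real sitemap would of course list the rest of the site’s pages as well.

```python
from xml.sax.saxutils import escape, quoteattr

VERSIONS = {
    "en": "https://example.com/en/",
    "es": "https://example.com/es/",
    "ja": "https://example.com/ja/",
}


def url_entry(page_url, versions):
    """One <url> entry: the page's <loc> plus a link for every version."""
    links = "\n".join(
        f'    <xhtml:link rel="alternate" hreflang="{code}" href={quoteattr(url)} />'
        for code, url in versions.items()
    )
    return f"  <url>\n    <loc>{escape(page_url)}</loc>\n{links}\n  </url>"


def sitemap(versions):
    """A <url> entry for every version, each carrying the full array."""
    entries = "\n".join(url_entry(url, versions) for url in versions.values())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n'
        '        xmlns:xhtml="http://www.w3.org/1999/xhtml">\n'
        f"{entries}\n"
        "</urlset>"
    )


print(sitemap(VERSIONS))
# The English entry comes out as follows; the Spanish and Japanese
# entries differ only in their <loc>:
#   <url>
#     <loc>https://example.com/en/</loc>
#     <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/" />
#     <xhtml:link rel="alternate" hreflang="es" href="https://example.com/es/" />
#     <xhtml:link rel="alternate" hreflang="ja" href="https://example.com/ja/" />
#   </url>
```

Note the extra xmlns:xhtml declaration on the <urlset> tag; the xhtml:link tags need it to be valid XML.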
Now you’re set!
And what if you want to place the markup in your HTTP headers? This is unlikely to be your first choice on a site that’s mostly traditional HTML web documents, but as it’s the only choice available to you if you want to arm non-HTML resources (like PDFs) with hreflang markup, it’s worth learning. This approach puts the same three attributes in a Link header and simply requires different punctuation: each URL goes inside angle brackets, you separate your URL callouts with commas, and you separate each URL’s attributes with semicolons.
Here’s how we would configure this for a PDF on our example.com domain, with its three languages:
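A minimal Python sketch of that header, assuming a PDF called guide.pdf (a made-up file name) that lives in each language’s subdirectory; `link_header` is a hypothetical helper:

```python
# Language code -> URL of that language's PDF (file name is an assumption).
PDF_VERSIONS = {
    "en": "https://example.com/en/guide.pdf",
    "es": "https://example.com/es/guide.pdf",
    "ja": "https://example.com/ja/guide.pdf",
}


def link_header(versions):
    """Build the Link header: URLs in angle brackets, attributes separated
    by semicolons, and one comma between each URL callout."""
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in versions.items()
    ]
    return "Link: " + ", ".join(parts)


print(link_header(PDF_VERSIONS))
# Prints a single line (wrapped here for readability):
# Link: <https://example.com/en/guide.pdf>; rel="alternate"; hreflang="en",
#   <https://example.com/es/guide.pdf>; rel="alternate"; hreflang="es",
#   <https://example.com/ja/guide.pdf>; rel="alternate"; hreflang="ja"
```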
To satisfy reciprocity, all you’d need to do is be sure that this exact string of code appeared in the HTTP header of all three versions of the PDF. And boom.
X-Default: The Final Wrinkle
There’s one final piece to this puzzle. About two years after the hreflang tag was first introduced, a new special value for the hreflang attribute, called “x-default”, was added to cover versions of pages that were not bound to a specific language. This value is very commonly misused — heck, even I misused it when I first learned it — so let me save you from the troubles I ran into.
There is only one (very specific) kind of page version for which “x-default” is meant: a version from which a visitor can manually select the language-country version of her choosing. The example Google gives on its own support page is “your homepage showing a clickable map of the world”. That’s what it’s for, and that’s the only thing it’s for. It is not to be used on the English-language versions of your pages simply because English speakers make up the largest share of your audience and you would prefer that otherwise unspecified visitors view the English version. No, no, no. It doesn’t matter how large your readership is within each language or country. “X-default” is only for people who haven’t chosen their preferred language yet.
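In markup terms, “x-default” is just one more entry in the array. A minimal Python sketch, assuming the language-selector page sits at the site root (that URL is my assumption, as is the `link_tag` helper):

```python
def link_tag(hreflang, url):
    """Render one hreflang <link> tag."""
    return f'<link rel="alternate" hreflang="{hreflang}" href="{url}" />'


VERSIONS = {
    "x-default": "https://example.com/",  # hypothetical language-selector page
    "en": "https://example.com/en/",
    "es": "https://example.com/es/",
    "ja": "https://example.com/ja/",
}

tags = [link_tag(code, url) for code, url in VERSIONS.items()]
print("\n".join(tags))
# Prints:
# <link rel="alternate" hreflang="x-default" href="https://example.com/" />
# <link rel="alternate" hreflang="en" href="https://example.com/en/" />
# <link rel="alternate" hreflang="es" href="https://example.com/es/" />
# <link rel="alternate" hreflang="ja" href="https://example.com/ja/" />
```

Reciprocity still applies: all four tags go on all four pages, the selector page included.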
That just about covers it! With great thanks to the aforementioned Aleyda Solis, as well as Google’s Search Console support pages, I hope I’ve given you a working playbook for international SEO markup. Let me know how this works out for you. And although I only know how to say it in English: happy optimizing!