How To Download “Linked From” URLs using Google Search Console API

Getting “Linked from” URLs is a useful way to find the pages that link to a given URL on your site. This is especially helpful when you want to find and fix links pointing to a page that reports a 404 Not Found error in Google Search Console, but there’s simply no great way to get this data out of the Search Console interface (in either the old or the new version). Instead, we can use the Search Console API (a programmatic way to access the functionality of the platform) to obtain the information quickly and easily.

404s are a perfectly normal part of the web (and search engines have come to expect them), but in an effort to provide a great search experience, Google won’t want to include these pages in Search results. So when a link directs users to a page that returns a 404 error, Google recognizes that the page is no longer available and will begin the process of removing it from its index. Furthermore, pages that link to 404 errors can cause a bad user experience, and this could impact how those pages are treated in Search too. So when we know a page is returning a 404 response, we want to locate all of the other pages pointing to it as soon as possible.

Retrieving “Linked From” URLs

Step 1: Choose Your Search Console API Service

There are several options to choose from on the Search Console API v3 Services page, but for this tutorial we are going to choose:

webmasters.urlcrawlerrorssamples.list.

This service specifically retrieves details about crawl errors (as with all Search Console data, we can only access this information for URLs of a domain that we have access to within Google Search Console).

Step 2: Type Your Site URL

Site URL – Type in the URL of the domain from your GSC account that you want to analyze, including the protocol (e.g. https://www.upbuild.io/, not upbuild.io).

Step 3: Choose Your Crawl Error Category

Category – This field provides a drop-down list of multiple error types. For this particular report, we are going to want to choose notFound from the list. Here is the full list of available crawl errors to choose from:

  • authPermissions
  • flashContent
  • manyToOneRedirect
  • notFollowed
  • notFound
  • other
  • roboted
  • serverError
  • soft404

Step 4: Choose Your Platform Type

Platform – The specific device type that you want to extract data for. In this instance, we’ll be choosing web. Here are the available types to choose from:

  • mobile
  • smartphoneOnly
  • web
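
If you’d rather script the call than click through the API Explorer, the inputs from Steps 2 through 4 map onto a single GET request. Here’s a minimal Python sketch that only builds the URL and does not send anything. The endpoint path and the build_samples_url helper are our own assumptions for illustration, so check them against the API reference before relying on them.

```python
from urllib.parse import quote, urlencode

# Values from the category and platform lists in Steps 3 and 4
# (the soft 404 category is assumed to be spelled "soft404").
CATEGORIES = {
    "authPermissions", "flashContent", "manyToOneRedirect", "notFollowed",
    "notFound", "other", "roboted", "serverError", "soft404",
}
PLATFORMS = {"mobile", "smartphoneOnly", "web"}

# Assumed base path for the v3 API; verify against the API reference.
API_BASE = "https://www.googleapis.com/webmasters/v3/sites"


def build_samples_url(site_url, category="notFound", platform="web"):
    """Return the GET URL for urlcrawlerrorssamples.list (hypothetical helper)."""
    if "://" not in site_url:
        raise ValueError("site_url must include the protocol, e.g. https://")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    # The site URL is a single path segment, so every reserved
    # character (including the slashes) is percent-encoded.
    encoded_site = quote(site_url, safe="")
    query = urlencode({"category": category, "platform": platform})
    return f"{API_BASE}/{encoded_site}/urlCrawlErrorsSamples?{query}"


print(build_samples_url("https://www.upbuild.io/"))
```

The site portion of the printed URL comes out percent-encoded (https%3A%2F%2Fwww.upbuild.io%2F), which is why the protocol and trailing slash need to match the property exactly as it appears in your GSC account.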

Step 5: Choose Your Fields

This selector specifies which fields to include in a partial response. Go ahead and click the ‘Use fields editor’ link (as shown below):

To display the following pop-up modal:

Then go ahead and select all, which will help us retrieve the following:

Fields

  • urlCrawlErrorSample – (Provides information about the sample URL and its crawl error)
  • first_detected – (The date when the error was first detected)
  • last_crawled – (The date when the URL was last crawled)
  • pageUrl – (The URL of the page generating the crawl error)
  • responseCode – (The numeric response code – e.g. 404)
  • urlDetails – (Retrieves additional details about the URL generating the error):
    • containingSitemaps – (Shows sitemap URLs pointing to the crawl error)
    • linkedFromUrls – (Our main reason for this initiative! Lists sample URLs of pages that link to the URL generating the crawl error)
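
Under the hood, selecting fields in the editor should just produce a single fields string in Google’s partial-response syntax, where parentheses group nested fields. Here’s a rough sketch of how that string could be assembled, assuming every field sits inside urlCrawlErrorSample and the two detail fields sit inside urlDetails. The fields_expression helper and the SELECTED dictionary are hypothetical names of our own.

```python
def fields_expression(tree):
    """Render a nested dict as a partial-response `fields` expression."""
    parts = []
    for name, children in tree.items():
        if children:
            # Nested selections are wrapped in parentheses: parent(child1,child2)
            parts.append(f"{name}({fields_expression(children)})")
        else:
            parts.append(name)
    return ",".join(parts)


# Assumed nesting of the fields listed above; None marks a leaf field.
SELECTED = {
    "urlCrawlErrorSample": {
        "first_detected": None,
        "last_crawled": None,
        "pageUrl": None,
        "responseCode": None,
        "urlDetails": {"containingSitemaps": None, "linkedFromUrls": None},
    }
}

print(fields_expression(SELECTED))
```

Trimming the dictionary down to just pageUrl and urlDetails would give a smaller response if the linking URLs are all you’re after.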

After all is said and done, you should have something that looks like this:

Step 6: Authorize and Execute

Now let’s go ahead and click that Authorize and execute button. You should be greeted with the following message:

There are no additional scopes that we need to add, so let’s just click Authorize and execute. You may encounter a prompt asking you to sign in to your Google account, so be sure to use the Google account that has permission to access the Search Console property in question. If everything goes as expected, you should now see a message declaring that the webmasters.urlcrawlerrorssamples.list request was executed, along with the initial GET request and the JSON response for all of the parameters we set earlier:

Step 7: Convert the JSON Response

Next, simply copy the JSON response (everything after the -Show headers- text) and paste it into the text field on JSON-CSV.com (as shown below).

As soon as you paste the JSON response, you should then see a table compiling the information:

As you can see in the far-right column, under urlDetails__linkedFromUrls__001 (the first in a sequence of many), we now have a more actionable dataset that can help us fix the source URLs for pages generating notFound errors in Google Search Console.
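
If you’d prefer to keep the conversion on your own machine, the same flattening can be sketched in a few lines of Python. Note that this is a different layout from the JSON-CSV.com table: instead of numbered linkedFromUrls columns, it writes one row per linking URL, which tends to be easier to filter. The SAMPLE_RESPONSE below is made-up data shaped like the fields from Step 5, and the function names are our own.

```python
import csv
import io
import json

# Hypothetical response, shaped like the fields selected in Step 5.
SAMPLE_RESPONSE = """
{
  "urlCrawlErrorSample": [
    {
      "pageUrl": "old-page",
      "first_detected": "2018-11-01T00:00:00.000Z",
      "last_crawled": "2018-12-01T00:00:00.000Z",
      "responseCode": 404,
      "urlDetails": {
        "linkedFromUrls": [
          "https://www.example.com/blog/",
          "https://www.example.com/about/"
        ]
      }
    },
    {"pageUrl": "orphan", "responseCode": 404}
  ]
}
"""


def linked_from_rows(response_text):
    """Yield one (pageUrl, responseCode, linkedFromUrl) row per linking URL."""
    data = json.loads(response_text)
    for sample in data.get("urlCrawlErrorSample", []):
        details = sample.get("urlDetails", {})
        # A sample with no known linking pages still gets one row.
        sources = details.get("linkedFromUrls") or [""]
        for source in sources:
            yield (sample.get("pageUrl", ""), sample.get("responseCode", ""), source)


def to_csv(response_text):
    """Return the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["pageUrl", "responseCode", "linkedFromUrl"])
    writer.writerows(linked_from_rows(response_text))
    return buffer.getvalue()


print(to_csv(SAMPLE_RESPONSE))
```

To use it on real data, replace SAMPLE_RESPONSE with the JSON you copied from the API Explorer and save the returned text to a .csv file.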

That’s it! As you could see during our setup to extract this information, there are other Search Console services, categories, and fields of data that you can use the API to explore, and with the introduction of the new Search Console, hopefully, we can look forward to an even more extensive Search Console API v4. Hey, it’s almost Christmas, we can hope, right?

Written by
James McNulty was born in Sidcup, Kent England in 1985. James now lives in North Richland Hills, Texas with his wife Megan, and two dogs Colin and Davey. James has been building websites since 1999 and is currently a Senior Marketing Strategist at UpBuild.
