How To Automate The URL Inspection API
The Google URL Inspection API allows users to request the data Search Console has about the indexed version of a URL, including index status, coverage, rich results, mobile usability and more.
This means you’re able to check in bulk whether URLs are indexed on Google, and if there are warnings or issues.
The URL Inspection API has been integrated into the Screaming Frog SEO Spider, so users can pull in data for up to 2k URLs per property a day alongside all the usual crawl data.
This tutorial shows you how to use the SEO Spider to collect URL Inspection data, options to work with or around the 2k URL limit, and how to automate URL Inspection API data and reporting to monitor indexing.
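For context, this is the same data you can request directly from Google’s Search Console API yourself. Below is a minimal sketch using the google-api-python-client library, assuming a service account (or OAuth credentials) with access to the property; the file paths and URLs are placeholders.

```python
# Minimal sketch: inspect a single URL via the Search Console URL Inspection API.
# Assumes a service account that has been granted access to the property.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

# Placeholder paths and URLs - replace with your own.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
service = build("searchconsole", "v1", credentials=creds)

body = {
    "inspectionUrl": "https://www.example.com/page/",
    "siteUrl": "https://www.example.com/",  # the verified property
}
result = service.urlInspection().index().inspect(body=body).execute()

index_status = result["inspectionResult"]["indexStatusResult"]
print(index_status.get("verdict"), "-", index_status.get("coverageState"))
```

The SEO Spider handles the authentication, quota and bulk requests for you, which is what the rest of this tutorial covers.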
How to Connect to The URL Inspection API
Click ‘Config > API Access > Google Search Console’, connect to a Search Console account, choose the property and then under the ‘URL Inspection’ tab, select ‘Enable URL Inspection’.
When you perform a crawl, URL Inspection API data will then be populated in the ‘Search Console’ tab, alongside the usual Search Analytics data (impressions, clicks, etc).
The Search Console tab includes the following URL Inspection API related filters –
- URL Is Not on Google – The URL is not indexed by Google and won’t appear in the search results. This filter can include non-indexable URLs (such as those that are ‘noindex’), as well as Indexable URLs that could be indexed but currently aren’t. It’s a catch-all filter for anything not on Google according to the API.
- Indexable URL Not Indexed – Indexable URLs found in the crawl that are not indexed by Google and won’t appear in the search results. This can include URLs that are unknown to Google, or those that have been discovered but not indexed, and more.
- URL is on Google, But Has Issues – The URL has been indexed and can appear in Google Search results, but there are some problems with mobile usability, AMP or Rich results that might mean it doesn’t appear in an optimal way.
- User-Declared Canonical Not Selected – Google has chosen to index a different URL to the one declared by the user in the HTML. Canonicals are hints; sometimes Google makes a sensible choice, other times it’s less than ideal.
- Page Is Not Mobile Friendly – The page has issues on mobile devices.
- AMP URL Is Invalid – The AMP has an error that will prevent it from being indexed.
- Rich Result Invalid – The URL has an error with one or more rich result enhancements that will prevent the rich result from showing in the Google search results.
You can export Google Rich Result types, errors and warnings, details on referring pages and Sitemaps via the ‘Bulk Export > URL Inspection’ menu.
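If you ever work with the raw API response rather than the SEO Spider’s filters, the buckets above map roughly onto fields of the inspectionResult object returned by the API. The sketch below is a simplified version of that mapping; the field names are based on the API’s inspectionResult schema, while the bucket labels and pass/fail checks are our own simplification.

```python
def summarise_inspection(inspection_result: dict) -> list[str]:
    """Rough mapping of URL Inspection API fields to the filter buckets above."""
    notes = []
    index_status = inspection_result.get("indexStatusResult", {})

    # 'URL Is Not on Google' - the index verdict is not a pass.
    if index_status.get("verdict") != "PASS":
        notes.append("URL Is Not on Google: " + index_status.get("coverageState", "unknown"))

    # 'User-Declared Canonical Not Selected' - Google chose a different canonical.
    user_canonical = index_status.get("userCanonical")
    google_canonical = index_status.get("googleCanonical")
    if user_canonical and google_canonical and user_canonical != google_canonical:
        notes.append(f"User-Declared Canonical Not Selected: {google_canonical}")

    # Mobile usability, AMP and rich results each carry their own verdict.
    if inspection_result.get("mobileUsabilityResult", {}).get("verdict") == "FAIL":
        notes.append("Page Is Not Mobile Friendly")
    if inspection_result.get("ampResult", {}).get("verdict") == "FAIL":
        notes.append("AMP URL Is Invalid")
    if inspection_result.get("richResultsResult", {}).get("verdict") == "FAIL":
        notes.append("Rich Result Invalid")
    return notes
```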
How to Focus On Key Sections or Pages
URL Inspection data will be populated against the first 2k URLs found in the crawl, which is breadth-first (ordered by crawl depth) from the start page of the crawl.
Use the SEO Spider configuration to focus the crawl to key sections, pages or a variety of template types.
Some of the main options include –
- Crawl by subdomain or subfolder.
- Use the include feature to narrow the crawl.
- Exclude areas that are not important.
- Upload key pages or templates for sampling in list mode.
- Consider adjusting any crawl limits.
Under ‘Config > API Access > Google Search Console’ and the ‘URL Inspection’ tab, you can select ‘Ignore Non-Indexable URLs for URL Inspection’ if you’re only interested in data for URLs that are Indexable in a crawl.
This saves wasting the 2k query budget on URLs you don’t care about.
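If you’re building a list mode sample yourself, for example from a previous crawl export, a small pre-filter along these lines can help make sure the daily quota is only spent on the sections you care about. This is just a sketch; the ‘Address’ and ‘Indexability’ columns match the SEO Spider’s Internal export, while the file names and include/exclude patterns are placeholders to adjust.

```python
import pandas as pd

# Placeholder patterns and file names - adjust to the sections you care about.
INCLUDE = r"https://www\.example\.com/(blog|products)/"
EXCLUDE = r"\?page=|/tag/"

# 'Address' and 'Indexability' are columns from the SEO Spider's Internal export.
crawl = pd.read_csv("internal_all.csv")
keep = crawl[
    crawl["Address"].str.match(INCLUDE)
    & ~crawl["Address"].str.contains(EXCLUDE)
    & (crawl["Indexability"] == "Indexable")
]

# One URL per line, ready to upload in list mode (capped at the 2k daily quota).
keep["Address"].head(2000).to_csv("url_inspection_sample.txt", index=False, header=False)
```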
How to Work with The 2k A Day Limit
Google has a 2k query per day and property limit for the URL Inspection API.
Google didn’t build the API to allow webmasters to check whether every single URL on their website is indexed; they consider it pretty normal for some URLs not to be indexed. The purpose of the API is to allow users to check more than one URL at a time, and get a better sample across templates outside of GSC.
If you hit the 2k URL per day, per property limit for the URL Inspection API, you’ll receive a message in the SEO Spider letting you know.
The crawl itself will continue and complete; URLs just won’t continue to be populated with URL Inspection data. If you’d like data for more URLs, then you have two options.
1) Patience (Wait A Day!)
Let the crawl finish, wait for 24hrs, re-open the crawl, connect to the API again and then bulk highlight and ‘re-spider’ the next 2k URLs to get URL Inspection API data.
Alternatively, export the previous crawl, copy the URLs you still need URL Inspection data for, and upload them in list mode, before exporting and combining with the previous day’s crawl data.
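One way to make the daily re-upload less manual is to split the full URL list into 2,000 URL batches up front, one file per day, and upload the next file each time. A rough sketch (file names are placeholders):

```python
# Split a master URL list into daily 2,000 URL batches for list mode uploads.
BATCH_SIZE = 2000  # the URL Inspection API's per-property daily quota

with open("all_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for day, start in enumerate(range(0, len(urls), BATCH_SIZE), start=1):
    batch = urls[start:start + BATCH_SIZE]
    with open(f"urls_day_{day:02d}.txt", "w") as out:
        out.write("\n".join(batch) + "\n")
    print(f"Day {day}: {len(batch)} URLs")
```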
2) Verify Multiple Properties & Enable ‘Use Multiple Properties’
You can verify multiple subdomains and subfolders as separate properties in Search Console for a site. Each property would have a 2k URL limit for the URL Inspection API.
For example, all URLs within /blog/ can have their own 2k query limit if verified as a property.
If you have multiple subdomains or subfolders set up as separate properties, then enable the ‘Use Multiple Properties’ configuration found in ‘Config > API Access > GSC > URL Inspection’.
The SEO Spider will automatically detect all relevant properties in the account, and use the most specific property to request data for the URL.
This means it’s now possible to get far more than 2k URLs with URL Inspection API data in a single crawl, if there are multiple properties set up – without having to perform multiple crawls.
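To illustrate what the ‘most specific property’ means, the sketch below does a simple longest-prefix match between a URL and a set of verified property URLs. The property list is a placeholder, and the SEO Spider’s internal selection logic may differ in detail.

```python
# Example: pick the most specific verified Search Console property for a URL.
# The property URLs below are placeholders; URL-prefix properties end with a slash.
PROPERTIES = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://shop.example.com/",
]

def most_specific_property(url: str, properties: list[str]):
    matches = [p for p in properties if url.startswith(p)]
    # The longest matching prefix is the most specific property (with its own 2k daily quota).
    return max(matches, key=len) if matches else None

print(most_specific_property("https://www.example.com/blog/post-1/", PROPERTIES))
# -> https://www.example.com/blog/
```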
How to Automate URL Inspection Data & Index Monitoring
There are various ways to automate crawls and fetch URL Inspection API data to monitor indexing of the most important pages on a website. The simplest is to use scheduling, the ‘Export for Data Studio’ feature and our URL Inspection API Data Studio template.
If you’re already automating crawl reports in Data Studio, you’ll be familiar with this process, and there’s a page for this data within that report. Let’s run through the process of automating index monitoring.
1) Schedule A List Mode Crawl
Go to ‘File > Scheduling’ and under ‘General’ choose a task name, project name and a daily interval.
Next, click ‘Start Options’ and switch ‘Crawler Mode’ to ‘List’. For ‘Crawl Seed’, click ‘browse’ and select a .txt file with the URLs you want to check every day for URL Inspection data.
You could crawl a website in regular ‘Spider’ mode if it’s under 2k URLs and gather index data for every single URL.
However, websites are often much larger, and there can be many URLs you don’t care about, so it makes sense to focus on the most important URLs on the site.
This might be the top 10, 20 or 100 URLs, rather than 2k. Many websites have a small number of really key landing pages that drive revenue.
2) Use A Crawl Config with URL Inspection API Enabled
For ‘Crawl Config’ in scheduling ‘Start Options’, ensure you supply a saved configuration file that has ‘Enable URL Inspection’ activated in ‘Config > API Access > Google Search Console > URL Inspection’.
Setting up a saved configuration is simple. In the SEO Spider interface, just select the configuration you want, then click ‘File > Config > Save As’. This is the file that needs to be supplied in the ‘Crawl Config’.
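As a side note, the same saved configuration file can be reused if you’d prefer to drive crawls from your own scheduler (cron, CI, etc.) using the SEO Spider’s command line mode instead of the built-in scheduler. The sketch below wraps the command line in Python’s subprocess; the flags are based on the command line documentation, so treat them as assumptions and verify against your version of the SEO Spider.

```python
# Rough sketch: run a headless list mode crawl outside the built-in scheduler.
# Verify flag names against the SEO Spider command line documentation for your version.
import subprocess

subprocess.run(
    [
        "screamingfrogseospider",
        "--headless",
        "--crawl-list", "urls_day_01.txt",             # URLs to check for URL Inspection data
        "--config", "url-inspection.seospiderconfig",  # saved config with 'Enable URL Inspection' on
        "--output-folder", "/path/to/exports",
        "--export-tabs", "Search Console:All",         # assumed tab:filter name - adjust as needed
    ],
    check=True,
)
```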
3) Select The Google Search Console API
Enable the ‘Google Search Console’ API, click ‘Configure’ and select the account and property.
4) Export For Data Studio
On the ‘Export’ tab, enable ‘Headless’ and choose the ‘Google Drive Account’ to export the URL Inspection API data in a Google Sheet.
Next, click ‘Export For Data Studio’ and then the ‘Configure’ button next to it.
The configure window will then show a list of available metrics from tabs and filters on the left, which need to be selected for the export by clicking the right arrow.
Select the ‘Site Crawled’, ‘Date’ and ‘Time’ metrics, then search for ‘Search Console’ to see the list of metrics available for this tab. Select the bottom 7 metrics, which are related to URL Inspection, and click the right arrow.
When the scheduled crawl has run the ‘Export for Data Studio’ Google Sheet will be exported into your chosen Google Drive account.
By default the ‘Export for Data Studio’ location is ‘My Drive > Screaming Frog SEO Spider > Project Name > [task_name]_crawl_summary_report’.
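If you’d also like a simple alert outside of Data Studio, the exported Google Sheet can be read programmatically. The sketch below uses gspread to pull the crawl summary and flag any day with URLs that aren’t indexed; the sheet name and column headers are assumptions based on the metrics selected above, so adjust them to match your export (and the service account needs to be given access to the sheet).

```python
# Rough sketch: read the scheduled crawl summary export and flag indexing problems.
# Sheet name and column headers are assumptions - match them to your own export.
import gspread

gc = gspread.service_account(filename="service-account.json")
sheet = gc.open("index-monitoring_crawl_summary_report")  # i.e. [task_name]_crawl_summary_report
rows = sheet.sheet1.get_all_records()

for row in rows:
    not_on_google = int(row.get("URL Is Not on Google", 0) or 0)
    not_indexed = int(row.get("Indexable URL Not Indexed", 0) or 0)
    if not_on_google or not_indexed:
        # Swap the print for an email or Slack notification in a real monitoring setup.
        print(f"{row.get('Date')}: {not_on_google} not on Google, {not_indexed} indexable but not indexed")
```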
5) Connect to URL Inspection Google Data Studio Template
Now make a copy of our URL Inspection Monitoring Data Studio template and connect to your own Google Sheet with data from the ‘Export for Data Studio’ crawl summary report.
You now have a daily index monitoring system for the most important URLs on the website, which will alert you to any URLs that are not indexed, or have issues.
If you’re not familiar with how to take a copy of a Data Studio dashboard and connect to a different data source, have a read of our ‘Connecting to Data Studio’ guide, and follow the same process.
Summary
This tutorial should help you use the SEO Spider to fetch the URL Inspection API data you need.
Check out our Screaming Frog SEO Spider user guide, FAQs and tutorials for more advice and tips.
If you have any further queries, feedback or suggestions to improve our URL Inspection API or Data Studio integration in the SEO Spider then just get in touch with our team via support.