Screaming Frog SEO Spider Update – Version 5.0

Dan Sharp

Posted 4 September, 2015 by Dan Sharp in Screaming Frog SEO Spider


In July we released version 4.0 (and 4.1) of the Screaming Frog SEO Spider, and I am pleased to announce the release of version 5.0, codenamed internally as ‘toothache’.

Let’s get straight to it: version 5.0 includes the following new features –

1) Google Search Analytics Integration

You can now connect to the Google Search Analytics API and pull in impression, click, CTR and average position data from your Search Console profile. Alongside our Google Analytics integration, this should be valuable for Panda and content audits respectively.

[Screenshot: Search Analytics integration]

We were part of the Search Analytics beta, so we have had this working internally for some time, but we delayed the release a little while we finished off a couple of other new features (detailed below) for a larger release.

For those already familiar with our Google Analytics integration, the set-up is virtually the same. You just need to give permission to our app to access data under ‘Configuration > API Access > Google Search Console’ –

[Screenshot: connect to Search Console]

The Search Analytics API doesn’t provide us with the account name in the same way as the Analytics integration, so once connected it will appear as ‘New Account’, which you can rename manually for now.

[Screenshot: rename Search Console account]

You can then select the relevant site profile, date range, device results (desktop, tablet or mobile) and country filter. Similarly to our GA integration, we have some common URL matching scenarios covered, such as matching trailing and non-trailing slash URLs and case sensitivity.
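
As a toy illustration of that matching (made-up URLs and metrics, not the SEO Spider’s actual code), it boils down to comparing normalised forms of each URL:

```python
def normalise(url):
    """Lower-case and strip any trailing slash, so that e.g.
    'https://example.com/Blog/' and 'https://example.com/blog'
    compare as equal."""
    return url.lower().rstrip('/')

# Hypothetical API data, keyed by the URL form the API returned.
api_metrics = {'https://www.example.com/Blog/': {'clicks': 42}}

# Build a lookup on the normalised form, then match crawled URLs
# against it regardless of case or trailing-slash differences.
lookup = {normalise(u): m for u, m in api_metrics.items()}
print(lookup.get(normalise('https://www.example.com/blog')))
# -> {'clicks': 42}
```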

[Screenshot: Search Console connected]

When you hit ‘Start’ and the API progress bar has reached 100%, data will appear in real time during the crawl under the ‘Search Console’ tab, and dynamically within columns at the far right in the ‘Internal’ tab if you’d like to export all data together.

There are currently a couple of filters: ‘Clicks Above 0’, for when a URL has at least a single click, and ‘No GSC Data’, for when the Google Search Analytics API did not return any data for the URL.

[Screenshot: No GSC Data filter]

In the example above, we can see the URLs appearing under the ‘No GSC Data’ filter are all author pages, which are actually ‘noindex’, so this is as expected. Remember, you might see URLs appear here which are ‘noindex’ or ‘canonicalised’, unless you have ‘respect noindex’ and ‘respect canonicals’ ticked in the advanced configuration tab.

The API is currently limited to 5k rows of data, which we hope Google will increase over time. We plan to extend our integration further as well, but at the moment the Search Console API is fairly limited.
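
For those curious what a raw query looks like, here’s a minimal sketch using Google’s Python client – the token file, site URL and dates are placeholders, and you’d need your own OAuth access:

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Placeholder token file, assumed to be authorised for the
# https://www.googleapis.com/auth/webmasters.readonly scope.
credentials = Credentials.from_authorized_user_file('token.json')
service = build('webmasters', 'v3', credentials=credentials)

response = service.searchanalytics().query(
    siteUrl='https://www.example.com/',
    body={
        'startDate': '2015-06-01',
        'endDate': '2015-08-31',
        'dimensions': ['page'],  # one row per URL
        'rowLimit': 5000,        # the current API ceiling mentioned above
    },
).execute()

for row in response.get('rows', []):
    print(row['keys'][0], row['clicks'], row['impressions'],
          row['ctr'], row['position'])
```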

2) View & Audit URLs Blocked By Robots.txt

You can now view URLs disallowed by the robots.txt protocol during a crawl.

Disallowed URLs will appear with a ‘Status’ of ‘Blocked by Robots.txt’, and there’s a new ‘Blocked by Robots.txt’ filter under the ‘Response Codes’ tab, where these can be viewed efficiently.

[Screenshot: URLs blocked by robots.txt]

The ‘Blocked by Robots.txt’ filter also displays a ‘Matched Robots.txt Line’ column, which provides the line number and disallow path of the robots.txt entry that’s excluding each URL. This should make auditing robots.txt files simple!
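
To illustrate the idea, here’s a deliberately simplified sketch of finding the robots.txt line that blocks a URL – it ignores wildcards, ‘Allow’ rules and per-user-agent groups, so treat it as a sketch rather than a compliant parser:

```python
from urllib.parse import urlparse

# A hypothetical robots.txt, split into numbered lines.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Disallow: /tmp/""".splitlines()

def matched_disallow_line(url):
    """Return (line number, disallow path) for the first Disallow
    rule matching the URL's path, or None if the URL isn't blocked."""
    path = urlparse(url).path
    for number, line in enumerate(ROBOTS_TXT, start=1):
        if line.lower().startswith('disallow:'):
            rule = line.split(':', 1)[1].strip()
            if rule and path.startswith(rule):
                return number, rule
    return None

print(matched_disallow_line('https://www.example.com/private/page.html'))
# -> (2, '/private/')
```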

Historically the SEO Spider hasn’t shown URLs disallowed by robots.txt in the interface (they were only available via the logs). I always felt it wasn’t required, as users should already know which URLs are being blocked, and whether robots.txt should be ignored in the configuration.

However, there are plenty of scenarios where using robots.txt to control crawling and understanding quickly what URLs are blocked by robots.txt is valuable, and it’s something that has been requested by users over the years. We have therefore introduced it as an optional configuration, for both internal and external URLs in a crawl. If you’d prefer to not see URLs blocked by robots.txt in the crawl, then simply untick the relevant boxes.

URLs which are linked to internally (or externally), but are blocked by robots.txt, can obviously accrue PageRank, be indexed and appear in search. Google just can’t crawl the content of the page itself, or see the outlinks of the URL to pass that PageRank onwards. Therefore there is an argument that they can act as a bit of a dead end, so I’d recommend reviewing how many are being disallowed, how well linked they are, and their depth, for example.

3) GA & GSC Not Matched Report

The ‘GA Not Matched’ report has been replaced with the new ‘GA & GSC Not Matched Report’, which now provides consolidated information on URLs discovered via the Google Search Analytics API, as well as the Google Analytics API, but which were not found in the crawl.

This report can be found under ‘Reports’ in the top level menu, and will only populate when you have connected to an API and the crawl has finished.

[Screenshot: GA & GSC Not Matched report]

There’s a new ‘source’ column next to each URL, which details the API(s) via which it was discovered (sometimes this can be both GA and GSC), but not matched against any URL found within the crawl.
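
The underlying logic is essentially a set difference; here’s a toy sketch with made-up URLs:

```python
# Made-up URL sets standing in for a crawl and the two APIs.
crawled = {'https://www.example.com/',
           'https://www.example.com/blog'}
ga_urls = {'https://www.example.com/blog',
           'https://www.example.com/old-page'}
gsc_urls = {'https://www.example.com/old-page',
            'https://www.example.com/#services'}

# Any URL returned by an API but absent from the crawl is reported,
# with a 'source' column noting which API(s) discovered it.
for url in sorted((ga_urls | gsc_urls) - crawled):
    sources = [name for name, urls in (('GA', ga_urls), ('GSC', gsc_urls))
               if url in urls]
    print(url, '-', ' & '.join(sources))
# https://www.example.com/#services - GSC
# https://www.example.com/old-page - GA & GSC
```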

You can see in the example screenshot above, from our own website, that there are some URLs with mistakes, a few orphan pages and URLs with hash fragments, which can show as quick links within meta descriptions (hence why their source is GSC rather than GA).

I discussed how this data can be used in more detail within the version 4.1 release notes. It’s a real hidden gem, as it can help identify orphan pages and other errors, as well as simple matching problems between the crawl and the API(s) to investigate.

4) Configurable Accept-Language Header

Google introduced locale-aware crawl configurations earlier this year for pages believed to adapt the content they serve based on the request’s language and perceived location.

This essentially means Googlebot can crawl from different IP addresses around the world and with an Accept-Language HTTP header in the request. Hence, like Googlebot, there are scenarios where you may wish to supply this header to crawl locale-adaptive content, with various language and region pairs. You can already use the proxy configuration to change your IP as well.

You can find the new ‘Accept-Language’ configuration under ‘Configuration > HTTP Header > Accept-Language’.

[Screenshot: Accept-Language header configuration]

We have some common presets covered, but the number of possible combinations is huge, so there is also a custom option which you can set to any value required.
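
If you want to test how a page responds to this header outside of the SEO Spider, it’s easy to replicate in a script. A minimal sketch using Python’s requests library (the URL and header value are placeholders):

```python
import requests

# Request a locale-adaptive page as a French-speaking user in France
# might receive it. Any language/region pair (with optional quality
# values) can be supplied.
headers = {'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.5'}
response = requests.get('https://www.example.com/', headers=headers)

print(response.status_code)
print(response.headers.get('Content-Language'))
```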

Smaller Updates & Fixes

Those are the main features of our latest release, which we hope you find useful. Other bug fixes and updates in this release include the following –

  • The Analytics and Search Console tabs have been updated to allow URLs blocked by robots.txt to appear, where we believe them to be HTML based upon their file type.
  • The maximum number of Google Analytics metrics you can collect from the API has been increased from 20 to 30. Google restrict the API to 10 metrics for each query, so if you select more than 10 metrics (or multiple dimensions), we will make more queries (and it may take a little longer to receive the data) – see the sketch after this list.
  • With the introduction of the new ‘Accept-Language’ configuration, the ‘User-Agent’ configuration is now under ‘Configuration > HTTP Header > User-Agent’.
  • We added the ‘MJ12Bot’ to our list of preconfigured user-agents after a chat with our friends at Majestic.
  • Fixed a crash in XPath custom extraction.
  • Fixed a crash on start up with Windows Look & Feel and JRE 8 update 60.
  • Fixed a bug with character encoding.
  • Fixed an issue with Excel file exports, which wrote numbers with decimal places as strings, rather than numbers.
  • Fixed a bug with Google Analytics integration, where the use of hostname in some queries was causing ‘Selected dimensions and metrics cannot be queried together’ errors.
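
On the metrics point above, here’s a quick sketch of how a selection of more than 10 metrics splits into batched queries – an illustration with placeholder metric names, not the SEO Spider’s own code:

```python
def batches(items, size=10):
    """Split a metric selection into API-sized chunks."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Placeholder names – real ones are GA metric identifiers like 'ga:sessions'.
selected = ['ga:metric%d' % n for n in range(1, 31)]  # 30 metrics selected

for query_number, chunk in enumerate(batches(selected), start=1):
    print('query %d: %d metrics' % (query_number, len(chunk)))
# query 1: 10 metrics
# query 2: 10 metrics
# query 3: 10 metrics
```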

Small Update – Version 5.1 Released 22nd October 2015

We released a small update to version 5.1 of the SEO Spider, which just includes some bug fixes and tweaks, as below.

  • Fixed issues with filter totals and Excel row numbers.
  • Fixed a couple of errors with custom extraction.
  • Fixed robots.txt total numbers within the overview section.
  • Fixed a crash when sorting.

That’s everything for this release!

Thanks to everyone for all the suggestions and feedback for our last update, and just in general. If you spot any bugs or issues in this release, please do just drop us a note via support.

Now go and download version 5.0 of the SEO Spider!

Dan Sharp is founder & Director of Screaming Frog. He has developed search strategies for a variety of clients from international brands to small and medium-sized businesses and designed and managed the build of the innovative SEO Spider software.

99 Comments

  • Emanuele Vaccari 9 years ago

    Great update as always!
    The only thing that’s lacking is the option to save the column order in the reports :)

    • screamingfrog 9 years ago

      Hey Emanuele,

      Thank you, good to hear you like it.

      Column ordering has been bumped up our ‘todo’ list for the next version. So you will be able to customise columns in the UI (and for reporting too).

      Cheers for the nudge as well.

      Dan

      • Thomas Rosenstand 9 years ago

        Still holding my breath, Dan :-)

      • Idan 9 years ago

        Hi guys,
        Thank you for the great work :)

        Is there any chance of a life-time licence?

        • screamingfrog 9 years ago

          Hey Idan,

          No worries, hope it helps out.

          At this moment, we don’t have any plans for a life-time licence, just annual.

          Cheers.

          Dan

  • Matt Roberts 9 years ago

    Just picked up a license a few weeks ago for work on some bigger sites :)

    I got super pumped when I saw the new feature to integrate Google Analytics data! This saves lots of time from having to append that data manually, and helps us make better decisions by letting us read a story told via metrics… <3

  • Antonio Casano 9 years ago

    what about adding yandex metrika (metrica.yandex.com) support?

    • Rosparket 8 years ago

      I’ve a similar question/problem

      • screamingfrog 8 years ago

        Good question. I’ll include it on our list to discuss for possible development!

        Thanks for the suggestion.

        Dan

  • Umut Hocaoglu 9 years ago

    When will you guys have an API for the entire program? Can’t wait for that day!!!

  • Dimitar Georgiev 9 years ago

    Another great update of this extremely useful software. I don’t just like you, I really love you guys :D Every SEO in the world should use your tool if they want to rank well in the SERPs.

    p.s. Same question as above asked by Mr. Casano: “what about adding yandex metrika (metrica.yandex.com) support?”

    • screamingfrog 9 years ago

      Thanks Dimitar!

      To your query (and Antonio’s earlier), thanks for the suggestion. We will take a look :-)

  • Morgan 9 years ago

    Thanks for combining updates together in one release!

  • Everett Sizemore 9 years ago

    Excellent! This is going to save us a lot of time and Vlookup headaches.

  • John 9 years ago

    Quick question, but is there a way to audit sitemaps across all URLs? For example, this URL is included in the sitemap – keep up the good work!

    • screamingfrog 9 years ago

      Hey John,

      Quick answer, no, not right now – another on the list!

      The only way to do it at the moment, is upload in list mode (you can upload the .xml file directly), crawl and compare against a site crawl.

      Cheers.

      Dan

  • Nakul Goyal 9 years ago

    Love it Dan. Love all the updates as always. Screaming Frog is indeed a tool that I can’t be without on my laptop. This is probably the only tool that I install on a new laptop after I install Chrome and some extensions :). Love it. Cheers.

  • Tony Edward 9 years ago

    Great Update! Been waiting for these features!

  • David 9 years ago

    Going to love the blocked robots.txt function. Just another check to make sure that I don’t miss something!

  • Cédric 9 years ago

    The Google Search Analytics integration is awesome!! Thanx!

  • Dennis Hüttner 9 years ago

    Best update at this time ;-) I will stay tuned

    Best Regards

  • Dan 9 years ago

    Dan – probably obvious (haven’t tried it I admit!) but is it straightforward to connect GA/GSC to various/different IDs/Logins?
    Many/most of us have different credentials based on client(s), so easy to add them all and swap around/choose?
    Ta
    (the best just gets better and better!)
    Dan
    (COYG)

    • screamingfrog 9 years ago

      Hey Dan,

      Yeah, you can add multiple Google Accounts (for GA / GSC), so it’s easy to just choose and switch to the one you want – you don’t need to re-type & re-authorise each time etc :-)

      Cheers

      Dan

  • weddingplz 9 years ago

    How many links can this spider software crawl?

    Is there any facility for unlimited page crawls?

  • Kamil Kanigowski 9 years ago

    Best update! Now I can view URLs disallowed by robots.txt <3 Thank you Screaming Frog – work without you would be hard :)

  • bizzit 9 years ago

    Hello,

    I don’t see some very important info, which disappeared in version 5.0 => where is the “from localization”, e.g. for external links (info about the subpages where the external link highlighted in the main table is found)?

    In version 4.x it was in the bottom table (now there are only name and value columns).

    ??

    • screamingfrog 9 years ago

      Hi Bizzit,

      We’ve never had anything named ‘from localization’?

      I’m not entirely sure what you mean unfortunately either. You could revert back to 4.0 (amend the download URL from our SEO Spider download page) and send me a screenshot?

      Cheers.

      Dan

  • Daniel 9 years ago

    Great software, one of the best SEO tools ever. Keep up the good work!

    • Hamangalistim 5 years ago

      A very useful tool. It helped me a lot in finding all the links on my site. I learned a lot and I work with it regularly every day. Well done, keep up the good work.

  • Sandro Alvares 9 years ago

    What happened to the sitemap creation option – has it been removed from the free version?

    Thanks… I used 4.0 and it worked well! … 5.0 doesn’t work any more.

    • screamingfrog 9 years ago

      Hey Sandro,

      The sitemap creation feature is still available in free!

      Nothing has changed here between 4.0 and 5.0 :-)

      Cheers.

      Dan

  • Josh Thomas 9 years ago

    That’s awesome! Good job guys.
    It’s great to see you guys evolving and continuing to be creative and innovative. Thanks for the analytics integration, means a lot.

    cheers
    Josh

  • Jonathan Ridehalgh 9 years ago

    I’m just getting started using the tool; it has already proved invaluable at quickly highlighting ways to improve my site. Many thanks for the free option.

  • Shounak Gupte 9 years ago

    Your software is amazing! Saves heaps of time! :)
    Keep up the good work.

  • Tom Binga 9 years ago

    The only thing still missing is compatibility with anyone who has a filter set up in their GA account to show the full URL.

    Right now the GA results in SF still show blank.

    • screamingfrog 9 years ago

      Hey Tom,

      Yeah, we haven’t included this as an option just yet (like matching trailing slashes or upper & lowercase characters).

      Extended URI filters appear to be pretty common though, so we will do this at some point.

      For now (if possible), I’d recommend just using a raw untouched ‘view’ in GA (always recommend everyone has at least one).

      Cheers.

      Dan

      • Lauren Ancona 9 years ago

        +1 for this. The problem with using a ‘No Filters’ view for this is that the point of the filters, in some cases, is rolling up reporting numbers when you’re working in the context of a terrible legacy CMS with duplicate content mapped all over the place, so the analytics numbers won’t be ‘right’.

        All told I’m a huge fan & don’t want to ignore the many awesome features included in the last release – I just want to be able to use them! Thanks for all your hard work.

  • Mooka 9 years ago

    Amazing software. Great work!

  • Adam 9 years ago

    This is the first version that does not run very well, if I can get it to run at all. Most of the time I can’t even get it to launch unless I restart the computer and don’t run anything else. No problems at all with the previous version. Maybe the overhead is much higher or something, but it’s sad to see a great product start to have issues.

  • izhak agam 9 years ago

    Great software – it saves a lot of time on every organic SEO campaign, thanks

  • Onebeeper 9 years ago

    Nice update! Thank you for your great work. :)

  • Eliav L. 9 years ago

    A really nice update with helpful features for SEO guys :)
    Thanks.

  • Anthony 9 years ago

    Great release! Any plans to give visibility into the Google Search Console keywords?

    • screamingfrog 9 years ago

      Hi Anthony,

      Thanks for the suggestion. Honest answer, probably not, but I’ll give it some thought. The guys at URL Profiler do this I believe, so I recommend checking them out.

      We’ve added a filter so you can exclude/include certain keywords though which should make it more useful (so you can exclude brand etc).

      Cheers.

      Dan

  • Site 9 years ago

    Good software!)
    Good job, guys.

  • Solonia 9 years ago

    This is a very helpful SEO tool. How much is a lifetime licence? After buying the latest version, would I pay again for upgrades, or is there a one-time payment covering lifetime support and automatic upgrades? I’m waiting for your answer.

    • screamingfrog 9 years ago

      Hey Solonia,

      We only do an annual licence at the moment, which includes upgrades and support, too.

      Cheers.

      Dan

  • Esteban 9 years ago

    I use the software to search for expired domains on .edu and .gov websites. It’s very useful for me. Thx ;)

  • Phillip 9 years ago

    thanks for the update

  • Paula 9 years ago

    Super update, thanks ;)

  • SEOWHO 9 years ago

    The best software I’ve worked with, no doubt at all!
    The most recommended SEO software today!

  • Cesar Florero 9 years ago

    Can anyone provide some help? I am able to link to both GA and GSC, but there is no data in either of the tabs belonging to GA and GSC.

    The columns for the page/URL analytics are empty. Based on what I have seen on other sites that reference the integration, they have data being displayed. I am not really sure if I am doing something wrong here.

    any help will be appreciated.

    Cesar

  • fenix 9 years ago

    Hi, I can’t find any help on the error ‘Status: DNS lookup failed’. Because of this I can’t track down a 405 fault. Can I ask for help?

  • Cassi 9 years ago

    Are we able to save the search console data – so that we have a record past 90 days ?

    • screamingfrog 9 years ago

      Hi Cassi,

      You can run a crawl, collect search console data for 90 days and save it as a project.

      But you can’t just download search console data and save it (has to be part of the crawl etc).

      Cheers.

      Dan

  • Actualité SEO 9 years ago

    Thanks for the update! :)

  • Jacob SEO 9 years ago

    I ran a crawl and SF says there is no GA or SC data, yet when looking at both I see data. Any reason SF would say there is none when connected to the APIs?

  • amit malki 9 years ago

    Great tool. I teach this tool on the courses I run!

  • Nir Levi 9 years ago

    hands down one of the best SEO tools out there!

  • jay behm 9 years ago

    Great update as always. This is a very helpful SEO tool. Thanks!

  • Tom 9 years ago

    Hi, first of all I would like to thank you for the amazing product you guys have. Secondly, when I’m crawling a site and the site has some URLs that are blocked by robots.txt, can I configure the Frog in some way so that it will still fetch the info blocked from crawling?
    Thank you

    • screamingfrog 9 years ago

      Hi Tom,

      Appreciate the kind words!

      Yes, you can choose to ‘ignore robots.txt’ (under the spider configuration) to crawl pages which are blocked :-)

      Cheers.

      Dan

  • Antoine Girault 9 years ago

    I have to admit that this tool has become my favourite SEO tool, cheers ;)

    Antoine

  • Zico 9 years ago

    Hi, I’d like to ask about the feature that shows URLs blocked by robots.txt. Does this feature only work in spider mode? My problem is – when I upload a .txt file with URLs that I know are blocked by robots.txt, it doesn’t show me that. I assume this is because in list mode there can be 100 URLs from 100 different domains, and SF would have to check 100 different robots.txt files?

    • screamingfrog 9 years ago

      Hi Zico,

      When you upload URLs in list mode, the ‘ignore robots.txt’ configuration gets automatically ticked (as it assumes you want to crawl them, not be blocked).

      You can untick this box and then view URLs blocked by robots.txt.

      Cheers.

      Dan

  • Dersus 9 years ago

    Great software as always

  • David 9 years ago

    Hi Dan, love the software and I like the GA integration a lot. However, I still run into quite a few URLs not being matched with Analytics (where all GA cells remain empty), even when these page URLs match exactly with the URL in Analytics. Any idea what causes this and how it can be fixed? Or is there a limit on data being pulled from GA?

  • seomulti 8 years ago

    Hi, I have a question: as soon as the program is run, does it give results in real time, e.g. when we’ve fixed a problem?

    • screamingfrog 8 years ago

      Hello :-)

      Yes, the SEO Spider runs and displays data in real-time. So if you’ve fixed an issue, then run a crawl, it will give you the latest changes.

      Obviously it doesn’t fix any issues for you, it just crawls and displays the data in real-time.

      Cheers.

      Dan

  • nextsite 8 years ago

    Hey, can I buy a lifetime licence?

  • Shahar Azar 8 years ago

    Good job guys, one of the best SEO tools ever!

  • Matt Kellogg 8 years ago

    Just purchased this tool 4 days ago and my technical understanding of SEO has increased dramatically already. I believe I’ll be using ScreamingFrog for a long time and every SEO that is serious should purchase it as well. Thank you guys!

  • eMojo 8 years ago

    Hey,
    The best software for SEO, at the best price.
    My favourite SEO tool.
    Cheers,
    Shay

  • Azhar M 8 years ago

    I just downloaded the Screaming Frog SEO Spider, so I just wanted to say thank you so much for creating such a wonderful tool. Really, really appreciate it a million.

  • Still my favorite tool – keep up the good work.

  • Ben Watch 8 years ago

    Great software!! One of the best SEO tools ever. Saves a lot of time – highly recommended.

  • Robert 8 years ago

    I know I’m late to the party, but this is hands down a must-have tool. The GA integration is outstanding… our team wouldn’t be as efficient without it. Definitely had a few quirks at first, but everything has come together smoothly. Thank YOU guys!

  • baruch 8 years ago

    Excellent software.
    Using it for some time.
    Worth every penny

  • Sebastian 8 years ago

    Screaming Frog is the best tool when you have to migrate from one domain to another. Last time I had to move a site with almost 1 million URLs. Imagine how long it would take if I had to do it manually.

  • Roberto 8 years ago

    Hi,
    Great great software, very useful!

    What about integrating a double spider for testing mobile websites (different user-agents) at the same time? I always have to launch two spidering sessions…

    Thank you!

    Roberto

  • Avraham Regevsky 6 years ago

    Wow! Simply my favorite tool for fixing errors and finding pages with “no response” for my clients’ websites. Great features and very precise compared to other SEO tools that I’ve checked. Thank you Dan

  • Dreamer 5 years ago

    The Google Search Analytics integration is awesome !! Thanx !
    Excellent software.
    Using it for some time

  • Ema Online 4 years ago

    Love this Frog :)

  • Get Card 4 years ago

    My favorite tool! I always recommend that my SEO students use it.
    Thank you!


