Screaming Frog SEO Spider Update – Version 1.60
Posted 13 June, 2011 by Dan Sharp in Screaming Frog SEO Spider
I am really excited about the latest release of the Screaming Frog SEO Spider, version 1.60. It’s without a doubt our biggest update since we launched the tool around six months ago, and a number of the new features have come from feedback and suggestions from users. So thanks to everyone who has taken the time to e-mail, tweet or grab me at a conference with a suggestion! It’s genuinely appreciated and really helps us improve the tool.
So, let’s take a look at what’s new in version 1.60 –
- Crawling At Scale – We have a couple of really big features in this release, but the update we are probably most proud of is the performance of the spider when crawling websites at scale. In older versions of the spider you could experience a slowdown when crawling a significant number of URIs. We have worked really hard to optimise performance here (so CPU use is less intensive!) and it’s now possible to crawl a much larger number of URIs without any loss of speed. As an example, by upgrading the RAM allocation to 1,500MB the spider easily crawled over half a million pages of the BBC news section. With further RAM allocation, we were able to crawl into the millions.
- Save Crawl Function – We now have a save crawl function, which was probably our most requested feature. It allows you to save a crawl at any time and re-open the .seospider file in the SEO Spider later, either to review it or to resume the crawl. Thanks to everyone who suggested this feature; I believe Ian Macfarlane was the first to do so.
- Bulk Export Of Links – The SEO Spider stores a huge amount of data about the linking of a site, so we have made this available via the new export function, which allows you to export every single instance of a link found by the spider: the source, the destination, the alt or anchor text and the status code. For example, you can now export all inlinks to every page that returns a 404 error, rather than exporting them individually for each URI. You can now get extremely granular data on the linking relationships within a site, so I will be interested to see how people use it.
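To illustrate the kind of slicing this makes possible, here is a minimal Python sketch that filters a bulk links export down to inlinks pointing at pages that returned a 404. The file name and column headings are assumptions for the example, so match them to the headers in your actual export.

```python
import csv

# Hypothetical example: reduce a bulk "all links" export to every inlink
# whose destination returned a 404. The file name and column headings
# ("Status Code" etc.) are assumptions; check them against your export.
with open("all_links_export.csv", newline="", encoding="utf-8") as src, \
     open("inlinks_to_404s.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row.get("Status Code", "").strip() == "404":
            writer.writerow(row)
```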
While these were the main updates, we also added some other very cool features –
- Search Function – You can now search for anything in the spider. The search is case sensitive and works across all the columns in the tab you are viewing, so switch to the appropriate tab if you’re looking for something specific in a page title, meta description etc.
- Gzip content encoding was improved
- Support for proxy and web server authentication – both Basic and Digest authentication are supported
- Smart file names for exporting data
- Remember the last accessed folder for file open/save and exporting
- Included an ‘open recent saved crawls’ option
- Included a ‘recently crawled domains’ option. Thanks to Rob Nicholson for the suggestions here (and more to come in this area).
We also fixed a few bugs –
- A crash when right clicking the lower window pane menu
- A parsing issue for relative URLs starting with ‘?’
- A bug affecting URLs linked to with an umlaut
- Commas in URLs when uploading list files
We still have a huge ‘to do’ list for the spider and continue to get great feedback from users. However, if there is anything you would love to see, please leave your feedback below. If it’s not already on the list, we will add anything with real value.
Thanks again to everyone for all their support of the tool!
What’s RAM got to do with it? I can scrape millions with under 100MB of RAM usage. It’s called HDD caching – if you don’t use it, you will not scale properly. Limiting usage/depth/number of pages by RAM is very wrong.
Export XMLs with what you store in memory and recombine them when needed – it will be very fast. Or better, hook into a MySQL instance and pour the data into it. In desperate times just use an Access MDB database or SQLite, as it’s standalone… but rather slow IMHO.
Regards.
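As an aside, a minimal sketch of the disk-backed approach the comment above describes might look something like this in Python, streaming discovered links into SQLite as they are found rather than holding them all in memory. The schema and names are purely illustrative and say nothing about how the SEO Spider works internally.

```python
import sqlite3

# Illustrative only: spill crawl data to disk instead of keeping it in RAM.
conn = sqlite3.connect("crawl.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS links ("
    "  source TEXT, destination TEXT, anchor TEXT, status_code INTEGER)"
)

def record_link(source, destination, anchor, status_code):
    """Insert one discovered link; the working set stays on disk, not in memory."""
    conn.execute(
        "INSERT INTO links VALUES (?, ?, ?, ?)",
        (source, destination, anchor, status_code),
    )

# Example usage while crawling:
record_link("https://example.com/", "https://example.com/missing", "old page", 404)
conn.commit()

# Query the store later instead of scanning an in-memory list:
for row in conn.execute("SELECT source, anchor FROM links WHERE status_code = 404"):
    print(row)
conn.close()
```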
Hey 5ubliminal,
Rather than post our development response, I’ll just say thanks for your opinion :-)
Cheers.
That’s good to hear… will try out the new version soon.
Wow, perfect! The save function was really missing.
Hi guys, just wanted to say a big thank you for a great product. I’ve just been playing with the ‘free’ version, working on a site compiling a list of 404s and manually exporting these into Excel… now you have saved me a load of time… and now I’m going to give you guys some money and purchase the licence :)
Excellent job guys. I’m a happy bunny.
Guys – some websites can’t handle how fast your awesome crawler hits their sites. I need an option to delay page crawls.
Can you add an option so that I can input a number of milliseconds to delay each page crawled?
1000 would be 1 second, 2000 would be 2 seconds, and 100 would be 1/10th of a second.
Thanks!
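For anyone who wants to approximate this in their own scripts in the meantime, here is a small Python sketch of a per-request delay expressed in milliseconds, as suggested above. The URL list and delay value are placeholders.

```python
import time
import urllib.request

# Hypothetical per-request crawl delay in milliseconds:
# 1000 = 1 second, 2000 = 2 seconds, 100 = 1/10th of a second.
CRAWL_DELAY_MS = 1000

urls = ["https://example.com/", "https://example.com/about"]  # placeholder list

for url in urls:
    with urllib.request.urlopen(url) as response:
        print(url, response.status)
    time.sleep(CRAWL_DELAY_MS / 1000.0)  # convert milliseconds to seconds
```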
Hi Kevin,
We should have this feature available in our next version of the SEO spider due to be released extremely soon.
Thanks! :-)
Is this a joke?
I’m using version 1.7
I’m very disappointed – it’s too slow!!!
An average of 5 URLs/s.
Xenu is freeware and is far faster… up to 20 times faster!!!
I hope things will evolve quickly; for the time being it’s a waste of money and time!
Hi Dan,
We have speed control coming in our next version, so you will be able to set it to go as fast as you like, ‘even faster than Xenu’s max setting’!
But as you’ll find out, it’s actually the server response time that dictates crawl speed more than the rate of requests.
So your 5 URI/s is down to your site and server ;-) Crawl the BBC and notice the difference.
Thanks!
I am very happy that ScreamingFrog is constantly updating the software, and they are real improvements for me. It would be exciting if a UX designer could make a few adjustments.