How To Crawl A Staging Website
Find out how to crawl a staging or development website, considering robots.txt, authentication, and the SEO Spider configuration.
How To Use List Mode
Upload a list of URLs using 'list mode', and control what other elements are crawled, such as external links, or images for laser-focused crawling.
What Is Link Score?
Find out more about our Link Score algorithm which calculates the relative importance of URLs based on the internal linking of a site.
Resolving Google Analytics / Google Search Console Connection Issues
If you've experienced a 'Failed to Connect To Google Analytics / Search Console' security message, this guide will help you debug and resolve the issue.
How To Crawl Large Websites
Crawl large websites by switching to database storage mode, increasing memory, and configuring the crawl to extract the data you need.
Robots.txt Testing In The SEO Spider
View URLs blocked by robots.txt and the matching disallow lines, and use the custom robots.txt to check and validate a site's robots.txt thoroughly, at scale.
Crawling Password Protected Websites
Crawl websites that require a login, using web forms authentication via our inbuilt Chrome browser.
How do I extract multiple matches of a regex?
If you want to extract all the H1s from the following HTML: <html> <head> <title>2 h1s</title> </head> <body> <h1>h1-1</h1> <h1>h1-2</h1> </body> </html> then you can use: <h1>(.*?)</h1>
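As a standalone illustration of how that pattern captures every heading rather than just the first, here is a minimal Python sketch using the standard re module (the Spider applies such patterns through its own extraction feature; this snippet just reproduces the matching behaviour):

```python
import re

# The sample HTML from the excerpt above, collapsed into one string.
html = ("<html><head><title>2 h1s</title></head>"
        "<body><h1>h1-1</h1><h1>h1-2</h1></body></html>")

# re.findall returns the capture group of every non-overlapping match,
# so the lazy pattern <h1>(.*?)</h1> yields each heading separately.
print(re.findall(r"<h1>(.*?)</h1>", html))  # ['h1-1', 'h1-2']
```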
Why is my regex extracting more than expected?
If you are using a regex like .* that contains a greedy quantifier, you may end up matching more than you want. The solution is to use a lazy quantifier like .*? instead. For example, if you are trying to...
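To see the difference concretely, here is a small Python sketch contrasting the greedy and lazy forms on a hypothetical two-heading snippet:

```python
import re

html = "<h1>h1-1</h1><h1>h1-2</h1>"

# Greedy: .* runs on to the LAST </h1>, swallowing both headings in one match.
print(re.findall(r"<h1>(.*)</h1>", html))   # ['h1-1</h1><h1>h1-2']

# Lazy: .*? stops at the FIRST </h1>, so each heading matches separately.
print(re.findall(r"<h1>(.*?)</h1>", html))  # ['h1-1', 'h1-2']
```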
How does the Spider treat robots.txt?
The SEO Spider is robots.txt compliant. It checks robots.txt in the same way as Google: it will check the robots.txt of the (sub)domain and follow its directives, specifically any for Googlebot, or otherwise those for all user-agents. You are able to adjust the...
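The Spider's own parser is internal, but the precedence described above, where a specific Googlebot group wins over the wildcard group, can be sketched with Python's standard urllib.robotparser module (the domain and rules below are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Googlebot-specific group and a wildcard group.
parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow: /tmp/",
])

# A matching user-agent group takes precedence, so Googlebot only obeys
# its own group; all other agents fall back to the wildcard group.
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/tmp/page"))      # True
print(parser.can_fetch("OtherBot", "https://example.com/tmp/page"))       # False
```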