Search Results for: *
Robots.txt
Robots.txt is a file that sits at the root of a domain, for example: https://www.screamingfrog.co.uk/robots.txt. This provides crawling instructions to bots visiting the site, which they voluntarily follow. In this guide, we’ll explore why you should have a robots.txt,...
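To make the idea concrete, here is a minimal, hypothetical robots.txt (the disallowed path and sitemap URL are illustrative assumptions, not taken from the guide):

    # Rules for all crawlers
    User-agent: *
    # Hypothetical example: keep bots out of a staging area
    Disallow: /staging/
    # Hypothetical sitemap location
    Sitemap: https://www.example.com/sitemap.xml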
How To Crawl A Staging Website
Find out how to crawl a staging or development website, considering robots.txt, authentication, and the SEO Spider configuration.
How To Use List Mode
Upload a list of URLs using 'list mode', and control which other elements are crawled, such as external links or images, for laser-focused crawling.
Page Titles
Writing a good page title is an essential skill for anyone in SEO, as titles help both users and search engines understand the purpose of a page. In this guide we take you through the fundamentals, as well as more...
What Is Link Score?
Find out more about our Link Score algorithm which calculates the relative importance of URLs based on the internal linking of a site.
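The exact Link Score formula isn't published in this snippet; as a rough, hypothetical sketch of the general idea, a PageRank-style iteration over a site's internal link graph might look like the following Python (the function name, damping factor and iteration count are all assumptions, not Screaming Frog's actual algorithm):

    # Hypothetical PageRank-style sketch; NOT the actual Link Score algorithm.
    def link_score(links, damping=0.85, iterations=20):
        """links maps each URL to the list of URLs it links to internally."""
        pages = set(links) | {u for targets in links.values() for u in targets}
        score = {page: 1.0 / len(pages) for page in pages}
        for _ in range(iterations):
            # Each page keeps a small base score, then receives a share
            # of the score of every page that links to it.
            new = {page: (1 - damping) / len(pages) for page in pages}
            for page, targets in links.items():
                if targets:
                    share = damping * score[page] / len(targets)
                    for target in targets:
                        new[target] += share
            score = new
        return score

    # Example: a tiny three-page site where every page links back to the homepage
    site = {
        "/": ["/about", "/contact"],
        "/about": ["/"],
        "/contact": ["/"],
    }
    print(link_score(site))  # the homepage ends up with the highest score

The intuition this illustrates: URLs that receive more internal links, from more important pages, accumulate higher relative importance.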
Resolving Google Analytics / Google Search Console Connection Issues
If you've experienced a 'Failed to Connect To Google Analytics / Search Console' security message, this guide will help you debug and resolve the issue.
How To Crawl Large Websites
Crawl large websites by switching to database storage mode, increasing memory, and configuring the crawl to extract the data you need.
Robots.txt Testing In The SEO Spider
View URLs blocked by robots.txt and the disallow lines that match them, & use the custom robots.txt to check & validate a site's robots.txt thoroughly, and at scale.
Crawling Password Protected Websites
Crawl websites that require a login, using web forms authentication with our inbuilt Chrome browser.
How do I extract multiple matches of a regex?
If you want all the H1s from the following HTML: <html> <head> <title>2 h1s</title> </head> <body> <h1>h1-1</h1> <h1>h1-2</h1> </body> </html> then you can use: <h1>(.*?)</h1>
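As a sketch of how that pattern captures every match, here is the same extraction in Python using the re module (just to demonstrate the multiple-match behaviour; the pattern itself is what matters):

    import re

    html = "<html><head><title>2 h1s</title></head><body><h1>h1-1</h1><h1>h1-2</h1></body></html>"
    # re.findall returns every non-overlapping match of the capture group, in document order
    print(re.findall(r"<h1>(.*?)</h1>", html))  # prints ['h1-1', 'h1-2']

The non-greedy (.*?) is important: a greedy (.*) would match from the first <h1> to the last </h1> as a single result.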