Why is my regex extracting more than expected?
If your regex uses a greedy quantifier such as .*, it may match more than you want. The solution is to make the quantifier lazy (non-greedy) by adding a ?, as in .*?. For example if you are trying to...
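The difference between greedy and lazy matching is easiest to see on a line of HTML with two matching elements. This is a minimal sketch using Python's `re` module; the HTML fragment is hypothetical, and the SEO Spider's own regex engine is not shown here, although greedy versus lazy `.*` behaves the same way in mainstream regex flavours.

```python
import re

# Hypothetical HTML fragment with two author spans on one line.
html = '<span class="author">Jane</span> and <span class="author">John</span>'

# Greedy: .* grabs as much as possible, so the capture runs from the
# first opening span all the way to the LAST </span> on the line.
greedy = re.search(r'<span class="author">(.*)</span>', html).group(1)

# Lazy: .*? stops at the first </span> that lets the pattern succeed.
lazy = re.search(r'<span class="author">(.*?)</span>', html).group(1)

# With the lazy form, findall returns each author separately.
all_lazy = re.findall(r'<span class="author">(.*?)</span>', html)

print(greedy)    # Jane</span> and <span class="author">John
print(lazy)      # Jane
print(all_lazy)  # ['Jane', 'John']
```

The greedy capture swallows the closing tag, the text in between and the second opening tag, which is exactly the "extracting more than expected" symptom.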
How does the Spider treat robots.txt?
The SEO Spider is robots.txt compliant and checks robots.txt in the same way as Google. It reads the robots.txt of the (sub)domain being crawled and follows any directives aimed specifically at Googlebot, or otherwise those for all user-agents. You are able to adjust the...
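The user-agent selection described above (a Googlebot-specific group wins; the `*` group is the fallback) can be illustrated with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt content and `example.com` URLs are hypothetical, and Python's parser is not the SEO Spider's implementation. In particular, it applies rules in first-match order and does not support path wildcards, whereas Google documents longest-match precedence, so the rules below are chosen not to conflict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for Googlebot, one catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    # can_fetch checks groups that name the agent first and only falls
    # back to the "User-agent: *" group when none of them applies.
    return parser.can_fetch(agent, "https://example.com" + path)


googlebot_blog = allowed("Googlebot", "/blog/")         # no matching Disallow in its group
googlebot_private = allowed("Googlebot", "/private/a")  # blocked by its own group
other_blog = allowed("OtherBot", "/blog/")              # falls back to "Disallow: /"

print(googlebot_blog, googlebot_private, other_blog)  # True False True? see note
```

Because the Googlebot group exists, Googlebot ignores the blanket `Disallow: /` in the `*` group; any other agent falls through to it and is blocked from `/blog/` (the last value is `False`).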
Why isn’t my Include/Exclude function working?
The Include and Exclude functions are case sensitive, so any patterns you enter must match the URL exactly as it appears. Please read both guides for more information. Functions will be applied to URLs that have not yet been discovered by the...
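A quick way to see why a case mismatch breaks an Include or Exclude pattern is to test it against a regex engine. This minimal sketch uses Python's `re` with a hypothetical lowercase pattern and hypothetical URLs; it assumes the pattern is matched against the whole URL (hence the `.*` on both sides), which is an assumption for illustration rather than a description of the SEO Spider's internals.

```python
import re

# Hypothetical exclude pattern written in lowercase.
EXCLUDE = re.compile(r".*/blog/.*")

# Python-only illustration of a case-insensitive variant via re.IGNORECASE.
EXCLUDE_ANY_CASE = re.compile(r".*/blog/.*", re.IGNORECASE)


def is_excluded(pattern: re.Pattern, url: str) -> bool:
    # Regex matching is case sensitive by default, so "/Blog/" does not
    # satisfy a pattern written as "/blog/".
    return pattern.fullmatch(url) is not None


lower_url = "https://example.com/blog/post-1"
mixed_url = "https://example.com/Blog/post-2"

print(is_excluded(EXCLUDE, lower_url))           # True
print(is_excluded(EXCLUDE, mixed_url))           # False: capital B does not match
print(is_excluded(EXCLUDE_ANY_CASE, mixed_url))  # True
```

The second URL slips past the exclude purely because of its capital letter, which is the usual cause when an Exclude appears not to work.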
Web Scraping & Custom Extraction
Scrape any data from the HTML of a page, such as author names, comments, share counts and more, using CSS Path, XPath and regex to enhance a crawl.
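To show what XPath and regex extraction pull out of a page, here is a minimal sketch using only Python's standard library. The page, class names and values are hypothetical. `xml.etree.ElementTree` supports only a limited XPath subset and needs well-formed markup, and the standard library has no CSS selector engine, so CSS Path extraction is not shown; this illustrates the idea rather than how the SEO Spider performs extraction.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML page. Real-world HTML is often not
# well-formed XML and would need a proper HTML parser instead.
PAGE = """<html>
  <body>
    <article>
      <span class="author">Jane Doe</span>
      <div class="comments">12 comments</div>
    </article>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# XPath-style extraction: ElementTree supports [@attr='value'] predicates.
author = root.findtext(".//span[@class='author']")

# Regex extraction on the raw HTML: capture the number before "comments".
match = re.search(r"(\d+) comments", PAGE)
comment_count = int(match.group(1)) if match else None

print(author)         # Jane Doe
print(comment_count)  # 12
```

XPath suits values tied to a specific element, while regex is handy when the data sits in free text or in markup that is awkward to address structurally.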