How To Use N-Grams
Introduction
An n-gram is a sequence of consecutive words (or numbers and symbols) found in text. N-grams let you see the words used on a page, their frequency and their patterns, and they are used in various NLP tasks.
Using n-grams has various limitations around topic modelling and semantic relevance, but they are a useful tool in SEO for simple text analysis, on-page alignment and even internal linking.
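To make the definition concrete, here is a minimal Python sketch of how n-grams can be generated and counted from a piece of text. It is an illustration only and not how the SEO Spider works internally: the `tokenize` and `ngrams` helpers, the tokenisation rule and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word/number tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up sample text for illustration.
text = "Find broken links fast. Broken links hurt users, so fix broken links."
tokens = tokenize(text)

# Count 2-grams (bigrams) and show the most frequent one.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))  # [('broken links', 3)]
```

Note that this simple version ignores sentence boundaries, so a phrase such as 'fast broken' is also counted as a 2-gram.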
This tutorial shows how to use the Screaming Frog SEO Spider to analyse phrase frequency using the inbuilt n-grams analysis across pages of a crawl, or aggregated across groups of pages of a site.
Please note – An SEO Spider licence is required to perform the n-grams analysis below.
1) Enable ‘Store HTML / Store Rendered HTML’
First, enable both ‘Store HTML’ and ‘Store Rendered HTML’ within ‘Config > Spider > Extraction’.
This will mean the HTML (and rendered HTML if in JavaScript rendering mode) will be stored during the crawl, and used to enable the n-grams analysis.
2) Crawl the Website
Input the website address into the URL bar at the top, and click ‘Start’.
The SEO Spider will then start crawling the website to perform n-grams analysis.
3) View the N-grams Tab
N-grams can be viewed by clicking on a URL or group of URLs in the top window pane Internal tab, and then the lower N-grams tab.
By default ‘1-gram’ will be displayed, showing the single words found for the URL(s) selected. However, this can be adjusted up to ‘6-gram’ for six-word phrases via the filter.
The example below shows ‘2-gram’ selected with the word cloud visualisation on the right-hand side for our how to find broken links tutorial.
N-grams are collected not just from the page itself, but also from links to the page.
The following columns are displayed in the N-grams tab –
- Body Text – The number of times the n-gram is within the body text of the page based upon the content area.
- Density – The number of times the n-gram appears in the body text, as a percentage of the total of all body text n-grams.
- Body Text (Unlinked) – The number of times the n-gram is unlinked (not contained within an <a href> link) within the body text of the page based upon the content area.
- Headings – The number of times the n-gram is within heading elements on the page.
- Title – The number of times the n-gram is within the title element of the page.
- Inlinks Anchor Text – The number of times the n-gram is used within anchor text to the page.
- Inlinks Alt Text – The number of times the n-gram is used within image alt text that is linked to the page.
- Total – The total number of times the n-gram is used across body text, title and inlinks (anchor text and alt text).
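As a rough illustration of the ‘Density’ column, the sketch below divides each n-gram's body text count by the total of all body text n-gram counts. This is one reading of the column description above; the exact calculation the SEO Spider uses (rounding, for example) is not specified here, and the `density` helper and the counts are made up for the example.

```python
from collections import Counter

def density(counts):
    """Map each n-gram to its share (in %) of all n-gram occurrences counted."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * count / total for gram, count in counts.items()}

# Made-up body text counts for three 2-grams.
body_bigrams = Counter({"broken links": 3, "find broken": 1, "links fast": 1})
d = density(body_bigrams)
print(round(d["broken links"], 1))  # 60.0
```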
Multiple Pages N-gram Analysis
Alongside viewing n-grams on a single page basis, you’re able to highlight multiple URLs to analyse n-grams across a wider set of pages.
This can be useful when analysing similar topics, or groups of pages.
If multiple pages are selected, then the right-hand ‘URLs’ tab can be selected to show which pages n-grams appear on.
Multiple n-grams can also be selected. In the example below, 26,045 2-grams from across the SEO Spider tutorials have been selected.
The right-hand side has updated to show 51,776 rows of URLs with 2-grams which can be used for analysis.
4) Exporting N-grams
Data can be exported in bulk using the ‘Export’ buttons at the top of the N-grams tab.
N-grams can be exported for a single page or multiple pages using the ‘Export’ button on the left-hand side, which does not include the URLs.
Alternatively, use the ‘Export’ button on the right-hand side URLs tab to export URLs and selected n-grams in bulk.
In this example, the 51,776 rows of URLs with 2-grams will be exported.
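Once exported, the data can be processed outside the SEO Spider. The Python sketch below groups rows of a URLs export by n-gram to see which pages each phrase appears on. The column names (‘Address’, ‘N-gram’, ‘Body Text’) and the sample rows are assumptions for illustration, so check the headers in your own export file and adjust accordingly.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported file; the column names and rows are made up for illustration.
SAMPLE_EXPORT = """Address,N-gram,Body Text
https://example.com/a,broken links,12
https://example.com/a,duplicate content,2
https://example.com/b,duplicate content,7
"""

def urls_per_ngram(csv_file):
    """Group rows by n-gram, returning {n-gram: [(url, body_text_count), ...]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["N-gram"]].append((row["Address"], int(row["Body Text"])))
    return dict(grouped)

pages = urls_per_ngram(io.StringIO(SAMPLE_EXPORT))
for gram, rows in pages.items():
    # Print the n-gram, the number of pages it appears on and its total count.
    print(gram, len(rows), sum(count for _, count in rows))
```

For a real export, `io.StringIO(SAMPLE_EXPORT)` would be replaced with an opened file, for example `open("export.csv", newline="", encoding="utf-8")`, where the file name is again hypothetical.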
5) Using N-grams for Text Analysis & On-page Alignment
N-grams help you understand the words on a page and, to a degree, the context and relevance of a page for target key phrases and topics.
While Google has shifted way beyond simple keyword matching, having the words you want to rank on the page typically still helps in SEO.
Keyword density is a myth, so there isn’t a magic density level you should aim for to improve rankings. However, the occurrence, frequency and uniqueness of words are useful to analyse as an indication of basic relevancy (outside of semantic models).
In the example above for our how to find broken links tutorial, it makes sense that ‘broken links’ is the most frequently used bigram. It’s simple to see that the page is optimised with it in the body, title, heading and in inlinks anchor text to the page.
N-grams data can be used and combined with other data sources in various ways for insight, such as:
- Page alignment – Analysing relevance of a page to target keywords. The separate columns for body text, headings, titles, and inlinks help understand overall relevance, not just on-page.
- Keyword gaps – Matching n-grams against other keyword research data, such as Search Console data to see if they exist on the page, or if there’s an opportunity to include them.
- Cannibalisation – Compare the n-grams of pages that are cannibalising in the SERPs to understand their similarities.
- Competitive analysis – Crawl equivalent competitor pages alongside your own, and compare the differences in n-grams to identify opportunities. Perform n-gram IDF analysis across the corpus to understand what might be important.
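As an illustration of the IDF idea mentioned for competitive analysis, the Python sketch below scores each n-gram by inverse document frequency across a small set of pages. It uses the plain, unsmoothed form log(N / df), which is only one of several common variants. The `idf` helper, URLs and n-grams are all made up for the example, and the calculation is performed outside the SEO Spider on data you have collected.

```python
import math

def idf(pages):
    """pages: {url: set of n-grams}. Returns {n-gram: log(N / document frequency)}."""
    n_pages = len(pages)
    doc_freq = {}
    for grams in pages.values():
        for gram in grams:
            doc_freq[gram] = doc_freq.get(gram, 0) + 1
    return {gram: math.log(n_pages / df) for gram, df in doc_freq.items()}

# Hypothetical corpus: one of our pages and two equivalent competitor pages.
corpus = {
    "https://example.com/ours": {"broken links", "seo spider"},
    "https://example.org/theirs-1": {"broken links", "link checker"},
    "https://example.net/theirs-2": {"broken links", "link checker", "404 errors"},
}
scores = idf(corpus)

# N-grams used by competitors but missing from our page, rarest first.
ours = corpus["https://example.com/ours"]
gaps = sorted((g for g in scores if g not in ours), key=lambda g: -scores[g])
print(gaps)  # ['404 errors', 'link checker']
```

An n-gram found on every page (‘broken links’ here) scores zero, while phrases that appear on fewer pages score higher.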
They can also be used for uncovering internal linking opportunities, which brings us onto the next section.
6) Using N-grams to Identify Internal Linking Opportunities
N-grams provide an alternative to using custom search to find unlinked keywords for internal linking.
You can highlight a section of a website and filter for relevant keywords in ‘Body Text (Unlinked)’ in the URLs tab to identify link opportunities.
In the example above, our tutorial pages have been highlighted to search for the 2-gram ‘duplicate content’.
The right-hand side filter has been set to ‘Body Text (Unlinked)’ and the column of the same name shows the number of instances unlinked on different tutorial pages that we might want to link to our how to check for duplicate content guide.
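To show what ‘unlinked’ means in practice, here is a minimal Python sketch that counts how often a phrase appears in HTML outside of `<a href>` links. It is an approximation for illustration, not the SEO Spider's implementation: it does not detect the content area, and the `UnlinkedText` class, `count_unlinked` helper and sample HTML are assumptions made for this example.

```python
import re
from html.parser import HTMLParser

class UnlinkedText(HTMLParser):
    """Collect text that is not inside an <a> element with an href attribute."""

    def __init__(self):
        super().__init__()
        self.a_stack = []  # one entry per open <a>: True if it has an href
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            has_href = any(name == "href" for name, _ in attrs)
            self.a_stack.append(has_href)
            if has_href:
                # Separator so words either side of a link are not joined into a phrase.
                self.chunks.append(" | ")

    def handle_endtag(self, tag):
        if tag == "a" and self.a_stack:
            self.a_stack.pop()

    def handle_data(self, data):
        if not any(self.a_stack):
            self.chunks.append(data)

def count_unlinked(html, phrase):
    """Count whole-phrase, case-insensitive matches in the unlinked text."""
    parser = UnlinkedText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).lower().split())
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text))

html = (
    "<p>Fix duplicate content early. See our "
    '<a href="/duplicate-content/">duplicate content</a> guide. '
    "Duplicate content wastes crawl budget.</p>"
)
print(count_unlinked(html, "duplicate content"))  # 2
```

The linked instance is ignored, leaving two unlinked mentions that could be candidates for an internal link.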
The search function can be used to find variants of the exact phrase as well. For example, searching for ‘duplicate’ shows variants such as ‘duplicate pages’ and ‘near duplicates’, that we might want to consider for internal linking as well.
If you have a large list of keywords to search for internal linking opportunities, the search function can be used to search for them at once using regex.
Matches Regex can be selected in the search filter and keywords can be separated by pipes. The syntax can be pasted directly into the search filter to trigger, for example:
[N-gram] Matches Regex 'duplicate content|broken link'
‘Broken link’ in the above would also match ‘broken links’. To match exact words only, wrap each keyword in \b, which matches a word boundary. For example:
[N-gram] Matches Regex '\bduplicate content\b|\bbroken link\b'
This matches ‘broken link’ but not ‘broken links’, as \b requires a word boundary at each end of the phrase.
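The difference between the two patterns can be checked with a short Python sketch. It assumes the filter performs a contains-style match (hence `re.search`), and it uses Python's `re` module, so details may differ from the SEO Spider's own regex engine. The `build_pattern` helper and the list of n-grams are made up for the example.

```python
import re

def build_pattern(keywords, whole_words=True):
    """Join keywords with pipes, optionally wrapping each in word-boundary anchors."""
    wrap = r"\b{}\b" if whole_words else "{}"
    return "|".join(wrap.format(re.escape(k)) for k in keywords)

keywords = ["duplicate content", "broken link"]
loose = re.compile(build_pattern(keywords, whole_words=False))
exact = re.compile(build_pattern(keywords))

# Made-up n-grams to test against.
ngrams = ["broken link", "broken links", "duplicate content",
          "near duplicates", "unbroken link"]

loose_hits = [g for g in ngrams if loose.search(g)]
exact_hits = [g for g in ngrams if exact.search(g)]

print(loose_hits)  # ['broken link', 'broken links', 'duplicate content', 'unbroken link']
print(exact_hits)  # ['broken link', 'duplicate content']
```

A helper like this can also be used to turn a long keyword list into a single pipe-separated pattern for the search filter.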
Support
This tutorial should help users analyse phrase frequency using n-gram analysis in the SEO Spider. We hope it acts as a guide on how to use the feature, and also as inspiration for how it could be used in SEO. We hope to hear about other useful ways this feature can be used!
If you experience any issues, please contact us via support and we can help.