Screaming Frog SEO Spider Update – Version 21.0 (12th November 2024)

We’re delighted to announce Screaming Frog SEO Spider version 21.0, codenamed internally as ‘towbar’.

This update contains new features and improvements based upon user feedback and as ever, a little internal steer.

So, let’s take a look at what’s new.


1) Direct AI API Integration

In our version 20.0 release we introduced the ability to connect to LLMs and query against crawl data via custom JavaScript snippets.

In this update, you’re now able to directly connect to OpenAI, Gemini and Ollama APIs and set up custom prompts with crawl data.

You can configure up to 100 custom AI prompts via ‘Config > API Access > AI’.

Direct AI Integration with OpenAI

You’re able to select the category of model, the AI model used, content type and data to be used for the prompt such as body text, HTML, or a custom extraction, as well as write your custom prompt.

The SEO Spider will auto-control the throttling of each model and data will appear in the new AI tab (and Internal tab, against your usual crawl data).

AI Tab results

In a similar way to custom JS snippets, this allows you to create alt text at scale, understand the language of a page, detect inappropriate content, extract embeddings and more.

The ‘Add from Library’ function includes half a dozen prompts for inspiration, but you can add and customise your own.

OpenAI Add From Library

The benefits of using the direct integration over custom JS snippets are –

  • You can input your API key once for each AI platform, which will be used for all prompts.
  • You don’t need to edit any JavaScript code! You can just select requirements from dropdowns and enter your prompt into the relevant field.
  • JavaScript rendering mode isn’t required, data can be returned through any crawl mode.
  • The APIs are automatically throttled as per their requirements.

This new AI integration should make it even more efficient to create custom prompts when crawling. We hope users will utilise these new AI capabilities responsibly for genuine ‘value-add’ use cases.


2) Accessibility

You can now perform an accessibility audit in the SEO Spider using the open-source AXE accessibility rule set for automated accessibility validation from Deque.

This is what powers the accessibility best practices seen in Lighthouse and PageSpeed Insights. It should allow users to improve their websites to make them more inclusive, user friendly and accessible for people with disabilities.

Accessibility can be enabled via ‘Config > Spider > Extraction’ (under ‘Page Details’) and requires JavaScript rendering to be enabled to populate the new Accessibility tab.

Accessibility Config

The Accessibility tab details the number of accessibility violations at different levels of compliance based on the Web Content Accessibility Guidelines (WCAG) set by the W3C.

Accessibility Tab

An accessibility score for each page can also be collected by connecting to Lighthouse via PageSpeed Insights (‘Config > API Access > PSI’).

WCAG compliance levels build upon each other and start from WCAG 2.0 A to 2.0 AA, then 2.0 AAA before moving onto 2.1 AA and 2.2 AA. To reach the highest level of compliance (2.2 AA), all violations at the previous levels must also be resolved.

The Accessibility tab includes filters by WCAG level, with over 90 rules within them that must be met at a minimum to reach that level of compliance.

Accessibility Tab filters in Overview tab

The right-hand Issues tab groups them by accessibility violation and priority, which is based upon the WCAG ‘impact’ level from Deque’s AXE rules and includes an issue description and further reading link.

Accessibility issues in the right hand Issues tab

The lower Accessibility Details tab includes granular information on each violation, the guidelines, impact and location on each page.

Accessibility Details tab

You can right-click on any of the violations on the right-hand side, to ‘Show Issue in Browser’ or ‘Show Issue In Rendered HTML’.

All the data including the location on the page can be exported via ‘Bulk Export > Accessibility > All Violations’, or the various WCAG levels.

Accessibility Bulk Exports

There’s also an aggregated report under the ‘Reports’ menu.


3) Email Notifications

You can now connect to your email account and send an email on crawl completion to colleagues, clients or yourself to pretend you have lots of friends.

This can be set up via ‘File > Settings > Notifications’ and adding a supported email account.

Email Notifications

You can select ‘Email on Crawl Complete’ to send an email to specific address(es) after every crawl.

Crawl complete emails

So many friends.

Alternatively, you can send emails for specific scheduled crawls upon completion via the new ‘Notifications’ tab in the scheduled crawl task as well.

Email Notifications from scheduled crawls

The email sent confirms crawl completion and provides some top-level data from the crawl.

Email Notification Delivered

We may expand this functionality in the future to include additional data points and data exports.

Please read about notifications in our user guide.


4) Custom Search Bulk Upload

There’s a new ‘Bulk Add’ option in custom search, which allows you to quickly upload lots of custom search filters, instead of inputting them individually.

Bulk Upload Custom Search

If you’re using this feature to find unlinked keywords for internal linking, for example, you can quickly add up to 100 keywords to find on pages using ‘Page Text No Anchors’.

Custom search bulk upload filters

Please see our ‘How to Use Custom Search‘ tutorial for more.


Other Updates

Version 21.0 also includes a number of smaller updates and bug fixes.

  • Additional crawl statistics are now available via the arrows in the bottom right-hand corner of the app. Alongside URLs completed and remaining, you can view elapsed time and estimated time remaining, as well as the crawl start date and time. This data is available via ‘Reports > Crawl Overview’ as well.
  • Custom Extraction has been updated to support not just XPath 1.0, but 2.0, 3.0 and 3.1.
  • Scheduling now has ‘Export’ and ‘Import’ options to help make moving scheduled crawl tasks less painful.
  • The Canonicals tab has two new issues for ‘Contains Fragment URL’ and ‘Invalid Attribute In Annotation’.
  • The Archive Website functionality now supports the WARC format for web archiving. The WARC file can be exported and viewed in popular viewers.
  • You can now open database crawls directly via the CLI using the --load-crawl argument with the database ID for the crawl. The database ID can be copied in the UI by right-clicking in the ‘File > Crawls’ table, or viewed in the CLI using the --list-crawls argument.
  • There are new right-click ‘Show Link In Browser’ and ‘Show Link in HTML’ options in the Inlinks and Outlinks tabs to make it more efficient to find specific links.

That’s everything for version 21.0!

Thanks to everyone for their continued support, feature requests and feedback. Please let us know if you experience any issues with this latest update via our support.

Small Update – Version 21.1 Released 14th November 2024

We have just released a small update to version 21.1 of the SEO Spider. This release is mainly bug fixes and small improvements from the latest major release –

  • Fixed issue with custom database locations not being picked up.
  • Fixed bug in OpenAI Tester.
  • Fixed a couple of crashes, including for users that had ‘auto connect’ selected for the old GA API, which hasn’t been available for some time (and is now removed!).

Screaming Frog Crawling Clinic Returns to brightonSEO San Diego (23rd October 2024)

It’s that time of year again! After a successful visit to San Diego last year, the Screaming Frog Crawling Clinic are packing up their gear and heading across the pond for brightonSEO San Diego Edition once again. This time it’s hosted from the 18th to the 20th of November.

You’ll find us at stand 19, right at the heart of the action as you enter the venue. Be sure to pop by to chat all things SEO Spider (and Log File Analysis if you really want to nerd out), as well as pick up some cool swag.

We’ll happily run through some of our latest releases, diagnose issues, listen to your feature requests or talk to you about our award-winning agency services.


Top Talks

We’ve got our eyes on some must-see talks at brightonSEO, so if you’re wondering which sessions to prioritise, here are our top picks:

Felipe Bazon: Topical Authority in the Age of Entity SEO

Wednesday 20th, 9:15 am

If you’re looking to gain an edge on topical authority and entity SEO, don’t miss this session with Felipe Bazon. We had the pleasure of catching him at a similar event in Eindhoven earlier this year, and his insights were truly eye-opening. It’s bound to be one of the highlights of the week!

Ross Hudgens: Data-Driven Lessons from 12+ Years in Content-Led SEO

Tuesday 19th, 9:15 am

For any content-focused SEOs, Ross Hudgens’ talk is one for your calendar. As long-time fans of Ross’ work with Siege Media, we’ve been particularly inspired by their approach to keyword opposition to benefit (KOB) analysis. In fact, we applied this method to our own work, with great results — check out the full rundown of the KOB strategy here. If you’re into content marketing and SEO, this session is a must.

Pre-Event Fun: The Boardwalk Bash

Before the main event kicks off, make sure you head to the brightonSEO Boardwalk Bash on the evening of the 18th for some networking, free beers, and beachside vibes. We’re thrilled to be sponsoring this event, and you’ll find our signature Screaming Frog beer mats scattered around – so grab some beers and mats on us! Find out more about the event here.

Whether it’s your first brightonSEO (UK or USA) or you’re a regular attendee, we’re excited to see you there. Swing by Stand 19, grab some merch, and chat with the team about how we can help you level up your SEO game.

See you in San Diego!

Create Custom Heatmap Audits With the SEO Spider (21st October 2024)

Ten years ago, I was searching for a tool that would help me determine what was wrong with a website I was working on.

While scrolling through posts on a forum, I noticed that people had been praising a tool that had a catchy name, as opposed to all the other “SEO-something-something” tools. So I gave it a shot.

This article is a guest contribution from Miloš Gizdovski, Operations Manager at Lexia.

It’s October 2024 now, I’m still using this tool, and the name is quite familiar in the SEO community — Screaming Frog SEO Spider Tool.

Looking back, I can’t imagine doing any kind of technical SEO audit without it. None of my colleagues at Lexia marketing agency can either.

We’re happy when that “New version available” message pops up as it feels like a movie trailer we’ve been looking forward to for months has finally been released.

A few years ago, one of our new clients had hundreds of blog posts on their website. After the usual procedure and initial audits, I wanted to create something that would help me determine which of those posts were actually worth focusing on.

I saw a map of users’ scores for each episode of Game of Thrones, so I thought it would be cool to use something similar for the blog posts. This would show each month’s data in terms of organic users, sessions, bounce rates, conversions, clicks, impressions, and average positions.

After several tries, I managed to create a report that could be applied to any website.

These reports have a name – Lexia Heatmaps, or just Heatmap reports if you like.

Lexia Heatmaps show a trend of specific parameters over a period of several months or even a year. However, instead of just one page, the reports show the trend for all pages at the same time. This way, you end up with clean-looking reports that can reveal plenty of opportunities and threats.

The following sections of this article will describe how to create a heatmap report for any group of similar pages.

The blog section will be used as an example, but this can be applied to, for example, products or collections of products as well.

A quick note — it will require some familiarity with Excel functions to create the heatmaps, as they are a bit more advanced than the SEO reports that you can export directly from Screaming Frog SEO Spider. Don’t worry, I will show you how to do it step by step!


First Step – Collect the Blog Posts

Obviously, the first step would be to export the list of blog posts in order to create the heatmap.

There are several ways to do this:

  • Manually collecting the URLs
  • Creating a custom extraction in the SEO Spider tool
  • Exporting the links from the sitemap

The first option is the most time consuming. Not a problem for a website with 10 blog posts, but try doing this for a site with over a hundred.

The second option is using the SEO Spider tool. You can create a custom extraction by picking the specific elements, for example:

  • /blog/ path in the URL
  • Author’s section
  • Publishing date

However, I find that the third option, which is using the sitemap, is the most suitable one.

If you’re lucky, there will be a “Post” sitemap, where all the blog posts are hosted. This comes in handy in cases where the blog posts don’t have /blog/ in the URL.


Second Step – Prepare the SEO Spider Tool

Now that you have a list, you can proceed with gathering the data. There are two things you need to do before running the crawl.

The first one is adjusting the crawl mode, and the second one is setting up the API.

Crawl options can be found in the Mode settings, where you need to select List.

Look for the API under the Configuration options. For the purpose of this article, I will choose Google Analytics 4, which is one of the several options available for selection.

After the API Access window opens, sign in with a Google account that has access to Analytics.

The main tab of the API Access will now show this:

Here is where you can choose the account — pick the one you need and be sure to select the right property and data stream. In my case, the website has only one property and data stream, so I will keep it as All Data Streams.

The next step is setting up the Date Range, which is the tab next to the Account Information:

We have to run a new scan for each month, but remember — this only makes sense if the month is over. So, I’ll start with September 2024, as I have all the data for September ready.

The next tab is Metrics.

Since I already have the Sessions included by default, I will leave everything as is.

However, if I wanted to create a heatmap of the Average session duration, I would need to check this parameter in the Session list, and it would be included in the results.

Lastly, there is one more tab that we need to adjust – Filters.

Since you want to check the organic performance of the blog posts, here is what you need to select:

I didn’t include Organic Social because, for this experiment, I’m only interested in users that came from the organic channels.

If you want to check the users coming organically from social media, or even the Direct traffic users, you can do it as well! Just select the channel and your heatmap will show the trend of direct traffic MoM or YoY.


Third Step – The Results

We are now ready to start the crawling process and get organic sessions data for the month of September.

I already mentioned that you have several options for uploading the list of URLs using the List mode:

In my case, I just pasted the URL of the sitemap.

This is the best option for situations where a lot of blog posts have been published since the last time you updated the heatmap, and here is why:

Whenever you publish a blog post, it will end up in the sitemap so that search engine bots can discover it. If you add the sitemap URL to the SEO Spider tool, it will read all the URLs.

If you simply paste the previous month’s list, you will miss the opportunity to track the performance of the new blog posts that have been published since. Most of these new blog posts will have just a few sessions but, after all, they should be tracked from the beginning.

After the crawl is over, head to the Analytics section and check the All reports:

Slide the results to the right and you will see the data from GA4. GA4 Sessions is the column we are looking for:

All the numbers in that column are the GA4 organic sessions for each page individually.

The process of gathering September data is done, so now we have to add it to the heatmap list.


Fourth Step – Data Entry

We have two scenarios here:

  • You are building the heatmap from scratch.
  • You already have the heatmap done, and you just need to add this month’s data.

Both of these cases follow the same process, but of course, you’ll need more time to create everything from scratch.

Scenario #1 – Creating a New Heatmap Report From the Start

In our example, the reports are created in Google Spreadsheets. The blank report looks like this:

There are two main sections here.

On the left, there is the URL map section, where the list of all blog posts should be added.
On the right, you can see the months and Organic Sessions 2024 in the header. This is where the results from the SEO Spider tool will go.

Since we started with an empty document, I will just copy and paste the results from the SEO Spider tool:

This process should be repeated for all months, meaning that you will need to adjust the Date Range and include only August.

In order to do this, you first need to Clear the crawl and then go to the API section and adjust the Date Range:

The next part is very important — the new crawl will re-order the URLs, so don’t simply copy and paste the GA4 Sessions results.

Go to the main Export option and export each monthly crawl instead:

The order of the original list will stay the same in the exported file (.xlsx), so you just have to get to the GA4 Sessions column, which was the BQ column for me.

Once you have everything sorted and all months added to the reports, it’s time for the last step – adding the colors.

In Google Spreadsheets, you can easily do this by selecting the range of cells and applying the Conditional formatting. The process is almost the same as in Excel.

Go to the Conditional formatting option:

On the right side, you’ll see a new window with various options. It will be Single color by default.

Select the range by clicking on the small windows icon:

I will select everything between columns C and N. Now we have to format the rules.

The first rule will be that if the cell is equal to 0, the cell color should be light red:

Now, while you’re still in the editor, click on the + Add another rule, just below the Cancel / Done button. It will open another formatting option within the same range.

You now have to apply another rule, but think about this one. Some websites will have a lot of traffic, so adding values between 1-10 would be a waste of time (and color).

In such cases, even values between 1 and 100 could represent the low-traffic blog posts.

In this example, I will use lower values to create the heatmap report:

After all the “between” rules are defined, the last one could simply be “greater than”:

The heatmap is now finalized.

A few tips to keep in mind while creating the format rules:

  1. Tip #1 Decide which color should show zero traffic and which one should show all positive values. I pick light red for zeros and variants of green for positive numbers.
  2. Tip #2 Define the “between” values, but don’t forget about the end values as they shouldn’t overlap. For instance, if you have a rule from 1-10, the next one should start with 11.
  3. Tip #3 Use gradients of the same color. This will help you determine the trend more easily.
  4. Tip #4 The last color variants will be darker, so it would be good to change the color of the text from dark to white. This will help you see the values properly.
  5. Tip #5 You don’t have a lot of color options by default. Choose the between values wisely.

Scenario #2 – Updating an Existing Heatmap Report

Let’s imagine you have to add the previous month’s values to the existing list. However, you’ve published some articles since the last Heatmap report update, so you have to use the Sitemap upload option.

The order of the URLs will not match the one you already have in the Heatmap report.

There are several options to match the URLs, depending on how many new articles you have published.

Option 1

If you haven’t published many new articles, you can just manually add them to the Heatmap report list.

Copy your updated Heatmap list to an empty Excel file and save it. After that, set up the SEO Spider tool’s API and change the mode to List.

Instead of choosing the Download XML Sitemap option, choose From a file. Upload your Excel file, run the crawl, and hit the main Export button.

Find the GA4 Sessions option and copy data to the Heatmap report.

Option 2

If you have published a lot of new blog posts, it will take some time to manually add them to the list.

You can use Excel to identify the new articles instead.

Set up the API for the new month and run the usual crawl from the sitemap (mode List, upload option Download XML Sitemap).

Then, paste the URL list to one column and GA4 results to the other column of an Excel file, let’s say column A and B.

In the fourth column (D) just paste the list of the URLs from your Heatmap report. I like to keep one column empty (column C), just to have better visualization when the real URLs are there.

It will look like this:

The columns will have different URL orders — this will happen when you upload the list from the sitemap and your Heatmap report.

These two lists will not have the same number of rows because of the newly published articles. The Sitemap column will have more rows in this case.

So, we need to match two URL lists that have a different order and assign GA4 Sessions values to the Heatmap list using a VLOOKUP function.

Go to column E, add a new header (Heatmap GA4 results) in the first row and create this formula:

=VLOOKUP(D:D,A:B,2,FALSE)

meaning:

For each URL in column D, find the matching URL in column A and return the corresponding value from column B.

Apply the function to all cells.

Here is the result: you can see how the GA4 Sessions value for URL 2 (Sitemap list) is now present in the Heatmap list as well:

The new blog posts will be present somewhere in column A, but we want to add them to column D as well.

We can easily find them if we check the duplicate values of the columns A and D, and then simply look for those that were not marked as duplicates.

First, select these two columns and search for duplicates using the Conditional Formatting > Highlight Cells Rules > Duplicate Values option:

The duplicates in both columns will have red cells, while some of the cells in column A will remain transparent. These are your new blog posts, still not included in the Heatmap report.

The last step is to collect them all, which can be done by applying a filter:

  1. Select column A
  2. Add the Filter (upper right corner, Sort & Filter option)
  3. Filter by Color
  4. No Fill
  5. Expand the selection

Here are the URLs that were not found in column D:

You can now copy them to column D and run an additional Duplicate check, if you want to make sure that all URLs are present in the Heatmap column (D).

Your heatmap list is now ready, so just add the values from column E to your Google spreadsheet, and don’t forget to expand the spreadsheet’s list of URLs by adding the new blog posts from Excel’s column D.

The process is repeated each month. Don’t worry, you will become quicker over time, so it will probably take you around 20-30 minutes to add GA4 values for the new month after you get familiar with the process.
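
If you’re comfortable with Python, the VLOOKUP and duplicate-check steps above can also be done with pandas. This is just a rough sketch of the same idea rather than part of the Heatmap process itself; the file names are placeholders, and it assumes your crawl export has “URL” and “GA4 Sessions” columns and your existing heatmap list has a “URL” column.

import pandas as pd

crawl = pd.read_csv("crawl_export.csv")      # this month's export: URL, GA4 Sessions
heatmap = pd.read_csv("heatmap_urls.csv")    # your existing heatmap URL list: URL

# Equivalent of the VLOOKUP: keep the heatmap's order and pull in this month's sessions.
merged = heatmap.merge(crawl[["URL", "GA4 Sessions"]], on="URL", how="left")

# Equivalent of the duplicate check: new blog posts are URLs in the crawl
# that aren't in the heatmap list yet.
new_posts = crawl[~crawl["URL"].isin(heatmap["URL"])]

merged.to_csv("heatmap_new_month.csv", index=False)
new_posts.to_csv("new_blog_posts.csv", index=False)

You would then paste the merged column into the report and append the new posts, exactly as described above.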


How to Use a Lexia Heatmap Report?

The heatmap reports are not a new thing in marketing. A lot of us have used various tools to determine where people click on our website, so we could optimize the pages for more conversions.

This heatmap is a bit different, because it’s based on a larger group of pages, values, and parameters.

It shows a trend of events that helps us determine winning and losing pages, for instance. If we see that certain blog posts are losing traffic over time, we can ask ourselves – why?

Maybe they need more internal links, a content rewrite, updated images and videos, meta tags update, etc.

On the other hand, we see that some pages are doing great. How can we use them? Well, they can become a source of internal links to other pages that need a “push”. Or, we can promote those pages on social media and newsletters, and get even more positive results.

How about products?

We can use the Lexia Heatmap report there as well. If there are products that bring traffic, but no sales, maybe we can add a discount there. How about those that have low traffic, but sales are great? Include them into quality blog posts, prepare newsletters, and promote them on social media.

Combining several heatmap reports will give you an even better perspective. If you include GSC and Ahrefs, for example, you can track clicks, impressions, positions, backlinks, and more.

Options are endless.

What do you think about Lexia Heatmaps? Let me know on LinkedIn, or contact me via our Lexia website.

The brightonSEO Crawling Clinic October ’24 (27th September 2024)

The biannual brightonSEO events have been marked in our calendar for well over 12 years now, and next week’s is no exception. As always, you can find us in our usual spot (stand 34, right hand side of the exhibition hall as you walk in):

Come and meet the team and discuss any issues you’re experiencing, our exciting version 20 features, things you’d like to see added to the SEO Spider, and more. We’re also offering a full 2-week trial licence, and the team are more than happy to give you a primer on how best to use it.

If you’re after agency services, we’re also one of the most decorated agencies in the UK, and one of our team would happily talk you through our award-winning offering.


GreenSEO Meet-up

Heading down early? Our SEO & Data Manager, Aaron, is speaking at the GreenSEO Meet-Up on Wednesday 2nd October, which we’ve also sponsored. If you’re interested in how SEO practices can contribute to reducing the environmental impact of websites, you’ll definitely want to attend.


Merch

Lastly, we’ll be dishing out our highly-coveted merch, including new beanies in a range of colours, ready for the coming winter months…!

We look forward to seeing you all next week!

Using the Screaming Frog SEO Spider and OpenAI Embeddings to Map Related Pages at Scale (23rd September 2024)

Since Screaming Frog SEO Spider version 20.0 was released, SEOs can connect Screaming Frog and OpenAI for several use cases, including extracting embeddings from URLs.

Using embeddings is a powerful way to map URLs at scale at a high speed and low cost. In this blog post, we’ll explain step by step what they are and how to map them using Screaming Frog, ChatGPT (OpenAI API) and Google Colab. This post is a more complete version of my original post gathering more use cases and feedback from SEOs who tried it.

After your crawl, all you need to do is upload a sheet and you’ll receive another one back, listing each source URL and its related pages. It’s that easy!

This article is a guest contribution from Gus Pelogia, Senior SEO Product Manager at Indeed.


Use Cases

Before we dive into the how, let’s explain the why. Mapping pages at scale has several use cases, such as:

  • Related pages, if you have a section on your website where you list related articles or suggested reads on the same topic
  • Internal linking beyond matching anchor text, as your links will have better context because the page topics are related
  • Page tagging or clustering for cases where you want to create link clusters or simply understand performance per topic, not per single page
  • Keyword relevance, such as described on the iPullRank blog, where they explain a method to find the ideal page to rank for a keyword based on keyword and page content

What Are Embeddings?

Let’s get it straight from the horse’s mouth. According to Google on their Machine Learning (ML) crash course:

Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.

In my own SEO words: embeddings are unique numbers attributed to words on a page.

If this is still not clear, don’t get caught up on the concept. You can still find similar pages without knowing the theory.


What Is Cosine Similarity?

So far, you have thousands of embeddings mapped. Each URL has hundreds of these large numbers separated by a comma. The next step is to understand cosine similarity. As this iPullRank article describes cosine similarity: “The measure of relevance is the function of distance between embeddings”.

In my own SEO words: with embeddings, you transformed pages into numbers. With cosine similarity, you’re finding how topically close these numbers/words/pages are. Using the Google Colab script (more on it later) you can choose how many similar pages you want to put next to each other.

You’re matching the whole page content, not just the title or a small section, so the proximity is a lot more accurate.
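
If it helps to see the maths, cosine similarity is simply the dot product of two embedding vectors divided by the product of their lengths. Here is a minimal Python sketch using numpy, with made-up three-dimensional vectors purely for illustration (real OpenAI embeddings have around 1,500 dimensions):

import numpy as np

def cosine_similarity(a, b):
    # 1 means the vectors point the same way (very similar pages), 0 means unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

page_a = np.array([0.12, -0.33, 0.45])  # toy embeddings for illustration only
page_b = np.array([0.10, -0.30, 0.50])
print(cosine_similarity(page_a, page_b))  # close to 1, so these pages would count as related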


Using Screaming Frog + OpenAI to Extract Embeddings

Here’s where things start getting more hands-on. First of all, you need to get an OpenAI API key and add some credit to it. I’ve extracted embeddings from 50,000 URLs for less than $5 USD, so it’s not expensive at all.

Open Screaming Frog and turn JavaScript rendering on. From the menu, go to Configuration > Crawl Config > Rendering > JavaScript.

Then, head to Configuration > Custom > Custom JavaScript:

Lastly, select Add from Library > (ChatGPT) Extract embeddings […] > Click on “JS” to open the code and add your OpenAI key.

Now you can run the crawl as usual and embeddings will be collected. If you want to save a bit of time, untick everything on Configuration > Crawl and Extraction since you won’t look at internal links, page titles or other content or technical aspects of a website.
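
For context, here is a rough Python equivalent of what that library snippet does, assuming the standard OpenAI embeddings endpoint. It isn’t the snippet’s actual code, and the model name is an assumption for this sketch:

from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_KEY")  # the same key you paste into the JS snippet

def get_embedding(page_text):
    # Send the page's text to the embeddings endpoint and return the vector of numbers.
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model choice for this sketch
        input=page_text,
    )
    return response.data[0].embedding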


Using LLMs to Create a Python Script

After having your crawl done, it’s time to use ChatGPT again to create the code for your tool. Ask something along the lines of: “Give me a Python code that allows me to map [5] related pages using cosine similarity. I’ll upload a spreadsheet with URLs + Embeddings on this tool. The code will be placed on Google Colab”.

You can try it yourself or use my existing Related Pages Script to upload your sheet directly, reverse engineer the prompt or make improvements. The tool will ask you to upload your csv file (the export from Custom JavaScript created by Screaming Frog). The sheet should have two headers:

  • URL
  • Embeddings

Once it processes the data, it’ll automatically download another csv with Page Source and Related Pages columns.
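
If you would like to see the general shape of such a script before generating your own, here is a minimal sketch of the idea. It assumes a csv with the two headers above (URL and Embeddings, where each embedding is stored as a comma-separated string of numbers) and maps the 5 most similar pages for each URL; it isn’t the exact Related Pages Script linked above, just an illustration.

import numpy as np
import pandas as pd

df = pd.read_csv("embeddings_export.csv")  # columns: URL, Embeddings

# Turn each comma-separated embedding string into a matrix row, then normalise the rows.
vectors = np.array([[float(x) for x in str(e).split(",")] for e in df["Embeddings"]])
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# With normalised rows, cosine similarity between every pair of pages is one matrix product.
similarity = vectors @ vectors.T
np.fill_diagonal(similarity, -1)  # stop each page matching itself

top_n = 5
related = [
    ", ".join(df["URL"].iloc[np.argsort(row)[::-1][:top_n]])
    for row in similarity
]

pd.DataFrame({"Page Source": df["URL"], "Related Pages": related}).to_csv(
    "related_pages.csv", index=False
)

Note that rows without a valid embedding will break the parsing step, which ties in with the common issues listed below.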

As with anything AI related, you’ll still want to manually review everything before you make any drastic changes.


Common Issues

While this is an easy-to-use tool, some problems might come up. Here are the ones I’ve seen so far:

  • Rename the headers in your Screaming Frog export to “URL” and “Embeddings”
  • CSV file has URLs without embeddings, such as crawled images or 404 pages, which don’t generate embeddings. Make sure every row has a valid URL and a visible embedding
  • The crawl has a high speed and you started getting errors from OpenAI. Decrease crawling speed, go grab a coffee and let it do its work
  • OpenAI has many models and some page crawls might fail due to the number of output tokens requested. Generate your API key using gpt-4o mini (up to 16,384 tokens), which allows twice as many output tokens as gpt-4 (8,192). If some pages still fail, remove them from the crawl

Digital PR for Charities – 7 Experts Reveal Their Favourite Campaigns (16th September 2024)

The popularity of Digital PR has grown heavily during the last few years, as brands look at other impactful ways to grow their coverage on the web.

Many brands have been busy utilising the power of digital PR not only to generate valuable links that point back to key pages on their website, but also to grow brand awareness with key target audiences.

One sector in particular which stands to benefit from the power of digital PR is the non-profit charity sector.

In this piece I wanted to explore how, and perhaps more importantly why, digital PR can be utilised by charities, as well as asking other experienced consultants about their favourite campaigns.

This article is a guest contribution from SEO Consultant Matt Tutt.


What is Digital PR?

Digital PR is a marketing strategy that helps a brand enhance its online visibility. It typically involves a campaign led by a marketing consultant or agency, designed to showcase the brand in a positive and memorable way.

Most successful digital PR campaigns will result in lots of media coverage, including news stories, strong social media engagement, and usually links that will point back to the brand’s website – which can help to boost their SEO.

While the primary goal of digital PR is to secure coverage where the target audience is most active, the ultimate objective is to drive revenue – either through increased brand recognition or direct online sales, depending on the business. For charities this goal might not be tied to growing revenue, but instead to growing awareness of the causes they’re championing.

Some companies also invest in digital PR specifically to boost their SEO, leveraging the links generated through these campaigns. In industries where traditional link-building methods (like content creation and outreach) are challenging, digital PR offers an effective alternative.


How Can Charities Benefit From Digital PR? Will It Work for All Charities?

Nonprofits often have different objectives than traditional businesses. Their purpose might include raising awareness or educating the public about a particular issue.

For example, charities in the health sector may focus on spreading awareness about a specific health problem and educating the public. And they will also likely aim to generate donations to fund research and sustain their operations.

So for a charity, the overall objective behind investing in digital PR might vary a lot in comparison to a traditional business, which would have likely been solely revenue focused.

Understanding the main objective of the nonprofit is therefore key before you go down the avenue of exploring digital PR as a worthwhile marketing opportunity.

It’s worth noting that for the bigger, established charities and nonprofits, it’s unlikely that they would carry out a digital PR campaign with the main goal of improving their SEO performance – for them it will likely be to raise further awareness of their cause, and to generate more online support.


The Best Charity Digital PR Campaigns – Favourites Chosen By Other PR Consultants

I reached out to several experienced digital PR consultants to find out what their favourite charity PR campaigns were – and have listed these below.

The Dangers of E-Bike Batteries by Electrical Safety First – via Jo O’Reilly

I think Digital PR can sometimes be seen as the less worthwhile cousin to Traditional PR, so when I see really good Digital PR that can compete with Traditional PR in terms of message, narrative, and execution, I am always super impressed.

One of the best examples recently is a campaign by Electrical Safety First around the dangers of e-bike batteries in the UK. Not only have they landed coverage and links galore, but they have also handled an incredibly serious story with the gravity it deserves, working with case studies to tell tragic stories sensitively and help to push for a change in policy to ensure that e-bike battery fires don’t impact more families.

Shared by Freelance Digital PR specialist Jo O’Reilly


Population by Pixel by WWF Japan – via Hannah Smith

WWF Japan – Population by Pixel is a print campaign that dates back to 2008. The agency (HAKUHODO C&D) used pixelation to represent the number of animals left for a range of endangered species – each pixel represents an animal, so the more endangered a species is, the more pixelated the image appears. I love it because I think it’s a really creative, accessible, and impactful use of data visualisation. Numbers are notoriously tricky for people to get their heads around, but this campaign makes the intangible, tangible. I think it’s brilliant.

My advice to charities would be much the same as the advice I’d give to any client – you can’t control what a journalist might write, or indeed how they decide to frame your organisation within the context of any given story. As such I’d advise giving careful consideration to what you put out there, and the extent to which it honours your overall vision and mission.

I’d also mention being mindful of which organisations (commercial or otherwise), and which individuals you choose to collaborate or partner with as a charity – do these people and/or organisations really share your values, or are they using you to try to fix their own tarnished reputations? Does the partnership feel congruent? Does it make sense? Consider the potential risks carefully.

Shared by Creative Content Consultant Hannah Smith


“Charities should maintain authenticity by sharing genuine stories and focusing on impact rather than overly promotional content” – Britt Klontz

Charities who plan to run digital PR campaigns should prioritise maintaining authenticity by sharing genuine stories and focusing on impact rather than overly promotional content.

Building a community through meaningful interactions and providing value beyond fundraising appeals is very important when it comes to fostering long-term supporter engagement and loyalty. Be extra mindful of how beneficiaries are conveyed and make sure that the stories and testimonials shared are accurate and verifiable to avoid accusations of misleading information. Content that could be seen as exploitative or disrespectful should be avoided.

Transparency is also crucial, and they need to prioritise clear communication on how donations are used to build trust.

Don’t overlook all digital channels. Make sure you’re producing content tailored to your website, social channels, and email newsletters. This will ensure you’re reaching a wide audience and increase your chances of encouraging support from your community.

Shared by Freelance Digital PR Consultant & Publicist Britt Klontz


Save Our Species by IUCN & Lacoste via Alex Cassidy

In 2018 wildlife charity IUCN Save Our Species partnered with Lacoste to raise awareness for endangered animals by creating a set of limited edition polos. The biggest news hook for this was that only one polo was made for each remaining animal that it represents. At the time, only 67 Javan Rhinos were left in the wild (the good news is there are now 76!), so only 67 Javan Rhino polos were made. Naturally, all the polos sold out in a day, though you can still get some on eBay for hundreds of pounds. The marriage of scarcity of product and scarcity of animals is a perfect example of a charity raising awareness of a worthy cause by piggybacking on a better known brand, enabling both to get a message across on sustainability and environment.

Distinctly work pro bono with a chosen charity every year on their organic and paid strategy, which includes Digital PR. The biggest thing we suggest is to not ignore the linkbuilding element of Digital PR. More often than not we find there is low hanging fruit in existing mentions from local chapters of charities in local press, missing links on databases, and lots of opportunities with councils and universities. Sometimes the tendency is to think big from the outset, when a lot more can be done with much smaller opportunities done thoroughly and at scale.

Shared by Alex Cassidy, Head of Digital PR at Distinctly


He’s Coming Home by Women’s Aid via Amy Irvine

One PR campaign that comes to mind is the Women’s Aid ‘He’s Coming Home’ campaign. This campaign was launched during the World Cup in 2022 and the campaign revealed shocking statistics, including the fact that during major sporting events, domestic abuse rises by up to 38%. I loved this campaign as it was so simple, by changing just one word in the UK’s most famous football chant, the campaign highlighted the darker side of major football tournaments. This multi-channel campaign went viral, appeared on billboards, was featured in a one-shot film, posted across social media and also resulted in widespread coverage by news outlets. Traffic increased on the Women’s Aid website by 78.3%.

The latest instalment of this campaign was released ahead of this summer’s Euros tournament. Research by Lancaster University showed cases of domestic abuse increased by 38% when England lost a match and 26% when they won or drew. This year the campaign featured classic football scarves with the slogans “No More Years of Hurt”, “He’s Coming Home” and “England Til I Die” to highlight the domestic abuse emergency. Every time England played a game during the Euros, the campaign would be shared across social media, raising crucial awareness so that survivors know where to turn if they need support.

The main goal of this campaign was raising awareness of domestic abuse, and it wasn’t just launched as a digital PR campaign, it was launched across different channels, including OOH, PR and social. However, I do think Women’s Aid missed a trick when it came to digital PR. If they had a dedicated landing page on site with the statistics, video content, and images, they could have earned links directly to this page.

Charities should definitely use digital PR as part of their wider marketing strategy. Through digital PR, charities can increase their organic visibility and brand awareness, build trust and authority, engage with their target audience and increase their reputation.

When charities are getting involved in digital PR, the most important factor they need to be aware of is relevancy. They need to ensure that their ideas are relevant to them as a brand, as this is the biggest lever for digital PR success. At Digitaloft, we talk a lot about something called the digital PR sweet spot, and this sits between the content you want to create, which is typically very brand-focused, and the content journalists want to publish.

When launching digital PR activity, it’s also really important to make sure you get your tone of voice right throughout your campaigns and reactives – think about who you are targeting and how the campaign will resonate with them. Charities often shine a light on sensitive topics so it’s important you get the messaging and tone right.

If you are launching a data-led campaign as part of your digital PR strategy, it’s also really important to have a robust methodology that clearly outlines how you carried out the research and where you got the data from. It’s really important to ensure your campaign is ethical and your data is accurate and reliable.

If you are planning on partnering with a digital PR agency, then you really want this agency to be an extension of your team and really understand your overarching messaging as well as your wider goals. Working with an agency is a great way to get creative with your ideas that will gain traction across multiple channels! A digital PR agency can also ensure that your existing content works harder and starts earning relevant links and coverage.

Shared by Digitaloft Digital PR Director Amy Irvine


The Last Photo by CALM via Lou Ali

Charities are all about raising awareness and funds for good causes, so it’s really important that their Digital PR Campaigns find the balance of being sensitive to the cause, whilst drawing people in to deeply connect with the issue at hand.

A campaign that really stands out for me, in terms of finding that balance, is “The Last Photo” by Campaign Against Living Miserably (CALM); an organisation looking to support those battling depression and suicidal thoughts, as well as those that are close to them.

They created a short and simple video showing the final video footage of people that have ended their lives to highlight that “suicide doesn’t always look suicidal”. Deeply moving and thought-provoking, this campaign makes its point beautifully, especially with the addition of the haunting song choice. The whole piece is perfectly executed and absolutely serves a purpose by not just stirring emotions, but also linking to articles offering practical advice for people worried about someone in their lives to help them lean into what could be a challenging conversation.

This campaign is certain to have made a lot of people reconsider their perspective on what depression and death by suicide looks like, and I like to think it sparked conversations that saved and improved lives – which is better than any KPI I’ve ever worked with.

Relevancy is probably more important for Charities doing Digital PR than most other industries. Given their often limited budgets, it’s vital that all messaging leaves the audience clear, without question, what it is you stand for, who you want to help, how you want to help them and what your values are. So, all campaigns need to be not just newsworthy, but also tightly aligned to brand – don’t ever veer far from the core issue you’re representing.

Shared by Head of Digital PR at Honcho Lou Ali


The Anthony Nolan Supporter Awards via Georgina Radford

The Anthony Nolan Supporter Awards (ANSAs) is first and foremost a celebration of the incredible contributions of various individuals to the charity, serving as a way to give back and honour their dedication. However, the ANSAs also prove each year to be a fantastic source of media coverage, in turn boosting brand awareness and building SEO value through organic backlinks.

From a PR perspective, there are numerous reasons why this campaign works well; not only is it a heartwarming human-interest story that resonates with the general public, but the broad appeal of the topic means there are also diverse coverage opportunities, from national publications to regionals and business trade sites.

In many ways, there is no difference between conducting digital PR for charities; PRs will still be working to improve brand awareness and engagement, perfect their messaging and secure backlinks to improve SEO, just as one might with any other business.

However, the crucial difference is intention. These brands aren’t trying to sell a product or service, or at least not in the traditional sense. Instead, they are driven by a mission to create social impact, raise awareness for a cause and mobilise support and resources, usually in the form of donations.

It’s this intention that shapes the strategy and execution of digital PR campaigns. Charities need a clear vision of their campaign goals and the KPIs that will define success, as traditional PR metrics may not align with the campaign’s intentions.

Charities should also keep in mind that they are in the coveted position of driving real-world impact through their PR efforts. Purpose-driven messaging that strikes the right tone with the public is therefore crucial. This messaging should be authentic, compelling and emotionally resonant to communicate the importance of their cause, build trust and ultimately inspire action among supporters.

Shared by Screaming Frog Digital PR Manager Georgina Radford


Final Thoughts – And What Are Your Favourite Charity Campaigns?

Hopefully by now you will have seen the incredible power and value that digital PR campaigns, if done in the right way, could have on your charity or business.

If your charity, NGO or business is interested in exploring the opportunities that digital PR can provide, I’ve listed a few general useful resources below to carry on with your journey – note that these are general guides to digital PR and aren’t necessarily charity-specific:

Planning for Black Friday – Getting Your Paid Campaigns Ready For The Big Day (6th September 2024)

As the old saying goes, “…in this world nothing can be said to be certain, except death and taxes.” Well, now you can add “as well as a huge spike in search interest surrounding Black Friday from the end of August onwards”. Admittedly, it doesn’t quite have the same ring to it but still…

Black Friday may feel like a modern invention, but its roots can be traced back to the US in the 1950s. Whilst the actual meaning as to why it’s called Black Friday is a source of argument to some (I’m not going to run through its entire history, and run the risk of this post being like every recipe page you’ve ever seen on the internet), its importance to any ecommerce business has continued to grow each year.

According to data released by Shopify, $9.3 billion in sales went through their merchants over the 2023 Black Friday weekend, which was a 24% increase when compared to 2022, and this figure is expected to increase further in 2024.

With that being said, here are 10 tips to help you optimise your ecommerce Google Ads campaigns for Black Friday and make the most of the increased traffic.


Start Early & Stay Ahead of the Wave

Preparation is key. People start looking for Black Friday information in August and so you need to make sure that everyone in the business is aware of the plans for Black Friday and knows how it’s going to go in the period leading up to (and immediately after) that initial weekend.

You’ll need to clearly define (and most likely adjust) your objectives for the Black Friday period – if you’re using a value or ROAS-based bid strategy, you’ll need to take the change in user behaviour into account. You’re likely to see average basket value dip over the sales period, offset by larger order volumes.


Ramp up Interest

In the lead up to Black Friday you can use your campaigns to ramp up the excitement. These can include campaigns specifically focused on driving newsletter subscriptions and/or account creations that offer the user early access to the Black Friday offers before everyone else.

You can also look at increasing brand awareness by targeting those users who have shown a strong interest in your products by adding relevant affinity audiences to your campaigns so that you start capturing the attention of those users in the run-up to Black Friday.


Make Sure Your Product Feed/s are up to Date

For any ecommerce business, Google Shopping Ads are essential so make sure your shopping feed/s & inventory are ready for the sale period. Whatever you do, don’t suddenly make wholesale changes the night before (or even worse the morning of) the sale as you’ll more than likely find that your changes end up in editorial review (AKA Black Friday purgatory) for most, if not all, the sales period, completely defeating the object!

You will want to make sure that your feed is refreshed more regularly on the day though, especially as the day draws on and your stock levels get lower and lower.


Leverage Remarketing Campaigns

Remarketing is a powerful tool at the best of times but it can really kick on during your Black Friday activity. Use your lists to help target specific site behaviour (cart abandoners, specific product views) and then create tailored ads that speak to those users.

For example, you can use remarketing to give previous purchasers exclusive pre-sale discounts (as mentioned above), or to help with up- and cross-selling opportunities by showing them new and related products that they’re likely to be interested in (increasing their LTV).


Take Advantage of Additional Audience Targeting & Segmentation

By creating (and then targeting) custom intent-based audiences, you’ll be able to reach users specifically searching for products similar to yours. These audiences could be built around ‘Black Friday’ queries or the specific product categories that drive the most value for you.

Using your internal data, you can also upload customer lists to segment your audience by lifetime value (LTV). By segmenting your users in this way, you can then create specific offers and ads depending on these values.

When thinking about your audiences, don’t just think about the positive targeting options. To optimise your delivery (and spend), create and use audience exclusions so you know your ads are only serving to the users who are most relevant and valuable (e.g. it’s probably wise to exclude anyone who has just purchased, as you might upset them by showing them a discount they didn’t receive!).


Update Your Smart Bidding Strategies with Seasonality Adjustments

On the day itself, competition is really going to hit its peak, making it crucial that your campaigns can react quickly and effectively. We’ve already spoken about adjusting targets, but you also need to make sure Google knows that the inflated conversion behaviour over Black Friday isn’t the new norm and that volumes will likely return to pre-sale levels.

To improve bidding performance during this short-term sale period, you’ll want to take advantage of seasonality adjustments. To do this, you’ll need to set a start and end date for your Black Friday campaign and tell the platform the expected conversion rate change (based on your historical performance). This provides Google’s algorithm with a more precise signal to temporarily increase or decrease bids in anticipation of higher or lower conversion rates.
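
If you’re unsure what figure to enter, a rough starting point is simply the ratio of last year’s Black Friday conversion rate to your usual conversion rate. The sketch below uses made-up numbers purely to illustrate the arithmetic.

```javascript
// Rough, illustrative estimate of the conversion rate modifier to enter as a
// seasonality adjustment. All numbers are hypothetical.

const usualConversionRate = 0.032;           // typical November CVR (3.2%)
const lastBlackFridayConversionRate = 0.048; // CVR over last year's BF weekend (4.8%)

const conversionRateModifier =
  lastBlackFridayConversionRate / usualConversionRate;

console.log(conversionRateModifier.toFixed(2)); // 1.50 → tell the platform to expect ~+50% CVR
```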


Get Sign-Off on Your Ads

You’ll want your Black Friday ads written, approved, and uploaded to the campaigns in plenty of time so you know that when the time comes you’re ready to go. You do not want one of the dreaded “Ads Disapproved” emails sitting in your inbox first thing on the Friday.

Make sure you’re getting the most out of the ad assets available to you:

  • Promotion extensions
  • Sitelinks
  • Callouts
  • Structured Snippets
  • Call extensions
  • Location extensions
  • Price extensions
  • App extensions

Prepare for Mobile Shoppers

The internet has pretty much moved on from mobile first to mobile only, and this is even more the case when it comes to shoppers on Black Friday. You’ll need to make sure your site (and checkout process) is fully mobile friendly: it should load quickly, offer a quick and easy checkout, and be as simple to navigate as possible.

You’ll also want to make sure that your ads use mobile-friendly copy – for example, shorter, punchier headlines and short, snappy, USP-heavy descriptions.


Set Your Budgets Accordingly

There’s no getting away from it: you’re going to need to increase your budget by quite a bit. Depending on how long you’re running the sale for, this could be anywhere from a 50% increase to 2x or 3x your usual figure. Use your previous Black Friday performance to help estimate the potential increase needed.
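
As a rough illustration of that estimate, the sketch below scales a normal daily budget by last year’s traffic uplift, plus a little headroom for expected growth – all of the numbers are hypothetical.

```javascript
// Back-of-the-envelope daily budget estimate for the sale period.
// Every figure here is hypothetical and should come from your own data.

const usualDailyBudget = 500; // normal daily budget (£)
const lastYearUplift = 2.4;   // last year's Black Friday traffic vs a normal day
const plannedGrowth = 1.15;   // headroom for expected year-on-year growth

const suggestedDailyBudget = usualDailyBudget * lastYearUplift * plannedGrowth;

console.log(Math.round(suggestedDailyBudget)); // 1380 → roughly 2.8x the usual figure
```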

Not that you won’t anyway, but keep an eye on your spend throughout the day. As the numbers come in you’ll be shifting budget towards the best-performing products, but you’ll also need to make sure you don’t run out of budget before any expected peaks.


Monitor and Adjust Campaigns in Real-Time

Black Friday is a fast-paced event, and you might (hopefully) find that you’ve done a whole day’s worth of sales by the time you’ve finished brewing your first cup of coffee. As it moves so quickly, you’ll need to be all over your campaign data, both within Google Ads and through your merchant platform, so that you can make adjustments in real time. We’ve already said that you’ll likely need to make budget changes during the day, but you might also find that your campaigns have done so well that you need to switch some things off because you’re completely out of stock of certain products.

Black Friday doesn’t just run 9-5 (quite the opposite!), so you also need to make sure that if any updates are needed outside of business hours there’s a clear process and agreement in place as to what can and can’t happen (and who owns those changes), so there aren’t any shocks the next morning!

If you’ve read this and are now panicking that you’re not ready for the big day, don’t worry – you still have (some) time. Hopefully these tips help you get the most out of your Black Friday campaigns and take full advantage of the biggest ecommerce event of the year!

The post Planning for Black Friday – Getting Your Paid Campaigns Ready For The Big Day appeared first on Screaming Frog.

]]>
https://www.screamingfrog.co.uk/planning-for-black-friday/feed/ 1
How Your Phone Doesn’t Listen To You: Explaining Targeted Ads https://www.screamingfrog.co.uk/how-your-phone-doesnt-listen-you/ https://www.screamingfrog.co.uk/how-your-phone-doesnt-listen-you/#comments Mon, 01 Jul 2024 10:35:51 +0000 https://www.screamingfrog.co.uk/?p=277630 We’ve all had that moment. A conversation with a friend or family member where you decide that there’s something you want. Maybe you’ve just moved house and realised that you don’t own a hoover (as was the case with me), or perhaps it’s something simpler like deciding you want takeaway...

The post How Your Phone Doesn’t Listen To You: Explaining Targeted Ads appeared first on Screaming Frog.

]]>
We’ve all had that moment. A conversation with a friend or family member where you decide that there’s something you want. Maybe you’ve just moved house and realised that you don’t own a hoover (as was the case with me), or perhaps it’s something simpler like deciding you want takeaway for dinner? Whatever the case, the next time you go online and an ad for that very thing pops up, you’re left asking yourself the question: Is someone listening to my conversations?

Short answer: no. Long answer: kind of, but not in the way that you think.

Let’s use Dave as the subject of our case study. Dave has just seen an ad for a new game that he’s very interested in. The weird thing is, he’d only heard of the game earlier that day in a conversation with a co-worker, so how did the platform know to show him the ad now? The best example to illustrate this is Facebook. A number of years ago, a project called Sharelab attempted to detail all the different ways that Facebook gathers data, starting with four main areas.


Data Gathering

Profile and Account
Dave is a man in his late 20s and he’s just signed up to Facebook. In his profile he has filled out several pieces of information, such as his gender, age, date of birth and family members. On top of this, he has detailed his previous education, including his study of history at university, and his current job as a product manager. All this information is used to build a user profile for Dave and people like him, as well as to allow associations to be made between Dave and others.

Actions & Behaviour
Dave begins to scroll through Facebook and comes across a few posts about the historical accuracy of video games, some of which he likes or comments on, and some he even shares. Eventually he joins a group dedicated to discussing the topic and gets recommendations for other pages and events where he can meet like-minded people. This all feeds into action and content stores, which can be used to predict the kind of decisions Dave is likely to make in the future, the kind of content he already enjoys, and the similar areas he may be interested in.

Digital Footprint
As Dave makes his way across Facebook, other information is being gathered. What time does he log in? Does he do it on his phone or his computer? Where is he when he logs in? This is collected primarily through cookies on websites. It helps to further flesh out his user profile by determining the times when actions might be taken and his preferred device, and it also feeds into location data that can place Dave in a general locale.

Third Party
Finally, we come to the times when Dave isn’t on Facebook but information can still be gathered depending on what he is doing. When he uses companies affiliated with Facebook, such as posting on Instagram or playing on his Meta Quest headset, he is providing further action, profile and digital footprint data to Facebook. Information can also be gathered from third-party partners, such as analytics, marketing and other advertising services, that are accepted when he visits other websites.

The culmination of Sharelab’s research illustrates how Dave can take some rather trivial actions and still give an algorithm a lot of data to work with. This can seem quite intimidating, and indeed companies such as Facebook have come under fire several times for the way in which they collect user information and their subsequent handling of it. The most notable example is the Cambridge Analytica scandal, in which significant personal information was taken not just from those who agreed to be surveyed but also from their entire Facebook friends lists. Despite this, there are ongoing efforts to increase privacy and reduce the potentially predatory ways in which companies gather and use your data.

In 2018, the UK introduced a new version of the Data Protection Act (DPA), which superseded the previous one from 1998. This new DPA supplemented the General Data Protection Regulation and gave users the right to be informed of how the government and other organisations use their data, as well as the ability to update incorrect information, have data erased and stop the processing of their data, amongst other things. This year, Google introduced further changes to its consent policies (first introduced in 2015), including the requirement for sites to ask permission before enabling any method of tracking on-site behaviour. Laws and policies such as these have led to greater privacy and user control over data, and reflect an overall commitment to fostering an environment that is mutually beneficial to both consumers and marketers.


Processing and Targeting

Going back to Dave, the algorithm now has a plethora of gathered information, stored across several categories. The next step is processing, which we’ve already touched on briefly. Successful marketing requires several factors: you want to hit the right person, with the right ad, at the right time, and the processing of data is what achieves this.

The hypothetical product in question is a new mediaeval game releasing in the UK in the next month, with its key selling point being its immersive, sword-based combat. From what we already know about Dave, he is likely a good target customer for this game. He has an interest in the subject matter, as indicated by the groups and pages he has visited on Facebook as well as his educational background. He is in the right age and gender demographic for gaming, and we know he has an interest because of his use of a Meta Quest headset. Finally, we know where he’s based because of location data gathered from cookies, when he’s available based on his login times, and roughly how much he’s likely to earn based on his employment details.

All this information comes together to inform the algorithm that Dave would be interested in the new game as well as when best to serve him the ads that he is most likely to engage with.

This is just how Facebook specifically does it, but most digital platforms gather information in some way or another and to varying degrees. Hopefully that gives you a good idea of how ads can be so detailed in their targeting, using data that you may not even have been aware was shared. But this still leaves us with a mystery: why did Dave only see the ad after he had spoken about it with his co-worker? To answer that we have to shift away from algorithms and into the realm of psychology.


Biases in Marketing

In 1994, a man named Terry Mullen wrote a letter to a newspaper in Minnesota in which he described something he had dubbed the Baader-Meinhof phenomenon. He explained that he had spoken to a friend about the notorious German terrorist group Baader-Meinhof, and the following day that friend had called to tell him about a news story he’d seen in the newspaper regarding that same group. Later, a linguistics professor coined the term ‘frequency illusion’ to describe the same effect.

The frequency illusion is a form of cognitive bias in which people are more likely to notice a product, word or idea if they have recently been introduced to it. This is primarily due to two factors. The first is selective attention, where the subject unconsciously filters out what they perceive to be distracting or useless information whilst focusing on what they deem important. In practice, using Dave as our example, an ad for the game may actually have been shown to him before, but because he had no prior knowledge of it, he simply ignored it and scrolled past.

The second factor is confirmation bias, in which subjects are more likely to seek evidence that confirms their existing beliefs and sometimes even overlook contrary evidence. For Dave, once he’s aware of the game and it’s at the forefront of his mind, the ad suddenly jumps out at him, and because he already has concerns about being listened to, he takes this as evidence for that theory.

It’s also entirely possible, however, that this was just a coincidence. To those with less experience in the industry it can seem like something more nefarious, but having worked in the industry myself, it’s easier to see the inner workings and understand that you’re only just seeing ads because the marketing for the game has only just started, or for a myriad of other reasons such as budget, audience expansion or new keywords.


Final Thoughts

There are concerns about data and privacy in the modern day that are still being addressed through both laws and company policies, and with technology still progressing and the rise of AI, these will come even further to the forefront. I hope I’ve managed to highlight just how vast the flow of information is for many sites, and how it can create a surprisingly accurate demographic profile of a person just through certain actions and account information. Going back to our original question: is someone listening to your conversations? No, because ultimately they don’t need to (unless you’re on some kind of watchlist).

The post How Your Phone Doesn’t Listen To You: Explaining Targeted Ads appeared first on Screaming Frog.

]]>
https://www.screamingfrog.co.uk/how-your-phone-doesnt-listen-you/feed/ 12
Screaming Frog SEO Spider Update – Version 20.0 https://www.screamingfrog.co.uk/seo-spider-20/ https://www.screamingfrog.co.uk/seo-spider-20/#comments Tue, 07 May 2024 07:54:04 +0000 https://www.screamingfrog.co.uk/?p=268713 We’re delighted to announce Screaming Frog SEO Spider version 20.0, codenamed internally as ‘cracker’. It’s incredible to think this is now our 20th major release of the software, after it started as a side project in a bedroom many years ago. Now is not the time for reflection though, as...

The post Screaming Frog SEO Spider Update – Version 20.0 appeared first on Screaming Frog.

]]>
We’re delighted to announce Screaming Frog SEO Spider version 20.0, codenamed internally as ‘cracker’.

It’s incredible to think this is now our 20th major release of the software, after it started as a side project in a bedroom many years ago.

Now is not the time for reflection though, as this latest release contains cool new features based on feedback from all of you in the SEO community.

So, let’s take a look at what’s new.


1) Custom JavaScript Snippets

You’re now able to execute custom JavaScript while crawling. This means you can manipulate pages or extract data, as well as communicate with APIs such as OpenAI’s ChatGPT, local LLMs or other libraries.

Go to ‘Config > Custom > Custom JavaScript’ and click ‘Add’ to set up your own custom JS snippet, or ‘Add from Library’ to select one of the preset snippets.

Custom JavaScript snippets

You will also need to set JavaScript rendering mode (‘Config > Spider > Rendering’) before crawling, and the results will be displayed in the new Custom JavaScript tab.

Custom JavaScript tab

The example above shows the language of the body text of a website’s regional pages, to identify any potential mismatches.

The library includes example snippets that perform various actions, as inspiration for how the feature can be used, such as –

  • Using AI to generate alt text for images.
  • Triggering mouseover events.
  • Scrolling a page (to crawl some infinite scroll set ups, or trigger lazy loading).
  • Downloading and saving various content locally (like images, or PDFs etc).
  • Sentiment, intent or language analysis of page content.
  • Connecting to SEO tool APIs that are not already integrated, such as Sistrix.
  • Extracting embeddings from page content.

And much more.

While it helps to know how to write JavaScript, it’s not a requirement to use the feature or to create your own snippets. You can adjust our templated snippets by following the comments in them.
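
To give a flavour of what an extraction snippet looks like, here’s a minimal sketch that counts images missing alt text on the rendered page. It assumes the return seoSpider.data() and seoSpider.error() convention used by the preset extraction templates, so treat it as illustrative and check the bundled examples for the exact wrapper your version expects.

```javascript
// Illustrative extraction snippet: count <img> elements on the rendered page
// that are missing alt text. The seoSpider.data()/seoSpider.error() calls are
// assumed to match the preset extraction templates.

try {
  const missingAlt = Array.from(document.querySelectorAll('img'))
    .filter(img => !(img.getAttribute('alt') || '').trim())
    .length;

  // The returned value appears against the URL in the Custom JavaScript tab.
  return seoSpider.data(missingAlt);
} catch (error) {
  return seoSpider.error(error.toString());
}
```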

Please read our documentation on the new custom JavaScript feature to help set up snippets.

Crawl with ChatGPT

You can select the ‘(ChatGPT) Template’ snippet, open it up in the JS editor, add your OpenAI API key, and adjust the prompt to query anything you like against a page while crawling.

At the top of each template, there is a comment which explains how to adjust the snippet. You’re able to test it’s working as expected in the right-hand JS editor dialog before crawling.

Custom JavaScript Editor
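
For a sense of what these templates do under the hood, the sketch below shows the kind of request involved: a call to OpenAI’s Chat Completions endpoint from the rendered page, with the result handed back via the seoSpider.data() and seoSpider.error() helpers (assumed here to match the presets). The key, model and prompt are placeholders – in practice you’d edit the bundled template rather than writing this from scratch.

```javascript
// Stripped-down illustration of the kind of request the ChatGPT templates make.
// OPENAI_API_KEY, the model and the prompt are placeholders to edit.

const OPENAI_API_KEY = 'sk-...'; // added once at the top of the template

const bodyText = document.body.innerText.slice(0, 4000); // trim to keep the request small

return fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: `Describe the main topic of this page in one sentence:\n\n${bodyText}` }
    ]
  })
})
  .then(response => response.json())
  .then(json => seoSpider.data(json.choices[0].message.content))
  .catch(error => seoSpider.error(error.toString()));
```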

You can also adjust the OpenAI model used, the specific content analysed and more. This can help perform fairly low-level tasks, like generating image alt text on the fly, for example.

ChatGPT Alt Text while crawling

Or perhaps coming up with new meta descriptions for inspiration.

ChatGPT meta descriptions

Or write a rap about your page.

ChatGPT rap about page content

Possibly too far.

There’s also an example snippet which demonstrates how to use LLaVa (Large Language and Vision Assistant) running locally using Ollama as an alternative.

Obviously LLMs are not suited to all tasks, but we’re interested in seeing how they are used by the community to improve upon ways of working. Many of us collectively sigh at some of the ways AI is misused, so we hope the new features are used responsibly and for genuine ‘value-add’ use cases.

Please read our new tutorial on ‘How To Crawl With ChatGPT‘ to set this up.

Share Your Snippets

You can set up your own snippets, which will be saved in your own user library, and then export/import the library as JSON to share with colleagues and friends.

Share JS Snippets

Don’t forget to remove any sensitive data, such as your API key, before sharing though!

Unfortunately we are not able to provide support for writing and debugging your own custom JavaScript for obvious reasons. However, we hope the community will be able to support each other in sharing useful snippets.

We’re also happy to include any unique and useful snippets as presets in the library if you’d like to share them with us via support.


2) Mobile Usability

You are now able to audit mobile usability at scale via the Lighthouse integration.

There’s a new Mobile tab with filters for common mobile usability issues such as viewport not set, tap target size, content not sized correctly, illegible font sizes and more.

Mobile Usability Tab

This can be connected via ‘Config > API Access > PSI’, where you can select to connect to the PSI API and collect data off box.

Or as an alternative, you can now select the source as ‘Local’ and run Lighthouse in Chrome locally. More on this later.

Mobile usability checks in PageSpeed Insights

Granular details of mobile usability issues can be viewed in the lower ‘Lighthouse Details’ tab.

Lighthouse Details tab

Bulk exports of mobile issues including granular details from Lighthouse are available under the ‘Reports > Mobile’ menu. Please read our guide on How To Audit Mobile Usability.


3) N-grams Analysis

You can now analyse phrase frequency using n-gram analysis across pages of a crawl, or aggregated across a selection of pages of a website.

To enable this functionality, ‘Store HTML / Store Rendered HTML’ needs to be enabled under ‘Config > Spider > Extraction’. The N-grams can then be viewed in the lower N-grams tab.

N-grams tab

While keywords are less trendy today, having the words you want to rank for on the page typically helps in SEO.

This analysis can help improve on-page alignment, identify gaps in keywords and also provide a new way to identify internal link opportunities.
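
For anyone curious what’s happening behind the scenes, an n-gram is simply a run of n consecutive words. The sketch below shows a rough way 2-grams could be counted from body text – the SEO Spider’s own tokenisation and filtering will differ, so it’s purely illustrative.

```javascript
// Rough sketch of 2-gram (phrase) counting from body text. The SEO Spider's
// own tokenisation, stop-word handling and filtering will differ.

function countNGrams(text, n = 2) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // strip punctuation
    .split(/\s+/)
    .filter(Boolean);

  const counts = new Map();
  for (let i = 0; i + n <= words.length; i++) {
    const gram = words.slice(i, i + n).join(' ');
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  // Most frequent phrases first
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(countNGrams('duplicate content is content that appears as duplicate content').slice(0, 3));
// [ [ 'duplicate content', 2 ], [ 'content is', 1 ], [ 'is content', 1 ] ]
```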

New Approach to Identifying Internal Linking Opportunities

The N-grams feature provides an alternative to using Custom Search to find unlinked keywords for internal linking.

Using n-grams you’re able to highlight a section of a website and filter for keywords in ‘Body Text (Unlinked)’ to identify link opportunities.


N-grams internal linking opportunities

In the example above, our tutorial pages have been highlighted to search for the 2-gram ‘duplicate content’.

The right-hand side filter has been set to ‘Body Text (Unlinked)’ and the column of the same name shows the number of instances unlinked on different tutorial pages that we might want to link to our appropriate guide on how to check for duplicate content.

Multiple n-grams can be selected at a time and exported in bulk via the various options.

This feature surprised us a little during development with how powerful it can be to have your own internal database of keywords to query, so we’re looking forward to seeing how it’s used in practice and how it could be extended.

Please read our guide on How To Use N-Grams.


4) Aggregated Anchor Text

The ‘Inlinks’ and ‘Outlinks’ tabs have new filters for ‘Anchors’ that show an aggregated view of anchor text to a URL or selection of URLs.

Aggregated anchor text in inlinks tab

We know the text used in links is an important signal, and this makes auditing internal linking much easier.

You can also filter out self-referencing and nofollow links to reduce noise (for both anchors, and links).

Aggregated Anchors filtered

And click on the anchor text to see exactly what pages it’s on, with the usual link details.

Aggregated anchors, show links with anchor text

This update should aid internal anchor text analysis and linking, as well as identifying non-descriptive anchor text on internal links.


5) Local Lighthouse Integration

It’s now possible to run Lighthouse locally while crawling to fetch PageSpeed data, as well as mobile usability data as outlined above. Just select the source as ‘Local’ via ‘Config > API Access > PSI’.

Lighthouse Integration into the SEO Spider

You can still connect via the PSI API to gather data externally, which can include CrUX ‘field’ data. Or, you can select to run Lighthouse locally which won’t include CrUX data, but is helpful when a site is in staging and requires authentication for access, or you wish to check a large number of URLs.

This new option provides more flexibility for different use cases, and also different machine specs – as Lighthouse can be intensive to run locally at scale and this might not be the best fit for some users around the world.


6) Carbon Footprint & Rating

Like Log File Analyser version 6.0, the SEO Spider will now automatically calculate carbon emissions for each page using the CO2.js library.

Alongside the CO2 calculation there is a carbon rating for each URL, and a new ‘High Carbon Rating’ opportunity under the ‘Validation’ tab.

Carbon Footprint calculation

The Sustainable Web Design model is used for calculating emissions, which considers data centres, network transfer and device usage. The ratings are based upon its proposed digital carbon ratings.
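
If you’d like to sanity-check a figure yourself, CO2.js can be used directly. Below is a minimal sketch assuming the @tgwf/co2 npm package and its Sustainable Web Design (‘swd’) model – the SEO Spider’s own figures may differ depending on model version and options.

```javascript
// Minimal sketch using CO2.js directly, assuming the @tgwf/co2 npm package.
const { co2 } = require('@tgwf/co2');

const swd = new co2({ model: 'swd' }); // Sustainable Web Design model

const pageTransferBytes = 2_200_000;   // e.g. a 2.2 MB page (hypothetical)
const gramsCO2e = swd.perByte(pageTransferBytes);

console.log(`${gramsCO2e.toFixed(3)} g CO2e per page view`);
```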

These metrics can be used as a benchmark, as well as a catalyst to contribute to a more sustainable web. Thank you to Stu Davies of Creative Bloom for encouraging this integration.


Other Updates

Version 20.0 also includes a number of smaller updates and bug fixes.

  • Google Rich Result validation errors have been split out from Schema.org in our structured data validation. There are new filters for rich result validation errors, rich result warnings and parse errors, as well as new columns to show counts, and the rich result features triggered.
  • Internal and External filters have been updated to include new file types, such as Media, Fonts and XML.
  • Links to media files (in video and audio tags) or mobile alternate URLs can be selected via ‘Config > Spider > Crawl’.
  • There’s a new ‘Enable Website Archive‘ option via ‘Config > Spider > Rendering > JS’, which allows you to download all files while crawling a website. This can be exported via ‘Bulk Export > Web > Archived Website’.
  • Viewport and rendered page screenshot sizes are now entirely configurable via ‘Config > Spider > Rendering > JS’.
  • APIs can ‘Auto Connect on Start’ via a new option.
  • There’s a new ‘Resource Over 15mb‘ filter and issue in the Validation Tab.
  • Visible page text can be exported via the new ‘Bulk Export > Web > All Page Text’ export.
  • The ‘PageSpeed Details’ tab has been renamed to ‘Lighthouse Details’ to include data for both page speed, and now mobile.
  • There’s a new ‘Assume Pages are HTML’ option under ‘Config > Spider > Advanced’, for pages that do not declare a content-type.
  • Lots of (not remotely tedious) Google rich result validation updates.
  • The SEO Spider has been updated to Java 21 Adoptium.

That’s everything for version 20.0!

Thanks to everyone for their continued support, feature requests and feedback. Please let us know if you experience any issues with this latest update via our support.

Small Update – Version 20.1 Released 20th May 2024

We have just released a small update to version 20.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Updated carbon ratings and ‘High Carbon Rating‘ opportunity to be displayed only in JavaScript rendering mode when total transfer size can be accurately calculated.
  • ChatGPT JS snippets have all been updated to use the new GPT-4o model.
  • Added new Google Gemini JS Snippets. The Gemini API is currently only available in select regions; it’s not available in the UK or other regions in Europe. Obviously it’s the user’s responsibility if they circumvent this via a VPN.
  • Included a couple of user submitted JS snippets to the system library for auto accepting cookie pop-ups, and AlsoAsked unanswered questions.
  • Re-established the ‘Compare’ filter in the ‘View Source’ tab in Compare mode that went missing in version 20.
  • Fixed issue loading in crawls saved in memory mode with the inspection API enabled.
  • Fixed a few issues around URL parsing.
  • Fixed various crashes.

Small Update – Version 20.2 Released 24th June 2024

We have just released a small update to version 20.2 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Update to PSI 12.0.0.
  • Schema.org validation updated to v.27.
  • Updated JavaScript library to use Gemini 1.0.
  • Show more progress when opening a saved crawl in memory mode.
  • Retry Google Sheets writing on 502 responses from the API.
  • Added Discover Trusted Certificates option to make setup for users with a MITM proxy easier.
  • Added ‘Export’ button back to Lighthouse details tab.
  • Fixed intermittent hang when viewing N-Grams on Windows.
  • Fixed issue with UI needless resizing on Debian using KDE.
  • Fixed issue preventing High Carbon Rating being used in the Custom Summary Report.
  • Fixed handling of some URLs containing a hash fragment.
  • Fixed various crashes.

Small Update – Version 20.3 Released 23rd September 2024

We have just released a small update to version 20.3 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Fixed bug where the SEO Spider was incorrectly flattening iframes found in the head when inserted via JS.
  • Fixed bug with URLs in list mode not being imported if they had capital letters in the protocol.
  • Fixed bug with filtering column in custom extraction.
  • Fixed bug in basic authentication with non ASCII usernames.
  • Fixed various crashes.
  • Updated Java to 21.0.4.

Small Update – Version 20.4 Released 22nd October 2024

We have just released a small update to version 20.4 of the SEO Spider. This release is mainly bug fixes and small improvements –

  • Fixed an issue with JavaScript rendering on macOS Sequoia.
  • Fixed freeze in a crawl if sitemap is discovered late in the process.
  • Fixed crash when connecting via RDP.

The post Screaming Frog SEO Spider Update – Version 20.0 appeared first on Screaming Frog.

]]>
https://www.screamingfrog.co.uk/seo-spider-20/feed/ 60
The brightonSEO Crawling Clinic April ’24 https://www.screamingfrog.co.uk/brightonseo-april-24/ https://www.screamingfrog.co.uk/brightonseo-april-24/#respond Mon, 15 Apr 2024 10:08:10 +0000 https://www.screamingfrog.co.uk/?p=269402 Somehow we’re already halfway through April, which means brightonSEO is just around the corner! Once again, we’ll be in our usual spot (stand 34, right hand side of the exhibition hall as you walk in), chatting about all things crawling. Come and meet the team and discuss any issues you’re...

The post The brightonSEO Crawling Clinic April ’24 appeared first on Screaming Frog.

]]>
Somehow we’re already halfway through April, which means brightonSEO is just around the corner! Once again, we’ll be in our usual spot (stand 34, right hand side of the exhibition hall as you walk in), chatting about all things crawling.

Come and meet the team and discuss any issues you’re experiencing, our exciting new features, things you’d like to see added to the SEO Spider and more. We’re also offering a full 2 week trial licence, and the team are more than happy to give you a primer on how best to use it.

We’re also an agency, so if you’d like to chat to one of the team about our award-winning agency services, please do!

We’ll also have some brand new Screaming Frog merch for you to get your hands on.

Firstly, we’ll be dishing out the enamel pins that made their debut at brightonSEO San Diego.

Secondly, Screaming Frog bucket hats! If it’s sunny, you can wear a bucket hat at the beach. If it’s raining, you can also wear a bucket hat at the beach. It’s a win-win.


Managing Crawling & Indexing Training Course

As usual, we’re running the Managing Crawling & Indexing workshop at brightonSEO on the Wednesday (24th April) with SEO veteran Charlie Williams.

If you want a better understanding of how search engines visit, interpret and index your site, there are still a few spots left.

We look forward to seeing you all next week!

The post The brightonSEO Crawling Clinic April ’24 appeared first on Screaming Frog.

]]>
https://www.screamingfrog.co.uk/brightonseo-april-24/feed/ 0