Brighton SEO: Sam Marsden – Next Level Content Audits With Crawlers

This article was updated on: 07.02.2022

Sam Marsden is Content &amp; SEO Manager at DeepCrawl, responsible for organic search optimisation, content strategy and tagging for the site. His talk covered how you can take your content audits to the next level.

The need for a new site

Sam joined DeepCrawl last year, shortly after the company received Series A funding, which allowed it to scale up. Funds were made available for a much-needed site redesign.

A site redesign is a long, time-consuming process. Alongside the redesign, they needed to move to a new CMS because plugin bloat was making it difficult to upload content.

All of the content had to be uploaded manually, so any low-value content wasn’t worth migrating. This is where Sam decided a content audit was needed.

Four stages of a content audit

Sam broke down the best way to complete a content audit into four easy-to-follow steps:

  1. Full extent of the inventory
  2. Attach relevant performance data
  3. Create a set of criteria
  4. Make a decision

He then explained that he googled ‘what is a content audit?’ to get more insight into how to conduct one. The search surfaced a featured snippet from Marketing Land that didn’t quite hit the nail on the head.

DeepCrawl wanted a data-driven approach, so Sam continued looking and developed a comprehensive, fresh approach that took the site’s entire content inventory into account.

A content audit is like a spring clean

Sam listed four core stages of a content audit, comparing it to a spring clean:

  1. Find all the crap you have hidden in your home / all the URLs on your site.
  2. Decide what is off-limits: the core pages you don’t want to get rid of.
  3. Create a set of criteria: what’s your reasoning behind what will go?
  4. Make the call on what gets binned. What isn’t needed?

First things first

You need to find all the existing content on your site. You can export all pages from your CMS – but pages might be missed.

You can run a crawl of your site – recommended – but on its own this gives a limited view of the data in isolation.

Other guides suggest exporting from third-party tools, which can be very laborious and time-consuming.

DeepCrawl is the best solution as it is not limited by scale and can easily integrate other data sources into the crawl. Crawl data needs to be at the centre of your audit rather than just a single source.

You can then move on to extracting on-page data, such as author information, to review all the content on your site thoroughly.

Your crawl will include a lot of metrics that might not be useful, so whittle them down so you can start making decisions and identify the core pages you want to keep.
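The steps above can be sketched in a few lines: merge crawled URLs with performance data into a single inventory, keeping only the metrics you need for decisions. Field names like `sessions` and `impressions` are illustrative assumptions here, not an actual DeepCrawl export format.

```python
# Sketch: attach performance data to each crawled URL to build an inventory.
# Crawl and analytics rows are invented sample data for illustration.

crawl = [
    {"url": "/blog/a", "title": "Post A", "word_count": 1200},
    {"url": "/blog/b", "title": "Post B", "word_count": 300},
]
analytics = {
    "/blog/a": {"sessions": 540, "impressions": 9100},
    "/blog/b": {"sessions": 3, "impressions": 40},
}

def build_inventory(crawl, analytics):
    """Merge crawl rows with analytics data; default missing URLs to zeros."""
    inventory = []
    for page in crawl:
        perf = analytics.get(page["url"], {"sessions": 0, "impressions": 0})
        inventory.append({**page, **perf})
    return inventory

inventory = build_inventory(crawl, analytics)
print(inventory[0]["sessions"])  # 540
```

In practice the crawl and analytics inputs would come from exports, but the principle is the same: one row per URL with all the decision metrics attached.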

Questions you’ll want to answer

1/. What is and isn’t performing well?

This depends on the nature of your site. A news publisher will define this completely differently to a niche B2B site like DeepCrawl.

2/. How can you deal with content that isn’t performing well?

Go into your content inventory and add an action column:

Keep → Cut → Combine → Convert.

DeepCrawl’s site had a lot of ‘best practice’ content that was very outdated, so Sam looked at whether the content had been published in the last year – if it had, it was definitely a keep. He also looked at impressions and similar metrics.

Ask yourself important questions about the content: Is this page being seen in search and receiving traffic? Is the page bringing value to the site?
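The action column can be expressed as a simple rule, using the kinds of criteria mentioned above (recency and impressions). The specific thresholds and the word-count check below are illustrative assumptions, not DeepCrawl’s actual rules.

```python
from datetime import date, timedelta

def decide_action(page, today=date(2019, 4, 12)):
    """Assign keep/cut/combine/convert based on recency and performance.
    Thresholds are assumptions for illustration."""
    if (today - page["published"]) < timedelta(days=365):
        return "keep"      # published in the last year: definitely keep
    if page["impressions"] < 100:
        return "cut"       # stale and barely seen in search
    if page["word_count"] < 500:
        return "combine"   # thin but visible: merge with a related page
    return "convert"       # dated but valuable: refresh or repurpose

page = {"published": date(2017, 1, 5), "impressions": 2500, "word_count": 1800}
print(decide_action(page))  # convert
```

Running this over the whole inventory gives you the action column in one pass, and the criteria stay documented in code rather than in someone’s head.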

3/. How do you get the most out of content that is performing well?

Examine ways to maximise your top-performing content. Start using your full range of data sources and look at tag pages and the like.

4/. How can you inform your content strategy going forward?

Ensure that resources are invested into more of what works and less of what doesn’t. This will be particularly useful for large sites where a page-by-page assessment isn’t possible.

Look at relationships…

The tool of choice will be the pivot table.

1/. Performance by channel / category / content

Do some types of content perform better than others? Review articles, average pageviews and average shares.
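A pivot-table-style aggregation is easy to sketch in plain Python: group rows by category and average the engagement metrics. The sample rows and metric names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"category": "guide", "pageviews": 1200, "shares": 40},
    {"category": "guide", "pageviews": 800,  "shares": 20},
    {"category": "news",  "pageviews": 150,  "shares": 5},
]

def pivot_mean(rows, by, values):
    """Average the given value columns per group, pivot-table style."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {key: {v: mean(r[v] for r in group) for v in values}
            for key, group in groups.items()}

result = pivot_mean(rows, by="category", values=("pageviews", "shares"))
print(result["guide"]["pageviews"])  # 1000
```

The same `pivot_mean` call works for the other relationships below – just change `by` to author, publish day or channel.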

2/. Is content length positively correlated with engagement?

Is there long-form content that isn’t being engaged with? Maybe guidelines for your content team can be created from this.
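Checking this correlation is a one-function job. The sketch below computes the Pearson correlation between word count and an assumed engagement metric (average time on page); the data points are invented for illustration.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sd_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Invented sample: engagement rises with length, then tails off.
word_counts = [300, 800, 1500, 2500, 4000]
avg_time_on_page = [40, 95, 160, 150, 120]

r = pearson(word_counts, avg_time_on_page)
print(round(r, 2))  # r ≈ 0.55: positive, but far from perfect
```

A weak or negative correlation at the long end is exactly the kind of signal that could feed into content-length guidelines for the team.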

3/. Is page speed harming bounce rate and conversions?

Branded3 did some research showing that the load time of a page is closely related to conversion rate.

4/. Performance and engagement by author

How do various pieces of content perform on the site, and how does this relate back to the author? This is very useful for sites with a high turnover of content, such as news sites.

5/. Performance fluctuations by publish date and time

Is content better received on specific days or at specific times?

Automating the process

The above is just the beginning – you will want to automate the process. This could start with scheduling crawls. You could then create automated rules, such as alerts for when traffic drops, and automatically noindex low-quality UGC.
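An automated traffic-drop rule can be as simple as comparing two scheduled reports. In this sketch the 30% threshold and the session counts are assumptions for illustration, not values from the talk.

```python
def traffic_drop_alerts(previous, current, threshold=0.30):
    """Return URLs whose sessions fell by more than `threshold`
    between two scheduled reports."""
    alerts = []
    for url, before in previous.items():
        after = current.get(url, 0)
        if before > 0 and (before - after) / before > threshold:
            alerts.append(url)
    return alerts

# Invented sample data: one page is stable, one has lost over half its traffic.
previous = {"/guides/seo": 1000, "/blog/news": 200}
current = {"/guides/seo": 950, "/blog/news": 90}

print(traffic_drop_alerts(previous, current))  # ['/blog/news']
```

Hooked up to scheduled crawls, a rule like this turns the one-off audit into continuous monitoring.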

You will then want to look at pulling that data into dashboards for continuous monitoring, such as Google Data Studio.