

Indexing for Ecommerce – RankUp with Dan Taylor

Subscribe to RankUp on Spotify | Apple Podcasts | Google Podcasts | RSS

Ecommerce businesses and other large websites often face challenges in getting all of their content crawled and indexed by search engines. Given that index coverage is a key part of achieving good organic visibility, we wanted to bring you a conversation on how to monitor and improve indexing for your site.

Dan Taylor joined us for this conversation. Dan is the head of research and development at Salt, and brought years of technical SEO expertise to the podcast.

He talked us through why this conversation is important to have, tools for identifying and dealing with indexing issues, and tips for making sure that your solutions actually get implemented.

If you want to hear more from Dan, you can find him on Twitter at @TaylorDanRW. Twitter is also the best place to find the rest of the RankUp team: Edd (@EddJTW), Ben (@BenJGarry) and Liv-Mae (@seoliviamae).

Introducing Dan

Dan: Like most SEOs, I never actually intended to get into SEO. In the early 2000s, my dad started working as a PA announcer at Grimsby Town FC, and they had just bought a brand new scoreboard but had no one to operate it. My dad said, “My lad knows computers,” so aged 11 I got thrust into this small room and was operating the scoreboard. I was surrounded by the marketing and commercial teams, which, as an 11-year-old child surrounded by ex-footballers, seemed pretty fun.

So I went on to study marketing at university, coupled with business and economics. I started off in broad marketing and fell into SEO about eight years ago as an in-house specialist, before taking the agency plunge about seven years ago.

Why is indexing an important issue for large sites?

Dan: It’s an important issue outside of ecommerce as well, but within ecommerce we need to match the organic performance to how the business model actually operates.

A good majority of ecommerce websites will have an element of seasonality to their product lines. For example a fast fashion brand I work with has stable seasonal lines, but also has one-off two or three week lines that mean they have fast product rotation and high churn.

Ecommerce in terms of how people buy is still only really 20 years old. Most people have made that jump to ecommerce in only the last 10-15 years. So the mindset of people buying products still follows from the old bricks-and-mortar methodologies, where if you put something on the shelf, it’s available and people can buy it.

But online, your shelf is oftentimes dependent on Google indexing and then actually serving the product. That disconnect makes this more interesting and nuanced as a problem to solve. This will differ between the brand and the business model, but that’s one of the more common issues we come across.

Examples of different indexing issues

Dan: In January this year, I had three ecommerce sites around the same time all experiencing something different from Google.

For one of them, with about 10,000 SKUs, Google started to randomly prefer the more popular variable products within a range over others. We saw Google overriding canonicals and other issues that don’t necessarily relate to its understanding.

For another, it essentially just started to consolidate everything into categories. It’s a mechanical engineering site, so there are lots of subtle changes between the products, which were then combined into what we saw in the index, and serving more specific ones for specific queries. This marks the difference between what is being indexed and what’s actually being served.

On the third site, which is about 300,000 SKUs, it just decided to drop pretty much all of them from the index in one go, then rebuild all of them over the next two or three weeks as it was deciding what was of value. This happened more or less overnight, then slowly clawed itself back.

The three sites were almost like three boxes. With one Google could sort of just throw it all in the box, with the other, just tidy it up a bit, and then the third was like a big box of Lego that it just tipped out.

How the CMS affects indexing

Dan: One of the biggest issues I come across if a site hasn’t had performance SEO vs ‘box tick’ SEO is that different expectations lead to different outcomes. Whatever is set as a KPI is oftentimes the thing that will get done.

With CMSs, I see overreliance on out-of-the-box solutions. So on Shopify, Magento, Salesforce, you’ll have a home page template, you’ll have the product listing page (PLP) template and you’ll have product detail pages (PDPs) which are your actual product pages. You’ll have different elements for cross-sell and upsell, and you’ll have pagination.

Outside of that, very rarely do I come across businesses who have actually taken a look at the structure and tried to expand it beyond what we traditionally see in this kind of catalogue structure.

This means you’ll have products being added to a category on page six or seven because that’s the last page, versus being on page one and being indexed quicker. We know XML sitemaps exist and that Google uses them, but we also know that Google values finding things through direct crawling.

So doing things like special landing pages, template pages or even pseudo-HTML sitemaps that look nice for the user but are there just for Google to crawl to access deeplinks – that’s what I mean by going beyond just what the out-of-the-box CMS platform gives you.
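The effect of a landing page on how deep a product sits can be illustrated with a small sketch. It assumes a toy internal link graph (the URLs, the seven-page category and the "/new-in" landing page are all hypothetical) and measures click depth from the home page with a breadth-first search. Click depth is only a rough proxy for how quickly a crawler might reach a page, not a statement of how Google actually schedules crawling.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to every reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: the home page links to a category paginated over seven pages,
# and a newly added product only appears on the last page.
links = {"/": ["/shoes"]}
pages = ["/shoes"] + [f"/shoes?page={n}" for n in range(2, 8)]
for current, following in zip(pages, pages[1:]):
    links.setdefault(current, []).append(following)
links.setdefault("/shoes?page=7", []).append("/product/new-trainer")

before = click_depths(links, "/")["/product/new-trainer"]

# Add a hypothetical landing page, linked from the home page, that links to the product.
links["/"].append("/new-in")
links["/new-in"] = ["/product/new-trainer"]

after = click_depths(links, "/")["/product/new-trainer"]
print(f"Click depth before: {before}, after: {after}")
```

In this toy graph the product moves from eight clicks away to two, which is the kind of shortcut a special landing page or HTML sitemap is meant to provide.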

Example of CMS changes in action

Dan: I was working with a shoe retailer. They had the brand, then they had sub-brands within that, and the sub-brands had all different variants. When you looked at how people were searching, they were looking for specific brands, variations, colour patterns and special editions, but from an architecture perspective on the website, these pages didn’t exist. They would often filter pages from faceted navigation or use the search feature.

More often than not in these situations, someone somewhere will have nofollowed all of the faceted navigation or blocked all of the parameters within robots.txt because of index bloat and crawl bloat, rather than doing it based on whether or not we want something indexed.
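One way to make that decision explicit, rather than blocking every parameter, is to keep a list of facets that are worth indexing and classify URLs against it. The sketch below is an assumption about how that could look: the facet names, the example URLs and the "index"/"noindex" labels are hypothetical, and how a decision is then enforced (canonicals, meta robots, robots.txt rules) is a separate choice.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical facets that match how people search, so their pages may merit indexing.
INDEXABLE_FACETS = {"brand", "colour"}


def facet_decision(url):
    """Label a URL 'index' or 'noindex' based on which facet parameters it carries."""
    params = parse_qs(urlsplit(url).query)
    if not params:
        return "index"  # plain category or product URL, no facets applied
    only_wanted_facets = set(params) <= INDEXABLE_FACETS
    single_values = all(len(values) == 1 for values in params.values())
    if only_wanted_facets and single_values:
        return "index"
    return "noindex"  # sort orders, multi-select filters, unlisted facets


for url in [
    "/shoes",
    "/shoes?brand=acme&colour=red",
    "/shoes?colour=red&colour=blue",
    "/shoes?sort=price_asc",
]:
    print(url, "->", facet_decision(url))
```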

So by creating special product listing pages targeting specific variations and adding hot links within the site architecture, we were able to provide a more direct path to those specific variations, which is a better user experience.

What are the tools that you use for optimising and maintaining ecommerce stores?

Dan: I’ve got access to the usual suspects: Screaming Frog, Sitebulb, Ahrefs, Mangools. From there, a lot of what we do, especially with large ecom, is take Search Console and Google Analytics data through their APIs, put it through Google Cloud Functions, then into a graphic interface of some sort, or Data Studio.

We can then carry out routinely scheduled crawls through Screaming Frog and use the API to find indexability data, which is automatically matched up through Cloud Functions to, say, a list of products that we know are good revenue drivers.

If we see that these products are dropping by 30-40% in revenue, we can at least look to see if there’s an indexing issue, or a linking issue. If there’s no issue it might just be seasonal or user preference, but at least we can feed it back to the client and make suggestions like changing up the top sticky links or the top main links.
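The matching step described above could be sketched roughly as follows. This is not Dan's actual pipeline: the ProductSnapshot fields, the 30% threshold and the sample figures are assumptions for illustration, standing in for data that would come from analytics exports and a scheduled crawl.

```python
from dataclasses import dataclass


@dataclass
class ProductSnapshot:
    url: str
    revenue_previous: float  # revenue in the earlier period (hypothetical export)
    revenue_current: float   # revenue in the latest period
    indexable: bool          # taken from the latest crawl
    inlinks: int             # internal links pointing at the page


DROP_THRESHOLD = 0.30  # flag drops of 30% or more; an assumed cut-off


def triage(products, threshold=DROP_THRESHOLD):
    """Return (url, drop, likely_reason) for products whose revenue fell sharply."""
    findings = []
    for p in products:
        if p.revenue_previous <= 0:
            continue  # no baseline to compare against
        drop = (p.revenue_previous - p.revenue_current) / p.revenue_previous
        if drop < threshold:
            continue
        if not p.indexable:
            reason = "indexing issue"
        elif p.inlinks == 0:
            reason = "linking issue"
        else:
            reason = "no technical issue found"  # may be seasonal or user preference
        findings.append((p.url, round(drop, 2), reason))
    return findings


sample = [
    ProductSnapshot("/product/a", 1000.0, 600.0, indexable=False, inlinks=12),
    ProductSnapshot("/product/b", 500.0, 450.0, indexable=True, inlinks=8),
    ProductSnapshot("/product/c", 800.0, 480.0, indexable=True, inlinks=0),
    ProductSnapshot("/product/d", 900.0, 450.0, indexable=True, inlinks=5),
]
for url, drop, reason in triage(sample):
    print(f"{url}: down {drop:.0%} ({reason})")
```

Products that fall into the last bucket are the ones to take back to the client as possibly seasonal, with suggestions such as changing the top links.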

Is there a trend in more indexing issues at the moment?

Dan: For me there are three factors outside of SEO that play into this. There’s the physical factor of storing that much data, either physically or in the cloud, which is an expanding universe, not a contracting one. This leads to the second point, which is that there is a saturation point at which it stops being feasible for Google to store, process and retain data from a financial perspective. Remember, Google has no obligation to index and store everything.

That brings me to my third point, which is to ask if there’s anything on the internet being published today that doesn’t already exist in some form or other. Interestingly, in Google’s Quality Rater Guidelines they’ve started using the phrase ‘beneficial purpose’ in line with page quality.

So if there were already 100,000 pages talking about agile estimation using Fibonacci sequencing, and you’re adding another page to that mix without being a market leader, you’ve not got any authority. You’re just the 100,001st article repeating what already exists. From a cost standpoint, Google likely won’t index it unless it’s better than the top 100 already out there.

I’d also argue that oftentimes content isn’t needed, and isn’t being indexed, because we have to take into account user satisfaction. Even if we produce a piece worthy of the top 10 because of its value proposition, depth, authority etc, if Google is already getting positive user metrics for the relevant SERPs, it has no urgency to rush in and disrupt what’s there by trialling a new piece of content.

I think that saturation point is starting to transpire a lot more now.

John Mueller has also said in the last three or four months that he expects it to be normal for 20% of a website not to be indexed. He also said that we’ve seen websites drop for what he called peripheral queries, where their pages were relevant but not 100% spot on. He described that as bug fixing, where Google realised you were ranking for a result you probably shouldn’t have been.

How do you go about getting buy-in for issues that need fixing?

Dan: It’s a three-stage process: identification, collaboration and presentation. First, you identify an issue and verify that something is broken. We list it out in a ticket format, as you would in Trello, Asana or Jira.

For me, the key thing is then understanding the language that the development team and the business will be used to. When we understand the language around development, even if they don’t understand SEO fully, everyone understands development and development cost.

When we go into the collaboration stage, we will say that we’ve got issue X, here’s what issue X is from an SEO perspective. Based on all the data we can see and our experience, it will have Y impact in terms of performance.

We then turn to the development team, explain the issue to them and explain how we’ve seen it fixed in the past. We identify the acceptance criteria, go through it with them, and they would turn around and say that from a development perspective, it will take Z resource.

So we end up with a list of items sortable by SEO impact and development resource.
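A list like that can be as simple as a set of tickets with two scores attached. The sketch below assumes a 1-5 SEO impact score and a development estimate in days; the ticket titles, scales and numbers are hypothetical, not taken from the interview.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    title: str
    seo_impact: int   # hypothetical 1-5 score agreed by the SEO team
    dev_days: float   # estimate supplied by the development team


def prioritise(tickets):
    """Sort by highest SEO impact first, then by lowest development effort."""
    return sorted(tickets, key=lambda t: (-t.seo_impact, t.dev_days))


backlog = [
    Ticket("Fix canonical tags on product variants", seo_impact=5, dev_days=3),
    Ticket("Add landing pages for key variations", seo_impact=5, dev_days=8),
    Ticket("Tidy up legacy redirects", seo_impact=2, dev_days=1),
    Ticket("Link new products from page one of categories", seo_impact=4, dev_days=2),
]
for ticket in prioritise(backlog):
    print(f"{ticket.seo_impact} | {ticket.dev_days:>4} days | {ticket.title}")
```

Sorting by impact first and effort second is only one possible ordering; a team could just as reasonably rank by impact per day of development.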

When we go to present this at C-level and to stakeholders, we’re talking in a language they already understand, because we’ve spoken to the development team numerous times.

Listen to the episode for more!

We don’t have the space to include everything that Dan covered in his interview, so listen to the full podcast to hear all of his points on indexing for ecommerce.

We’ll be back soon with a new episode. In the meantime, you can find the team on Twitter at @BenJGarry, @EddJTW and @seoliviamae.

If you’re interested in being a guest on the show, please reach out to us on Twitter or via email:
