We approached this task with two initial strategies in mind:
A short-term strategy – targeting and optimising the areas of the domain that received the most traffic, to achieve higher rankings and drive conversions.
A long-term strategy – improving the domain’s overall indexation and content quality, as we discovered a number of pages that needed to be optimised.
Because CEIC’s domain was so vast, we needed a full picture of how many URLs the site contained in order to gain insight into its indexable content and index traps. We used a cloud-based solution to crawl 6m+ URLs, as standard SEO crawlers lacked the RAM capacity for a site this large.
To identify content on the website that needed improvement, we created a sitemap hierarchy that reflected the categorisation of indicators by country and content quality. We judged these pages against a set of criteria and set out to optimise them.
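As a rough illustration of how such a hierarchy could be assembled, the sketch below groups page records into sitemap files by country and quality tier. It is not the tooling used on the project: the record shape `(url, country, quality)`, the `sitemap-{country}-{quality}-{part}.xml` naming scheme and the function name are all assumptions made for the example. The 50,000-URL cap per file comes from the sitemaps.org protocol.

```python
from collections import defaultdict

# The sitemaps.org protocol allows at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000


def group_into_sitemaps(records, max_urls=MAX_URLS_PER_SITEMAP):
    """Map sitemap file names to URL lists, grouped by country and quality.

    `records` is an iterable of (url, country, quality) tuples; this shape
    and the file naming scheme are hypothetical.
    """
    groups = defaultdict(list)
    for url, country, quality in records:
        groups[(country, quality)].append(url)

    sitemaps = {}
    for (country, quality), urls in sorted(groups.items()):
        # Split large groups into numbered parts that respect the URL cap.
        for part, start in enumerate(range(0, len(urls), max_urls), start=1):
            name = f"sitemap-{country}-{quality}-{part}.xml"
            sitemaps[name] = urls[start:start + max_urls]
    return sitemaps
```

Keeping one file per country and quality tier means coverage reports in a search console can be read per segment, which is one way to spot groups of pages that are not being indexed.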
We then divided the sitemap into new and old content, ensuring that new content was crawled more frequently. We coupled this with updating the internal links on top-level indicator pages, which made it easier to find pages with indexation issues.
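The split between new and old content can be sketched in a few lines of Python. This is a minimal example under stated assumptions rather than the project's actual code: the 90-day threshold, the `(url, last_modified)` input shape and both function names are hypothetical. The XML output follows the standard sitemaps.org `urlset` format.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_by_age(pages, today, max_age_days=90):
    """Partition (url, last_modified) pairs into (new, old) lists.

    The 90-day default is an illustrative assumption, not a figure
    taken from the project.
    """
    cutoff = today - timedelta(days=max_age_days)
    new, old = [], []
    for url, last_modified in pages:
        target = new if last_modified >= cutoff else old
        target.append((url, last_modified))
    return new, old


def build_sitemap(pages):
    """Serialise (url, last_modified) pairs as a sitemap XML string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")
```

Writing the two lists to separate files lets the sitemap for new content be regenerated and resubmitted often, while the sitemap for old content changes rarely.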
Rather than working through content page by page, we developed our own automation program in Python, which enabled us to improve and optimise content en masse across 6 million+ URLs.