Majestic


Improve your internal links using Python string-matching

Andreas Voniatis

To round out our internal linking odyssey, Andreas Voniatis from Artios explains how Python can do much more for your internal links than the humble spreadsheet you might be used to.

 

Andreas says: “Use Python’s string-matching functions to increase the relevance of your internal links on your website.”

Can you give a brief explanation of the value of using Python for SEO?

“What I love about Python is that it can scale SEO really well. A lot of SEOs will be working in spreadsheets and there are obviously restrictions or limitations in terms of what a spreadsheet can do. They are limited in the scale of the data they can handle, like the number of rows, but also in the complexity of the functions and calculations that they can perform with that data.

For example, if you're optimizing a high-traffic website with tons of pages, like Amazon, then you're going to find scalable SEO analysis in Excel or Google Sheets pretty limiting.

Instead, you can use a Jupyter notebook (formerly known as the IPython notebook), which will allow you to run Python code. If you import string-matching functions, you can take a target keyword and compare it to the title tags of your site pages to find the best page to send internal links to.”
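As a rough illustration of the idea, the sketch below scores a target keyword against a handful of title tags and ranks the pages by similarity. It assumes the standard library's `difflib.SequenceMatcher` as the string-matching function (the interview does not name a specific one), and the `pages` dictionary is a made-up stand-in for a crawl export.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_pages(target_keyword: str, titles_by_url: dict[str, str]) -> list[tuple[str, float]]:
    """Score every title tag against the keyword, most similar first."""
    scored = [
        (url, similarity(target_keyword, title))
        for url, title in titles_by_url.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical crawl export: URL -> title tag.
pages = {
    "/blog/running-shoes-guide": "Running Shoes Guide",
    "/blog/marathon-training": "Marathon Training Plan for Beginners",
    "/about": "About Us",
}

ranked = rank_pages("running shoes", pages)
for url, score in ranked:
    print(f"{score:.2f}  {url}")
```

Other similarity measures (token-based or edit-distance libraries, for example) could be swapped in for `similarity` without changing the rest of the sketch.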

Are you using this to determine whether a page or a piece of content is sufficiently optimized or just to find the most appropriate internal page to link to?

“You could also use it for measuring how optimized your content is, which is a different use case for Python. Python has many use cases for scalable and data-driven SEO. In this case, though, we're trying to find content like blog posts where you can place internal links that will help reshape the importance of your target content for Google and other search engines.”

What content elements are you looking for?

“The great thing about doing this is that there are so many different ways to approach it. On a basic level, you could take your target keyword and the title tags of all of your content, and then simply use a string-matching function to calculate the similarity between them. Based on that similarity metric, you could use a quick rule of thumb to say that anything that's 60% or above would be considered suitable pages to place internal links on, for example.

You could do it at the body content level, but that's a bit more complex because you need to ingest that content into a table cell (a cell of what we call a DataFrame in Python's pandas library) to do that kind of calculation. That’s possible thanks to Python.

If you don’t know what a good rule of thumb is, you can go even deeper. You can say, ‘I want to model the median’ or ‘I want to model the 95th percentile of what's considered relevant.’ You can determine your rule of thumb on a statistical basis rather than on something that you pulled out of thin air.”
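The two ways of setting a cut-off described above can be sketched as follows. This is a minimal example that assumes the similarity scores have already been calculated on a 0-1 scale; the URLs and scores are invented, and `fixed_cutoff` and `percentile_cutoff` are hypothetical helper names.

```python
from statistics import quantiles


def fixed_cutoff(scores: dict[str, float], cutoff: float = 0.6) -> dict[str, float]:
    """Rule of thumb: keep pages scoring at or above a fixed similarity."""
    return {url: s for url, s in scores.items() if s >= cutoff}


def percentile_cutoff(scores: dict[str, float], percentile: int = 95) -> tuple[float, dict[str, float]]:
    """Statistical rule: keep pages at or above a percentile of this site's scores.

    Use percentile=50 to model the median. Valid values are 1 to 99,
    and at least two scores are needed.
    """
    cut_points = quantiles(scores.values(), n=100, method="inclusive")
    cutoff = cut_points[percentile - 1]
    kept = {url: s for url, s in scores.items() if s >= cutoff}
    return cutoff, kept


# Hypothetical similarity scores per URL.
scores = {"/a": 0.9, "/b": 0.7, "/c": 0.5, "/d": 0.3}

print(fixed_cutoff(scores))
print(percentile_cutoff(scores, percentile=50))
print(percentile_cutoff(scores, percentile=95))
```

The percentile version adapts to each site's own distribution of scores, so a site with generally low similarity still gets a shortlist instead of an empty result.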

Would you be able to incorporate intent into what you're looking for?

“You absolutely can. If you had the focus keyword for each piece of site content, then you could create a separate column in which you've predetermined whether that keyword and your target keyword share the same search intent or not.”

What data sources are required for this?

“If you wanted to do this at a basic level, you could just rely on crawling data alone. If you want to get search intent involved, then you'll need SERP data so that you can determine the similarity between your target keyword and the focus keyword of the content page you're comparing the search intent of. If you wanted to look at whether Google was crawling that page live, you would obviously use server logs.”
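One way to turn SERP data into a same-intent flag is to measure how many ranking URLs two keywords have in common. The sketch below is an assumption on our part, since the interview does not specify the method: it uses Jaccard overlap between two lists of ranking URLs, and the 0.4 threshold and the example URLs are purely illustrative.

```python
def serp_overlap(serp_a: list[str], serp_b: list[str]) -> float:
    """Jaccard overlap between the ranking URLs of two keywords (0-1)."""
    set_a, set_b = set(serp_a), set(serp_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def same_intent(serp_a: list[str], serp_b: list[str], min_overlap: float = 0.4) -> bool:
    """Assumed rule: keywords share intent if their SERPs overlap enough."""
    return serp_overlap(serp_a, serp_b) >= min_overlap


# Hypothetical top results for the target keyword and a page's focus keyword.
target_serp = ["site-a.com/shoes", "site-b.com/run", "site-c.com/guide"]
focus_serp = ["site-b.com/run", "site-c.com/guide", "site-d.com/review"]

print(serp_overlap(target_serp, focus_serp))
print(same_intent(target_serp, focus_serp))
```

The resulting boolean could be stored as the extra intent column alongside the similarity score for each page.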

How do you clean URLs that you wouldn't want to link to?

“That’s a slightly separate issue, but let's get into it. One of the things that I do is model the PageRank or link equity of a website using crawl data and external backlink data, so that I get both the internal and external PageRank. Then, I amalgamate those two data sources to get what I would call the ‘effective PageRank’, which combines both the internal and the external.

Using that, you can transform or pivot your existing site structure away from the typical catalogue/product group structure (which might make sense from a librarian’s perspective) and move it more towards the type of content structure that the internet is more interested in.”
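The interview does not say how the internal and external scores are amalgamated, so the following is only a sketch under stated assumptions: internal rank comes from a simple power-iteration PageRank over the crawled link graph, external strength is a count of referring domains per URL, and the two are blended with a hypothetical `external_weight`. All data is invented.

```python
def internal_pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an internal link graph (source -> targets)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no outlinks spread their rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


def effective_rank(internal: dict[str, float], referring_domains: dict[str, int], external_weight: float = 0.5) -> dict[str, float]:
    """Assumed blend: weighted average of internal rank and normalised external links."""
    total = sum(referring_domains.get(p, 0) for p in internal)
    external = {p: (referring_domains.get(p, 0) / total if total else 0.0) for p in internal}
    return {
        p: (1 - external_weight) * internal[p] + external_weight * external[p]
        for p in internal
    }


# Hypothetical crawl (internal links) and backlink export (referring domains).
crawl = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"]}
backlinks = {"/b": 30, "/": 10}

internal = internal_pagerank(crawl)
effective = effective_rank(internal, backlinks)
for url, score in sorted(effective.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {url}")
```

In this toy data, `/b` overtakes `/a` once external links are included, which is the kind of gap between site structure and external interest that the combined score is meant to surface.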

Should all SEOs be doing this or is it primarily for technical SEOs?

“To me, any SEO should have a holistic view, and all SEOs should understand it. If you call yourself an SEO generalist or an SEO consultant, then you should have a level of competency, if not experience or understanding, in the holistic elements of SEO.

You should be competent in your technical, your content, and your backlinks/off-page SEO. Technical SEOs should know how to do this themselves, but SEO content strategists might not need to.”

How can you use statistical distributions to model relevance and highlight under-served target content?

“If you look at the median number of internal links to a product category on an e-commerce site, for example, those will be very different from the median number of internal links to a product item. I don’t want to create a hard-and-fast rule. I don’t want to say that any pages that have fewer than 10 internal links need more links, or that you should add a certain number of links to those pages. If you use statistical distributions, you're taking a smarter, more tailored approach. You're taking a segmented approach, and you're accounting for the fact that not all content is equal.

You would expect your product categories to have more internal links, so the threshold will be high. Your product items may have fewer internal links, or it might be the other way around. The point is to take a segmented approach. By using distributions, you're moving away from hard-and-fast rules.”
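The segmented approach can be sketched as below: each page is compared with the median internal link count of its own segment instead of a single site-wide number. The segment labels, URLs, and counts are invented, and using the median as the threshold is just one choice of statistic.

```python
from collections import defaultdict
from statistics import median


def underlinked_pages(pages: list[tuple[str, str, int]]) -> list[tuple[str, str, int, float]]:
    """Flag pages whose internal link count is below their segment's median.

    Each input row is (url, segment, internal_link_count).
    Each output row adds the segment threshold that was applied.
    """
    counts_by_segment = defaultdict(list)
    for _, segment, inlinks in pages:
        counts_by_segment[segment].append(inlinks)

    thresholds = {seg: median(counts) for seg, counts in counts_by_segment.items()}

    return [
        (url, segment, inlinks, thresholds[segment])
        for url, segment, inlinks in pages
        if inlinks < thresholds[segment]
    ]


# Hypothetical crawl data.
crawl = [
    ("/shoes", "category", 40),
    ("/boots", "category", 60),
    ("/sandals", "category", 12),
    ("/shoes/a", "product", 3),
    ("/shoes/b", "product", 5),
    ("/shoes/c", "product", 1),
    ("/shoes/d", "product", 8),
]

for row in underlinked_pages(crawl):
    print(row)
```

A category page with 12 links is flagged here while a product page with 5 is not, because each is judged against its own segment.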

Is this just for internal links or can this approach be used to determine the optimum landing page for external links as well?

“You can apply it to absolutely everything. That's the whole premise of being data-driven.”

How do you measure the ROI of improved internal linking?

“You would benchmark the ROI beforehand and then it's almost like a split test. You would benchmark what it was before, then you could make the change following the model’s recommendations and see what the ROI is afterwards. However, if you're going to make this change site-wide, then you would want to do a split A/A test because you're comparing the result of the internal linking on the same URL against itself, before and after.

If you wanted to make it truly scientific, then you would conduct a split A/B test. In that case, you would only make that change on a collection of unlinked URLs, measure the revenue before and after, then compare it to the control group.”
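The A/B comparison described above can be sketched as a simple difference between the revenue change in the test group and the change in the control group. This is a minimal illustration with invented figures; it does not include a significance test, which a real experiment would need.

```python
def relative_change(pairs: list[tuple[float, float]]) -> float:
    """Relative revenue change for a group of URLs given (before, after) pairs."""
    before = sum(b for b, _ in pairs)
    after = sum(a for _, a in pairs)
    if before == 0:
        raise ValueError("Total revenue before the change must be non-zero")
    return (after - before) / before


def uplift(test_pairs: list[tuple[float, float]], control_pairs: list[tuple[float, float]]) -> float:
    """Change in the test group minus the change in the control group."""
    return relative_change(test_pairs) - relative_change(control_pairs)


# Hypothetical revenue per URL: (before, after).
test_group = [(600.0, 750.0), (400.0, 450.0)]      # URLs that received new internal links
control_group = [(500.0, 520.0), (500.0, 530.0)]   # URLs left unchanged

print(f"Test change:    {relative_change(test_group):.1%}")
print(f"Control change: {relative_change(control_group):.1%}")
print(f"Uplift:         {uplift(test_group, control_group):.1%}")
```

Subtracting the control group's change removes movement that would have happened anyway, such as seasonality, from the estimate.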

Does providing better and more relevant internal links also enhance usability?

“In theory (and, in many cases, in practice), SEO and user experience are aligned. By optimizing your content for the search engines, you should also be optimizing it for the user. If the user knows what they're getting before they click on the link, and the link is more relevant for their needs, then that should improve their experience.”

If an SEO is struggling for time, what should they stop doing right now so they can spend more time doing what you suggest in 2024?

“Stop getting better at Excel and retrain in Python.

Personally, I rarely use Excel. I use Google Sheets but only for putting together nice graphs because the ones produced by Python are a bit too sciencey for a business audience.

A more diplomatic and practical approach would be to say, ‘Limit your use of Excel and retrain in Python’. You’ll start noticing that you can invest ten minutes or one hour working out how to solve a dilemma in Python rather than Excel and, eventually, it will get to the point where you can do so much more in Python that you will drop Excel like a hot potato.

Python is also well future-proofed. That’s not to say there won't be a language in 10, 15, or 20 years that will supersede Python. However, the great thing is that, once you learn a computing language, those skills are transferable to almost any other computing language. I started out using R, which is a statistical computing language. Once I saw that more of the SEO industry was favouring Python, it was really easy for me to switch. A lot of the function names are identical.”

Andreas Voniatis is Founder at Artios, and you can find him over at Artios.io.

 
