In light of the latest Google updates, more webmasters are getting penalised for having dodgy links. If you own a site, you need to conduct a backlink audit in order to assess the risk your website might be under, even if you’ve not been penalised by the search engine giant.
So, conducting a backlink audit is not important just for SEOs but for anyone who owns a website as it gives you the chance to take a closer look at the quality of your links.
To make the auditing process easier, Excel is an invaluable tool. If you have thousands of links to analyse, Excel makes it simpler to go through all of them and remove those bad links that might be harming your site.
This guide will teach you how to gather, sort and clean up your list of links in just a few steps.
Let’s get started.
Get the data
Whatever backlink checker tool you’re using, export the data in any Excel compatible format available.
You may need to use more than one tool to ensure you get all the links. I usually use Webmaster Tools (WMT) (both the All Domains and Latest links options), Ahrefs, Moz’s Open Site Explorer and MajesticSEO to get all the links. Why all of these tools? Because each one has its own crawlers which means there’s a good chance one tool will get backlinks that another didn’t.
Even if Google shows you the links in WMT, those are not all the links that Google knows about in relation to your site, so make sure you use other tools as well to ensure you find as many links as possible.
Next, gather all the links in one sheet. Different tools will come back with different data. Make sure all the exports from each tool have the backlink page in the first column. Rearrange columns as needed to ensure when you combine the reports, the information is in a format that makes sense.
Here’s a tip: When you’re working with long lists and lots of columns, it’s always helpful to keep the header of the list in place when you scroll down to see what each column represents.
To do this, simply go to View > Freeze Panes and select if you want to freeze the header or the first column. Watch the video below to see how:
So now you have a full list of links – what next?
Step 1: Clean up the data
The chances are that the backlink checker tools you’ve used have returned some links that are duplicated across all one or several tools. So now you need to remove these duplicate links.
Excel has a built-in function called Remove Duplicates. To locate it, go to Data > Remove Duplicates.
This video shows you how to use this function in Excel:
Make sure you select your entire table before removing duplicates. Then, when asked which columns you want to delete duplicate values from, only select the column that you want to de-dupe. Doing this will ensure you don’t remove rows which include different URLs but have duplicate values in other columns.
Extract all URLs from your list of linked keywords
Next, you will need to extract all URLs from your list of linked keywords. Some tools will give you the anchor text as a hyperlink so to extract those links:
1. Open Visual Basic Editor (use ALT + F11 shortcut);
2. Navigate Insert -> Module to adds a module
3. Paste the code below
Sub ExtractHL() Dim HL As Hyperlink For Each HL In ActiveSheet.Hyperlinks HL.Range.Offset(0, 1).Value = HL.Address Next End Sub
4. Navigate Run > Run Sub/UserForm (or hit F5)
5. Close the Visual Basic Editor (use ALT + Q)
When you return to your sheet you will see that the links have been pulled from the anchor texts.
Watch the video to see this in action:
Here’s a tip: When you extract the links, it will always pull the links to the next column on the right. Make sure you leave that column empty so you don’t overwrite your data.
You need to create this module every time you want to apply it. If you want a more permanent solution you will need to create a macro.
Show just one URL per domain
Not all exports will show you the main domain that links to you. Some will just give you the page.
To extract the domain only from any URL you can use another Add-In called URL Tools Excel. Download it and follow the instructions provided on that page.
Once installed you will get access to new functions such as:
- subdomain() – This returns the domain part of a URL including any subdomains e.g ‘http://www.example.com/index.htm’ becomes ‘www.example.com’.
- nowww() – This removes ‘www.’ from the URL (note that www must be at the very beginning of the URL).
- wwwsubdomain() – This is a combination of the above functions, equivalent to ‘nowww(subdomain(A1))’. This is what you’ll want to use most of the time so make sure you memorise it.
- urlencode() – This converts a string so that it can be used on a URL query string. It replaces characters that are illegal in URLs, such as blank spaces, with the code that represent those characters as defined in RFC 3986, Uniform Resource Identifier (URI): Generic Syntax. For example, blank spaces are replaced with %20, and exclamation points are replaced with %21.
- urldecode() – Un-encodes a URL string. Try it on a string like ‘Hello%20World%21’ and you will see that it replaces %20 with a blank space and %21 with an exclamation point.
- tld() – Returns the TLD of a URL. For example, you can choose to retrieve only links with a .co.uk extension.
- geturl() – Extracts the URL from a Hyperlink. For example, from a hyperlink like http://www.allrecipes.com//Recipes/desserts/cookies/Main.aspx it will extract the actual link which is allrecipes.com.
As you can see, geturl() does the same thing as the module we’ve created above. So if you use the add-in you won’t need to create the module every time you want to extract the URL from a hyperlink.
Now on to step two.
Step 2: Sort by anchor texts
To exclude good backlinks, sort your list by anchor texts. This allows you to see more clearly how many pages are linking to the same anchor or if links are targeting money keywords.
Remove all the links that use a branded anchor or a natural sounding one so you’ll have a shorter list to work with.
Step 3: Get the status of the pages that link to you
Not every tool will scan your links on the spot and some pages that have linked to you in the past might not even be there anymore. If you want to see their header status, you can use an Excel Add-in called SEO Tools for Excel.
Follow the installation steps from the download page and when you’re done you’ll notice a new tab has appeared in your Excel:
You can use all the functions by either writing them yourself or by selecting them from the menu.
Watch this video to see how to use this function:
Be aware that your Excel might freeze when you’re trying to check the status of a long list of URLs. So, to be safe I recommend applying a formula for batches of pages. I usually do it for 50 pages at a time or my entire work sheet and leave Excel open to work throughout the night.
Now you’ll be able to remove websites from your list that return a 4XX or 5XX status. You will also see which websites have been redirected (3XX) and where, and then check if the new site still has links.
Here’s a tip: Having a large number of formulas in a sheet might slow down Excel so make sure you always copy your cells that have formulas and paste them as values. Watch the video to learn how to do this:
Step 4: Is my URL still on the page?
Some of the pages that returned a 200 OK status might have been cleaned up since the backlink checker tool found it. Instead of going through all the links, you can trim down some of them by searching for your URL in the code of the page. You don’t even need to leave your Excel sheet!
To do that, you will need to use the isfoundonpage function that comes with SEO Tools for Excel.
- You need to have all the pages you want to search in in one column. (column A)
- You also need to have the URL you’re looking for in another column. (column B)
- Then in column C you add the formula =isfoundonpage(A,B) à On Page A is text B found?
Watch the video to see how it’s done:
In this video, I’ve searched for:
- Twitter.com – It has been found 11 times as each post has a Twitter link and there are also social buttons in the footer of the website.
- Plus.google.com – It only shows once in the footer.
- Heartinternet.uk – This URL is present 126 times as I’m searching the Heart website and there are links to various pages and they all include “heartinternet.uk”
- Google.de – Comes back as 0 since there aren’t any mentions of this domain.
You can also use the same formula to find keywords on a page without having to actually go on that page.
I also like to use it when checking my own websites for things like code snippets (canonical tags, hreflangs).