Broken Internal Link Checker Tool

Script Verification: This Python automation workflow is tested against Python 3.10 through Python 3.12. Make sure the dependencies in your requirements.txt (beautifulsoup4, requests, requests_cache, pandas, and tqdm) are up to date.

Broken internal links return 404 errors to both users and Googlebot, wasting crawl budget and disrupting the flow of link equity between your pages. This Python script crawls your sitemap, follows every internal link on each page, and flags any URL that returns an error status code. It pairs directly with the 3xx redirect checker and the full technical SEO audit suite for comprehensive URL health monitoring.

Click the following link to open the Python Script: Broken Internal Link Checker

How Does the Broken Internal Link Checker Script Work?

The Python script is a comprehensive tool for scanning a website's sitemap (verifying your sitemap URL format along the way), identifying broken links, and generating reports. Let's break down the key functions and explain their significance.

  1. Installing Necessary Libraries: The script begins by installing the requests_cache library using the !pip install command. This library caches HTTP responses, which significantly speeds up repeated requests to the same URLs.
  2. Importing Required Libraries: The script imports several libraries, including csv, multiprocessing, urllib.parse, pandas, requests, requests_cache, bs4 (BeautifulSoup), and tqdm. These libraries handle web scraping, data manipulation, and parallel processing.
  3. URL Checking Function: The check_url(url) function uses the cached session to check the validity of a given URL. If the URL returns a 200 status code, it is considered valid and the function returns True.
  4. Sitemap Parsing Function: The parse_sitemap(sitemap_url) function fetches the sitemap and uses BeautifulSoup to extract its URLs. Each URL is then passed through check_url() to confirm it is reachable, producing a clean list of pages to crawl.
  5. Link Checking Function: The check_links(url) function analyzes a page's internal links in depth. It calls extract_links() to parse the HTML and collect all anchor tags (<a>) with valid URLs, then checks each link's validity with check_link().
  6. Link Extraction Function: The extract_links(html, base_url) function uses BeautifulSoup to extract and normalize the internal links in a page's HTML, so that only valid links within the same domain are considered.
  7. Individual Link Checking Function: The check_link(link, session) function tests a single link by sending a HEAD request. If the link returns a status code of 400 or higher, it is considered broken.
  8. Main Script Execution: The script takes the sitemap URL as user input and starts the link checking process. It uses the multiprocessing library to parallelize link checking, improving efficiency, and stores the results in a dictionary that groups broken links by the page they were found on.
  9. Generating Reports: If broken links are detected, the script writes a report in both CSV and XLSX formats, listing each broken URL alongside the page it was found on.
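The URL and link checks described in steps 3 and 7 can be sketched as follows. The cache name, expiry, and timeouts here are assumptions, not values taken from the article's script, and the sketch falls back to a plain requests session if requests_cache is unavailable:

```python
import requests

try:
    import requests_cache  # caches responses so repeated checks are near-instant
    session = requests_cache.CachedSession("link_check_cache", expire_after=3600)
except ImportError:
    session = requests.Session()  # fallback when requests_cache is not installed

def check_url(url):
    """True when a GET returns HTTP 200 (mirrors the article's check_url)."""
    try:
        return session.get(url, timeout=10).status_code == 200
    except requests.RequestException:
        return False

def check_link(link, session=session):
    """True when a link is broken: a HEAD request answers 400+ or fails entirely."""
    try:
        return session.head(link, timeout=10, allow_redirects=True).status_code >= 400
    except requests.RequestException:
        return True
```

Note that check_link treats a failed request (DNS error, timeout) as broken, which matches how a visitor would experience that link.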
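The sitemap parsing in step 4 boils down to reading the <loc> tags out of the sitemap XML. A minimal sketch of that extraction (the function name here is illustrative; the article's parse_sitemap also fetches the URL and filters through check_url):

```python
from bs4 import BeautifulSoup

def parse_sitemap_xml(xml_text):
    """Pull page URLs out of sitemap XML by reading its <loc> tags."""
    # html.parser handles simple sitemap XML without requiring lxml
    soup = BeautifulSoup(xml_text, "html.parser")
    return [loc.get_text(strip=True) for loc in soup.find_all("loc")]
```

Feeding this a standard <urlset> sitemap yields the list of page URLs to crawl, which the full script then validates before checking each page's links.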
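The link extraction in step 6 can be sketched like this. The fragment-stripping detail is an assumption about how the article's extract_links normalizes URLs; the same-domain filter is the behavior the article describes:

```python
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup

def extract_links(html, base_url):
    """Collect absolute, same-domain links from a page's anchor tags."""
    base_domain = urlparse(base_url).netloc
    links = set()
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        absolute = urljoin(base_url, a["href"])  # resolve relative hrefs
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == base_domain:
            links.add(absolute.split("#")[0])  # drop fragments so /page and /page#top dedupe
    return sorted(links)
```

For example, given anchors pointing to /about, https://other.com/x, and mailto:me@example.com with base https://example.com/, only https://example.com/about survives: external domains and non-HTTP schemes are filtered out.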
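The parallel fan-out in step 8 can be sketched as below. To keep the example self-contained it uses a thread pool (multiprocessing.dummy); the article's script uses a process-based multiprocessing Pool in the same map-then-collect pattern:

```python
from multiprocessing.dummy import Pool  # thread pool; the article's script uses process Pool

def find_broken(pages_to_links, is_broken, workers=8):
    """Check every page's links in parallel; return {page: [broken links]}."""
    broken = {}
    with Pool(workers) as pool:
        for page, links in pages_to_links.items():
            flags = pool.map(is_broken, links)  # one check per link, run concurrently
            bad = [link for link, flag in zip(links, flags) if flag]
            if bad:
                broken[page] = bad  # only pages with at least one broken link are recorded
    return broken
```

Passing the check_link function as is_broken reproduces the script's behavior: the result dictionary categorizes broken links by the page they were found on, which is exactly the structure the report generator consumes.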
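Finally, the report generation in step 9 amounts to flattening the broken-links dictionary into one row per broken link and writing it out. The column names and file names here are assumptions for illustration:

```python
import csv
import pandas as pd

def write_reports(broken_by_page, csv_path="broken_links.csv", xlsx_path="broken_links.xlsx"):
    """Flatten {page_url: [broken links]} into one row per broken link and save CSV/XLSX."""
    rows = [
        {"found_on": page, "broken_url": link}
        for page, links in broken_by_page.items()
        for link in links
    ]
    df = pd.DataFrame(rows, columns=["found_on", "broken_url"])
    df.to_csv(csv_path, index=False, quoting=csv.QUOTE_MINIMAL)
    try:
        df.to_excel(xlsx_path, index=False)  # needs openpyxl (or another Excel engine)
    except Exception:
        pass  # no XLSX engine installed; the CSV report is still written
    return df
```

Each row pairs the page a broken link was found on with the broken URL itself, so the report answers both "what is broken" and "where do I fix it."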

Why Broken Internal Links Hurt Your SEO Rankings

The provided Python script offers several benefits that can greatly enhance website management and SEO:

  1. Improved User Experience: Broken links frustrate users and tarnish a website's reputation. By proactively identifying and fixing them, you ensure visitors can navigate your site seamlessly.
  2. Enhanced SEO Performance: Search engines value user experience, and sites riddled with broken links can rank lower. Regularly scanning for and fixing broken links demonstrates a commitment to maintaining a high-quality website, which can support better rankings.
  3. Time and Effort Savings: Manually checking every link on a website is time-consuming and error-prone. This script automates the process, significantly reducing the time and effort required to identify broken links.
  4. Comprehensive Reporting: The script generates detailed reports of the broken links it finds, so you can address issues efficiently and track improvements over time.
  5. Bulk Processing: Parallel processing lets the script check many links simultaneously, which is especially useful for larger websites with numerous internal links.

Fix Broken Links Systematically: Run This Script Weekly

Broken links damage both user experience and SEO rankings, and the Python script discussed in this article gives you an efficient way to find and fix them. Run it weekly, or after any large content update, so broken links are caught before they accumulate. By making this script part of your routine, you enhance your website's user experience, improve SEO performance, and save valuable time keeping your site in optimal condition for every visitor.