Check website links with Python

Our small Python-based Markdown link-checking script is effective for large (thousands of pages, tens of thousands of links) Markdown-based websites. It is immensely faster than the legacy HTML LinkChecker program covered in the next section. Alternatives exist for Go and JavaScript.
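As a rough sketch of that approach (not the actual script), the following Python program scans Markdown files for inline [text](url) links, verifies that local targets exist, and probes external URLs with a HEAD request. The regex, timeout, and file-layout choices are illustrative assumptions only.

# Sketch of a Markdown link checker. Assumptions (not from the original script):
# inline [text](url) links only, local targets resolved relative to each file,
# external URLs probed with a HEAD request.
import re
import sys
from pathlib import Path
from urllib.request import Request, urlopen

LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")  # matches [text](url)

def check_file(md: Path) -> list[str]:
    bad = []
    for url in LINK.findall(md.read_text(errors="ignore")):
        if url.startswith(("http://", "https://")):
            # note: some servers reject HEAD; a real checker would fall back to GET
            req = Request(url, method="HEAD", headers={"User-Agent": "link-check"})
            try:
                urlopen(req, timeout=10)
            except OSError as e:
                bad.append(f"{md}: {url} ({e})")
        elif not url.startswith(("#", "mailto:")):
            target = md.parent / url.split("#")[0]
            if not target.exists():
                bad.append(f"{md}: missing local target {url}")
    return bad

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    problems = [p for md in root.rglob("*.md") for p in check_file(md)]
    print("\n".join(problems) if problems else "all links OK")

A real checker would also handle reference-style links, validate anchors, and check external URLs concurrently.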

If you’re using Netlify, consider a link-checking plugin: it can check tens of thousands of links in about two minutes on each “git push” of the website Markdown.

HTML LinkChecker

If your website is not Markdown-based, LinkChecker is a large Python program that can recursively check websites, offline or online, from the command line. However, it is not frequently maintained and has a growing number of false positives and false negatives.

Install

The PyPI releases are out of date, so instead of the usual

pip install linkchecker

we recommend using the development LinkChecker code:

git clone --depth 1 https://github.com/linkchecker/linkchecker/

cd linkchecker

python -m pip install -e .

Internal/external links are tested recursively. This example is for a Jekyll website running on my laptop:

linkchecker --check-extern localhost:4000

The checking process takes several minutes, perhaps even 20-30 minutes, depending on your website size (number of pages and links). Redirect the output to a file as shown below if you want to save the result (recommended).

Examples

List options for recursion depth, output format, and much more:

linkchecker -h

Save the output to a text file:

linkchecker --check-extern http://localhost:4000 &> check.log

Monitor progress with:

tail -f check.log
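
For a quick summary of failures after the run, a few lines of Python can filter the saved log. This sketch assumes the default text output, which prints a "Result" line for each checked URL; adjust the match strings if your output format differs.

# Summarize failed checks from a saved LinkChecker log (default text output assumed).
from pathlib import Path

lines = Path("check.log").read_text(errors="ignore").splitlines()
failures = [ln for ln in lines if ln.startswith("Result") and "Error" in ln]
print(f"{len(failures)} failed checks")
print("\n".join(failures))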