Fix corrupt UTF-8 files with Python

I sometimes find that files included in Python projects, for example Fortran source files, contain corrupted characters, that is, byte sequences that are not valid UTF-8. Maybe it’s a case of bad OCR, which also plagues LaTeX / BibTeX references copied and pasted from journal websites. Thus, this method also applies to BibTeX files.

The pure Python script find_bad_characters.py recursively:

  1. finds such corrupt files
  2. removes the corrupted characters
  3. if desired, backs up the original file and overwrites it with the cleaned version
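
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the actual contents of find_bad_characters.py: the function names, the default "*.f90" glob pattern and the ".bak" backup suffix are all assumptions made for the example. It relies on the fact that decoding with errors="ignore" silently drops any bytes that do not form valid UTF-8.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop bytes that are not valid UTF-8 and keep everything else."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


def find_corrupt_files(root: Path, pattern: str = "*.f90") -> list[Path]:
    """Recursively list files under root whose contents are not valid UTF-8.

    The default pattern is an assumption; use e.g. "*.bib" for BibTeX files.
    """
    return sorted(
        p for p in root.rglob(pattern)
        if p.is_file() and not is_valid_utf8(p.read_bytes())
    )


def fix_file(path: Path, overwrite: bool = False) -> bytes:
    """Return the cleaned contents of path.

    With overwrite=True, first copy the original to "<name>.bak"
    (hypothetical suffix), then write the cleaned bytes back in place.
    """
    cleaned = strip_invalid_utf8(path.read_bytes())
    if overwrite:
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)
        path.write_bytes(cleaned)
    return cleaned
```

To try it, call find_corrupt_files(Path(".")) to list the offending files, then fix_file(p, overwrite=True) on each one. Working on raw bytes rather than text mode avoids any newline translation, so the only change to a file is the removal of the invalid bytes.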