To use Tesseract OCR on a PDF, first convert the PDF to TIFF.
The same command handles single-page and multipage PDFs:
magick -density 300 in.pdf -depth 1 -strip -background white -alpha off out.tiff
The resulting bilevel (black and white only) TIFF file is about 1 MB per page.
For large or complicated PDFs, consider converting groups of pages.
Pages are 0-indexed, so to convert, say, pages 4-7 of the PDF:
magick -density 300 in.pdf[3-6] -depth 1 -strip -background white -alpha off out.tiff
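The off-by-one between human page numbers and ImageMagick's 0-based index is easy to get wrong when scripting. A minimal sketch, assuming a Python wrapper around the command above; the function names `page_selector` and `magick_args` are made up for illustration and are not part of ImageMagick:

```python
def page_selector(first_page: int, last_page: int) -> str:
    """Return the ImageMagick index suffix for 1-indexed, inclusive PDF pages."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # ImageMagick counts pages from 0, so shift both ends down by one
    if first_page == last_page:
        return f"[{first_page - 1}]"
    return f"[{first_page - 1}-{last_page - 1}]"


def magick_args(pdf: str, first_page: int, last_page: int, out: str) -> list[str]:
    """Build the argument list for the conversion command used in this post."""
    return [
        "magick", "-density", "300",
        pdf + page_selector(first_page, last_page),
        "-depth", "1", "-strip", "-background", "white", "-alpha", "off",
        out,
    ]
```

The list can be handed to `subprocess.run(args, check=True)`. Passing a list rather than a single string keeps the shell from interpreting the square brackets.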
While at least 300 DPI is recommended, increasing the resolution sometimes worsens Tesseract accuracy, particularly for poor-quality text.
In such cases, it may be better to filter or preprocess the input images further before passing them to Tesseract.
Run OCR: Tesseract can also output PDF or other formats.
Be aware that not all documentation/tips on the web address the machine learning models present in Tesseract 4.x.
tesseract out.tiff out
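Tesseract selects other output formats through a trailing config name such as `pdf`, `hocr`, or `tsv`; with none given it writes plain text to `out.txt`. A small sketch of building that command from Python, where the helper name `tesseract_args` and the set of formats it accepts are choices made here for illustration:

```python
def tesseract_args(image: str, out_base: str, fmt: str = "txt") -> list[str]:
    """Build a tesseract command line; out_base is given without an extension."""
    known = {"txt", "pdf", "hocr", "tsv"}
    if fmt not in known:
        raise ValueError(f"unsupported format {fmt!r}, expected one of {sorted(known)}")
    args = ["tesseract", image, out_base]
    if fmt != "txt":
        # Plain text is the default, so only other formats need the config name
        args.append(fmt)
    return args
```

For example, `tesseract_args("out.tiff", "out", "pdf")` corresponds to `tesseract out.tiff out pdf`, which writes a searchable `out.pdf`.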
Tesseract processing can be controlled in numerous ways.
ImageMagick uses policy.xml to set read/write permissions by file format.
When read permissions are disabled for a format such as PDF, ImageMagick operations might fail like:
Markdown as a de facto documentation syntax has many variants.
The relative linking syntax seems to be widely supported by sites including GitHub and GitLab, among others.
The syntax is simply like:
[TODO list](./TODO.md)
The relative links then continue to work even when the repository is cloned, forked, or renamed.
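When links are generated by a script, for example for an index page, the relative path depends on where the linking document sits in the repository. A minimal sketch, assuming both paths are given relative to the repository root with forward slashes; `relative_md_link` is a hypothetical helper, not part of any Markdown tool:

```python
import posixpath


def relative_md_link(text: str, target: str, source: str) -> str:
    """Return a Markdown link to `target` as seen from the document `source`."""
    start = posixpath.dirname(source) or "."
    rel = posixpath.relpath(target, start=start)
    if not rel.startswith("."):
        # Make the "relative to this file" intent explicit, as in ./TODO.md
        rel = "./" + rel
    return f"[{text}]({rel})"
```

Called with `source="README.md"` this reproduces the link above, and with `source="docs/guide.md"` it produces `../TODO.md` instead.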
The IPython console in Spyder IDE by default shows Matplotlib plots as non-interactive images inline in the console, like a “notebook”.
Fix this by creating separate windows for interactive figures in Spyder:
Switch between compilers e.g. g++-7 and g++-8 with simple commands.
Note: We suggest NOT using sudo. Instead, make the links under ~/.local/bin, which should already be in your PATH (or start using it as in step 1).
(one-time) Setup shell to use ~/.local/bin instead of system-wide /usr.
This is generally beneficial in any case.
update-alternatives works with virtually any program including Python.
Compiler version priority order: the last number given to update-alternatives --install is the priority.
The highest priority number is used in “automatic” update-alternatives mode.
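The selection rule can be illustrated with a toy model. This is not how update-alternatives is implemented, only a sketch of the "highest number wins" behavior, and the paths and priority values are invented for the example:

```python
def auto_choice(alternatives: dict[str, int]) -> str:
    """Return the path that automatic mode would pick: the highest priority."""
    if not alternatives:
        raise ValueError("no alternatives registered")
    return max(alternatives, key=lambda path: alternatives[path])


# Hypothetical priorities, as would be given last on each --install line
registered = {"/usr/bin/g++-7": 10, "/usr/bin/g++-8": 20}
```

Here `auto_choice(registered)` picks `/usr/bin/g++-8` because 20 is the larger priority.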
Troubleshooting: If you accidentally reversed the order of the link and target, or used sudo in /usr/bin, you may need to reinstall the compiler.
Like Python, Fortran does not understand tilde ~ in commands like open().
Just like in Python and some Matlab functions, an expanduser() procedure is needed.
We provide this functionality in fortran-filesystem expanduser().
The shell typically expands ~ itself.
It becomes an issue when a Fortran program reads an absolute path involving ~ from, say, a config file and then tries to open that filename.
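For comparison, the same situation on the Python side looks roughly like the sketch below. The `key = value` config line format and the function name `read_path_from_config` are assumptions made for illustration; `Path.expanduser()` is the standard-library call that does the work:

```python
from pathlib import Path


def read_path_from_config(line: str) -> Path:
    """Parse a hypothetical 'key = value' config line and expand a leading ~."""
    _, _, value = line.partition("=")
    # Without expanduser(), open() would look for a directory literally named "~"
    return Path(value.strip()).expanduser()
```

The Fortran procedure plays the same role: expand the tilde once, right after reading the string, and pass only the expanded path to open().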
Numerous Python packages use PyTest extensively.
When apt installs a package that depends on python-astropy, python-pytest is installed as well.
The system version of a package is typically several minor versions behind for stable Linux distros like Ubuntu.
To find which specific package is responsible for a package being installed, use a command like:
Ubuntu Gnome Agent remembers SSH private key passwords until you log out.
If someone knows an Ubuntu user password, they also have access to any SSH private keys loaded since last logon.
This also fixes an error upon trying to use ssh or sshfs:
GPSTk Python examples require installing GPSTk for Python.
GPSTk is a complicated program that is more difficult to install than typical Python programs.