In data science we often deal with messy, heterogeneous data and a variety of file types.
Python Pandas is a very powerful data science tool.
A simple but not infrequent mistake is using the wrong Pandas function to read data, that is, using read_excel to read CSV data or read_csv to read Excel spreadsheet data.
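Note: older Pandas versions cannot read ODS OpenDocument formats; Pandas ≥ 0.25 can read them with read_excel(..., engine="odf") when the odfpy package is installed. Otherwise, for those using LibreOffice/OpenOffice, convert ODS data to XLSX first.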
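One way to avoid the wrong-reader mistake is to choose the reader from the file suffix. The following is a minimal sketch, not part of Pandas itself: READERS, pick_reader and load_table are hypothetical names, and only the CSV and Excel suffixes discussed above are handled.

```python
from pathlib import Path

# Map file suffixes to the name of the matching pandas reader function.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
}


def pick_reader(filename: str) -> str:
    """Return the name of the pandas reader that matches the file suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader known for suffix {suffix!r}") from None


def load_table(filename: str):
    """Read a CSV or Excel file into a DataFrame with the matching reader."""
    import pandas  # imported here so pick_reader() works without pandas installed

    return getattr(pandas, pick_reader(filename))(filename)
```

Calling load_table("data.csv") would then use pandas.read_csv, while load_table("book.xlsx") would use pandas.read_excel; an unknown suffix raises ValueError instead of producing a confusing parser error.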
SCP does not have an option to exclude files while copying remote files over SSH.
This is a problem when you have Git-managed code that you've modified on, say, an offline computer, or on an HPC where you don't want to put your Git host credentials.
If you just use scp -r, you'll also overwrite the .git directory, which can destroy work done on this or other branches.
We want to just copy the code files, NOT the .git tree.
Note the exclusion of files under bin/ in the repository.
The HPC probably runs a different Linux distro and the compilation is optimized for a different CPU, so the HPC binaries wouldn’t generally be useful elsewhere.
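A common tool for copying with exclusions is rsync over SSH, which accepts --exclude patterns. The sketch below assumes rsync is installed on both machines; the host name and paths are hypothetical placeholders, and rsync_command is an illustrative helper that only builds the command.

```python
import subprocess
from typing import List, Sequence


def rsync_command(src: str, dest: str,
                  excludes: Sequence[str] = (".git/", "bin/")) -> List[str]:
    """Build an rsync command that skips the given directory patterns."""
    cmd = ["rsync", "-av"]  # -a: archive mode (recursive), -v: verbose
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    return cmd + [src, dest]


# Hypothetical remote host and paths: replace with your own.
cmd = rsync_command("user@hpc.example.org:mycode/", "./mycode/")
print(" ".join(cmd))
# subprocess.check_call(cmd)  # uncomment to actually copy the files
```

The trailing slash on the source path makes rsync copy the directory contents rather than the directory itself.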
The Fortran 2003 standard constitutes a strong foundation of “modern Fortran”.
Modern Fortran (Fortran ≥ 2003) is so different in capabilities and coding style from Fortran 77 as to be a distinct, highly backward compatible language.
Almost all of Fortran 95 was incorporated into Fortran 2003, except for a few obscure, little-used and confusing features that were deprecated and already unsupported by some popular compilers.
Writing to console effectively: write(*,*) grew out of non-standard use of Fortran 66’s write statement that was introduced for device-independent sequential I/O.
Although write(*,*) became part of Fortran 77 for printing to console standard output, the Fortran 77 print command is more concise and, more importantly, visually distinct.
That is, where the full versatility of the write command is not needed, print should be used to help make those cases where write is needed more distinct.
Assembly language comparison: print *,'hi' and write(*,*) 'hi' compile to IDENTICAL assembly with modern compilers, as it should be.
In general, disassemble Fortran executables with:
Fortran 2003 finally settled the five-decade-old ambiguity over console I/O with the intrinsic iso_fortran_env module, which is often invoked at the top of a Fortran module like:
The => operators are here for renaming (they have other meanings for other Fortran statements).
It’s not necessary to rename, but it’s convenient for the popularly used names for these console facilities.
Recommendation: routine console printing:
print *, 'Hello text'
For advanced console printing, whether to output errors, use non-advancing text, or toggle between log files and printing to console, use write(stdout,*) or the like.
Example: print to stdout console if output filename not specified
use, intrinsic :: iso_fortran_env, only: stdout=>output_unit

implicit none (type, external)

character(:), allocatable :: fn
integer :: i, u, L

!! status is nonzero and L is 0 when no output filename was given
call get_command_argument(1, length=L, status=i)

if (i == 0 .and. L > 0) then
  allocate(character(L) :: fn)
  call get_command_argument(1, fn)
  print '(a)', 'writing to ' // fn
  open(newunit=u, file=fn, form='formatted')
else
  u = stdout
endif

i = 3 ! test data

write(u,*) i, i**2, i**3

if (u /= stdout) close(u)
! closing stdout can disable text console output, and writes to file `fort.6` in gfortran

print *, 'goodbye'

! end program implies closing all file units, but here we close in case you'd use in subprogram (procedure), where the file reference would persist.
end program
Polymorphism is a part of generic programming enabled by Fortran 2003.
Typically one should encapsulate procedures in modules, even when the whole program is contained in a single file.
Example: addtwo() automatically selects the correct type thanks to the interface block.
module funcs
use, intrinsic :: iso_fortran_env, only: sp=>real32, dp=>real64
implicit none (type, external) !! takes effect for all procedures within module

interface addtwo
procedure addtwo_s, addtwo_d, addtwo_i
end interface addtwo

contains

elemental real(sp) function addtwo_s(x) result(y)
real(sp), intent(in) :: x
y = x + 2
end function addtwo_s

elemental real(dp) function addtwo_d(x) result(y)
real(dp), intent(in) :: x
y = x + 2
end function addtwo_d

elemental integer function addtwo_i(x) result(y)
integer, intent(in) :: x
y = x + 2
end function addtwo_i

end module funcs

program test2
use funcs
implicit none (type, external)

real(sp) :: twos = 2._sp
real(dp) :: twod = 2._dp
integer :: twoi = 2

print *, addtwo(twos), addtwo(twod), addtwo(twoi)

end program
PEP8 code style benefits code readability.
flake8 checks for PEP8 compliance and also catches some syntax errors in unexecuted code.
flake8 is typically part of continuous integration.
pip install flake8
In the top directory of the particular Python package, type:
flake8
Python 3.7 was released in June 2018, improving the performance of common operations and adding user-visible changes in the following categories.
The boilerplate copy-paste required for Python classes can seem inelegant.
The Python 3.7 data class eliminates the boilerplate code in initializing classes.
The @dataclass decorator enables this template.
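from dataclasses import dataclass

@dataclass
class Rover:
    '''Class for robotic rover.'''
    name: str
    uid: int
    temperature: float  # fields without defaults must precede fields with defaults
    battery_charge: float = 0.

    def check_battery_voltage(self) -> float:
        # aioread and port35 are placeholders that are not defined in this snippet
        return self.aioread(port35) / 256 * 4.1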
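To see what the decorator generates, here is a small self-contained sketch. Probe is a hypothetical class made up for illustration; the point is that __init__, __repr__ and __eq__ are written for you from the annotated fields.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical sensor probe, for illustration only."""
    name: str
    uid: int
    battery_charge: float = 0.0  # fields with defaults go after those without


p = Probe("alpha", 7)            # generated __init__
print(p)                         # generated __repr__
print(p == Probe("alpha", 7))    # generated __eq__ compares field by field
```

Without @dataclass, the same class would need a hand-written __init__ assigning each attribute, plus __repr__ and __eq__ methods.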
Python 3.7 introduced breakpoint, which breaks into the debugger.
x = 1
y = 0
breakpoint()
z = x/y
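breakpoint() does not call the debugger directly; it calls sys.breakpointhook(), which by default starts pdb. The sketch below swaps in a hypothetical recording_hook so the example runs non-interactively; safe_divide is also an illustrative name.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    """Stand-in for the debugger: just note that a breakpoint was reached."""
    calls.append("breakpoint reached")


sys.breakpointhook = recording_hook  # breakpoint() now calls recording_hook


def safe_divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        breakpoint()  # would normally drop into pdb at this point
        return None


print(safe_divide(1, 0), calls)

sys.breakpointhook = sys.__breakpointhook__  # restore the default hook
```

Setting the environment variable PYTHONBREAKPOINT=0 before starting Python has a similar effect of turning breakpoint() into a no-op, which can be handy for unattended runs.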
It’s very common to have more than one version of Python installed.
Likewise, multiple versions of the same library may be installed, overriding other versions.
For example, system Numpy may be overridden with a pip installed Numpy.
Python ≥ 3.7 gives the absolute path and filename from which the ImportError was generated.
from numpy import blah
Python < 3.7:
ImportError: cannot import name 'blah'
Python ≥ 3.7:
ImportError: cannot import name 'blah' from 'numpy' (c:/Python37/Lib/site-packages/numpy/__init__.py)
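The same behavior can be checked without Numpy installed. This sketch uses the standard-library json package as a stand-in and a deliberately nonexistent name; describe_import_failure is a hypothetical helper.

```python
def describe_import_failure() -> str:
    """Trigger a failing import and return the ImportError message."""
    try:
        from json import blah  # 'blah' deliberately does not exist in json
    except ImportError as err:
        return str(err)
    return "import unexpectedly succeeded"


msg = describe_import_failure()
print(msg)
```

On Python ≥ 3.7 the printed message names the package and the file it was loaded from, which shows at a glance which installation is being used.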
The popular and efficient argparse module can now handle intermixed positional and optional arguments, just like the shell.
from argparse import ArgumentParser
p = ArgumentParser()
p.add_argument('xmlfn')
p.add_argument('--plottype')
p.add_argument('indices',nargs='*',type=int)
p = p.parse_intermixed_args()  # instead of p.parse_args()
print(p)
whereas if you had used p.parse_args() you would have gotten
error: unrecognized arguments: 2 3
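As a runnable sketch of the same parser, the argument list can be passed explicitly instead of being read from the command line. The file name my.xml and the option value inv are made-up sample inputs, and build_parser is an illustrative helper.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    p = ArgumentParser()
    p.add_argument("xmlfn")
    p.add_argument("--plottype")
    p.add_argument("indices", nargs="*", type=int)
    return p


# Positional arguments appear both before and after the optional argument.
argv = ["my.xml", "1", "--plottype", "inv", "2", "3"]
args = build_parser().parse_intermixed_args(argv)
print(args.xmlfn, args.plottype, args.indices)
```

parse_intermixed_args() first collects the optional arguments, then parses all remaining positional arguments together, so the indices given after --plottype are not lost.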
Note: optparse was deprecated in 2011 and is no longer maintained.
In packages with circular imports, Python ≥ 3.7 can do
import a.b as c
instead of Python ≤ 3.6 needing
from a import b as c
The discussion makes the details clear for those who are really interested in Python import behavior.
The Python ≥ 3.7 disassembler dis.dis() can reach more deeply inside Python code: it disassembles nested code objects, with the new depth parameter limiting how far it recurses. Nested elements include:
list comprehension: x2 = [x**2 for x in X] (greedy eval)
generator expressions: x2 = (x**2 for x in X) (lazy eval)
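The effect of depth can be seen by disassembling a function that contains a generator expression. This is a minimal sketch; sum_of_squares and disassemble are hypothetical helper names, and the exact bytecode printed varies between Python versions.

```python
import dis
import io


def sum_of_squares(X):
    return sum(x**2 for x in X)  # the generator expression is a nested code object


def disassemble(func, depth=None) -> str:
    """Return the dis.dis() listing as a string instead of printing it."""
    buf = io.StringIO()
    dis.dis(func, file=buf, depth=depth)
    return buf.getvalue()


shallow = disassemble(sum_of_squares, depth=0)  # outer function only
deep = disassemble(sum_of_squares)              # depth=None: recurse fully
print(len(shallow.splitlines()), len(deep.splitlines()))
```

The deep listing contains an extra "Disassembly of <code object <genexpr> ...>" section that the shallow one lacks.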
Case-insensitive regex sped up by as much as 20x.
Python 3.7 added constants that allow controlling subprocess priority in Windows.
This allows keeping the main Python program at one execution priority, while launching subprocesses at another priority.
The ability to start subprocesses without opening a new console window is enabled by subprocess.CREATE_NO_WINDOW.
The confusingly named but important universal_newlines boolean parameter now has the clearer alias text; the old name still works.
When text=True, stdin/stdout/stderr are text streams instead of bytes streams.
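A short sketch combining these subprocess additions follows. The child command is a trivial stand-in, and the Windows-only flags are applied only when running on Windows, since those constants do not exist on other platforms.

```python
import subprocess
import sys

kwargs = {}
if sys.platform == "win32":
    # Windows-only constants added in Python 3.7:
    # lower the child's priority and do not open a console window.
    kwargs["creationflags"] = (
        subprocess.BELOW_NORMAL_PRIORITY_CLASS | subprocess.CREATE_NO_WINDOW
    )

result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    text=True,    # decode stdout to str instead of returning bytes
    check=True,   # raise CalledProcessError on a nonzero exit code
    **kwargs,
)
print(type(result.stdout).__name__, result.stdout.strip())
```

With text=False (the default), result.stdout would be a bytes object that has to be decoded by hand.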
Pytest is the de facto standard for Python unit testing and continuous integration.
To be complete in testing, one should test the interactive console scripts that, for many Python programs, are the main method of use.
Console script testing can be added through the Pytest Console Scripts addon, but I usually simply use subprocess.check_call directly, like the Pytest Console Scripts addon does.
Note that “sys.executable” is the recommended way to securely get the Python executable path, to ensure testing with the same Python interpreter.
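A minimal sketch of such a test is below. Since your own console script is not available here, the standard-library json.tool module stands in for it; in practice you would replace the module name and arguments with those of your script.

```python
import subprocess
import sys


def test_console_script():
    # json.tool is a standard-library stand-in for your own console script.
    # check_call raises CalledProcessError if the exit code is nonzero.
    subprocess.check_call(
        [sys.executable, "-m", "json.tool", "--help"],
        stdout=subprocess.DEVNULL,  # keep the test output quiet
    )
```

Pytest collects this function by its test_ prefix, and the test fails automatically if the script exits with an error.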
Matlab should generally be installed NOT using sudo.
After upgrading the operating system, or if you installed Matlab on a laptop while it was on a docking station and then run it off the docking station, Matlab may complain about a changed host ID.
If Matlab is already installed, but won't open the desktop due to a licensing error, reactivate Matlab:
Look for the WiFi link/ether hexadecimal value.
If connected to the internet via WiFi, you can confirm the correct device by comparing the value for inet or inet6 vs. https://ident.me
Install to the home directory and do NOT use sudo.
Make a directory for Matlab installs:
mkdir ~/.local/matlab
Start the Matlab install NOT as root or sudo
./install
Install to a directory like “~/.local/matlab/”
Activate via Internet and sign in to select the license key.
In GNU Radio Companion, look for the RTL-SDR Source block.
Test RTL2832 PLL Frequency range:
rtl_test -t
Output should be like:
E4000 tuner:
Found 1 device(s): 0: ezcap USB 2.0 DVB-T/DAB/FM dongle
Using device 0: ezcap USB 2.0 DVB-T/DAB/FM dongle
Found Elonics E4000 tuner
Supported gain values (18): -1.0 1.5 4.0 6.5 9.0 11.5 14.0 16.5 19.0 21.5 24.0 29.0 34.0 42.0 43.0 45.0 47.0 49.0
Benchmarking E4000 PLL...
E4K PLL not locked for 53000000 Hz!
E4K PLL not locked for 2217000000 Hz!
E4K PLL not locked for 1109000000 Hz!
E4K PLL not locked for 1248000000 Hz!
E4K range: 54 to 2216 MHz
E4K L-band gap: 1109 to 1248 MHz
R820T tuner:
Found 1 device(s): 0: Realtek, RTL2838UHIDIR, SN: 00000001
Using device 0: Generic RTL2832U OEM
Detached kernel driver
Found Rafael Micro R820T tuner
Supported gain values (29): 0.0 0.9 1.4 2.7 3.7 7.7 8.7 12.5 14.4 15.7 16.6 19.7 20.7 22.9 25.4 28.0 29.7 32.8 33.8 36.4 37.2 38.6 40.2 42.1 43.4 43.9 44.5 48.0 49.6
[R82XX] PLL not locked!
Sampling at 2048000 S/s.
No E4000 tuner found, aborting.
Reattached kernel driver
Record the entire passband (about 2 MHz bandwidth), not just the demodulated audio.
Example command:
rtl_sdr ${TMPDIR}/cap.bin -s 1.8e6 -f 90.1e6
Press Ctrl c to stop recording after several seconds so that your hard drive doesn't fill up.
You can read the cap.bin file in MATLAB, Python or GNU Radio.
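As a sketch of reading the capture in Python with only the standard library: rtl_sdr writes interleaved unsigned 8-bit I and Q samples, so each pair of bytes is one complex sample. The 127.5 offset and scale used here are a common convention rather than something stated above, and iq_from_bytes and read_capture are hypothetical helper names.

```python
from pathlib import Path
from typing import List


def iq_from_bytes(raw: bytes) -> List[complex]:
    """Convert interleaved unsigned 8-bit I/Q bytes to complex samples in [-1, 1]."""
    n = len(raw) - len(raw) % 2  # ignore a trailing unpaired byte
    return [
        complex(raw[k] - 127.5, raw[k + 1] - 127.5) / 127.5
        for k in range(0, n, 2)
    ]


def read_capture(filename: str) -> List[complex]:
    """Read a whole rtl_sdr capture file such as cap.bin."""
    return iq_from_bytes(Path(filename).read_bytes())


print(iq_from_bytes(bytes([255, 0, 127])))
```

For captures longer than a few seconds, a NumPy-based reader (for example numpy.fromfile with dtype uint8) would be much faster than this pure-Python loop.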
Troubleshooting:
Is the RTL-SDR recognized? Before and after inserting the RTL-SDR receiver into the USB port of your Linux PC, type:
lsusb
which should show a Realtek device after insertion.
Try a different, non-USB 3 port (USB 2).
librtlsdr0 provides file /lib/udev/rules.d/60-librtlsdr0.rules that allows the RTL-SDR stick to be recognized upon USB plugin.
dmesg should show dozens of messages with RTL2832 when the USB receiver is plugged in.
Other popular programs for the RTL-SDR:
MATLAB RTL-SDR support has several examples and a free eBook. Matlab also supports USRP and PLUTO SDR hardware among others.
GNU Radio (start with GNU Radio Companion graphical SDR IDE)
apt install gnuradio
pyrtlsdr: pure Python wrapper for librtlsdr and less bulky than GNU Radio.