Code profiling and micro-benchmarks can be useful to discover hot spots in code that might be targets for improving efficiency and overall execution time.
Matlab has long had an effective profiler built in from the factory.
Commercial and free software often prioritizes quality and correctness over speed.
There are cases where the Matlab factory functions are simply not optimized due to development time constraints.
Yair Altman has expertise with the undocumented scripts inside Matlab that many Matlab users rely on. He has written a three-part series on profiling and optimizing Matlab factory code, which I find to be worthwhile reading even for general Matlab code optimization.
Homebrew is a popular package manager for quickly installing development tools on macOS, including Gfortran.
macOS cloud services such as MacInCloud may say they provide Homebrew, but you may not be able to install packages without a sudo/admin account.
To install Homebrew without sudo/admin in the user home directory for cloud or physical hardware, follow these steps:
If one truly doesn't have sudo/admin access, as is typical with a managed (less expensive) cloud macOS plan, and the installed Xcode is not the appropriate version, Homebrew may build GCC from source, which can take tens of minutes.
This may occur when doing
Running M-scripts compiled with Matlab Compiler deploytool may result in warning messages about tbb.dll.
Updating the Intel Threading Building Blocks (oneTBB) library may help.
Also consider updating the Matlab version if applicable.
Download and install the latest Intel oneTBB.
If using a command prompt, make oneTBB available by running the command tbbvars.
Recompile with Matlab deploytool.
Where “files” is set appropriately for the project.
Setting a per-project "files" value is strongly recommended to ensure files aren't missed in the type check.
One can make a system-wide ~/.mypy.ini that is overridden by the per-project pyproject.toml.
We often use executables from Python with data transfer via:
stdin/stdout (small transfers, less than a megabyte)
temporary files (arbitrarily large data)
This provides a language-agnostic interface that we can use from other scripted languages like Matlab or Julia, future-proofing efforts at the price of some runtime efficiency due to the out-of-core data transfer.
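A minimal sketch of both transfer styles follows. It assumes nothing about the real executable: the child process here is the Python interpreter itself (`sys.executable`) running a one-line script that upper-cases its input, standing in for a compiled program. The helper names `run_via_stdin` and `run_via_tempfile` are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-ins for a compiled executable: a child Python process that upper-cases text.
# A real program would replace these with something like ["./myprog.bin"].
CHILD_STDIN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
CHILD_FILE = [
    sys.executable,
    "-c",
    "import sys, pathlib; sys.stdout.write(pathlib.Path(sys.argv[1]).read_text().upper())",
]


def run_via_stdin(data: str) -> str:
    """Small transfers: send text on stdin, read the reply from stdout."""
    ret = subprocess.run(
        CHILD_STDIN, input=data, capture_output=True, text=True, check=True
    )
    return ret.stdout


def run_via_tempfile(data: str) -> str:
    """Large transfers: write a temporary file and pass its path as an argument."""
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "input.txt"
        f.write_text(data)
        ret = subprocess.run(
            CHILD_FILE + [str(f)], capture_output=True, text=True, check=True
        )
    return ret.stdout
```

Using `check=True` raises `subprocess.CalledProcessError` on a non-zero exit code, so a failing executable is not silently mistaken for empty output.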
Here is a snippet we use to compile a single C code executable from Python (from the GeoRINEX program):
"""
save this in say src/mypkg/build.py

then from the code that needs the output executable, say "myprog.bin":

    from pathlib import Path
    from .build import build

    exe = Path("src/myprog.bin")
    ...
    if not exe.is_file():
        build(Path("src/myprog.c"))
    # code that passes data via stdin/stdout and/or files using subprocess.run()
"""

from __future__ import annotations

import subprocess
import shutil
from pathlib import Path


def build(src: Path, cc: str | None = None) -> int:
    """
    Parameters
    ----------
    src: pathlib.Path
        path to single C source file.
    cc: str, optional
        desired compiler path or name

    Returns
    -------
    ret: int
        return code from compiler (0 is success)
    """
    if cc:
        return do_compile(cc, src)

    # try each candidate compiler found on PATH until one succeeds
    compilers = ["cc", "gcc", "clang", "icx", "clang-cl"]
    ret = 1
    for cc in compilers:
        if shutil.which(cc):
            ret = do_compile(cc, src)
            if ret == 0:
                break

    return ret


def do_compile(cc: str, src: Path) -> int:
    if not src.is_file():
        raise FileNotFoundError(src)

    if cc.endswith("cl"):  # msvc-like
        cmd = [cc, str(src), f"/Fe:{src.parent}"]
    else:
        cmd = [cc, str(src), "-O2", f"-o{src.with_suffix('.bin')}"]

    ret = subprocess.run(cmd).returncode

    return ret
LTE smartwatches may get up to 90% of the communications range of a smartphone.
Most providers have turned off (or are turning off) 2G and 3G, so coverage may change over time.
Generally Bluetooth headsets can be used with LTE smartwatches, which helps call quality for any phone device.
Mobile devices including smartwatches may switch frequency bands when going from idle to phone call or data usage:
E: 2G EDGE, the oldest digital network mode still in use, very slow.
H: 3G HSPA/HSPA+, good enough for basic web browsing and email.
4G: really good 3G. Carriers may call their upgraded 3G networks 4G.
LTE: actually using 4G LTE.
5G: not necessarily faster than LTE when in NSA (non-standalone) mode, but can be much faster in SA (standalone) mode.
The signal bars may jump up or down a few notches when going from idle to active because the phone switches bands, e.g. 700 MHz vs. 1900 MHz.
Apps like Network Cell Info can help reveal these behaviors.
For continuous integration, it’s important to test the traditional package install
pip install .
along with the more commonly used in situ pip development mode
pip install -e .
Otherwise, the Python package may depend on files not included by the MANIFEST.in file and fail for most end users, who don't use the "pip install -e" option.
A particular failure this will catch on Windows CI is "graft path/to/", where the trailing / will fail on Windows only.
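The trailing-slash problem can also be caught before CI with a small check. The following is a sketch only: `graft_trailing_slash` is a hypothetical helper, not part of pip or setuptools, and it assumes each graft directive sits on its own line with a single path.

```python
def graft_trailing_slash(manifest_text: str) -> list[str]:
    """Return the graft paths in MANIFEST.in text that end with a slash."""
    bad = []
    for line in manifest_text.splitlines():
        words = line.split()
        # a graft directive has the form: graft <directory>
        if len(words) == 2 and words[0] == "graft" and words[1].endswith(("/", "\\")):
            bad.append(words[1])
    return bad
```

For example, one might call `graft_trailing_slash(Path("MANIFEST.in").read_text())` in a test and assert that the result is empty.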
Python psutil gives access to numerous system parameters, including CPU count.
We recommend using a recent version of psutil to cover more computing platforms.
Ncpu = psutil.cpu_count(logical=False)
usually gives the physical CPU core count.
psutil uses both Python and compiled C code to determine the CPU count; it's not just a simple Python script.
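Because `psutil.cpu_count(logical=False)` can return `None` when the count cannot be determined, a defensive wrapper may be useful. The sketch below uses a hypothetical helper name, `physical_cpu_count`, and assumes that falling back to the logical CPU count from `os.cpu_count()` is acceptable when psutil is missing or undecided.

```python
import os

try:
    import psutil  # third-party: pip install psutil
except ImportError:
    psutil = None


def physical_cpu_count() -> int:
    """Best-effort physical core count, never less than 1."""
    n = None
    if psutil is not None:
        n = psutil.cpu_count(logical=False)  # may be None if undetermined
    if n is None:
        # fallback counts logical CPUs, which may overcount physical cores
        n = os.cpu_count()
    return max(n or 1, 1)
```

The fallback trades accuracy for robustness: on systems with simultaneous multithreading, the logical count is typically larger than the physical core count.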
CMake (via CTest) can run tests in parallel.
Some tests need to be run not in parallel, for example tests using MPI that use lots of CPU cores, or tests that use a lot of RAM, or tests that must access a common file or hardware device.
We have found that, when fixtures are used, the RUN_SERIAL property makes whole groups of tests run sequentially, rather than serializing individual tests.
That is, all the FIXTURES_SETUP tests run, then all the FIXTURES_REQUIRED tests that have RUN_SERIAL.
This is not necessarily desired, because we had tests consuming fixtures that didn't need to wait for all the fixtures to be set up.
We found that using RESOURCE_LOCK does not suffer from this issue, and allows the proper test dependencies and the expected parallelism.
CMake Resource Groups are orthogonal to Resource Locks, and are much more complicated to use.
There may be some systems that would benefit from Groups, but many can just use the simple Locks.
For simplicity, this example omits the necessary add_test() and shows only the properties.
The test has an MPI-using quick setup “Quick1” and then a long test “Long1” also using MPI.
Finally, we have a quick Python script “Script1” checking the output.
In the real setup, we have Quick1, Quick2, … QuickN and so on.
When we used RUN_SERIAL, we had to wait for ALL Quick* before Long* would start.
With RESOURCE_LOCK the tests intermingle, making better use of CPU particularly on large CPU count systems, and with lots of tests.
The name “cpu_mpi” is arbitrary like the other names.
Matlab system() lacks features needed for blackbox interfacing with executables, including a stdin pipe.
matlab-stdlib subprocess_run() can exchange data via stdin, stdout, and stderr pipes, and set cwd, environment variables, and more using Java ProcessBuilder.
Matlab can also call Python subprocess.