Home
2019Jan01
Site
00 - Introduction
First steps
Installing Python
Installing extra dependencies
Course objectives
Motivation (editorial)
Concurrency vs. parallelism
Threads and processes
Threads and processes in Python
Python global interpreter lock
Thread scheduling
Note that these three threads take turns, so the computation runs slightly slower (because of overhead) than it would on a single thread
Releasing the GIL
01 - Introduction to joblib
Creating a process or thread pool with joblib
Running a threadsafe function
Setup logging so we can know what process and thread we are running
Create two functions, one to print thread and process ids, and one to run the wait_for loop
Now repeat this holding the GIL
Now repeat with processes instead of threads
Summary
02: xarray, netcdf and zarr
The current de facto standard in atmos/ocean science
Some challenges with netcdf
Create an xarray
Download toy model data
Sort in numeric order
Make an xarray
Dump to a zarr file
03 - Using dask and zarr for multithreaded input/output
zarr
dask
Example, write and read zarr arrays using multiple threads
Create 230 Mbytes of fake data
Copy to a zarr file on disk, using multiple threads
Add some attributes
Create an array of zeros – note that compression shrinks it from 230 Mbytes to 321 bytes
Copy input to output using chunks
Create a dask array from a zarr disk file
The following calculation uses numpy, so it releases the GIL
Note that result hasn’t been computed yet
Now do the calculation
You can evaluate your own functions on dask arrays
04 - Using numba to release the GIL
Timing python code
Define a function that does a lot of computation
Now time it with pure Python
Now try this with numba
Make two identical functions: one that releases and one that holds the GIL
Now time wait_loop_withgil
Not bad, but we’re only using one core
05 - Using conda to manage C++ libraries
Using conda to manage python libraries
Accessing the functions from python using CFFI
Running this in a threadpool
06 - Using conda to manage C++ libraries with pybind11
Index