Table of Contents

  • 1  04 - Using numba to release the GIL

      • 1.0.1  Timing python code

      • 1.0.2  Now try this with numba

      • 1.0.3  Make two identical functions: one that releases and one that holds the GIL

      • 1.0.4  now time wait_loop_withgil

      • 1.0.5  not bad, but we’re only using one core

pip install contexttimer
conda install numba
conda install joblib
[1]:
from IPython.display import Image
import contexttimer
import time
import math
from numba import jit
from joblib import Parallel
import logging

04 - Using numba to release the GIL

Timing python code

One easy way to tell whether you are using multiple cores is to compare the wall-clock time measured by time.perf_counter against the total CPU time used by all threads, measured with time.process_time.

I’ll organize these two timers using the contexttimer module.

To install, in a shell window type:

pip install contexttimer
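
As a quick illustration of the difference between the two clocks (a minimal sketch, not one of the notebook's cells): sleeping advances the wall clock but consumes almost no CPU time, so the two timers diverge.

import time
import contexttimer

# sleep for one second: the wall clock advances, but the process does essentially no work
with contexttimer.Timer(time.perf_counter) as wall:
    with contexttimer.Timer(time.process_time) as cpu:
        time.sleep(1.0)
print(f'wall time {wall.elapsed:.3f} s, cpu time {cpu.elapsed:.3f} s')
# expect roughly: wall time 1.0 s, cpu time close to 0 s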

Define a function that does a lot of computation

[2]:
def wait_loop(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

now time it with pure python

[3]:
nloops=200
with contexttimer.Timer(time.perf_counter) as pure_wall:
    with contexttimer.Timer(time.process_time) as pure_cpu:
        result=wait_loop(nloops)
print(f'pure python wall time {pure_wall.elapsed} and cpu time {pure_cpu.elapsed}')
pure python wall time 12.900637587998062 and cpu time 12.683904

Now try this with numba

Numba is a just-in-time compiler that can turn a subset of Python into machine code using the LLVM compiler.

Reference: Numba documentation (https://numba.readthedocs.io/)
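
As a side note (a minimal sketch, not one of the notebook's cells; the function name hypot_jit is made up for illustration): when @jit is given no signature, compilation is deferred until the first call, so that call pays the compile cost; passing a signature string such as 'float64(int64)', as the cells below do, compiles eagerly at decoration time.

from numba import jit
import math
import time

@jit(nopython=True)              # lazy: compiled on the first call, for the argument types it sees
def hypot_jit(a, b):
    return math.sqrt(a * a + b * b)

t0 = time.perf_counter()
hypot_jit(3.0, 4.0)              # first call: triggers compilation
t1 = time.perf_counter()
hypot_jit(3.0, 4.0)              # second call: reuses the cached machine code
t2 = time.perf_counter()
print(f'first call (includes compile) {t1 - t0:.3f} s, second call {t2 - t1:.6f} s')
print(hypot_jit.signatures)      # the (float64, float64) signature numba inferred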

Make two identical functions: one that releases and one that holds the GIL

[4]:
@jit('float64(int64)', nopython=True, nogil=True)
def wait_loop_nogil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out
[5]:
@jit('float64(int64)', nopython=True, nogil=False)
def wait_loop_withgil(n):
    """
    Function under test.
    """
    for m in range(n):
        for l in range(m):
            for j in range(l):
                for i in range(j):
                    i=i+4
                    out=math.sqrt(i)
                    out=out**2.
    return out

now time wait_loop_withgil

[6]:
nloops=500
with contexttimer.Timer(time.perf_counter) as numba_wall:
    with contexttimer.Timer(time.process_time) as numba_cpu:
        result=wait_loop_withgil(nloops)
print(f'numba wall time {numba_wall.elapsed} and cpu time {numba_cpu.elapsed}')
print(f"numba speed-up factor {(pure_wall.elapsed - numba_wall.elapsed)/numba_wall.elapsed}")
numba wall time 0.05427086600684561 and cpu time 0.051916000000000295
numba speed-up factor 236.70834219543877

not bad, but we’re only using one core
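
A minimal sketch of where this is heading (assumptions: 4 worker threads, and the standard library's ThreadPoolExecutor rather than the joblib Parallel imported above): because wait_loop_nogil releases the GIL, several threads can run the compiled loop at once, so the total CPU time should grow with the number of threads while the wall-clock time stays roughly flat.

from concurrent.futures import ThreadPoolExecutor

nthreads = 4
nloops = 500
with contexttimer.Timer(time.perf_counter) as thread_wall:
    with contexttimer.Timer(time.process_time) as thread_cpu:
        with ThreadPoolExecutor(max_workers=nthreads) as pool:
            results = list(pool.map(wait_loop_nogil, [nloops] * nthreads))
print(f'{nthreads} threads, nogil: wall time {thread_wall.elapsed} and cpu time {thread_cpu.elapsed}')
# if the GIL is really released, cpu time should be roughly nthreads times the wall time;
# swapping in wait_loop_withgil should collapse back to cpu time ~ wall time on one core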