Parallelism

  • Cython natively supports parallel processing (via OpenMP), but the GIL must be released.
# Parallel loop
cython.parallel.prange([start,] stop[, step][, nogil=False][, schedule=None[, chunksize=None]][, num_threads=None])
  • OpenMP automatically starts a thread pool and distributes the work according to the schedule used. step must not be 0.
  • This function can only be used with the GIL released. If nogil is true, the loop will be wrapped in a nogil section.

  • The schedule parameter

    • static
      • If a chunksize is provided, iterations are distributed to all threads ahead of time in blocks of the given chunksize.
      • If no chunksize is given, the iteration space is divided into chunks that are approximately equal in size, and at most one chunk is assigned to each thread in advance.
    • dynamic
      • The iterations are distributed to threads as they request them, with a default chunk size of 1.
    • guided
      • As with dynamic scheduling, the iterations are distributed to threads as they request them, but with decreasing chunk size. The size of each chunk is proportional to the number of unassigned iterations divided by the number of participating threads, decreasing to 1 (or the chunksize if provided).
    • runtime
      • The schedule and chunk size are taken from the runtime scheduling variable, which can be set through the openmp.omp_set_schedule() function call, or the OMP_SCHEDULE environment variable.
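The static and guided policies described above can be sketched with a small pure-Python model. This is not Cython or OpenMP code, only an illustration of how iterations might be split. The function names `static_schedule` and `guided_chunk_sizes` are hypothetical, and two details are assumptions: leftover iterations in the no-chunksize static case go to the lowest thread ids, and the guided chunk is taken as exactly the remaining iterations divided by the thread count. Real OpenMP runtimes may choose differently.

```python
from math import ceil


def static_schedule(n_iters, num_threads, chunksize=None):
    """Model which iterations each thread receives under schedule='static'."""
    assignment = [[] for _ in range(num_threads)]
    if chunksize is None:
        # One contiguous, roughly equal chunk per thread.
        # Assumption: the first `extra` threads get one more iteration.
        base, extra = divmod(n_iters, num_threads)
        start = 0
        for tid in range(num_threads):
            size = base + (1 if tid < extra else 0)
            assignment[tid] = list(range(start, start + size))
            start += size
    else:
        # Fixed-size blocks dealt out round-robin in thread-id order.
        for block, start in enumerate(range(0, n_iters, chunksize)):
            tid = block % num_threads
            stop = min(start + chunksize, n_iters)
            assignment[tid].extend(range(start, stop))
    return assignment


def guided_chunk_sizes(n_iters, num_threads, chunksize=1):
    """Model the shrinking chunk sizes handed out under schedule='guided'."""
    sizes = []
    remaining = n_iters
    while remaining > 0:
        # Assumption: proportionality constant of 1, never below `chunksize`
        # except for the final chunk, which takes whatever is left.
        size = max(chunksize, ceil(remaining / num_threads))
        size = min(size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes
```

Under this model, `static_schedule(10, 3, chunksize=2)` gives thread 0 iterations 0, 1, 6, 7, thread 1 iterations 2, 3, 8, 9, and thread 2 iterations 4, 5. Dynamic scheduling is not modelled here because its assignment depends on which thread finishes first at run time.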
# Returns the id of the thread. For n threads, the ids will range from 0 to n-1.
cython.parallel.threadid()

cython.parallel.parallel(num_threads=None)
  • This directive can be used as part of a with statement to execute code sequences in parallel.
  • This is currently useful for setting up thread-local buffers used by a prange.

  • Example with thread-local buffers

```python
from cython.parallel import parallel, prange
from libc.stdlib cimport abort, malloc, free

cdef Py_ssize_t idx, i, n = 100
cdef int * local_buf
cdef size_t size = 10

with nogil, parallel():
    # each thread allocates its own buffer
    local_buf = <int *> malloc(sizeof(int) * size)
    if local_buf == NULL:
        abort()

    # populate our local buffer in a sequential loop
    for i in range(size):
        local_buf[i] = i * 2

    # share the work using the thread-local buffer(s)
    # (func is a placeholder for a user-defined nogil function taking an int *)
    for i in prange(n, schedule='guided'):
        func(local_buf)

    free(local_buf)
```

## Compilation

```python
# setup.py
# distutils was removed in Python 3.12; setuptools provides the same API.
from setuptools import Extension, setup
from Cython.Build import cythonize

ext_modules = [
    Extension(
        "hello",
        ["hello.pyx"],
        # compile and link with OpenMP support
        # (GCC/Clang flag; MSVC uses /openmp instead)
        extra_compile_args=['-fopenmp'],
        extra_link_args=['-fopenmp'],
    )
]

setup(
    name='hello-parallel-world',
    ext_modules=cythonize(ext_modules),
)
```

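## Using OpenMP functions

  • OpenMP functions can be brought in with cimport.

```python
from cython.parallel cimport parallel
cimport openmp

cdef int num_threads

# let the OpenMP runtime adjust the number of threads dynamically
openmp.omp_set_dynamic(1)
with nogil, parallel():
    num_threads = openmp.omp_get_num_threads()
    ...
```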
