CuPy

CuPy is an open source library for GPU-accelerated computing with the Python programming language. It provides multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy mirrors the APIs of NumPy and SciPy, which allows it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the Nvidia CUDA GPU platform and, starting with v9.0, the AMD ROCm GPU platform.
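
The drop-in idea can be illustrated with a minimal sketch. The helper names load_backend and rms are hypothetical and not part of CuPy. The fallback to NumPy is an assumption made here so that the same code also runs on a machine without a usable GPU.

```python
import numpy as np


def load_backend():
    """Return CuPy if it is installed and a GPU is usable, otherwise NumPy."""
    try:
        import cupy as cp
        cp.arange(1).sum()  # forces a real device allocation and kernel launch
        return cp
    except Exception:
        # Assumption: any failure (no package, no driver, no device)
        # simply means "run on the CPU with NumPy".
        return np


xp = load_backend()


def rms(values):
    """Root mean square, written once against the shared NumPy-style API."""
    a = xp.asarray(values, dtype=xp.float64)
    return float(xp.sqrt(xp.mean(a * a)))


result = rms([3.0, 4.0])  # sqrt((9 + 16) / 2)
print(xp.__name__, result)
```

Only the module bound to xp changes between the CPU and GPU paths. The body of rms is identical in both cases.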

CuPy was initially developed as the backend of the Chainer deep learning framework and was established as an independent project in 2017.

CuPy is one of the array libraries of the NumPy ecosystem and is widely used for GPU computing with Python, especially in high-performance computing environments such as Summit, Perlmutter, EULER, and ABCI.

CuPy is a NumFOCUS-affiliated project.

Features
CuPy implements NumPy/SciPy-compatible APIs and also provides features for writing user-defined GPU kernels and accessing low-level APIs.

NumPy-compatible APIs
APIs mirroring those defined in the NumPy package (numpy.*) are available under the cupy.* package.

 * Multi-dimensional array (cupy.ndarray) for boolean, integer, float, and complex data types
 * Module-level functions
 * Linear algebra functions
 * Fast Fourier transform
 * Random number generator
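
The following sketch exercises three of the areas listed above (linear algebra, fast Fourier transform, and random number generation) through the NumPy-style interface. The NumPy fallback and the variable names are assumptions made for illustration. The functions used exist under the same names in both numpy and cupy.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails here if no usable GPU is present
except Exception:
    xp = np      # assumption: fall back to the CPU implementation

# Linear algebra: solve a @ x = b and measure the residual
a = xp.array([[4.0, 1.0], [2.0, 3.0]])
b = xp.array([1.0, 2.0])
x = xp.linalg.solve(a, b)
residual = float(xp.abs(a @ x - b).max())

# Fast Fourier transform: a sine with 3 cycles over 64 samples
signal = xp.sin(2 * xp.pi * 3 * xp.arange(64) / 64)
spectrum = xp.abs(xp.fft.rfft(signal))
peak_bin = int(xp.argmax(spectrum))  # index of the dominant frequency bin

# Random number generation with the Generator-style interface
rng = xp.random.default_rng(0)
samples = rng.standard_normal(1000)

print(residual, peak_bin, samples.shape)
```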

SciPy-compatible APIs
APIs mirroring those defined in the SciPy package (scipy.*) are available under the cupyx.scipy.* package.

 * Sparse matrices (cupyx.scipy.sparse.*_matrix) in CSR, COO, CSC, and DIA formats
 * Discrete Fourier transform
 * Advanced linear algebra
 * Multidimensional image processing
 * Sparse linear algebra
 * Special functions
 * Signal processing
 * Statistical functions
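
As a sketch of the SciPy-compatible layer, the example below builds a CSR sparse matrix and multiplies it by a vector. It assumes that either CuPy with a usable GPU or SciPy is installed. The fallback to scipy.sparse is an addition made here for machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    from cupyx.scipy import sparse
    xp.zeros(1)  # fails here if no usable GPU is present
except Exception:
    # Assumption: SciPy is installed and acts as the CPU counterpart
    import scipy.sparse as sparse
    xp = np

dense = xp.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])

m = sparse.csr_matrix(dense)   # compressed sparse row storage
v = xp.array([1.0, 1.0, 1.0])

product = m @ v                # sparse matrix-vector product
nnz = int(m.nnz)               # number of stored non-zero entries

print(product, nnz)
```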

User-defined GPU kernels

 * Kernel templates for element-wise and reduction operations
 * Raw kernel (CUDA C/C++)
 * Just-in-time transpiler (JIT)
 * Kernel fusion
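
An element-wise kernel template can be sketched as follows, using cupy.ElementwiseKernel with a per-element body written in CUDA C. The wrapper function squared_diff and the NumPy reference path are assumptions added for illustration, so that the result can also be checked on a machine without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # fails here if no usable GPU is present
    HAVE_GPU = True
except Exception:
    HAVE_GPU = False


def squared_diff(x, y):
    """Compute (x - y)**2 element-wise, on the GPU when one is available."""
    if HAVE_GPU:
        kernel = cp.ElementwiseKernel(
            'float32 x, float32 y',   # input arguments
            'float32 z',              # output argument
            'z = (x - y) * (x - y)',  # body executed once per element
            'squared_diff')           # kernel name
        return cp.asnumpy(kernel(cp.asarray(x), cp.asarray(y)))
    # CPU reference path (assumption: used only when no GPU is usable)
    return (x - y) ** 2


x = np.arange(6, dtype=np.float32)
y = np.full(6, 2, dtype=np.float32)
out = squared_diff(x, y)
print(out)
```

The kernel body never indexes the arrays explicitly. The template applies it to each element and handles broadcasting of the arguments.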

Distributed computing

 * Distributed communication package (cupyx.distributed), providing collective and peer-to-peer primitives

Low-level CUDA features

 * Stream and event
 * Memory pool
 * Profiler
 * Host API binding
 * CUDA Python support
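
A minimal sketch of the stream, event, and memory pool interfaces is shown below. The function names gpu_available and timed_sum_on_stream are hypothetical. The function returns None when no GPU is usable, which is a choice made here so that the example does not fail on CPU-only machines.

```python
def gpu_available():
    """Return True if CuPy imports and can allocate device memory."""
    try:
        import cupy as cp
        cp.zeros(1)
        return True
    except Exception:
        return False


def timed_sum_on_stream(n=1_000_000):
    """Return (sum, elapsed_ms, pool_used_bytes), or None without a GPU."""
    if not gpu_available():
        return None
    import cupy as cp

    stream = cp.cuda.Stream(non_blocking=True)
    start, stop = cp.cuda.Event(), cp.cuda.Event()

    with stream:                     # work below is queued on this stream
        start.record(stream)
        total = cp.arange(n, dtype=cp.float64).sum()
        stop.record(stream)

    stop.synchronize()               # wait until the queued work has finished
    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)

    pool = cp.get_default_memory_pool()
    return float(total), elapsed_ms, pool.used_bytes()


outcome = timed_sum_on_stream()
print(outcome)
```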

Interoperability

 * DLPack
 * CUDA Array Interface
 * NEP 13 (__array_ufunc__)
 * NEP 18 (__array_function__)
 * Array API Standard
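
The DLPack entry above can be illustrated with a short zero-copy exchange. This sketch assumes a NumPy or CuPy version that provides from_dlpack and accepts objects implementing the __dlpack__ protocol. For simplicity it imports an array into the same library that produced it. The NumPy fallback is an addition for machines without a GPU.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails here if no usable GPU is present
except Exception:
    xp = np      # assumption: fall back to the CPU implementation

a = xp.arange(5, dtype=xp.float32)

# Import the array through DLPack; no data is copied
b = xp.from_dlpack(a)

# Writing to the original is visible through the imported array,
# because both refer to the same memory
a[0] = 42.0
shared = float(b[0]) == 42.0

print(shared)
```

The same protocol is what allows a library such as CuPy to hand device memory to another DLPack-aware library without a copy.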

Applications

 * spaCy
 * XGBoost
 * turboSETI (Berkeley SETI)
 * NVIDIA RAPIDS
 * einops
 * Chainer