NumPy, short for Numerical Python, provides the following capabilities for data analysis:

Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
Common array algorithms like sorting, unique, and set operations
Efficient descriptive statistics and aggregating/summarizing data
Data alignment and relational data manipulations for merging and joining together heterogeneous datasets
Expressing conditional logic as array expressions instead of loops with if-elif-else branches
Group-wise data manipulations (aggregation, transformation, function application)

Pure Python leaves many details to the runtime environment:

specifying variable types
memory allocation/deallocation, etc.

NumPy is fast.

NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
NumPy operations perform complex computations on entire arrays without the need for Python for loops.

import numpy as np

# speed test for numpy: one million integers as an array and as a list
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

%%time
my_arr2 = my_arr * 2

Wall time: 2.99 ms

%%time
my_list2 = [x * 2 for x in my_list]

Wall time: 114 ms
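
The memory claim above can be checked roughly in the same way. This is a minimal sketch, not an exact accounting: the names arr_bytes and list_bytes are made up for illustration, and the list figure is an approximation because CPython shares some small integer objects.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Bytes in the array's data buffer: one fixed-size item per element.
arr_bytes = my_arr.nbytes

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {arr_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB")
```

The exact numbers depend on the platform and Python version, so none are shown here.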

1 ndarray

An ndarray (N-dimensional array) is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type, which is what lets NumPy skip per-element type checks. Every array has a shape, a tuple indicating the size of each dimension; a dtype, an object describing the data type of the array; and an ndim, an integer giving the number of dimensions.

data = np.random.randn(3, 4)
data

array([[ 0.59905431, -1.08465416, -0.95319914,  1.93616798],
       [ 0.92951762,  0.81376066,  0.87067676, -0.05844408],
       [ 0.61867604, -0.78530194, -0.93464763,  0.74309266]])

data.shape

(3, 4)

data.dtype

dtype('float64')

data.ndim

2

1.1 Creating ndarrays

1.1.1 `array` function.

The array function accepts any sequence-like object (list, list of lists, other arrays, etc.) and produces a new NumPy array containing the passed data. Unless a dtype is given explicitly, np.array tries to infer a good data type for the array that it creates. The data type is stored in a special dtype metadata object. The default integer type is platform-dependent, so the int32 in the outputs below may appear as int64 on other systems.

# passing list
arr1 = np.array([1, 2, 3, 4])
print(arr1)
print(arr1.shape)
print(arr1.dtype)
print(arr1.ndim)

[1 2 3 4]
(4,)
int32
1

# passing list of lists
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(arr2)
print(arr2.shape)
print(arr2.dtype)
print(arr2.ndim)

[[1 2 3]
 [4 5 6]]
(2, 3)
int32
2

# passing array
arr3 = np.array(arr2)
print(arr3)
print(arr3.shape)
print(arr3.dtype)
print(arr3.ndim)

[[1 2 3]
 [4 5 6]]
(2, 3)
int32
2
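
A short sketch of dtype inference versus an explicit dtype, to complement the cells above. The variable names (inferred, explicit, nested) are illustrative only.

```python
import numpy as np

# Mixed ints and floats are upcast to a common floating-point type.
inferred = np.array([1, 2, 3.5])
print(inferred.dtype)               # float64

# An explicit dtype overrides inference.
explicit = np.array([1, 2, 3], dtype=np.float64)
print(explicit)                     # [1. 2. 3.]

# Nested sequences of equal length become a multidimensional array.
nested = np.array([[1, 2], [3, 4]], dtype=np.int64)
print(nested.shape, nested.dtype)   # (2, 2) int64
```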

1.1.2 `zeros`, `ones`, `empty`, `arange`, etc.

Function	Description
array	Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype; copies the input data by default
asarray	Convert input to ndarray, but do not copy if the input is already an ndarray
arange	Like the built-in range but returns an ndarray instead of a range object
ones	Produce an array of all 1s with the given shape and dtype
ones_like	Takes another array and produces a ones array of the same shape and dtype
zeros	Like ones but producing arrays of 0s instead
zeros_like	Like ones_like but producing arrays of 0s instead
empty	Create new arrays by allocating new memory, but do not populate with any values
empty_like	Like ones_like but do not populate with any values
full	Produce an array of the given shape and dtype with all values set to the indicated “fill value”
full_like	full_like takes another array and produces a filled array of the same shape and dtype
eye, identity	Create a square NxN identity matrix

# np.zeros
np.zeros(4)

array([0., 0., 0., 0.])

# np.ones
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

# np.empty returns uninitialized "garbage" values, which can later be populated with data
np.empty((2, 3, 2))

array([[[0.59905431, 1.08465416],
        [0.95319914, 1.93616798],
        [0.92951762, 0.81376066]],

       [[0.87067676, 0.05844408],
        [0.61867604, 0.78530194],
        [0.93464763, 0.74309266]]])

# arange is an array-valued version of the built-in Python range function
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
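
A few of the other functions from the table, as a minimal sketch. The array base and the names same and copy are assumptions made for the example.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

same = np.asarray(base)   # input is already an ndarray, so no copy is made
copy = np.array(base)     # np.array copies by default
print(same is base)       # True
print(copy is base)       # False

zeros = np.zeros_like(base)      # zeros with the shape and dtype of base
filled = np.full((2, 2), 7)      # 2x2 array where every value is 7
ident = np.eye(3)                # 3x3 identity matrix
evens = np.arange(2, 10, 2)      # start, stop (exclusive), step
print(evens)                     # [2 4 6 8]
```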

1.2 Data Types for ndarrays

The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data. In most cases dtypes provide a mapping directly onto an underlying disk or memory representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.
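
The sketch below illustrates converting between dtypes with astype and reading raw bytes back under a given dtype. The sample values are arbitrary, and the round trip through tobytes and np.frombuffer is only one way to show the "chunk of memory" idea.

```python
import numpy as np

floats = np.array([1.7, -2.2, 3.9])
ints = floats.astype(np.int32)    # astype returns a new array by default
print(ints)                       # [ 1 -2  3]  (fractional part truncated)

# Strings that represent numbers can be converted as well.
numeric = np.array(["1.25", "-9.6", "42"]).astype(np.float64)
print(numeric.dtype)              # float64

# The dtype tells NumPy how to interpret raw bytes.
raw = ints.tobytes()
print(len(raw) == ints.size * ints.itemsize)   # True
restored = np.frombuffer(raw, dtype=np.int32)
print(restored)                   # [ 1 -2  3]
```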

1.3 Arithmetic with NumPy Arrays

Vectorization - Arithmetic operations between equal-size arrays are applied element-wise, with no explicit for loops.

arr = np.random.randn(3, 4)
arr

array([[ 0.07338553,  0.8116425 , -2.17800941, -0.32029061],
       [ 0.41584223, -1.32481565, -1.3783163 ,  0.26723131],
       [-0.31385858,  1.30899248,  1.13523462, -0.68452327]])

arr * 2

array([[ 0.14677106,  1.62328501, -4.35601882, -0.64058122],
       [ 0.83168445, -2.64963129, -2.75663261,  0.53446262],
       [-0.62771717,  2.61798496,  2.27046924, -1.36904654]])

arr - arr

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

1 / arr

array([[13.62666475,  1.23206953, -0.45913484, -3.1221646 ],
       [ 2.4047582 , -0.754822  , -0.72552287,  3.74207649],
       [-3.18614831,  0.76394633,  0.88087518, -1.46087072]])

Broadcasting - Arithmetic between arrays of different shapes. The simplest case combines a scalar with an array, and the scalar is applied to every element.
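
A minimal broadcasting sketch, assuming a small 3x4 array built with np.arange so the results are easy to verify by hand. The names col_means, row_means and demeaned are illustrative.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# A scalar is broadcast to every element.
plus_one = arr + 1

# A (4,) array is broadcast across each of the 3 rows.
col_means = arr.mean(axis=0)          # shape (4,)
demeaned = arr - col_means            # shape (3, 4)
print(demeaned.mean(axis=0))          # [0. 0. 0. 0.]

# To subtract per-row values, the (3,) array needs a trailing axis of length 1.
row_means = arr.mean(axis=1)          # shape (3,)
row_demeaned = arr - row_means[:, np.newaxis]
print(row_demeaned.mean(axis=1))      # [0. 0. 0.]
```

Writing arr - row_means without the extra axis raises a ValueError here, because shapes (3, 4) and (3,) do not line up from the trailing dimension.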

1.4 Basic Indexing and Slicing

One-dimensional array indexing and slicing act similarly to Python lists.

# indexing
arr = np.arange(10)
print(arr)
print(arr[0])

[0 1 2 3 4 5 6 7 8 9]
0

# slicing
print(arr[1:4])

[1 2 3]

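Array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array. This design favors performance and memory savings, since NumPy is built to work with large arrays. Use arr[1:5].copy() when an independent copy is needed.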

arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[1:5] = 10
arr

array([ 0, 10, 10, 10, 10,  5,  6,  7,  8,  9])
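
The view behavior can be made explicit by keeping a reference to the slice. This is a small sketch; the names view and independent are chosen for the example.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                 # a view: shares memory with arr
view[0] = 99
print(arr)                      # [ 0  1  2  3  4 99  6  7  8  9]

independent = arr[5:8].copy()   # an explicit copy owns its own data
independent[:] = -1             # does not touch arr

print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
```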

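In higher-dimensional arrays, indexing with a single integer returns a lower-dimensional array rather than a scalar, so individual elements can be reached by indexing recursively. For a two-dimensional array, the first index selects along axis 0 (the “rows”) and the second along axis 1 (the “columns”).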

arr = np.arange(9).reshape(3,3)
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

# indexing along axis 0
arr[1]

array([3, 4, 5])

# recursively indexing
arr[1][0]

3

# easy and equivalent way
arr[1, 0]

3

# indexing with slices, slice along axis 0
arr[:2]

array([[0, 1, 2],
       [3, 4, 5]])

# passing multiple slices, one per axis
arr[:2, :2]

array([[0, 1],
       [3, 4]])
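
Mixing integer indexes and slices changes the number of dimensions of the result. A short sketch on the same 3x3 array; the names row, block and col are illustrative.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

row = arr[1, :2]       # an integer index drops axis 0
print(row, row.shape)  # [3 4] (2,)

block = arr[1:2, :2]   # a slice of length 1 keeps axis 0
print(block.shape)     # (1, 2)

col = arr[:, 0]        # a bare colon takes the whole axis
print(col)             # [0 3 6]
```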

1.5 Boolean Indexing

Selecting data from an array by boolean indexing always creates a copy of the data.

arr = np.array([["Bob", 1, 2, 3],
                ["Luffy", 2, 3, 4],
                ["Joe", 6, 7, 8]])
arr

array([['Bob', '1', '2', '3'],
       ['Luffy', '2', '3', '4'],
       ['Joe', '6', '7', '8']], dtype='<U11')

names = arr[:, 0]
names

array(['Bob', 'Luffy', 'Joe'], dtype='<U11')

luffy_selected = (names == "Luffy")
luffy_selected

array([False,  True, False])

# a boolean array indexes along axis 0 and selects the rows where the mask is True
arr[luffy_selected]

array([['Luffy', '2', '3', '4']], dtype='<U11')

# select everything except luffy, use != or negate the condition using ~
arr[names != "Luffy"]

array([['Bob', '1', '2', '3'],
       ['Joe', '6', '7', '8']], dtype='<U11')

arr[~luffy_selected]

array([['Bob', '1', '2', '3'],
       ['Joe', '6', '7', '8']], dtype='<U11')

# select two of the three names by combining boolean conditions with & (and) and | (or);
# the Python keywords and/or do not work with boolean arrays
mask = (names == "Bob") | (names == "Joe")
mask

array([ True, False,  True])

arr[mask]

array([['Bob', '1', '2', '3'],
       ['Joe', '6', '7', '8']], dtype='<U11')
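
The difference between selecting with a mask (a copy) and assigning through a mask (in place) is easy to miss. The sketch below uses made-up names and scores arrays to show both.

```python
import numpy as np

names = np.array(["Bob", "Luffy", "Joe", "Bob"])
scores = np.array([[1, -2], [-3, 4], [5, -6], [-7, 8]])

# Boolean selection returns a copy, so editing it leaves scores untouched.
bob_rows = scores[names == "Bob"]
bob_rows[:] = 0
print(scores[0])          # [ 1 -2]

# Assigning through a boolean mask modifies the original in place.
scores[scores < 0] = 0
print(scores.tolist())    # [[1, 0], [0, 4], [5, 0], [0, 8]]

# A one-dimensional mask selects whole rows, which can be overwritten together.
scores[names != "Joe"] = 9
print(scores.tolist())    # [[9, 9], [9, 9], [5, 0], [9, 9]]
```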

1.6 Fancy Indexing

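Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
A single list or array of indices selects whole rows in the given order; when as many integer arrays are passed as there are axes, the result is always one-dimensional.
Fancy indexing, unlike slicing, always copies the data into a new array.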

# create an 8x4 array
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

# fancy indexing by passing a list
arr[[4, 3, 5, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.]])

# passing multiple index arrays selects a one-dimensional array of elements corresponding to each tuple of indices
# select (1,0), (5,3), (7,1), (2,2)
arr = np.arange(32).reshape(8,4)
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

# to select a rectangular region instead, pick the rows first, then reorder the columns
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
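
The same rectangular selection can also be written with np.ix_, which builds index arrays that broadcast against each other. This is an alternative sketch, not something the cells above rely on; rows, cols, chained and region are illustrative names.

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr[rows][:, cols]         # two indexing steps, as above
region = arr[np.ix_(rows, cols)]     # one step with an open mesh of indices

print(np.array_equal(chained, region))   # True
print(region.shape)                      # (4, 4)

# Fancy indexing copies, so writing to the result leaves arr unchanged.
region[0, 0] = -1
print(arr[1, 0])                         # 4
```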

1.7 Transposing Arrays and Swapping Axes

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the transpose method and also the special T attribute.

arr = np.arange(6).reshape(2,3)
arr

array([[0, 1, 2],
       [3, 4, 5]])

arr.T

array([[0, 3],
       [1, 4],
       [2, 5]])
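
A brief sketch of the view behavior, a matrix product with the transpose, and axis swapping on a three-dimensional array. The array cube is an assumption introduced only to give swapaxes something to act on.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# The transpose is a view: no data is copied.
print(np.shares_memory(arr, arr.T))      # True

# A common use: the matrix product of the transpose with the array.
gram = arr.T @ arr
print(gram.tolist())   # [[9, 12, 15], [12, 17, 22], [15, 22, 29]]

cube = np.arange(24).reshape(2, 3, 4)
print(cube.swapaxes(1, 2).shape)         # (2, 4, 3)
print(cube.transpose(2, 0, 1).shape)     # (4, 2, 3)
```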

2 Universal Functions

A ufunc, short for universal function, is a function that performs element-wise operations on data in ndarrays.

# unary ufuncs
arr = np.arange(10)
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

# binary ufuncs
x = np.random.randn(8)
y = np.random.randn(8)
np.maximum(x, y)

array([ 0.7741251 , -0.13054798,  0.92974564, -0.83160733,  0.90103776,
        0.68551387, -0.34032336, -0.08283191])
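
Two further ufunc features, sketched with arbitrary sample values: a ufunc that returns more than one array (np.modf) and the optional out argument, which writes the result into an existing array.

```python
import numpy as np

arr = np.array([1.5, -2.25, 3.0])

# np.modf returns the fractional and integral parts as two arrays.
frac, whole = np.modf(arr)
print(frac.tolist())     # [0.5, -0.25, 0.0]
print(whole.tolist())    # [1.0, -2.0, 3.0]

# out= stores the result in a preallocated array instead of creating a new one.
result = np.empty_like(arr)
np.abs(arr, out=result)
print(result.tolist())   # [1.5, 2.25, 3.0]

# Arithmetic operators on arrays dispatch to binary ufuncs.
print(np.array_equal(np.add(arr, arr), arr + arr))   # True
```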

3 Array-Oriented Programming with Arrays

points = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(points, points)

# xs holds the x coordinate of every grid point, ys the y coordinate
xs

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

ys

array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

# distance of every grid point from the origin, computed without loops
z = np.sqrt(xs ** 2 + ys ** 2)

3.1 Expressing Conditional Logic as Array Operations

# list comprehension edition for conditional logic
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

[1.1, 2.2, 1.3, 1.4, 2.5]

# np.where edition for conditional logic
result = np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])
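
The second and third arguments to np.where do not have to be arrays; either can be a scalar. A minimal sketch with a small made-up array:

```python
import numpy as np

arr = np.array([[1.5, -0.3], [-2.0, 0.7]])

# Replace positive values with 2 and everything else with -2.
signs = np.where(arr > 0, 2, -2)
print(signs.tolist())      # [[2, -2], [-2, 2]]

# Keep positive values and set the rest to 0.
clipped = np.where(arr > 0, arr, 0.0)
print(clipped.tolist())    # [[1.5, 0.0], [0.0, 0.7]]
```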

3.2 Mathematical and Statistical Methods

A set of mathematical functions that compute statistics about an entire array or about the data along an axis is accessible both as methods of the array class and as top-level NumPy functions.

arr = np.random.randn(5, 4)
arr

array([[-0.32270463, -2.47923282,  0.51142065,  1.64402202],
       [-1.11875424, -0.50816377, -0.24412379, -0.35906071],
       [-0.80848479, -1.5290442 ,  0.33861759, -1.84812779],
       [-0.38523178, -1.14234316,  1.07015372, -0.7025341 ],
       [-0.742273  ,  0.62327938, -0.24617117, -0.87927529]])

# find the mean of an array
arr.mean()

-0.45640159378569933

# using the top-level NumPy function to find the mean of an array
np.mean(arr)

-0.45640159378569933

# computing the mean over the axis 0
arr.mean(axis=0)

array([-0.67548969, -1.00710091,  0.2859794 , -0.42899517])

np.mean(arr, axis=0)

array([-0.67548969, -1.00710091,  0.2859794 , -0.42899517])
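
With a small deterministic array the effect of the axis argument is easier to check by hand. This sketch also shows cumsum, which returns an array of intermediate results rather than a single aggregate.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

print(arr.sum())                      # 15
print(arr.sum(axis=0).tolist())       # [3, 5, 7]    (down the rows, one value per column)
print(arr.mean(axis=1).tolist())      # [1.0, 4.0]   (across the columns, one value per row)
print(arr.cumsum(axis=1).tolist())    # [[0, 1, 3], [3, 7, 12]]
```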

3.3 Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods. Thus, sum is often used as a means of counting True values in a boolean array.

arr = np.random.randn(100)
(arr > 0).sum() # Number of positive values

There are two additional methods, any and all, useful especially for boolean arrays. any tests whether one or more values in an array is True, while all checks if every value is True.

bools = np.array([False, False, True, False])

bools.any()

True

bools.all()

False
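
sum, any and all also accept an axis argument. A short sketch on a fixed array, so the counts can be verified by eye; the name positive is illustrative.

```python
import numpy as np

arr = np.array([[1, -2, 3], [4, 5, 6]])
positive = arr > 0

print(positive.sum())                  # 5  (True counts as 1)
print(positive.sum(axis=1).tolist())   # [2, 3]
print(positive.all(axis=1).tolist())   # [False, True]
print(positive.any(axis=0).tolist())   # [True, True, True]
```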

3.4 Sorting

NumPy arrays can be sorted in-place with the sort method.

arr = np.random.randn(6)
arr

array([-0.64050099,  0.37239892,  0.48466042,  0.0832035 , -0.24079602,
       -0.62832189])

arr.sort()

arr

array([-0.64050099, -0.62832189, -0.24079602,  0.0832035 ,  0.37239892,
        0.48466042])

We can sort each one-dimensional section of values in a multidimensional array in-place along an axis by passing the axis number to sort.

arr = np.random.randn(5, 3)
arr

array([[ 0.22066993,  1.28272713, -2.80933259],
       [ 1.24150303,  0.6821006 ,  0.21857812],
       [ 0.38492004,  2.30910114,  0.354785  ],
       [-0.99229831, -0.81723761,  0.19111813],
       [ 0.46279363,  0.11871894,  0.7152068 ]])

arr.sort(1)

arr

array([[-2.80933259,  0.22066993,  1.28272713],
       [ 0.21857812,  0.6821006 ,  1.24150303],
       [ 0.354785  ,  0.38492004,  2.30910114],
       [-0.99229831, -0.81723761,  0.19111813],
       [ 0.11871894,  0.46279363,  0.7152068 ]])

The top-level function np.sort returns a sorted copy of an array instead of modifying the array in-place. A quick-and-dirty way to compute the quantiles of an array is to sort it and select the value at a particular rank.

large_arr = np.random.randn(1000)
large_arr.sort()
large_arr[int(0.05 * len(large_arr))] # 5% quantile

-1.5942713719535697
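
For comparison, the sketch below leaves the original array untouched by using np.sort, and sets the rank-based estimate next to np.quantile, which interpolates between neighboring values by default. The seeded generator from np.random.default_rng is an assumption made here for reproducibility; the cells above use np.random.randn.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1000)

sorted_copy = np.sort(values)            # values itself keeps its original order

# Rank-based estimate, as in the cell above.
rank_based = sorted_copy[int(0.05 * len(sorted_copy))]

# np.quantile interpolates linearly between the two nearest ranks by default.
interpolated = np.quantile(values, 0.05)

print(rank_based, interpolated)          # close, but usually not identical
```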

3.5 Unique and Other Set Logic

# np.unique returns the sorted unique values in an array
numbers = np.array([1, 2, 3, 3, 4, 4, 6])
np.unique(numbers)

array([1, 2, 3, 4, 6])

# pure Python equivalent (tolist converts NumPy scalars to plain Python ints)
sorted(set(numbers.tolist()))

[1, 2, 3, 4, 6]
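
The other set operations follow the same pattern. A minimal sketch; the second array b is made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 3, 4, 4, 6])
b = np.array([2, 3, 6, 8])

print(np.intersect1d(a, b).tolist())   # [2, 3, 6]        sorted common elements
print(np.union1d(a, b).tolist())       # [1, 2, 3, 4, 6, 8]
print(np.setdiff1d(a, b).tolist())     # [1, 4]           in a but not in b

# Membership test for each element of a, returned as a boolean array.
print(np.isin(a, [2, 3, 6]).tolist())  # [False, True, True, True, False, False, True]

# np.unique can also report how often each value occurs.
values, counts = np.unique(a, return_counts=True)
print(counts.tolist())                 # [1, 1, 2, 2, 1]
```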