Python has many different types of data "containers": lists, dictionaries, tuples, etc. However, none of them allows for efficient numerical calculation, particularly in multi-dimensional cases (think, e.g., of operations on images). Numpy was developed precisely to fill this gap. It provides a new data structure, the numpy array, and a large library of operations that work on it.
Numpy is the base of almost the entire Python scientific programming stack. Many libraries build on top of Numpy, either by providing specialized functions that operate on arrays (e.g. scikit-image for image processing) or by creating more complex data containers on top of them. The data science library Pandas, which will also be presented in this course, is a good example of the latter with its dataframe structures.
import numpy as np
from svg import numpy_to_svg
Let us create the simplest example of an array by transforming a regular Python list into an array (we will see more advanced ways of creating arrays in the next chapters):
mylist = [2,5,3,9,5,2]
mylist
myarray = np.array(mylist)
myarray
type(myarray)
We see that myarray is a Numpy array thanks to the array(...) prefix in the output. The type also says that we have a numpy ndarray (n-dimensional array). At this point we don't see a big difference with regular lists, but we'll see in the following sections all the operations we can do with these objects.
We can already see a difference with two basic attributes of arrays: their type and shape.
Just like regular variables in Python, arrays receive a type when they are created. Unlike regular lists, all elements of an array always have the same type. The type of an array can be recovered through the .dtype attribute:
myarray.dtype
Depending on the content of the list, the array will have different types. The logic of "maximal complexity" applies: the array takes the most general type needed to represent all of its elements. For example if we mix integers and floats, we get a float array:
myarray2 = np.array([1.2, 6, 7.6, 5])
myarray2
myarray2.dtype
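The "maximal complexity" rule can be checked on a few small cases. The following is only a minimal sketch; the variable names are illustrative and not part of the course material:

```python
import numpy as np

# Integers only: the array gets an integer type
int_array = np.array([1, 2, 3])

# A single float is enough to turn the whole array into floats
float_array = np.array([1, 2.5, 3])

# A single complex number turns everything into complex numbers
complex_array = np.array([1, 2.5, 3 + 1j])

print(int_array.dtype.kind)   # 'i', i.e. some integer type
print(float_array.dtype)      # float64
print(complex_array.dtype)    # complex128
```

The exact integer type (for example int64 or int32) can depend on the platform, which is why the sketch only checks the kind of the integer array.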
In general, we have the possibility to assign a type to an array. This is true here, as well as later when we'll create more complex arrays, and is done via the dtype option:
myarray2 = np.array([1.2, 6, 7.6, 500], dtype=np.uint8)
myarray2
The type of the array can also be changed after creation using the .astype() method:
# np.float was removed in recent Numpy versions; np.float64 is the explicit equivalent
myfloat_array = np.array([1.2, 6, 7.6, 500], dtype=np.float64)
myfloat_array.dtype
myint_array = myfloat_array.astype(np.int8)
myint_array.dtype
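To make the effect of a type conversion more concrete, here is a minimal sketch with illustrative values that all fit into the target type. It shows that converting floats to integers drops the fractional part, and uses np.iinfo to look up the range an integer type can hold:

```python
import numpy as np

float_values = np.array([1.2, 6, 7.6, 100.9])

# Converting to an integer type drops the fractional part (no rounding)
int_values = float_values.astype(np.uint8)
print(int_values)  # the values become 1, 6, 7 and 100

# np.iinfo reports the range that an integer type can represent
limits = np.iinfo(np.uint8)
print(limits.min, limits.max)  # 0 255
```

A value such as 500 lies outside the uint8 range, so the result of converting it should not be relied upon. Checking the range beforehand avoids such surprises.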
A very important property of an array is its shape, in other words the length of each of its axes. It can be accessed via the .shape attribute:
myarray
myarray.shape
We see that our simple array has only one dimension of length 6. Now of course we can create more complex arrays. Let's create for example a list of two lists:
my2d_list = [[1,2,3], [4,5,6]]
my2d_array = np.array(my2d_list)
my2d_array
my2d_array.shape
We see now that the shape of this array is two-dimensional. We also see that we have 2 lists of 3 elements. In fact, at this point we should forget that we have a list of lists and simply consider this object as a matrix with two rows and three columns. We'll use the following graphical representation to clarify some concepts:
numpy_to_svg(my2d_array)
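As a small sketch of how the shape relates to rows and columns, the shape tuple can be unpacked directly. The attributes ndim and size are not discussed in the text above; they are shown here only as related Numpy attributes:

```python
import numpy as np

my2d_array = np.array([[1, 2, 3], [4, 5, 6]])

# The shape is a tuple with one entry per axis: (rows, columns)
n_rows, n_cols = my2d_array.shape
print(n_rows, n_cols)     # 2 3

print(my2d_array.ndim)    # 2, the number of axes
print(my2d_array.size)    # 6, the total number of elements
```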
We have seen that we can turn regular lists into arrays. However, this quickly becomes impractical for larger arrays. Numpy offers several functions to create particular arrays.
For example an array full of zeros or ones:
one_array = np.ones((2,3))
one_array
zero_array = np.zeros((2,3))
zero_array
One can also create an identity matrix, i.e. a square matrix with ones on the diagonal and zeros elsewhere:
np.eye(3)
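For matrices with other diagonal content, a possible sketch is shown below. The k argument of np.eye and the np.diag function are not covered in the text above; the variable names are illustrative:

```python
import numpy as np

identity = np.eye(3)          # ones on the main diagonal
shifted = np.eye(3, k=1)      # ones on the diagonal just above the main one
custom = np.diag([1, 2, 3])   # chosen values on the main diagonal

print(identity)
print(shifted)
print(custom)
```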
By default Numpy creates float arrays:
one_array.dtype
However, as mentioned before, one can impose a type using the dtype option:
one_array_int = np.ones((2,3), dtype=np.int8)
one_array_int
one_array_int.dtype
Often one needs to create an array with the same shape as an existing one. This can be done with "like-functions":
same_shape_array = np.zeros_like(one_array)
same_shape_array
one_array.shape
same_shape_array.shape
np.ones_like(one_array)
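The like-functions copy not only the shape but also the type of the template array. The sketch below illustrates this, and also uses np.full and np.full_like, which are not mentioned in the text above, to fill an array with a value other than zero or one:

```python
import numpy as np

template = np.ones((2, 3), dtype=np.int8)

zeros_copy = np.zeros_like(template)   # same shape and dtype as template
sevens = np.full_like(template, 7)     # same shape and dtype, filled with 7
float_sevens = np.full((2, 3), 7.5)    # explicit shape, filled with 7.5

print(zeros_copy.dtype)    # int8
print(sevens)
print(float_sevens.dtype)  # float64
```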
We are not limited to creating arrays containing ones or zeros. Very common operations involve, e.g., the creation of arrays containing regularly spaced numbers. For example a "from-to-by-step" list:
np.arange(0, 10, 2)
Or equidistant numbers between boundaries:
np.linspace(0,1, 10)
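One difference between the two functions is easy to miss: np.arange excludes the end value, while np.linspace includes both boundaries by default. A minimal sketch with illustrative values:

```python
import numpy as np

steps = np.arange(0, 10, 2)     # the stop value 10 is excluded
points = np.linspace(0, 1, 5)   # both boundaries 0 and 1 are included

print(steps)    # [0 2 4 6 8]
print(points)   # 0, 0.25, 0.5, 0.75 and 1
```

With np.arange you choose the step and the number of elements follows from it, whereas with np.linspace you choose the number of elements and the step follows.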
In particular, Numpy offers a random submodule that allows one to create arrays containing values drawn from a wide range of distributions. For example, normally distributed or Poisson distributed values:
normal_array = np.random.normal(loc=10, scale=2, size=(3,4))
normal_array
np.random.poisson(lam=5, size=(3,4))
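Random arrays differ on every run, which can make results hard to reproduce. One possible approach, sketched here, is to use a seeded generator created with np.random.default_rng. This interface is not used in the text above, and the seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.normal(loc=10, scale=2, size=(3, 4))
sample_b = rng_b.normal(loc=10, scale=2, size=(3, 4))
print(np.array_equal(sample_a, sample_b))  # True

# The same generator also offers other distributions
counts = rng_a.poisson(lam=5, size=(3, 4))
print(counts.shape)  # (3, 4)
```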
Until now we have almost only dealt with 1D or 2D arrays that look like a simple grid:
myarray = np.ones((5,10))
numpy_to_svg(myarray)
We are not limited to creating 1- or 2-dimensional arrays: arrays can have any number of dimensions. For example, in microscopy, images can be volumetric and are thus 3D arrays in Numpy. If we acquired 5 planes of a 10px by 10px image, we would have something like:
array3D = np.ones((10,10,5))
numpy_to_svg(array3D)
All the functions and properties that we have seen until now are N-dimensional, i.e. they work in the same way irrespective of the number of dimensions of the array.
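A short sketch of this N-dimensional behaviour, reusing the 3D example above. The slicing syntax used to extract one plane is only an illustration here, and the 4D array is a made-up example:

```python
import numpy as np

array3D = np.ones((10, 10, 5))
print(array3D.ndim)    # 3
print(array3D.shape)   # (10, 10, 5)

# Selecting the first plane along the last axis gives a 2D array
first_plane = array3D[:, :, 0]
print(first_plane.shape)  # (10, 10)

# The same creation functions work for any number of dimensions
array4D = np.zeros((2, 3, 4, 5), dtype=np.int8)
print(array4D.shape)   # (2, 3, 4, 5)
```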
We have seen until now multiple ways to create arrays. However, most of the time, you will import data from some source, either directly as arrays or as lists, and use these data in your analysis.
Numpy can efficiently save and load arrays in its own format, .npy. Let's create an array and save it:
array_to_save = np.random.normal(10, 2, (4,5))
array_to_save
np.save('my_saved_array.npy', array_to_save)
ls
Now that this array is saved on disk, we can load it again using np.load:
new_array = np.load('my_saved_array.npy')
new_array
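To check that nothing is lost when saving and loading, one can compare the original and the reloaded array. The sketch below writes to a temporary directory so that no file is left behind; the file name and variable names are illustrative:

```python
import os
import tempfile

import numpy as np

original = np.arange(20, dtype=np.float64).reshape(4, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip.npy')
    np.save(path, original)     # write the array in .npy format
    restored = np.load(path)    # read it back into memory

# Values, type and shape all survive the round trip
print(np.array_equal(original, restored))  # True
print(restored.dtype, restored.shape)      # float64 (4, 5)
```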
If you have several arrays that belong together, you can also save them in a single file in npz format using np.savez. Let's create a second array:
array_to_save2 = np.random.normal(10, 2, (1,2))
array_to_save2
np.savez('multiple_arrays.npz', array_to_save=array_to_save, array_to_save2=array_to_save2)
ls
And when we load it again:
load_multiple = np.load('multiple_arrays.npz')
type(load_multiple)
We get here an NpzFile object from which we can read our data. Note that an npz file is loaded lazily: the list of its contents is parsed, but the arrays themselves are not read until they are accessed. This is very useful if you need to store large amounts of data but don't always need to re-load all of them. We can use the following attribute and method to actually access the data:
load_multiple.files
load_multiple.get('array_to_save2')
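Arrays in an npz file can also be read by indexing with their name, much like a dictionary. The following is a minimal sketch that uses a temporary directory and illustrative names; opening the file in a with block ensures it is closed once the data have been read:

```python
import os
import tempfile

import numpy as np

first = np.arange(6).reshape(2, 3)
second = np.linspace(0, 1, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'bundle.npz')
    # The keyword names become the names of the arrays in the file
    np.savez(path, first=first, second=second)

    with np.load(path) as bundle:
        names = sorted(bundle.files)       # names of the stored arrays
        loaded_second = bundle['second']   # this array is read only now

print(names)          # ['first', 'second']
print(loaded_second)
```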
Images are a typical example of data that are array-like (a matrix of pixels) and that can be imported directly as arrays. Of course, each domain will have its own importing libraries. For example, in the area of imaging, the scikit-image package is one of the main libraries, and it offers an importer of images as arrays that works with both local files and web addresses:
import skimage.io
image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/f/fd/%27%C3%9Cbermut_Exub%C3%A9rance%27_by_Paul_Klee%2C_1939.jpg')
We can briefly explore that image:
type(image)
image.dtype
image.shape
We see that we have an array of integers with 3 dimensions. Since we imported a jpg image, we know that the third dimension corresponds to the three color channels Red, Green, Blue (RGB).
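The layout of such a color image can be explored without downloading anything. The sketch below uses a small made-up array as a stand-in for a real image, assuming the common (rows, columns, channels) layout:

```python
import numpy as np

# Hypothetical stand-in for a loaded image: 4 rows, 6 columns, 3 channels
fake_image = np.zeros((4, 6, 3), dtype=np.uint8)
fake_image[:, :, 0] = 255   # set the red channel to its maximum value

n_rows, n_cols, n_channels = fake_image.shape
print(n_rows, n_cols, n_channels)  # 4 6 3

# Each channel is a 2D array with one value per pixel
red = fake_image[:, :, 0]
green = fake_image[:, :, 1]
print(red.shape)                 # (4, 6)
print(red.max(), green.max())    # 255 0
```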
You can also read regular CSV files directly as Numpy arrays. This is more commonly done using Pandas, so we won't spend much time on it, but here is an example of importing data from the web:
oilprice = np.loadtxt('https://raw.githubusercontent.com/guiwitz/Rdatasets/master/csv/quantreg/gasprice.csv',
delimiter=',', usecols=range(2,3), skiprows=1)
oilprice
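The role of the delimiter, skiprows and usecols options can be seen on a small in-memory example. The CSV content below is made up for illustration and is not the dataset used above:

```python
import io

import numpy as np

# Hypothetical CSV content with a header line and three columns
csv_text = """id,week,price
1,1,100.5
2,2,101.0
3,3,99.5
"""

prices = np.loadtxt(
    io.StringIO(csv_text),  # a file-like object instead of a file name
    delimiter=',',          # columns are separated by commas
    skiprows=1,             # skip the header line
    usecols=2,              # keep only the third column (index 2)
)
print(prices)  # the three prices as a 1D float array
```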