numpy
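- The official NumPy tutorial is already quite thorough, so this post focuses on commonly used details and tricks.
NumPy's basic, and most important, data structure is numpy.ndarray (called arr below). Although it looks like a row vector (row matrix), its arithmetic operators (+, -, *, /) work element-wise rather than as matrix operations. Its important attributes are: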
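- ndarray.ndim: the number of dimensions (axes) of the array
- ndarray.shape: the shape of the array, a tuple giving the size of each dimension
- ndarray.size: the total number of elements, equal to the product of the entries of shape
- ndarray.dtype: the type of the array's elements; see the data types documentation for the available types
- ndarray.itemsize: the memory used by each element, in bytes
- ndarray.data: the buffer holding the array's elements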
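A small sketch of these attributes on a concrete array. The dtype is fixed to int64 here only so that itemsize is predictable across platforms; the last line illustrates the element-wise behavior described above.

```python
import numpy as np

# A 2x3 array with an explicit 64-bit integer dtype, so itemsize is predictable
arr = np.arange(6, dtype=np.int64).reshape(2, 3)

print(arr.ndim)        # 2
print(arr.shape)       # (2, 3)
print(arr.size)        # 6, the product of the shape entries
print(arr.dtype)       # int64
print(arr.itemsize)    # 8 bytes per element
print(type(arr.data))  # <class 'memoryview'>

# Arithmetic is element-wise: each entry is multiplied by itself
print(arr * arr)
```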
Checking BLAS/LAPACK linkage
import numpy as np
np.__config__.show()
# or
np.show_config()
nosetests
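- After installing numpy and scipy, open a Python shell and run the following commands to test them. (Recent versions of both run their test suites with pytest rather than nose, so pytest must be installed.)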
import numpy
numpy.test('full')
import scipy
scipy.test('full')
view
- view() reinterprets the array's existing bytes in memory in a different way, instead of first copying them into new data.
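import numpy as np
x = np.array([(1, 2)], dtype=[('a', np.int8), ('b', np.int8)])
# in memory: byte 0x01 followed by byte 0x02 (00000001 00000010)
# on a little-endian machine these two bytes read as int16 0x0201 = 513
y = x.view(dtype=np.int16, type=np.matrix) # matrix([[513]], dtype=int16)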
- Transposing an array, including a higher-dimensional one, also returns a view rather than a copy
x = np.array([[1,2,3],[4,5,6]]) # the rows must be wrapped in one outer list
y = x.T # y shares memory with x
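To see that the transpose really is a view, this sketch writes through y and checks that x changes; np.shares_memory and .copy() are standard NumPy calls used here for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = x.T                          # shape (3, 2); no data is copied

print(y.base is x)               # True: y is a view onto x's buffer
print(np.shares_memory(x, y))    # True

y[0, 1] = 40                     # y[0, 1] is the same element as x[1, 0]
print(x[1, 0])                   # 40

z = x.T.copy()                   # .copy() gives an independent array
z[0, 0] = -1
print(x[0, 0])                   # 1: x is unaffected
```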
Arithmetic operations
import numpy as np
arr = np.arange(4) # array([0, 1, 2, 3])
arr += 1 # array([1, 2, 3, 4]); += is an in-place operator
arr -= 2 # array([-1, 0, 1, 2])
arr *= 2 # array([-2, 0, 2, 4])
arr //= 2 # array([-1, 0, 1, 2]); on an integer array use //=, because /= cannot store float results in place
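The difference between "/" and the in-place "/=" on an integer array can be sketched as follows. The flag name inplace_failed is just for illustration; NumPy raises a subclass of TypeError when the float result of true division cannot be cast back into the integer array.

```python
import numpy as np

arr = np.arange(4)            # integer dtype
half = arr / 2                # "/" returns a NEW float64 array
print(half)                   # [0.  0.5 1.  1.5]

inplace_failed = False
try:
    arr /= 2                  # the float result cannot be stored back into an int array
except TypeError:             # NumPy raises a subclass of TypeError here
    inplace_failed = True
print(inplace_failed)         # True

farr = np.arange(4, dtype=np.float64)
farr /= 2                     # works: the array already holds floats
print(farr)                   # [0.  0.5 1.  1.5]
```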
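Matrix operations
- The inner product of two vectors is computed with the dot() function (or, in Python 3.5+, the @ operator)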
import numpy as np
x = np.array([1,2])
y = np.array([3,4])
# inner product
print (x.dot(y)) # 11 = 1*3 + 2*4
print (np.dot(x,y)) # 11
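Because "*" is element-wise, it is easy to confuse it with dot(). This sketch contrasts the two on vectors and on 2x2 matrices, using the @ operator as an equivalent of dot().

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])

print(x * y)          # [3 8]: element-wise product, not the inner product
print(x @ y)          # 11: the @ operator gives the inner product for 1-D arrays
print((x * y).sum())  # 11: the same inner product written out

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)          # element-wise: values [[5, 12], [21, 32]]
print(A @ B)          # matrix product, same as A.dot(B): values [[19, 22], [43, 50]]
```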
- Matrix-vector multiplication also uses dot(). Note that the result is not a column vector but a 1-D array, which makes this an easy place to slip up!
import numpy as np
x = np.array([1,2])
Y = np.array([[1,2],[3,4],[5,6]])
# Yx: (3,2)*(2,1)=(3,1)
print (Y.dot(x)) # array([ 5, 11, 17])
print (np.dot(Y,x)) # array([ 5, 11, 17])
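A minimal sketch of the pitfall: checking .shape shows the 1-D result, and turning x into an explicit (2, 1) column vector with np.newaxis yields a (3, 1) column result instead.

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[1, 2], [3, 4], [5, 6]])

r = Y.dot(x)
print(r.shape)            # (3,): a 1-D array, not a (3, 1) column vector

xc = x[:, np.newaxis]     # an explicit column vector, shape (2, 1)
rc = Y.dot(xc)
print(rc.shape)           # (3, 1): now the result is a column vector
print(rc.ravel())         # [ 5 11 17]: same values as r
```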
broadcast
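- Broadcasting lets functions handle, in a meaningful way, inputs whose shapes are not exactly the same.
- Comparing the two shapes dimension by dimension, starting from the trailing (rightmost) ones, two dimensions are compatible when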
- they are equal, or
- one of them is 1
# broadcast example
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
# cannot broadcast example
A (1d array): 3
B (1d array): 4 # trailing dimensions do not match
A (2d array): 2 x 1
B (3d array): 8 x 4 x 3 # second from last dimensions mismatched
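The compatibility rule can be sketched by hand to reproduce the tables above. broadcast_shape is a hypothetical helper written only for illustration; its results are cross-checked against np.broadcast_shapes, which assumes NumPy 1.20 or later.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(a, b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    result = []
    # Walk both shapes from the trailing dimension; missing leading dims count as 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

cases = [((5, 4), (1,)), ((5, 4), (4,)), ((15, 3, 5), (15, 1, 5)),
         ((15, 3, 5), (3, 5)), ((15, 3, 5), (3, 1))]
for a, b in cases:
    # np.broadcast_shapes (NumPy >= 1.20) applies the same rule
    print(a, b, "->", broadcast_shape(a, b), np.broadcast_shapes(a, b))

for a, b in [((3,), (4,)), ((2, 1), (8, 4, 3))]:
    try:
        broadcast_shape(a, b)
    except ValueError as exc:
        print(a, b, "->", exc)
```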
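- Broadcasting an ordinary 1-D array against a 2-D array (matrix) is applied row by row, so the length of the 1-D array must equal the length of each row, as shown below.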
import numpy as np
x = np.arange(1,3) # array([1, 2])
Y = np.array([[1,2],[3,4],[5,6]])
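# adding, subtracting, multiplying or dividing a 1-D array and a 2-D array broadcasts by row
print (x+Y) # array([[2, 4], [4, 6], [6, 8]])
print (x-Y) # array([[ 0, 0], [-2, -2], [-4, -4]])
# the lengths differ (3 vs. row length 2), so this cannot be computed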
z = np.array([1,2,3])
print (z-Y) #ValueError
broadcast by column
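- To broadcast a 1-D array against a 2-D array (matrix) column by column instead, the length of the 1-D array must equal the length of each column (the number of rows), as shown below.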
import numpy as np
z = np.array([1,2,3])
Y = np.array([[1,2],[3,4],[5,6]])
# first turn z into a column vector of shape (3, 1), then broadcast
print (z[:,np.newaxis]*Y) # array([[ 1, 2], [ 6, 8], [15, 18]])
Structured arrays
- Think of the index as a row id and each struct field name as a column id when retrieving values.
import numpy as np
# x is a structured array; each element holds three values:
# the first is named foo, of type 4-byte integer,
# the second is named bar, of type 4-byte float,
# the third is named baz, a byte string of at most 10 bytes
x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
x[1] # (2, 3., b'World')
x['bar'] # array([ 2., 3.], dtype=float32)
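Extending the row/column analogy, this sketch selects rows with a condition on one field, updates a field in place, and sorts rows by a field with np.sort(..., order=...). The data is the same toy array as above.

```python
import numpy as np

x = np.array([(1, 2., 'Hello'), (2, 3., 'World')],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

print(x.dtype.names)        # ('foo', 'bar', 'baz')
print(x[1]['baz'])          # b'World': 'S10' stores bytes, not str
print(x[x['bar'] > 2.5])    # select "rows" with a condition on a "column"

x['bar'] *= 10              # a field is a view, so it can be updated in place
print(x['bar'])             # [20. 30.]

# sort rows by 'bar' (ascending), then reverse for descending order
print(np.sort(x, order='bar')[::-1]['foo'])  # [2 1]
```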
exception
Default behavior:
- 'warn' for invalid, divide, and overflow
- 'ignore' for underflow.
Available treatments:
- ‘ignore’ : Take no action when the exception occurs.
- ‘warn’ : Print a RuntimeWarning (via the Python warnings module).
- ‘raise’ : Raise a FloatingPointError.
- ‘call’ : Call a function specified using the seterrcall function.
- ‘print’ : Print a warning directly to stdout.
- ‘log’ : Record error in a Log object specified by seterrcall.
>>> import numpy as np
>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
RuntimeWarning: invalid value encountered in divide
array([nan, nan, nan, nan, nan], dtype=float32)
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10  # underflows to 0 without any message
array([0.])
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
...     print("saw stupid error!")
>>> oldhandler = np.seterrcall(errorhandler)  # returns the previous handler
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
saw stupid error!
array([nan, nan, nan, nan, nan])
>>> j = np.seterr(**oldsettings)  # restore the previous error-handling settings
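Instead of changing the global settings with seterr and restoring them manually, the np.errstate context manager applies settings only inside a with-block. A short sketch; the variable names are arbitrary.

```python
import numpy as np

before = np.geterr()
a = np.array([1.0, 0.0, -1.0])

# np.errstate changes the error settings only inside the with-block
with np.errstate(divide='ignore', invalid='ignore'):
    r = a / 0.0              # no RuntimeWarning is emitted here
print(r)                     # inf, nan and -inf

caught = False
with np.errstate(invalid='raise'):
    try:
        np.sqrt(np.array([-1.0]))
    except FloatingPointError as exc:
        caught = True
        print("caught:", exc)

print(np.geterr() == before) # True: the global settings are unchanged
```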
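Flattening a multi-dimensional array
Use either ravel or flatten.
The difference between the two:
- ravel returns a view of the array whenever possible, so modifying the returned value changes the original array.
- flatten always returns a copy of the array, so modifying the returned value does not affect the original array.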
- What is the difference between flatten and ravel in numpy?
import numpy as np
Y = np.arange(1,7).reshape(3,2) #array([[1, 2], [3, 4], [5, 6]])
zf = Y.flatten() # array([1, 2, 3, 4, 5, 6])
zf[3]= 99 # zf: array([ 1, 2, 3, 99, 5, 6])
# Y: array([[1, 2], [3, 4], [5, 6]]) is unaffected
zv = Y.ravel() # array([1, 2, 3, 4, 5, 6])
zv[3] = 99 # zv: array([ 1, 2, 3, 99, 5, 6])
# Y: array([[ 1, 2], [ 3, 99], [ 5, 6]]) is modified
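ravel can only return a view when the elements are already laid out in the requested order. This sketch uses a transpose, which is not C-contiguous, to show ravel falling back to a copy; np.shares_memory is used to check.

```python
import numpy as np

Y = np.arange(1, 7).reshape(3, 2)

v = Y.ravel()
print(np.shares_memory(Y, v))    # True: Y is C-contiguous, so ravel returns a view

T = Y.T                          # transpose: not C-contiguous
t = T.ravel()                    # C-order flattening of T must reorder elements...
print(np.shares_memory(T, t))    # False: ...so ravel has to return a copy here
print(t)                         # [1 3 5 2 4 6]

f = Y.flatten()
print(np.shares_memory(Y, f))    # False: flatten always copies
```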
masked array
- A masked array is an array that may contain missing or invalid entries.
- The numpy.ma module provides tools for working with masked arrays.
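A minimal numpy.ma sketch. The sentinel value -999.0 for a missing reading is an assumption made for this example; masked_values, masked_invalid, mean and filled are standard numpy.ma functions and methods.

```python
import numpy as np
import numpy.ma as ma

# Assumption for illustration: -999.0 is used as a "missing reading" marker
data = np.array([1.0, 2.0, -999.0, 4.0])
m = ma.masked_values(data, -999.0)   # mask every element equal to -999.0

print(m)              # the masked entry is displayed as --
print(m.mean())       # mean of the unmasked values only: 7/3
print(m.filled(0.0))  # [1. 2. 0. 4.]: masked entries replaced by 0.0

n = ma.masked_invalid(np.array([1.0, np.nan, 3.0]))  # mask NaN and inf
print(n.sum())        # 4.0
```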
datetime and timedelta (since NumPy 1.7)
- See the official docs
- data type: “datetime64”
- The most basic way to create a datetime is from an ISO 8601 date or datetime string.
- date unit: years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’)
- time unit: hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’)
import numpy as np
# A simple ISO date
np.datetime64('2005-02-25') # numpy.datetime64('2005-02-25')
# Using months for the unit
np.datetime64('2005-02') # numpy.datetime64('2005-02')
# Specifying just the month, but forcing a ‘days’ unit:
np.datetime64('2005-02', 'D') # numpy.datetime64('2005-02-01')
# From a date and time
np.datetime64('2005-02-25T03:30') # numpy.datetime64('2005-02-25T03:30')
# array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
# array(['2001-01-01T12:00:00.000', '2002-02-03T13:56:03.172'], dtype='datetime64[ms]')
np.array(['2001-01-01T12:00', '2002-02-03T13:56:03.172'], dtype='datetime64')
np.arange('2005-02', '2005-03', dtype='datetime64[D]')
# array(['2005-02-01', '2005-02-02', '2005-02-03', '2005-02-04',
#        '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08',
#        '2005-02-09', '2005-02-10', '2005-02-11', '2005-02-12',
#        '2005-02-13', '2005-02-14', '2005-02-15', '2005-02-16',
#        '2005-02-17', '2005-02-18', '2005-02-19', '2005-02-20',
#        '2005-02-21', '2005-02-22', '2005-02-23', '2005-02-24',
#        '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
#       dtype='datetime64[D]')
- datetime arithmetic
np.datetime64('2009-01-01') - np.datetime64('2008-01-01')
# numpy.timedelta64(366,'D')
np.datetime64('2009') + np.timedelta64(20, 'D')
# numpy.datetime64('2009-01-21')
np.datetime64('2011-06-15T00:00') + np.timedelta64(12, 'h')
# numpy.datetime64('2011-06-15T12:00')
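A timedelta64 often needs to be turned into a plain number or a different unit. This sketch divides by a one-unit timedelta64, converts units with astype, and converts a day-resolution datetime64 to a standard datetime.date with .item().

```python
import numpy as np

d1 = np.datetime64('2008-01-01')
d2 = np.datetime64('2009-01-01')
delta = d2 - d1                           # numpy.timedelta64(366,'D'); 2008 is a leap year

# Dividing by a one-unit timedelta gives a plain float count of that unit
print(delta / np.timedelta64(1, 'D'))     # 366.0
print(delta / np.timedelta64(1, 'h'))     # 8784.0

# astype converts to a finer unit
print(delta.astype('timedelta64[h]'))     # 8784 hours

# Comparison, and conversion to the standard library's datetime.date
print(d1 < d2)                            # True
print(d1.item())                          # 2008-01-01
```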