numpy

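The tutorial on the official NumPy site is already quite thorough, so this page focuses on frequently used details and techniques.
NumPy basics
The basic, and most important, data structure in NumPy is numpy.ndarray (called arr below). Although it looks much like a row vector (or row matrix), its arithmetic operations are all element-wise rather than matrix operations. Its important attributes are:
- ndarray.ndim: the number of dimensions (axes) of the array
- ndarray.shape: the shape of the array, i.e. its size along each dimension
- ndarray.size: the total number of elements, equal to the product of the entries of shape
- ndarray.dtype: the type of the array's elements; see data type for the available types
- ndarray.itemsize: the memory used by each element, in bytes
- ndarray.data: the buffer that holds the array's elements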
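
As a quick illustration of the attributes listed above, the following sketch builds a small array and prints each of them. The array contents and the int32 dtype are arbitrary choices made for this example.

```python
import numpy as np

# A 2 x 3 array of 4-byte integers (values chosen only for illustration)
arr = np.arange(6, dtype=np.int32).reshape(2, 3)

print(arr.ndim)      # 2 (two axes)
print(arr.shape)     # (2, 3)
print(arr.size)      # 6 = 2 * 3
print(arr.dtype)     # int32
print(arr.itemsize)  # 4 bytes per element
print(arr.nbytes)    # 24 = size * itemsize
```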

Checking BLAS/LAPACK linkage

import numpy as np
np.__config__.show()

# or
np.show_config()

nosetests

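After installing numpy and scipy, start a Python shell and run the following commands to test the installation. Recent NumPy and SciPy releases run their test suites with pytest rather than nose, so pytest needs to be installed.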

import numpy
numpy.test('full')

import scipy
scipy.test('full')

view

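view() reinterprets the memory that holds the array's elements in a different way, instead of copying the data into a new array and interpreting the copy.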

x = np.array([(1, 2)], dtype=[('a', np.int8), ('b', np.int8)])
# raw bytes in memory: 00000001 00000010
y = x.view(dtype=np.int16, type=np.matrix)  # matrix([[513]], dtype=int16) on a little-endian machine (513 = 2*256 + 1)

Transposing a multidimensional array also returns a view.

x = np.array([[1,2,3],[4,5,6]])
y = x.T
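
To make the "transpose is a view" point concrete, here is a minimal sketch that checks memory sharing with np.shares_memory and shows a write through the transposed array reaching the original. The variable names and values are only illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = x.T                          # shape (3, 2); no data is copied

print(np.shares_memory(x, y))    # True
y[0, 1] = 40                     # y[0, 1] is the same element as x[1, 0]
print(x[1, 0])                   # 40

z = x.T.copy()                   # an explicit copy is independent of x
z[0, 0] = -1
print(x[0, 0])                   # 1
```

When an independent transposed array is needed, calling copy() as above is one straightforward option.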

Arithmetic operations

import numpy as np
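arr = np.arange(4) # array([0, 1, 2, 3])
arr += 1    # array([1, 2, 3, 4]); += operates in place
arr -= 2    # array([-1,  0,  1,  2])
arr *= 2    # array([-2,  0,  2,  4])
arr //= 2   # array([-1,  0,  1,  2]); arr /= 2 raises a TypeError here because the float result cannot be stored in an integer array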
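
The difference between an in-place operator and an ordinary expression can be sketched as follows. The name alias is introduced here only to observe which operations touch the original buffer.

```python
import numpy as np

a = np.arange(4)       # array([0, 1, 2, 3])
alias = a              # second name for the same array object

a += 1                 # in place: the shared buffer is modified
print(alias)           # [1 2 3 4]

a = a + 1              # builds a new array and rebinds the name a
print(alias)           # [1 2 3 4] (unchanged)
print(a)               # [2 3 4 5]

half = a / 2           # true division returns a new float array
print(half.dtype)      # float64
```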

Matrix operations

Use the dot() function to compute the inner product of two vectors.

import numpy as np
x = np.array([1,2])
y = np.array([3,4])

# inner product
print (x.dot(y))    # 11 = 1*3 + 2*4
print (np.dot(x,y)) # 11

Multiplying a matrix by a vector also uses dot(). Note that the result is a one-dimensional array, not a column vector, which is an easy mistake to make.

import numpy as np
x = np.array([1,2])
Y = np.array([[1,2],[3,4],[5,6]])

# Yx: shape (3,2) times shape (2,) gives shape (3,), not (3,1)
print (Y.dot(x))    # array([ 5, 11, 17])
print (np.dot(Y,x)) # array([ 5, 11, 17])
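
If a true column vector is wanted, one option is to give x an explicit second axis before multiplying. The sketch below uses the @ operator (available in Python 3.5 and later) and compares the two result shapes.

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[1, 2], [3, 4], [5, 6]])

flat = Y @ x                   # 1-D result, shape (3,)
col = Y @ x[:, np.newaxis]     # x as a (2, 1) column gives a (3, 1) result

print(flat.shape)              # (3,)
print(col.shape)               # (3, 1)
print(col.ravel())             # [ 5 11 17]
```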

broadcast

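Official docs

Broadcasting lets functions handle inputs that do not have exactly the same shape in a meaningful way. Shapes are compared dimension by dimension, starting from the trailing (rightmost) dimension.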

Two dimensions are compatible when
- they are equal, or
- one of them is 1

# broadcast example
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5

# cannot broadcast example
A      (1d array):  3
B      (1d array):  4 # trailing dimensions do not match

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3 # second from last dimensions mismatched
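
The shape tables above can be checked without allocating any data. This sketch assumes NumPy 1.20 or later, where np.broadcast_shapes is available; it returns the broadcast shape or raises ValueError when the shapes are incompatible.

```python
import numpy as np

print(np.broadcast_shapes((5, 4), (1,)))          # (5, 4)
print(np.broadcast_shapes((5, 4), (4,)))          # (5, 4)
print(np.broadcast_shapes((15, 3, 5), (3, 1)))    # (15, 3, 5)

mismatch = None
try:
    np.broadcast_shapes((2, 1), (8, 4, 3))        # 2 vs 4 in the second-from-last dimension
except ValueError as exc:
    mismatch = str(exc)
    print("cannot broadcast:", mismatch)
```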

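Broadcasting a one-dimensional array against a two-dimensional array (matrix) normally works by row, and the length of the one-dimensional array must equal the length of each row, as shown below.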

import numpy as np
x = np.arange(1,3)  # array([1, 2])
Y = np.array([[1,2],[3,4],[5,6]])

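# adding, subtracting, multiplying or dividing a 1-D array and a 2-D array broadcasts by row
print (x+Y) # array([[2, 4], [4, 6], [6, 8]])
print (x-Y) # array([[ 0,  0], [-2, -2], [-4, -4]])

# z has length 3 but the rows of Y have length 2, so this cannot be computed
z = np.array([1,2,3])
print (z-Y) # ValueError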

broadcast by column

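To broadcast a one-dimensional array against a two-dimensional array (matrix) by column instead, the length of the one-dimensional array must equal the length of each column, as shown below.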

import numpy as np
z = np.array([1,2,3])
Y = np.array([[1,2],[3,4],[5,6]])

# first turn z into a column vector, then broadcast
print (z[:,np.newaxis]*Y) # array([[ 1,  2], [ 6,  8], [15, 18]])
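
A common use of column-wise broadcasting is subtracting a per-row statistic from every row. The sketch below is one possible way to do this: keepdims=True keeps the reduced axis so the means already have the column shape (3, 1). The name centered is chosen only for this example.

```python
import numpy as np

Y = np.array([[1, 2], [3, 4], [5, 6]])

row_means = Y.mean(axis=1, keepdims=True)   # shape (3, 1): [[1.5], [3.5], [5.5]]
centered = Y - row_means                    # broadcasts the column across both columns of Y

print(row_means.shape)   # (3, 1)
print(centered)          # every row becomes [-0.5, 0.5]

# reshape(-1, 1) is an equivalent way to build a column from a 1-D array
z = np.array([1, 2, 3])
print((z.reshape(-1, 1) * Y).tolist())      # [[1, 2], [6, 8], [15, 18]]
```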

Structured arrays

The index can be thought of as a row id, and each field of the struct as a column id used to look up values.

# x is a structured array; each element has three fields:
# the first, named foo, is a 4-byte integer,
# the second, named bar, is a 4-byte float,
# the third, named baz, is a byte string of at most 10 bytes
x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])

x[1]        # (2, 3., b'World')
x['bar']    # array([ 2.,  3.], dtype=float32)
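
Building on the row/column picture above, the following sketch combines row and field access, modifies one field in place, and filters rows with a boolean mask on a field. The factor 10 and the threshold 2.5 are arbitrary values for illustration.

```python
import numpy as np

x = np.array([(1, 2., 'Hello'), (2, 3., 'World')],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

print(x.dtype.names)              # ('foo', 'bar', 'baz')
print(x[1]['baz'])                # b'World' (row 1, field baz)

x['foo'] *= 10                    # a field is a view, so this updates x itself
print(x['foo'])                   # [10 20]

selected = x[x['bar'] > 2.5]      # keep rows whose bar value exceeds 2.5
print(selected['baz'])            # [b'World']
```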

exception

Default behavior:
- 'warn' for invalid, divide, and overflow
- 'ignore' for underflow.
Available modes:
- ‘ignore’ : Take no action when the exception occurs.
- ‘warn’ : Print a RuntimeWarning (via the Python warnings module).
- ‘raise’ : Raise a FloatingPointError.
- ‘call’ : Call a function specified using the seterrcall function.
- ‘print’ : Print a warning directly to stdout.
- ‘log’ : Record error in a Log object specified by seterrcall.

>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
...      print("saw stupid error!")
>>> np.seterrcall(errorhandler)
<function err_handler at 0x...>
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
FloatingPointError: invalid value encountered in divide
saw stupid error!
>>> j = np.seterr(**oldsettings) # restore previous
...                              # error-handling settings
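
Instead of saving and restoring settings by hand with seterr, the np.errstate context manager can scope a setting to a block. The sketch below is a minimal example under that approach; the names quiet and raised are introduced only for the illustration.

```python
import numpy as np

# Inside this block division by zero is silently ignored
with np.errstate(divide='ignore'):
    quiet = np.array([1.0]) / 0.0
print(quiet)       # [inf]

# Inside this block division by zero raises FloatingPointError
raised = False
with np.errstate(divide='raise'):
    try:
        np.array([1.0]) / 0.0
    except FloatingPointError:
        raised = True
print(raised)      # True
```

The previous error-handling settings are restored automatically when each with block exits.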

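Flattening a multidimensional array

Use either the ravel or the flatten function.
The two functions differ as follows:
- ravel returns a view of the array whenever possible, so modifying the returned value also changes the original array.
- flatten returns a copy of the array, so modifying the returned value does not affect the original array.
- See also: What is the difference between flatten and ravel in numpy?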

import numpy as np
Y = np.arange(1,7).reshape(3,2) # array([[1, 2], [3, 4], [5, 6]])
zf = Y.flatten()    # array([1, 2, 3, 4, 5, 6])
zf[3]= 99           # zf: array([ 1,  2,  3, 99,  5,  6])
                    # Y: array([[1, 2], [3, 4], [5, 6]]) is unaffected
zv = Y.ravel()      # array([1, 2, 3, 4, 5, 6])
zv[3] = 99          # zv: array([ 1,  2,  3, 99,  5,  6])
                    # Y: array([[ 1,  2], [ 3, 99], [ 5,  6]]) is modified
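
One caveat worth sketching: ravel only returns a view when the elements can be laid out in the requested order without copying. For a transposed array that is not possible in the default row-major order, so ravel silently returns a copy. np.shares_memory is used here to tell the two cases apart.

```python
import numpy as np

Y = np.arange(1, 7).reshape(3, 2)

v = Y.ravel()                     # contiguous source: a view
print(np.shares_memory(Y, v))     # True

t = Y.T.ravel()                   # transposed source: ravel has to copy
print(np.shares_memory(Y, t))     # False
print(t)                          # [1 3 5 2 4 6]

t[0] = 99                         # does not reach Y because t is a copy
print(Y[0, 0])                    # 1
```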

masked array

A masked array is an array that may contain missing or invalid entries.
The numpy.ma module provides the tools for working with masked arrays.
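
A small sketch of how numpy.ma might be used: the value -999.0 is a hypothetical "missing" sentinel chosen for this example, and the variable names are illustrative only.

```python
import numpy as np

# -999.0 marks missing readings in this made-up data set
temps = np.array([20.5, -999.0, 22.0, -999.0])
masked = np.ma.masked_equal(temps, -999.0)

print(masked.count())        # 2 valid entries
print(masked.mean())         # 21.25, computed over valid entries only
print(masked.filled(0.0))    # masked entries replaced by 0.0

# masked_invalid masks NaN and inf entries instead of a sentinel value
with_nan = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))
print(with_nan.sum())        # 4.0
```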

datetime and timedelta (NumPy 1.7 and later)

See the official docs for details.
data type: "datetime64"
The most basic way to create a datetime is from a string in ISO 8601 date or datetime format.
- date unit: years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’)
- time unit: hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’)

import numpy as np

# A simple ISO date
np.datetime64('2005-02-25') # numpy.datetime64('2005-02-25')

# Using months for the unit
np.datetime64('2005-02') # numpy.datetime64('2005-02')

# Specifying just the month, but forcing a ‘days’ unit:
np.datetime64('2005-02', 'D') # numpy.datetime64('2005-02-01')

# From a date and time
np.datetime64('2005-02-25T03:30') # numpy.datetime64('2005-02-25T03:30')

# array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')

# array(['2001-01-01T12:00:00.000', '2002-02-03T13:56:03.172'], dtype='datetime64[ms]')
np.array(['2001-01-01T12:00', '2002-02-03T13:56:03.172'], dtype='datetime64')

np.arange('2005-02', '2005-03', dtype='datetime64[D]')
# array(['2005-02-01', '2005-02-02', '2005-02-03', '2005-02-04',
#        '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08',
#        '2005-02-09', '2005-02-10', '2005-02-11', '2005-02-12',
#        '2005-02-13', '2005-02-14', '2005-02-15', '2005-02-16',
#        '2005-02-17', '2005-02-18', '2005-02-19', '2005-02-20',
#        '2005-02-21', '2005-02-22', '2005-02-23', '2005-02-24',
#        '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
#        dtype='datetime64[D]')

Datetime arithmetic

np.datetime64('2009-01-01') - np.datetime64('2008-01-01')
# numpy.timedelta64(366,'D')

np.datetime64('2009') + np.timedelta64(20, 'D')
# numpy.datetime64('2009-01-21')

np.datetime64('2011-06-15T00:00') + np.timedelta64(12, 'h')
# numpy.datetime64('2011-06-15T12:00')
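
A timedelta64 result often needs to be turned into a plain number. The sketch below shows two possible ways: dividing by a one-unit timedelta, which gives a float, and converting to another unit with astype. The variable names are illustrative.

```python
import numpy as np

delta = np.datetime64('2009-01-01') - np.datetime64('2008-01-01')

# Dividing by a one-day timedelta yields a plain float number of days
days = delta / np.timedelta64(1, 'D')
print(days)     # 366.0 (2008 is a leap year)

# astype converts between units; a second astype(int) extracts the count
hours = int(delta.astype('timedelta64[h]').astype(int))
print(hours)    # 8784 = 366 * 24
```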
