numpy

  • The official NumPy tutorial is already quite thorough, so this page concentrates on frequently used details and techniques.
  • NumPy basics

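  • The basic, and most important, data structure in numpy is numpy.ndarray (called arr below). Although it looks like a row vector (row matrix), its arithmetic operations are all element-wise rather than matrix operations. Its important attributes are:

    • ndarray.ndim: the number of dimensions (axes) of the array
    • ndarray.shape: the shape of the array
    • ndarray.size: the total number of elements, equal to the product of the entries of shape
    • ndarray.dtype: the type of the array elements; see data type for the available types
    • ndarray.itemsize: the memory used by each element of the array (in bytes)
    • ndarray.data: the buffer that holds the array's elements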
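
  • A minimal sketch of the attributes listed above, plus the difference between element-wise and matrix multiplication. The array contents and the names arr and sq are chosen only for illustration.

```python
import numpy as np

# A 3 x 4 array of 32-bit integers (shape and dtype picked for illustration).
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

print(arr.ndim)      # 2: number of axes
print(arr.shape)     # (3, 4)
print(arr.size)      # 12 == 3 * 4
print(arr.dtype)     # int32
print(arr.itemsize)  # 4 bytes per element

# Arithmetic operators are element-wise; @ (or dot) is the matrix product.
sq = np.array([[1, 2], [3, 4]])
elementwise = sq * sq   # [[1, 4], [9, 16]]
matrix_prod = sq @ sq   # [[7, 10], [15, 22]]
print(elementwise)
print(matrix_prod)
```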

Checking the BLAS/LAPACK linkage

import numpy as np
np.__config__.show()

# or
np.show_config()

nosetests

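  • After installing numpy and scipy, open a Python shell and run the following commands to test the installation (recent NumPy and SciPy releases run the suite with pytest rather than nose).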
import numpy
numpy.test('full')

import scipy
scipy.test('full')

view

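  • The view() function reinterprets the elements of an array in memory in a different way, instead of copying them into new data and then interpreting the copy.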
x = np.array([(1, 2)], dtype=[('a', np.int8), ('b', np.int8)])
# 0000000100000010
y = x.view(dtype=np.int16, type=np.matrix)  # matrix([[513]], dtype=int16)
  • Transposing a multidimensional array also returns a view.
x = np.array([[1, 2, 3], [4, 5, 6]])  # a 2-D array needs a nested list
y = x.T
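
  • A small sketch showing that the transpose is a view: writing through y changes x, because both share the same memory. The value 40 is arbitrary.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
y = x.T                      # shape (3, 2), no data copied

y[0, 1] = 40                 # y[0, 1] is the same memory cell as x[1, 0]
print(x)                     # x[1, 0] is now 40
print(np.shares_memory(x, y))

z = x.T.copy()               # an explicit copy is independent of x
z[0, 0] = -1
print(x[0, 0])               # still 1
```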

Arithmetic operations

import numpy as np
arr = np.arange(4)  # array([0, 1, 2, 3])
arr += 1    # array([1, 2, 3, 4]); += is an in-place operator
arr -= 2    # array([-1,  0,  1,  2])
arr *= 2    # array([-2,  0,  2,  4])
arr //= 2   # array([-1,  0,  1,  2]); in-place /= raises a casting error on an integer array in Python 3

Matrix operations

  • The inner product of two vectors is computed with the dot() function.
import numpy as np
x = np.array([1,2])
y = np.array([3,4])

# inner product
print (x.dot(y))    # 11 = 1*3 + 2*4
print (np.dot(x,y)) # 11
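  • Multiplying a matrix by a vector also uses the dot() function. Note that the result is not a column vector but a one-dimensional array, which is an easy mistake to make!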
import numpy as np
x = np.array([1,2])
Y = np.array([[1,2],[3,4],[5,6]])

# Yx: shapes (3,2) and (2,) give (3,), not (3,1)
print (Y.dot(x))    # array([ 5, 11, 17])
print (np.dot(Y,x)) # array([ 5, 11, 17])
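
  • If a real column vector is needed, one option (a sketch, not the only way) is to give x an explicit second axis before multiplying. The names flat and col are illustrative.

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[1, 2], [3, 4], [5, 6]])

flat = Y @ x                   # 1-D result, shape (3,)
col = Y @ x[:, np.newaxis]     # x becomes (2, 1), so the result is (3, 1)

print(flat.shape)   # (3,)
print(col.shape)    # (3, 1)
print(col)
```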

broadcast

  • Broadcasting allows functions to handle, in a meaningful way, inputs that do not have exactly the same shape.

  • Two dimensions are compatible when
    • they are equal, or
    • one of them is 1
# broadcast example
A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5

# cannot broadcast example
A      (1d array):  3
B      (1d array):  4 # trailing dimensions do not match

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3 # second from last dimensions mismatched
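
  • The shape table above can be checked mechanically. The helper result_shape below is hypothetical (it is not part of NumPy); it simply asks np.broadcast for the combined shape and returns None when the shapes are incompatible.

```python
import numpy as np

def result_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or None if incompatible."""
    try:
        return np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape
    except ValueError:
        return None

print(result_shape((5, 4), (1,)))          # (5, 4)
print(result_shape((5, 4), (4,)))          # (5, 4)
print(result_shape((15, 3, 5), (3, 1)))    # (15, 3, 5)
print(result_shape((3,), (4,)))            # None: trailing dimensions differ
print(result_shape((2, 1), (8, 4, 3)))     # None: 2 vs 4 in the second-to-last axis
```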
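  • In general, broadcasting a one-dimensional array against a two-dimensional array (matrix) works by row, and the one-dimensional array must have the same length as each row, as shown below.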
import numpy as np
x = np.arange(1,3)  # array([1, 2])
Y = np.array([[1,2],[3,4],[5,6]])

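# Adding, subtracting, multiplying or dividing a 1-D array and a 2-D array broadcasts by row
print (x+Y) # array([[2, 4], [4, 6], [6, 8]])
print (x-Y) # array([[ 0,  0], [-2, -2], [-4, -4]])

# The length of z differs from the row length of Y, so this cannot be computed
z = np.array([1,2,3])
print (z-Y) # ValueError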

broadcast by column

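  • To broadcast a one-dimensional array against a two-dimensional array (matrix) by column, its length must equal the length of each column (the number of rows), as shown below.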
import numpy as np
z = np.array([1,2,3])
Y = np.array([[1,2],[3,4],[5,6]])

# Turn z into a column vector first, then broadcast
print (z[:,np.newaxis]*Y) # array([[ 1,  2], [ 6,  8], [15, 18]])
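
  • A common use of column-wise broadcasting is normalising each row of a matrix. The sketch below uses keepdims=True so that the row sums keep shape (3, 1) and broadcast by column; the names row_sums and normalized are illustrative.

```python
import numpy as np

Y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

row_sums = Y.sum(axis=1, keepdims=True)   # shape (3, 1): [[3.], [7.], [11.]]
normalized = Y / row_sums                 # (3, 2) / (3, 1) broadcasts by column

print(row_sums.shape)          # (3, 1)
print(normalized.sum(axis=1))  # every row now sums to 1
```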

Structured arrays

  • When reading values, the index can be treated as a row id and each struct field as a column id.
import numpy as np

# x is a structured array; each element holds three values:
# the first is named foo and is a 4-byte integer,
# the second is named bar and is a 4-byte float,
# the third is named baz and is a byte string of at most 10 bytes
x = np.array([(1, 2., 'Hello'), (2, 3., 'World')],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

x[1]        # (2, 3., b'World')
x['bar']    # array([2., 3.], dtype=float32)
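
  • A short sketch of working with the fields of a structured array: a field access returns a view, so it can be updated in place, and a boolean mask built from one field can select rows. The threshold 2.5 and the factor 10 are arbitrary.

```python
import numpy as np

x = np.array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
             dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])

x['foo'] *= 10                 # in-place update of one "column"
row = x[1]                     # one record ("row")
selected = x[x['bar'] > 2.5]   # rows whose bar field exceeds 2.5

print(x['foo'])        # [10 20]
print(row['baz'])      # b'World'
print(selected['foo']) # [20]
print(x.dtype.names)   # ('foo', 'bar', 'baz')
```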

exception

  • Default behavior:

    • 'warn' for invalid, divide, and overflow
    • 'ignore' for underflow.
  • Available handling modes:

    • ‘ignore’ : Take no action when the exception occurs.
    • ‘warn’ : Print a RuntimeWarning (via the Python warnings module).
    • ‘raise’ : Raise a FloatingPointError.
    • ‘call’ : Call a function specified using the seterrcall function.
    • ‘print’ : Print a warning directly to stdout.
    • ‘log’ : Record error in a Log object specified by seterrcall.
>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
...      print("saw stupid error!")
>>> np.seterrcall(errorhandler)
<function err_handler at 0x...>
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
saw stupid error!
array([nan, nan, nan, nan, nan])
>>> j = np.seterr(**oldsettings) # restore previous
...                              # error-handling settings
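
  • Instead of saving and restoring the settings by hand, the same modes can be applied temporarily with the np.errstate context manager. This is an alternative not used in the session above; the sketch below assumes nothing beyond np.errstate and np.geterr.

```python
import numpy as np

before = np.geterr()   # settings in effect before the blocks below

caught = None
with np.errstate(divide='raise'):
    try:
        np.array([1.0]) / 0.0
    except FloatingPointError as exc:
        caught = str(exc)          # message mentions "divide by zero"

with np.errstate(divide='ignore', invalid='ignore'):
    quiet = np.array([1.0, 0.0]) / 0.0   # inf and nan, without any warning

print(caught)
print(quiet)
print(np.geterr() == before)   # True: settings are restored on exit
```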

Flattening a multidimensional array

import numpy as np
Y = np.arange(1,7).reshape(3,2) # array([[1, 2], [3, 4], [5, 6]])
zf = Y.flatten()    # array([1, 2, 3, 4, 5, 6]); flatten() always returns a copy
zf[3]= 99           # zf: array([ 1,  2,  3, 99,  5,  6])
                    # Y: array([[1, 2], [3, 4], [5, 6]]) is unaffected
zv = Y.ravel()      # array([1, 2, 3, 4, 5, 6]); ravel() returns a view when possible
zv[3] = 99          # zv: array([ 1,  2,  3, 99,  5,  6])
                    # Y: array([[ 1,  2], [ 3, 99], [ 5,  6]]) is modified as well
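
  • One caveat worth checking: ravel() only returns a view when the memory layout allows it. In the sketch below, raveling the transpose has to copy, so writing to the result would not change Y. np.shares_memory is used to tell the cases apart.

```python
import numpy as np

Y = np.arange(1, 7).reshape(3, 2)

print(np.shares_memory(Y, Y.flatten()))    # False: flatten() always copies
print(np.shares_memory(Y, Y.ravel()))      # True: contiguous, so a view
print(np.shares_memory(Y, Y.T.ravel()))    # False: the transpose is not C-contiguous

t = Y.T.ravel()    # array([1, 3, 5, 2, 4, 6]), a copy
t[0] = 99
print(Y[0, 0])     # still 1
```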

masked array

  • A masked array is an array that may contain missing or invalid entries.
  • The numpy.ma module provides the tools for working with masked arrays.
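
  • A minimal sketch of numpy.ma, assuming that the value -999.0 marks a missing measurement (the sentinel and the data are made up for illustration). Masked entries are skipped by reductions such as mean().

```python
import numpy as np
import numpy.ma as ma

data = np.array([1.0, 2.0, -999.0, 4.0])            # -999.0 is an assumed sentinel
masked = ma.masked_array(data, mask=(data == -999.0))

print(masked.count())       # 3 valid entries
print(masked.mean())        # mean of 1, 2 and 4 only
print(masked.filled(0.0))   # [1. 2. 0. 4.]: masked entries replaced by 0
```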

datetime and timedelta (NumPy 1.7 and later)

  • See the official docs for details.
  • data type: “datetime64”
  • The most basic way to create a datetime object is from a string in ISO 8601 date or datetime format.
    • date unit: years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’)
    • time unit: hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’)
import numpy as np

# A simple ISO date
np.datetime64('2005-02-25') # numpy.datetime64('2005-02-25')

# Using months for the unit
np.datetime64('2005-02') # numpy.datetime64('2005-02')

# Specifying just the month, but forcing a ‘days’ unit:
np.datetime64('2005-02', 'D') # numpy.datetime64('2005-02-01')

# From a date and time
np.datetime64('2005-02-25T03:30') # numpy.datetime64('2005-02-25T03:30')

# array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')
np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')

# array(['2001-01-01T12:00:00.000', '2002-02-03T13:56:03.172'], dtype='datetime64[ms]')
# (NumPy 1.11 and later are timezone-naive, so no UTC offset is shown)
np.array(['2001-01-01T12:00', '2002-02-03T13:56:03.172'], dtype='datetime64')

np.arange('2005-02', '2005-03', dtype='datetime64[D]')
# array(['2005-02-01', '2005-02-02', '2005-02-03', '2005-02-04',
#        '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08',
#        '2005-02-09', '2005-02-10', '2005-02-11', '2005-02-12',
#        '2005-02-13', '2005-02-14', '2005-02-15', '2005-02-16',
#        '2005-02-17', '2005-02-18', '2005-02-19', '2005-02-20',
#        '2005-02-21', '2005-02-22', '2005-02-23', '2005-02-24',
#        '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
#       dtype='datetime64[D]')
  • datetime arithmetic
np.datetime64('2009-01-01') - np.datetime64('2008-01-01')
# numpy.timedelta64(366,'D')

np.datetime64('2009') + np.timedelta64(20, 'D')
# numpy.datetime64('2009-01-21')

np.datetime64('2011-06-15T00:00') + np.timedelta64(12, 'h')
# numpy.datetime64('2011-06-15T12:00')
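
  • A sketch of two follow-up steps that often come after datetime arithmetic: turning a timedelta64 into a plain number by dividing by a unit, and changing the unit of a datetime array with astype. The variable names are illustrative.

```python
import numpy as np

start = np.datetime64('2008-01-01')
end = np.datetime64('2009-01-01')
delta = end - start                       # 366 days (2008 is a leap year)

days = delta / np.timedelta64(1, 'D')     # plain float number of days
hours = delta / np.timedelta64(1, 'h')    # same span expressed in hours

feb = np.arange('2005-02', '2005-03', dtype='datetime64[D]')
months = feb.astype('datetime64[M]')      # truncate every date to its month

print(days, hours)
print(len(feb))
print(np.unique(months))
```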
