# Python: compute sum of a CSR matrix by column

I want to compute the sum of all rows for each column in a CSR matrix. The result should be a list or numpy array. What function should I use for that?

E.g.

array([[1, 0, 2],
       [0, 0, 1],
       [3, 1, 2]])

Result should be: [4,1,5]


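NumPy's sum() function can be used to sum a CSR matrix along either axis: axis=0 sums down each column, and axis=1 sums across each row. For a csr_matrix the result is a 2-D matrix rather than a 1-D array, so wrap it in np.array() and call flatten() to get a flat array. Call tolist() on that array if you need a plain Python list.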

Here is an example:

>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 1, 3, 1, 2])
>>> X = csr_matrix((data, (row, col)), shape=(3, 3))
>>> X
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
>>> X.toarray()
array([[1, 0, 2],
       [0, 0, 1],
       [3, 1, 2]])
>>> np.array(np.sum(X, axis=0)).flatten()   # to sum by column
array([4, 1, 5])

>>> np.array(np.sum(X, axis=1)).flatten()  # to sum by row
array([3, 1, 6])