稀疏矩阵的存储格式(SparseMatrixStorageFormats)稀疏矩阵的存储格式(Sparse Matrix Storage Formats)
Sason@CSDN
对于很多元素为零的稀疏矩阵,仅存储⾮零元素可使矩阵操作效率更⾼。现有许多种稀疏矩阵的存储⽅式,但是多数采⽤相同的基本技术,即存储矩阵所有的⾮零元素到⼀个线性数组中,并提供辅助数组来描述原数组中⾮零元素的位置。
以下是⼏种常见的稀疏矩阵存储格式:
1. Coordinate Format (COO)
这种存储⽅式的主要优点是灵活、简单。仅存储⾮零元素以及每个⾮零元素的坐标。
使⽤3个数组进⾏存储:values, rows, and column
values: 实数或复数数据,包括矩阵中的⾮零元素, 顺序任意。
rows: 数据所处的⾏。
columns: 数据所处的列.
参数:矩阵中⾮零元素的数量 nnz,3个数组的长度均为nnz.
2. Diagonal Storage Format (DIA)
If the sparse matrix has diagonals containing only zero elements, then the diagonal storage format can be used to reduce the amount of information needed to locate the non-zero elements. This storage format is particularly useful in many applications where the matrix arises from a finite element or finite difference discretization.
The Intel MKL diagonal storage format is specified by two arrays:values and distance, and two parameters:ndiag, which is the number of non-empty diagonals, and lval, which is the declared leading dimension in the calling (sub)programs.
values
A real or complex two-dimensional array is dimensioned as lval by ndiag. Each column of it contains the non-zero elements of
certain diagonal of A. The key point of the storage is that each element in values retains the row number of the original matrix.
To achieve this diagonals in the lower triangular part of the matrix are padded from the top, and those in the upper triangular part are padded from the bottom. Note that the value of distance(i) is the number of elements to be padded for diagonal i.
distance
An integer array with dimension ndiag. Element i of the array distance is the distance between i-diagonal and the main diagonal.
The distance is positive if the diagonal is above the main diagonal, and negative if the diagonal is below the main diagonal. The main diagonal has a distance equal to zero.
3. Compressed Sparse Row Format (CSR)
The Intel MKL compressed sparse row (CSR) format is specified by four arrays: the values,columns,pointerB, and pointerE. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.
values
A real or complex array that contains the non-zero elements of A. Values of the non-zero elements of A are mapped into
the values array using the row-major storage mapping described above.
columns
Element i of the integer array columns is the number of the column in A that contains the i-th value in the values array.
pointerB
Element j of this integer array gives the index of the element in the values array that is first non-zero element in a row j of A.
Note that this index is equal to pointerB(j) -pointerB(1)+1 .
pointerE
An integer array that contains row indices, such that pointerE(j)-pointerB(1) is the index of the element in the values array that is last non-zero element in a row j of A.
4. Compressed Sparse Column Format (CSC)
The compressed sparse column format (CSC) is similar to the CSR format, but the columns are used instead the rows. In other words, the CSC format is identical to the CSR format for the transposed matrix. The CSR format is specified by four arrays: values, columns, pointerB, and pointerE. The following table describes the arrays in terms of the values, row, and column positions of the non-zero elements in a sparse matrix A.
values
A real or complex array that contains the non-zero elements of A. Values of the non-zero elements of A are mapped into
the values array using the column-major storage mapping.
rows
Element i of the integer array rows is the number of the row in A that contains the i-th value in the values array.
pointerB
Element j of this integer array gives the index of the element in the values array that is first non-zero element in a column j
of A. Note that this index is equal to pointerB(j) -pointerB(1)+1 .
pointerE
An integer array that contains column indices, such that pointerE(j)-pointerB(1) is the index of the element in the values array
that is last non-zero element in a column j of A.
5. Skyline Storage Format
The skyline storage format is important for the direct sparse solvers, and it is well suited for Cholesky or LU decomposition when no pivoting is required.
The skyline storage format accepted in Intel MKL can store only triangular matrix or triangular part of a matrix. This format is specified by two arrays:values and pointers. The following table describes these arrays:
values
A scalar array. For a lower triangular matrix it contains the set of elements from each row of the matrix starting from the
first non-zero element to and including the diagonal element. For an upper triangular matrix it contains the set of elements
from each column of the matrix starting with the first non-zero element down to and including the diagonal element.
Encountered zero elements are included in the sets.
pointers
An integer array with dimension (m+1), where m is the number of rows for lower triangle (columns for the upper
index复数
triangle).pointers(i) -pointers(1)+1 gives the index of element in values that is first non-zero element in row (column)i. The value
of pointers(m+1) is set to nnz+pointers(1), where nnz is the number of elements in the array values.
6. Block Compressed Sparse Row Format (BSR)
The Intel MKL block compressed sparse row (BSR) format for sparse matrices is specified by four arrays:values,columns,pointerB,
and pointerE. The following table describes these arrays.
values
A real array that contains the elements of the non-zero blocks of a sparse matrix. The elements are stored block-by-block in
row-major order. A non-zero block is the block that contains at least one non-zero element. All elements of non-zero blocks
are stored, even if some of them is equal to zero. Within each non-zero block elements are stored in column-major order in
the case of one-based indexing, and in row-major order in the case of the zero-based indexing.
columns
Element i of the integer array columns is the number of the column in the block matrix that contains the i-th non-zero block.
pointerB
Element j of this integer array gives the index of the element in the columns array that is first non-zero block in a row j of the block matrix.
pointerE
Element j of this integer array gives the index of the element in the columns array that contains the last non-zero block in a
row j of the block matrix plus 1.
7.  ELLPACK (ELL)
8. Hybrid (HYB)
由ELL+COO两种格式结合⽽成。
选择稀疏矩阵存储格式的⼀些经验:
1. DIA和ELL格式在进⾏稀疏矩阵-⽮量乘积(sparse matrix-vector products)时效率最⾼,所以它们是应⽤迭代法(如共轭梯度法)解稀疏线性系统最快的格式;
2. COO和CSR格式⽐起DIA和ELL来,更加灵活,易于操作;
3. ELL的优点是快速,⽽COO优点是灵活,⼆者结合后的HYB格式是⼀种不错的稀疏矩阵表⽰格式;
4. 根据Nathan Bell的⼯作,CSR格式在存储稀疏矩阵时⾮零元素平均使⽤的字节数(Bytes per Nonzero Entry)最为稳定(float类型约为8.5,double类型约为12.5),⽽DIA格式存储数据的⾮零元素平均使⽤的字节数与矩阵类型有较⼤关系,适合于StructuredMesh结构的稀疏矩阵(float类型约为
4.05,double类型约为8.10),对于Unstructured Mesh以及Random Matrix,DIA格式使⽤的字节数是CSR格式的⼗⼏倍;
5. 从我使⽤过的⼀些线性代数计算库来说,COO格式常⽤于从⽂件中进⾏稀疏矩阵的读写,如matrix market即采⽤COO格式,⽽CSR格式常⽤于读⼊数据后进⾏稀疏矩阵计算。
其他相关链接:
1. Intel MKL 库中使⽤的稀疏矩阵格式

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。