Sparse matrices#

A sparse matrix is a matrix in which only a small fraction of the elements are non-zero. We noted already that in the example of the 1D wave equation the matrix is tridiagonal, so we need to store only the \(N + 2(N-1)=3N-2\) elements in the three diagonals; the other elements are zero. This happens because the second-order finite differencing that we used relates the function at grid cell \(i\), \(f_i\), only to its nearest neighbours \(f_{i+1}\) and \(f_{i-1}\).
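For example, a tridiagonal matrix can be stored as just three 1D arrays and applied to a vector without ever building the full \(N\times N\) matrix. Here is a minimal sketch (the function name and argument layout are our own choices, assuming numpy arrays):

def tridiag_multiply(main, lower, upper, v):
    # Compute T @ v, where the tridiagonal matrix T is stored as three
    # 1D numpy arrays: main (length N) is the main diagonal, lower and
    # upper (length N-1) are the diagonals just below and above it
    y = main * v
    y[1:] += lower * v[:-1]     # contribution from the left neighbour
    y[:-1] += upper * v[1:]     # contribution from the right neighbour
    return y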

For Laplace’s equation on a 2D grid, we have the difference equation

\[V_{i,j} = {1\over 4}\left(V_{i+1,j}+V_{i-1,j} + V_{i,j+1}+V_{i,j-1}\right),\]

so again, if we were to write this as a matrix equation, the matrix involved would be sparse. One way to do this is to reduce the 2D grid of values to a 1D vector, for example by defining the index \(k = i + j\times N\), where \(N\times N\) is the size of the grid. The difference equation is then

\[V_k = {1\over 4}\left(V_{k-N} + V_{k-1} + V_{k+1} + V_{k+N}\right),\]

which we could write as

\[\mathbf{A} \cdot \mathbf{V} = 0\]

where \(\mathbf{V}\) is the vector of function values of length \(N^2\), and \(\mathbf{A}\) is an \(N^2\times N^2\) matrix. In this case, the matrix is a banded matrix: it has five non-zero diagonals (the main diagonal and the diagonals offset by \(\pm 1\) and \(\pm N\) from it).

You can see that these matrices can become large very quickly: even for a modest 2D problem with \(N=100\), we are dealing with a \(10^4\times 10^4\) matrix, or \(10^8\) (mostly zero) matrix elements.
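To get a sense of the savings from storing only the non-zero diagonals, here is a sketch that builds the five-banded matrix for an \(N\times N\) grid using scipy.sparse and compares the storage with a dense array (for simplicity it ignores the modifications needed in the rows next to the boundaries):

import numpy as np
import scipy.sparse

N = 100
M = N*N
main = np.ones(M)            # coefficient of V_k itself
off = -0.25 * np.ones(M)     # coefficients of the four neighbours
# Five non-zero diagonals at offsets 0, +/-1 and +/-N
# (a real implementation would adjust the rows next to the boundaries)
A = scipy.sparse.diags([main, off[:-1], off[:-1], off[:-N], off[:-N]],
                       offsets=[0, 1, -1, N, -N], format='csr')

dense_bytes = M**2 * 8       # 8 bytes per float64 element
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print('Dense matrix: %g MB, sparse matrix: %g MB'
      % (dense_bytes/1e6, sparse_bytes/1e6))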

Fortunately, there are techniques available for dealing with sparse matrices. As we will see these can sidestep the need to store the matrix itself (even in reduced, banded form). An important example is the conjugate gradient method. We’ll go through some of the theory behind this method first and then apply it to solving Laplace’s equation.

Conjugate gradient method#

The conjugate gradient method is an iterative way to solve the equation \(\mathbf{A}\mathbf{x}=\mathbf{b}\) when you have a sparse matrix \(\mathbf{A}\). The matrix \(\mathbf{A}\) needs to be symmetric and positive-definite, i.e. \(\mathbf{x}^T\mathbf{A}\mathbf{x}\) is positive for any non-zero vector \(\mathbf{x}\). We start with an initial guess for the solution and iteratively improve the guess, each time applying a correction that is conjugate to the previous corrections.

Conjugate vectors

Conjugate vectors \(\{\mathbf{p_i}\}\) of a matrix \(\mathbf{A}\) satisfy the orthogonality relation

\[\mathbf{p_i}^T\mathbf{A}\mathbf{p_j} \propto \delta_{ij}.\]

In terms of these vectors, the solution to \(\mathbf{A}\mathbf{x}=\mathbf{b}\) can be written as

\[\mathbf{x} = \sum_i \alpha_i \mathbf{p_i}.\]

The coefficients \(\alpha_i\) can be obtained from

\[\mathbf{A}\mathbf{x}=\mathbf{b}\Rightarrow \sum_i \alpha_i \mathbf{A}\mathbf{p_i} = \mathbf{b}\Rightarrow \sum_i \alpha_i \mathbf{p_j}^T\mathbf{A}\mathbf{p_i} = \mathbf{p_j}^T\mathbf{b}\]
\[\Rightarrow \alpha_j = {\mathbf{p_j}^T\mathbf{b}\over \mathbf{p_j}^T\mathbf{A}\mathbf{p_j}}\]

Here’s how it works: start with a guess for the solution, \(\mathbf{x_0}\). We take the first conjugate vector to be the residual

\[\mathbf{p_0}=\mathbf{r_0} = \mathbf{b}-\mathbf{A}\mathbf{x_0}.\]

If we think of solving \(\mathbf{A}\mathbf{x}=\mathbf{b}\) as minimizing the function

\[f = {1\over 2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{x}^T\mathbf{b}\]

which has

\[\mathbf{\nabla} f = \mathbf{A}\mathbf{x} - \mathbf{b},\]

then this makes sense because it is equivalent to choosing \(\mathbf{p_0}=-\mathbf{\nabla} f(\mathbf{x_0})\), i.e. in the direction of the downhill gradient at \(\mathbf{x_0}\). We update our guess by moving in this direction

\[\mathbf{x_1} = \mathbf{x_0} + \alpha_0 \mathbf{p_0}\]

by an amount that minimizes \(f(\mathbf{x_1}) = f(\mathbf{x_0} + \alpha_0 \mathbf{p_0})\). You can show (by differentiating with respect to \(\alpha_0\) and setting the result to zero) that this corresponds to

\[\alpha_0 = {\mathbf{p_0}^T\mathbf{r_0}\over \mathbf{p_0}^T\mathbf{A}\mathbf{p_0}}.\]
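To see this explicitly, substitute \(\mathbf{x_1} = \mathbf{x_0} + \alpha_0 \mathbf{p_0}\) into \(f\) and use the symmetry of \(\mathbf{A}\):

\[f(\mathbf{x_0} + \alpha_0 \mathbf{p_0}) = f(\mathbf{x_0}) - \alpha_0\,\mathbf{p_0}^T\mathbf{r_0} + {\alpha_0^2\over 2}\,\mathbf{p_0}^T\mathbf{A}\mathbf{p_0},\]

so setting \(df/d\alpha_0 = -\mathbf{p_0}^T\mathbf{r_0} + \alpha_0\,\mathbf{p_0}^T\mathbf{A}\mathbf{p_0} = 0\) gives the expression above for \(\alpha_0\).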

The new residual is

\[\mathbf{r_1} = \mathbf{b}-\mathbf{A}\mathbf{x_1}=\mathbf{r_0} - \alpha_0 \mathbf{A}\mathbf{p_0},\]

which is orthogonal to \(\mathbf{r_0}\) (since we moved to the point where \(f\) is minimized in the direction of \(\mathbf{r_0}\), so the gradient of \(f\) is now in an orthogonal direction).
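We can check this directly: since \(\mathbf{p_0}=\mathbf{r_0}\),

\[\mathbf{r_0}^T\mathbf{r_1} = \mathbf{r_0}^T\mathbf{r_0} - \alpha_0\,\mathbf{r_0}^T\mathbf{A}\mathbf{p_0} = \mathbf{r_0}^T\mathbf{r_0} - {\mathbf{p_0}^T\mathbf{r_0}\over \mathbf{p_0}^T\mathbf{A}\mathbf{p_0}}\,\mathbf{p_0}^T\mathbf{A}\mathbf{p_0} = 0.\]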

For the next step, rather than moving fully in the direction of \(\mathbf{r_1}\), we first subtract off the component in the \(\mathbf{p_0}\) direction, so that we end up moving in a direction that is conjugate to \(\mathbf{p_0}\):

\[\mathbf{p_1} = \mathbf{r_1} - {\mathbf{p_0}^T\mathbf{A}\mathbf{r_1}\over \mathbf{p_0}^T\mathbf{A}\mathbf{p_0}}\mathbf{p_0}.\]

The next solution is \(\mathbf{x_2} = \mathbf{x_1} + \alpha_1 \mathbf{p_1}\) with residual \(\mathbf{r_2} = \mathbf{r_1} - \alpha_1 \mathbf{A}\mathbf{p_1}\) where

\[\alpha_1 = {\mathbf{p_1}^T\mathbf{r_1}\over \mathbf{p_1}^T\mathbf{A}\mathbf{p_1}}.\]

This continues in subsequent iterations, with

\[\mathbf{p_k} = \mathbf{r_k} - \sum_{i<k} {\mathbf{p_i}^T\mathbf{A}\mathbf{r_k}\over \mathbf{p_i}^T\mathbf{A}\mathbf{p_i}}\mathbf{p_i},\]

i.e. we take the part of \(\mathbf{r_k}\) that is conjugate to all the previous \(\mathbf{p_i}\)’s. The next iteration is then

\[\mathbf{x_{k+1}} = \mathbf{x_k} + \alpha_k \mathbf{p_k}, \hspace{1cm} \mathbf{r_{k+1}} = \mathbf{r_k} - \alpha_k \mathbf{A}\mathbf{p_k}\]

where

\[\alpha_k = {\mathbf{p_k}^T\mathbf{r_k}\over \mathbf{p_k}^T\mathbf{A}\mathbf{p_k}}.\]

This method will converge in \(N\) iterations (for an \(N\times N\) matrix \(\mathbf{A}\)), i.e. once we’ve minimized \(f\) in all \(N\) conjugate directions, but often a much smaller number of iterations is enough, for example if the solution is dominated by a few of the largest components.

The only thing that we need to put this into practice is a more efficient way to calculate \(\mathbf{p_{k+1}}\). It turns out that we don’t need to store all the previous directions: it can be shown that to calculate \(\mathbf{p_{k+1}}\) we just need the previous vector \(\mathbf{p_k}\),

\[\mathbf{p_{k+1}} = \mathbf{r_{k+1}} + \beta_k \mathbf{p_k}\]

where

\[\beta_k = {\mathbf{r_{k+1}^T r_{k+1}}\over \mathbf{r_k^T r_k} }.\]

Also, when calculating \(\alpha\) we can use the orthogonality properties of \(\mathbf{r}\) and \(\mathbf{p}\) to replace the numerator \(\mathbf{p_k}^T\mathbf{r_k}\) by \(\mathbf{r_k}^T\mathbf{r_k}\).
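To see why, write \(\mathbf{p_k} = \mathbf{r_k} + \beta_{k-1}\mathbf{p_{k-1}}\) and use the fact that the new residual is orthogonal to the previous search direction, \(\mathbf{p_{k-1}}^T\mathbf{r_k}=0\), so that

\[\mathbf{p_k}^T\mathbf{r_k} = \mathbf{r_k}^T\mathbf{r_k} + \beta_{k-1}\,\mathbf{p_{k-1}}^T\mathbf{r_k} = \mathbf{r_k}^T\mathbf{r_k}.\]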

Here is a function that implements this (based on the examples from last year’s course):

def conjgrad(A, b, x=None, niter=20):
    # Find a solution to Ax = b using the conjugate gradient method

    # If there is no initial guess, start from the zero vector
    if x is None:
        x = 0*b

    # Calculate the first residual r_0 and set p_0 equal to it
    r = b - A@x
    p = r.copy()

    # Keep track of the size of the residual and
    # exit early if it gets small enough
    rr = r@r
    rr0 = rr

    for i in range(niter):
        Ap = A@p
        pAp = p@Ap
        alpha = rr / pAp
        x = x + alpha*p
        r = r - alpha*Ap

        rr_new = r@r
        beta = rr_new / rr
        p = r + beta*p

        rr = rr_new
        print('Iteration %d: pAp = %g, residual = %g' % (i, pAp, rr**0.5))

        # Stop once the residual is much smaller than where we started
        if rr < 1e-16 * rr0:
            break

    return x

Exercise: try out the conjugate gradient method

Try this out!

To generate a positive definite matrix \(\mathbf{A}\) you can use

n = 1000
A = np.random.rand(n, n)
A = A@A.T                         # symmetric and positive semi-definite
A = A + np.diag(n * np.ones(n))   # add n to each diagonal element

where the last step shifts all of the eigenvalues up by \(n\), ensuring that the matrix is positive definite and well-conditioned.

Choose \(\mathbf{b}\) also as a vector of random numbers.

Apply the conjugate gradient method using the function above and check how well the solution it gives you satisfies the equation.
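For example, one way to carry out the check (a sketch, assuming \(\mathbf{A}\) and n have been generated as above):

b = np.random.rand(n)
x = conjgrad(A, b, niter=20)
# The largest element of the residual vector should be tiny compared to b
print('Maximum residual:', np.max(np.abs(A@x - b)))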


Application to Laplace’s equation#

The remarkable thing about the conjugate gradient method above is that the matrix \(\mathbf{A}\) only appears when it is multiplying a vector. This means that we don’t have to store a representation of the matrix itself; we just need to provide a function that multiplies a given vector by the matrix. This encourages us to think physically about what our sparse matrix is doing. In the Laplace problem, for example, applying the matrix amounts to taking the average of neighbouring points. We can code that up in a few lines without having to store the \(N^2\times N^2\) matrix!
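For example, the conjgrad function above could be modified to accept a function that applies \(\mathbf{A}\) to a vector rather than the matrix itself. Here is one possible sketch (the name conjgrad_func and the argument Afunc are our own choices, not part of the original code):

def conjgrad_func(Afunc, b, x=None, niter=20):
    # Same algorithm as conjgrad above, but Afunc(v) returns the product
    # of the matrix with the vector v, so the matrix is never stored
    if x is None:
        x = 0*b
    r = b - Afunc(x)
    p = r.copy()
    rr = r@r
    for i in range(niter):
        Ap = Afunc(p)
        alpha = rr / (p@Ap)
        x = x + alpha*p
        r = r - alpha*Ap
        rr_new = r@r
        p = r + (rr_new/rr)*p
        rr = rr_new
    return x

(scipy.sparse.linalg provides a similar interface: its cg solver accepts a LinearOperator that wraps a matrix-vector product function.)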

In the case of Laplace’s equation, we have (as written above)

\[\mathbf{A} \cdot \mathbf{V} = 0\]

where the matrix \(\mathbf{A}\) computes the difference between \(V\) at a given point and the average of its neighbours. It’s helpful to split the neighbour average into the part coming from interior points and the part coming from boundary points. The boundary part can be moved onto the right-hand side, where it becomes our \(\mathbf{b}\) vector in \(\mathbf{A}\mathbf{x}=\mathbf{b}\).
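In other words, for each interior point \(k\),

\[V_k - {1\over 4}\sum_{\text{interior neighbours}} V = {1\over 4}\sum_{\text{boundary neighbours}} V \equiv b_k,\]

where the sums run over the four neighbours of point \(k\), split according to whether they are interior or boundary points.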

Here’s how this would work in code. First, the operation \(\mathbf{A}\mathbf{x}\) is carried out with \(V\) set to zero on the boundary points, so it represents the averaging with no information coming in from the boundaries:

def Ax(x, n, mask):
    # Apply the matrix A to the vector x for the n x n Laplace problem
    # (mask picks out the boundary points, which are held at zero here)
    V = x.copy()
    V = V.reshape((n,n))
    V[mask] = 0
    # V minus the average of its four nearest neighbours
    Vnew = V - 0.25 * (np.roll(V,1,0) + np.roll(V,-1,0) + 
                       np.roll(V,1,1) + np.roll(V,-1,1))
    Vnew[mask] = 0
    Vnew = Vnew.reshape(n*n)
    return Vnew

(we are using the same setup as before for the Laplace problem, where mask is the mask that picks out the boundary points). Then we set up the \(\mathbf{b}\) vector to contain the boundary information, i.e. the contribution to the average that comes from the boundaries:

# The part of the neighbour average that comes from the boundary values in bc
b = 0.25 * (np.roll(bc,1,axis=0)+np.roll(bc,-1,axis=0)+
            np.roll(bc,1,axis=1)+np.roll(bc,-1,axis=1))
b[mask] = 0          # the boundary points themselves are not unknowns
b = b.reshape(n*n)

(again, we assume that the boundary conditions have been set in the array bc, as before).

Exercise: conjugate gradient for Laplace’s equation

Implement the conjugate gradient method for the Laplace problem we looked at last time. You can use the same set_bcs function to set up the boundary conditions and boundary mask.

Then define b as above and use the conjugate gradient method with the Ax function above to evolve the initial guess for the potential until it converges.

Plot the maximum change in the values of \(V\) from one iteration to the next against the number of iterations, and compare with what we had last time from the relaxation method. You should see that conjugate gradient converges in \(\sim N\) steps rather than \(\sim N^2\).
