I have the following implementation that takes a symmetric matrix `W` and returns a matrix `C`:

```python
import torch

W = torch.tensor(((0, 1, 0, 0, 0, 0, 0, 0, 0),
                  (1, 0, 1, 0, 0, 1, 0, 0, 0),
                  (0, 1, 0, 3, 0, 0, 0, 0, 0),
                  (0, 0, 3, 0, 1, 0, 0, 0, 0),
                  (0, 0, 0, 1, 0, 1, 1, 0, 0),
                  (0, 1, 0, 0, 1, 0, 0, 0, 0),
                  (0, 0, 0, 0, 1, 0, 0, 1, 0),
                  (0, 0, 0, 0, 0, 0, 1, 0, 1),
                  (0, 0, 0, 0, 0, 0, 0, 1, 0)), dtype=torch.float)
A = W.clone()
(n_node, _) = A.shape
nI = n_node * torch.eye(n_node)

# -- method 1
C = torch.empty(n_node, n_node)
for i in range(n_node):
    for j in range(i, n_node):
        B = A.clone()
        B[i, j] = B[j, i] = 0  # zero out the (i, j) entry and its symmetric partner
        C[i, j] = C[j, i] = torch.inverse(nI - B)[i, j]
```

As you can see, I have a matrix `nI - B`, change one element at each iteration, and compute its inverse. I'm trying to use the Sherman-Morrison formula to enhance the performance of the code. Here is my implementation:

```python
# -- method 2
c = torch.empty(n_node, n_node)
inv_nI_A = torch.inverse(nI - A)
b = torch.div(A, 1 + torch.einsum('ij,ij->ij', A, inv_nI_A))
inv_nI_A_ = inv_nI_A.unsqueeze(1)
for i in range(n_node):
    for j in range(i, n_node):
        # rank-1 Sherman-Morrison correction of the precomputed inverse
        c[i, j] = c[j, i] = (inv_nI_A - b[i, j] * torch.matmul(inv_nI_A_[:, :, i], inv_nI_A_[j, :, :]))[i, j]
```
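For context, the identity method 2 relies on is the rank-1 Sherman-Morrison update: for invertible `M` and vectors `u`, `v` with `1 + v^T M^{-1} u != 0`, `(M + u v^T)^{-1} = M^{-1} - (M^{-1} u v^T M^{-1}) / (1 + v^T M^{-1} u)`. A quick numerical sanity check (the small well-conditioned `M` is just an assumption for the demo, not from my problem):

```python
import torch

torch.manual_seed(0)
n = 4
M = torch.eye(n) + 0.1 * torch.rand(n, n)  # hypothetical well-conditioned matrix
u = torch.rand(n)
v = torch.rand(n)

# Sherman-Morrison: (M + u v^T)^{-1} = M^{-1} - (M^{-1} u v^T M^{-1}) / (1 + v^T M^{-1} u)
Minv = torch.inverse(M)
sm = Minv - torch.outer(Minv @ u, v @ Minv) / (1 + v @ Minv @ u)

# compare against inverting the updated matrix directly
direct = torch.inverse(M + torch.outer(u, v))
print(torch.allclose(sm, direct, atol=1e-5))
```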

I was wondering whether I can make further enhancements to my implementation. Thanks!
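For what it's worth, one simplification I noticed while writing this up: the `(i, j)` entry of the rank-1 update in method 2 only involves `inv_nI_A[i, j]` and the diagonal entries `inv_nI_A[i, i]` and `inv_nI_A[j, j]`, so the double loop collapses to a single vectorized expression. This is a sketch that I have only checked numerically against method 2 (the variable names mirror the code above; the random symmetric test matrix is an assumption for the demo):

```python
import torch

torch.manual_seed(0)
n = 6
R = torch.rand(n, n)
A = (R + R.T) / 2           # hypothetical symmetric test matrix; entries below 1
nI = n * torch.eye(n)       # keeps nI - A strictly diagonally dominant, hence invertible
inv_nI_A = torch.inverse(nI - A)
b = A / (1 + A * inv_nI_A)  # elementwise product, same as the einsum above

# method 2 loop, for comparison
c_loop = torch.empty(n, n)
for i in range(n):
    for j in range(i, n):
        upd = inv_nI_A - b[i, j] * torch.outer(inv_nI_A[:, i], inv_nI_A[j, :])
        c_loop[i, j] = c_loop[j, i] = upd[i, j]

# vectorized: (inv - b[i, j] * outer(inv[:, i], inv[j, :]))[i, j]
#           =  inv[i, j] - b[i, j] * inv[i, i] * inv[j, j]
d = inv_nI_A.diagonal()
c_vec = inv_nI_A - b * torch.outer(d, d)

print(torch.allclose(c_loop, c_vec, atol=1e-6))
```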

PS: W is not necessarily a sparse matrix.