I will simulate diffraction patterns of a normal incident Gaussian profile beam from a 2D network of point diffusers with a height distribution.

The 2D table of diffuser positions `X`

, `Y`

and `Z`

each has a size `N x N`

and these are summarized in each call to `E_ab(a, b, positions, w_beam)`

. It's done `M x M`

times to build the diffraction model.

If I estimate ten floating point operations per dispersion site per pixel and one nanosecond per flop (which my laptop does for numpy small matrices), I would expect time to be `10 M^2 N^2 1E-09`

seconds. For the small N, this runs a factor of 50 or 100 slower than that, and for the large `N`

(bigger than say 2000), it slows down even more. I guess it has something to do with the pagination of the large paintings in memory.

What can i do to increase the speed of fat `N`

?

**Note:** Right now, the height variation `Z`

is random, in the future I plan to also include an additional systematic pitch variation term, so even if the purely Gaussian variation might have an analytical solution, I have to do it numerically.

Since I randomly distribute the height of pZ here, the plots will be a little different each time. My output (run on a laptop) is this, and I can't even begin to understand why it takes longer (~ 16 seconds) when `w_beam`

is small only when it is large (~ 6 seconds).

My estimator `10 M^2 N^2 1E-09`

suggests 0.25 seconds, these are about 50 times slower, so there can be substantial room for improvement.

```
1 16.460583925247192
2 14.861294031143188
4 8.405776023864746
8 6.4988932609558105
```

Python script:

```
import numpy as np
import matplotlib.pyplot as plt
import time
def E_ab(a, b, positions, w_beam):
X, Y, Z = positions
Rsq = X**2 + Y**2
phases = k0 * (a*X + b*Y + (1 + np.sqrt(1 - a**2 - b**2))*Z)
E = np.exp(-Rsq/w_beam**2) * np.exp(-j*phases)
return E.sum() / w_beam**2 # rough normalization
twopi, j = 2*np.pi, np.complex(0, 1)
wavelength = 0.08
k0 = twopi/wavelength
z_noise = 0.05 * wavelength
N, M = 100, 50
x = np.arange(-N, N+1)
X, Y = np.meshgrid(x, x)
Z = z_noise * np.random.normal(size=X.shape) # use random Z noise for now
positions = (X, Y, Z)
A = np.linspace(0, 0.2, M)
answers = ()
for w_beam in (1, 2, 4, 8):
E = ()
tstart = time.time()
for i, a in enumerate(A):
EE = ()
for b in A:
e = E_ab(a, b, positions, w_beam)
EE.append(e)
E.append(EE)
print(w_beam, time.time() - tstart)
answers.append(np.array(E))
if True:
plt.figure()
for i, E in enumerate(answers):
plt.subplot(2, 2, i+1)
plt.imshow(np.log10(np.abs(E)), vmin=0.0001)
plt.colorbar()
plt.show()
```