python – improved speed of this diffraction calculator based on numpy

I will simulate diffraction patterns of a normal incident Gaussian profile beam from a 2D network of point diffusers with a height distribution.

The 2D table of diffuser positions X, Y and Z each has a size N x N and these are summarized in each call to E_ab(a, b, positions, w_beam). It's done M x M times to build the diffraction model.

If I estimate ten floating point operations per dispersion site per pixel and one nanosecond per flop (which my laptop does for numpy small matrices), I would expect time to be 10 M^2 N^2 1E-09 seconds. For the small N, this runs a factor of 50 or 100 slower than that, and for the large N (bigger than say 2000), it slows down even more. I guess it has something to do with the pagination of the large paintings in memory.

What can i do to increase the speed of fat N?

Note: Right now, the height variation Z is random, in the future I plan to also include an additional systematic pitch variation term, so even if the purely Gaussian variation might have an analytical solution, I have to do it numerically.

Since I randomly distribute the height of pZ here, the plots will be a little different each time. My output (run on a laptop) is this, and I can't even begin to understand why it takes longer (~ 16 seconds) when w_beam is small only when it is large (~ 6 seconds).

My estimator 10 M^2 N^2 1E-09 suggests 0.25 seconds, these are about 50 times slower, so there can be substantial room for improvement.

1 16.460583925247192
2 14.861294031143188
4 8.405776023864746
8 6.4988932609558105

Python script:

import numpy as np
import matplotlib.pyplot as plt
import time

def E_ab(a, b, positions, w_beam):
    X, Y, Z = positions
    Rsq = X**2 + Y**2
    phases = k0 * (a*X + b*Y + (1 + np.sqrt(1 - a**2 - b**2))*Z)
    E = np.exp(-Rsq/w_beam**2)  * np.exp(-j*phases)
    return E.sum() / w_beam**2 # rough normalization

twopi, j = 2*np.pi, np.complex(0, 1)

wavelength = 0.08
k0  = twopi/wavelength

z_noise = 0.05 * wavelength

N, M = 100, 50
x = np.arange(-N, N+1)
X, Y = np.meshgrid(x, x)
Z = z_noise * np.random.normal(size=X.shape) # use random Z noise for now
positions = (X, Y, Z)

A = np.linspace(0, 0.2, M)

answers = ()
for w_beam in (1, 2, 4, 8):
    E = ()
    tstart = time.time()
    for i, a in enumerate(A):
        EE = ()
        for b in A:
            e = E_ab(a, b, positions, w_beam)
    print(w_beam, time.time() - tstart)

if True:
    for i, E in enumerate(answers):
        plt.subplot(2, 2, i+1)
        plt.imshow(np.log10(np.abs(E)), vmin=0.0001)

enter description of image here