I am trying to perform a certain numerical computation in Python with NumPy:

```
import numpy as np
# Given:
I = 100
O = 1000
F = 10
o = np.random.random((O,F))
ev = np.random.choice(a=(False, True), size=(O,F))
x = np.random.random((I,F))
const1 = 1.3456
const2 = 2.3456
const3 = 3.3456
const4 = 4.3456
# My calculations:
diff = o[np.newaxis, :, :] - x.reshape((-1, 1, x.shape[x.ndim - 1]))
# Might be accessible from cache:
nsh = np.einsum('iof,iof,of,->io', diff, diff, ev, 1 / const1)
kv = const2 * np.exp(nsh + const3)
# :Might be accessible from cache
result = np.einsum('io,iof,of->if', kv, diff, ev)
```

I was trying to optimize for speed by using broadcasting in the first line (`diff =`) and einsum in the last line. However, as one can easily see, the first line produces a very large array `diff` of shape `(I, O, F)`. Is it possible to avoid creating this large array while improving the speed, or at least keeping the current speed?
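To make the size concrete, here is a minimal sketch with the same dimensions as above. It uses `x[:, np.newaxis, :]` as a shorthand that I believe is equivalent to the reshape for a 2-D `x`; that substitution is mine.

```python
import numpy as np

I, O, F = 100, 1000, 10
o = np.random.random((O, F))
x = np.random.random((I, F))

# Broadcasting (1, O, F) against (I, 1, F) materializes the full (I, O, F) array.
diff = o[np.newaxis, :, :] - x[:, np.newaxis, :]

print(diff.shape)   # (100, 1000, 10)
print(diff.nbytes)  # 8000000 bytes for float64; grows as I * O * F
```

With larger `I` or `O`, this intermediate quickly dominates memory use.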

One idea would be to replace

```
nsh = np.einsum('iof,iof,of,->io', diff, diff, ev, 1 / const1)
```

with

```
X = x
Y = o
X_sqr = np.sum(X ** 2, axis=1)
Y_sqr = np.sum(Y ** 2, axis=1)
nsh = (X_sqr[:, np.newaxis] - 2.0 * X.dot(Y.T) + Y_sqr) / const1
```

which is much faster. However, it does not include `ev` yet, because I don't know how to include it correctly. It also does not remove the need to create `diff`, because I still need it in the final line.
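For reference, here is what expanding the masked sums term by term seems to give. Treat it as a sketch, not a settled answer. It assumes `ev` acts purely as a 0/1 mask, so each masked sum splits into matrix products with no intermediate larger than `(I, O)`. The names `evf`, `eo`, `nsh_alt`, `kv_alt` and `result_alt` are placeholders introduced here, and I have not benchmarked it.

```python
import numpy as np

rng = np.random.default_rng(0)
I, O, F = 100, 1000, 10
o = rng.random((O, F))
ev = rng.choice(a=(False, True), size=(O, F))
x = rng.random((I, F))
const1, const2, const3 = 1.3456, 2.3456, 3.3456

evf = ev.astype(float)  # mask as 0.0 / 1.0, shape (O, F)
eo = evf * o            # masked o, shape (O, F)

# sum_f ev*(o - x)**2 = sum_f ev*o**2 - 2*sum_f x*(ev*o) + sum_f x**2 * ev
nsh_alt = ((eo * o).sum(axis=1) - 2.0 * x @ eo.T + (x ** 2) @ evf.T) / const1
kv_alt = const2 * np.exp(nsh_alt + const3)

# sum_o kv*(o - x)*ev = kv @ (ev*o) - x * (kv @ ev)
result_alt = kv_alt @ eo - x * (kv_alt @ evf)

# Reference: the original formulation via the large (I, O, F) array.
diff = o[np.newaxis, :, :] - x[:, np.newaxis, :]
nsh = np.einsum('iof,iof,of,->io', diff, diff, ev, 1 / const1)
kv = const2 * np.exp(nsh + const3)
result = np.einsum('io,iof,of->if', kv, diff, ev)

print(np.allclose(nsh, nsh_alt), np.allclose(result, result_alt))
```

The expanded form subtracts quantities of similar magnitude, so it may lose some precision compared with computing `diff` directly. Whether that matters for my data is something I would still need to check.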