I am implementing a function that computes the angle between two vectors when given two n-dimensional arrays and an axis along which to operate. I want to do this with as few copies as possible, and as numerically stable as possible. (The latter gets problematic for small angles (see here)).

What I would like reviewed is primarily the speed of the implementation. There is a sign error hidden somewhere, so if you find it I’d love to know about it, too.

Here is a (primitive) test case for the function I implemented.

```
vec_a = (1, 0, 0)
vec_b = (0, 1, 0)
result = angle_between(vec_a, vec_b)
assert np.allclose(result, np.pi/2)
vec_a = (1, 0)
vec_b = (0, 1)
result = angle_between(vec_a, vec_b)
assert np.allclose(result, np.pi/2)
result = angle_between(vec_b, vec_a)
assert np.allclose(result, -np.pi/2)
vec_a = (1, 0)
vec_b = (0, 2)
result = angle_between(vec_a, vec_b)
assert np.allclose(result, -np.pi/2)
vec_a = ((1, 0), (2, 0))
vec_b = ((0, 2), (0, 1))
result = angle_between(vec_a, vec_b)
assert np.allclose(result, (-np.pi/2, np.pi/2))
```

Note that the test currently fails, because of the sign error, but you can remove the offending `assert`

.

Here is the function

```
def angle_between(vec_a: ArrayLike, vec_b: ArrayLike, *, axis:int=-1) -> np.ndarray:
"""Computes the signed angle from vec_a to vec_b (in a right-handed frame)
Notes
-----
Implementation is based on this StackOverflow post:
https://scicomp.stackexchange.com/a/27694
"""
vec_a = np.asarray(vec_a)(None, :)
vec_b = np.asarray(vec_b)(None, :)
if axis >= 0:
axis += 1
len_c = np.linalg.norm(vec_a - vec_b, axis=axis)
len_a = np.linalg.norm(vec_a, axis=axis)
len_b = np.linalg.norm(vec_b, axis=axis)
flipped = np.where(len_a >= len_b, 1, -1)
mask = len_a >= len_b
tmp = np.where(mask, len_a, len_b)
np.putmask(len_b, ~mask, len_a)
len_a = tmp
mask = len_c > len_b
mu = np.where(mask, len_b - (len_a - len_c), len_c - (len_a - len_b))
numerator = ((len_a - len_b) + len_c) * mu
denominator = (len_a + (len_b + len_c)) * ((len_a - len_c) + len_b)
mask = denominator != 0
angle = np.divide(numerator, denominator, where=mask)
np.sqrt(angle, out=angle)
np.arctan(angle, out=angle)
angle *= 2
angle(~mask) = np.pi
angle *= flipped
return angle(0)
```
```