I have a very large dataset (31,552,000 lines) of xyz coordinates in the following format:

```
1 2 3
4 5 6
7 8 9
. . .
```

I have to compute distances using the special method below, which applies the minimum-image convention for a periodic box of side length 40.

```
Distance[{a_, b_, c_}, {d_, e_, f_}] :=
 Sqrt[(If[Abs[a - d] >= (40/2), Abs[a - d] - 40, Abs[a - d]])^2 +
   (If[Abs[b - e] >= (40/2), Abs[b - e] - 40, Abs[b - e]])^2 +
   (If[Abs[c - f] >= (40/2), Abs[c - f] - 40, Abs[c - f]])^2]
```
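
For example, under this minimum-image rule two points that are 38 apart along one axis are treated as being 2 apart across the periodic boundary:

```
Distance[{39., 1., 1.}, {1., 1., 1.}]
(* Abs[39 - 1] = 38 >= 20, so the wrapped difference is 38 - 40 = -2,
   and the result is Sqrt[(-2)^2] = 2. *)
```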

Then I import the data.

```
data = Partition[
   Partition[ReadList["input.txt", {Real, Real, Real}], 16], 38];
```
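
As a sanity check, the nesting should come out as timesteps × molecules × atoms × coordinates (the exact leading dimension depends on how many timesteps the file holds):

```
Dimensions[data]
(* expected shape: {timesteps, 38, 16, 3},
   e.g. {29000, 38, 16, 3} for 29,000 timesteps *)
```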

The formatting is kind of strange: every 16 rows is one molecule, and every 38 molecules is one timestep. I take the distance between the 16th atom of each molecule and the 5th atom of every molecule. Then I select the distances that are less than 5.55 and determine the length of the resulting list. This is repeated for each of the 29,000 timesteps.

```
analysis =
  Flatten[
   Table[
    Table[
     Length[
      Select[
       Table[
        Distance[data[[r, y, 16]], data[[r, x, 5]]],
        {x, 1, 38}],
       # <= 5.55 &]
      ],
     {y, 1, 38}],
    {r, 1, 29000}]
   ];
```

This last section is the most computationally intensive part. For 29,000 timesteps and 38 molecules, it takes 40 minutes to process fully. It also takes too much memory (16+ GB per kernel) to parallelize. Is there any other method that will improve the performance? I have tried using Compile, but I realized that Table, the biggest bottleneck, is already compiled to machine code.

Below is an example dataset that takes my computer 2 minutes to process with the analysis code. It can be scaled to more timesteps by changing 4000 to a larger number.

```
data = Partition[
   Partition[Partition[Table[RandomReal[{0, 40}], (3*16*38*4000)], 3],
    16], 38]
```
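
To reproduce that timing on the sample, the analysis can be wrapped in AbsoluteTiming; note that the outer iterator must be changed from {r, 1, 29000} to {r, 1, 4000} to match the sample's timestep count:

```
AbsoluteTiming[
 analysis =
   Flatten[
    Table[
     Table[
      Length[
       Select[
        Table[Distance[data[[r, y, 16]], data[[r, x, 5]]], {x, 1, 38}],
        # <= 5.55 &]],
      {y, 1, 38}],
     {r, 1, 4000}]];
 ]
```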