I recently decided to learn how to multithread Java programs, so I made a small program to compare the performance of serial and multithreaded programs that perform the same task.

I created a serial program that calculates the number of primes from 1 to 10 million, and timed it 50 times using a test program. Here’s the code for the serial program:

```
import java.util.Locale;
import java.text.NumberFormat;
/**
* A serial program that calculates the number of primes less than a
* certain number (which is hard coded). Used as a basis for
* benchmarking the multi-threaded programs that do the same thing.
*
* @author Tirthankar Mazumder
* @version 1.2
* @date 2nd May, 2021
*/
public class PrimalityTestSerial {
public static final long max = 10_000_000;
public static void main(String() args) {
final long startTime = System.currentTimeMillis();
long num_primes = primeCalculator();
final long endTime = System.currentTimeMillis();
NumberFormat nf = NumberFormat.getInstance(Locale.US);
System.out.println("Number of primes less than " + nf.format(max) + ": " + num_primes);
System.out.println("Took " + (endTime - startTime) + " ms.");
System.out.println();
}
private static boolean isPrime(long l) {
long upper_bound = (long) Math.floor(Math.sqrt(l));
for (long i = 2; i <= upper_bound; i++) {
if (l % i == 0)
return false;
}
return true;
}
public static long primeCalculator() {
long num_primes = 0;
for (long i = 2; i <= max; i++) {
if (isPrime(i))
num_primes++;
}
return num_primes;
}
}
```

Here’s the code for the worker class used in the multithreaded version:

```
/**
* A worker class for calculating the number of primes from start to end,
* which are private member variables. Instances of this class are used
* in the multithreaded version of PrimalityTestSerial.
*
* @author Tirthankar Mazumder
* @version 1.2
* @date 3rd May, 2021
*/
public class PrimalityTestWorker1 implements Runnable {
//Member variables
public static long totalPrimeCount = 0;
private final long start;
private final long end;
public PrimalityTestWorker1(long start, long end) {
this.start = start;
this.end = end;
}
private synchronized void increment(long num) {
totalPrimeCount += num;
}
private static boolean isPrime(long l) {
long upper_bound = (long) Math.floor(Math.sqrt(l));
for (long i = 2; i <= upper_bound; i++) {
if (l % i == 0)
return false;
}
return true;
}
private void numPrimes() {
long primeCount = 0;
for (long i = start; i <= end; i++) {
if (isPrime(i))
primeCount++;
}
increment(primeCount);
}
public void run() {
numPrimes();
Thread.yield();
}
}
```

Here’s the main program which uses instances of PrimalityTest1Worker as threads:

```
import java.util.Locale;
import java.text.NumberFormat;
/**
* The master program for the multithreaded primality test that creates
* objects of the PrimalityTestWorker1 to make threads, and then collates
* the results and prints them to stdout.
*
* @author Tirthankar Mazumder
* @version 1.0=2
* @date 3rd May, 2021
*/
public class PrimalityTestParallel1Runner {
public static final int cores = Runtime.getRuntime().availableProcessors();
//We will spawn as many threads as there are cores on the system, and not
//more than that because we are not I/O bound here.
public static final long max = PrimalityTestSerial.max;
//For consistency.
public static void main(String() args) {
long startTime = System.currentTimeMillis();
primeCalculator();
long endTime = System.currentTimeMillis();
NumberFormat nf = NumberFormat.getInstance(Locale.US);
System.out.println("Number of primes less than " + nf.format(max) + ": " +
PrimalityTestWorker1.totalPrimeCount);
System.out.println("Took " + (endTime - startTime) + " ms.");
System.out.println();
}
public static void primeCalculator() {
Thread arrThreads() = new Thread(cores);
long chunk = max / cores;
long threadStart = 2;
long threadEnd = threadStart + chunk;
for (int i = 0; i < cores; i++) {
Thread t = new Thread(new PrimalityTestWorker1(threadStart, threadEnd));
t.start();
arrThreads(i) = t;
threadStart = threadEnd + 1;
threadEnd = (threadEnd + chunk > max) ? max : threadEnd + chunk;
}
for (int i = 0; i < cores; i++) {
try {
arrThreads(i).join();
} catch (InterruptedException e) {
System.out.println("Was interrupted.");
return;
}
}
}
}
```

Finally, here’s the code for the testing program, which runs each program 50 times and then calculates the average runtimes:

```
import java.util.Arrays;
/**
* A wrapper class that handles benchmarking the performances of
* PrimalityTestSerial and PrimalityTestParallel1Runner and then
* prints information about the results to stdout.
*
* @author Tirthankar Mazumder
* @version 1.0
* @date 8th May, 2021
*/
public class PrimalityTestSuite {
public static final int n = 50;
//Number of test runs to perform
public static void main(String() args) {
long totalSerialTime = 0;
long totalParallelTime = 0;
long serialTimes() = new long(n);
double avgSerialTime = 0;
double avgParallelTime = 0;
System.out.println("Starting Serial runs...");
long startTime = System.currentTimeMillis();
for (int i = 0; i < n; i++) {
PrimalityTestSerial.primeCalculator();
serialTimes(i) = System.currentTimeMillis();
}
for (int i = 0; i < n; i++) {
serialTimes(i) -= startTime;
for (int j = 0; j < i; j++) {
serialTimes(i) -= serialTimes(j);
//to get rid of the time taken by the previous runs
}
avgSerialTime += serialTimes(i);
}
avgSerialTime /= n;
long parallelTimes() = new long(n);
System.out.println("Starting parallel runs...");
startTime = System.currentTimeMillis();
for (int i = 0; i < n; i++) {
PrimalityTestParallel1Runner.primeCalculator();
parallelTimes(i) = System.currentTimeMillis();
}
for (int i = 0; i < n; i++) {
parallelTimes(i) -= startTime;
for (int j = 0; j < i; j++) {
parallelTimes(i) -= parallelTimes(j);
//to get rid of the time taken by the previous runs
}
avgParallelTime += parallelTimes(i);
}
avgParallelTime /= n;
Arrays.sort(serialTimes);
Arrays.sort(parallelTimes);
double bestThreeSerialAvg = (serialTimes(0) + serialTimes(1)
+ serialTimes(2)) / 3;
double bestThreeParallelAvg = (parallelTimes(0) + parallelTimes(1)
+ parallelTimes(2)) / 3;
System.out.println();
System.out.println("Results:");
System.out.println("Average of " + n + " Serial Runs: " + avgSerialTime + " ms.");
System.out.println("Average of " + n + " Parallel Runs: " + avgParallelTime + " ms.");
System.out.println();
System.out.println("Average speed-up: " + avgSerialTime / avgParallelTime + "x");
System.out.println();
System.out.println("Average of best 3 Serial Runs: " + bestThreeSerialAvg + " ms.");
System.out.println("Average of best 3 Parallel Runs: " + bestThreeParallelAvg + " ms.");
System.out.println();
System.out.println("Average speed-up (w.r.t. best run times): "
+ bestThreeSerialAvg / bestThreeParallelAvg + "x");
System.out.println();
}
}
```

Here are the results from the test program:

```
Starting Serial runs...
Starting parallel runs...
Results:
Average of 50 Serial Runs: 4378.92 ms.
Average of 50 Parallel Runs: 1529.2 ms.
Average speed-up: 2.8635364896678x
Average of best 3 Serial Runs: 4328.0 ms.
Average of best 3 Parallel Runs: 1297.0 ms.
Average speed-up (w.r.t. best run times): 3.3369313801079414x
```

From here, it is obvious that the average speed-up is simply around 3x. However, this is surprising because I expect the multi-threaded program to run 7 to 8 times faster (because the serial program uses just 1 core, whereas the multi-threaded program should use all 8 cores on my system.)

So my question is, **why is the multi-threaded program not as fast as I expect it to be?**