It’s a follow-up question.
A previous version
of this code has been posted on Code Review about 2 weeks ago.
What was done after the last review
In fact, the entire application is rewritten from scratch. Here
is current version with tests and JavaDoc.
I tried to split the application into small classes that have only one responsibility and can be re-used and extended.
The main approach to solve a problem has not changed. I still put one value in accordance with each ip and set a bit with the corresponding index in the bit array. The required amount of memory remained the same about 550-600 MB. The speed has increased, now I am practically limited to the performance of my hard disk. It is still assumed that only valid IPs will be in the file.
I deleted all JavaDoc comments from code examples, because they occupy more space than the code itself.
FixedSizeBitVector
I wrote a simple implementation of the bit array that allows you to store N bits with indexes from 0 to N-1. There are
three operations are supported: set bit, examine bit’s value, and get the number of all the set bits.
getCapacity()
method used for testing and may be useful in other cases.
I do not bring the BitVector
interface due to its primitiveness and simplicity.
package chptr.one;
public class FixedSizeBitVector implements BitVector {
public static final long MIN_CAPACITY = 1L;
public static final long MAX_CAPACITY = 1L << 32;
private final long capacity;
private final int() intArray;
private long cardinality;
public FixedSizeBitVector(long capacity) {
if (capacity < MIN_CAPACITY || capacity > MAX_CAPACITY) {
throw new IllegalArgumentException("Capacity must be in range (1.." + MAX_CAPACITY + ").");
}
int arraySize = 1 + (int) ((capacity - 1) >> 5);
this.intArray = new int(arraySize);
this.capacity = capacity;
}
private void checkBounds(long bitIndex) {
if (bitIndex < 0 || bitIndex >= capacity) {
throw new IllegalArgumentException("Bit index must be in range (0.." + (capacity - 1) + ").");
}
}
@Override
public void setBit(long bitIndex) {
checkBounds(bitIndex);
int index = (int) (bitIndex >> 5);
int bit = 1 << (bitIndex & 31);
if ((intArray(index) & bit) == 0) {
cardinality++;
intArray(index) |= bit;
}
}
@Override
public boolean getBit(long bitIndex) {
checkBounds(bitIndex);
int index = (int) (bitIndex >> 5);
int bit = 1 << (bitIndex & 31);
return (intArray(index) & bit) != 0;
}
@Override
public long getCapacity() {
return capacity;
}
@Override
public long getCardinality() {
return cardinality;
}
}
UniqueStringCounter
This class implements the counter of unique lines in the input Iterable<String>
sequence. The counter uses BitVector
implementation. To work, it is required that the input sequence has a final number of possible string-values and this
amount did not exceed the maximum capacity of the bit vector used.
It also requires a hash function that puts a String
and a long
in an unambiguous match.
package chptr.one;
import javax.validation.constraints.NotNull;
import java.util.Objects;
import java.util.function.ToLongFunction;
public class UniqueStringCounter {
private final Iterable<String> lines;
private final ToLongFunction<String> hashFunction;
private final BitVector bitVector;
private long linesProcessed;
public UniqueStringCounter(@NotNull Iterable<String> lines,
long capacity,
@NotNull ToLongFunction<String> hashFunction) {
Objects.requireNonNull(lines);
Objects.requireNonNull(hashFunction);
this.lines = lines;
this.hashFunction = hashFunction;
this.bitVector = new FixedSizeBitVector(capacity);
}
public long count() {
for (String line : lines) {
long value = hashFunction.applyAsLong(line);
bitVector.setBit(value);
linesProcessed++;
}
return bitVector.getCardinality();
}
public long getLinesProcessed() {
return linesProcessed;
}
}
IpStringHashFunction
Hash function to convert a String
to a long
value. This function must generate a unique value for each unique line,
the collisions are not allowed.
I tested several options for the conversion of IP-String in the long-value and came to the conclusion that they all work around at the same speed that the InetAddress
. I do not see the reasons to write my own implementation when the speed of the library function is completely satisfied.
The processing of exceptions really is not needed here, because according to the terms of the task, I am guaranteed that there will be only valid IP. I had to make this processing since otherwise I can’t use this function as a parameter for UniqueStringCounter
.
package chptr.one;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.ToLongFunction;
public class IpStringHashFunction implements ToLongFunction<String> {
@Override
public long applyAsLong(String value) {
long result = 0;
try {
for (byte b : InetAddress.getByName(value).getAddress())
result = (result << 8) | (b & 255);
} catch (UnknownHostException e) {
throw new RuntimeException(e);
}
return result;
}
}
BufferedReaderIterable
A simple adapter that allows you to work with BufferedReader
as with Iterable
. I could not come up with a better way
to send the contents of the file in UniqueStringCounter
, which expects to the Iterable<String>
as parameter.
package chptr.one;
import javax.validation.constraints.NotNull;
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.Objects;
public class BufferedReaderIterable implements Iterable<String> {
private final Iterator<String> iterator;
public BufferedReaderIterable(@NotNull BufferedReader bufferedReader) {
Objects.requireNonNull(bufferedReader);
iterator = new BufferedReaderIterator(bufferedReader);
}
public Iterator<String> iterator() {
return iterator;
}
private static class BufferedReaderIterator implements Iterator<String> {
private final BufferedReader bufferedReader;
private BufferedReaderIterator(BufferedReader bufferedReader) {
this.bufferedReader = bufferedReader;
}
@Override
public boolean hasNext() {
try {
return bufferedReader.ready();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
@Override
public String next() {
String line;
try {
line = bufferedReader.readLine();
if (line == null) {
try {
bufferedReader.close();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
} catch (IOException e) {
throw new RuntimeException(e);
}
return line;
}
}
}
IpCounterApp
The main class of the application. Accepts the file name in the -file parameter.
package chptr.one;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;
public class IpCounterApp {
private static String parseFileName(String() args) {
Objects.requireNonNull(args, "No arguments found. Use -file file-name to processing file.");
if (args.length == 2 || "-file".equals(args(0))) {
return args(1);
}
return null;
}
public static void main(String() args) {
String fileName = parseFileName(args);
if (fileName == null) {
System.err.println("Wrong arguments. Use -file file-name to processing file.");
return;
}
Path filePath = Paths.get(fileName);
if (!Files.exists(filePath)) {
System.err.printf("File %s does not exists.n", filePath);
return;
}
try {
System.out.printf("Processing file: %sn", filePath);
long startTime = System.nanoTime();
BufferedReader bufferedReader = Files.newBufferedReader(filePath);
Iterable<String> strings = new BufferedReaderIterable(bufferedReader);
UniqueStringCounter counter = new UniqueStringCounter(strings, 1L << 32, new IpStringHashFunction());
long numberOfUniqueIp = counter.count();
long linesProcessed = counter.getLinesProcessed();
long elapsedTime = System.nanoTime() - startTime;
System.out.printf("Unique IP addresses: %d in total %d.n", numberOfUniqueIp, linesProcessed);
System.out.printf("Total time: %d milliseconds.n", elapsedTime / 1_000_000);
} catch (IOException e) {
e.printStackTrace();
}
}
}
Why do I ask the new Review?
In previous question
I received a few excellent answers and one excellent bug report. I tried to take into account so many comments as I
could. I also tried to reconsider my approach to the design of classes. I am interested in mostly three questions:
- How are my new classes are suitable for re-use and extension? What other abstractions can be allocated here and should
it be done?
- What should I do with error processing? I know that now it is bad. I just try to fall as early as possible.
- And most importantly. Do I move in the right direction? I am a newbie in programming and get an overview for my code is
the only way to learn how to write a quality and understandable code.