Java and security bits
Hashing a file in 3 lines
As I was working on a Peabody contribution recently, I remembered a short program I wrote a couple of years ago. It shows that you can calculate the message digest of a file in 3 lines of code. Of course, I had since lost the program, so I started over:
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;

import static java.nio.channels.FileChannel.MapMode.READ_ONLY;

public class Digest {
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java Digest <file>");
            return;
        }
        // Closing the RandomAccessFile also closes its channel.
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r")) {
            FileChannel channel = file.getChannel();
            // map() takes a long size but rejects regions larger than
            // Integer.MAX_VALUE bytes, so files over 2 GB fail here.
            ByteBuffer buffer = channel.map(READ_ONLY, 0, channel.size());
            digest("MD5", buffer);
            digest("SHA-1", buffer);
        }
    }

    private static void digest(String algorithm, ByteBuffer buffer) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        // Hash a duplicate so the shared buffer's position stays at 0 for the next algorithm.
        md.update(buffer.duplicate());
        byte[] digest = md.digest();
        // Zero-pad to two hex digits per byte so leading zeros are not lost.
        System.out.printf("%-5s: %0" + (digest.length * 2) + "x%n",
                algorithm, new BigInteger(1, digest));
    }
}
Obviously, this is more than three lines, but the actual digesting is only three lines: getting the MessageDigest object [MessageDigest.getInstance()], hashing the file [md.update(ByteBuffer)], and getting the result [md.digest()].
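To isolate those three calls, here is a minimal sketch that applies them to an in-memory byte array instead of a mapped file. The class name ThreeLines and the helper hex() are made up for illustration, and it uses the standard algorithm names "MD5" and "SHA-1".

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class: the same three digest calls on a byte array.
public class ThreeLines {
    // Returns the digest of data as a zero-padded lowercase hex string.
    public static String hex(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm); // 1: get the engine
        md.update(data);                                         // 2: feed the input
        byte[] digest = md.digest();                             // 3: get the result
        // Two hex digits per byte, so leading zero bytes are preserved.
        return String.format("%0" + (digest.length * 2) + "x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        byte[] abc = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println("MD5:   " + hex("MD5", abc));
        System.out.println("SHA-1: " + hex("SHA-1", abc));
    }
}
```

For the input "abc" this should print the published test vectors: 900150983cd24fb0d6963f7d28e17f72 for MD5 (RFC 1321) and a9993e364706816aba3e25717850c26c9cd0d89d for SHA-1 (FIPS 180).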
The methods that take ByteBuffers were added to MessageDigest, Signature, and Cipher in JDK 5.0. They were designed for scalable network I/O using channels secured via an SSLEngine, but they work just as well for memory-mapped files or plain byte arrays. FWIW, I think ByteBuffers are underrated. They are to byte arrays what Collections are to Object arrays: more powerful and easier to use. But I digress.
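A small sketch of how a plain byte array fits in: wrap it with ByteBuffer.wrap() and hash it through update(ByteBuffer). It also shows why the program above hashes buffer.duplicate(): update(ByteBuffer) consumes the buffer's remaining bytes and leaves its position at its limit. The class BufferDigest and its helper are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical example class: hashing through a ByteBuffer vs. a byte array.
public class BufferDigest {
    public static byte[] digestBuffer(String algorithm, ByteBuffer buffer) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        // Consumes buffer.remaining() bytes; afterwards position == limit.
        md.update(buffer);
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "hello, world".getBytes(StandardCharsets.UTF_8);
        ByteBuffer wrapped = ByteBuffer.wrap(bytes);

        // A duplicate shares the content but has its own position,
        // so hashing it leaves 'wrapped' untouched.
        byte[] viaBuffer = digestBuffer("SHA-1", wrapped.duplicate());
        byte[] viaArray = MessageDigest.getInstance("SHA-1").digest(bytes);

        System.out.println("same digest:        " + Arrays.equals(viaBuffer, viaArray));
        System.out.println("original remaining: " + wrapped.remaining());

        // Hashing the buffer itself drains it; a second pass would see no bytes.
        digestBuffer("SHA-1", wrapped);
        System.out.println("after direct use:   " + wrapped.remaining());
    }
}
```

Both paths produce the same digest, the original buffer still has its 12 bytes remaining after the duplicate is hashed, and it has none left once it is passed to update() directly.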
PS: I think it is also pretty clear that this program is not the smartest way to digest a file using multiple algorithms. Apart from the fact that it does not work at all for files larger than 2 GB (the size limit for a single ByteBuffer), it also processes the entire file first using MD5 and then again using SHA-1. It would be better to read the file once, in chunks, and feed each chunk to both digests while it is still in the OS page cache, but if I did all that I could not claim that I hashed the file in three lines of code ;-)
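For comparison, here is a rough sketch of the single-pass, chunked approach: it reads the file once through a FileChannel into a fixed-size buffer and hands each chunk to every requested digest. Because nothing is mapped, the 2 GB ByteBuffer limit does not apply to the file as a whole. The class name ChunkedDigest and the 64 KiB chunk size are assumptions for illustration, and it relies on Java 7's FileChannel.open() and try-with-resources.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class: one pass over the file, several digests.
public class ChunkedDigest {
    // Assumed chunk size; any reasonably sized buffer works.
    private static final int CHUNK_SIZE = 64 * 1024;

    public static byte[][] digestAll(Path file, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest[] mds = new MessageDigest[algorithms.length];
        for (int i = 0; i < algorithms.length; i++) {
            mds[i] = MessageDigest.getInstance(algorithms[i]);
        }
        ByteBuffer buffer = ByteBuffer.allocateDirect(CHUNK_SIZE);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();                      // switch from filling to draining
                for (MessageDigest md : mds) {
                    md.update(buffer.duplicate());  // each digest drains its own view
                }
                buffer.clear();                     // ready for the next chunk
            }
        }
        byte[][] results = new byte[mds.length][];
        for (int i = 0; i < mds.length; i++) {
            results[i] = mds[i].digest();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java ChunkedDigest <file>");
            return;
        }
        String[] algorithms = {"MD5", "SHA-1"};
        byte[][] digests = digestAll(Paths.get(args[0]), algorithms);
        for (int i = 0; i < algorithms.length; i++) {
            System.out.printf("%-5s: %0" + (digests[i].length * 2) + "x%n",
                    algorithms[i], new BigInteger(1, digests[i]));
        }
    }
}
```

The output format matches the original program. The trade-off is an extra copy from the OS into the buffer, in exchange for touching each page only once.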
Posted at 00:03 Apr 12, 2006 by Andreas Sterbenz in Java | Comments[5]
Posted by Jamie Lawrence on April 12, 2006 at 05:16 AM PDT #
Posted by Steven Coco on April 12, 2006 at 07:56 AM PDT #
Posted by Kenneth on April 12, 2006 at 05:17 PM PDT #
Jamie, I admit that the title of the post is deceptive, but my excuse is that the actual hashing is only 3 lines ;-) The rest of the code is fairly simple and mostly boilerplate anyway.
Kenneth, like other objects in Java, memory-mapped files are automatically released by the garbage collector when they are no longer referenced. Although an additional explicit unmap method would be useful, there are good reasons why there is none; see bug 4724038. As for running out of address space, that should not happen as long as you don't try to map too many large regions at the same time. However, there appears to be a bug in that area. I have contacted our NIO expert and will keep you posted on his response.
Posted by Andreas Sterbenz on April 21, 2006 at 06:30 PM PDT #
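One way to follow the advice about not mapping too many large regions at once is to map a large file in fixed-size windows, one after another, and feed each window to a single MessageDigest. The sketch below is only an illustration: the class name WindowedMapDigest and the 16 MB window size are assumptions. Note that it never unmaps anything explicitly (there is no unmap method). Each earlier window simply becomes unreachable and is released whenever the garbage collector reclaims it, so peak address-space use still depends on GC timing.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class: digest a file by mapping one window at a time.
public class WindowedMapDigest {
    // Assumed window size, far below the 2 GB per-buffer limit.
    private static final long WINDOW = 16L * 1024 * 1024;

    public static byte[] digest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                // Only this window is referenced; earlier mappings are left
                // for the garbage collector to release.
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                md.update(window);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java WindowedMapDigest <file>");
            return;
        }
        byte[] d = digest(Paths.get(args[0]), "SHA-1");
        System.out.printf("SHA-1: %040x%n", new BigInteger(1, d));
    }
}
```

Because positions and sizes are longs, this also handles files larger than 2 GB, at the cost of one map() call per window.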
I have filed bug 6417205 to track the problem of running out of address space. If you encounter other issues with memory-mapped files, file a bug or contact me.
Posted by Andreas Sterbenz on April 25, 2006 at 05:52 PM PDT #