The Constant Pool
We said that the constant pool contains a great deal of information. In fact it contains an interesting mixture of items. The constant pool serves both as a symbol table for linking purposes and as a repository for the constant values and string literals present in the source code. It may be regarded as an array of heterogeneous entries that other sections of the class file, such as the Field and Method sections, reference by index number. In addition, many Java bytecode instructions take operands that are themselves indexes into the constant pool.
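As a rough illustration of the array-like layout described above, the following sketch walks the constant pool of a class file and describes each slot. The names ConstantPoolDump and readConstantPool are invented for this example. The tag numbers follow the published class file format, including tags that were added long after the JVM releases discussed here. It is a minimal sketch under those assumptions, not a complete class file parser.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists constant pool entries by index.
final class ConstantPoolDump {

    /** Returns one description per constant pool slot; slot 0 is never used. */
    static List<String> readConstantPool(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        data.readUnsignedShort();             // minor version
        data.readUnsignedShort();             // major version
        int count = data.readUnsignedShort(); // number of slots, including slot 0

        List<String> entries = new ArrayList<>();
        entries.add("(unused)");
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1:  entries.add("Utf8 \"" + data.readUTF() + "\""); break;
                case 3:  entries.add("Integer " + data.readInt()); break;
                case 4:  entries.add("Float " + data.readFloat()); break;
                case 5:  // 8-byte constants occupy two consecutive slots
                    entries.add("Long " + data.readLong());
                    entries.add("(continued)");
                    i++;
                    break;
                case 6:
                    entries.add("Double " + data.readDouble());
                    entries.add("(continued)");
                    i++;
                    break;
                case 7:  entries.add("Class #" + data.readUnsignedShort()); break;
                case 8:  entries.add("String #" + data.readUnsignedShort()); break;
                case 9:  entries.add("Fieldref " + twoIndexes(data)); break;
                case 10: entries.add("Methodref " + twoIndexes(data)); break;
                case 11: entries.add("InterfaceMethodref " + twoIndexes(data)); break;
                case 12: entries.add("NameAndType " + twoIndexes(data)); break;
                case 15: // one-byte reference kind, then an index
                    entries.add("MethodHandle kind=" + data.readUnsignedByte()
                            + ", #" + data.readUnsignedShort());
                    break;
                case 16: entries.add("MethodType #" + data.readUnsignedShort()); break;
                case 17: entries.add("Dynamic " + twoIndexes(data)); break;
                case 18: entries.add("InvokeDynamic " + twoIndexes(data)); break;
                case 19: entries.add("Module #" + data.readUnsignedShort()); break;
                case 20: entries.add("Package #" + data.readUnsignedShort()); break;
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return entries;
    }

    // Reads two consecutive 16-bit indexes, as used by the *ref-style entries.
    private static String twoIndexes(DataInputStream data) throws IOException {
        return "#" + data.readUnsignedShort() + ", #" + data.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        // Usage (path is supplied by the caller): java ConstantPoolDump PointlessButton.class
        try (InputStream in = new FileInputStream(args[0])) {
            List<String> pool = readConstantPool(in);
            for (int i = 1; i < pool.size(); i++) {
                System.out.println(i + ": " + pool.get(i));
            }
        }
    }
}
```

Entries such as Class and String hold only an index that points at a Utf8 entry elsewhere in the pool, which is why field, method and class names remain readable in any class file that has not been obfuscated.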
The following table shows the types of entries in the constant pool, as defined by the current JVM:
Table . Constant Pool Entry Types
The following table shows a dump of the constant pool for the PointlessButton class:
Table . Constant Pool Example
Beating the Decompilation Threat
The very real threat of decompilation is not going to go away. Decompilers work by recognizing patterns in the generated bytecode which can be translated back into Java source code statements. The field and method names required to make this source code more readable are readily available in the constant pool as we have seen.
To date, there have been two main approaches to thwarting would-be decompilers: code obfuscation and bytecode hosing:
1. The principle of obscuring (or obfuscating) source code to make it more difficult to read is not new. In the UNIX world, where incompatibilities between platforms and implementations make it necessary to distribute many applications in source format, shrouding is common. This is the process of replacing variable names with meaningless symbols, removing comments and white space, and generally leaving as little human-readable content in the source code as possible without affecting its compilability. The end result of obfuscation is that although a class file will still decompile into valid Java, the result will not be very readable by humans. Note that although obfuscation makes the decompiled code harder to read, it will not necessarily protect your code against a determined adversary. You can also rely on copyright to protect your code; this is not an ideal solution, but it is better than nothing.
After the release of Mocha, the author released Crema, a further appalling coffee pun, which was designed to thwart Mocha. It did this by replacing names in the constant pool with illegal Java variable names and reserved words (such as if and class). This had no effect on the JVM, which merely used the names as tags to resolve references without attributing any meaning to them. Nor did it actually prevent decompilation. It did, however, mean that the decompiled code was more difficult to read and understand, and that it would not recompile, because the Java compiler would object to the illegal names.
2. Bytecode hosing is more subtle and is aimed at preventing the decompiler from recognizing patterns within the bytecode from which it could recover valid source. It does this by breaking up recognizable patterns of bytecodes with do-nothing instruction sequences (such as the NOP code or a PUSH followed by a POP). A good example of a bytecode hoser is HoseMocha.
Of course, this approach can be defeated, since once a hacker has established what types of do-nothing sequences are being generated by a bytecode hoser, he or she can modify the behavior of the decompiler to ignore such sequences. Furthermore, attempts to decompile hosed bytecode will generally result in broadly readable code interspersed with unintelligible passages rather than completely unreadable code.
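To make the point concrete, here is a minimal sketch of the kind of filter a modified decompiler might apply before pattern matching. It assumes a deliberately simplified model in which a method body is a list of instruction mnemonics; the class DoNothingStripper and the small set of "pure push" instructions are invented for this illustration. A real tool would work on encoded bytecode and would also have to respect branch targets and fix up offsets, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical filter that drops do-nothing sequences from a mnemonic listing.
final class DoNothingStripper {

    // Illustrative subset of single-slot pushes that have no side effects.
    private static final Set<String> PURE_PUSHES =
            new HashSet<>(Arrays.asList("aconst_null", "iconst_0", "iconst_1"));

    /** Returns a copy of the listing without nops and without pure-push/pop pairs. */
    static List<String> strip(List<String> instructions) {
        List<String> out = new ArrayList<>();
        for (String insn : instructions) {
            if (insn.equals("nop")) {
                continue; // a nop never changes program state
            }
            int last = out.size() - 1;
            if (insn.equals("pop") && last >= 0 && PURE_PUSHES.contains(out.get(last))) {
                out.remove(last); // the pop cancels the pure push that precedes it
                continue;
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> hosed = Arrays.asList(
                "aload_0", "nop", "iconst_0", "nop", "pop", "getfield", "pop");
        System.out.println(strip(hosed));
    }
}
```

Because the output list is used as a stack, a push and a pop separated only by nops are still recognized as a pair. A pop that follows an instruction with a real effect, such as the getfield in the example, is left alone.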
In addition to this, bytecode hosers present a more insidious problem to Java users. As discussed in "The Execution Environment", the principal method of optimizing Java performance is in the JVM and in particular through the use of just-in-time (JIT) compilation. And how do JIT compilers work? Yes, you guessed it, they recognize patterns in the generated bytecode that can be optimized into native code. Breaking up these patterns through the use of a bytecode hoser can seriously impact the performance of JIT compilers.
This is a well-understood dilemma in security circles: the trade-off between security and performance, price, or ease of use.
The only safe course of action is to assume that all Java code will at some point be decompiled.
For developers this means ensuring that no sensitive information, like passwords or cryptographic keys, is distributed in the class file either algorithmically or as hard-coded values. This can be accomplished by building client/server type applications with a Java presentation layer which can be run anywhere and a secured server side where sensitive information or algorithms can be stored. This may also involve extending the development and testing process to ensure that distributed Java code is safe.
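To see how little effort it takes to recover a hard-coded value, consider the following sketch, which scans the raw bytes of a class file for runs of printable characters, much as the UNIX strings utility does. The class ClassFileStrings, the method printableRuns and the minimum length of 6 are all choices made for this illustration. The sketch only handles ASCII text, which is enough to expose most literal passwords and keys.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner that lists printable strings embedded in a class file.
final class ClassFileStrings {

    /** Returns every run of at least minLength printable ASCII bytes. */
    static List<String> printableRuns(byte[] bytes, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : bytes) {
            if (b >= 0x20 && b <= 0x7E) {
                current.append((char) b);   // extend the current printable run
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);       // a non-printable byte ends the run
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString());   // run that reaches the end of the file
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // Usage (path is supplied by the caller): java ClassFileStrings SomeClass.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        for (String s : printableRuns(bytes, 6)) {
            System.out.println(s);
        }
    }
}
```

String literals are stored in the constant pool as plain text, so a password assigned to a variable in the source code will normally appear in the output alongside the class, field and method names. No decompiler is required.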
Also note that a hacker who is able to decompile your program can study it for weaknesses in its security, which makes an attack on your system more efficient. Browser JVMs may become targets of such attacks.