The Java 2 Class File Format

The class file contains a lot more information than its cousin, the executable file. Of course, it still contains the same type of information: program requirements, an identifier indicating that this is a program, and executable code (bytecode, in this case). However, it also contains some very rich information about the original source code.

The high-level structure of a class file is shown in the following table:

Table: Class File Contents

Much here is as we would expect. There is information to identify the file as a Java class file, as well as the version of the JVM for which it was compiled. In addition, there is information describing the dependencies of this class in terms of classes, interfaces, fields, and methods. However, there is much more information than this buried within the constant pool: information that includes variable and method names within both this class file and those on which it depends.

Let’s explain the fields listed in the table in more detail:

• The magic number is a four-byte value that identifies the class file format; it is always 0xCAFEBABE.
• The minor version and major version values are the minor and major version numbers of the class file format produced by the compiler that generated this class; a JVM loads only class files whose version it supports.
• The constant pool is a table of variable-length structures representing the string constants, class names, field names, and other constants that are referred to within the class file.
• The access flags field is a mask of modifiers used with class and interface declarations (for example, ACC_PUBLIC for a public class or interface, and ACC_FINAL for a final class).
• The interfaces field is an array of entries describing the interfaces implemented by the class.
• The fields field is an array of entries describing the fields (both class and instance variables) declared by this class or interface. It does not include inherited fields.
• The methods field is an array of entries describing the methods declared by this class or interface.
• The attributes table holds additional information about the class; for example, the SourceFile attribute indicates the name of the source file from which the class was compiled.
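The first few items in the list above sit at fixed offsets at the start of every class file, so they can be read with nothing more than the standard java.io classes. The following is a minimal sketch, not a complete parser: the names ClassHeaderReader and ClassHeader are hypothetical, and it stops after the constant pool count rather than decoding the variable-length structures that follow. It relies on the fact that class files store multi-byte values in big-endian order, which is also the order DataInputStream uses.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Reads the fixed-size items at the start of a class file (hypothetical helper). */
public class ClassHeaderReader {

    /** The first four items of the ClassFile structure. */
    public static final class ClassHeader {
        public final long magic;            // u4, always 0xCAFEBABE
        public final int minorVersion;      // u2
        public final int majorVersion;      // u2
        public final int constantPoolCount; // u2, one more than the number of pool entries

        ClassHeader(long magic, int minor, int major, int cpCount) {
            this.magic = magic;
            this.minorVersion = minor;
            this.majorVersion = major;
            this.constantPoolCount = cpCount;
        }
    }

    public static ClassHeader read(InputStream in) throws IOException {
        // Class files are big-endian, which matches DataInputStream.
        DataInputStream data = new DataInputStream(in);
        long magic = data.readInt() & 0xFFFFFFFFL; // treat the u4 as unsigned
        if (magic != 0xCAFEBABEL) {
            throw new IOException("Not a class file: bad magic number");
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        int cpCount = data.readUnsignedShort();
        return new ClassHeader(magic, minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Hand-built header: magic, minor 0, major 46 (the JDK 1.2 format), constant_pool_count 10
        byte[] bytes = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00, 0x00, 0x2E, 0x00, 0x0A
        };
        ClassHeader h = read(new ByteArrayInputStream(bytes));
        System.out.printf("magic=%X version=%d.%d cp_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

To inspect a real class, pass a FileInputStream opened on a .class file to read() instead of the hand-built byte array.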

The main thing to understand at this point is that the inclusion of all of this information makes the job of a hacker much simpler in many ways.
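To make this concrete, the sketch below walks the constant pool and collects every CONSTANT_Utf8 entry, which is where class, method, and field names (and the text of string literals) are stored. It is a minimal illustration under stated assumptions: the class name ConstantPoolStrings is hypothetical, it recognizes the tag values defined in the current JVM specification (tags 15 to 20 were added in releases after Java 2), and it ignores everything after the constant pool. It uses DataInputStream.readUTF because that method reads exactly the two-byte length plus modified UTF-8 encoding that CONSTANT_Utf8 entries use.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Lists the CONSTANT_Utf8 strings stored in a class file's constant pool (hypothetical helper). */
public class ConstantPoolStrings {

    public static List<String> utf8Entries(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file: bad magic number");
        }
        data.readUnsignedShort();             // minor_version (ignored here)
        data.readUnsignedShort();             // major_version (ignored here)
        int count = data.readUnsignedShort(); // constant_pool_count
        List<String> strings = new ArrayList<>();
        // Valid pool indices run from 1 to count - 1; index 0 is unused.
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1: // CONSTANT_Utf8: u2 length + modified UTF-8 bytes, the format readUTF expects
                    strings.add(data.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    // Class, String, MethodType, Module, Package: a single u2 index
                    skip(data, 2);
                    break;
                case 15: // MethodHandle: u1 reference_kind + u2 index
                    skip(data, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    // Integer, Float, Fieldref, Methodref, InterfaceMethodref,
                    // NameAndType, Dynamic, InvokeDynamic: four bytes each
                    skip(data, 4);
                    break;
                case 5: case 6: // Long, Double: eight bytes, and each uses two pool slots
                    skip(data, 8);
                    i++;
                    break;
                default:
                    throw new IOException("Unknown constant pool tag " + tag + " at index " + i);
            }
        }
        return strings;
    }

    // Reads and discards n bytes, failing with EOFException if the file is truncated.
    private static void skip(DataInputStream data, int n) throws IOException {
        data.readFully(new byte[n]);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String s : utf8Entries(in)) {
                System.out.println(s);
            }
        }
    }
}
```

Running it as, for example, java ConstantPoolStrings Count.class prints the names recorded in that class file, with no decompiler involved.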

Decompilation Attacks

One of the areas seldom discussed when considering the security implications of deploying Java is that of securing Java assets. Often, considerable effort is put into developing software, and the resulting intellectual property can be very valuable to a company.

Hackers are a clever (although potentially misguided) bunch, and there are many reasons why they might want to get inside your code. Here are a few:

• To steal a valuable algorithm for use in their own code
• To understand how a security function works to enable them to bypass it
• To extract confidential information (such as hard-coded passwords and keys)
• To enable them to alter the code so that it behaves in a malicious way (such as installing Trojan horses or viruses)
• To demonstrate their prowess
• For their entertainment (much as other people might solve crosswords)

The chief tool in the arsenal of the hacker in these cases is the decompiler. A decompiler, as its name suggests, undoes the work performed by a compiler. That is, it takes an executable file and attempts to re-create the original source code.

Advances in compiler technology now make it effectively impossible to go from machine code back to the original source in a high-level language such as C. Modern compilers remove all variable and function names and move code about to optimize its execution profile. Moreover, as was discussed previously, there are many possible ways to translate a high-level statement into a low-level machine code representation. Producing the original source code is therefore impossible for a decompiler without a lot of additional information that simply is not shipped in an executable file.

It is, however, very easy to recover an assembly language version of the program. The effort required to actually understand what such a program does makes this far less worthwhile for the hacker, but it remains possible. So it is fair to say that no program can be completely protected from tampering.
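For Java, the counterpart of an assembly language listing is even easier to obtain, because each bytecode instruction starts with a one-byte opcode that has a fixed mnemonic; the JDK's own javap -c disassembler prints such listings. The following minimal sketch, using the hypothetical class name TinyDisassembler, recognizes only a handful of opcodes, but it shows how direct the mapping from bytes back to instructions is. The byte sequence in main is what a typical javac emits for the body of static int add(int a, int b) { return a + b; }.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Maps a few JVM opcodes back to their mnemonics (a tiny subset of what javap -c does). */
public class TinyDisassembler {

    // Opcodes that take no operands in this sketch.
    private static final Map<Integer, String> MNEMONICS = new HashMap<>();
    static {
        MNEMONICS.put(0x03, "iconst_0");
        MNEMONICS.put(0x04, "iconst_1");
        MNEMONICS.put(0x1a, "iload_0");
        MNEMONICS.put(0x1b, "iload_1");
        MNEMONICS.put(0x3c, "istore_1");
        MNEMONICS.put(0x60, "iadd");
        MNEMONICS.put(0x68, "imul");
        MNEMONICS.put(0xac, "ireturn");
        MNEMONICS.put(0xb1, "return");
    }
    private static final int BIPUSH = 0x10; // followed by one signed operand byte

    /** Returns one "offset: mnemonic" line per instruction. */
    public static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            if (op == BIPUSH) {
                if (pc + 1 >= code.length) {
                    throw new IllegalArgumentException("bipush at " + pc + " is missing its operand");
                }
                out.add(pc + ": bipush " + code[pc + 1]);
                pc += 2;
            } else {
                String name = MNEMONICS.get(op);
                if (name == null) {
                    throw new IllegalArgumentException(
                            "Opcode 0x" + Integer.toHexString(op) + " is not handled by this sketch");
                }
                out.add(pc + ": " + name);
                pc += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Body of: static int add(int a, int b) { return a + b; }
        byte[] code = {0x1a, 0x1b, 0x60, (byte) 0xac};
        disassemble(code).forEach(System.out::println);
    }
}
```

A real disassembler also needs the operand layouts of the remaining opcodes and the constant pool to resolve references, but nothing about the process requires guesswork.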

When the Java Development Kit (JDK) 1.0.2 was shipped, a decompiler named Mocha quickly became available, and it performed excellently: it was able to recover Java source code from a class file. It was so successful that at least one person used it as a way of formatting his source code! In fact, apart from the names of local variables (which javac records only when asked to include debugging information), the only information lost in the compilation process, and unrecoverable using Mocha, was the comments. However, if meaningful names are used in the code (such as accountNumber or password), then it is readily possible to understand the function of the code, even without the comments.
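The names that make decompiled code readable are stored in the class file itself, so they can be read back even without a decompiler. The following minimal sketch uses the standard reflection API (Class.getDeclaredFields and Class.getDeclaredMethods) on a hypothetical Account class; the names NameLister and Account are illustrative only. Reflection exposes field and method names but not local variable names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Reads back member names that are stored in a compiled class (hypothetical helper). */
public class NameLister {

    /** Hypothetical class standing in for application code. */
    public static class Account {
        private String accountNumber;
        private String password;

        public boolean checkPassword(String attempt) {
            return password != null && password.equals(attempt);
        }
    }

    /** Names of the fields declared by cls; inherited fields are not included. */
    public static List<String> fieldNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            names.add(f.getName());
        }
        return names;
    }

    /** Names of the methods declared by cls; inherited methods are not included. */
    public static List<String> methodNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The reflection API does not specify the order of the returned members.
        System.out.println("Fields:  " + fieldNames(Account.class));
        System.out.println("Methods: " + methodNames(Account.class));
    }
}
```

Even though both fields are private, their names, and therefore hints about what the class does, are visible to anyone holding the class file.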

Decompilers such as SourceAgain are already available that can decompile Java code, including programs written with the Java 2 SDK that use its new APIs.

Figure: Decompiled Count.class

As the figure shows, the code has been successfully decompiled. Only minor details, such as the names of the variables, differ from the original.

A decompiler can also have several uses:
1. Recovery of lost source code (by accident or otherwise)
2. Migration of applications to a new hardware platform
3. Translation of code written in obsolete languages no longer supported by compiler tools
4. Determination of whether viruses or malicious code are present in the program
5. Recovery of someone else’s source code

As long as you are decompiling your own code with your own decompiler or a freely available one, you are generally on safe ground. Once you decompile someone else’s code, however, there may be legal and moral issues, because many programs are protected by copyright law and license agreements.