Accelerate your Java Apps with Optimized JVMs
Allan McNaughton explains how three optimized technologies—garbage collection, hyper-threading technology, and branching—can improve execution of your Java applications.
 

The Java language specification imposes unique and demanding requirements on the runtime environment. For example, Java has such a high level of abstraction that a seemingly simple block of source code can turn into a large number of implicit method calls, null checks, boundary checks, and exception handling calls. This type of code is a worst-case scenario for most processors due to heavy branching and extensive use of indirection. Processors that are not designed to work effectively with these types of operations will suffer from poor performance due to stalling in the instruction pipeline and frequent data cache misses.
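To make this concrete, here is a small illustrative sketch (not from a particular application) of how one innocuous line of Java implies work the processor never sees in the source: an array read compiles to an implicit null check, a bounds check, and two possible exception paths.

```java
// Hypothetical example: a "simple" accessor the runtime must guard
// with implicit checks. The single return statement below requires a
// null check on 'values', a bounds check on 'index', and readiness to
// throw NullPointerException or ArrayIndexOutOfBoundsException.
public class ImplicitChecks {
    static int element(int[] values, int index) {
        return values[index];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(element(data, 1)); // prints 20
        try {
            element(null, 0); // triggers the implicit null check
        } catch (NullPointerException e) {
            System.out.println("implicit null check fired");
        }
    }
}
```

Each of those hidden checks is a potential branch, which is exactly the kind of code stream the article describes as a worst case for processors without strong branch handling.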

Systems based on Intel architecture have proven to be a robust and high-performance deployment platform for Java on both the server and desktop. Start with the outstanding raw performance of the Intel Pentium 4 and Xeon processors, incorporate them into a high-performance system architecture, layer on a proven operating system such as Windows XP or Linux, top this with a Java Virtual Machine (JVM) optimized for Intel processors by vendors such as BEA or IBM, and you have a top-flight, scalable Java execution environment.

Let's look at three of the technologies that directly—and dramatically—improve Java performance on Intel architecture: garbage collection, Hyper-Threading technology, and optimizations for branching.

Efficient Garbage Collection
For best performance, Java applications require software and hardware to cooperate to minimize bottlenecks and to maximize throughput. Nowhere in a Java solution is this more important than the area of garbage collection—the JVM's automatic system for reclaiming data objects no longer in use. It is typical for developers to focus their efforts on writing code to meet functional requirements. As a result, class hierarchies are commonly designed without a clear understanding of their impact on the heap during program execution. This neglect can lead to serious application performance and scalability problems as the system pauses non-deterministically for garbage collection.
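The heap impact described above is easy to reproduce. The following sketch (the Point class and method names are invented for illustration) shows a design that allocates a fresh object on every loop iteration, creating exactly the kind of short-lived garbage the collector must pause to reclaim.

```java
// Illustrative sketch: a class designed without heap awareness can
// flood the garbage collector with short-lived objects.
public class AllocationPressure {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Allocates one Point per iteration: n short-lived objects that the
    // garbage collector must eventually reclaim.
    static double pathLength(int n) {
        double total = 0;
        Point prev = new Point(0, 0);
        for (int i = 1; i <= n; i++) {
            Point next = new Point(i, i); // a fresh allocation per step
            double dx = next.x - prev.x, dy = next.y - prev.y;
            total += Math.sqrt(dx * dx + dy * dy);
            prev = next;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(pathLength(100000));
    }
}
```

A developer focused only on functional requirements would never see a problem here; the cost appears only at runtime, as non-deterministic collection pauses under load.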

JVM vendors such as BEA and IBM have placed a high emphasis on reducing the drag on performance by implementing capable garbage collection algorithms on the Intel platform. Original implementations of garbage collection simply did not scale on machines with more than one processor. This lack of scalability was caused by an algorithm design that stopped all Java threads from running while garbage collection was in progress. The problem became much more apparent as Java applications migrated away from the desktop to enterprise-class machines.

Intel and JVM designers worked together through Intel's partner support programs—such as the Early Access Program—to enhance these server-class machines with improved garbage collection algorithms that allow for concurrency between the garbage collection cycle and other threads processing user requests. These new designs result in significant scalability improvements, as each processor in an n-way Xeon processor-based system can be kept busy doing useful work instead of waiting for a lengthy garbage collection and heap compaction cycle to complete. Due to the widespread deployment of Intel machines in enterprise computing, these innovations were made to JVMs running on Intel architecture before being ported to virtual machines running on other architectures.
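While the concurrent algorithms themselves live inside the JVM, developers can observe collection behavior directly. This small probe (a sketch, using only the standard Runtime API) watches heap usage before and after objects become unreachable; running any such program with the standard -verbose:gc flag prints the timing of each collection cycle.

```java
// A small probe: watch the collector reclaim unreachable objects.
public class GcProbe {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[][] garbage = new byte[1000][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[10000]; // ~10MB of soon-to-be garbage
        }
        long before = usedHeap();
        garbage = null;  // make the arrays unreachable
        System.gc();     // request a collection (a hint, not a guarantee)
        long after = usedHeap();
        System.out.println("used before: " + before + ", after: " + after);
    }
}
```

On a stop-the-world collector, every application thread halts during that reclamation; the concurrent designs described above let the other processors keep serving requests while it runs.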

Hyper-Threading Technology
In early 2002, Intel introduced an architectural innovation that results in even better Java performance and scalability. With Hyper-Threading technology, one physical processor can be viewed as two logical processors each with its own state. The performance improvements due to this design arise from the following factors: (a) JVMs schedule threads to execute simultaneously on the logical processors; (b) on-chip execution resources are utilized more efficiently than when a single thread consumes the execution resources.
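The JVM exposes the logical-processor count directly, so a threaded application needs no special code to benefit. The sketch below (class and method names are illustrative) sizes a set of worker threads to whatever the JVM reports; on a Hyper-Threading system, availableProcessors() counts logical processors, so each one receives a runnable thread.

```java
// Sketch: sizing worker threads to the logical processors the JVM sees.
// On a Hyper-Threading system a single physical CPU reports two logical
// processors, and the JVM can schedule a thread on each.
public class LogicalProcessors {
    // A CPU-bound task: sum the integers 0..999999.
    static long busyWork() {
        long s = 0;
        for (int j = 0; j < 1000000; j++) s += j;
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        int cpus = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[cpus];
        final long[] results = new long[cpus];
        for (int i = 0; i < cpus; i++) {
            final int id = i;
            workers[i] = new Thread(new Runnable() {
                public void run() { results[id] = busyWork(); }
            });
            workers[i].start(); // one runnable thread per logical CPU
        }
        for (Thread t : workers) t.join();
        System.out.println(cpus + " logical processors, sum = " + results[0]);
    }
}
```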

Java should benefit from this Intel innovation more than any other commercial language due to its insatiable demand for threads. When Hyper-Threading technology becomes available on desktop processors, Java performance on single-processor machines will improve dramatically as the JVM's threads (including the garbage collection thread) are interleaved across the logical processors.

Optimizations for Branching
State-of-the-art JVM implementations include sophisticated Just-In-Time (JIT) optimizing compilers to minimize application execution time. These tools take advantage of a static call graph inferred from the byte code structure and execution time profiling to determine the best feasible code generation. Due to the intrusive on-the-fly nature of JIT compilation, there will always be a limit to what optimizations can be accomplished in this phase. Although branching and indirection can be reduced by intelligent JIT compilation, a large burden still falls on the processor to quickly execute this complex code stream.
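One branching optimization a profiling JIT can perform is worth sketching. In the hypothetical example below, the interface call looks indirect in the bytecode; but if execution-time profiling shows that only one receiver type ever arrives at the call site (a "monomorphic" site), the JIT can devirtualize the call and inline the method body, eliminating the indirection entirely.

```java
// Sketch of a JIT opportunity: a call site that only ever sees one
// receiver type can be devirtualized and inlined.
public class Devirtualize {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // s.area() is an indirect interface call in the bytecode, but after
    // profiling shows only Square instances arrive here, the JIT can
    // replace it with an inlined side * side.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2.0), new Square(3.0) };
        System.out.println(totalArea(shapes)); // 4.0 + 9.0 = 13.0
    }
}
```

Even so, not every call site is monomorphic, which is why the residual branching and indirection still land on the processor, as the article notes.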

The Pentium 4 and Xeon processors were designed with these requirements in mind and provide tailored support for code heavy with branching and indirection. Advanced branch prediction circuitry is incorporated that can reduce delays for a correctly predicted branch to almost zero clock cycles. The JIT cooperates with the processor by generating code so that branches are predicted correctly in most cases. Indirection is dealt with efficiently by use of an L2 data cache of 512KB. This large cache combined with support for speculative loads (that is, loading memory before a branch is resolved) result in excellent throughput and utilization of memory bandwidth.
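Branch prediction pays off whenever a branch's outcome follows a pattern. The loop below (an illustrative sketch) contains a single data-dependent branch; fed sorted data, that branch resolves the same way in long runs, which prediction hardware handles at near-zero cost, whereas random data would force far more mispredictions through the exact same code.

```java
// Sketch: the cost of a branch depends on its predictability, not just
// its presence. On sorted input the branch below goes the same way in
// long runs and predicts almost perfectly.
public class PredictableBranch {
    static int countAtLeast(int[] data, int threshold) {
        int count = 0;
        for (int v : data) {
            if (v >= threshold) count++; // the branch in question
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sorted = new int[10000];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i;
        System.out.println(countAtLeast(sorted, 5000)); // prints 5000
    }
}
```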

Java and the Intel platform are an excellent match, with features designed for speed such as a 20-stage execution pipeline, double-clock arithmetic units, a 12K micro-operation instruction trace cache, a 512KB L2 cache, and Hyper-Threading support. When all else fails and a Java application just needs a faster processor, Intel is there with ever-increasing clock rates on the Pentium 4 and Xeon processors (up to 2.40 GHz). Keeping a processor of these speeds fed with instructions and data is accomplished by interfacing with the system bus at an effective data transfer rate of 3.2 GB/second.

The Proof Is in the Pudding
Independent benchmarks clearly show that the Intel platform offers the best performance and price/performance for Java-based solutions. The top three scores for the ECperf benchmark are held by solutions running on Intel processors, across different combinations of operating systems, databases, and application servers. The best-performing solution on the SPECjvm98 benchmark, which focuses strictly on JVM/JIT performance, is also based on an Intel processor.

Java developers interested in getting and staying ahead on Java performance tips and tricks should sign up for Intel Developer Services (it's free), where they'll find lots of tips and technical papers.

Resources
An overview of the Intel Hyper-Threading technology.

Details about how Intel's NetBurst architecture improves performance.

Whitepaper discussing an approach to solving JVM performance problems.

Spotlight on IBM JVM performance and a timeline of performance improvements.

Benefits of Migrating from Sun Java to IBM Java 2 on Intel Architectures (PDF).

Review the details of independent benchmark testing of Java on Intel architecture.

Allan McNaughton (amcnaughton@nc.rr.com) is a freelance contributor based in Research Triangle Park, NC. He has extensive real world experience with portal-class Web applications and software performance issues.