IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)


IL (or Java Bytecode) in Silicon - Just Say No

Thursday 12 August, 2004, 05:48 PM

In a recent discussion on a mailing list I subscribe to, someone put forward the idea of a CPU that could execute .NET IL directly without the need for a JIT compilation step. This old chestnut crops up time and time again - before .NET, people used to suggest silicon that could execute Java bytecode directly, and LISP used to be a popular candidate for similar ideas many years ago. The idea tends to come up every time a new virtualised execution environment becomes popular.

The thinking behind these suggestions is that a native implementation in silicon must surely be faster than mapping everything into the execution model of some general purpose CPU. After all, removing the need to perform this mapping should remove some overhead, surely? In practice this turns out to be wrong, for two reasons.

Relentless Progress of General Purpose CPUs

The first reason that highly specialised microprocessors targeting a particular virtual machine tend to perform less well than their general purpose counterparts is simply down to the economics of general purpose CPUs. The market for high performance general purpose processors is so vast that the companies that make them invest billions of dollars in making ever faster chips. Since they have, remarkably, managed to achieve sustained exponential performance improvements over the last few decades, even if a special-purpose chip has an edge today, general purpose CPUs will eventually catch up. (Until Moore's law comes to an end, of course...)

Unless the creator of a special-purpose chip is prepared to get onto the same treadmill as the CPU manufacturers, they will eventually lose their edge. And in the cases where someone actually went as far as creating the special purpose CPUs rather than merely proposing them, this is pretty much what happened. There were LISP machines that were faster than their general purpose counterparts for a while, but they soon got eclipsed. A few years ago Sun made a lot of noises about a Java chip that was supposedly real, but it never saw the light of day - perhaps its performance was behind the curve even from the start.

There are exceptions - some special purpose chips have kept up with the pace, graphics accelerators being the obvious example. Since graphics chips are maintaining the same exponential rate of improvement as CPUs, graphics processors still offer a worthwhile benefit, keeping a few years ahead of what a modern CPU can do. (Of course there are also the architectural benefits of having the dedicated memory and bus for graphics, but if that were the only advantage, graphics card vendors wouldn't bother to develop GPUs - they'd just stick a general purpose CPU on the graphics card.)

A clear example of where general purpose CPUs have become fast enough to render special purpose chips obsolete is in mobile phones. (Or 'cellphones', as I believe they're usually called in the USA.) A few years ago, all digital mobile phones had two processors in them: a general purpose one to handle the signalling protocol and user interface, and a specialised digital signal processor (DSP). The DSP used to be necessary in order to encode and decode the compressed voice data. It also did a lot of processing work on the incoming signals in order to cope with reflections and other signal distortions that afflict mobile receivers. (A large proportion of the bandwidth for a typical digital mobile phone signal is taken up with a fixed training pattern that the phone uses to work out exactly how the signal is being distorted right now, so that it can correct for these distortions.)

Having these two processors was a problem. The extra processor made phones bigger, shortened battery life, and decreased reliability as a result of the increased component count. But it was necessary, because only a specialised DSP was fast enough to perform the signal processing, while a general purpose CPU was still required to handle the signalling protocol and user interface. Then, a few years ago, the performance of low-power embedded CPUs (and in particular the ARM CPU) got to the point where a single general purpose CPU could do all of the work, so there was no longer any need for a DSP. This is why there was a step change in the size and battery life of mobile phones a few years back: they all moved over to having one general purpose CPU instead of a pair of processors, enabling them to become smaller and to consume less power.

Benefits of Non-Native Execution

The other reason that a VM-specific processor has a hard time competing with a general purpose CPU is a more fundamental problem. What initially looks like a disadvantage of a general purpose CPU turns out to be a net win in practice: the transformation that must be performed from the virtual execution model of IL or bytecode to real executable code is not pure overhead; it actually has a beneficial effect.

Raw IL code is a long way from being optimal, largely because IL was designed from day one to be compiled prior to execution. Blindly executing IL leads to distinctly suboptimal behaviour, which means that anything executing IL natively is at a significant disadvantage to a general purpose CPU: the specialised processor would need to be several times faster simply to keep up, because so much of the final performance comes from the JIT compiler's optimizations.
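To make that concrete, here is a hypothetical C# method (MulAdd is just an illustrative name) and, roughly, the IL a C# compiler emits for its body - a sketch rather than the exact output of any particular compiler version:

    // C# source:
    static int MulAdd(int a, int b, int c)
    {
        return a * b + c;
    }

    // IL for the method body (roughly):
    ldarg.0    // push a onto the evaluation stack
    ldarg.1    // push b
    mul        // pop both, push a * b
    ldarg.2    // push c
    add        // pop both, push (a * b) + c
    ret        // return whatever is left on top of the stack

Every intermediate value flows through the evaluation stack, and nothing in the IL mentions registers. A JIT compiler collapses this into a handful of register instructions for the target CPU; hardware that executed the IL directly would either pay for all of that stack traffic or end up performing the same register-allocating transformation itself.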

The only benefit that direct execution of IL or bytecode offers is that it avoids the startup delays that JIT compilation can introduce. But there are easier ways of dealing with that than designing new silicon. Precompiling (e.g. using ngen on .NET) can work well. Another approach may be to interpret the IL or bytecode to start with, and do the compilation later, possibly in the background. (Maybe that's AIT - Almost In Time compilation...) My understanding is that the current Java HotSpot technology does something like this, although I think that's more because it wants to defer compilation until it has a better picture of how the code is being used, so that it can optimize it more effectively.
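As an illustration, precompiling an assembly with ngen looks something like this (the exact command-line syntax has varied across .NET Framework versions, and MyApp.exe is just a placeholder name):

    ngen install MyApp.exe

This compiles the assembly's IL to native code once, at install time, and stores the result in the machine-wide native image cache, so the JIT compiler doesn't have to do that work when the application starts up.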

You might argue that the C# or Java compilers should simply do more optimization up front. But an awful lot of the optimizations performed by the JIT compiler are transformations that couldn't be represented in IL anyway, because they only make sense in the execution model of the target processor. (Register allocation is an obvious example - it's a crucial part of optimizing for any modern CPU, but IL doesn't even have a concept of a register! Nor does Java bytecode.)

You could argue that these intermediate formats are just using the wrong model - maybe they should have chosen an execution model that could encapsulate more of these optimizations. But this might run into a couple of problems. It might then become hard to verify the code for type safety. And such a model would have to be more concrete, which might make it a bad match for some CPU designs. Indeed, Serge Lidin (the guy at Microsoft who designed IL) recently posted this quote from Ori Gershony (a JIT compilation expert at Microsoft) concerning the execution model used by IL:

"The machine state model is an abstract model that defines the semantics of managed code. The actual implementation can be completely different as long as it preserves the semantics of the machine state model. This was done on purpose, to allow different implementations to make different choices that lead to various tradeoffs (simplicity, performance, etc.)."

Any attempt to come up with a more concrete alternative to IL would lose this property. In any case, IL and Java bytecode are pretty well entrenched, so pragmatically, such an approach wouldn't help us run the code we have today any faster.

So a targeted, optimising transformation from IL into native code for a general purpose CPU is likely to outperform specialised hardware for the foreseeable future.
