Sunday, June 03, 2007

Java running faster than C

Note: A lot of people seem to be taking this post to be the "Ultimate C vs Java shootout". It's not. Performance is a very complex topic. My only real point is this: Java (which used to be slow) has reached the class of "fast languages". For the majority of applications, speed is no longer a valid excuse for using C++ instead of Java.

I just saw this page comparing the performance of several languages on a simple Mandelbrot set generator. His numbers show Java being over twice as slow as C, but then I noticed that he's using an older version of Java and only running the test once, which doesn't really give the JVM a chance to show off.

I quickly hacked up the code to run 100 iterations 3 times and then used my standard "go fast" flags (there may be better flags, but I'm lazy). Here are my results:

$ java -server -XX:CompileThreshold=1 Mandelbrot 2>/dev/null
Java Elapsed 2.994
Java Elapsed 1.926
Java Elapsed 1.955

$ gcc -O8 mandelbrot.c
$ ./a.out 2>/dev/null
C Elapsed 2.03
C Elapsed 2.04
C Elapsed 2.05

C still wins on the first iteration, but Java is actually slightly faster on subsequent iterations!

Obviously the results will be different with different code and different machines, but it's clear that the JVM is getting quite fast.

This test was run with Java 1.6.0-b105 and gcc 4.1.2 under Linux 2.6.17 under Parallels on my 2.33GHz Core 2 Duo MacBook Pro. Here is the hacked up code: Java and C.
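
For readers without the linked files, here is a rough sketch of the kind of C harness described above. It is not the code that produced the numbers in this post: the constants BAILOUT and MAX_ITERATIONS, the grid size, and the names render and time_batch are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Assumed constants; the values used by the real benchmark are not given here. */
#define BAILOUT 16
#define MAX_ITERATIONS 1000

/* Returns 0 if the point never escapes, otherwise the iteration at which it did. */
int mandelbrot(double x, double y)
{
    double cr = y - 0.5, ci = x;
    double zr = 0.0, zi = 0.0;
    for (int i = 1; i <= MAX_ITERATIONS; ++i) {
        double temp = zr * zi;
        double zr2 = zr * zr;
        double zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        if (zi2 + zr2 > BAILOUT)
            return i;
    }
    return 0;
}

/* Draws one frame to `out` and returns how many points stayed bounded. */
int render(FILE *out)
{
    int inside = 0;
    for (int y = -39; y < 39; ++y) {
        fputc('\n', out);
        for (int x = -39; x < 39; ++x) {
            if (mandelbrot(x / 40.0, y / 40.0) == 0) {
                fputc('*', out);
                ++inside;
            } else {
                fputc(' ', out);
            }
        }
    }
    return inside;
}

/* Renders `runs` frames back to back and returns the CPU seconds used.
   clock() measures processor time, not wall-clock time. */
double time_batch(int runs, FILE *out)
{
    assert(runs > 0);
    clock_t start = clock();
    for (int r = 0; r < runs; ++r)
        render(out);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_batch(100, stderr) three times in a row and printing each result to stdout would mirror the three "C Elapsed" lines above, with 2>/dev/null hiding the frames themselves.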


For extra fun, I also tried running the JS test using the Rhino compiler:

$ java -cp rhino1_6R5/js.jar -server -XX:CompileThreshold=1 org.mozilla.javascript.tools.shell.Main -O 9 mandelbrot.js 2>/dev/null
JavaScript Elapsed 21.95
JavaScript Elapsed 17.039
JavaScript Elapsed 17.466
JavaScript Elapsed 17.147

Compiled JS is about 9x slower than C on this test. If CPU speed doubles every 18 months, then 9x is a little over three doublings (log2(9) is about 3.2), or roughly 4.75 years, so JS in 2007 performs like C in 2002.


Update: A few more gcc flags have been suggested. -march=pentium4 helps a little, but it's still slower than Java.

$ gcc -O9 -march=pentium4 mandelbrot2.c
$ ./a.out 2>/dev/null
C Elapsed 1.99
C Elapsed 1.99
C Elapsed 1.99

Adding -ffast-math puts C in the lead, but I'm not sure what the downside is. The gcc man page says, "This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions." That sounds like an optimization that Java might not use.

$ gcc -ffast-math -O9 -march=pentium4 mandelbrot2.c
$ ./a.out 2>/dev/null
C Elapsed 1.66
C Elapsed 1.67
C Elapsed 1.67
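
One commonly cited downside: -ffast-math allows the compiler to treat floating-point addition as associative, which it is not. The sketch below (the function names are made up for illustration) shows two groupings of the same sum giving different answers under ordinary IEEE rules. With -ffast-math the compiler would be free to regroup such an expression.

```c
#include <stdio.h>

/* Adds left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    double ab = a + b;
    return ab + c;
}

/* Same three numbers, grouped as a + (b + c). */
double sum_right(double a, double b, double c)
{
    double bc = b + c;
    return a + bc;
}

/* Prints both groupings for a sum where the small term can be absorbed. */
void show_grouping(void)
{
    printf("(a + b) + c = %g\n", sum_left(1e20, -1e20, 1.0));
    printf("a + (b + c) = %g\n", sum_right(1e20, -1e20, 1.0));
}
```

Compiled without -ffast-math, the first grouping yields 1 and the second yields 0, because 1.0 is too small to change -1e20 when the two are added first.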


Update: Several people have claimed that the performance difference is due to fputs (including the top rated comment on reddit, amusingly). That is not correct, at least not on my computer. I tried replacing the print calls with a trivial function (but with side-effects), and it actually helped the Java more than the C:

C Elapsed 1.88
Java Elapsed 1.554
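
A "trivial function with side-effects" on the C side could be as simple as the following sketch. It is an illustration, not the code used for the numbers above, and the names sink, consume and consumed_total are hypothetical. Writing to a volatile variable keeps the compiler from discarding the mandelbrot calls as dead code once the printing is gone.

```c
/* Volatile, so every write has to happen; the compiler cannot drop it. */
static volatile unsigned long sink = 0;

/* Stand-in for the print call: records the result instead of printing it. */
void consume(int iterations)
{
    sink += (unsigned long)iterations;
}

/* Lets the caller read the accumulated total afterwards. */
unsigned long consumed_total(void)
{
    return sink;
}
```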

Many people have pointed out that '-O8' is more than enough 'O' levels. I know, and I don't care -- it's just as good as '-O3' or whatever.

48 comments:

Sriram said...

Can you try running the same tests using the managed JavaScript in Silverlight 1.1? I'm curious to see how it compares.

cg said...

Regarding the JS test, I am more interested in testing it with the next JS engine, Tamarin (or with ActionScript 3.0, whose VM is Tamarin).

vii said...

Firstly, this is not a fair test of language implementation speed as it only measures the ability of the compiler to deal with simple arithmetic.

However, given these reservations, why not level the playing field and compile the C file with optimizations for your processor?

(For me on my low voltage Core Duo at 1GHz.)

With -O9

C Elapsed 3.31
C Elapsed 3.13
C Elapsed 3.14

With -O9 -march=pentium4

C Elapsed 3.03
C Elapsed 2.97
C Elapsed 2.97

With -ffast-math -O9 -march=pentium4

C Elapsed 2.62
C Elapsed 2.57
C Elapsed 2.57

So I think that Java would lose.

I don't want to say that Java is always slower than C, because the JVM has many opportunities to do profile based optimizations that C doesn't.

Kostas said...

Nice test but just one comment: there is no -O8, -O9 or -O100000000000. There is only -O, -O1, -O2, -O3 and -Os. -Owhatever greater than 3 is the same as -O3 (or ignored - I don't remember which).

teki said...

Not only has Java gotten better, the processors have too. This test proves two things:
- the Java compiler doesn't generate too much overhead where it isn't needed
- the processors are much better at chewing through filler code

Wouter said...

Java is getting fast enough. Now they should work on getting the JVM's horrible memory footprint down!

Perseus said...

The real slowdown of Java comes in real projects. Then Java really needs to run on NASA computers...

Sam said...

Another reason why the linked page has slow results could be that it is running on a PPC processor, and AFAIK there is no JIT on PPC yet (unless you use the IBM JVM for Linux), even for Java 6.

In my personal experience, Java numerical code is comparable to C only if the C code has not been enhanced with assembly instructions. In our codebase the only JNI we use (need?) is for matrix/vector operations, which are sent to machine specific BLAS/LAPACK. On these critical operations, Java can be up to 20 times slower than the super optimised native code... but that's not a fair test at all because the BLAS/LAPACK are accessing specially designed instructions (e.g. altivec on PPC).

I've never seen a high level algorithm (for desktop use) written in C outperform a Java version by an order of magnitude. And it would certainly take longer to write the C version! There have been cases I've seen where an academic has written a C implementation of an algorithm and it is so bad that our Java implementation (straight from the research paper) will beat it on both memory usage and cpu cycles.

I guess the moral is that good Java beats bad C, even for numerical code. And that average/good Java is comparable to average/good C... so leave Emacs alone, get back into Eclipse and get some work done ;-)

Andrew.Dashin said...

Weird tests...

Anonymous said...

Like Java, Lisp is slow on the first iteration, but faster than C:

$ echo '(load "mandelbrot.lisp")(with-open-file (*standard-output* "/dev/null" :direction :output :if-exists :append)(dotimes(y 3)(time(dotimes(x 100)(run2)))))(quit)' |sbcl 2>/dev/null |grep 'real time$'
0.614 seconds of real time
0.169 seconds of real time
0.163 seconds of real time

C on the same machine:
$ gcc -O6 -o mandelbrot2 mandelbrot2.c && ./mandelbrot2 2>/dev/null
C Elapsed 2.09
C Elapsed 2.09
C Elapsed 2.09

http://www.scribd.com/doc/94953/mandelbrotlisp

Chris said...

Isn't this showing that the Java JIT compiler is doing its job on arithmetic?

-ffast-math trades the correct answer for speed; bad things happen if you use it to compile arbitrary code.

Alex said...

I'm not sure whether it matters that Java or C is actually faster here. I think it's more important that they are effectively the same order of magnitude! In other words, for the vast majority of kinds of apps, choosing C or Java will not affect whether the program is "fast enough". Since the perception for a long time has been the opposite, I think that's important.

For comparison, there may be many apps where the performance difference between say Ruby and Java is significant in determining whether the language is "fast enough".

And of course we need to remember that these things change rapidly.

Matthew said...

When I run the C code unmodified I get:

C Elapsed 4.29
C Elapsed 4.28
C Elapsed 4.32
(Old P4)

When I make a small optimization I get this:

C Elapsed 2.77
C Elapsed 2.77
C Elapsed 2.77

int mandelbrot(float x, float y)
{
    float cr = y - 0.5;
    float ci = x;
    float zi = 0.0;
    float zr = 0.0;
    int i = 0;

    for (i = 1; i < MAX_ITERATIONS; ++i) {
        float temp = zr * zi;
        float zr2 = zr * zr;
        float zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        if (zi2 + zr2 > BAILOUT)
            return i;
    }
    return 0;
}

John Nowak said...

As usual, this says nothing. The C code isn't optimal (as pointed out by Matthew). Also, just as Java benefits from running more than once, so can C. Making use of -fprofile-arcs can make a big difference in silly numeric benchmarks like this.

Darkfold said...

Yeah, running with floats not doubles took me from 0.015569 to 0.014435 on the Intel compilers (free trials Woohooo). It's good fun :D. I bet a little time working over the C code could speed it up quite a bit more.

Settings -O3, intel64.

GCC would turn in times of 0.016862 with the default OS X release settings and the -O3 and 64bit math.

igouy said...

iirc JVM will use SSE instructions - instead of -ffast-math give gcc the same opportunity to use SSE.

-mfpmath=sse -msse2

(or is it -msse3 on your machine?)

igouy said...

Paul wrote "... only running the test once, which doesn't really give the JVM a chance to show off"
Such an amusing way of brushing aside all that class-loading, chug chug chug...

-XX:CompileThreshold=1
Just use an ahead of time compiler and have done with it! :-)

Barry said...

The people who are saying the C code isn't optimized are really missing the point here. If you spend time reworking C you can optimize it so it will outperform Java. But that time could be better spent writing more code and getting more work done than just retooling the same section of code so it goes uber fast.

Nick Halstead said...

What an absolute rubbish comparison. This proves nothing except that 90% of the time is taken performing the arithmetic calculations. C is as low level as you can get without working directly in assembler. If you know Assembler then you know C maps rather well to most x86 instructions. Which is why you do 'register int i=0;' to make sure the compiler is aware you want to dedicate one register to that variable.

I worked in the game industry for over 10 years, and how many PC games are written in Java? None. Why? Because C/C++ is miles ahead in speed in the hands of people who know how to extract every last ounce out of the CPU. And I can assure you if Java was fast enough it would be used, as it would assist in cross-platform development.

John Connors said...

I'd be a lot more impressed if the respondents actually understood the gcc optimisation flags.

As another poster said, gcc -O3 is the maximum optimisation. I'd also add -ffast-math -march=pentium4 -mfpmath=sse,i387

It'd also be interesting to compare -Os with -O3 : I've seen -Os generate *faster* code many times due to cache hit wins.

Laurent Szyster said...

Sorry Paul, but peer reviewed benchmarks consistently demonstrate that Java is significantly slower than C ... and to achieve even that it has to consume a lot more memory.

http://shootout.alioth.debian.org/debian/benchmark.php?test=all&lang=gcc&lang2=java

In the real-world, Java applications are not only slower than C applications: they also consume between 10 and 100 times more RAM.

The problem of trading space for speed is that a program can run as long as it needs but it cannot use more memory than what is made available by the system.

Which is quite an embarrassment for long-running programs that cannot afford to run their host out of memory.

Add to that the tendency of Java developers to make it infuckingcredibly complicated whenever a simple solution exists and you end up with J2EE CMS that demand 512MB of RAM just to boot and won't stay up longer than a few days of heavy use before they gobble all memory available.

chubbard said...

So in order to get C to run faster you have to play with silly compiler optimizations, none of which the C programmers can agree on. And some of the options can be quite dangerous in general?! Hilarious. It's like watching 3 nerdy kids fight over a video game controller. "Nooo let me. You're not doing it right".

Face it C just isn't the fastest kid around anymore.

Mark said...

Paul - I think java does use the -ffast-math optimisation. It is used by default, and can be turned off using the 'strictfp' keyword.

Mystilleef said...

Did the GCC manual also tell you there is no -O8 or -O9 optimization?

-Os, -O2 and -O3 are the only valid -O optimizations recognized by GCC.

KGR said...

When running the java code on my Dell laptop, if I change:

zi = temp + temp + ci;

to
zi = 2 * temp + ci;

then I get slightly better results.

I wonder why this is?

lborupj said...


KGR wrote "When running the java code on my Dell laptop, if I change zi = temp + temp + ci; to zi = 2 * temp + ci; then I get slightly better results. I wonder why this is?"

Probably because the compiler can replace the 2 * temp with a shift left operation.

David Pollak said...

Back in 1997/1998, I wrote a spreadsheet application (Integer, http://athena.com) and was able to match Excel's numeric performance on a single CPU box and scale linearly up to 16x on SMP boxes.

The application was more than just a "snippet" of code, but was a full-on application that did "lots of stuff."

Complex Java code can be as fast or faster than C code. Except for hairy stuff that's better shuttled off to a special purpose processor (e.g., GPU), it's easier to write Java code that runs faster than C code in the "whole application" context.

Kieron said...

Just an observation: the statement "If CPU speed doubles every 18 months[...]" implies this might be true. It isn't. Far from it; as you might have noticed, this is not happening. Moore's law actually states that the number of transistors doubles every 18 months, not the speed. That is why we are now going multi-core, because they can't ramp the speed up very much.

Kevin Greer said...

lborupj - Probably because the compiler can replace the 2 * temp with a shift left operation.

But 'temp' is a floating point number so bit shifting isn't going to work.
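
To make that concrete, here is a small C sketch. The helper names are invented for this example, and it assumes the usual 32-bit IEEE 754 float. Shifting the raw bits of a float left by one does not double it, whereas temp + temp and 2 * temp are the same exact operation and should produce identical values.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets f's bits as an integer, shifts them left by one, converts back. */
float shift_bits_left(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes sizeof(float) == 4 */
    bits <<= 1;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The two spellings from the comments above. */
float double_by_add(float t) { return t + t; }
float double_by_mul(float t) { return 2 * t; }
```

For 1.5f the shifted bit pattern is 0x7F800000, which is +infinity rather than 3.0, so whatever explains the timing difference, it is unlikely to be an integer shift.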

igouy said...

Following on from Matthew's replacement of the while loop by a for loop ....

Simply using a while loop in an ordinary way may be marginally faster than the for loop -

int mandelbrot(float x, float y)
{
    float cr = y - 0.5;
    float ci = x;
    float zi = 0.0;
    float zr = 0.0;
    int i = 0;
    while (i <= MAX_ITERATIONS) {
        i++;
        float temp = zr * zi;
        float zr2 = zr * zr;
        float zi2 = zi * zi;
        zr = zr2 - zi2 + cr;
        zi = temp + temp + ci;
        if (zi2 + zr2 > BAILOUT) return i;
    }
    return 0;
}

igouy said...

mea culpa it looks like -mfpmath=sse -msse2 makes no noticeable difference on my machine.

Andrew Binstock said...

"If CPU speed doubles every 18 months..." CPU speed does not double every 18 months. I suspect you're thinking of Moore's Law, which says that component density doubles every 18-24 months (he modified the rule several times). The use of the Law as a proxy for speed is not correct. This is even more true in the multicore era.

Barry said...

Well I think everyone is missing something here. If you profile the C program, you will see that the bottleneck is not in the math calculations at all. The bottleneck is in fputs. If you replace these with cout the program runs 2X faster than Java! Of course, now you're using C++. ;) I think math calculations are still going to be much faster in C than in Java which is one reason C and C++ are still used the most in games.

igouy said...

Barry wrote "The bottleneck is in fputs. If you replace these with cout the program runs 2X faster than Java!"

afaict commenting them out completely seemed to make the c program ~10% faster?

of course, you also changed the java program to use BufferedOutputStream and single byte output, didn't you?

Barry said...

My point is, it seems this example is trying to compare the speed of math calculations between 2 languages. To do that you need to take out ALL other factors; in this case printing to the console. You cannot simply comment out System.err.print in Java because more than likely the JVM is going to optimize the run2 method completely out since it doesn't do anything without the trace statements. This is an invalid test case for comparing math calculation speed. Try putting the timer around the math calculations themselves and see what happens. Another thing is that it's extremely difficult ( if not impossible ) to write code that is equivalent in Java and C unless you are a JVM guy and know the internals of how it performs optimizations at runtime.

igouy said...

Barry wrote "My point is, it seems this example is trying to..."
afaict neither Erik Wrenholt's page nor this page say that the comparison is purely math (and obviously the program does produce output).

Barry wrote "...more than likely the JVM is going to optimize the run2 method completely out..."
I think you're guessing - and perhaps you'd agree that guessing won't take us very far in this kind of discussion.

Java Elapsed 4.068
Java Elapsed 3.353 // no err.print
Java Elapsed 0.022 // no if(iterate


Barry wrote "...extremely difficult ( if not impossible ) to write code that is equivalent in Java and C..."
There's an Ada front-end to GCC so Ada programmers make this point by saying - it's the same program if the compiler produces the same assembler.

imo the programs on Erik Wrenholt's page are weird - it seems like there's been an attempt to write programs that are line-by-line the same even though they are in different languages, presumably to suggest that they are as equivalent as can be.
And of course that doesn't take into account that seemingly similar statements in different languages do different things.

Barry said...

You're right I am guessing, and that's the point. What you think the JVM is optimizing today may be completely changed in another version. So maybe in Java 7, even those for loops would be optimized out if you removed the trace statements. I'm actually surprised they're not optimized now.
It's clear the JVM is optimizing as the program runs. You can actually see the asterisks printing faster and faster as it runs. What's not clear is what exactly is being optimized. Is it the math calculations, or the printing, a little of both, or something completely different? Maybe it figures out the program is printing the same thing over and over and compiles the program to simply
print "*"
print "**" etc. And the math is no longer needed. WOW! That would be something, wouldn't it?

igouy said...

Barry wrote "...and that's the point..."
I see that I should have written - you're guessing wildly without any evidence whatsoever :-)

Barry wrote "What you think the JVM is optimizing today may be completely changed..."
Maybe it will, but I think your comments were intended to be about what the JVM is optimizing today.

igouy said...

Paul Buchheit wrote "... people seem to be taking this post to be the "Ultimate C vs Java shootout". It's not."

One of the reasons search engines give so much weight to page titles, is that people give so much weight to titles.

When you title something "Java running faster than C" you really should expect people to take your meaning as - Java is faster (without qualification) than C.

Paul Buchheit wrote "For the majority of applications, speed is no longer a valid excuse..."
For some definition of majority of applications, for some definition of speed...

I have a suspicion that neither of us know much about what the "majority of applications" do.

Sanjeev said...

Andrew, Moore's law has been a reasonable proxy for throughput thus far, and throughput is more relevant to servers (where you would be using Java or Rhino-compiled JavaScript :). Look at Intel's best server chips from 2002q1 and 2007q1 (specint2000):


2002q1: Xeon 2.2GHz (1 core), 9.41
2007q1: Xeon 2.66GHz (4 core), 108

11x more.

Sanjeev said...

I should add specint2000 RATE, which measures throughput.

igouy said...

Paul Buchheit wrote "Update: ... I tried replacing the print calls..."

Cool. Now try Matthew's loop suggestion, or just use

while(i <= MAX_ITERATIONS) {
...
if (zi2 + zr2 > BAILOUT) return i;
}

That improved both programs on my machine, be interesting to see what happens on your hardware.

Georgi said...

I think the sentence

Obviously the results will be different with different code and different machines, but it's clear that the JVM is getting quite fast.

says it all to me. I do not care about compiler flags or "switch-these-lines-and-I-am-faster" thingies. The thing I learned about Java and C today is that they are (regarding performance) not as far from each other as they were in, say, 1999. You do not have to count in orders of magnitude anymore, and that simply is good to know.

Kevin Greer said...

IO is cheaper in C than in Java because C's IO provides neither thread-safety nor Unicode support. This is why Java's lead increased even further when IO was removed.

thelittlebug said...

What's the language Java itself is written in?

try to understand this please: http://www.kano.net/javabench/

regards

MCKAY Brothers said...

Java is not quite faster, because on those machines the JVM optimizes the bytecode; with gcc it all depends on the code and the instructions used, as shown in the many comments posted here. It's obvious.
