In several previous posts I have studied the performance of the Raspberry Pi (version 1) and Node.js to find out why the Raspberry Pi underperforms so badly when running Node.js.
The first two posts indicate that the Raspberry Pi underperforms about 10x compared to an x86/x64 machine, even after compensating for clock frequency. The small cache of the Raspberry Pi is often blamed for its poor performance. In the third post I examined that claim, and the cache is not that horribly bad: roughly 3x worse performance for memory-heavy workloads compared to in-cache situations. The slow SDRAM of the RPi appears to be more of a problem than the small cache itself.
The Benchmark Program
I wanted to relate the Node.js slowdown to some other scripting language, and Lua seemed a good choice. I was lucky to find Mandelbrot implementations in several languages!
I modified the program(s) slightly, increasing the resolution from 80 to 160. I also made a version that did almost nothing (MAX_ITERATIONS=1) so I could measure and subtract the startup cost (which is significant for Node.js) from the actual benchmark values.
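To make the workload concrete, here is a minimal sketch of a Mandelbrot benchmark kernel in JavaScript. It is not the exact program used for the numbers below (that came from existing multi-language implementations); the function names and region bounds are my own, but the structure is the same: iterate z = z² + c per pixel until |z| > 2 or MAX_ITERATIONS is reached.

```javascript
// Minimal Mandelbrot kernel sketch (illustrative, not the benchmarked program).
const MAX_ITERATIONS = 50; // set to 1 to measure startup/overhead only

// Iteration count for point c = (cr, ci) until escape or the cap.
function iterations(cr, ci) {
  let zr = 0, zi = 0, n = 0;
  while (n < MAX_ITERATIONS && zr * zr + zi * zi <= 4) {
    const t = zr * zr - zi * zi + cr; // z^2 + c, real part
    zi = 2 * zr * zi + ci;            // z^2 + c, imaginary part
    zr = t;
    n++;
  }
  return n;
}

// Sweep a size x size grid over [-2, 0.5] x [-1.25, 1.25]; the heavy
// floating-point loop is what the benchmark actually measures.
function mandelbrot(size) {
  let total = 0;
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      total += iterations(-2 + 2.5 * x / size, -1.25 + 2.5 * y / size);
    }
  }
  return total;
}
```

With MAX_ITERATIONS=1 almost no floating-point work is done, so a run at that setting approximates pure interpreter startup and loop overhead.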
The Numbers
Below are the averages of three runs (minus the average of three 1-iteration runs), in ms. The timing values were very stable across runs.
(ms)                              C/Hard   C/Soft  Node.js      Lua
===================================================================
QNAP TS-109 500MHz ARMv5               -    17513    49376    39520
TP-Link Archer C20i 560MHz MIPS        -    45087    65510    82450
RPi 700MHz ARMv6 (Raspbian)          493        -    14660    12130
RPi 700MHz ARMv6 (OpenWrt)           490    11040    15010    31720
RPi2 900MHz ARMv7 (OpenWrt)          400     9130      770    29390
Eee701 900MHz Celeron x86            295        -      500     7992
3000MHz Athlon II X2 x64              56        -       59     1267
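The averaging and baseline subtraction described above can be sketched as follows (a sketch, not the actual measurement script; the stand-in workloads here are mine, and `process.hrtime.bigint()` requires a modern Node.js):

```javascript
// Average the wall-clock time of several runs of a work function (in ms).
function averageMs(fn, runs = 3) {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    fn();
    totalMs += Number(process.hrtime.bigint() - start) / 1e6; // ns -> ms
  }
  return totalMs / runs;
}

// Stand-in workloads: the real benchmark ran the Mandelbrot program.
function fullRun() { let s = 0; for (let i = 0; i < 1e6; i++) s += Math.sin(i); return s; }
function baselineRun() { return 0; } // corresponds to the MAX_ITERATIONS=1 version

// Reported number = average of full runs minus average of near-empty runs.
const netMs = averageMs(fullRun) - averageMs(baselineRun);
```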
Notes on Hard/Soft floats:
- Raspbian is armhf, only allowing hard floats (-mfloat-abi=hard)
- OpenWrt is armel, allowing both hard floats (-mfloat-abi=softfp) and soft floats (-mfloat-abi=soft).
- The QNAP has no FPU and generates a runtime error with hard floats
- The other targets produce linkage errors with soft floats
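The float ABI is chosen at compile time. Hypothetical invocations (the file names are mine) might look like this:

```shell
# -mfloat-abi selects how floating point is generated and passed:
gcc -O2 -mfloat-abi=soft   -o mandel-soft   mandel.c  # pure software FP, no FPU instructions
gcc -O2 -mfloat-abi=softfp -o mandel-softfp mandel.c  # FPU instructions, soft-float calling convention
gcc -O2 -mfloat-abi=hard   -o mandel-hard   mandel.c  # FPU instructions and registers (armhf)
```

Note that softfp still uses the FPU for the actual arithmetic; only the calling convention differs from hard, which is why the two perform similarly.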
The Node.js versions are slightly different, and so are the Lua versions. This makes no significant difference.
Findings
Calculating the Mandelbrot with the FPU is basically “free” (<0.5s). Everything else is waste and overhead.
The cost of soft float is about 10s on the RPi. The difference between Node.js on Raspbian and OpenWrt is quite small – either both use the FPU, or neither does.
Now, the interesting thing is to compare the RPi with the QNAP. For the C program with soft floats, the QNAP is about 1.5x slower than the RPi. This matches earlier benchmarks I have made (see the 1st and 3rd links at the top of the post) well. If the RPi had been using soft floats in Node.js, it would have completed in about 30 seconds (based on the QNAP's 50 seconds). The only explanation I can come up with for the unusually large difference between the QNAP and the RPi in this test is that the RPi actually utilizes the FPU (on both Raspbian and OpenWrt).
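The 30-second estimate falls out of simple scaling of the table values:

```javascript
// Back-of-envelope check using values from the table above (in ms).
const qnapCSoft = 17513; // QNAP, C with soft floats
const rpiCSoft  = 11040; // RPi (OpenWrt), C with soft floats
const qnapNode  = 49376; // QNAP, Node.js

const qnapSlowdown  = qnapCSoft / rpiCSoft; // QNAP is ~1.6x slower than the RPi
const rpiNodeIfSoft = qnapNode / qnapSlowdown; // expected RPi Node.js time IF soft float

console.log(qnapSlowdown.toFixed(2), Math.round(rpiNodeIfSoft)); // ~1.59, ~31000 ms
```

The measured RPi Node.js times (~14700–15000 ms) are roughly half that estimate, which is what suggests the FPU is actually being used.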
OpenWrt and FPU
The poor Lua performance in OpenWrt is probably due to two things:
- OpenWrt is compiled with -Os rather than -O2
- OpenWrt by default uses -mfloat-abi=soft rather than -mfloat-abi=softfp (which is essentially like hard).
It is important to notice that -mfloat-abi=softfp not only makes programs much faster, but also noticeably smaller (about 10%), which would be valuable in OpenWrt.
Different Node.js versions and builds
I have been building Node.js many times for Raspberry Pi and OpenWrt. The above soft/softfp setting for building node does not affect performance much, but it does affect binary size. Node.js v0.10 is faster on Raspberry Pi than v0.12 (which needs some patching to build).
Lua
Apart from the un-optimized OpenWrt Lua build, Lua is consistently 20-25x slower than native on RPi/x86/x64. So it is not the case that the small cache of the RPi, or some other limitation of the CPU, penalizes interpreted languages more than on x86/x64.
RPi ARMv6 VFPv2
While perhaps not the best FPU in the world, the VFPv2 floating point unit of the RPi ARMv6 delivers quite decent performance (slightly worse per clock cycle) compared to x86 and x64. It does not seem like the VFPv2 is to be blamed for the poor performance of Node.js on ARM.
Conclusion and Key finding
While Node.js (V8) for x86/x64 is near-native-speed, on the ARM it is rather near-Lua-speed: just another interpreted language, mostly. This does not seem to be caused by any limitation or flaw in the (RPi) ARM cpu, but rather the V8 implementation for x86/x64 being superior to that for ARM (ARMv6 at least).
What about RPI 3 ? I’ve been running Node on Pi 1 and it’s seriously painful (the starting delay when you develop…) and I was thinking to buy a new RPI 3.
I have not tried an RPi3. It runs at 1200MHz instead of 900MHz, and I would expect the performance difference to reflect that. The RPi2, on the other hand, is a magnitude faster than the RPi1 – not because of 900MHz vs 700MHz, but because of other architectural details.
When it comes to the startup time of Node, I think the "snapshot" option matters. When (if) you build Node.js, you can choose to include a snapshot or not. If you do, it will speed up the startup of Node.js itself. To build with snapshot, I think you cannot use a cross compiler, so forget about it for OpenWrt (if that is what you use).
I find it interesting that you are bashing Lua when Node.js is just as slow as Lua. There is more to it than the raw speed of the language. You fail to consider the speed of the actual web application framework. You should check out the Mako Server, a framework with integrated Lua support. This server is super fast on small devices, including the RPi.
Jamie, sorry you felt that way! I think Lua is quite a great language, and I am curious to try out Luvit (the reason I am stuck with Node.js and JavaScript is that most of the code I write needs to run both backend and frontend, and then JavaScript is the obvious option).
The ONLY purpose of this article was to try to establish why Node.js is so surprisingly much slower on RPIv1 than RPIv2/x86/x64.
For that purpose, a simple interpreted language was a good reference. OpenWrt has excellent Lua support, which is why I chose Lua. The Lua performance is more predictable and understandable than the Node.js performance. That was all.