Category Archives: OpenWrt

Upgrading OpenWRT to LEDE

A bit late, but I wanted to upgrade OpenWRT 15.05 to LEDE 17.01.4.

It worked perfectly for my WDR 4900. The OpenWRT-to-LEDE-rebranding caused no problems.

I basically followed my own upgrade instructions.
I also took advantage of adding files and folders to /etc/sysupgrade.conf. Those were automatically kept during the upgrade, which is nice.
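As a minimal sketch of the sysupgrade.conf step: the keep-list is a plain text file with one path per line, so entries can be appended with a small helper. The helper name add_keep and the example paths are hypothetical, not taken from the setup described above.

```shell
# Sketch: append a path to a sysupgrade keep-list, but only once.
# add_keep and the paths in the usage line are hypothetical examples.
# usage: add_keep /etc/sysupgrade.conf /etc/openvpn/
add_keep() {
    conf="$1"
    path="$2"
    # -x matches the whole line, -F treats the path as a fixed string
    grep -qxF -- "$path" "$conf" 2>/dev/null || printf '%s\n' "$path" >> "$conf"
}
```

Calling it twice with the same path leaves a single entry, so it is safe to re-run before every upgrade.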

Conclusion (based on one successful upgrade): if you are an old OpenWRT fan there is no reason to fear LEDE and wait for a new OpenWRT release before upgrading.

Node.js 6 on OpenWrt

I have managed to produce a working Node.js 6 binary for OpenWrt and RPi (brcm2708/brcm2709).

Binaries

15.05.1: brcm2708 6.9.5
15.05.1: brcm2709 6.9.5
15.05.1: mvebu 6.9.5 Please test (on WRT1x00AC router) and get back to me with feedback
15.05.1: x86 6.9.5 Please test and get back to me with feedback

Note: all the binaries work with equal performance on RPi v2 (brcm2709). For practical purposes the brcm2708 may be the only binary needed.

How to build 6.9.5 brcm2708/brcm2709
The procedure is:

  1. Set PATH and STAGING_DIR
  2. Set a few compiler flags and run configure with not so few options
  3. Fix nearbyint/nearbyintf
  4. Fix config.gypi
  5. make

1. I have a little script to set my toolchain variables.

# file:  env-15.05.1-brcm2709.sh
# usage: $ source ./env-15.05.1-brcm2709.sh

PATH=/path/to/staging_dir/bin:$PATH
export PATH

STAGING_DIR=/path/to/staging_dir
export STAGING_DIR

Your path should now contain arm-openwrt-linux-uclibcgnueabi-g++ and other binaries.
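A quick way to confirm that the environment script did its job is to check that the cross tools resolve on PATH. This is only a sketch; the function name check_toolchain is made up for illustration.

```shell
# Sketch: verify that every named tool can be found on PATH.
# check_toolchain is a hypothetical helper name.
check_toolchain() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 1
        fi
    done
    return 0
}

# usage (after sourcing the env script):
# check_toolchain arm-openwrt-linux-uclibcgnueabi-gcc \
#                 arm-openwrt-linux-uclibcgnueabi-g++ \
#                 arm-openwrt-linux-uclibcgnueabi-ld
```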

2. (brcm2709 / mvebu) I have another script to run configure:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm --without-ssl --without-intl --without-inspector

bash --norc

Please note that this script was the first one that worked. It may not be the best, and some things may not be needed. --without-intl and --without-inspector helped me avoid build errors. If you need those features you have more work to do.

2. (brcm2708)

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm --without-ssl --without-intl --without-inspector

bash --norc

3. Use “grep -nR nearbyint” to find and replace:

  nearbyint => round
  nearbyintf => roundf

This may not be a good idea! However, nearbyint(f) is not supported in OpenWrt, and with the above replacements Node.js builds and passes the octane benchmark, so it is not that broken. I suppose there is a correct way to replace nearbyint(f).
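The find-and-replace can be scripted. The sketch below assumes GNU grep and GNU sed (for -i and \b); the function name replace_nearbyint is hypothetical, and the directory to run it on is left to the caller. One reason the swap is not exact: nearbyint follows the current rounding mode (by default ties go to the even neighbour), while round always rounds ties away from zero.

```shell
# Sketch: replace nearbyint/nearbyintf with round/roundf in a source tree.
# Assumes GNU sed; replace_nearbyint is a hypothetical helper name.
# usage: replace_nearbyint path/to/source
replace_nearbyint() {
    dir="$1"
    # list only the files that mention nearbyint or nearbyintf
    grep -rlE 'nearbyintf?' "$dir" | while IFS= read -r f; do
        # \b keeps the two patterns from matching inside longer identifiers
        sed -i -E 's/\bnearbyintf\b/roundf/g; s/\bnearbyint\b/round/g' "$f"
    done
}
```

It is worth running this on a copy of the tree first and reviewing the diff before building.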

4. Add to config.gypi:

{ 'target_defaults': {
    'cflags': [ '-D__STDC_LIMIT_MACROS' ,'-D__STDC_CONSTANT_MACROS'],
    'ldflags': [ '-Wl,-rpath,/path/to/staging_dir/lib/' ]},

These are just compilation error workarounds.

This works for me.

Dependencies
You need to install dependencies in OpenWrt:

# opkg update
# opkg install librt
# opkg install libstdcpp

Performance
My initial tests indicate that Node.js v6 is a little (~2%) slower than Node.js 4 on ARM v7 (RPi v2).

Other targets
mvebu: I will build a binary, but I need help to test
x86/x86_64: This should be easy, but I see little need/use. Let me know if you want a binary.
mpc85xx: The chip is quite capable, but the PowerPC port of Node.js will most likely never support it.

Most MIPS architectures lack FPU and are truly unsuitable for running Node.js.

std::snprintf
It seems the OpenWrt C++ std library does not support std::snprintf. You can replace it with just snprintf and add #include <stdio.h> in the file:
deps/v8_inspector/third_party/v8_inspector/platform/inspector_protocol/String16_cpp.template
However, this is not needed when --without-inspector is applied.

Node.js 6.12.2
I have failed to build Node.js 6.12.2 on x86; the build stops with an openssl error.

Node.js 7
I have failed to build Node.js 7 before. But perhaps I will give it another try now that Node.js 6 is working.

Older versions of Node.js
I have previously built and distributed Node.js 4 for OpenWrt.

OpenWrt, easy-rsa, openvpn and stunnel

Certificates are confusing. I wanted to generate self-signed certificates on OpenWrt using easy-rsa, and use them for openvpn and stunnel. Below are the relevant commands and configurations.

easy-rsa
The vpn guide for OpenWrt is quite good. A summary:

# cd /etc/easy-rsa
# vim vars                   -- edit as you like
# source ./vars
# build-ca                   -- generates ca.crt
# build-dh                   -- generates dh2048.pem
# build-key-server myserver  -- generates myserver.[crt+key+csr]
# build-key myclient         -- generates myclient.[crt+key+csr]

For stunnel purposes, you need to copy/rename your .crt file to .pem. The content is the same.

The .csr files are not needed. The clients need the ca.crt plus their .crt (or .pem) and .key files.
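The copy to .pem, plus a sanity check that a certificate really was signed by the CA, could look like the sketch below. The function names are hypothetical and the directory holding the generated files is passed in, since it depends on the easy-rsa setup. openssl verify -CAfile is the standard OpenSSL command for checking a certificate against a CA file.

```shell
# Sketch: prepare and check easy-rsa output for stunnel.
# crt_to_pem and verify_cert are hypothetical helper names.

# usage: crt_to_pem DIR NAME   (copies DIR/NAME.crt to DIR/NAME.pem)
crt_to_pem() {
    cp "$1/$2.crt" "$1/$2.pem"
}

# usage: verify_cert DIR NAME  (checks DIR/NAME.crt against DIR/ca.crt)
verify_cert() {
    openssl verify -CAfile "$1/ca.crt" "$1/$2.crt"
}
```

For example, crt_to_pem followed by verify_cert for both myserver and myclient catches a mismatched CA before stunnel refuses the connection.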

openvpn server

option ca '/etc/openvpn/ca.crt'
option cert '/etc/openvpn/myserver.crt'
option key '/etc/openvpn/myserver.key'
option dh '/etc/openvpn/dh2048.pem'

openvpn client

option ca '/etc/openvpn/ca.crt'
option cert '/etc/openvpn/myclient.crt'
option key '/etc/openvpn/myclient.key'

stunnel server

cert = /etc/stunnel/myserver.pem
key = /etc/stunnel/myserver.key
CAfile = /etc/stunnel/ca.crt
verify = 2

stunnel client

cert = /etc/stunnel/myclient.pem
key = /etc/stunnel/myclient.key
CAfile = /etc/stunnel/ca.crt
verify = 2

It looks very simple now, but without a working configuration it is not so easy to find the error.

OpenWRT on Eee701

I ran OpenWRT on my Eee701 (mostly to test Node.js). A few notes…
Use the combined image: openwrt-15.05-x86-generic-combined-ext4.img.gz
Unpack it. Write it to a USB drive with dd. For me, it boots my Eee without any modification.
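Unpacking and writing can be done in one step. This is a sketch for a Linux host; write_image is a hypothetical helper name and /dev/sdX stands for whatever device node the USB drive gets. Double-check the target, since dd overwrites it without asking.

```shell
# Sketch: decompress a .img.gz and write it straight to a target.
# write_image is a hypothetical helper; the target is a placeholder.
# usage: write_image openwrt-15.05-x86-generic-combined-ext4.img.gz /dev/sdX
write_image() {
    image_gz="$1"
    target="$2"
    # gunzip -c streams the image to stdout without unpacking it to disk
    gunzip -c "$image_gz" | dd of="$target" bs=1M
}
```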

Networking
You probably want networking.
Download: kmod-atl2_3.18.20-1_x86.ipk.
I suggest you put it in /root on the filesystem of the above mentioned image, before starting up.

As with the RPi, you might want to edit your /etc/config/network to act like a pure client on the network:

config interface 'lan'
	option ifname 'eth0'
	option proto 'dhcp'
	option macaddr 'XX:XX:XX:XX:XX:XX'
	option hostname 'rpiopenwrt'

USB Storage
It seems you need to install usb-block-support the usual way… and then it is good to have network.

Node.js 4 on OpenWrt

Update 2017-02-27: I have built Node.js 6 for OpenWRT.
Update 2017-02-20: I migrated the files from DropBox since their public shares will stop working.
Update 2017-02-20: Updated binaries for OpenWRT 15.05.1 and Node.js 4.7.3.

Node.js is merged with io.js, and after Node.js 0.12.7 came version 4.0.0.

Well, the good news is that V8 seems to be completely and officially supported on Raspberry Pi (ARMv6+VFPv2) again (it has been a little in and out).

I intend to build and benchmark Node.js for different possible (and impossible) OpenWRT targets, and share a few binaries.

Binaries

Target                    Binaries        Comments
14.07: brcm2708           4.0.0
15.05: x86                4.1.0
15.05: brcm2708           4.1.0           also works for brcm2709 Raspberry Pi 2
15.05: brcm2709           4.1.2
15.05: mvebu              4.1.2           Not Tested! Please test, run octane-benchmark, and let me know!
15.05: ramips/mt7620      0.10.40
r47168: ramips/mt7620     4.1.2           requires kernel FPU emulation (get custom built r47168)
15.05.1: brcm2708         4.4.5, 4.7.3
15.05.1: brcm2709         4.4.5, 4.7.3
15.05.1: mvebu            4.7.3           Not Tested! Please test, run octane-benchmark, and let me know!

You need to install dependencies:

# opkg update
# opkg install librt
# opkg install libstdcpp

Benchmarks
Octane (1.0.0) Benchmark:

Target        System             CPU        Score      Time
brcm2708      Raspberry Pi v1    700Mhz       97.1     2496s
brcm2708      Raspberry Pi v2    900Mhz     1325        198s
brcm2709      Raspberry Pi v2    900Mhz     1298        198s
x86           Eee701             900Mhz     2559        118s
mt7620        Archer C20i        ( 64 MB RAM not enough )

Performance has been very consistent through different versions of OpenWRT and Node.js.

Building x86
With the 15.05 toolchain, this script configured Node.js 4.1.0

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="i486-openwrt-linux-uclibc-gcc"
export CXX="i486-openwrt-linux-uclibc-g++"
export LD="i486-openwrt-linux-uclibc-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=x86 --dest-os=linux --without-npm

bash --norc

Then just run make, and wait.

Building brcm2708 (Raspberry Pi v1)
I configured
– Node.js 4.0.0 with 14.07 toolchain,
– Node.js 4.1.0 with 15.05 toolchain,
– Node.js 4.4.5 with 15.05.1 toolchain
with the following script:

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -march=armv6j -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Then just run make, and wait.

Building brcm2709 (Raspberry Pi v2)
I configured Node.js 4.1.2 with 15.05 toolchain and 4.4.5 with 15.05.1 toolchain with the following script:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"

export CFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"
export CPPFLAGS="-isystem${CSTOOLS_INC} -mfloat-abi=softfp"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Building ramips/mt7620 (Archer C20i)
For Ramips mt7620, Node.js 0.10.40 runs on standard 15.05 and I have posted build instructions for 0.10.38/40 before.

For Node.js 4, you need kernel FPU emulation (which is normally disabled in OpenWRT). The following script configures Node.js 4 for trunk (r47168, to be DD).

#!/bin/sh -e

export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-musl-gcc"
export CXX="mipsel-openwrt-linux-musl-g++"
export LD="mipsel-openwrt-linux-musl-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm --with-mips-float-abi=soft
bash --norc

Without FPU emulation you will get ‘Illegal Instruction’ and Node.js will not run.

ar71xx (TP-Link WDR3600)
Without a custom built FPU-emulator-enabled kernel, a WDR3600 gives:

root@wdr3600-1505-std:/tmp# ./node 
Illegal instruction

However, with FPU emulation enabled:

root@wdr3600-1505-fpu:/tmp# ./node 
undefined:1



SyntaxError: Unexpected end of input
    at Object.parse (native)
    at Function.startup.processConfig (node.js:265:27)
    at startup (node.js:33:13)
    at node.js:963:3

Same result for 4.1.2 and 4.2.2. That is as far as I have got with ar71xx at the moment (20151115).

Notes on Raspberry Pi and Serial

I experimented with my Raspberry Pi (v1 B) and a serial cable, a USB-serial identified as:

[85907.504415] usb 4-5: new full-speed USB device number 19 using ohci-pci
[85907.730850] usb 4-5: New USB device found, idVendor=0403, idProduct=6001
[85907.730863] usb 4-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[85907.730871] usb 4-5: Product: TTL232R-3V3
[85907.730877] usb 4-5: Manufacturer: FTDI
[85907.730882] usb 4-5: SerialNumber: ********
[85907.737978] ftdi_sio 4-5:1.0: FTDI USB Serial Device converter detected
[85907.738070] usb 4-5: Detected FT232RL
[85907.744057] usb 4-5: FTDI USB Serial Device converter now attached to ttyUSB1

My USB-serial-device has six cables: black-brown-red-orange-yellow-green.
Connected to the RPi from the corner pin: none-none-black-yellow-orange-none8x.

At this point I have had no success with minicom. Screen works though:

sudo minicom -b 115200 -o -D /dev/ttyUSB1
sudo screen /dev/ttyUSB1 115200

When serial works, my procedure is:

  1. Connect everything except power
  2. Start screen
  3. Connect power
  4. Within a few seconds I get output

If I start a fresh default NOOBS (v1.4):

Uncompressing Linux... done, booting the kernel.

Welcome to the rescue system
recovery login: 

You can log in with root/raspberry, but I don’t know if you are meant to (can) install Raspbian this way.

NOTE: The Raspberry Pi itself prints nothing to the serial console. You get output only with a properly installed SD-card inserted.

Already installed System
For an already installed Raspbian, I got a normal login prompt over serial.
For an already installed OpenWRT (14.07), I got a root prompt, no password required, over serial.

Formatting SD-card using Linux
Sometimes it is hard to produce an SD-card that the Raspberry Pi wants to boot from.
This partitioning and formatting works:

$ sudo /sbin/fdisk -l /dev/sde

Disk /dev/sde: 7,4 GiB, 7948206080 bytes, 15523840 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00055f28

Device     Boot Start      End  Sectors  Size Id Type
/dev/sde1        2048 15523839 15521792  7,4G  e W95 FAT16 (LBA)

gt@oden:~/Downloads$ sudo mkfs.vfat /dev/sde1
mkfs.fat 3.0.27 (2014-11-12)

To be on the safe side, before using fdisk:

$ sudo dd if=/dev/zero of=/dev/sde bs=1024 count=10240

OpenWrt 15.05 on Legacy Devices (16MB RAM)

There are 86 devices on the OpenWrt homepage listed as supported, but with only 16MB of RAM. Those devices work just fine with OpenWrt Backfire 10.03.1, but not with more recent OpenWrt releases.

I myself own a Linksys WRT54GL and I used Barrier Breaker 14.07 with some success.

With 15.05 there is a new feature available: zram-swap. A bit simplified, it means the system can compress its memory, effectively making better use of it.

I decided to try out 15.05 RC3 on my WRT54GL.

The standard image
The standard image is 3936256 bytes and the device page for WRT54GL says: “As the WRT54GL has only 4Mb flash, any image sent to the device must be 3866624 bytes or smaller.” So the standard image is out of the question. Instead I downloaded the Image Builder from the same folder.

The Image Builder
The Image Builder is very easy to use and requires an x86-64 Linux computer.

make image PROFILE=Broadcom-b43 PACKAGES="zram-swap -kmod-ppp -kmod-pppox -kmod-pppoe -ppp -ppp-mod-pppoe -kmod-b43 -kmod-b43legacy -kmod-mac80211 -kmod-cfg80211"

After a little while this has produced custom images, minus ppp-stuff, minus wireless stuff (more on that later), plus zram-swap. Also, LuCI is not there. The image is found in bin/brcm47xx, is 3012 kB, and is installed the normal way on your WRT54GL.
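Before flashing, it is cheap to check the built image against the 3866624-byte limit quoted from the device page. A minimal sketch; the function name fits_flash is hypothetical and the image file name is whatever the Image Builder produced.

```shell
# Sketch: succeed only if an image fits in the flash limit.
# fits_flash is a hypothetical helper; the default limit is the
# 3866624 bytes quoted for the WRT54GL.
# usage: fits_flash IMAGE [MAX_BYTES]
fits_flash() {
    image="$1"
    max="${2:-3866624}"
    # arithmetic expansion strips any padding wc may add
    size=$(( $(wc -c < "$image") ))
    [ "$size" -le "$max" ]
}
```

The standard image at 3936256 bytes would fail this check, while the roughly 3012 kB custom image passes.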

Trying 15.05
Logging in via ssh (dropbear) is fine:

BusyBox v1.23.2 (2015-06-18 17:05:04 CEST) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 CHAOS CALMER (15.05-rc3, r46163)
 -----------------------------------------------------
  * 1 1/2 oz Gin            Shake with a glassful
  * 1/4 oz Triple Sec       of broken ice and pour
  * 3/4 oz Lime Juice       unstrained into a goblet.
  * 1 1/2 oz Orange Juice
  * 1 tsp. Grenadine Syrup
 -----------------------------------------------------
root@OpenWrt:~# 

Top looks tight but not alarming (as usual):

Mem: 11568K used, 1056K free, 44K shrd, 1208K buff, 3228K cached
CPU:   8% usr   8% sys   0% nic  83% idle   0% io   0% irq   0% sirq
Load average: 0.13 0.23 0.11 1/31 1061
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1061  1056 root     R     1488  12%  17% top
  631     1 root     S     1656  13%   0% /sbin/netifd
  848     1 root     S     1492  12%   0% /usr/sbin/ntpd -n -S /usr/sbin/ntpd-h
 1056  1055 root     S     1492  12%   0% -ash
  735   631 root     S     1488  12%   0% udhcpc -p /var/run/udhcpc-eth0.2.pid
    1     0 root     S     1444  11%   0% /sbin/procd
 1055   757 root     S     1224  10%   0% /usr/sbin/dropbear -F -P /var/run/dro
  650     1 root     S     1196   9%   0% /usr/sbin/odhcpd
  757     1 root     S     1156   9%   0% /usr/sbin/dropbear -F -P /var/run/dro
  580     1 root     S     1060   8%   0% /sbin/logd -S 16
  868     1 nobody   S      996   8%   0% /usr/sbin/dnsmasq -C /var/etc/dnsmasq
  308     1 root     S      916   7%   0% /sbin/ubusd
  737   631 root     S      812   6%   0% odhcp6c -s /lib/netifd/dhcpv6.script
  337     1 root     S      772   6%   0% /sbin/askfirst /bin/ash --login
    4     2 root     SW       0   0%   0% [kworker/0:0]
    8     2 root     SW       0   0%   0% [kworker/u2:1]
    3     2 root     SW       0   0%   0% [ksoftirqd/0]
   14     2 root     SW       0   0%   0% [kswapd0]
    6     2 root     SW       0   0%   0% [kworker/u2:0]
  237     2 root     SWN      0   0%   0% [jffs2_gcd_mtd5]

The swap seems to work, at least in theory:

root@OpenWrt:~# free
             total         used         free       shared      buffers
Mem:         12624        11552         1072           44         1208
-/+ buffers:              10344         2280
Swap:         6140           72         6068

But that is the end of the good news.

opkg runs out of memory
Trying to install a package fails (in a new way):

# opkg install kmod-b43
Installing kmod-b43 (3.18.17+2015-03-09-3) to root...
Downloading http://downloads.openwrt.org/chaos_calmer/15.05-rc3/brcm47xx/legacy/packages/base/kmod-b43_3.18.17+2015-03-09-3_brcm47xx.ipk.
Collected errors:
 * gz_open: fork: Cannot allocate memory.
 * opkg_install_pkg: Failed to unpack control files from /tmp/opkg-lE7SIf/kmod-b43_3.18.17+2015-03-09-3_brcm47xx.ipk.
 * opkg_install_cmd: Cannot install package kmod-b43.

This happens also without zram-swap installed. I tried different packages but none of those I tried installed successfully. Effectively opkg is broken. One way to deal with this is to build an image with exactly the packages I need, and rebuild the image every time I want a new package. Which leads me to the next problem.

sysupgrade runs out of memory
I have found that flashing my 15.05 image to my WRT54GL (from 10.03.1 or 14.07) is fine. But flashing from 15.05 is tricky because it seems there is not enough RAM for sysupgrade. And it is quite scary when sysupgrade stalls, because you don't know if it is in the middle of flashing but failing to let you know.

One way to get around this is to flash a smaller image that uses less space on /tmp. I tried 8.09.1 for the first time ever for this reason. Another (not recommended) way is to pipe from nc to mtd directly.

I found out (the hard way) about system recovery mode: start your WRT54GL, press the reset button on the back side (more is better), and it starts in recovery mode where you can telnet to it and sysupgrade runs just fine.

Even in recovery mode not everything is fine: for example, when I tried the firstboot command it did not finish properly and I had to reset the WRT54GL.

A few times I forgot to use the -n option with sysupgrade: that is not such a good thing when you run it in recovery mode and perhaps flash a different firmware version.

testing wifi
I built a new image with WiFi installed and flashed it from failsafe mode:

make image PROFILE=Broadcom-b43 PACKAGES="zram-swap -kmod-ppp -kmod-pppox -kmod-pppoe -ppp -ppp-mod-pppoe"

Well, I tried different things… on one occasion I had WiFi without encryption working. However, most of the time, activating WiFi just makes the WRT54GL unresponsive or very slow.

Conclusion
zram-swap is not the silver bullet that makes OpenWrt run on 16MB devices. As with 14.07, you can probably use Image Builder to build a useful minimal image: get rid of the firewall, the WiFi, LuCI of course, and use it for something else – fine! But as a WiFi router: use Tomato or 10.03.1 instead.

For now, my WRT54GL is flashed with 10.03.1, completely unconfigured, and stored away for future adventures. At least it is not bricked, and I never needed to connect a TTL-cable to it.

Building Node.js for OpenWrt (mipsel)

Update 2015-10-11: See separate post for Node version 4 for different OpenWrt targets. Information about v4 added below.

I managed to build (and run) Node.js for OpenWrt on my Archer C20i with a MIPS 24K Little Endian CPU, without FPU (target=ramips/mt7620).

Node.js v0.10.40
First edit (set to false):

deps/v8/build/common.gypi (around lines 54-64)

    # Similar to vfp but on MIPS.
    'v8_can_use_fpu_instructions%': 'false',

    # Similar to the ARM hard float ABI but on MIPS.
    'v8_use_mips_abi_hardfloat%': 'false',

For 15.05 I use this script to run configure:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-uclibc-gcc"
export CXX="mipsel-openwrt-linux-uclibc-g++"
export LD="mipsel-openwrt-linux-uclibc-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm

bash --norc

Then just “make”. I have uploaded the compiled binary node to DropBox.

Compilation for (DD) trunk (with musl rather than uclibc) fails for v0.10.40.

Node.js v4.1.2
Node.js v4 does not run without an FPU. Normally Linux emulates an FPU if it is not present, but this feature is disabled in OpenWRT. I built and published r47168 with FPU emulation and Node v4.1.2.

Node.js 4.1.2 is configured like:

#!/bin/sh -e

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib

export CC="mipsel-openwrt-linux-musl-gcc"
export CXX="mipsel-openwrt-linux-musl-g++"
export LD="mipsel-openwrt-linux-musl-ld"

export CFLAGS="-isystem${CSTOOLS_INC}"
export CPPFLAGS="-isystem${CSTOOLS_INC}"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=mipsel --dest-os=linux --without-npm --with-mips-float-abi=soft

bash --norc

Dependencies
In order to run the node binary on OpenWrt you need to install:

# opkg update
# opkg install librt
# opkg install libstdcpp

Performance
The 64MB of RAM of my Archer C20i is not sufficient to run octane-benchmark (even if the node binary and the benchmark are stored on a USB drive). However, I have a Mandelbrot benchmark that I can run. For Archer C20i, timings are:

C/Soft Float                     48s
Lua                              82s
Node.js v0.10.40 (soft float)    65s
Node.js v4.1.2 (FPU emulation)  444s (63s user, 381s kernel)

Clearly, the OpenWrt developers have a good reason to leave FPU emulation out. However, for Node.js in the future, FPU emulation seems to be the only way. My Mandelbrot benchmark is of course ridiculously dependent on FPU performance. For more normal usage, perhaps the penalty is less significant.

Other MIPS?
The only other MIPS I have had the opportunity to try was my WDR3600, a Big Endian 74K. It does not work:

  • v0.10.38 does not build at all (big endian MIPS seems unsupported)
  • v0.12.* builds, but it does not run (floating point exceptions), even though I managed to build for Soft Float.

I need to try rebuilding OpenWRT with FPU emulation for ar71xx, then perhaps Node.js v4 will work.

Effects of cache on performance

It is not clear to me why Node.js is so amazingly slow on a Raspberry Pi (article 1, article 2).

Is it because of the small cache (16kb+128kb)? Is Node.js emitting poor code on ARM? Well, I decided to investigate the cache issue. The 128kb cache of the Raspberry Pi is supposed to be primarily used by the GPU; is it actually effective at all?

A suitable test algorithm
To understand what I test, and because of the fun of it, I wanted to implement a suitable test program. I can imagine a good test program for cache testing would:

  • be reasonably slow/fast, so measuring execution time is practical and meaningful
  • have working data sets in sizes 10kb-10Mb
  • the same problem should be solvable with different work set sizes, in a way that the theoretical execution time should be the same, but the difference is because of cache only
  • be reasonably simple to implement and understand, while not so trivial that the optimizer just gets rid of the problem entirely

Finally, I think it is fun if the program does something slightly meaningful.

I found that Bubblesort (and later Selectionsort) were good problems, if combined with a quasi twist. Original bubble sort:

Array to sort: G A F C B D H E   ( N=8 )
Sorted array:  A B C D E F G H
Theoretical cost: N^2/2 = 64/2 = 32
Actual cost: 7+6+5+4+3+2+1     = 28 (compares and conditional swaps)

I invented the following cache-optimized Bubble-Twist-Sort:

Array to sort:                G A F C B D H E
Sort halves using Bubblesort: A C F G B D E H
Now, the twist:                                 ( G>B : swap )
                              A C F B G D E H   ( F>D : swap )
                              A C D B G F E H   ( C<E : done )
Sort halves using Bubblesort: A B C D E F G H
Theoretical cost = 16/2 + 16/2 (first two Bubblesorts)
                 + 4/2         (expected number of twist-swaps)
                 + 16/2 + 16/2 (second two Bubblesorts)
                 = 34
Actual cost: 4*(3+2+1) + 2 = 26

Anyway, for larger arrays the actual costs get very close. The idea here is that I can run a Bubblesort on 1000 elements (effectively using 1000 units of memory intensively for ~500000 operations). But instead of doing that, I can replace it with 4 runs on 500 elements (4 * ~125000 operations + ~250 operations). So I am solving the same problem, using the same algorithm, but optimizing for smaller cache sizes.
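The claim that the costs converge can be checked with a short derivation. This is a sketch under the counting used in the example above: a Bubblesort of N elements costs (N-1)+(N-2)+...+1 compares, and the twist is charged one unit per swap. The symbols B, T and t are introduced here for illustration only.

```latex
\begin{align*}
B(N) &= \frac{N(N-1)}{2}
  && \text{plain Bubblesort} \\
T(N) &= 4\,B\!\left(\tfrac{N}{2}\right) + t
      = \frac{N^2}{2} - N + t,
  \quad 0 \le t \le \tfrac{N}{2}
  && \text{Bubble-Twist-Sort with } t \text{ twist-swaps} \\
B(N) - T(N) &= \tfrac{N}{2} - t
\end{align*}
```

For N = 8 and t = 2 this gives B = 28 and T = 26, matching the example. For N = 1000 and t of about 250, B = 499500 and T = 499250, a difference of roughly 0.05%, so any large timing gap between the two variants comes from the memory system rather than from the operation count.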

Enough of Bubblesort… you are probably either lost in details or disgusted with this horribly stupid idea of optimizing and not optimizing Bubblesort at the same time.

I made a Selectionsort option. And for a given data size I allowed it either to sort bytes or 32-bit words (which is 16 times faster, for same data size).

The test machines
I gathered 11 different test machines, with different cache sizes and instruction sets:

	QNAP	wdr3600	ac20i	Rpi	Rpi 2	wdr4900	G4	Celeron	Xeon	Athlon	i5
								~2007   ~2010   ~2013
============================================================================================
L1	32	32	32	16	?	32	64	32	32	128	32
L2				128	?	256	256	512	6M	1024	256
L3							1024				6M
Mhz	500	560	580	700	900	800	866	900	2800	3000	3100
CPU	ARMv5	Mips74K	Mips24K	ARMv6	ARMv7	PPC	PPC	x86	x64	x64	x64
OS	Debian	OpenWrt	OpenWrt	OpenWrt	OpenWrt	OpenWrt	Debian	Ubuntu	MacOSX	Ubuntu	Windows

Note that for the multi-core machines (Xeon, Athlon, i5) the L2/L3 caches may or may not be shared between cores, so the numbers above are a little ambiguous. The sizes should be for Data cache when separate from Instruction cache.

The benchmarks
I ran Bubblesort for sizes 1000000 bytes down to 1000000/512. For Selectionsort I just ran three rounds. For Bubblesort I also ran for 2000000 and 4000000 but those times are divided by 4 and 16 to be comparable. All times are in seconds.

Bubblesort

	QNAP	wdr3600	ac20i	rpi	rpi2	wdr4900	G4	Celeron	Xeon	Athlon	i5
============================================================================================
4000000	1248	1332	997	1120	396	833		507	120	104	93
2000000	1248	1332	994	1118	386	791	553	506	114	102	93
1000000	1274	1330	1009	1110	367	757	492	504	113	96	93
500000	1258	1194	959	1049	352	628	389	353	72	74	63
250000	1219	1116	931	911	351	445	309	276	53	61	48
125000	1174	1043	902	701	349	397	287	237	44	56	41
62500	941	853	791	573	349	373	278	218	38	52	37
31250	700	462	520	474	342	317	260	208	36	48	36
15625	697	456	507	368	340	315	258	204	35	49	35
7812	696	454	495	364	340	315	256	202	34	49	35
3906	696	455	496	364	340	315	257	203	34	47	35
1953	698	456	496	365	342	320	257	204	35	45	35

Selectionsort

	QNAP	wdr3600	ac20i	rpi	rpi2	wdr4900	G4	Celeron	Xeon	Athlon	i5
============================================================================================
1000000	1317	996	877	1056	446	468	296	255	30	45	19
31250	875	354	539	559	420	206	147	245	28	40	21
1953	874	362	520	457	422	209	149	250	30	41	23

Theoretically, all timings for a single machine should be equal. The differences can be explained much by cache sizes, but obviously there are more things happening here.

Findings
Mostly the data makes sense. The caches create plateaus and the L1 size can almost be predicted from the data. I would have expected even bigger differences between best and worst cases; now it is in the range 180%-340%. The most surprising thing (?) is the Selectionsort results. They are sometimes a lot faster (G4, i5) and sometimes significantly slower! This is strange: I have no idea why.

I believe the i5's superior performance on Selectionsort 1000000 is due to cache and branch prediction.

I note that the QNAP and Archer C20i both have DDRII memory, while the RPi has SDRAM. This seems to make a difference when work sizes get bigger.

I have also made other benchmarks where the WDR4900 was faster than the G4 – not this time.

The Raspberry Pi
What did I learn about the Raspberry Pi? Well, memory is slow and branch prediction seems bad. It is typically 10-15 times slower than the modern (Xeon, Athlon, i5) CPUs. But for large selectionsort problems the difference is up to 40x. This starts getting close to the Node.js crap speed. It is not hard to imagine that Node.js benefits heavily from great branch prediction and large cache sizes – both things that the RPi lacks.

What about the 128k cache? Does it work? Well, compared to the L1-only machines, performance of RPi degrades slightly slower, perhaps. Not impressed.

Bubblesort vs Selectionsort
It really puzzles me that Bubblesort ever beats Selectionsort:

void bubbelsort_uint32_t(uint32_t* array, size_t len) {
  size_t i, j, jm1;
  uint32_t tmp;
  for ( i=len ; i>1 ; i-- ) {
    for ( j=1 ; j<i ; j++ ) {
      jm1 = j-1;
      if ( array[jm1] > array[j] ) {
        tmp = array[jm1];
        array[jm1] = array[j];
        array[j] = tmp;
      }
    }
  }
}

void selectionsort_uint32_t(uint32_t* array, size_t len) {
  size_t i, j, best;
  uint32_t tmp;
  for ( i=1 ; i<len ; i++ ) {
    best = i-1;
    for ( j=i ; j<len ; j++ ) {
      if ( array[best] > array[j] ) {
        best = j;
      }
    }
    tmp = array[i-1];
    array[i-1] = array[best];
    array[best] = tmp;
  } 
}

Essentially, the difference is that the swap takes place outside the inner loop (once) instead of all the time. The Selectionsort should also be able to benefit from easier branch prediction and far fewer writes to memory. Perhaps compiling to assembly code would reveal something odd going on.

Power of 2 aligned data sets
I avoided data sizes that are an exact power of two: 1000×1000 rather than 1024×1024. I did this because caches are supposed to work better this way. Perhaps I will make some 1024×1024 runs some day.

Raspberry Pi (v1), OpenWrt (14.07) and Node.js (v0.10.35 & v0.12.2)

Since I gave up running NetBSD on my Raspberry Pi, I decided it was time to try OpenWrt. And, to my surprise, I also managed to cross compile Node.js!

Install OpenWrt on Raspberry Pi (v1@700MHz)
I installed OpenWrt Barrier Breaker (the currently stable release) using the standard instructions.

After you have put the image on an SD-card with dd, it is quite easy to resize the root partition:

  1. copy the second partition to an image file using dd
  2. use fdisk to delete the second partition, and create a new, bigger one
  3. format the new partition with mkfs.ext4
  4. mount the image file using mount -o loop
  5. mount the new second partition
  6. copy all data from image file to second partition using cp -a
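
The six steps above could look roughly like the following. This is only a sketch: the function prints the commands instead of running them, and the device name /dev/sdX, the image name rootfs.img and the mount points /mnt/old and /mnt/new are all assumptions to replace with your own. The fdisk step is interactive, and cards that show up as /dev/mmcblk0 name the partition p2 rather than 2.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for the six resize steps.
# Assumptions: the SD card is $1 (e.g. /dev/sdX, hypothetical), its second
# partition is ${1}2, and /mnt/old and /mnt/new are free to use as mount points.
resize_plan() {
  dev="$1"
  img="${2:-rootfs.img}"
  cat <<EOF
dd if=${dev}2 of=${img} bs=1M
fdisk ${dev}
mkfs.ext4 ${dev}2
mkdir -p /mnt/old /mnt/new
mount -o loop ${img} /mnt/old
mount ${dev}2 /mnt/new
cp -a /mnt/old/. /mnt/new/
umount /mnt/old /mnt/new
EOF
}

resize_plan /dev/sdX
```

Review the printed commands, check the device name twice, and then run them one by one as root.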

If you want to, you can edit /etc/config/network while you are working with the OpenWrt root partition anyway:

#config interface 'lan'
#	option ifname 'eth0'
#	option type 'bridge'
#	option proto 'static'
#	option ipaddr '192.168.1.1'
#	option netmask '255.255.255.0'
#	option ip6assign '60'
#	option gateway '?.?.?.?'
#	option dns '?.?.?.?'
config interface 'lan'
	option ifname 'eth0'
	option proto 'dhcp'
	option macaddr 'XX:XX:XX:XX:XX:XX'
	option hostname 'rpiopenwrt'

Probably you want to disable dnsmasq, odhcpd and firewall too:

.../etc/init.d$ chmod -x dnsmasq firewall odhcpd

OR (depending on your idea of what is the right way)

.../etc/rc.d$ sudo rm S60dnsmasq S35odhcpd K85odhcpd S19firewall
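
The second variant can be wrapped in a small helper so that the numeric prefixes do not have to be typed by hand. This is a sketch under assumptions: the function name disable_services is made up here, and /mnt/new stands for wherever the OpenWrt root partition is mounted. On a running OpenWrt system, /etc/init.d/dnsmasq disable should have the same effect.

```shell
#!/bin/sh
# Sketch: remove the rc.d start/stop links for the given services from a
# mounted OpenWrt root. The first argument is the (assumed) mount point.
disable_services() {
  root="$1"; shift
  for svc in "$@"; do
    # Links are named S<NN><service> (start) and K<NN><service> (stop).
    for link in "$root"/etc/rc.d/[SK][0-9][0-9]"$svc"; do
      # An unmatched glob stays literal, so skip anything that does not exist.
      [ -e "$link" ] || [ -L "$link" ] || continue
      rm -- "$link"
      echo "removed ${link#"$root"}"
    done
  done
}

# Usage (as root, with the card mounted on the hypothetical /mnt/new):
#   disable_services /mnt/new dnsmasq odhcpd firewall
```

The init scripts themselves stay in place, so the services can be enabled again later.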

Also, it is a good idea to edit config.txt (on the DOS partition):

gpu_mem=1

I don’t know if 1 is really a legal value, but it worked for me, and I had much more memory available than when gpu_mem was not set.
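
If config.txt already contains a gpu_mem line, appending a second one is easy to get wrong. A minimal sketch of a helper that replaces an existing key or appends it when missing follows; the function name set_config_key is an assumption, and the demo runs on a scratch file rather than the real config.txt.

```shell
#!/bin/sh
# Sketch: set key=value in a config.txt-style file, replacing an existing
# line for that key or appending one if there is none.
set_config_key() {
  file="$1"
  key="$2"
  value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # Rewrite via a temporary file to avoid relying on sed -i.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
}

# Demo on a scratch file; on the real card, point it at config.txt
# on the DOS partition instead.
demo=$(mktemp)
set_config_key "$demo" gpu_mem 1
cat "$demo"
rm -f "$demo"
```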

Node.js 4 (added 2015-10-03)
For Node.js, check Node.js 4 builds.

Building Node.js v0.12.2
I downloaded and built Node.js v0.12.2 on a Xubuntu machine with an x64 cpu. On such a machine you can download the standard OpenWrt toolchain for Raspberry Pi.

I replaced configure and cpu.cc in the standard sources with the files from this page (they are meant for v0.12.1 but they work equally well for v0.12.2).

I then found a gist that gave me a good start. I modified it, and ended up with:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB

export TARGET_ARCH="-march=armv6j"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6j"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6j"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux --without-npm

bash --norc

Run this script in the Node.js source directory. If everything goes fine it configures the Node.js build, and leaves you with a shell where you can simply run:

$ make

If compilation succeeds, you will find the node binary in the out/Release folder. Copy it to your OpenWrt Raspberry Pi.
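
If configure or make cannot find the compiler, a likely cause is that the toolchain's bin directory is not on PATH. The sketch below checks for a few of the tools the script exports. The function name check_tools is made up for this example, and the prefix is the one used in the script.

```shell
#!/bin/sh
# Sketch: report which tools with a given prefix are missing from PATH.
# Returns 1 if at least one tool is missing, 0 otherwise.
check_tools() {
  prefix="$1"; shift
  missing=0
  for tool in "$@"; do
    if ! command -v "${prefix}${tool}" >/dev/null 2>&1; then
      echo "missing: ${prefix}${tool}"
      missing=1
    fi
  done
  return "$missing"
}

# A non-zero status here just means the toolchain is not on PATH yet.
check_tools arm-openwrt-linux-uclibcgnueabi- gcc g++ ld ar || true
```

No output means all four tools were found.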

Building Node.js v0.10.35
I first successfully built Node.js v0.10.35.

The (less refined) configure script I used was:

#!/bin/sh -e

export STAGING_DIR=...path to your toolchain...

#Tools
export CSTOOLS="$STAGING_DIR"
export CSTOOLS_INC=${CSTOOLS}/include
export CSTOOLS_LIB=${CSTOOLS}/lib
export ARM_TARGET_LIB=$CSTOOLS_LIB
export GYP_DEFINES="armv7=0"

#Define our target device
export TARGET_ARCH="-march=armv6"
export TARGET_TUNE="-mfloat-abi=hard"

#Define the cross compilers on your system
export AR="arm-openwrt-linux-uclibcgnueabi-ar"
export CC="arm-openwrt-linux-uclibcgnueabi-gcc"
export CXX="arm-openwrt-linux-uclibcgnueabi-g++"
export LINK="arm-openwrt-linux-uclibcgnueabi-g++"
export CPP="arm-openwrt-linux-uclibcgnueabi-gcc -E"
export LD="arm-openwrt-linux-uclibcgnueabi-ld"
export AS="arm-openwrt-linux-uclibcgnueabi-as"
export CCLD="arm-openwrt-linux-uclibcgnueabi-gcc ${TARGET_ARCH} ${TARGET_TUNE}"
export NM="arm-openwrt-linux-uclibcgnueabi-nm"
export STRIP="arm-openwrt-linux-uclibcgnueabi-strip"
export OBJCOPY="arm-openwrt-linux-uclibcgnueabi-objcopy"
export RANLIB="arm-openwrt-linux-uclibcgnueabi-ranlib"
export F77="arm-openwrt-linux-uclibcgnueabi-g77 ${TARGET_ARCH} ${TARGET_TUNE}"
unset LIBC

#Define flags
export CXXFLAGS="-march=armv6"
export LDFLAGS="-L${CSTOOLS_LIB} -Wl,-rpath-link,${CSTOOLS_LIB} -Wl,-O1 -Wl,--hash-style=gnu"
export CFLAGS="-isystem${CSTOOLS_INC} -fexpensive-optimizations -frename-registers -fomit-frame-pointer -O2 -ggdb3"
export CPPFLAGS="-isystem${CSTOOLS_INC}"
export CCFLAGS="-march=armv6"

export PATH="${CSTOOLS}/bin:$PATH"

./configure --without-snapshot --dest-cpu=arm --dest-os=linux
bash --norc

Running node on the Raspberry Pi
Back on the Raspberry Pi you need to install a few packages:

# ldd ./node 
	libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)
	librt.so.0 => not found
	libstdc++.so.6 => not found
	libm.so.0 => /lib/libm.so.0 (0xb6f48000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb6f34000)
	libpthread.so.0 => not found
	libc.so.0 => /lib/libc.so.0 (0xb6edf000)
	ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb6f6c000)
# opkg update
# opkg install librt
# opkg install libstdcpp

That is all! Now you should be ready to run node. The node binary is about 13MB (the v0.10.35 one was 19MB, perhaps because of -ggdb3), so it is not ideal to deploy to other typical OpenWrt hardware, which often has very little storage.
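
To see at a glance which libraries are still missing, the ldd output can be filtered. This is a small sketch: the function name missing_libs is an assumption, and the sample lines are taken from the ldd output above. Note that the names printed are library file names, not opkg package names (libstdc++.so.6 comes from the libstdcpp package, for example), and libpthread is presumably pulled in as a dependency.

```shell
#!/bin/sh
# Sketch: read ldd output on stdin and print the libraries reported as "not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Sample lines from the ldd output of the node binary.
printf '%s\n' \
  '	libdl.so.0 => /lib/libdl.so.0 (0xb6f60000)' \
  '	librt.so.0 => not found' \
  '	libstdc++.so.6 => not found' \
  '	libpthread.so.0 => not found' | missing_libs
```

On the Raspberry Pi itself this would be run as ldd ./node | missing_libs, and it should print nothing once the packages are installed.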

Final comments
I ran a few small programs to test, and they were fine. I guess some more testing would be appropriate. The performance is very comparable to Node.js built and executed on Raspbian.

I think RaspberryPi+OpenWrt+Node.js is a very interesting and competitive combination for microservices!