I am running a system that consists of about 20 Node.js processes on a single machine. Those processes do some disk I/O and communicate with each other over HTTP. Much of the code is almost 10 years old, and the system first ran on Node 0.12. I can run the system on many different machines, and I have automated tests as well.
The problem demonstrated on an idle system using top
I will now illustrate the problem of excessive SYS CPU load under Node 20.10.0 compared to Node 18 on an idle system, using top.
TEST (production-identical cloud VPS, Debian 11.8)
Here the system running on Node 18 has been idling for a little while.
top - 12:44:46 up 3 days, 23:21, 4 users, load average: 0.02, 0.44, 0.35
Tasks: 109 total, 1 running, 108 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.5 us, 0.7 sy, 0.0 ni, 96.4 id, 0.1 wa, 0.0 hi, 0.3 si, 0.1 st
MiB Mem : 3910.9 total, 948.2 free, 1484.8 used, 1478.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2166.2 avail Mem
Upgrading to Node.js 20.10.0 and letting the system idle a while gives:
top - 12:54:20 up 3 days, 23:30, 2 users, load average: 0.79, 1.74, 1.16
Tasks: 108 total, 3 running, 105 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.3 us, 20.4 sy, 0.0 ni, 76.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
MiB Mem : 3910.9 total, 809.8 free, 1316.8 used, 1784.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2347.7 avail Mem
As you can see, the SYS CPU load is massive under Node 20: 20.4% sy compared to 0.7% sy under Node 18.
RPi 2, Raspbian 12.1
Here the system running on Node 18 has been idling on a RPi2 for more than 15 minutes.
top - 12:38:36 up 42 min, 2 users, load average: 0.13, 0.11, 0.63
Tasks: 133 total, 2 running, 131 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.2 us, 1.2 sy, 0.0 ni, 95.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 971.9 total, 436.0 free, 324.3 used, 263.3 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 647.6 avail Mem
This is a very underpowered machine, but it copes fine.
Upgrading to Node.js 20.10.0 and letting the machine idle gives:
top - 12:55:09 up 59 min, 2 users, load average: 0.56, 1.38, 1.32
Tasks: 139 total, 1 running, 138 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.3 us, 12.6 sy, 0.0 ni, 82.7 id, 0.3 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 971.9 total, 429.5 free, 327.9 used, 266.5 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 644.0 avail Mem
Again, a large increase in SYS CPU load: from 1.2% sy to 12.6% sy.
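If you want to log the same user/sys/idle split without watching top, something like the following sketch might do. It relies on os.cpus(), which reports cumulative per-core times in milliseconds. The function names (snapshot, cpuShares, sample) are my own and purely illustrative, and the numbers will not match top exactly because os.cpus() exposes fewer categories than top does (there is no separate wa, si or st).

```javascript
const os = require('node:os');

// Sum the per-core CPU time counters (milliseconds) reported by os.cpus().
function snapshot(cpus = os.cpus()) {
  const total = { user: 0, nice: 0, sys: 0, idle: 0, irq: 0 };
  for (const cpu of cpus) {
    for (const key of Object.keys(total)) total[key] += cpu.times[key];
  }
  return total;
}

// Percentage of the elapsed CPU time spent in each state between two snapshots.
function cpuShares(before, after) {
  const delta = {};
  let sum = 0;
  for (const key of Object.keys(after)) {
    delta[key] = after[key] - before[key];
    sum += delta[key];
  }
  const shares = {};
  for (const key of Object.keys(delta)) {
    shares[key] = sum > 0 ? (100 * delta[key]) / sum : 0;
  }
  return shares;
}

// Illustrative helper: measure over an interval and print a top-like line.
function sample(intervalMs = 5000) {
  const before = snapshot();
  setTimeout(() => {
    const s = cpuShares(before, snapshot());
    console.log(
      `us ${s.user.toFixed(1)}  sy ${s.sys.toFixed(1)}  id ${s.idle.toFixed(1)}`
    );
  }, intervalMs);
}
```

Calling sample() on an idle machine, once under Node 18 and once under Node 20, should give a rough equivalent of the %Cpu(s) lines above.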
The problem demonstrated using integration tests and “time”
On the same TEST system as above, I run my integration tests on Node 18:
$ node --version
v18.13.0
$ time ./tools/local.sh integrationtest ALL -v | tail -n 1
Bad:0 Void:0 Skipped:8 Good:1543 (1551)

real 0m27.277s
user 0m17.751s
sys 0m4.251s
Changing to Node 20.10.0 instead gives:
$ node --version
v20.10.0
$ time ./tools/local.sh integrationtest ALL -v | tail -n 1
Bad:0 Void:0 Skipped:8 Good:1542 (1551)

real 0m56.958s
user 0m12.875s
sys 0m36.931s
As you can see, sys time increased dramatically, from about 4.3 s to about 36.9 s, and the wall-clock time roughly doubled.
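The "time" figures are totals for the whole test run. To see which of the individual Node.js processes burn the system time, each process could report its own split on exit. The sketch below uses process.cpuUsage(), which returns user and system CPU time in microseconds. The cpuReport name and the shape of the report are my own invention, not something from the system described here.

```javascript
// Report how much CPU time a process has spent in user vs. system mode.
// process.cpuUsage() returns microseconds accumulated since process start.
function cpuReport(usage = process.cpuUsage()) {
  const userMs = usage.user / 1000;
  const systemMs = usage.system / 1000;
  const totalMs = userMs + systemMs;
  return {
    pid: process.pid,
    node: process.version,
    userMs,
    systemMs,
    // Share of the CPU time that was spent in the kernel, in percent.
    systemPct: totalMs > 0 ? (100 * systemMs) / totalMs : 0,
  };
}

// Print one JSON line to stderr right before the process exits.
process.on('exit', () => {
  console.error(JSON.stringify(cpuReport()));
});
```

As a sanity check of the arithmetic: fed the Node 20 totals above (user 12.875 s, sys 36.931 s), systemPct comes out at roughly 74.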
Affected Node versions
I have never seen the problem with Node.js 18 or lower.
Current Node.js 20.10.0 shows the problem (on some hosts).
My tests (on one particular host) indicate that the excessive SYS CPU load was introduced with Node.js 20.3.1. The problem is still there with Node 21.
There is an interesting open GitHub issue.
Affected hosts
I can reproduce the problem on some computers with some configurations. Successful reproduction means that Node 18 runs fine and Node 20.10.0 runs with excessive SYS CPU load.
Hosts where the problem is reproduced (Node 20 runs with excessive SYS CPU load)
- Raspberry Pi 2, Raspbian 12.1
- Intel NUC i5 4250U, Debian 12.1
- Cloud VPS, Glesys.com, System container VPS, x64, Debian 11.8
Hosts where the problem is not reproduced (Node 20 runs just fine)
- Apple M1 Pro, macOS
- Dell XPS, 8th gen i7, Windows 11
- Raspberry Pi 2, Raspbian 11.8
- QNAP Container Station LXD, Celeron J1900, Debian 11.8
- QNAP Container Station LXD, Celeron J1900, Debian 12.4
Comments on this
On the RPi, upgrading from 11.8 to 12.1 activated the problem.
On QNAP LXD, both 11.8 and 12.4 do not show the problem.
Thus we have Debian 11.8 hosts that exhibit both behaviours, and we have Debian 12 hosts that exhibit both behaviours.
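Since the Debian release alone does not separate affected from unaffected hosts, it may be worth recording a few more facts per host, for instance the kernel release (system containers share the kernel of the machine they run on, so it can differ from what the Debian version suggests). A minimal sketch using only standard Node.js APIs follows. The environmentReport name is hypothetical, and whether the kernel release actually explains the difference is an assumption I have not verified.

```javascript
const os = require('node:os');

// Collect the facts that might distinguish affected from unaffected hosts.
function environmentReport() {
  return {
    node: process.version,          // e.g. the Node.js version under test
    libuv: process.versions.uv,     // libuv version bundled with this Node.js
    platform: os.platform(),
    arch: os.arch(),
    kernel: os.release(),           // kernel release as seen by this process
    ioUringEnv: process.env.UV_USE_IO_URING ?? '(unset)',
  };
}

console.log(environmentReport());
```

Running this on each host in the two lists above would give a small table to compare.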
Conclusion
This problem seems quite serious.
It affects some hosts running recent versions of Debian in combination with Node 20+.
I have seen no problems on macOS or Windows.
I have tested no other Linux distributions than Debian (Raspbian).
Solution
It seems this is a kernel bug related to io_uring, at least according to the Node.js/libuv people. That is consistent with my findings above about affected machines.
There is a workaround for Node.js:
UV_USE_IO_URING=0
It appears to be intentionally undocumented, which I interpret to mean that it will be removed from Node.js once no common kernels are affected.
I will stay away from Node.js 20, at least in production, for a year and see how this develops.
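The variable has to be in the environment of each Node.js process. If the processes are started from a Node.js launcher script, one way to apply it could look like the sketch below, which passes the variable through the env option of child_process.spawn. The helper names and the script path are placeholders of mine. If you start your processes from a shell script or a service manager instead, set the variable there.

```javascript
const { spawn } = require('node:child_process');

// Copy an environment and switch off io_uring usage via the workaround variable.
function envWithoutIoUring(baseEnv = process.env) {
  return { ...baseEnv, UV_USE_IO_URING: '0' };
}

// Start a Node.js script as a child process with the workaround applied.
// `script` and `args` are placeholders for one of your own services.
function startWithoutIoUring(script, args = []) {
  return spawn(process.execPath, [script, ...args], {
    env: envWithoutIoUring(),
    stdio: 'inherit',
  });
}
```

A launcher would then call, for example, startWithoutIoUring('./service.js') for each process, where './service.js' stands in for a real entry point.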