I will present two findings that I find strange in this post:
- The performance of Node.js (V8?) has clearly gotten consistently worse with newer Node.js versions.
- The standard library sort (Array.prototype.sort()) is surprisingly slow, often slower than a simple textbook mergesort.
My findings in this article are based on running a simple program mergesort.js on different computers and different node versions.
You may also want to read this article about sorting in Node.js. It applies to V8 version 7.0, which should be used in Node.js V11.
The sorting algorithms
There are three sorting algorithms compared.
- Array.prototype.sort()
- mergesort(), a textbook mergesort
- mergesort_opt(), a mergesort that I put some effort into making faster
Note that mergesort is considered stable and not as fast as quicksort. As far as I understand from the above article, Node.js used to use quicksort (up to V10), and from V11 uses something better called Timsort.
My mergesort implementations (2) (3) are plain standard JavaScript. Nothing fancy whatsoever (I will post benchmarks using Node.js v0.12 below).
The data to be sorted
There are three types of data to be sorted.
- Numbers (Math.random()), compared with a-b;
- Strings (random numbers converted to strings), compared with default compare function for sort(), and for my mergesort simple a<b, a>b compares to give -1, 1 or 0
- Objects, containing two random numbers a=[0-9], b=[0-999999], compared with (a.a-b.a) || (a.b-b.b). In one in 10 the value of b will matter, otherwise looking at the value of a will be enough.
Unless otherwise written the sorted set is 100 000 elements.
On Benchmarks
Well, just a standard benchmark disclaimer: I do my best to measure and report objectively. There may be other platforms, CPUs, configurations, use cases, datatypes, or array sizes that give different results. The code is available for you to run.
I have run all tests several times and reported the best value. If anything, that should benefit the standard library (quick)sort, which can suffer from bad luck.
Comparing algorithms
Lets start with the algorithms. This is Node V10 on different machines.
(ms) ===== Numbers ===== ===== Strings ===== ==== Objects =====
sort() merge m-opt sort() merge m-opt sort() merge m-opt
NUC i7 82 82 61 110 81 54 95 66 50
NUC i5 113 105 100 191 130 89 149 97 72
NUC Clrn 296 209 190 335 250 196 287 189 157
RPi v3 1886 1463 1205 2218 1711 1096 1802 1370 903
RPi v2 968 1330 1073 1781 1379 904 1218 1154 703
The RPi-v2-sort()-Numbers stand out. Its not a typo. But apart from that I think the pattern is quite clear: regardless of datatype and on different processors the standard sort() simply cannot match a textbook mergesort implemented in JavaScript.
Comparing Node Versions
Lets compare different node versions. This is on a NUC with Intel i5 CPU (4th gen), running 64bit version of Ubuntu.
(ms) ===== Numbers ===== ===== Strings ===== ==== Objects =====
sort() merge m-opt sort() merge m-opt sort() merge m-opt
v11.13.0 84 107 96 143 117 90 140 97 71
v10.15.3 109 106 99 181 132 89 147 97 71
v8.9.1 85 103 96 160 133 86 122 99 70
v6.12.0 68 76 88 126 92 82 68 83 63
v4.8.6 51 66 89 133 93 83 45 77 62
v0.12.9 58 65 78 114 92 87 55 71 60
Not only is sort() getting slower, also running “any” JavaScript is slower. I have noticed this before. Can someone explain why this makes sense?
Comparing different array sizes
With the same NUC, Node V10, I try a few different array sizes:
(ms) ===== Numbers ===== ===== Strings ===== ==== Objects =====
sort() merge m-opt sort() merge m-opt sort() merge m-opt
10 000 10 9 11 8 12 6 4 7 4
15 000 8 15 7 13 14 11 6 22 7
25 000 15 35 12 40 27 15 11 25 18
50 000 35 56 34 66 57 37 51 52 30
100 000 115 107 97 192 138 88 164 101 72
500 000 601 714 658 1015 712 670 698 589 558
Admittedly, the smaller arrays show less difference, but it is also hard to measure small values with precision. So this is from the RPi v3 and smaller arrays:
(ms) ===== Numbers ===== ===== Strings ===== ==== Objects =====
sort() merge m-opt sort() merge m-opt sort() merge m-opt
5 000 34 57 30 46 59 33 29 52 26
10 000 75 129 64 100 130 74 63 104 58
20 000 162 318 151 401 290 166 142 241 132
40 000 378 579 337 863 623 391 344 538 316
I think again quite consistently this looks remarkably bad for standard library sort.
Testing throughput (Version 2)
I decided to measure throughput rather than time to sort (mergesort2.js). I thought perhaps the figures above are misleading when it comes to the cost of garbage collecting. So the new question is, how many shorter arrays (n=5000) can be sorted in 10s?
(count) ===== Numbers ===== ===== Strings ===== ==== Objects =====
sort() merge m-opt sort() merge m-opt sort() merge m-opt
v11.13.0 3192 2538 4744 1996 1473 2167 3791 2566 4822
v10.15.3 4733 2225 4835 1914 1524 2235 4911 2571 4811
RPi v3 282 176 300 144 126 187 309 186 330
What do we make of this? Well the collapse in performance for the new V8 Torque implementation in Node v11 is remarkable. Otherwise I notice that for Objects and Node v10, my optimized algorithm has no advantage.
I think my algorithms are heavier on the garbage collector (than standard library sort()), and this is why the perform relatively worse for 10s in a row.
If it is so, I’d still prefer to pay that price. When my code waits for sort() to finish there is a user waiting for a GUI update, or for an API reply. I rather see a faster sort, and when the update/reply is complete there is usually plenty of idle time when the garbage collector can run.
Optimizing Mergesort?
I had some ideas for optimizing mergesort that I tried out.
Special handling of short arrays: clearly if you want to sort 2 elements, the entire mergesort function is heavier than a simple function that sorts two elements. The article about V8 sort indicated that they use insertion sort for arrays up to length 10 (I find this very strange). So I implemented special functions for 2-3 elements. This gave nothing. Same performance as calling the entire mergesort.
Less stress on the garbage collector: since my mergesort creates memory nodes that are discarded when sorting is complete, I thought I could keep those nodes for the next sort, to ease the load on the garbage collector. Very bad idea, performance dropped significantly.
Performance of cmp-function vs sort
The relevant sort functions are all K (n log n) with different K. It is the K that I am measuring and discussing here. The differences are, after all, quite marginal. There is clearly another constant cost: the cost of the compare function. That seems to matter more than anything else. And in all cases above “string” is just a single string of 10 characters. If you have a more expensive compare function, the significance of sort() will be even less.
Nevertheless, V8 is a single threaded environment and ultimately cycles wasted in sort() will result in overall worse performance. Milliseconds count.
Conclusions
Array.prototype.sort() is a critical component of the standard library. In many applications sorting may be the most expensive thing that takes place. I find it strange that it does not perform better than a simple mergesort implementation. I do not suggest you use my code, or start looking for better sort() implementations out there right away. But I think this is something for JavaScript programmers to keep in mind. However, the compare function probably matters more in most cases.
I find it strange that Node v11, with Timsort and V8 Torque is not more of an improvement (admittedly, I didnt test that one very much).
And finally I find it strange that Node.js performance seems to deteriorate with every major release.
Am I doing anything seriously wrong?
0 Comments.