Peculiar Compiler Optimizations

My teacher in a High Performance Computing class once told me not to confuse the compiler. This was in the late 90s, and SGI C and Fortran compilers were supposed to replace entire blocks of code with highly optimised implementations. As long as the compiler understood your intentions.

I have never discovered this, but yesterday perhaps! Read on.

I have been playing around with LISP, solving Project Euler challenges on Hackerrank. For problems 44 and 45 I decided to do Binary Search (which afterwards turned out not to be so smart, but that is another story) and took the implementation from Rosetta Code (the iterative one).

Binary search is about finding an element in a sorted array by starting in the middle, and jumping left or right, cutting the remaning array in half each time.

In my case I decided just a part of the array was worth searching so instead of searching the entire array [0..length] I wanted to search up to the Nth element [0..N]. Searching fewer elements should be faster, so I improved the binary search function to take an additional argument: hi. For SBCL (Steel Bank Common Lisp), this surprisingly had horrible effect on performance.

Benchmark Results
The results for different algorithms, different machines and different LISP implementations follow. The RPi V1 runs Arch Linux, Clisp comes with Arch, SBCL is downloaded from SBCL webpage. The RPi V2 runs Raspbian and SBCL that comes with the distribution. The Celeron runs Ubuntu that comes with SBCL. The MacBook Air runs OS X and SBCL is downloaded separately.

 (times in seconds)             Array Size Standard Optimized Recursive    C -O2
================================================================================
RPi V1  ARMv6 700MHz  Clisp 2.49      5000    640       633       720
                      SBCL  1.3.12    5000     15.6      27        34       0.95
RPi V2  ARMv7 900MHz  SBCL  1.2.4     5000      6.2      16        17       0.31
                                     20000    110       293       300       5.8
NUC Celeron J3455     Clisp 2.49     20000    765       762       720   
                      SBCL  1.3.3    20000      8.3      16.7      18.0     1.0
MacBook Air i5        SBCL  1.2.11   20000      4.0      11.5      12.3     0.75

A very slight “optimization” turns out to have very negative impact on performance for the quite fast (compiled) SBCL. I can’t imagine any other explanation than SBCL replaces the standard binary search with optimized code rather than executing my program. For Clisp the optimization actually works quite as would be expected and the recursive code is actually the fastest. On the Celeron, Clisp and SBCL behaves completely opposite.

Comparing to C
The other week I had the feeling (SBCL) LISP was fast and decided to compare LISP to C. This time I had the feeling that LISP was rather slow so I ported my test program to C (basically line by line). Well, I found that SBCL is actually pretty fast (especially on x86/x64), and C was faster only thanks to -O2 on some systems. -O2 actually made the C-program more than 5 times faster: perhaps also the C-compiler replace the entire binary search?

The Test Program
The code follows. The only difference between Standard and Optimized is the single line that is commented out (with ; in LISP) selecting which binary search to run (the length of the function name does not explain the performance difference).

The program creates an array of length N and populates it with values by the formula n(n+1)/2. This takes little time. It then checks for values 10,20,30… if the values are found in the array (using binary search). In this program the entire array is always searched, not taking advantage of the extra parameter (although the optimized version does not need to find the length of the array every time called).

(defun binary-search (value array)                       ; Standard 2 lines
    (let ((low 0) (high (1- (length array))))            ;
        (do () ((< high low) nil)
            (let ((middle (floor (+ low high) 2)))
                (cond ((> (aref array middle) value)
                       (setf high (1- middle)))
                      ((< (aref array middle) value)
                       (setf low (1+ middle)))
                      (t (return middle)))))))

(defun binary-search-optimized (value array hi)          ; Optimized 2 lines
    (let ((low 0) (high hi))                             ;
        (do () ((< high low) nil)
            (let ((middle (floor (+ low high) 2)))
                (cond ((> (aref array middle) value)
                       (setf high (1- middle)))
                      ((< (aref array middle) value)
                       (setf low (1+ middle)))
                      (t (return middle)))))))

(defun binary-search-r (value
                        array
                        &optional (low 0)
                        (high (1- (length array))))
  (if (< high low)
      nil
      (let ((middle (floor (+ low high) 2)))
        (cond ((> (aref array middle) value)
               (binary-search-r value array low (1- middle)))
              ((< (aref array middle) value)
               (binary-search-r value array (1+ middle) high))
              (t middle)))))

(defun formula (n)
    (/ (* n (+ n 1)) 2))

(defun init-array (n)
    (let ((arr (make-array n)))
        (loop for i from 0 to (1- n) do
            (setf (aref arr i) (formula (1- i))))
        arr))

(defun solve (arr n max)
    (let ((ret 0))
        (loop for i from 10 to max by 10 do
            (if (binary-search i arr)                     ; Toggle code used
;           (if (binary-search-r i arr)                   ;
;           (if (binary-search-optimized i arr n)         ;
                (incf ret)
                Nil))
        ret))
            
(defun main ()
    (let ((n (read)))
        (let ((arr (init-array n)))
            (format T "~D~%" (solve arr (1- n) (aref arr (1- n)))))))

(main)

Since I am a very novice LISP programmer I appreciate any feedback. The code above does not solve Project Euler 44 or 45, it is much simplified to test binary search. Initially I wrote code that relied on recursion rather than loops but I exceeded the stack size and ended up with loops (according to what I read, loops rather than recursion is the preferred style of Common Lisp).

Conclusion
Well... optimization is hard, and dont make any assumptions. As I have found many times before, what makes code faster on some platforms can make it slower on others. When it comes to optimizing SBCL and compiled LISP much experience is required, and dont forget to measure!

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.