A few days ago I wrote a post about how to convert a csv list to a table. I spent some time writing my program in C (for fun and for performance).
Now I have written an equivalent program in Python, and I am going to present the differences.
First the sourcecode: csv-list2table.c and csv-list2table.py.
About the C implementation
The C implementation uses only standard C library (stdlib.h, stdio.h, string.h). The good thing is that the program is very portable and easy to compile. The bad thing is that I do not have access to powerful datatypes (hash tables, balanced trees, sets). This basically means wasting time reinventing the wheel, and not getting a perfect wheel anyway. My estimation is that the C implementation is O(n) for suitable input (fairly sorted data) and O(n)Sqrt(n) for unsuitable data (reversely sorted). This is for fairly “square” matrices. The C implementation is not really optimized (perhaps a later excercise).
About the python implementation
The python implementation was written after the C implementation, and very nicely utilizes the python built in set and dict datatypes. Should be O(n) or O(n)log(n) depending on Python dict implementation.
The machine
The benchmarks are performed using the time command on an AMD Athlon II X2 250 machine with 4GB of RAM. The machine runs x64 Xubuntu 11.10.
The tests
C | Python | |
Environment | gcc 4.6.1, -O2 | Python 2.7.2+ |
Lines of code (excl help text) |
355 lines | 53 lines |
Coding time | 8h | 1h |
Input size | Execution time | |
100×100 288kb |
Sorted data: 0.014s Reversed data: 0.027s |
0.11s |
200×200 1.2Mb |
0.039s – 0.052s | 0.18s |
400×400 4.5Mb |
0.16s – 0.25s | 0.64s |
800×800 18Mb |
0.57s – 1.2s | 2.6s |
1600×1600 72Mb |
2.4s – 7.8s | 10s |
3200×3200 288Mb |
9.7s – 51s | 41s |
6400×6400 1.2Gb |
40s – 337s | Not enough RAM |
RAM usage for 3200×3200 |
152Mb | 1.4Gb |
Conclusions
I draw the following conclusions:
- Python is a very efficient language compared to C, when it comes to producing code fast, in this case about 10x faster
- Python is impressively fast, even startup overhead is reasonable
- Python can beat C on performance, when C programmer has not found optimal data type/algorithm. The Python datatypes makes it easy to write code that is fast even for large data
- A well written C program uses 10% the RAM of Python, and is at least 5x faster. Especially the small RAM requirement is very powerful and valuable for many applications
It is tempting to optimize the C program to see if I can get 2x, 3x or 5x speedup.
It is also tempting to write a C program that uses Glib, and have access to well implemented data types. How would it compare?
0 Comments.