Python vs C

A few days ago I wrote a post about how to convert a csv list to a table. I spent some time writing my program in C (for fun and for performance).

Now I have written an equivalent program in Python, and I am going to present the differences.
First the sourcecode: csv-list2table.c and csv-list2table.py.

About the C implementation
The C implementation uses only standard C library (stdlib.h, stdio.h, string.h). The good thing is that the program is very portable and easy to compile. The bad thing is that I do not have access to powerful datatypes (hash tables, balanced trees, sets). This basically means wasting time reinventing the wheel, and not getting a perfect wheel anyway. My estimation is that the C implementation is O(n) for suitable input (fairly sorted data) and O(n)Sqrt(n) for unsuitable data (reversely sorted). This is for fairly “square” matrices. The C implementation is not really optimized (perhaps a later excercise).

About the python implementation
The python implementation was written after the C implementation, and very nicely utilizes the python built in set and dict datatypes. Should be O(n) or O(n)log(n) depending on Python dict implementation.

The machine
The benchmarks are performed using the time command on an AMD Athlon II X2 250 machine with 4GB of RAM. The machine runs x64 Xubuntu 11.10.

The tests

C Python
Environment gcc 4.6.1, -O2 Python 2.7.2+
Lines of code
(excl help text)
355 lines 53 lines
Coding time 8h 1h
Input size Execution time
100×100
288kb
Sorted data: 0.014s
Reversed data: 0.027s
0.11s
200×200
1.2Mb
0.039s – 0.052s 0.18s
400×400
4.5Mb
0.16s – 0.25s 0.64s
800×800
18Mb
0.57s – 1.2s 2.6s
1600×1600
72Mb
2.4s – 7.8s 10s
3200×3200
288Mb
9.7s – 51s 41s
6400×6400
1.2Gb
40s – 337s Not enough RAM
RAM usage for
3200×3200
152Mb 1.4Gb

Conclusions
I draw the following conclusions:

  • Python is a very efficient language compared to C, when it comes to producing code fast, in this case about 10x faster
  • Python is impressively fast, even startup overhead is reasonable
  • Python can beat C on performance, when C programmer has not found optimal data type/algorithm. The Python datatypes makes it easy to write code that is fast even for large data
  • A well written C program uses 10% the RAM of Python, and is at least 5x faster. Especially the small RAM requirement is very powerful and valuable for many applications

It is tempting to optimize the C program to see if I can get 2x, 3x or 5x speedup.
It is also tempting to write a C program that uses Glib, and have access to well implemented data types. How would it compare?

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Time limit is exhausted. Please reload CAPTCHA.

This site uses Akismet to reduce spam. Learn how your comment data is processed.