Category Archives: Command Line Essentials

CSV file: convert list to table/matrix

Update 2014-05-04: Added -N1/-N2 flags for sorting rows/columns numerically, and uploaded a Windows binary.

I found myself with data in CSV files with three columns: two dimensions and a value. It could look like this:

20080102,AAPL,194.84
20090102,AAPL,90.75
20100104,AAPL,214.04
20110103,AAPL,329.57
20120103,AAPL,411.23
20080102,MSFT,35.22
20090102,MSFT,20.33
20100104,MSFT,30.95
20110103,MSFT,27.98
20120103,MSFT,26.765

Data typically looks like this because it is very easy to output transactions in this format. That is very nice if you want to load it into a database. But for other purposes (like plotting a graph using LibreOffice Calc or even Excel) a table/matrix layout would be much nicer:

,AAPL,MSFT
20080102,194.84,35.22
20090102,90.75,20.33
20100104,214.04,30.95
20110103,329.57,27.98
20120103,411.23,26.765

I could not find a standard tool for this. I thought about different options, and finally decided it was quite easy to just write a little program. So I did. You use it like this:

$ ./csv-list2table -t < list.csv > table.csv

There are a few things to think about:

  • The switches -t and -T decide whether column 1 or column 2 becomes the rows
  • -N1 and -N2 can be used to treat/sort column 1 and 2 numerically
  • Rows and columns are output sorted
  • Holes/missing values are output as ,,
  • Comma is the only accepted delimiter
  • Input must have exactly 3 columns
  • Pre/post-process data with sed and cut

As the last item mentions, sed can fix a file that uses a delimiter other than comma, and cut can pick the columns you need from a list that contains more columns than that.
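A minimal sketch of that pre-processing, assuming a hypothetical input that uses semicolons as delimiters and carries an extra third column (the sample data and its column layout are made up for illustration):

```shell
# Hypothetical input: semicolon-separated, with an unwanted currency column.
raw='20080102;AAPL;USD;194.84
20080102;MSFT;USD;35.22'

# sed turns every ; into a comma, and cut keeps fields 1, 2 and 4.
cleaned=$(printf '%s\n' "$raw" | sed 's/;/,/g' | cut -d ',' -f 1,2,4)
printf '%s\n' "$cleaned"
```

The result has exactly three comma-separated columns, which is the shape csv-list2table expects on its input. This simple approach assumes that no field contains the delimiter itself.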

Finally, the code written in standard C: csv-list2table-1.1.c.
Old version: csv-list2table-1.0.c
Windows binary: csv-list2table-1.1.exe (should be no tricky dependencies)
Test file: csv-list2table-test.txt

Building should be trivial:

gcc -O2 -o csv-list2table csv-list2table.c

I don't think the code contains anything that should confuse any C compiler on any reasonable platform.

Command Line Essentials: Text and Pipeline

What I will demonstrate now is extremely powerful. These are just examples that do nothing valuable for you, but when you need to get things done on a computer, the same technique can be very productive. You need to combine your creativity and experience to be truly successful.

An example:

essentials@kvaser:~$ ls -l /  |  cat -n
     1	total 109
     2	drwxr-xr-x  2 root root  4096 Feb 18 06:51 bin
     3	drwxr-xr-x  3 root root  1024 Feb 18 07:03 boot
     4	drwxr-xr-x 13 root root  2880 Feb 18 07:06 dev
     5	drwxr-xr-x 78 root root  4096 Mar  1 20:36 etc
     6	drwxr-xr-x  8 root root  4096 Feb 28 21:39 home
     7	drwxr-xr-x 10 root root  8192 Feb 18 06:51 lib
     8	drwxr-xr-x  2 root root 49152 Feb  2 21:01 lost+found
     9	drwxr-xr-x  3 root root  4096 Feb 17 21:39 media
    10	drwxr-xr-x  2 root root  4096 Jan 16 21:45 mnt
    11	drwxr-xr-x  2 root root  4096 Feb  2 21:27 opt
    12	dr-xr-xr-x 87 root root     0 Jan  1  1970 proc
    13	drwxr-xr-x  5 root root  4096 Mar  1 18:17 root
    14	drwxr-xr-x  2 root root  4096 Feb 18 06:52 sbin
    15	drwxr-xr-x  2 root root  4096 Sep 16  2008 selinux
    16	drwxr-xr-x  2 root root  4096 Feb  2 21:27 srv
    17	drwxr-xr-x 11 root root     0 Jan  1  1970 sys
    18	drwxrwxrwt  3 root root  4096 Mar  1 20:39 tmp
    19	drwxr-xr-x 10 root root  4096 Feb  2 21:27 usr
    20	drwxr-xr-x 14 root root  4096 Feb  2 22:38 var
essentials@kvaser:~$ ls -l /  |  cat -n  |  head -n 10  |  tail -n 5
     6	drwxr-xr-x  8 root root  4096 Feb 28 21:39 home
     7	drwxr-xr-x 10 root root  8192 Feb 18 06:51 lib
     8	drwxr-xr-x  2 root root 49152 Feb  2 21:01 lost+found
     9	drwxr-xr-x  3 root root  4096 Feb 17 21:39 media
    10	drwxr-xr-x  2 root root  4096 Jan 16 21:45 mnt

What happens? In the first command we list the contents of the root directory and add line numbers to the output. In the second command we keep the first 10 lines (head -n 10) and then the last 5 of those (tail -n 5), so only lines 6-10 are output.
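The same head/tail combination can be tried on any numbered input. A small sketch, using seq (which prints numbers one per line and is available on most Linux, Mac OS X and Cygwin installations) as a stand-in for the directory listing:

```shell
# seq prints the numbers 1..20, one per line.
# head keeps the first 10 lines; tail then keeps the last 5 of those.
middle=$(seq 1 20 | head -n 10 | tail -n 5)
printf '%s\n' "$middle"
```

This prints the numbers 6 to 10, matching the line numbers selected in the listing above.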

On the command line, you can interact with a program in different ways. The most common ways are displayed above:

  1. arguments: -l, -n 5, etc (gives program instructions about what to do)
  2. stdout: the output of a program (a list of directories in text format)
  3. stdin: input to a program (in the case of cat, head and tail, connected directly to the output of the program before)
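To see the three channels in one place, here is a sketch of a tiny shell function. The name prefix_lines is hypothetical and not a standard command; it takes one argument, reads lines from stdin and writes the result to stdout:

```shell
# argument $1 -> a prefix to put before each line
# stdin       -> the lines to process
# stdout      -> the prefixed lines
prefix_lines() {
  while IFS= read -r line; do
    printf '%s%s\n' "$1" "$line"
  done
}

# printf writes two lines to the pipe; prefix_lines reads them on stdin.
result=$(printf 'bin\nboot\n' | prefix_lines '/')
printf '%s\n' "$result"
```

Because it follows the same conventions as cat, head and tail, it can be placed anywhere in a pipeline.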

The programs themselves may not seem so powerful. But combined they can surprise you. A fundamental design principle of UNIX is:
“each program should do just one thing, but do that one thing well”
So, you might not find a command that does exactly what you want. But by combining a few commands you can easily do advanced things that the designers of the programs never even thought of. Also, those standard programs are very old, very fast and of very high quality. You can trust them to do the job very well.

Now play a little with the ls-cat-head-tail-example above, and make sure you understand exactly how it works.

If you want to know more about a command you can read its manual page (press q to exit):

essentials@kvaser:~$ man head
essentials@kvaser:~$ man tail
essentials@kvaser:~$ man cat

A word of warning: the man pages are very detailed, but not very easy to read. If you are lucky you will find an example in the man page (or try Google). The tricky part is knowing which programs actually exist and understanding what they do; once you know that, the man pages can help with the details.

More examples: (space to scroll, q to exit)

essentials@kvaser:~$ find /  |  less

less makes it possible to look at large outputs page by page.

essentials@kvaser:~$ find /  |  grep txt  |  less
find: `/var/run/exim4': Permission denied
find: `/var/run/samba/winbindd_privileged': Permission denied

grep (by default) chooses the lines in the input that contain the word you search for. So, this command lists all files on your filesystem with txt in the filename (that you have permission to see). Note that errors are not written to stdout but to stderr, and stderr is not (by default) sent to the next command. That is why the “Permission denied” lines bypass grep and less and show up directly on the terminal.
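The difference between stdout and stderr can be made visible by counting the lines that actually reach the next command. In this sketch, emit is a hypothetical helper that writes one line to each stream:

```shell
# Hypothetical helper: one line to stdout, one line to stderr.
emit() {
  echo "a normal line"
  echo "an error line" >&2
}

# Only stdout travels through the pipe, so wc counts one line.
# (2>/dev/null just discards the error line to keep the terminal clean.)
piped_default=$(emit 2>/dev/null | wc -l)

# 2>&1 merges stderr into stdout, so both lines reach wc.
piped_merged=$(emit 2>&1 | wc -l)

# $(( )) strips the padding that some versions of wc add.
echo "default: $((piped_default)), merged: $((piped_merged))"
```

This prints "default: 1, merged: 2". The same 2>&1 could be added after find if you wanted the error messages to go through the pipe as well.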

If you want to find only files with the extension .txt it gets a little trickier:

essentials@kvaser:~$ find /  |  grep "\.txt$"  |  less
find: `/var/run/exim4': Permission denied
find: `/var/run/samba/winbindd_privileged': Permission denied

The period character (.) is a special character to grep that matches any single character, so if you really want to match “.” you need to write a backslash before it. This is called escaping. The dollar character is also a special character. It matches “the end of the line”, which is what we want here, so we don't escape the dollar character.

Actually, “\.txt$” is a regular expression (often shortened to regexp). More about those later, but they are super powerful.
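The effect of the escaping is easiest to see on a handful of names. A small sketch with made-up file names (none of them come from a real system); single quotes are used so that the shell passes the pattern to grep untouched:

```shell
# Hypothetical file names, one per line.
names='notes.txt
txt2pdf
readme.txt.bak
atxt'

# Plain txt matches anywhere in the line: all four names (grep -c counts them).
any=$(printf '%s\n' "$names" | grep -c 'txt')

# An unescaped dot means "any character", so atxt slips through as well.
loose=$(printf '%s\n' "$names" | grep '.txt$')

# An escaped dot plus $ demands a literal .txt at the end of the line.
strict=$(printf '%s\n' "$names" | grep '\.txt$')

echo "any: $any"
echo "loose: $(echo $loose)"
echo "strict: $strict"
```

Here any is 4, loose contains notes.txt and atxt, and strict contains only notes.txt.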

A few more commands:

essentials@kvaser:~$ cat /etc/group  |  sort  |  head -n 5
adm:x:4:
audio:x:29:kvaser
backup:x:34:
bind:x:106:
bin:x:2:
essentials@kvaser:~$ cat /etc/group  |  sort  |  cut -d ':' -f 1  |  head -n 5
adm
audio
backup
bind
bin
essentials@kvaser:~$ cat /etc/group  | wc -l
53

So, there is a file /etc/group (I think even in Cygwin). First we sort it and output the top 5 rows. Second we only output the first column (using cut). Third, we just count the lines.
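The same three steps can be tried without touching /etc/group. This sketch uses a few made-up lines in the same colon-separated layout, and sets LC_ALL=C for sort so that the order is plain byte order on every system:

```shell
# Hypothetical group-style data: name:password:gid:members
sample='bind:x:106:
bin:x:2:
audio:x:29:kvaser
adm:x:4:'

# Sort the lines, then keep only the first colon-separated column.
names=$(printf '%s\n' "$sample" | LC_ALL=C sort | cut -d ':' -f 1)
printf '%s\n' "$names"

# wc -l counts lines; $(( )) strips any padding wc adds.
count=$(printf '%s\n' "$sample" | wc -l)
echo "lines: $((count))"
```

With LC_ALL=C, bin sorts before bind because the colon has a lower byte value than the letter d. In the listing above bind comes first, presumably because the locale on that system ignores punctuation when comparing lines.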

So, now use the programs I have demonstrated above, and improvise. You can use files in the /etc directory as input data.

You can also play with output from

$ ps aux
$ w
$ /sbin/ifconfig  (or ifconfig, or ipconfig)
$ dig theregister.co.uk

Use: cat, head, tail, sort, cut, wc, grep.

Command Line Essentials: Navigating the Filesystem

The filesystem contains files and directories. Files contain data and directories contain files and other directories. There is one single special directory – the root directory – that is inside no other directory. It is the root, where everything starts. Now run three commands:

essentials@kvaser:~$ cd /
essentials@kvaser:/$ pwd
/
essentials@kvaser:/$ ls
bin   dev  home  lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  lib   media       opt  root  selinux  sys  usr

cd: change directory (the / is the root)
pwd: print working directory
ls: list files and folders

Now, we will list the contents of the root directory and get a little more information:


essentials@kvaser:/$ ls -l
total 109
drwxr-xr-x  2 root root  4096 Feb 18 06:51 bin
drwxr-xr-x  3 root root  1024 Feb 18 07:03 boot
drwxr-xr-x 13 root root  2880 Feb 18 07:06 dev
drwxr-xr-x 78 root root  4096 Feb 28 22:11 etc
drwxr-xr-x  8 root root  4096 Feb 28 21:39 home
drwxr-xr-x 10 root root  8192 Feb 18 06:51 lib
drwxr-xr-x  2 root root 49152 Feb  2 21:01 lost+found
drwxr-xr-x  3 root root  4096 Feb 17 21:39 media
drwxr-xr-x  2 root root  4096 Jan 16 21:45 mnt
drwxr-xr-x  2 root root  4096 Feb  2 21:27 opt
dr-xr-xr-x 83 root root     0 Jan  1  1970 proc
drwxr-xr-x  4 root root  4096 Feb  3 21:04 root
drwxr-xr-x  2 root root  4096 Feb 18 06:52 sbin
drwxr-xr-x  2 root root  4096 Sep 16  2008 selinux
drwxr-xr-x  2 root root  4096 Feb  2 21:27 srv
drwxr-xr-x 11 root root     0 Jan  1  1970 sys
drwxrwxrwt  3 root root  4096 Feb 28 22:17 tmp
drwxr-xr-x 10 root root  4096 Feb  2 21:27 usr
drwxr-xr-x 14 root root  4096 Feb  2 22:38 var

The “d” at the start of each line means “directory”. Ignore the rest of the columns for now. Let's list the contents of the folder bin:

essentials@kvaser:/$ ls -l bin
total 4996
-rwxr-xr-x 1 root root 790844 Apr 10  2010 bash
-rwxr-xr-x 1 root root 502968 Nov 15 16:46 busybox
-rwxr-xr-x 1 root root  38996 Apr 28  2010 cat
-rwxr-xr-x 1 root root  51244 Apr 28  2010 chgrp
  (many lines)
-rwxr-xr-x 1 root root   2015 Jan 20  2010 zforce
-rwxr-xr-x 1 root root   5597 Jan 20  2010 zgrep
-rwxr-xr-x 1 root root   1733 Jan 20  2010 zless
-rwxr-xr-x 1 root root   2416 Jan 20  2010 zmore
-rwxr-xr-x 1 root root   4952 Jan 20  2010 znew

Here, the number before the date is the size of the file, in bytes.
The “x” means “executable”, that is, a program. Confusingly, directories are also marked with “x” here; for a directory it means that you are allowed to enter it.
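The shell can also answer the "is it executable?" question directly. A sketch using the test operators -x and -d inside [ ], working in a throwaway directory created with mktemp; it assumes /bin/sh exists, as it does on Linux, Mac OS X and Cygwin:

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
touch "$work/empty"

# /bin/sh is a program, so it has the x bit.
if [ -x /bin/sh ]; then shell_exec=yes; else shell_exec=no; fi

# A file created by touch is plain data: no x bit.
if [ -x "$work/empty" ]; then file_exec=yes; else file_exec=no; fi

# For a directory, x means permission to enter it.
if [ -d "$work" ] && [ -x "$work" ]; then dir_exec=yes; else dir_exec=no; fi

echo "sh: $shell_exec, empty file: $file_exec, directory: $dir_exec"
rm -r "$work"
```

This prints "sh: yes, empty file: no, directory: yes".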

Spend some time exploring your filesystem with cd, pwd, ls, for example:

essentials@kvaser:/$ cd etc/
essentials@kvaser:/etc$ pwd
/etc
essentials@kvaser:/etc$ cd ..
essentials@kvaser:/$ pwd
/
essentials@kvaser:/$ cd usr/bin/
essentials@kvaser:/usr/bin$ pwd
/usr/bin
essentials@kvaser:/usr/bin$ cd ..
essentials@kvaser:/usr$ ls
bin  games  include  lib  local  sbin  share  src
essentials@kvaser:/usr$ 
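The session above mixes two kinds of paths. A sketch that replays it in a throwaway directory tree (the tree is made up with mkdir -p so the example does not depend on your system), using basename to print only the last part of the working directory:

```shell
# Build a small hypothetical tree in a throwaway directory.
top=$(mktemp -d)
mkdir -p "$top/usr/bin"

cd "$top/usr/bin"              # a path starting with / is absolute
here=$(basename "$(pwd)")      # last part of the working directory

cd ..                          # .. always means the parent directory
parent=$(basename "$(pwd)")

cd bin                         # no leading /: relative to where we are now
back=$(basename "$(pwd)")

echo "$here -> $parent -> $back"
cd / && rm -r "$top"
```

This prints "bin -> usr -> bin".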

Do play around more!

None of those files and folders are for you to modify. If you want to make your own folders and fill them with files, first go home:

essentials@kvaser:/$ cd
essentials@kvaser:~$ mkdir MyFirstFolder
essentials@kvaser:~$ touch MyFirstFile
essentials@kvaser:~$ ls -l
total 4
-rw-r--r-- 1 essentials essentials    0 Feb 28 22:39 MyFirstFile
drwxr-xr-x 2 essentials essentials 4096 Feb 28 22:39 MyFirstFolder
essentials@kvaser:~$ pwd
/home/essentials

Several things to note here…
cd without arguments takes you home
touch creates an empty file (size = zero bytes), not executable
ls -l outputs essentials, the user (and group) who owns the file. There is a special user (and group) named root who owns most other files in the filesystem. The root user is something completely different from the root directory.

You can clean up again:

essentials@kvaser:~$ rm MyFirstFile 
essentials@kvaser:~$ rmdir MyFirstFolder/
essentials@kvaser:~$ ls
essentials@kvaser:~$ 
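One detail worth knowing about the cleanup: rmdir only removes empty directories. A sketch in a throwaway directory, reusing the names from above, that puts the file inside the folder to show the difference:

```shell
# Work in a throwaway directory so your home directory is left alone.
work=$(mktemp -d)
cd "$work"

mkdir MyFirstFolder
touch MyFirstFolder/MyFirstFile

# rmdir refuses because the folder still contains a file.
if rmdir MyFirstFolder 2>/dev/null; then removed_full=yes; else removed_full=no; fi

# Remove the file first; then rmdir succeeds.
rm MyFirstFolder/MyFirstFile
if rmdir MyFirstFolder; then removed_empty=yes; else removed_empty=no; fi

echo "while full: $removed_full, once empty: $removed_empty"
cd / && rmdir "$work"
```

This prints "while full: no, once empty: yes".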

If you run Cygwin (under Windows) you can try

$ cd /cygdrive/c

which takes you to your Windows C: drive.

Finally, end your session:

essentials@kvaser:~$ exit
exit

Command Line Essentials: Introduction

I have decided to try to write a Command Line Essentials guide. I intend to start from the beginning and make it very easy.

Table of contents

  1. Navigating the file system
  2. Text and Pipeline

Getting started
The primary platform will be Linux (Debian). Most things will apply to other Linux distributions, Mac OS X as well as Cygwin.

Linux: just start a terminal.
Mac OS X: start the Terminal application, under Applications/Utilities.
Windows: install and start Cygwin.

What matters more than the operating system is the shell (more on this later – I will use bash). Bash looks something like this:

essentials@kvaser:~$ 

Here essentials is my username and kvaser is the name of the computer. To make sure it is Bash, print the value of the variable SHELL:

essentials@kvaser:~$ echo $SHELL
/bin/bash

Note: case matters. SHELL and shell are two different names.
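A quick way to convince yourself, sketched with a made-up variable name. The ${name:-unset} form prints the word unset when the variable has no value:

```shell
# Hypothetical variable names, chosen only to show that case matters.
unset GREETING
greeting="hello"

lower=${greeting:-unset}   # the variable we just assigned
upper=${GREETING:-unset}   # a different variable that was never assigned

echo "greeting=$lower GREETING=$upper"
```

This prints "greeting=hello GREETING=unset".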

If you have been given remote access to a Linux system and you are on Windows, install PuTTY. You will need the host address, a username and a password. The first thing you should do when you log in this way is to change your password.

essentials@kvaser:~$ passwd