I made a change to the Blogger configuration to ease future blogging work. It is possible that older entries are no longer correctly formatted.

Showing posts with label kernel programming. Show all posts

Tuesday, 9 September 2008

Translation Lookaside Buffer, aka TLB

In a few words, from the Wikipedia article: a TLB is a CPU cache used by memory management hardware to improve the speed of virtual address translation.

Much information comes from this article.

The idea is that CPUs keep an associative memory to cache page table entries (PTEs) of virtual pages which were recently accessed.

When the CPU must access virtual memory, it first looks up the virtual page number in the TLB.

If an entry is found (a TLB hit), the CPU can use the PTE from that entry to calculate the physical address.

If no entry is found (a TLB miss), the miss is handled in one of two ways, depending on the architecture:

  • in hardware: the CPU walks the page table itself to find the correct PTE. If one is found, the TLB is updated; if none is found, the CPU raises a page fault, which is then handled by the operating system.
  • in software: the CPU raises a TLB miss fault. The operating system intercepts it and invokes the corresponding handler, which walks the page table. If the PTE is found and present, the TLB is updated. If it is not present, the page fault handler takes over.

CISC architectures (IA-32) mostly use hardware handling, while RISC architectures (Alpha) use software. IA-64 uses a hybrid approach, because hardware handling is faster but less flexible than the software approach.

Replacement policy

If the TLB is full, some entries must be replaced. Depending on the miss handling strategy, different replacement policies exist:

  • Least recently used (aka LRU)
  • Not recently used (aka NRU)
Even when the TLB miss mechanism is implemented in software, the replacement strategy can be implemented in hardware. A number of newer architectures (IA-64, for example) do this.

Ensuring coherence with the page table

Another issue is to keep the TLB coherent with the page table it represents.

Wednesday, 3 September 2008

Read Copy Update

Read Copy Update (aka RCU) is another synchronisation mechanism, designed to avoid the cost of reader-writer locks.

An excellent explanation can be found at LWN.net, in a three-part series by Paul McKenney and Jonathan Walpole:

The basic idea is that when a resource is modified, a new, updated structure is put in its place, and the old structure is not discarded right away: it stays around until all references to it held by other processes have been dropped. This can be seen as similar to garbage collection but, as noted in What is RCU? Part 2: usage, the old structure is not discarded automatically when there are no references any more; the programmer must explicitly mark the read-side critical sections of the code.

There is an interesting page on RCU arguing that this technique is used more and more in the kernel as a replacement for reader-writer locks.

Tuesday, 2 September 2008

Kernel Locking mechanisms

An important aspect of programming in an environment with threads and processes is preventing processes from interfering with one another at the wrong time.

In Linux, a number of methods are used to ensure that the data or code sections of processes are not disturbed by others. These methods are:

  • atomic operations
  • spinlocks
  • semaphores
  • reader writer locks

These locks and mechanisms are in the kernel space. Other methods or locking mechanisms are used in the user space.

atomic operations

The idea behind atomic operations is to perform very basic changes to a variable that cannot be interfered with by other processes, because the operations are so small. For this, a special data type is used: atomic_t.

On this data type, a number of atomic operations can be performed:

function | description
atomic_read(atomic_t *v) | read the variable
atomic_set(atomic_t *v, int i) | set the variable to i
atomic_add(int i, atomic_t *v) | add i to the variable
atomic_sub(int i, atomic_t *v) | subtract i from the variable
atomic_sub_and_test(int i, atomic_t *v) | subtract i from the variable; return true if the result is 0, else false
atomic_inc(atomic_t *v) | increment the variable
atomic_inc_and_test(atomic_t *v) | increment the variable; return true if the result is 0, else false
atomic_dec(atomic_t *v) | decrement the variable
atomic_dec_and_test(atomic_t *v) | decrement the variable; return true if the result is 0, else false
atomic_add_negative(int i, atomic_t *v) | add i to the variable; return true if the result is negative, else false

Note that I discussed per-CPU variables in another post.

spinlocks

This is the most commonly used kind of locking, above all to protect sections from access by other processes for short periods.

While waiting, the kernel continuously checks whether the lock can be taken. This is an example of busy waiting.

spinlocks are used in the following way:

spinlock_t lock = SPIN_LOCK_UNLOCKED;
...
spin_lock(&lock);
/** critical operations */
spin_unlock(&lock);

Due to the busy waiting, if the lock is never released the computer may freeze; therefore spinlocks should only be held for short periods.

semaphores

Unlike with spinlocks, the kernel sleeps while waiting for a semaphore to be released. Consequently, this kind of structure should only be used for locks that are held for some length of time; for short locks, spinlocks are recommended.
DECLARE_MUTEX(mutex);
....
down(&mutex);
/** critical section*/
up(&mutex);

Waiting processes sleep in an uninterruptible state until the lock is released; a process cannot be woken up by signals during this sleep.

There are other alternatives to the down(&mutex) operation:
  • down_interruptible(&mutex): the process can be woken up by signals while waiting
  • down_trylock(&mutex): tries to take the lock without sleeping; if it succeeds, the process simply continues

For the user space, there are also futexes.... But this is another story.

reader writer locks

With this kind of lock, multiple processors can read the protected data structure concurrently, but when the structure is to be written it can only be manipulated by one processor at a time.

Monday, 1 September 2008

GIT tutorial

I was having a look at the git tutorial. The important tasks: 1/ download the code of the linux kernel from git:

>git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

2/ pulling new code from git:

> git pull

3/ reverse code changes

> git checkout -f

4/ committing the modifications:

> git commit -a

5/ undo the last commits (note this is different from a revert, which consists of a patch reverting another patch)

> git reset HEAD~2

6/ list branches

> git branch

7/ create branch

> git checkout -b my-new-branch-name master

8/ choose a branch and make it the current one:

> git checkout branch

9/ Tell which branch is current

> git status

10/ merging code into a branch mybranch

> git checkout mybranch

> git merge anotherbranch

Friday, 29 August 2008

Linux Kernel Mailing List FAQ

One page I had not noticed, and which seems to contain lots of interesting material: the Linux Kernel Mailing List FAQ. There is some nice info for newbies... not much detailed information, but a lot of very nice pointers to useful places.

Linux Kernel: Copy on Write

Copy on Write is an important technique in Linux kernel memory management. The basic idea is to avoid creating unnecessary copies of structures when creating new processes. When a child process is created, the kernel first gives both parent and child read-only access to the same pages. When one of the processes needs to write to a page, the CPU raises a page fault, indicating that a new copy of the page should be created for the writing process. Thus the data is actually copied only when a process actually tries to write.

Some information is of course available on Wikipedia on the Copy on Write page. Note that there is an acronym COW for Copy on Write.

Wednesday, 27 August 2008

Memory Management Documentation in Linux Kernel

As I was looking at the kernelnewbies info, I found this post from Mel Gorman about documentation for memory management under Linux. It contains links to two documents:

Wednesday, 11 June 2008

Kernel KnowHow: Per CPU Variables

I just read the section on per CPU variables from the Linux Device Drivers book from O'Reilly (TODO link).

The use of per-CPU variables helps improve performance by limiting cache sharing between CPUs. The idea is that each CPU has its own private instance of a given variable.

A variable can be defined using the following macro: DEFINE_PER_CPU(type,name);

It is important to note that a kind of locking mechanism is still needed for per-CPU variables, because:

  • the kernel might move the process to another CPU
  • the process might be preempted in the middle of a modification of the variable

Therefore, each instance should be updated using a simple mechanism that protects the variable during the changes. This can be done using get_cpu_var() and put_cpu_var(), which disable and re-enable preemption around the access, for example:

get_cpu_var(variabletoupdate)++;
put_cpu_var(variabletoupdate);

It is also possible to access the variable values from other processors using: per_cpu(variable, int cpu_id); but a locking mechanism must be implemented for these cases.

Memory allocation for dynamically allocated per-CPU variables is performed using:

void *alloc_percpu(type);
void *__alloc_percpu(size_t size , size_t align);

Deallocation is achieved using: free_percpu().

Accessing a dynamically allocated per-CPU variable is performed by using

per_cpu_ptr(void * per_cpu_var, int cpu_id).
It returns a pointer to the variable content. For example,
int cpu;
cpu= get_cpu();
ptr = per_cpu_ptr(per_cpu_var, cpu);
put_cpu();

Note the call to put_cpu(), which releases the CPU (re-enabling preemption) after get_cpu().

Monday, 26 May 2008

Kernel Makefile

In this post, I sum up the main Makefile parameters and targets. Right now it mainly corresponds to the output of $ make help, but I might edit this entry to add useful information.

First of all use (if in the directory of the kernel sources)

$ make help
or, if you are not in the kernel source directory (located, say, in &lt;kernel-src-dir&gt;):
$ make -C &lt;kernel-src-dir&gt; help

This more or less gives the following: a list of possible targets plus supplementary information.

  • a few variables can be set:
    • ARCH=um - build for the corresponding architecture
    • V=1 - verbose build
    • V=2 - gives the reason for the rebuild of a target
    • O=dir - output directory of the build, including the .config file
    • C=1 or C=2 - checking (resp. force checking) of C sources
  • Documentation
    • make [htmldocs|mandocs|pdfdocs|psdocs|xmldocs] - builds the corresponding documentation
  • Packages
    • make rpm-pkg - builds source and binary rpm packages
    • make binrpm-pkg - builds binary rpm packages
    • make deb-pkg - builds deb packages
    • make tar-pkg - builds an uncompressed tarball
    • make targz-pkg - builds a gzip-compressed tarball
    • make tarbz2-pkg - builds a bzip2-compressed tarball
  • Cleaning targets:
    • clean - Remove most generated files but keep the config and enough build support to build external modules
    • mrproper - Remove all generated files + config + various backup files
    • distclean - mrproper + remove editor backup and patch files
    Note that it can be useful to use ARCH=... in the cleaning process
  • Kernel Configuration targets:
    • config - Update current config utilising a line-oriented program
    • menuconfig - Update current config utilising a menu based program
    • xconfig - Update current config utilising a QT based front-end
    • gconfig - Update current config utilising a GTK based front-end
    • oldconfig - Update current config utilising a provided .config as base
    • silentoldconfig - Same as oldconfig, but quietly
    • randconfig - New config with random answer to all options
    • defconfig - New config with default answer to all options
    • allmodconfig - New config selecting modules when possible
    • allyesconfig - New config where all options are accepted with yes
    • allnoconfig - New config where all options are answered with no
  • Other useful targets:
    • prepare - Set up for building external modules
    • all - Build all targets marked with [*]
      • * vmlinux - Build the bare kernel
      • * modules - Build all modules
      • modules_install - Install all modules to INSTALL_MOD_PATH (default: /)
    • dir/ - Build all files in dir and below
    • dir/file.[ois] - Build specified target only
    • dir/file.ko - Build module including final link

Note that there are also some targets for generating editor tags, but I am not sure what they really bring.

Thursday, 21 February 2008

Kernel - User Mode Linux

While I was playing with kernel programming, I managed to hang my Linux box. That is why I decided to go back to a User-mode Linux box for my module programming. The first step was to start a User-mode Linux box. As specified on the User-mode Linux Kernel Home Page, download the UML kernel sources as well as the filesystem file to be used; a number of filesystems are possible, see for example here. To build a UML you need the kernel sources. You will also need the readline and fuse libraries, and possibly others; I haven't checked that yet. Once you have these elements you can compile a Linux kernel with user-mode support. Then the only remaining things to do were:
host% make defconfig ARCH=um
host% # now make menuconfig or xconfig if desired
host% make menuconfig ARCH=um
host% make ARCH=um

Note the importance of the ARCH=um parametrisation. The menuconfig is useful for further configuration of your kernel.