Well, I have been learning to use Lotus Notes at work... and I am not so sure how good it is. There are many things that would need to be improved. It is used as a platform for some software which, as a whole, provides an interesting CRM solution. But I believe there is much room for improvement there.
Friday, 5 December 2008
Subversion and unicode
I was having some difficulty at work with documents which were not saved as Unicode in the Subversion repository. This caused problems with Eclipse, which was not able to open them.
What I did to solve my problem was to use the file command to determine the encoding, and iconv to re-encode the documents.
I am planning to write a small Perl script (though a Java program would also be fine) which downloads the files, checks their type, re-encodes them and then commits the documents back. However, I must make sure that the differing encodings of a document across its history do not cause problems.
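The re-encoding step itself is easy to sketch in Java. The following is a minimal sketch under some assumptions: the Subversion checkout/commit steps are left out, the class name is made up, and the source encoding is passed in explicitly (e.g. as determined with file):

import java.io.*;
import java.nio.charset.Charset;

// usage: java Reencode <file> <source-encoding>; writes <file>.utf8 (hypothetical helper)
public class Reencode {
    public static void main(String[] args) throws IOException {
        Reader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(args[0]), Charset.forName(args[1])));
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(args[0] + ".utf8"), Charset.forName("UTF-8")));
        int c;
        while ((c = in.read()) != -1)
            out.write(c); // decode from the source charset, re-encode as UTF-8
        in.close();
        out.close();
    }
}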
RAP
Well, I learnt some things about RAP these days; I also recalled a few things about Eclipse I had forgotten, and learnt new ones as well.
Among the interesting things are target platforms in Eclipse, which allow you to choose exactly which plug-ins are available to the Eclipse platform. This is particularly helpful when some packages conflict with each other.
In the case of RAP, the RCP and RAP plug-ins seem to have difficulty working together. I must admit, though, that I do not know what the issue is there.
So expect a RAP intro tutorial, and perhaps posts on other issues.
Tuesday, 18 November 2008
Common Errors in C
Sunday, 2 November 2008
Java Workqueue API
- executors
- queues
- concurrent collections
- synchronizers
- atomic (in package java.util.concurrent.atomic)
- locks (in package java.util.concurrent.locks)
Executors
Executors are containers taking care of the execution of tasks, just as SwingUtilities used to do. Different kinds of executors are conceivable (a short sketch follows the list):
- DirectExecutor (performs the task synchronously, not asynchronously)
- ThreadPerTaskExecutor (one thread is created for each task to be performed)
- SerialExecutor (each task is performed sequentially)
- ThreadPoolExecutor (executor performing using a pool of threads)
- ScheduledThreadPoolExecutor (executor performing using a pool of threads at certain specified moments or periodically)
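As a sketch of how these fit together: a DirectExecutor can be written in a few lines, and a thread pool is obtained from the Executors factory. Only Executor, ExecutorService and Executors below are real java.util.concurrent types; the rest is illustrative:

import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorDemo {
    // a DirectExecutor in a few lines: runs the task in the calling thread
    static class DirectExecutor implements Executor {
        public void execute(Runnable r) { r.run(); }
    }

    public static void main(String[] args) {
        new DirectExecutor().execute(new Runnable() {
            public void run() { System.out.println("run directly"); }
        });
        // a thread pool executor with three worker threads
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (int i = 0; i < 5; i++) {
            final int n = i;
            pool.execute(new Runnable() {
                public void run() { System.out.println("task " + n); }
            });
        }
        pool.shutdown(); // no new tasks; workers exit once the queue is empty
    }
}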
Queues
Another useful data structure for performing tasks in parallel and asynchronously is the queue (also known as a FIFO data structure). The java.util.concurrent package provides a number of data structures for this purpose too. One particular kind of FIFO is the blocking queue, of which five different versions exist (a producer/consumer sketch follows the list):
- LinkedBlockingQueue
- ArrayBlockingQueue
- SynchronousQueue
- PriorityBlockingQueue
- DelayQueue
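A minimal producer/consumer sketch using one of these (the class name and messages are made up for illustration):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10);
        new Thread() { // producer: put() blocks when the queue is full
            public void run() {
                try { queue.put("some work"); } catch (InterruptedException e) {}
            }
        }.start();
        new Thread() { // consumer: take() blocks until an element is available
            public void run() {
                try { System.out.println(queue.take()); } catch (InterruptedException e) {}
            }
        }.start();
    }
}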
Synchronizers
A number of possible techniques can be used to synchronize threads. Among these are semaphores, which we discussed in the Linux kernel context. Other types are provided by the java.util.concurrent package, such as the CountDownLatch, used to block some actions until a number of events or elements have been counted; the CyclicBarrier, which is a resettable multiway synchronization mechanism; an Exchanger, allowing threads to exchange objects at some definite point; and the already mentioned locks.
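For example, a CountDownLatch can be used to wait until several worker threads have finished; a minimal sketch:

import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        final int workers = 3;
        final CountDownLatch done = new CountDownLatch(workers);
        for (int i = 0; i < workers; i++) {
            new Thread() {
                public void run() {
                    // ... perform some work ...
                    done.countDown(); // signal one finished event
                }
            }.start();
        }
        done.await(); // blocks until the count reaches zero
        System.out.println("all workers finished");
    }
}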
Concurrent Collections
The java.util Collection Framework already contained a number of synchronized or synchronizable classes. However, the java.util.concurrent package introduces new structures useful in a multithreaded context (a small example follows the list):
- ConcurrentHashMap,
- ConcurrentSkipListMap,
- ConcurrentSkipListSet,
- CopyOnWriteArrayList, and
- CopyOnWriteArraySet
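A small illustration of why these are useful: ConcurrentHashMap offers atomic compound operations such as putIfAbsent, so no external synchronization is needed for a check-then-put:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<String, Integer>();
        // atomic check-then-put: no lock needed around the test and the insertion
        hits.putIfAbsent("page", 0);
        System.out.println(hits);
    }
}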
Timing Units
The java.util.concurrent package also introduces a new class, TimeUnit, to express the granularity (the unit) in which time intervals are measured. Here again, it would be useful to take a look at the implementation of the kernel and compare.
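A small usage sketch of TimeUnit:

import java.util.concurrent.TimeUnit;

public class TimeUnitDemo {
    public static void main(String[] args) throws InterruptedException {
        TimeUnit.SECONDS.sleep(1);               // sleep, with the unit made explicit
        long ms = TimeUnit.SECONDS.toMillis(90); // unit conversion: 90 s -> 90000 ms
        System.out.println(ms);
    }
}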
Atomic and Locks
Since atomic variables and locks are in other packages, I will describe them in other entries. However, it is again interesting to note that the same topics were already discussed in other entries of this blog.
Flash with Flex
So I tried a few things with Flash, now that I more or less know the basics of Flex (and only the basics).
So I wrote my first little movie in Flash and compiled it using Flex's compiler.
I cannot really write an introduction to Flash and Flex yet. I still need to get used to too many things. For example, I am not really happy not knowing which data structures are available in ActionScript. Maybe I have been spoiled by Java and the java.util collections. Yet I think it sensible to expect the existence of a number of standard libraries one can use to prevent the inventing-the-wheel-itis.
One thing I found on the internet, at least, is from a developer from Mannheim who wrote some data structures for games in ActionScript. I will have to take a look at it. It sure sounds really interesting.
I have also been interested in the best way to create small movies for fun. So I just thought about the general object-oriented structure of my character creations. Actually, I already had some project of the sort in Java, but I did not get that far because other priorities popped up, as they always do.
Oprofile - What art thou?
O.K., apart from making silly references to films I have not actually liked, what am I doing?
After reading a few mails from the Linux Kernel Mailing list, I found the following tool which seems useful: oprofile. I must admit I still do not have a clear idea of all the possibilities that this tool offers.
The first thing to know is that it is a profiler, and one capable of giving a good deal of information about a program with quite a low overhead. But what is a profiler?
A profiler collects information on running processes of a computer to analyze the performance of a program (see the wikipedia article on performance analysis).
It gives the possibility to look at the following aspects of a system:
- call graphs (i.e. it looks at the graph of the function calls of programs)
- libraries used by a program
- instruction-level information collection (also at the assembly code level)
Saturday, 25 October 2008
Interesting Week
- network problems with TCP/UDP broadcast
- flash/flex sandbox security issues
Monday, 6 October 2008
Performance according to Martin Fowler
- response time
- responsiveness
- latency
- throughput
- the load
- load sensitivity
- efficiency
- capacity
- response time
- the time it takes the system to process a request
- responsiveness
- the time the system takes to acknowledge a request
- latency
- the minimum time the system needs to react to any request, even if there is nothing to be done
- throughput
- how much work can be done in a given amount of time
- load
- how much strain is put on the resources (for example, how many users or processes are logged on or running)
- load sensitivity
- an indicator of the variation in response time depending on the load
- efficiency
- the performance (i.e. either throughput or response time) divided by the resources
- capacity
- the maximum effective throughput or load of the system
Thursday, 2 October 2008
What the ack is that?
Well! As I was looking at the programs being updated on my Fedora 9 box, I fell upon the program ack. I had no idea what it was, so I ran rpm -qi ack... and it told me that it is a kind of grep replacement. It seems to be faster than grep in most cases. Some people have tried it out, it seems, though others seem to disagree.
One of the advantages of ack is that it makes it easier to select the type of files to be searched. So I guess I will have to take a look at it....
Monday, 29 September 2008
Some Ideas for Desktop Improvements
To Do List
I want a to-do list which is more or less always present when the desktop is on.
- classification by priorities
- important and urgent
- not important but urgent
- important and not urgent
- not important and not urgent
- not classified
- classification by subject
- work
- administrative
- hobby
- family
- Style of the task and Icons
- Position of Tasks as Desktop Icons
- Size of Desktop Icons
Organized Important Files
I want the files on my desktop to be organized in a meaningful way, for example thematically and time-oriented: from left to right by time, from top to bottom by theme. Of course, the thematic classification cannot be automatic. Moreover, the time-oriented ordering might not always be relevant.
Friday, 26 September 2008
Firefox Plugins
- No Script
- an add-on to easily control whether scripts (Java, JavaScript, Flash) are allowed to run, enabled or disabled per domain
- Download Them All
- with this add-on, it is easier to download many resources from a web page
- Download Status Bar
- adds a status bar which shows the progress of Firefox downloads
- Greasemonkey
- Greasemonkey allows you to write scripts which are run on top of web pages
- Firebug
- a utility to help in developing JavaScript
Monday, 22 September 2008
Hibernate
Second Edition of Hibernate in Action
Christian Bauer and Gavin King
November, 2006 | 880 pages
ISBN: 1-932394-88-5
The goal is to map the objects created by an object-oriented programming language such as Java to a relational database, in order to provide persistent objects, i.e. objects which can be stored on disk and which do not disappear when the virtual machine shuts down. Hibernate performs the mapping using configuration files in XML (or other formats). Here is an example of an XML mapping file called tasks.hbm.xml:
<!DOCTYPE hibernate-mapping PUBLIC
"-//Hibernate/Hibernate Mapping DTD//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
<class
name="mytasks.Task"
table="TASKS">
<id
name="id"
column="TASK_NAME">
<generator class="increment"/>
</id>
<property
name="name"
column="Task_NAME"/>
<many-to-one
name="nexttask"
cascade="all"
column="SPOUSE_ID"
foreign-key="FK_NEXT_TASK"/>
</class>
</hibernate-mapping>
The corresponding persistent class looks like this:
public class Task {
private Long id;
private String name;
private Task nexttask;
// plus a no-argument constructor and getters/setters, which Hibernate needs
}
import persistence.*;
import java.util.*;
public class TaskExample {
public static void main(String[] args) {
// First unit of work
Session session =
HibernateUtil.getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Task firsttask = new Task("Learn Hibernate");
Long taskId = (Long) session.save(firsttask);
tx.commit();
session.close();
// Second unit of work
Session newSession = HibernateUtil.getSessionFactory().openSession();
Transaction newTransaction = newSession.beginTransaction();
List tasks = newSession.createQuery("from Task m order by m.name asc").list();
System.out.println( tasks.size() + " Task(s) found:" );
for ( Iterator iter = tasks.iterator(); iter.hasNext(); ) {
Task task = (Task) iter.next();
System.out.println( task.getName() );
}
newTransaction.commit();
newSession.close();
// Shutting down the application
HibernateUtil.shutdown();
}
}
<hibernate-configuration>
<session-factory>
<property name="hibernate.connection.driver_class">
org.postgresql.Driver </property>
<property name="hibernate.connection.url">
jdbc:postgresql://localhost
</property>
<property name="hibernate.connection.username">
sa
</property>
<property name="hibernate.dialect">
org.hibernate.dialect.HSQLDialect
</property>
<!-- Use the C3P0 connection pool provider -->
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">20</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
<!-- Show and print nice SQL on stdout -->
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- List of XML mapping files -->
<mapping resource="mytasks/tasks.hbm.xml"/>
</session-factory>
</hibernate-configuration>
Note the use of a certain number of configuration entries:
- the Hibernate connection pool provider: here the C3P0 connection pool provider
- the Hibernate dialect used: here the PostgreSQLDialect, matching the PostgreSQL driver and URL
- the connection information: driver, URL and username
- the mapping file
There are a number of other possibilities to configure Hibernate.
Antipatterns and Code Problems
Tuesday, 16 September 2008
Java AWT bug on linux with XCB
When starting my Java application on Linux, I get the following backtrace:
Locking assertion failure. Backtrace:
#0 /usr/lib/libxcb-xlib.so.0 [0xc3e767]
#1 /usr/lib/libxcb-xlib.so.0(xcb_xlib_unlock+0x31) [0xc3e831]
#2 /usr/lib/libX11.so.6(_XReply+0x244) [0xc89f64]
#3 /usr/java/jre1.6.0_03/lib/i386/xawt/libmawt.so [0xb534064e]
#4 /usr/java/jre1.6.0_03/lib/i386/xawt/libmawt.so [0xb531ef97]
#5 /usr/java/jre1.6.0_03/lib/i386/xawt/libmawt.so [0xb531f248]
#6 /usr/java/jre1.6.0_03/lib/i386/xawt/libmawt.so(Java_sun_awt_X11GraphicsEnvironment_initD
It has already been discussed in a number of forums and bug reports:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6532373, or https://bugs.launchpad.net/ubuntu/+source/libxcb/+bug/87947
One possible workaround seems to be to set the following environment variable when starting the application:
export LIBXCB_ALLOW_SLOPPY_LOCK=1
Monday, 15 September 2008
Autotest - WOW they did it!!!!
- bisection...
- building
- booting
- filesystem check
- python based library to automate scripts
PXE Problems with NAS 1000
A good friend of mine gave me a NAS 1000 so that I could try a few things with it. In particular, I wanted to try PXE and diskless solutions with the installation files or disk data on the NAS server.
First I had some trouble starting the atftpd daemon because of user and group information which did not work. I should have checked the messages log right away... duh!!! It would have saved me a lot of time.
But then, as I tried getting the data from my Linux box using the Fedora tftp client, it did not work. Well, actually I am still not sure why it does not. Some routing errors, obviously:
Jan 1 13:36:21 icybox daemon.info atftpd[1951]: Server thread exiting
Jan 1 13:36:26 icybox daemon.notice atftpd[1952]: Fetching from 192.168.0.104 to ess
Saturday, 13 September 2008
Syntax highlighting for the Web
Friday, 12 September 2008
Java invokeLater
A number of months ago, I took a look at the new features of Java 1.5 and 1.6, and I fell upon the new java.util.concurrent package.
Whoever has programmed GUIs in Java is certainly aware of the importance of running work in background threads, in order to let the user perform other tasks instead of just waiting in front of a screen which is not refreshing. Using runnable threads properly, you could have a much more responsive GUI. A typical example looked like this (the long-running work belongs in the background thread, while invokeLater is there to get back onto the event dispatch thread for GUI updates):
Thread t = new Thread(){
    public void run(){
        // the task to perform which requires a certain amount of time,
        // running off the event dispatch thread
        SwingUtilities.invokeLater(new Runnable(){
            public void run(){
                // update the GUI here, back on the event dispatch thread
            }
        });
    }
};
t.start();
This technique is really fundamental to a well programmed graphical interface.
But since Java 1.5, there are a number of supplementary structures which can be used to perform tasks in parallel. And these are found in the package java.util.concurrent which will be the topic of a future entry.
Overview of Maven
Maven is a tool designed to support as many tasks as possible in the management of a software project.
Its purpose is to provide a simple tool to achieve the following tasks:
- Builds
- Documentation
- Reporting
- Dependencies
- SCMs
- Releases
- Distribution
A number of good tutorials can be found on maven's guide page.
Archetypes
In Maven there is the possibility to create archetype models of projects. This means that it is possible to create new projects very easily, starting from a number of templates. This is similar to what Rails offers.
This is performed by issuing the following command:
$ mvn archetype:create -DgroupId=com.mycompany.app -DartifactId=my-app
Project Object Model: POM
There is a concept of project object model somewhat similar to the ant build files.
An example from the website (see this page):
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.mycompany.app</groupId>
<artifactId>my-app</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Maven Quick Start Archetype</name>
<url>http://maven.apache.org</url>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
This model is quite easy to understand.
Project File and Directory Structure
The project file and directory structure depends on the archetype chosen to create a new project. There are means of configuring this, see: http://maven.apache.org/guides/mini/guide-creating-archetypes.html.
The Build Life Cycle
Each possible task (e.g. validate, compile, package) may require others to be performed first. This means that there are dependencies between the tasks (like in ant).
Common tasks
- mvn compile (compile the code)
- mvn test (test the functionalities of this project)
- mvn test-compile (compile test classes for this project)
- mvn package (package the code of this project)
- mvn clean (clean the build output)
- mvn site (create a template website for the project)
- mvn idea:idea (create an IntelliJ IDEA descriptor for the project)
- mvn eclipse:eclipse (create the project description files for eclipse)
There are a number of plugins which can be useful for maven. You can add them to the POM file of the project, see: How_do_I_use_plug-ins
A list of plugins can be found there.
SCM Plugin (Source Code Management plugin)
One of the many plugins is the SCM plugin, which offers useful tasks/goals for interacting with an SCM.
External Dependencies
There is also the possibility to configure external dependencies.
Deployment
Deployment is also possible once things are configured. For example, the created distribution can be copied and added to a repository using scp. For this, some information about user names, keys and/or passwords has to be configured.
Documentation
There are also some things to help in the creation of a documentation site using archetypes. See: guide-site.
Thursday, 11 September 2008
Linux Links
www.google.com/linux and co
Wednesday, 10 September 2008
Using Busybox for serving Linux distributions
A central Linux Documentation page
Tuesday, 9 September 2008
Useful appendices :-)
- remarks on C
- booting of the linux kernel
- ELF binary format
- architectures specific topics
- links on how to work with the source code
- Online Documents about Kernel
- important RFCs (TCP/IP..., Differentiated Services fields)
- GNU tool information
- ELF format
- important documentation from the kernel
Have you looked at JBoss' projects lately?
- JBoss application server
- RichFaces
- JBoss Remoting
- Hibernate (though there were already one or two entries)
- JRunit, a JUnit extension to test client/server applications
JBoss Remoting
Translation Lookaside buffer, aka TLB
In a few words, from the Wikipedia article: a CPU cache used by memory management hardware to improve the speed of virtual address translation (Wikipedia).
Much information comes from this article.
The idea is that CPUs keep an associative memory to cache page table entries (PTEs) of virtual pages which were recently accessed.
When the CPU must access virtual memory, it looks in the TLB for the entry corresponding to the requested virtual page.
If an entry is found (a TLB hit), the CPU can use the value of the PTE it obtained and calculate the physical address.
If no entry is found (a TLB miss), then, depending on the architecture, the miss is handled:
- through hardware: the CPU walks the page table to find the correct PTE. If one is found, the TLB is updated; if none is found, the CPU raises a page fault, which is then treated by the operating system.
- through software: the CPU raises a TLB miss fault. The operating system intercepts it and invokes the corresponding handler, which walks the page table. If the PTE is found, it is marked present and the TLB is updated; if it is not present, the page fault handler is then in charge.
Mostly, CISC architectures (IA-32) use the hardware approach, while RISC architectures (Alpha) use the software one. IA-64 uses a hybrid approach, because the hardware approach is faster but less flexible than the software one.
Replacement policy
If the TLB is full, some entries must be replaced. For this, depending on the miss handling strategy, different replacement policies exist:
- Least recently used (aka LRU)
- Not recently used (aka NRU)
Ensuring coherence with the page table
Another issue is to keep the TLB coherent with the page table it represents.
Monday, 8 September 2008
Nice little tool isoinfo
$ isoinfo -i isofilesystem.iso -J -x /filetobeextracted > filereceivingtheextracteddata
Nice!
Sunday, 7 September 2008
Kudos to helpwithpcs.com
Wednesday, 3 September 2008
Read Copy Update
Read Copy Update (aka RCU) is another synchronisation mechanism, designed to avoid reader-writer locks.
An excellent explanation can be found at LWN.net, in three parts, by Paul McKenney and Jonathan Walpole:
The basic idea behind it is that when a resource is modified, a new updated structure is put in its place, and the old structure is not discarded right away; it is kept until the references to it held by other processes have been dropped. This can be seen as similar to the concept of garbage collection but, as noted in What is RCU? Part 2: usage, the old structure is not discarded automatically when there are no references any more, and the programmer must indicate the critical read sections of the code.
There is an interesting page on RCU arguing that this technique is used more and more in the kernel as a replacement for reader-writer locks.
Tuesday, 2 September 2008
Kernel Locking mechanisms
An important aspect of programming in an environment with threads and processes is to prevent the different processes from interfering with the functionality of other processes at the wrong time.
In Linux, a number of methods are used to ensure that the data or code sections of processes are not disturbed by others. These methods are:
- atomic operations
- spinlocks
- semaphore
- reader writer locks
These locks and mechanisms are in the kernel space. Other methods or locking mechanisms are used in the user space.
atomic operations
The idea behind atomic operations is to perform very basic changes on a variable which, because they are so small, cannot be interfered with by other processes. For this, a special data type is used, called atomic_t.
On this data type, a number of atomic operations can be performed:
function | description |
---|---|
atomic_read(atomic_t *v) | read the variable |
atomic_set(atomic_t *v, int i) | set the variable to i |
atomic_add(int i, atomic_t *v) | add i to the variable |
atomic_sub(int i, atomic_t *v) | subtract i from the variable |
atomic_sub_and_test(int i, atomic_t *v) | subtract i from the variable, return true if the result is 0, else false |
atomic_inc(atomic_t *v) | increment the variable |
atomic_inc_and_test(atomic_t *v) | increment the variable, return true if the result is 0, else false |
atomic_dec(atomic_t *v) | decrement the variable |
atomic_dec_and_test(atomic_t *v) | decrement the variable, return true if the result is 0, else false |
atomic_add_negative(int i, atomic_t *v) | add i to the variable, return true if the result is negative, else false |
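For comparison, java.util.concurrent.atomic (mentioned in an earlier entry) offers very similar operations; the following rough correspondence is my own mapping, not something from the kernel documentation:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(0); // roughly: atomic_t v; atomic_set(&v, 0)
        v.incrementAndGet();                    // atomic_inc(&v)
        v.addAndGet(5);                         // atomic_add(5, &v)
        // similar in spirit to atomic_sub_and_test(6, &v)
        boolean reachedZero = (v.addAndGet(-6) == 0);
        System.out.println(v.get() + " " + reachedZero);
    }
}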
Note that I discussed in another post the local variables for CPUs.
spinlocks
This kind of lock is used the most, above all to protect sections from access by other processes for short periods.
While waiting for a spinlock, the kernel continuously checks whether the lock can be taken. This is an example of busy waiting.
spinlocks are used in the following way:
spinlock_t lock = SPIN_LOCK_UNLOCKED;
...
spin_lock(&lock);
/** critical operations */
spin_unlock(&lock);
Due to the busy waiting, if the lock is not released... the computer may freeze; therefore spinlocks should not be held for long periods.
semaphores
Unlike with spinlocks, the kernel sleeps while waiting for the release of a semaphore. Consequently, this kind of structure should only be used for locks which are held for a certain length of time; for short locks, spinlocks are recommended.
DECLARE_MUTEX(mutex);
....
down(&mutex);
/** critical section*/
up(&mutex);
The waiting processes then sleep in an uninterruptible state while waiting for the release of the lock: the process cannot be woken up by signals during its sleep.
There are alternatives to the down(&mutex) operation:
- down_interruptible(&mutex): the process can be woken up by signals
- down_trylock(&mutex): tries to take the lock without sleeping; if it cannot be taken, the call returns immediately instead of blocking
For the user space, there are also futexes.... But this is another story.
reader writer locks
With this kind of lock, several processors can read the locked data structure concurrently, but when the structure is to be written, it can only be manipulated by one processor at a time.
Monday, 1 September 2008
GIT tutorial
1/ cloning a repository:
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
2/ pulling new code:
> git pull
3/ throwing away local code changes:
> git checkout -f
4/ committing the modifications:
> git commit -a
5/ undoing the last commits (note this is different from a revert, which consists of a patch reverting some other patch):
> git reset HEAD~2
6/ listing branches:
> git branch
7/ creating a branch:
> git checkout -b my-new-branch-name master
8/ choosing a branch and making it the current one:
> git checkout branch
9/ telling which branch is current:
> git status
10/ merging code into a branch mybranch:
> git checkout mybranch
> git merge anotherbranch
Friday, 29 August 2008
Linux Kernel Mailing List FAQ
Linux Kernel: Copy on Write
Copy on Write is an important technique in Linux kernel memory management. The basic idea is to prevent the creation of unnecessary copies of structures when creating new processes. When a child process is created, the kernel first provides only read access to the pages of both parent and child. When one of the processes needs write access, at the moment it tries to write data to memory, a page fault is raised, indicating that a new copy should be created for the asking process. Thus, the data is actually copied only when a process actually tries to write.
Some information is of course available on Wikipedia on the Copy on Write page. Note that there is an acronym COW for Copy on Write.
Wednesday, 27 August 2008
Memory Management Documentation in Linux Kernel
Wednesday, 23 July 2008
Customer orientation
- Information level (what do we know about customers, which information is collected, how is it organized, and how can it be used in processes: marketing, production...)
- Customer level
- quality of products,
- of services,
- flexibility of service delivery,
- qualification of salespeople, as well as their
- flexibility,
- reliability and
- friendliness,
- treatment of
- sales
- complaints
- interaction between customers and employees
- personal interaction with customer
- knowledge of the customer (needs, expectation, desires)
- check customer satisfaction
- problem solving suggestions
- customer oriented organization
- customer oriented employees
Friday, 18 July 2008
Online C++ Documentation
C library
A number of headers are present in the standard C++ library, for example:
- asserts
- types
- errors
- floats
- Standard definitions
- localization
- limits of standard types
- maths
- C I/O library functions
- C strings
- C time
- C standard libraries
- jump routines to preserve calling properties and environment
- and handling of variable arguments
I/O Streams
There are a number of headers for the C++ stream operations, as well as a number of classes behind all these headers. The main reference page has a nice picture summarizing the relationships between these classes.
string header
The string header is useful for representing character sequences. It contains a number of operations, which are described on a special page for the string operation topics. A nice aspect is also that these pages have examples, so it is quite easy to learn how to use them correctly.
Standard Template Libraries
One of the nice aspects of this site was, for example, the page on the Standard Template Library. It presents the complexities of the different operations on the diverse containers which can be used with these structures, dividing the containers into three groups: sequence containers, container adapters and associative containers.
<algorithm>
But there is also one header I was not aware of: the algorithm header, which provides a number of standard algorithms designed to work on ranges of elements. For example, it contains search, merge, sorting, heap and min/max algorithms, as well as operations to modify sequences and structures (swapping, copying, partitioning).
Supplementary headers
There are 4 other headers which do not belong to the above categories.
Friday, 11 July 2008
Spanning Tree Protocol
Wednesday, 9 July 2008
Personal Projects
- Web 2.0 gallery
- data mining web site
- data mining experiments
- data warehousing
- information extraction tool
- ontology learning
- linux kernel learning
- inference engine technology
- fca implementation
- database implementation aspects
Thursday, 3 July 2008
Open Data Sources - presented by the UK Government
Tuesday, 1 July 2008
Lguest - simple virtualization mechanism
Weka Online
It seems that there is now the possibility to do data mining online using Weka. A company, CEO delegates, has built a web site where arff files can be uploaded and the corresponding data mining algorithms chosen.
This was actually partly one of the things I planned to do. But there are still lots of additional things which can be done, so let's see what happens.
Sunday, 29 June 2008
Git-bisect instructions
After reading a little bit of a long thread on testing bugs in the Linux kernel, I found a small HOWTO for running bisects of the Linux kernel.
I write it down again here to make sure it is easier to find:
# install git
# clone Linus' tree:
git clone \
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
# start bisecting:
cd linux-2.6
git bisect start
git bisect bad v2.6.21
git bisect good v2.6.20
cp /path/to/.config .
# start a round
make oldconfig
make
# install kernel, check whether it's good or bad, then:
git bisect [bad|good]
# start next round
After about 10-15 reboots you'll have found the guilty commit
("... is first bad commit").
More information on git bisecting:
man git-bisect
$ git bisect start v1.3 v1.1 -- # v1.3 is bad, v1.1 is good
$ git bisect run ~/test.sh
For more information on this a good explanation is found at this page.
Moreover, there is also the possibility to restrict the search to commits which changed a given part of the repository. For example, you may know that the bug is located in certain subdirectories. Then you can specify these directories:
$ git bisect start -- mydirectory1 mydirectory2/directory3
Friday, 27 June 2008
Ketchup - Linux Kernel Automatic Patcher
Thursday, 26 June 2008
Learning Method
- 2.4 Improving the access of secondary storage
- 2.4.1 I/O model of computation
- 2.4.2 Sorting data on secondary storage
- 2.4.3 Merge Sort
- 2.4.4 Two-phase, multiway merge sort
- 2.4.5 Extension of multiway merging for larger relations
Friday, 20 June 2008
Combining Data Mining Models
- Bagging
- Bagging with costs
- Randomization
- Boosting
- Additive regression
- Additive logistic regression
- Option trees
- Logistic model trees
- Stacking
- Error correcting output codes
Bagging
The principle of bagging is to create a number of models for a training set, and for a specific instance to use the class returned most frequently across these models. In other words, it is important that the different models return the same set of possible output classes.
Bagging with costs
This extension of the bagging approach uses a cost model. It is particularly useful when the predictions made by the models used in the bagging come with probabilities telling how likely each prediction is to be correct.
Randomization
The idea here is to introduce some kind of randomization in the model creation in order to create different models. Depending on the stability of the process, a certain class can then be chosen as prediction.
Boosting
Similarly to the bagging approach, boosting creates models as a kind of cascade: each model is built with the purpose of better classifying the instances which were not suitably classified by the previous models. This is a type of forward stagewise additive modelling.
Additive regression
Additive regression is also a kind of forward stagewise additive modelling, suitable for numeric prediction with regression. Here again, the principle is to use a series of regressions, each trying to better predict the elements which were predicted incorrectly before.
Additive logistic regression
This type of regression is an adaptation of the previous combination approach, but for logistic regression.
Option trees
I still have to describe this but the concept is quite simple
Logistic model trees
I still have to describe this but the concept is quite simple
Stacking
The purpose of stacking is to combine different types of models which might not have the same labels. In order to achieve this, a meta-learner is introduced: a number of models are built for the data, and then a meta-learner, i.e. a model which decides based on the learning output of other learners, is created in order to classify and adapt to all the models from the first phase.
Error correcting output codes
I still have to describe this but the concept is quite simple
Of course, all these mechanisms have been implemented in Weka.
Thursday, 19 June 2008
FishEye - Visualisation of Subversion
Tuesday, 17 June 2008
Transaction Models
- Flat transactions (supported by J2EE)
- Nested Transactions (supported by J2EE)
- Chained Transactions
- Sagas
Flat transactions
Flat transactions consist of a series of operations which should be performed as one unit of work, and which succeeds or fails as a whole. The reasons causing a transaction to abort may be, for instance:
- some invalid parameters have been given to the components
- some system state which should not have changed was violated
- hardware or software failure
Nested transactions
Contrary to flat transactions, nested transactions allow units of work consisting of other atomic units of work. Some of the embedded units of work may roll back without forcing the entire transaction to roll back; in that way, failed subtransactions can be performed again. Note that subtransactions of transactions may themselves be nested transactions.
Chained transactions
In the chained transaction model, a new transaction is started automatically for a program when the previous one has been committed or aborted. This is not implemented by J2EE.
Long Running Transactions - Sagas
Sagas, or long-running transactions, are transactions which may take a long time to finish, and may last for days, either because they wait for an external event or because they wait for the decision of one of the actors involved in the transaction. This type of transaction usually occurs in a web service context (see here). This is not implemented by J2EE.
Wednesday, 11 June 2008
Kernel KnowHow: Per CPU Variables
I just read the section on per CPU variables from the Linux Device Drivers book from O'Reilly (TODO link).
The use of per-CPU variables helps improve performance by limiting cache sharing between CPUs. The idea is that each CPU has its own private instance of a given variable.
A variable can be defined using the following macro: DEFINE_PER_CPU(type,name);
It is important to note that a kind of locking mechanism is still needed for the CPU variables in case:
- the process is moved to another CPU
- the process is preempted in the middle of a modification of the variable
Therefore, each instance should be updated using a simple mechanism to lock the variable during the changes. This can be done by using the get_cpu_var() and put_cpu_var() functions, for example:
get_cpu_var(variabletoupdate)++;
put_cpu_var(variabletoupdate);
It is also possible to access the variable values from other processors using: per_cpu(variable, int cpu_id); but a locking mechanism must be implemented for these cases.
Memory allocation for dynamically allocated per-CPU variables is performed using:
void *alloc_percpu(type);
void *__alloc_percpu(size_t size , size_t align);
Deallocation is achieved using: free_percpu().
Accessing a dynamically allocated per-CPU variable is done using per_cpu_ptr(void *per_cpu_var, int cpu_id), which returns a pointer to the variable's content. For example:
int cpu;
cpu = get_cpu();
ptr = per_cpu_ptr(per_cpu_var, cpu);
put_cpu();
Note the final put_cpu() call, which releases the CPU again.
Tuesday, 3 June 2008
Kernel Index page from LWN.net
Saturday, 31 May 2008
Alsa library musings
I have taken a look at the alsa library tonight.
There seems to be a few fun things one can do. I started to use the small test program: pcm.c from the alsa-lib source directory.
I looked at the code; I have not changed anything yet. The program is small, but it shows many aspects of how to interact with PCM streams and drivers.
So one thing I did was to use the small program to play a little scale, with something like:
for a in 1 1.25 1.33 1.5 1.66 1.75 2; do
./pcm -f `echo "$a * 440" | bc` &
pid=$!
sleep 1
kill $pid
done
Of course, the numbers 1 1.25 1.33 1.5 1.66 1.75 2 are approximations of some of the notes of a scale. I have been too lazy to look for the exact ratios.
This little test is really not much, and it certainly does not make clear how to use the alsa library functions, but it's fun. Nevertheless, after looking at the code, I have the feeling that I am going to have much fun playing with sound in the future.
The possibility of combining both my musical theory musings together with my programming is quite a wonderful feeling.
Some of the ideas I have:
- find the harmonic series of the different instruments and implement a small tool to be able to play some instrument
- implement some composition mechanisms:
- arpeggios
- Travis picking
- combinations of instruments
- combine with language ?
Friday, 30 May 2008
SMP and NUMA architectures
I have had a look at the SMP and NUMA architectures which are for multiprocessor systems.
I found a number of interesting resources:
SMP means Symmetric Multiprocessing, while NUMA stands for Non-Uniform Memory Access. Actually, SMP is a type of UMA, i.e. Uniform Memory Access. Basically, SMP processors all share the same memory, which imposes some overhead on the processor caches used to speed up processing. NUMA architectures (of which different flavors exist, in particular cache-coherent models, aka ccNUMA) have non-uniform access to memory: some processors have access to local memory. In that way, the processes do not always need to synchronize to access their data.
Thursday, 29 May 2008
Mail Spam Report and Strategies
Wednesday, 28 May 2008
Time Management - Decision Making Techniques
Well... Right now my time management is not so bad as long as I don't have to do anything. But it is still important to remind oneself of the different time management techniques. So I think this blog should be a good way of making sure of this. At least, I will always have this to go back to.
One of the things I always have to keep an eye on is certainly the decision making process. For this, a few techniques exist (I have looked this up more than once... but the following site should be suitable):
- pareto analysis
- six thinking hats
- grid analysis
- cost-benefit analysis
- decision trees
- force field analysis
- paired comparison analysis
- pmi
Monday, 26 May 2008
Kernel Makefile
In this post, I sum up the main Makefile parameters and targets. Right now it mainly corresponds to $ make help, but I might edit this entry to add useful information.
First of all, if you are in the directory of the kernel sources, use:
$ make help
or, if you are not in that directory:
$ make -C <path-to-kernel-sources> help
This outputs a list of possible targets and supplementary information:
-
a few variables can be set
-
ARCH=um ... for the corresponding architecture
V=1 > means verbose build
V=2 > gives reason for rebuild of target
O=dir > is the output directory of the build including .config file
C=1 or C=2 > checking (resp force check) of c sources
-
Documentation
- make [htmldocs|mandocs|pdfdocs|psdocs|xmldocs] ->>> build the corresponding docs
-
Packages
- make rpm-pkg > build source and binary RPM packages
- make binrpm-pkg > build binary RPM packages
- make deb-pkg > build Debian packages
- make tar-pkg > build an uncompressed tarball
- make targz-pkg > build a gzip-compressed tarball
- make tarbz2-pkg > build a bzip2-compressed tarball
-
Cleaning targets:
- clean - Remove most generated files but keep the config and enough build support to build external modules
- mrproper - Remove all generated files + config + various backup files
- distclean - mrproper + remove editor backup and patch files
-
Kernel Configuration targets:
- config - Update current config utilising a line-oriented program
- menuconfig - Update current config utilising a menu based program
- xconfig - Update current config utilising a QT based front-end
- gconfig - Update current config utilising a GTK based front-end
- oldconfig - Update current config utilising a provided .config as base
- silentoldconfig - Same as oldconfig, but quietly
- randconfig - New config with random answer to all options
- defconfig - New config with default answer to all options
- allmodconfig - New config selecting modules when possible
- allyesconfig - New config where all options are accepted with yes
- allnoconfig - New config where all options are answered with no
-
Other useful targets:
- prepare - Set up for building external modules
- all - Build all targets marked with [*]
- * vmlinux - Build the bare kernel
- * modules - Build all modules
- modules_install - Install all modules to INSTALL_MOD_PATH (default: /)
- dir/ - Build all files in dir and below
- dir/file.[ois] - Build specified target only
- dir/file.ko - Build module including final link
Note that there are also some little targets about tags for editors, but I am not so sure what they really bring.
Linux Standard Base
Yes, I like standards... Standards are great... Of course you should not exaggerate, but a standard base is a good thing for Linux.
So take a look at the Linux Standard Base (aka LSB). It is a specification of the rules all Linux distributions should respect.
For example, it specifies the executable and linking format, ELF, as well as a number of useful libraries: libc, libm, libpthread, libgcc_s, librt, libcrypt, libpam and libdl. Some utility libraries are also specified: libz, libncurses, libutil.
It also specifies a number of command-line commands (see the standard on this subject).
Linux Config Archive
Sunday, 25 May 2008
init scripts
For a few things I am interested in doing, I wanted to have a small script run as soon as I boot up. Perhaps it would be more appropriate to use atd or cron for this, but I wanted to understand how the initscript system works.
So I prepared a small script in order to start some system tools as soon as the boot process is finished.
For example, a little tool starting a remote process when I first boot, which would allow me to use some remote processing facilities, e.g. a (focused) crawler. This could also be a service starting before/after the httpd daemon is up.
For this I took a look at the /etc/init.d/postgres script.
#!/bin/sh
# newscript    This is the init script for starting up the newscript
#              service
#
# chkconfig: - 99 99
# description: Starts and stops the newscript that handles \
#              all newscript requests.
# processname: mynewscript
# pidfile: /var/run/mynewscript.pid
# Version 0.1 myname
# Added code to start newscript
case "$1" in
start)
    mynewscript &
    echo $! > /var/run/mynewscript.pid
    ;;
stop)
    kill `cat /var/run/mynewscript.pid`
    ;;
esac
Note the use of chkconfig: - 99 99 .
This should be adapted with more useful priorities, basically 99 means that the initscript is started as one of the last scripts. Taking a look at $man chkconfig should prove useful.
The new script stores the pid of the newscript application in /var/run/mynewscript.pid
Note that it also stores things in /var/lock/subsys/.
Maintainers File in Kernel and SCM tools for the kernel
I have just had a look at the maintainers file in the Linux kernel tree.
I have noticed that there are a number of orphaned projects. The question is whether any of these orphaned projects really needs to be taken care of.
Another interesting thing was to learn about the different scm and patching tools used in kernel development: git, quilt and hg.
Here is an interesting overview of the reason for the development of git and quilt.
I really start to like the patch approach, and the article linked above gives a good idea of the reasons to use this approach. I should try to summarise in a future post the advantages and disadvantages of the different source code management approaches.
Kernel Stuff
I did some little things with kernel programming (or rather, compiling) these days.
Part of what I did was compiling kernels, because I wanted to try UML (User Mode Linux).
So that's what I did:
- Download the kernel configs from: http://uml.nagafix.co.uk/kernels/.
- Download kernels from kernel.org.
- untar the kernels to some directory
- cd into the main directory of the kernel
- copy the config of the kernels into main directory as .config file
- $ make ARCH=um oldconfig
- answered the necessary questions as well as I could
- $ make ARCH=um
At that point some errors appeared, so I tried to correct them.
- to help me in the debugging process I used $ make V=1 ARCH=um
- when something did not build correctly, I used the gcc output to invoke the compiler directly. For example, sometimes the architecture include files were not found, so I used -Iinclude; sometimes a preprocessor macro was not set correctly, so I used -D__someprecompilermarks__. At some point I removed a problematic definition by using this together with an #ifndef in the header file: $ gcc ..... -Iinclude -D__someprecompilermarks__ ...
- then I also downloaded a few kernel repositories using git, though I still need to perfect this.
- I read (or skipped/read) quite a few Documentation files from the kernel or from the internet.
- I familiarised myself with the git web interface, together with having a kernel RSS feed in my Thunderbird.
And all this in a day and a half, together with other things.
Saturday, 24 May 2008
Fedora and chkconfig
Sunday, 18 May 2008
Namei and RPM
As I tend to forget the syntax of rpm's query format pretty often, I had a short look on the internet. I found this web page, which explains some aspects of the use of the rpm -q command.
I discovered a command which I did not know and which might turn out quite useful: namei. It follows symbolic links until their end point is found.
Example:
$> namei /usr/java/latest/
f: /usr/java/latest/
d /
d usr
d java
l latest -> /usr/java/jdk1.6.0_03
d /
d usr
d java
d jdk1.6.0_03
Friday, 9 May 2008
AjaxMP - a small Music server
I noticed that the mpd server must be configured differently if you want to make it available to other nodes in the network.
For this:
- install mpd
- set password and port for mpd
- add music to the mpd directory
- extract the tar from ajaxmp in a php enabled web server
- copy config.php.dist to config.php
- edit the mpd parameters: host, port and password
- copy user.txt.example to user.txt
- add for each person needing access a username and a password separated by tabs
Apache Problem With IP resolution
One of the Apache mod_security rules seems to require that the Host header of a request not be a numeric IP address.
I commented the rule out... but I should check whether there is no better solution.
To find the rule posing the problem I looked at the logs in: /etc/httpd/logs/error_log.
There was a line:
[Fri May 09 02:09:51 2008] [error] [client xxx.xxx.x.xxx] ModSecurity: Access denied with code 400 (phase 2). Pattern match "^[\\\\d\\\\.]+$" at REQUEST_HEADERS:Host. [id "960017"] [msg "Host header is a numeric IP address"] [severity "CRITICAL"] [hostname "xxx.xxx.x.xxx"] [uri "/ajaxmp"] [unique_id "BFUWMX8AAAEAAA8ewlgAAAAC"]
I then did a grep:
$> grep 960017 /etc/httpd/modsecurity.d/*.conf
/etc/httpd/modsecurity.d/modsecurity_crs_21_protocol_anomalies.conf:SecRule REQUEST_HEADERS:Host "^[\d\.]+$" "deny,log,auditlog,status:400,msg:'Host header is a numeric IP address', severity:'2',id:'960017'"
I had found the rule causing the problem and commented it out. I hope there is a better solution, perhaps a better rule???
Thursday, 17 April 2008
Ubuntu Printer Canon IP1700 Pixma using IP1800 driver
Friday, 14 March 2008
java problem with browser
Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/java/jre1.6.0_03/lib/i386/libdeploy.so: libstdc++.so.5: cannot open shared object file: No such file or directory
I installed the compatibility libraries (after calling yum provides libstdc++.so.5). Once installed, Java applets were working just fine.
Friday, 7 March 2008
Ubuntu Installation
Yesterday, I installed a Ubuntu system on the laptop of a friend of mine, whose windows got bugged by a virus and who could not boot any more.
My friend's laptop is an Acer TravelMate 2420. There seem to be a number of good experiences with this laptop under Linux, so I went ahead and tried the installation.
Moving ext3 impossible
The installation went without real problems, apart from the fact that, as I tried to update Ubuntu, there was not enough space on the hard disk (well, there may have been in the end if I had cleaned the cache of the previous updates; I will have to take a look at this). Once I noticed that, I tried to resize the root partition. To do this, I resized the home partition, which was no problem. But as I tried to move the ext3 partition, I noticed that it did not work. So I had to reinstall the system another time.
Printer
The other problem I had was installing the printer. She has a Canon PIXMA iP1700, and there is no direct driver for this printer. It seems that another driver, the iP2200 driver, works. Yet I haven't tried whether it does; I wanted to make sure that it does not ruin her printer.
DVD Burner
I will also have to make sure her DVD burner works without problem.
Antivirus
She also asked me to install an antivirus which updates itself. I used for this the antivirus which is accessible from Ubuntu (I think it is ClamAV). And she tried to install another one from the internet, Antivir, but I do not know if that works.
Friday, 22 February 2008
User Mode Linux - Mounting File Systems
- hostfs,
- humfs.
- Make the directory and cd to it
host% mkdir humfs-mount
host% cd humfs-mount
- the directory hierarchy which should be available to the User Mode Linux instance is in a subdirectory called "data". It is possible to use an existing UML filesystem for this; information on this is found at the previously cited source.
- As root, run the humfsify utility to convert this directory to the format needed by the UML humfs filesystem:
host# humfsify user group 4G
Thursday, 21 February 2008
Kernel - User Mode Linux
host% make defconfig ARCH=um
host% # now make menuconfig or xconfig if desired
host% make menuconfig ARCH=um
host% make ARCH=um
Note the importance of the ARCH=um parametrisation. The menuconfig target is useful for further configuration of your kernel.
Kernel Programming - WORKQUEUE Problems
Install firefox plugins only once for all users
Ohloh - an open source software network
Quality Management
ISO 9001
DIN ISO 9001 is an important norm for quality management. It comes from production companies and remains somewhat generic. The purpose of this norm is to define how quality should be taken care of.
Important for ISO 9001 is that quality is defined based on the results of the processes and not on their implementation. For example, it requires that systems be documented, but does not specify how this documentation should be implemented. A number of quality management norms have been integrated into ISO 9001, and many other systems are getting closer to the norm.
This norm defines that the quality management responsibilities lie directly with the directing managers.
ISO 20 000
ISO 20 000 defines a norm and is very similar to ITIL. A company can be certified for this norm through a series of audits.
Six Sigma
Six Sigma is a very strict way of checking the quality of the processes of an enterprise. For this, there is a central instance, as well as many other mechanisms to check the quality of the whole system in all processes.
ITIL Terminology
ITIL uses its own terminology to discuss the quality management of a company.
Wednesday, 20 February 2008
Firefox extensions
- chrome
  - content
    - extensionname.js
    - extensionname.xul
    - overlay.xul
  - locale
    - en-US
      - extensionname.dtd
      - extensionname.properties
      - overlay.dtd
  - skins
    - en-US
      - content
- components
- defaults
  - preferences
- chrome.manifest
- install.rdf
JGroups - Reliable Multicasting
Friday, 11 January 2008
Spring Aspect Oriented Programming
- the weaving process of the Spring framework is performed at runtime.
- the Spring framework uses proxies to implement the aspect orientation. These proxies come in two flavors: either a JDK dynamic proxy (?) or a CGLIB proxy (see the web site of CGLIB for this).
- no field pointcuts (use AspectJ for this)
- Spring AOP does not implement complete AOP, but mainly the most important parts of AOP needed for the Inversion of Control approach of the Spring framework for enterprise applications
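To get a feeling for the proxy mechanism, here is a minimal sketch of a plain JDK dynamic proxy wrapping advice around method calls. This is only the underlying JDK building block, not Spring's actual API; the Greeter interface is made up for illustration:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Greeter { String greet(String name); }

public class ProxyDemo {
    public static void main(String[] args) {
        final Greeter target = new Greeter() {
            public String greet(String name) { return "Hello " + name; }
        };
        // the "aspect": advice wrapped around every method call on the proxy
        Greeter proxy = (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(),
            new Class<?>[] { Greeter.class },
            new InvocationHandler() {
                public Object invoke(Object p, Method m, Object[] a) throws Throwable {
                    System.out.println("before " + m.getName());
                    Object result = m.invoke(target, a); // delegate to the real object
                    System.out.println("after " + m.getName());
                    return result;
                }
            });
        System.out.println(proxy.greet("world"));
    }
}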
Differences between JUnit 3.8 and JUnit 4
Assertions: one now writes import static org.junit.Assert.*; in the import section of the test file. Moreover, from JUnit 4.0 on, there are also methods to compare arrays of objects.
Initialisation and cleaning: initialisation and cleaning used to be performed with the setUp() and tearDown() methods. In version 4.0 this no longer works, since the class does not extend TestCase. Instead, new annotations are used: @Before and @After. Note that there are also the annotations @BeforeClass and @AfterClass, which mark methods called once before the tests of the class and once after all the tests have been performed.
Tests: tests are annotated using the @Test annotation, must return void and may not have any parameters. These properties are checked at runtime, and exceptions are issued if these rules are not respected.
Ignoring tests: it is possible to ignore a given test by using the @Ignore annotation before or after the @Test annotation.
Performing tests: one runs tests using the following call:
$ java -ea org.junit.runner.JUnitCore TestClass
where TestClass is the fully qualified Java name of the test class.
Timeouts: it is also possible to use a timeout parameter for the test methods.
Parametrised tests: it is also possible to apply the same test with different parameters. For this, the annotation @Parameters may be used together with the class annotation @RunWith(Parameterized.class).
Suites: like in the preceding version, there is also the possibility to use suites of tests, using the annotations @RunWith(Suite.class) and @Suite.SuiteClasses({FirstTestClass.class, SecondTestClass.class}).
The article I used to write this entry states the lack of support in IDEs for the new JUnit 4.0 version, but I suppose that this has changed in the latest versions of Eclipse.
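Putting the annotations together, a minimal JUnit 4 test class might look like this (class and method names are made up for illustration):

import static org.junit.Assert.assertEquals;

import org.junit.After;
import org.junit.Before;
import org.junit.Ignore;
import org.junit.Test;

public class CalculatorTest {
    private int[] data;

    @Before  // replaces setUp(): runs before every test method
    public void init() { data = new int[] {1, 2, 3}; }

    @After   // replaces tearDown(): runs after every test method
    public void cleanup() { data = null; }

    @Test
    public void sumIsComputed() {
        assertEquals(6, data[0] + data[1] + data[2]);
    }

    @Ignore("not written yet")
    @Test
    public void somethingElse() { }
}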