Des Profundis...: January 2008

Friday 11 January 2008

Spring Aspect Oriented Programming

The Spring framework has many features for aspect orientation. I do not define here the usual concepts of aspect oriented programming: join points, cut points, advice, weaving... A few important aspects:

the weaving process of the Spring framework is performed at runtime.
the Spring framework uses proxies to implement the aspect orientation. These proxies come in two flavors, either a JDK dynamic Proxy (?) or a CGLIB proxy (see the web site of CGLIB for this).
no field point cut (use aspectJ for this)
the spring AOP does not implement the complete AOP, but mainly the most important aspects of AOP needed for the Inversion of Control approach of the spring framework for enterprise applications

Differences between JUnit 3.8 and JUnit 4

There is a number of changes between these two versions of JUnit. For backward compability. Old Test method should still work I believe. I will sum up which are the features which changed. package names: version 3.8 used the junit.framework packages whereas 4.0 uses org.junit packages. Inheritance: the Test classes do not have to inherit from TestCase anymore. However, the new JUnit approach makes a stronger use of annotations. Assert changes: In order to use asserts with JUnit, you can either use the Assert.assertEqual(...) ( or similar methods) or import staticly the Assert class ( from Java 1.5 on) using:

import static org.junit.Assert.*;

in the import section of your test file.

Moreover, from JUnit 4.0 on there are also  method to compare arrays of objects.

Initialisation and cleaning:

The initialisation and cleaning were performed using setUp() and tearDown() methods. In version 4.0,
it is not possible anymore since the class does not extend TestCase. To solve this, new Annotations are
used: @Before and @After.

Note that there are also the annotations: @BeforeClass and @AfterClass which are the methods which
are called before loading the class for test and after all the tests have been performed.

Tests

Tests are annotated using the @Test annotation and must also return void and may not have any parameters.
These properties are checked at runtime and issue exceptions if these rules are not respected.

Ignoring Tests
It is possible to ignore a given test by using the @Ignore annotation before or after the @Test annotation


Performing Test

One performs test using the following call:
$ java –ea org.junit.runner.JUnitCore 

where  is the java complete name of the test class.

Timeouts

It is also possible to use a Timeout parameter for the test methods

Parametrised Tests

It is also possible to apply the same test but with different parameters. For this, the annotation
@Parameters may be used together with the class annotation @RunWith(Parameterized.class)

Suite

Like for the preceding version, there is also the possibility to use Suites of Tests. For this the
annotations @RunWith(Suite.class) and @Suite.SuiteClasses({FirstTestClass.class, SecondTestClass.class})

The article I used to write this entry states the lack of support in IDE for the new JUnit 4.0 version. But I suppose
that this changed in the latest versions of eclipse.

Maven

Once I had been using ant, and then I heard about Maven. But what am I supposed to think about Maven. To use maven, we first need to create the basic skeleton of the content of the project. This is performed using in the parent directory of the future project directory: $ mvn archetype:create -DgroupId=org.jaycode.jdream -DartifactId=jdream We can now enter the project directory: $ cd jdream It creates diverse files and directories, in particular a pom.xml, which is a file describing the project and main application and test files. From there we can start coding. But a number of task can be performed (here an extract from the maven tutorial):

validate : validate the project is correct and all necessary information is available
compile : compile the source code of the project
test : test the compiled source code using a suitable unit testing framework. These tests should not require the code be packaged or deployed
package : take the compiled code and package it in its distributable format, such as a JAR.
integration-test : process and deploy the package if necessary into an environment where integration tests can be run
verify : run any checks to verify the package is valid and meets quality criteria
install : install the package into the local repository, for use as a dependency in other projects locally
deploy : done in an integration or release environment, copies the final package to the remote repository for sharing with other developers and projects.

Another interesting feature is the possibility to adapt the pom file for eclipse:

$ mvn eclipse:eclipse

which creates the .classpath and .project files necessary for an eclipse project. One of the very interesting feature of maven is the possibility of creating template projects to simplify the creation of whole projects. It allows for example the possibility to go in the direction of rails some how.

MBeans and JMX

I am not completely convinced from what I read about MBeans and JMX. The idea is that (application) resources are managed by MBeans. This MBeans have the necessary means for sending and receiving notifications. This level is the instrumentation level. These MBeans are registered in the MBean server ( a kind of container object) as well as certain Agent services. This is the agent level, and the server or the agents are called JMX agents. A third level, the distributed service level, corresponds to the middleware that connect applications and the JMX agents. This level is composed of protocol adapters which allow the management of the agents by translating protocols and connectors which are kinds of proxies to the agents which can be manipulated by applications. Metadata provided by resources A number of pieces of informations are needed by the JMX architecture to manage resources.

attributes about the state of the resource
constructors to create instances of the resource
operations which can be performed on the resources
parameters necessary for constructors and operations
notifications which are emitted by the resource

MBeans There are different types of MBeans:

Standard MBeans (they implement a very simple interface)
Dynamic MBeans (the meta data is provided separately from the resource)
Model MBeans (Model MBeans are dynamic MBeans with supplementary model information)
Open MBeans (another type of MBean extending the OpenType class allowing extremely complex MBeans)

JMX notifications Though most JMX agents work by collecting and querying the information they are supposed to manage, means of broadcasting changes are sometimes needed. For this, a broadcasting mechanism exists where agents are separated in two roles: notification broadcasters and notification listeners. the MBean Server The MBean server is a registry through which agents access the MBeans.

Behavioral Pattern

Gamma et al have also described a number of behavioral patterns:

Chain of responsability (a request goes through an ordered chain of potential Workers, the appropriate worker determines that he has to treat the request, in that way the requester does not need to know any thing about the receiver of the request)
Command (a request is wrapped in to an object in order to allow the parametrisation, the queueing, logging or cancellation of the request)
Interpreter ()
Iterator (a mean of accessing sequentially the elements of a aggregate object without revealing its internal structure)
Mediator (an object which encapsulate the way different object interact with each others)
Memento (method to capture and externalise the structure of an object to restore it later)
Observer (an object offers the possibility of registering a number of observers which are notified as soon as some state is reached)
State (the behavior of an object is dependant on the internal state of the object )
Strategy ( define a number of interchangeable algorithms which can be used for a variety of clients and situations)
Template Method ( defines a super class which provides a skeleton of the different steps which have to be performed in an algorithm, the subclasses of this superclass implement specifically these parts)
Visitor (define operation which have to be performed on some objects of a structure and store the code for this operation in a separate object)

Structural Patterns

We now turn to the Structural patterns proposed by Gamma et al.

Adapter (a class or interface created in order to adapt an object for another implementation)
Bridge (the abstraction is separated from the implementation of a classso that they can evolve independantly)
Composite (structures are organised in tree structures in order to model part-whole relationships but still offer a somewhat similar mean of interaction)
Decorator (supplementary functionalities are attached to an object dynamically)
Facade (provide a higher level interface thus simplifying its use)
Flyweight (in order to support a large amount of fine-grained objects resources are shared)
Proxy (an object serving as surrogate for another object in order to simplify the control of the original object)

Creational Patterns

The Gamma book cites five creational patterns:

Abstract factory (This provides an abstract way to ensure a common method of creation of objects, through an interface that all the factories have to implement)
Builder (Creates an object which is specialisedin the construction of a complex from its representation)
Factory Method (define an abstract method for creating object but let subclasses decide of the details of the implementation and instantiation)
Prototype (objects are first created using a prototype instance which is copied in order to create new instances of this type)
Singleton (a class may have only one instance, and some method to get this instance is clearly defined and unambiguous)

Design Patterns

Since I have read the Gamma et al. book on design patterns, I tend to see most of the programming techniques as a set of design patterns which one combines with one another. There are many useful design patterns and the categorisation given in the Gamma book is at least useful to get an overview of these relevant design patterns:

creational patterns
structural patterns
behavioral patterns

The description of each of the design patterns follows a simple structure:

Intent (succint description of the intent of the design pattern)
Other Names (other names which have been given to this pattern)
Motivation (gives a description of the type of problem for which this pattern can be used)
Applicability (gives a summary of the main guidelines when to use the pattern)
Structure (gives a UML description of the overall structure of the pattern)
Participants ( a description of the main elements or classes used in the structure)
Collaboration (describes how the elements are used in cooperation with the others)
Consequences (the main consequences of the use of the pattern)
Implementation (a description of how to implement the pattern in a sensible way)
Sample Code ( some sample code of the use and of the implementation)
Known Uses (some example where this pattern has been used effectively)
Related Patterns ( explains the relation this pattern has to other patterns)

Thursday 10 January 2008

Java NIO - Channels

Channels are the other most important part of the NIO API.

A channel is a communication channel, that is a pipe of data.

The Channel interface has two methods: isOpen() and close() ( the latter may throws an IOException). Most Channels are specified using Interfaces. For example, there is the InterruptibleChannel interface which is a marker interface specifying how the channel behaves when a thread accessing it is interrupted.

There are different types of Channels.

ReadableByteChannel and WritableByteChannel (Interfaces which define how to read from or write in a channel using a ByteBuffer)
GatheringByteChannel and ScatteringByteChannel (interfaces defining methods to read from or write to a channel using an array of ByteBuffers)
InterruptibleChannel ( interface defining the behaviour of the channel when the thread using it is interrupted)
FileChannel (class for the interaction with a Channel based on a File)
SocketChannel and ServerSocketChannel, and DatagramChannel ( classes for the communication based on sockets)

Channels cannot be opened directly. The method used to create a Channel object depends on the type of Channels. For example FileChannel are opened using Streams or RandomAccessFiles.

ChannelsUtilityClass There is a Utility Class called java.nio.channels.Channels, which has a number of utility methods to convert between the old and the new API. There are two Channel creating methods: newChannel(InputStream) and newChannel(OutputStream) which return a ReadableByteChannel and a WritableByteChannel respectively, and there are the newInputStream(ReadableByteChannel ch) and newOutputStream(WriteableByteChannel ch) which return InputStream and OutputStream respectively. The same kind of methods exist also for Readers and Writers: newReader and newWriter methods return the appropriate classes.

FileChannel FileChannel have different facets. Using the utility class Channels we can open FileChannel instances. Note that Channels opened from InputStreams are only readable while Channels from OutputStreams are only writable.

A number of methods are important for FileChannels. The force() methods writes to the disk the data which is actually cached ( for performance reasons). The truncate() methods gives a way to cut off all the data which is further than a given data. The methods position() and positions(long positions) are methods to decide at which part of the file the Channel should go.

Pipe Another important class is the Pipe class in general it is a conduit through which data can only go in one direction. Each pipe has a SourceChannel and a SinkChannel. The information written in the SinkChannel (a WritableChannel) is then transmitted to the SourceChannel (a ReadableChannel).

MemoryMappedFiles This type of Buffer and Channels are a far reaching topic so we treat them separately. SocketChannels SocketChannels are also a separate topic so we treat them also separately. Locks The topics of locks will also be covered in its own entry.

Java NIO - Buffers

We will start the explanation of the Java NIO main classes by taking a look at buffers.

Buffers

The Buffers contain a fixed amount of data (obtained using the capacity() method). They have a number of useful methods, for example: mark, reset, clear, flip and rewind. mark remenbers a position to which on can come back by using reset.

Access methods for the buffers are the get() and put() methods (these can be absolute or relative, i.e either one specifies the position in the buffer to set or retrieve or one gets or retrieve the current element).

There are one type of Buffer for each of the main non Boolean primitive types ( plus one called the MappedByteBuffer used when using memory-mapped files). All these Buffer types are abstract but possess a factory for that type of Buffer in order to create one. Different methods exist for example in the case of a CharBuffer: allocate( int capacity), or wrap(char[] array ) which return CharBuffers either of the given capacity or containing the elements found in the array. Note: the changes to the underlying array will be seen in the buffer and vice versa. This type of buffer have a non direct access. In other words they have an array used to contain the data. using the method hasArray() allows you to check whether this Buffer is non direct or not.

A special case has to be explained that direct buffer are actually of type ByteBuffer and are created using the allocateDirect(in capacity) method. The ByteBuffer class also provides the possibility to create View Buffers. This allows the possibility of obtaining instances of other types of Buffers, e.g CharBuffer or IntBuffer without taking care of the problem of conversion. This is performed for example using the asCharBuffer() method.

Many methods of the Buffer classes return also buffer references. Thus you can perform sequence of actions on the buffer in a simple way. This is called invocation chaining.

Other methods can also create views on other buffers. These methods allow for example the duplication of a buffer (using the duplicate() method), its view as a read only buffer (using the asReadOnlyBuffer() method), or a view of only part of this buffer (using the slice()) method).

Java NIO - Intro

Here is an entry on the topic of Java NIO in order to keep in mind the main aspects of these API. First of all, Java NIO stands for Java New I/O. There are a number of useful classes in this new API:

buffers (they represent fixed size arrays of primitive data elements wrapped with supplementary state information)
Channels (a model for a communiation connection)
File locking and memory mapped files (provides classes for monitoring locking of files as well as the possibility of mapping files to memory thus improving the process)
sockets (these classes provies new methods of interacting with network sockets)
selectors (this allows the possibility to watch the status of different channels to monitor them more easily
regular expressions (Perl like regular expressions for processing of Java)
character sets (the mappings between characters and byte streams)

Models for Enterprise Architectures

According to Wikipedia articles there exist a number of models for Enterprise Architectures.

The Open Group Architecture Framework (TOGAF)
Zachman Framework (IBM Framework from the 1980s, now claimed as a de facto world standard)
Department of Defence Architectural Framework
Federal Enterprise Architecture Framework (United States Office of Management and Budget Federal Enterprise Architecture)
United Kingdom Ministry of Defence Architectural Framework
AGATE (French Délégation Générale pour l'Armement Atelier de Gestion de l'ArchiTEcture des systèmes d'information et de communication)

I think taking a look at these different modelling approaches may prove helpful in order to discuss more easily company structures as well as technical aspects.

GIT

The original developer of Linux has built another versioning system in order to help his linux kernel maintenance job. This tool is git. The Karlsruher Linux User Group has a presentation summing up the features of this distributed versioning system which I am going to use. Git repositories seem also to be smaller in size than their equivalent repositories under CVS or SVN. The presentation gives an example for gcc, which is 450 MB with git while 11GB using SVN. One of the important feature is that this system is distributed. Only few commands are needed, and help is available: > git command -h gives a short help while > git command --help gives a manual page Initialisation is performed using the following commands: > git config --global user.name "Your Name" > git config --global user.email your@e.mail Initalisation of a new repository is performed using: > git init in the location of the repository (i.e. > cd intherepositorydirectory beforehand) To clone an existing repository: > git clone giturl [dir] where giturl is the url of the repository (with http, git, ssh, user@host:repository, dir are acceptable as protocols). See git clone --help for the possibilities. Optimisations are even possibles using hardlinks and softlinks or reuse existing informations. To prepare commits, useful commands: status and diff. I don't need to explain why they are here for. To add elements to the repository, the add command is necessary which also adds directories recursively. changes are added with git-add. Commits are performed using the command: > git commit -m ’message’ Of course, it is possible to tell which files should be commited. This is done by using -- filenames at the end of the commandline. Also take a look at git commit -a -m ’message’. There is also a menu controlled used. for this use: > git commit --interactive -m ’message’ To check the code out, use: > git checkout [-m] -b newbranch [head] you can also list the branches, using git branch, or change branch: > git checkout branch or delete branches: git branch -d name. There is also the possibility to add remote repositories using: > git remote add name giturl where name is the name of the remote repository and giturl the url of the remote repository. you can fetch the changes using: > git fetch [name] [refspec] where name is the name of the remote repository and refspec is the mapping: +srcbranch:dstbranch you can merge changes using > git pull [name] [refspec] There is also > git merge but its use is infrequent. Other features are the publication of changes using mails or using the git server. There are also commands for the maintenance: > git fsck makes concistency checks on the repository and > git gc compresses the repository. Some other features exist like: signed tags, find bugs semi automatically, ...

Monday 7 January 2008

Google Web Toolkit

The Google Web Toolkit (GWT) is a framework used to develop Webapplications. It works by transforming the Java in Javascript. Moreover, GWT has the following additional features:

Interfaces for RPC Calls
Integration with JUnit
Lots of Widgets for GUI Design (similar to Swing in the way of its use

It seems that GWT uses mainly Java to develop the application and theat therefore the typical Java development tools can be reused for the purpose

AJAX

Thsi is a beginning of a long serie on the technologies which make AJAX. The basic principle of AJAX is making Web pages more responsive by allowing more complex interaction with the servers without requiring the reloading of a page. The name AJAX means asynchronous Javascript with XML. Two main important applications:

load pages more quickly through an asynchron loading of the page(i.e the whole page is not loaded immediately).
depending on the actions of the user, appropriate data can be loaded without requiring the reloading of the page

Most applications require for this task the use of XMLHttpRequest Objects.

StAX- Streaming API for XML in Java

A number a way exist to access XML Data from Java. The DOM and SAX API are the two most traditional ones. A new alternatives under the form of the Streaming API for XML. This explanation uses the informations found in this tutorial.

The principle behing this API it to give the programmer the control over the next element of the file to be treated. This simplifies things greatly when there are many elements.

There are mainly two sets of APIs with the StaX API: cursor and iterator APIs.

Cursor API The cursor walks the documents one infoset element at a time and always forward. For this there are two interfaces: XMLStreamReader and XMLStreamWriter. These interfaces are very similar to the SAX handlers.

Iterator API

The Iterator API works on the idea, that the XML documents can be seen as a sequence of Events. Each of these Events are of a given type: StartDocument, StartElement, Attribute, ....

Differences. First of all, there are some things that are possible with the Iterator API which are not possible with the Cursor API. However, with the cursor API the code tends to be smaller and more efficient.

For processing events pipelines or modifiying the event stream use the iterator API.

In general, it is preferable to use the Iterator API, since this API is more easily extensible, thus it probably requires less refactoring for future versions of the applications.

Use of StaX APIs

The use of the StaX APIs is performed using the XMLInputFactory, XMLOutputFactory and XMLEventFactory classes. These factory classes are configured using the setProperty() method.

Sunday 6 January 2008

JAXB

Another interesting Java technology pushed by Sun is JAXB, it stands for "Java Architecture XML Binding". Most of the information found here comes from Sun's JAXB tutorial. It is a means of coding Java Object into XML using the descriptive power of the XML Schemas. It is possible to generate Java Object from XML descriptions using a Schema. To simplify the process, it is also possible to use JAXB annotations of Java classe in order to generate the schemas needed for such schemas. So the important steps to consider when using this is:

write the classes of the objects which should be stored as XML files, or DOM nodes, etc...
annotate them so that the corresonding schema can be generated
generate content according to schema
unmarshal the content (i.e load into content tree in Java)
validate if required
marshall, i.e export again as XML

This page gives a list of (XML)Schema-Java correspondance tags. As specified, the binding can be generated either using annotations inside the Java files, or it is also possible to use external configuration files outside these files.

Data Mining

This is the beginning of a series of post on the topics of Data Mining and Knowledge Discovery or KDD. There exists a large amount of resources on these topics. I plan to draw a picture of the possible open source solutions for Data Mining. I also want to recall which are the main applications of Data Mining. Open Source Systems The main open source systems for Data Mining are:

Pentaho (used to be called WEKA)
Rapid Miner (used to be called YALE, some parts of the systems were using Weka)
Orange

for Text Mining, it is certainly important to add: the GATE system. I will review each of these systems. Supervised and unsupervised tasks One of the main distinction for algorithms and problems in Data Mining is whether the task to be performed is supervised or unsupervised. An algorithm is supervised if the goal of the algorithm is to create a model which is appropriate to categorise data according to an existing model. On the contrary, an unsupervised task consists in building a model which is appropriate to describe the data. For example, suppose we have a list of topics and news items which should be categorised in this list of topic. This is a typical supervised task. If we had no topics and were supposed to group the news items together according to some similarity criteria. This would be a unsupervised task. N.B: This entry does not really please me yet and is subject to change.

Des Profundis...