Research - Documentation

  
  

Introduction

TunedIT is an integrated system for sharing, evaluation and comparison of machine-learning (ML) and data-mining (DM) algorithms. Its aim is to help researchers and users evaluate learning methods in reproducible experiments and to enable valid comparison of different algorithms. TunedIT also serves as a place where researchers may share their implementations and datasets with others.

Motivation

Designing a new machine-learning or data-mining algorithm is a challenging task. The algorithm cannot be claimed valuable unless its performance is verified experimentally on real-world datasets. Experiments must be repeatable, so that other researchers can validate the results. Unfortunately, in ML/DM repeatability is very hard to achieve. Reproducing someone else's experiments is a highly complex, time-consuming and error-prone task. In the end, when the final results turn out to be different than expected, it is completely unclear whether the difference invalidates the claims of the original author of the experiment, or should rather be attributed to:

  1. implementation bugs of the new experiment,
  2. mistakes in data preparation or experimental procedure,
  3. non-deterministic behaviour of the algorithm, producing different results in every run, or
  4. seemingly irrelevant differences between the original and new implementation, e.g., the use of slightly different data types.

Usually, it is not possible to resolve this issue. The problem lies in the nature of ML/DM algorithms...

In classical algorithmics, algorithms either work correctly or not at all. They cannot be "partially" correct. Correctness is a binary feature: either the algorithm satisfies the required specification or it does not. And if not, it is always possible to find a single "counterexample" or "witness" - a particular combination of input values - which proves the incorrectness of the algorithm. For instance, if we implemented quicksort and the implementation contained a bug, we could notice for some particular input data that the output is incorrect, since the generated sequence would be improperly sorted.

In ML/DM, there is no such thing as an "incorrect algorithm". An algorithm can be "more" or "less" correct, but never "incorrect". We all assume as a basic axiom of ML/DM that algorithms may occasionally make mistakes. It is simply impossible to design an ML/DM algorithm that is always right when tested on real-world data. Thus, wrong answers for some input samples do not invalidate the whole algorithm. If we had a classifier trained to recognize hand-written digits and passed it an image of "7" but the answer was "1", we would not start looking for implementation bugs, but rather presume that the input pattern was vague or atypical.

If the algorithm is always "correct", we do not have any indications of implementation bugs. There are no "witnesses" that would clearly prove incorrectness and point in the direction where bugs are hidden. Even if the experimenter presumes that something is wrong, he has no clues as to where to start the investigation. For these reasons, reimplementing and reproducing someone else's experiments is practically impossible. The researcher can never be sure whether the experiment is reproduced correctly, with all important details done in the same way as originally, without any implementation bugs or mistakes in the experimental procedure.

If experiments are not repeatable, verification of experimental results is impossible, so it is easier to design a new algorithm than to verify the results of an existing one. In consequence, there are thousands of competing algorithms for every type of ML/DM problem, but no general consensus over their actual quality or the strengths and weaknesses of each of them. This makes the quest for better algorithms difficult, if not blind. A cogent illustration of these paradoxes can be found in Empiricism Is Not a Matter of Faith (Pedersen, 2008).

Here comes TunedIT. With its creation we want to give the ML/DM community tools that will help conduct reproducible research and obtain meaningful results, leading to the formulation of generally accepted conclusions.

We want to make experiments fully repeatable through their automation with TunedTester - automation going side by side with flexibility and extensibility, provided by the plug-in architecture of TunedTester and its ability to handle entirely new evaluation procedures, designed for new types of tasks and algorithms.

We are creating a collaboration environment for researchers, where a general consensus over the performance of different algorithms can arise. The central point of this environment is the Knowledge Base (KB), where all researchers can submit experimental results and together build a rich and comprehensive database of performance profiles of different algorithms. The results stored in KB are repeatable and verifiable by everyone. KB is coupled with a public Repository of ML/DM resources: algorithms, datasets, evaluation methods and others. Repository secures interpretability of the results collected in KB and fosters exchange of data, implementations and ideas among researchers.

Finally, with the development of these tools, we want to facilitate the design of even more advanced and effective algorithms, able to solve numerous practical problems unsolvable today.

TunedIT builds upon previous efforts of the scientific community to facilitate experimentation and collaboration in ML&DM. In particular, it employs and extends the ideas that lie at the basis of:

  • ExpDB: Experiment Databases for Machine Learning,
  • MLOSS: Machine Learning Open Source Software,
  • DELVE: software environment for evaluation of learning algorithms in valid experiments.

TunedIT combines the strengths of these systems to deliver a comprehensive, extensible and easy-to-use platform for ML&DM research.

Architecture

TunedIT platform is composed of three complementary tools:

  • TunedTester: a stand-alone application for automated evaluation of algorithms.
  • Repository: a database of ML&DM resources. These include algorithms, datasets and evaluation procedures, which can be used by TunedTester to set up and execute experiments.
  • Knowledge Base: a database of test results. On the user's request, TunedTester may send test results to TunedIT. There, results submitted by different researchers are merged into a rich and comprehensive Knowledge Base that can be easily browsed for accurate and thorough information on specific algorithms or datasets.


TunedIT = Repository + TunedTester + Knowledge Base

Repository

Repository is a database of ML&DM-related files - resources. It is located on the TunedIT server and is accessible to all registered users - they can view and download resources, as well as upload new ones. The role of Repository in TunedIT is three-fold:

  • It serves as a collection of algorithms, datasets and evaluation procedures that can be downloaded by TunedTester and used in tests.
  • It provides space where users can share ML&DM resources with each other.
  • It constitutes a context and point of reference for the interpretation of results generated by TunedTester and logged in Knowledge Base. For instance, when you are browsing KB and viewing results for a given test specification, you can easily navigate to the corresponding resources in Repository and check their contents, so as to validate research hypotheses or come up with new ones. Thus, Repository is not only a convenient tool that facilitates execution of tests and sharing of resources, but - most of all - it secures interpretability of the results collected in Knowledge Base.

Repository has a structure similar to a local file system. It contains a hierarchy of folders, which in turn contain files - resources. Upon registration, every user is assigned a home folder in Repository's root folder, with its name being the same as the user's login. The user has full access to his home folder, where he can upload/delete files, create subfolders and manage access rights for resources. All resources uploaded by users have unique names (access paths in Repository) and can be used in TunedTester exactly in the same way as preexisting resources.

Access rights

Every file or folder in Repository is either public (by default) or private. All users can view and download public resources. Private files are visible only to the owner; to other users they appear as if they did not exist - they cannot be viewed or downloaded and their results do not show up on the KB page. Private folders cannot be viewed by other users, although subfolders and files contained in them can be viewed by others, provided they are public themselves. In other words, the property of being private does not propagate from a folder to the files and subfolders contained inside.

TunedTester

TunedTester (TT) is a Java application for automated evaluation of algorithms, according to a test specification provided by the user. A single run of evaluation is called a test or experiment and corresponds to a triple of resources from Repository:

  • Algorithm is the subject of evaluation.
  • Dataset represents an instance of a data mining problem to be solved by the algorithm.
  • Evaluation procedure is a Java class that implements all steps of the experiment and, at the end, calculates a quality measure.

Evaluation procedure is not hard-wired into TunedTester but is a part of the test configuration, just like the algorithm and dataset themselves. Every user can implement new evaluation procedures to handle new kinds of algorithms, data types, quality measures or data mining tasks. In this way, TunedTester provides not only full automation of experiments, but also a high level of flexibility and extensibility.

TT runs locally on the user's computer. All resources necessary to set up a test are automatically downloaded from Repository. If requested, TT can submit test results to Knowledge Base. They can be analysed later on with the convenient web interface of KB.

Resources for TunedTester

All TunedIT resources are either files, like UCI/hepatitis.arff, or Java classes contained in JAR files, like

Weka/weka-3.6.1.jar:weka.classifiers.lazy.IB1

Typically, datasets take the form of files, while evaluation procedures and algorithms take the form of Java classes. For datasets and algorithms this is not a strict rule, though.

To be executable by TunedTester, an evaluation procedure must be a subclass of

org.tunedit.core.EvaluationProcedure

located in the TunedIT/core.jar file in Repository. TunedIT/core.jar also contains the ResourceLoader and StandardLoader classes, which can be used by the evaluation procedure to communicate with the TunedTester environment and read the algorithm and dataset files. It is up to the evaluation procedure how the contents of these files are interpreted: as bytecode of Java classes, as a text file, as an ARFF, CSV or ZIP file, etc. Thus, different evaluation procedures may expect different file formats, and not every evaluation procedure must be compatible with a given algorithm or dataset. This is natural, because usually the incompatibility of file formats is just a reflection of a more inherent incompatibility of resource types. There are many different types of algorithms - for classification, regression, feature selection, clustering - and datasets - time series, images, graphs, etc. - and each of them must be evaluated differently anyway. Nonetheless, it is also possible for an evaluation procedure to support several different formats at the same time.

Data file formats and algorithm APIs that are most commonly used in TunedIT and are supported by standard evaluation procedures include:

  • ARFF file format for data representation. Introduced by Weka, it became one of the most popular in ML community.
  • Debellor's API defined by org.debellor.core.Cell class for implementation of algorithms.
  • Weka's API defined by weka.classifiers.Classifier class.
  • Rseslib's API defined by rseslib.processing.classification.Classifier interface.

It is also possible for a dataset to be represented by a Java class that exposes methods which return data samples on request. This is a way to overcome the problem of custom file formats. If a given dataset is stored in an atypical file format, one can put it into a JAR file as a Java resource and prepare a wrapper class that reads the data and returns samples in a common representation, for example as instances of Debellor's Sample class. This wrapper approach was used to give access to the MNIST database of hand-written digits, which is originally stored in a custom binary representation. See some results of classification accuracy on MNIST 10K collected in KB.
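As a rough sketch of the resource-reading part of such a wrapper, the class below loads a hypothetical binary dataset bundled in its own JAR. The record layout ([int label][n double features]) is invented purely for illustration, and the conversion of the decoded records into Debellor Sample objects is omitted, since it depends on Debellor's data API:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: reads a custom binary dataset bundled in the same JAR
 * as a Java resource. Converting the records into Debellor Sample objects is
 * left out, as it depends on the data API of the chosen evaluation procedure.
 */
public class CustomFormatReader {

	/** Reads records of the (invented) form [int label][numFeatures doubles]. */
	public static List<double[]> readAll(String resourceName, int numFeatures) throws IOException {
		InputStream raw = CustomFormatReader.class.getResourceAsStream(resourceName);
		if (raw == null)
			throw new IOException("Resource not found in JAR: " + resourceName);

		// Buffer the whole resource, then decode it record by record.
		ByteArrayOutputStream buffer = new ByteArrayOutputStream();
		byte[] chunk = new byte[8192];
		int read;
		while ((read = raw.read(chunk)) != -1)
			buffer.write(chunk, 0, read);
		raw.close();

		DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
		List<double[]> records = new ArrayList<double[]>();
		while (in.available() > 0) {
			double[] record = new double[numFeatures + 1];
			record[0] = in.readInt();            // class label
			for (int i = 1; i <= numFeatures; i++)
				record[i] = in.readDouble();     // feature values
			records.add(record);
		}
		return records;
	}
}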

Test specification

Test specification is a formal description for TunedTester of how the test should be set up. It is a combination of three identifiers (TunedIT resource names) of the resources representing the evaluation procedure, algorithm and dataset that will be employed in the test:

Test specification = Evaluation procedure + Algorithm + Dataset

The TunedIT resource name is the full access path to the resource in Repository, as it appears on the Repository page. It does not include a leading slash "/". For example, the name of the file containing the Iris data and located in the UCI folder is:

UCI/iris.arff

Java classes contained in JARs are also treated as resources, although they do not show up on Repository pages. The TunedIT name of a Java class is composed of the containing JAR's name, followed by a colon ":" and the full (with package) name of the class. For instance, the ClassificationTT70 class contained in TunedIT/base/ClassificationTT70.jar and the org.tunedit.base package has the following name:

TunedIT/base/ClassificationTT70.jar:org.tunedit.base.ClassificationTT70

Note that resource names are case-sensitive.

Many algorithms expose parameters that can be set by the user to control and modify their behavior. Currently, a test specification does not include parameter values, so it is expected that the algorithm will apply default values. If the user wants to test an algorithm with non-default parameters, he should write a wrapper class which internally invokes the algorithm with parameters set to the desired non-default values. The values must be hard-wired in the wrapper class, so that the wrapper itself does not expose any parameters, as in the sketch below.
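For illustration, a minimal wrapper of this kind around Weka's J48 decision tree might look as follows; the class name and the chosen confidence factor are only illustrative, and the wrapper relies on the Weka Classifier API quoted later in this document:

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;

/**
 * Illustrative wrapper that evaluates Weka's J48 with a non-default confidence factor.
 * The value is hard-wired, so the wrapper itself exposes no parameters.
 */
public class J48Conf10 extends Classifier {

	private final J48 inner = new J48();

	public void buildClassifier(Instances data) throws Exception {
		inner.setConfidenceFactor(0.10f);	// non-default value (Weka's default is 0.25)
		inner.buildClassifier(data);
	}

	public double classifyInstance(Instance instance) throws Exception {
		return inner.classifyInstance(instance);
	}
}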

Sandbox

Users of TunedTester may safely execute tests of any algorithms present in Repository, even if the code cannot be fully trusted. TunedTester exploits advanced features of the Java Security Architecture to ensure that the code executed during tests does not perform any harmful operations, like deleting files on disk or connecting through the network. Code downloaded from Repository executes in a sandbox which blocks the code's ability to interact with the system environment. This is achieved through the use of a dedicated Java class loader and custom security policies. Similar mechanisms are used in web browsers to protect the system from potentially malicious applets found on websites.

Local cache

Communication between TunedTester and the TunedIT server is efficient thanks to the cache directory, which keeps local copies of resources from Repository. When a resource is needed for the first time and must be downloaded from the server, its copy is saved in the cache. In subsequent tests, when the resource is needed again, the copy is used instead. In this way, resources are downloaded from Repository only once. TunedTester detects if a resource has been updated in Repository and downloads the newest version in such a case. Also, any changes introduced to the local copies of resources are detected, so it is not possible to run a test with corrupted or intentionally faked resources.

Challenge mode

TunedTester may be started in a special challenge mode, used to evaluate solutions submitted to a challenge. In this mode, TT repeatedly queries TunedIT for new submissions, then downloads and evaluates them. It runs as a background process and does not require user interaction. Challenge mode is activated by passing the challenge name as an argument to the command-line option --challenge (-c) when starting TT. The user must be the organizer of the challenge and must provide his TunedIT username and password using the --user (-u) and --password (-p) options to authenticate himself. If authentication fails, TT will have no access to the challenge resources.

In challenge mode, GUI is not available and TT reports its current operations to the console.
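For illustration, assuming a challenge named ExampleChallenge (a purely hypothetical name) organized by the user John_Smith from the examples further below, TT could be started in challenge mode roughly as follows; additional options may be needed depending on the setup:

./tunedtester.sh -c ExampleChallenge -u John_Smith -p pass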

It is possible to run more than one instance of TT for a given challenge in parallel. This is particularly useful when evaluation of a single solution is time-consuming, e.g., lasts more than an hour. With parallel execution, the queue of pending tests becomes shorter.

Different instances of TT running in parallel are independent of each other. The organizer may start a new instance or stop a given one at any time. Job scheduling is coordinated by TunedIT server. The instances may run on the same machine or on different ones. When running several instances on a single machine, take into account that sharing of hardware resources (CPU time, memory limit) may lead to variable evaluation conditions for different tests.

Knowledge Base

Knowledge Base (KB) is a database of test results generated by TunedTester. It is located on TunedIT server.

To guarantee that the results collected in KB are always consistent with the contents of Repository, and that Repository can indeed serve as a context for interpretation of results, KB is automatically cleaned of all outdated results related to the old version of a resource whenever a new version of that resource is uploaded. Thus, there is no way to insert results into KB that are inconsistent with the contents of Repository.

Aggregated vs atomic results

An atomic result is the result of a single test executed by TunedTester. It is possible to execute many tests with the same specification and log all their results in KB. Thus, there can be many atomic results in KB which correspond to the same specification. Note that usually these results will differ from each other, because most tests include nondeterministic factors. For instance, the ClassificationTT70 and RegressionTT70 evaluation procedures split data randomly into training and test parts, which yields different splits in every trial and usually results in different outcomes of the tests. Algorithms may also employ randomness. For example, neural networks perform random initialization of weights at the beginning of learning.

An aggregated result is the aggregation (arithmetic mean, standard deviation, etc.) of all atomic results from KB related to a given test specification. There can be only one aggregated result for a given specification. Aggregated results are the ones presented on the Knowledge Base page. Currently, users of TunedIT do not have direct access to atomic results.

If tests of a given specification are fully deterministic, they will always produce the same outcome, and thus the aggregated result (mean) will be the same as all atomic results, with a standard deviation equal to zero. The presence of nondeterminism in tests is highly desirable, as it allows broader knowledge about the tested algorithm to be obtained (a non-zero deviation measures how reliably and repeatably the algorithm behaves) and gives a more reliable estimation of the expected quality of the algorithm (a mean of multiple atomic results which differ from each other).
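As a simple illustration of this aggregation, the mean and standard deviation of a set of hypothetical atomic results could be computed as follows (whether KB uses the population or the sample variant of standard deviation is not specified here, so the population formula is used for illustration):

public class AggregateExample {

	public static void main(String[] args) {
		// Hypothetical atomic results of several tests with the same specification.
		double[] atomic = {0.94, 0.92, 0.95, 0.93, 0.96};

		double sum = 0.0;
		for (double r : atomic) sum += r;
		double mean = sum / atomic.length;			// corresponds to "Mean Result"

		double sqDiff = 0.0;
		for (double r : atomic) sqDiff += (r - mean) * (r - mean);
		double stdDev = Math.sqrt(sqDiff / atomic.length);	// corresponds to "Std Dev"

		System.out.printf("Support = %d, Mean = %.4f, Std Dev = %.4f%n",
				atomic.length, mean, stdDev);
	}
}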

Security issues. Validity of results

The user may assume that results generated by others and collected in KB are valid, in the sense that if he ran the same tests himself he would obtain the same expected results. In other words, results in KB can be trusted even if their authors - unknown users of TunedIT - cannot be trusted. This is possible thanks to numerous security measures built into Repository, TunedTester and KB, which ensure that the KB contents cannot be polluted either by accidental mistakes or by intentional fakery of any user.

User's Guide

Repository page

Every file and folder in Repository may have an associated description, visible on Repository page of a given resource. Description can be modified only by the owner.

TunedTester

TunedTester can be downloaded at this page.

TunedTester runs tests of algorithms according to test specifications given by you. A specification is composed of the TunedIT resource names of an evaluation procedure, a dataset and an algorithm that should be used to set up the test. It is possible to give several test specifications at once, by listing a number of datasets and/or algorithms in the text areas of the TunedTester window. In such a case, TunedTester will run tests for all possible combinations of the given items.

In order to download the necessary resources from Repository or send results to Knowledge Base, you must be authenticated by the TunedIT server. For this purpose, you must enter your username and password in the TunedTester window before starting test execution.

TunedTester creates a cache folder in the local file system to keep copies of resources downloaded from Repository. This folder may become large at some point and require manual cleaning. To do this, simply remove the folder with all its contents - it will be automatically recreated, empty, upon the next execution of TunedTester. The cache folder is named tunedit-cache and is located in the user's home directory.

Knowledge Base page

The KB page shows aggregated results of tests collected in KB. In the Filters section you can specify which results you want to view, by defining a pattern that must be matched by the test specifications of the results. The pattern is built as a conjunction of patterns for each part of the test specification: the name of the algorithm, dataset and evaluation procedure. An empty pattern will match all possible names. After the filters are defined, press "Show Results" to download the matching results from the TunedIT server. Please be patient, this operation may take a couple of seconds. Once downloaded, the results are presented in the Results section, where you can manipulate them and change the way they are presented without downloading them again.

Important: the exact meaning of "Mean Result" depends on which evaluation procedure was used. The result value can be interpreted either as a gain or a loss, so for some evaluation procedures it is the bigger value that indicates higher quality of the algorithm, while for others it is the lower one. For instance, ClassificationTT70 measures the classification accuracy of an algorithm, interpreted as a gain, while RegressionTT70 calculates the Root Mean Squared Error (RMSE), interpreted as a loss. These differences must be taken into account when analysing test results. In order to find out how the results should be interpreted for a given evaluation procedure, it is best to go to its Repository page and read the description.

Filters

If "exact match" check box is on, pattern matching is case-sensitive. Please watch carefully for the case of letters.

Chart

The chart presents mean results of all algorithms that were tested on the selected dataset using the selected evaluation procedure. You can choose another dataset and evaluation procedure using drop-down lists located above the chart. If you place the mouse over a bar on the chart, a tooltip will show up in the upper-left corner of the window, displaying detailed information about the selected test.

Raw Results

Meaning of columns of the result table:

  • Evaluation procedure, Dataset, Algorithm: specification of tests whose aggregated result is presented in a given row.
  • Support: number of all atomic results stored in KB for a given test specification and contributing to the presented aggregated result. Only the tests which were correctly completed (without error) are counted.
  • Mean Result: aggregated result - arithmetic average of the atomic results.
  • Std Dev: aggregated result - standard deviation of the atomic results.

Names of evaluation procedures, algorithms and datasets are hyperlinks which lead to Repository pages of the resources, so you may click the name and see all the details of a given resource.

You can sort result tables by any column, in ascending or descending order, by clicking on the header of a chosen column.

You can download the results as CSV files for off-line analysis, by clicking on the [download as CSV] link located right above the result table.

Examples

In the following examples we assume that there is a user 'John_Smith' registered in TunedIT and that his password is 'pass'.

Example 1 - run test with TunedTester

The screenshot below shows how to evaluate the J48 algorithm (decision tree induction) from Weka with TunedTester. Here we use TunedTester's default evaluation procedure, ClassificationTT70. The test will be repeated 5 times on each of two datasets, audiology.arff and iris.arff from UCI. Remember that in TunedTester you must give the full names of algorithms, evaluation procedures and datasets, including their paths in Repository, as well as full package names for Java classes.
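For reference, the resource names for this test would be along the following lines (the Weka JAR version is taken from an earlier example and the location of audiology.arff in the UCI folder is assumed; both may differ from the current contents of Repository):

Evaluation procedure: TunedIT/base/ClassificationTT70.jar:org.tunedit.base.ClassificationTT70
Algorithm: Weka/weka-3.6.1.jar:weka.classifiers.trees.J48
Datasets: UCI/audiology.arff and UCI/iris.arff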

Example 2 - a classification algorithm suitable for ClassificationTT70 evaluation procedure

TunedTester's default evaluation procedure is ClassificationTT70 (samples are randomly shuffled before splitting, with a 70/30 ratio, into train and test sets). It was designed to support three kinds of classifier interfaces - those defined in the Debellor, Rseslib and Weka libraries. We will show how to write an algorithm which may then be evaluated by the ClassificationTT70 procedure.

Debellor interface

Debellor is an open-source, extensible data mining framework which provides a common architecture for data processing algorithms of various types. The algorithms can be combined together to build data processing networks of large complexity. The unique feature of Debellor is data streaming, which enables efficient processing of large volumes of data. Data streaming is essential for the scalability of the algorithms. See www.debellor.org for more details.

We will go through everything step by step, but if you run out of patience, a slightly modified version of the example below is available and ready for evaluation in the Repository - see MajorityClassifier_debellor.jar in the Examples folder.

To be able to compile our Debellor-based classifier, we will need a copy of the library [Debellor library download link]

Let's quote some important fragments of Debellor's Cell class:

package org.debellor.core;

(...)

/**
 * Guidelines for writing new cells
 * 
 * To implement new data processing algorithm, you have to write a subclass of Cell and override some or all of protected 
 * methods named "on...": onLearn(), onOpen(), onNext(), onClose(), onErase(). They are called during calls to similarly 
 * named public methods (learn, open, ...). If you do not need some method, leave its default implementation, which 
 * will throw exception when called. 
 * 
 * If your cell represents a decision system (classifier, clusterer etc.), the most important methods will be onLearn() 
 * and onNext(). Training algorithm of the decision system will be implemented in onLearn(), while onNext() will perform 
 * application of the trained system to the next input sample. You will also have to override onOpen() and onClose() 
 * to open and close input stream before and after calls to onNext(). Optionally, you may also override onErase() to erase 
 * trained decision model without deallocation of the whole cell.
 */

(...)
 
public class Cell {

	(...)	

	/** 
	 * Learning procedure of the cell. For example, may train the internal decision model; read and buffer input data; 
	 * calculate an evaluation measure of another cell; calculate data-driven parameters of a preprocessing algorithm
	 * (e.g. attribute means for normalization algorithm) etc.
	 * 
	 * Must be overridden in all subclasses that implement trainable cells.	If your cell is not trainable, you must 
	 * provide this information to the Cell base class by calling Cell(boolean) instead of Cell() in your constructor.
	 */
	protected void onLearn() throws Exception (...)

	/** 
	 * Called by erase(). Must be overridden in subclasses if erasure is to be used. 
	 */
	protected void onErase() throws Exception (...)

	/** 
	 * Called by open(). Must be overridden in subclasses if open is to be used. 
	 */
	protected MetaSample onOpen() throws Exception (...)

	/** 
	 * Called by Stream.next(). Performs the actual generation of the next output sample. Must be overridden in the 
	 * subclass if next is to be used, i.e. if the subclass should generate some output data.
	 */
	protected Sample onNext() throws Exception (...)

	/** 
	 * Called by Stream.close(). Performs the actual closing of the communication session. Must be overridden 
	 * in subclasses if close is to be used. Usually the overrider will use onClose to release resources,
	 * to let them be garbage-collected.
	 */
	protected void onClose() throws Exception (...)

	(...)	

}

The source code of our simple classifier:

import java.util.*;
import org.debellor.core.*;
import org.debellor.core.data.SymbolicFeature;

/**
 * Example implementation of a majority classifier in Debellor architecture.
 * The classifier assigns always the same decision - most frequent in training data.
 */
public class MajorityClassifier extends Cell {

	private DataType decisionType;
	private SymbolicFeature decision;
	private Stream input;
	
	public MajorityClassifier() {
		super(true);	// yes, this cell is trainable
	}

	protected void onLearn() throws Exception {
		// Open stream of training samples. Check if data type is correct
		Stream input = openInputStream();
		decisionType = input.sampleType.decision;
		if(decisionType.dataClass != SymbolicFeature.class)
			throw new Exception("MajorityClassifier can handle only symbolic decisions");
		
		Map<String, Integer> counts = new HashMap<String,Integer>();
		
		// Scan all training samples and count occurrences of different decisions.
		Sample s;
		while ((s = input.next()) != null) {
			if (s.decision == null) 
				continue;
			SymbolicFeature symb = s.decision.asSymbolicFeature();
			Integer count = counts.get(symb.value);
			if (count == null) 
				count = 0;
			counts.put(symb.value, count + 1);
		}
		input.close();
		
		// Find decision with the biggest count.
		int bestCount = 0;
		String bestDecision = null;
		for (Map.Entry<String, Integer> stats : counts.entrySet()) {
			if (stats.getValue() > bestCount) {
				bestDecision = stats.getKey();
				bestCount = stats.getValue();
			}
		}
		decision = new SymbolicFeature(bestDecision, decisionType);
	}

	protected Sample.SampleType onOpen() throws Exception {
		input = openInputStream();
		return input.sampleType.setDecision(decisionType);
	}
	
	protected Sample onNext() throws Exception {
		Sample s = input.next();
		if(s == null) return null;
		return s.setDecision(decision);
	}

	protected void onClose() throws Exception {
		input.close();
	}

	protected void onErase() throws Exception {
		decisionType = null;
		decision = null;
	}

}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the already downloaded debellor<version>.jar file. Then compile and pack the classifier into the jar archive:

javac -cp debellor<version>.jar MajorityClassifier.java
jar cf DebellorMajorityClassifier.jar MajorityClassifier.class
		

If everything went well, you should have the DebellorMajorityClassifier.jar file in the current directory.

Now you can upload the classifier into the Repository, e.g., into the John_Smith/Classifiers folder.

Evaluate its accuracy on some datasets using the TunedTester GUI - referring to our classifier (if you followed our example) by John_Smith/Classifiers/DebellorMajorityClassifier.jar:MajorityClassifier

Or using command line:

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
	-a John_Smith/Classifiers/DebellorMajorityClassifier.jar:MajorityClassifier

And finally, you can inspect its results in the context of other algorithms' results on the Knowledge Base page.

Rseslib interface

As previously, the prepared source code may be downloaded from the Repository's Examples folder.

To be able to compile a Rseslib-based classifier, we will need a copy of the library from the Repository - [Rseslib library download link]

Let's quote Rseslib's Classifier interface:

package rseslib.processing.classification;

(...)

public interface Classifier {

    /**
     * Assigns a decision to a single test object.
     */
    public abstract double classify(DoubleData dObj) throws PropertyConfigurationException;

    /**
     * Calculates statistics.
     */
    public abstract void calculateStatistics();

    /**
     * Resets statistics.
     */
    public abstract void resetStatistics();

}

We will implement only the constructor and the classify method:

import java.util.Properties;
import rseslib.processing.classification.Classifier;
import rseslib.structure.data.DoubleData;
import rseslib.structure.table.DoubleDataTable;
import rseslib.system.*;
import rseslib.system.progress.Progress;

/**
 * Example implementation of a majority classifier (using Rseslib architecture) which assigns always the same decision - 
 * the most frequent decision in the training set.
 */
 
public class MajorityClassifier extends ConfigurationWithStatistics implements Classifier {

	private double decision;

	public MajorityClassifier(Properties prop, DoubleDataTable trainTable, Progress prog) 
			throws PropertyConfigurationException, InterruptedException {
		super(prop);
		prog.set("Training the majority classifier", 1);
		int[] decDistr = trainTable.getDecisionDistribution();
		int bestDecision = 0;
		for (int dec = 1; dec < decDistr.length; dec++)
			if (decDistr[dec] > decDistr[bestDecision])
				bestDecision = dec;
		decision = trainTable.attributes().nominalDecisionAttribute().globalValueCode(bestDecision);
		prog.step();
	}

	public double classify(DoubleData dObj) {
		return decision;
	}

	/**
	 * Leaving it empty.
	 */
	public void calculateStatistics() {}

	/**
	 * Leaving it empty.
	 */
	public void resetStatistics() {}

}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the already downloaded rseslib<version>.jar file. Then compile and pack the classifier into the jar archive:

javac -cp rseslib<version>.jar MajorityClassifier.java
jar cf RseslibMajorityClassifier.jar MajorityClassifier.class
		

If everything went well, you should have the RseslibMajorityClassifier.jar file in the current directory.

As previously, upload the classifier's JAR file into the Repository's John_Smith/Classifiers folder and evaluate its accuracy on some datasets using either the TunedTester GUI or the command line - referring to the classifier by John_Smith/Classifiers/RseslibMajorityClassifier.jar:MajorityClassifier

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
	-a John_Smith/Classifiers/RseslibMajorityClassifier.jar:MajorityClassifier
		

Weka interface

The prepared source code may be downloaded from the Repository's Examples folder.

To be able to compile a Weka-based classifier, we will need a copy of the library from the Repository - [Weka library download link]

Let's quote some important fragments from Weka's Classifier class:

package weka.classifiers;

(...)

public abstract class Classifier (...) {

	(...)

	/**
	 * Generates a classifier. Must initialize all fields of the classifier that are not being set via options 
	 * (ie. multiple calls of buildClassifier must always lead to the same result).
	 * Must not change the dataset in any way.
	 */
	public abstract void buildClassifier(Instances data) throws Exception;

	/**
	 * Classifies the given test instance. The instance has to belong to a dataset when it's being classified. 
	 * Note that a classifier MUST implement either this or distributionForInstance().
	 */
	public double classifyInstance(Instance instance) throws Exception (...)
	
	/**
	 * Predicts the class memberships for a given instance. If an instance is unclassified, the returned array 
	 * elements must be all zero. If the class is numeric, the array must consist of only one element,
	 * which contains the predicted value. Note that a classifier MUST implement either this or classifyInstance().
	 */
	public double[] distributionForInstance(Instance instance) throws Exception (...)
	
	(...)

} 

We will implement the buildClassifier and classifyInstance methods:

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

/**
 * Example implementation of a majority classifier (using Weka architecture) which assigns always the same decision - 
 * the most frequent decision in the training set.
 */
public class MajorityClassifier extends Classifier {

	private double decision;

	public void buildClassifier(Instances instances) throws Exception {
		int decAttributeIndex = instances.classAttribute().index();
		int[] valuesCounts = instances.attributeStats(decAttributeIndex).nominalCounts;
		// Find the index with the highest count.
		int bestCountIndex = 0;
		for (int decisionIndex = 1; decisionIndex < valuesCounts.length; decisionIndex++)
			if (valuesCounts[decisionIndex] > valuesCounts[bestCountIndex]) 
				bestCountIndex = decisionIndex;
		decision = bestCountIndex;
	}

	public double classifyInstance(Instance instance) throws Exception {
		return decision;
	}

}

Copy the above code and save it as a MajorityClassifier.java file. Place it in the same directory as the already downloaded weka<version>.jar file. Then compile and pack the classifier into the jar archive:

javac -cp weka<version>.jar MajorityClassifier.java
jar cf WekaMajorityClassifier.jar MajorityClassifier.class
		

If everything went well, you should have the WekaMajorityClassifier.jar file in the current directory.

As previously, upload the classifier's JAR file into the Repository's John_Smith/Classifiers folder and evaluate its accuracy on some datasets using either the TunedTester GUI or the command line - referring to the classifier by John_Smith/Classifiers/WekaMajorityClassifier.jar:MajorityClassifier

./tunedtester.sh -g -s -u John_Smith -p pass -d UCI/iris.arff 
	-a John_Smith/Classifiers/WekaMajorityClassifier.jar:MajorityClassifier
		

Example 3 - writing an evaluation procedure

Will appear soon...

Frequently Asked Questions (FAQ)

Q: If I use my private resource in a test and submit result to KB, will other users see the result on KB page?

A: No. To see the result the user must have access rights to all the resources used in a given test: algorithm, dataset and evaluation procedure.

Q: What happens if an error occurs during a test and the "Send results to Knowledge Base" option is checked? Is the error sent to KB? Is it included in the results shown on the KB page?

A: Errors caused by the tested algorithm are sent to KB. Other errors - those caused by the evaluation procedure or the testing environment (like problems with the network connection) - are not. Currently, errors submitted to KB are not included in the results shown on the KB page.

Q: What programming language should I use to implement new algorithms and evaluation procedures?

A: Java.

Q: What API should my algorithm implement to be suitable for TunedTester?

A: It depends on the evaluation procedures that will be used for this algorithm. The preferred way is to use the API of Debellor and implement the algorithm as a subclass of org.debellor.core.Cell. This should be understood by most evaluation procedures, including ClassificationTT70 and RegressionTT70. For regular classification/regression algorithms you can also use the Weka or Rseslib API and implement the algorithm as a subclass of weka.classifiers.Classifier or an implementation of the rseslib.processing.classification.Classifier interface.

Q: In what data format should I save my dataset so that TunedTester can use it?

A: It depends on the evaluation procedures that will be used. Currently, ARFF is the preferred format, and it should be understood by most evaluation procedures, including ClassificationTT70 and RegressionTT70.

Q: I receive OutOfMemory errors when running TunedTester.

A: This may occur if the data used in tests are too large to fit in memory. Try to increase the amount of memory available to TunedTester: edit the tunedtester.bat (on Windows) or tunedtester.sh (on Linux) file and change the parameter value -Xmx256m to something bigger, like -Xmx1024m - this will increase the memory limit from 256 MB to 1 GB. Due to restrictions of the JVM, on 32-bit systems this value cannot exceed 1.5 GB.

See also the discussion forum to view and post questions and answers.

Copyright © 2008-2013 by TunedIT