Decision Tree Applet Design Manual
Overview: This manual provides a brief overview of the architecture and function of the components used to build the decision tree applet. The applet is designed using a layered approach, separating the underlying algorithm from the GUI controls. More detailed information on each class can be found in the appropriate Javadoc HTML file.
Applet Packages: The applet code is divided into four separate Java packages, each of which is described in the table below.

ai.common – Contains classes that provide a variety of general services. Classes in this package can potentially be reused in other applets and applications.
ai.decision.algorithm – Contains classes that support the implementation of the decision tree learning algorithm.
ai.decision.gui – Contains classes that provide GUI components used to control and display output from the decision tree learning algorithm.
ai.decision.applet – Contains the main decision tree applet class.

Table 1 - Applet package descriptions.
Applet Structure: The diagram that follows provides a high-level view of the major applet components and their interactions. Most communication between components is handled through a ComponentManager instance. The ComponentManager acts as a ‘reference repository’ – when one component wants to send a message to another component, the sender asks the ComponentManager for the required reference. Using this structure, GUI and algorithm components do not need to keep multiple references internally, which simplifies the overall applet design.
[Diagram showing the Dataset, Dataset Menu, Decision Tree Algorithm, Decision Tree and Visual Representation of Tree components communicating through the ComponentManager.]
Figure 1 - Applet architecture
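The 'reference repository' pattern described above can be sketched as follows. This is a minimal illustration of the idea, not the applet's actual ComponentManager API – the method names and the use of string keys are assumptions.

```java
// Hypothetical sketch of the ComponentManager 'reference repository' pattern:
// components register once, and senders ask the manager for references instead
// of holding them internally.
import java.util.HashMap;
import java.util.Map;

public class ComponentManager {
    private final Map<String, Object> components = new HashMap<>();

    // Each component registers itself once at startup.
    public void register(String name, Object component) {
        components.put(name, component);
    }

    // A sender looks up the receiver when it needs to send a message.
    public Object lookup(String name) {
        return components.get(name);
    }
}
```

Because every lookup goes through the manager, a GUI component and an algorithm component never need direct references to each other, which is what simplifies the overall applet design.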
Package Details: The sections that follow provide more detailed information on important classes contained in each of the applet packages.
Package ai.common: The ai.common package contains a series of general utility classes that can potentially be reused in other applets and applications.
AlgorithmFramework Class: AlgorithmFramework encapsulates functionality allowing an algorithm to run in a separate thread. The class implements a series of synchronized methods to control the state of an algorithm (for example, whether it is started or stopped). Additionally, the class maintains a reference to the current ‘run mode’, which is one of NORMAL_MODE, TRACE_MODE, or BREAK_MODE.
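A minimal sketch of this kind of thread-safe state control is shown below. The run-mode constant names come from the manual; the class and method names here are illustrative assumptions, not AlgorithmFramework's actual interface.

```java
// Sketch of synchronized algorithm state control in the style of
// AlgorithmFramework. Synchronization ensures the GUI thread and the
// algorithm thread always observe a consistent state.
public class AlgorithmState {
    public static final int NORMAL_MODE = 0;
    public static final int TRACE_MODE = 1;
    public static final int BREAK_MODE = 2;

    private boolean running = false;
    private int runMode = NORMAL_MODE;

    public synchronized void start() { running = true; }
    public synchronized void stop()  { running = false; }
    public synchronized boolean isRunning() { return running; }

    public synchronized void setRunMode(int mode) { runMode = mode; }
    public synchronized int getRunMode() { return runMode; }
}
```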
AlgorithmListener, HighlightListener and TreeChangeListener Interfaces: These interfaces are loosely based on the Java 1.2 event model. For example, classes that need to respond to events associated with algorithm execution (algorithm start, stop and step events) should implement the AlgorithmListener interface.
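A listener in this style might look like the sketch below. The method names are assumptions based on the events the manual mentions (start, stop and step); the actual AlgorithmListener interface may differ.

```java
// Sketch of a Java 1.2-style listener interface and an event source that
// broadcasts algorithm events to registered listeners.
import java.util.ArrayList;
import java.util.List;

interface AlgorithmListener {
    void algorithmStarted();
    void algorithmStopped();
    void algorithmStepped();
}

public class AlgorithmEventSource {
    private final List<AlgorithmListener> listeners = new ArrayList<>();

    public void addAlgorithmListener(AlgorithmListener l) {
        listeners.add(l);
    }

    // Called by the algorithm thread; every registered listener reacts.
    public void fireStarted() {
        for (AlgorithmListener l : listeners) l.algorithmStarted();
    }
}
```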
CodePanel, CodeReader and InvalidCodeFileException Classes: The CodePanel class extends the Swing JPanel component class, and is used to display algorithm pseudo-code. CodePanel implements the HighlightListener interface, allowing lines of pseudo-code to be highlighted as an algorithm executes. Pseudo-code is stored in an external HTML file, which is read and parsed by an instance of the CodeReader class. In addition to the standard HTML tags, CodeReader and CodePanel recognize two custom tags:
– A pair of function tags, which identify the pseudo-code for a specific function. These tags must appear in pairs. Each line of pseudo-code between the function tags should be terminated with a period – a parsing error causes an InvalidCodeFileException to be thrown.
– An indent tag, which indicates a standard indent level. Multiple indent tags can appear together – when the pseudo-code is rendered in the CodePanel, each tag is automatically replaced with a fixed number of spaces. This facilitates uniform indentation of nested lines of pseudo-code.
Package ai.decision.algorithm: Classes in the ai.decision.algorithm package implement standard decision tree learning and pruning algorithms. The package is independent of any specific GUI components, allowing the learning and pruning algorithms to run as part of a standalone, text-based application. •
Attribute Class: The Attribute class stores information about one particular attribute from a decision tree dataset, including the attribute name and a list of possible values. Additionally, each Attribute object contains an internal statistics array, which is populated with values as part of the decision tree construction process.
[Diagram of a two-dimensional array: one dimension indexed by attribute value index, the other by target value index; each cell holds the number of examples with a particular attribute value and associated target value.]
Figure 2 - Internal Attribute statistics array
The decision tree learning algorithm uses the values stored in the statistics array to determine which attribute to split on at each position in the tree. •
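Populating such a statistics array can be sketched as follows. The placement of the target value in column 0 of each example follows the manual's convention; the class and method names are illustrative assumptions.

```java
// Sketch of building the per-Attribute statistics array described above:
// stats[valueIndex][targetIndex] counts examples with a given attribute
// value and associated target classification.
public class AttributeStats {
    public static int[][] buildStats(int[][] examples, int attrColumn,
                                     int numAttrValues, int numTargetValues) {
        int[][] stats = new int[numAttrValues][numTargetValues];
        for (int[] example : examples) {
            int attrValue = example[attrColumn];  // this attribute's value index
            int target = example[0];              // target value index (column 0)
            stats[attrValue][target]++;
        }
        return stats;
    }
}
```

The learning algorithm can then compare these counts across attributes (for example, via an information-gain measure) to choose the best split.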
AttributeMask Class: An AttributeMask is a one-dimensional array with a size equal to the number of attributes in a given decision tree dataset (including the target attribute). The mask tracks which attributes are ‘split on’ along the path to a certain position in the tree. Each cell in the mask can contain an attribute value index, or the UNUSED designation, which indicates that an attribute is not used along a particular path. As an example, consider a dataset with four attributes (including the target attribute). The mask for the lower-left leaf node is shown in the diagram below.
Position 0 (target attribute): 1 (target class 1)
Position 1 (attribute 1): 0 (value index 0)
Position 2 (attribute 2): 0 (value index 0)
Position 3 (attribute 3): Unused
Figure 3 - An example Attribute mask
In this case, attributes 1 and 2 are used along the path to the leaf. The path descends from the node representing attribute 1 along the arc labeled with attribute 1 value index 0. It then descends from the node representing attribute 2 along attribute 2 value index 0 to the leaf. The value stored at the leaf, 1, identifies the target classification for examples that follow the path described. Note that indices are used, instead of string identifiers, for the sake of efficiency – attribute 1 might correspond to “width” or “color”, for example. The target attribute is always located at position 0 in the mask. Other attributes derive their indices from their positions in the dataset.
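The mask described above can be sketched as a small wrapper around an int array. The UNUSED sentinel value and the method names are assumptions for illustration; the real AttributeMask class may define them differently.

```java
// Sketch of an AttributeMask: one cell per attribute (target at index 0),
// each holding an attribute value index or the UNUSED designation.
import java.util.Arrays;

public class AttributeMask {
    public static final int UNUSED = -1;  // assumed sentinel value

    private final int[] mask;

    public AttributeMask(int numAttributes) {
        mask = new int[numAttributes];
        Arrays.fill(mask, UNUSED);  // initially no attribute is split on
    }

    public void set(int attrIndex, int valueIndex) { mask[attrIndex] = valueIndex; }
    public int get(int attrIndex) { return mask[attrIndex]; }
}
```

Reproducing Figure 3: set position 0 to 1 (target class), positions 1 and 2 to value index 0, and leave position 3 unused.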
Dataset, FileParser, InvalidMetaFileException and InvalidDataFileException Classes: A Dataset object encapsulates a decision tree dataset. Datasets are stored on disk in two separate files: a meta file, and a data file. The meta file describes dataset attributes and their associated values; the data file contains all the examples from the dataset, one line per example. When a Dataset is created, it uses an instance of the FileParser class to parse both the meta file and the data file. Examples in the data file are converted from string representation to an integer representation to save space. For instance, the third attribute listed in the meta file might be “Height”, which takes on the values “Tall” and “Short”. The Dataset assigns the Height attribute an index value of 2 (with 0 as the first index) – Tall and Short are assigned attribute value indices of 0 and 1, respectively. If the meta file contains a syntax error, the FileParser will throw an InvalidMetaFileException. Likewise, errors in the data file cause the FileParser to throw an InvalidDataFileException. A Dataset object is capable of splitting a set of examples into two groups: a training group and a testing group. Examples from each group can be accessed independently.
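The string-to-integer conversion can be sketched as below. The "Height"/"Tall"/"Short" names come from the manual's own example; the encoder class itself is a hypothetical stand-in for what FileParser and Dataset do internally.

```java
// Sketch of the string-to-integer encoding the manual describes: each
// attribute value from the data file is replaced by its index in the meta
// file's value list, saving space.
import java.util.List;

public class ValueEncoder {
    private final List<String> values;  // value list from the meta file, in order

    public ValueEncoder(List<String> values) {
        this.values = values;
    }

    // Convert a string value from the data file to its compact index.
    public int encode(String value) {
        int index = values.indexOf(value);
        if (index < 0) {
            // In the applet, a bad value would surface as an InvalidDataFileException.
            throw new IllegalArgumentException("Unknown value: " + value);
        }
        return index;
    }
}
```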
DecisionTree and DecisionTreeNode Classes: A DecisionTree manages the set of DecisionTreeNode objects that form the current decision tree. Each node is identified by a label (which corresponds to the attribute or target classification that the node represents) and contains references to any child nodes. The DecisionTree class controls access to the nodes, and maintains tree-related information (such as the number of internal and leaf nodes in the tree, whether the tree is complete or not, etc.). A DecisionTree object can inform associated TreeChangeListeners when the state of the tree changes (for example, when a node is added or removed).
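The node bookkeeping described above can be sketched like this. Field and method names are illustrative, not the actual DecisionTreeNode interface.

```java
// Sketch of DecisionTreeNode-style structure: each node has a label and child
// references, and the tree can report counts such as the number of leaves.
import java.util.ArrayList;
import java.util.List;

public class TreeNodeSketch {
    String label;  // attribute name, or target classification at a leaf
    List<TreeNodeSketch> children = new ArrayList<>();

    TreeNodeSketch(String label) { this.label = label; }

    TreeNodeSketch addChild(String childLabel) {
        TreeNodeSketch child = new TreeNodeSketch(childLabel);
        children.add(child);
        return child;
    }

    // A node with no children is a leaf (a target classification);
    // otherwise it is an internal attribute node.
    int countLeaves() {
        if (children.isEmpty()) return 1;
        int total = 0;
        for (TreeNodeSketch c : children) total += c.countLeaves();
        return total;
    }
}
```

In the applet, operations like addChild would additionally notify any registered TreeChangeListeners that the tree has changed.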
DecisionTreeAlgorithm Class: The DecisionTreeAlgorithm class is the largest and most complex class in the series of ai packages, implementing standard decision tree learning and pruning algorithms. In addition to being self-contained (able to build and prune a tree while running in its own thread), many of the class methods are ‘exposed’, giving GUI classes the ability to support interactive tree building and pruning.
Package ai.decision.gui: The ai.decision.gui package contains classes that provide access and display services for the underlying decision tree algorithms. •
AlgorithmMenu and AlgorithmPanel Classes: The AlgorithmMenu class controls the current algorithm thread, allowing a user to run, suspend, back up and restart the decision tree learning and pruning algorithms. In each run mode, the menu keeps track of the available menu options, enabling and disabling items based on the state of the algorithm. AlgorithmPanel is a thin wrapper class for the CodePanel that contains the algorithm pseudo-code. •
DatasetMenu, DatasetPanel and DatasetTableModel Classes: Together, the DatasetMenu and DatasetPanel classes are responsible for loading and displaying a decision tree dataset. DatasetMenu spawns a separate thread to load a dataset, which keeps the GUI responsive. The listing of available datasets in the menu is derived from a parameter in the applet’s HTML page (see the DecisionTreeApplet section for more information). DatasetPanel uses an instance of the DatasetTableModel class to display examples from a dataset in a standard JTable. Note that the target attribute is always shown in the leftmost column of the table.
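The background-loading pattern can be sketched as follows; class and method names are assumptions, and in the applet the GUI would be notified via a listener rather than joining the thread.

```java
// Sketch of loading a dataset on a separate thread so the GUI thread can
// return immediately and stay responsive.
public class BackgroundLoader {
    public static Thread loadInBackground(final Runnable loadTask) {
        Thread worker = new Thread(loadTask, "dataset-loader");
        worker.start();  // caller (the GUI thread) returns immediately
        return worker;
    }
}
```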
TreeLayoutPanel, VisualTreePanel, AbstractTreeNode, VacantTreeNode, VisualTreeNode and VisualTreeArc Classes: The TreeLayoutPanel class provides a tree display canvas, which can properly position and display an m-ary tree with optimal width and spacing characteristics. The positioning algorithm implemented by the class is detailed in John Walker’s 1990 Software: Practice and Experience paper “A Node-positioning Algorithm for General Trees”.
Data required by the positioning algorithm is contained in instances of the AbstractTreeNode class. AbstractTreeNode is an abstract class and cannot be instantiated – instead, tree nodes that are actually drawn are instances of VacantTreeNode or VisualTreeNode, both of which extend AbstractTreeNode. A VacantTreeNode represents a position in the decision tree that is available for extension or growth, and is hence ‘vacant’. A VisualTreeNode, on the other hand, provides a visual representation of an underlying DecisionTreeNode. Because vacant positions have to be displayed, the applet keeps two representations of a decision tree at all times. The first is the representation composed of DecisionTreeNodes, which form the tree that is manipulated by the underlying algorithm. The second representation, containing at least as many nodes as the underlying decision tree, provides a visual depiction of the tree and its vacant positions.
VisualTreePanel extends TreeLayoutPanel and provides support for interactive construction and pruning of the decision tree. In addition, VisualTreePanel stores a number of global variables, including the current zoom level. VisualTreeNodes and VacantTreeNodes automatically check the global variable settings before drawing themselves. The VisualTreeArc class contains a single static method that draws an arc between two tree nodes. The arc can have a text label painted at its midpoint if required.
Figure 4 - Relationship between visual and underlying tree nodes
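The two-layer node hierarchy can be sketched as below. The class names mirror those in the manual (with a Sketch suffix to mark them as illustrations); the fields and describe method are assumptions.

```java
// Sketch of the drawable node hierarchy: AbstractTreeNode carries the data the
// layout algorithm needs, while VacantTreeNode and VisualTreeNode are the
// concrete kinds that are actually drawn.
abstract class AbstractTreeNodeSketch {
    int x, y;  // position assigned by the node-positioning algorithm

    abstract String describe();
}

class VacantTreeNodeSketch extends AbstractTreeNodeSketch {
    // A position where the decision tree can still be extended.
    String describe() { return "vacant"; }
}

class VisualTreeNodeSketch extends AbstractTreeNodeSketch {
    // Mirrors an underlying DecisionTreeNode.
    String label;

    VisualTreeNodeSketch(String label) { this.label = label; }

    String describe() { return "node:" + label; }
}
```

This is why the visual tree always contains at least as many nodes as the underlying decision tree: every DecisionTreeNode has a visual counterpart, plus one vacant node for each position still open for growth.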
Package ai.decision.applet: The ai.decision.applet package contains a single class, DecisionTreeApplet, which is the applet implementation. •
DecisionTreeApplet Class: The DecisionTreeApplet class is responsible for instantiating the various backend and GUI components and assembling the GUI display. The class looks for one particular parameter in its companion HTML page: the “Datasets” parameter, which contains a semicolon-delimited list of available datasets.
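Parsing that parameter can be sketched as follows. In the applet the string would come from getParameter("Datasets"); the dataset names below are hypothetical, and the helper class is an illustration rather than the applet's actual code.

```java
// Sketch of parsing the semicolon-delimited "Datasets" applet parameter
// into the list of dataset names shown in the DatasetMenu.
public class DatasetParam {
    public static String[] parseDatasets(String param) {
        if (param == null) return new String[0];  // parameter missing from the page
        return param.split(";");
    }
}
```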