Reverse Engineering of Object Oriented Code
2.6 The eLib Program
Fig. 2.8. OFG associated with the abstract code in Fig. 2.6 (method in class ) and 2.7 (method in classes , ).
2 The Object Flow Graph
and the allocation’s left hand side variable, Library.borrowDocument.loan (Fig. 2.8 center, edge labeled 60). An example of a method call with a return value is provided by the first abstract statement (after the declaration) of method Library.addLoan (see Fig. 2.7 top, line 42). The left hand side location (Library.addLoan.user) is the target of an edge outgoing from Loan.getUser.return, the location associated with the value returned by the method call (see Fig. 2.8 bottom, edge labeled 42).
Container operations are also responsible for some edges in the OFG of Fig. 2.8. For example, the body of User.addLoan contains just an insertion statement (line 315). The container User.loans, into which a Loan object is inserted, becomes the target of an edge starting at the inserted object location, User.addLoan.loan (Fig. 2.8 center, edge labeled 44). This indicates an object flow from the parameter loan of method addLoan into the container User.loans.
The OFG constructed for the code in Fig. 2.6 and 2.7 shows the data flows through which objects are propagated from location to location. Thus, the parameter user of method borrowDocument becomes the current object (this) inside numberOfLoans, while it is the parameter user inside method authorizedLoan and the parameter usr inside the constructor of class Loan, as depicted at the top of Fig. 2.8. Similarly, the other parameter of borrowDocument, doc, flows into isAvailable and authorizedLoan as this, and into the constructor of class Loan as the parameter doc. The object of class Document returned by Loan.getDocument (bottom-right of Fig. 2.8) flows into the local variable doc of Library.addLoan, and then becomes the current object (this) inside Document.addLoan.
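The flows just described can be written down as explicit graph edges. The following is a minimal sketch, not the book's implementation: the class OfgSketch and its methods are hypothetical names introduced for illustration, and only the location names and the four edges are taken from the text above. It stores OFG edges in an adjacency list and follows them to find where an object can end up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of eLib or of the book's tooling.
class OfgSketch {
    // Adjacency list: program location -> locations its objects flow into.
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // Locations reachable from 'start' by following one or more OFG edges.
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String loc = work.pop();
            for (String next : successors.getOrDefault(loc, Set.of())) {
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen;
    }

    // The edges named in the text; the rest of Fig. 2.8 is omitted.
    static OfgSketch elibFragment() {
        OfgSketch g = new OfgSketch();
        g.addEdge("Loan.getUser.return", "Library.addLoan.user");    // call with return value
        g.addEdge("User.addLoan.loan", "User.loans");                // container insertion
        g.addEdge("Loan.getDocument.return", "Library.addLoan.doc"); // call with return value
        g.addEdge("Library.addLoan.doc", "Document.addLoan.this");   // doc becomes 'this'
        return g;
    }

    public static void main(String[] args) {
        System.out.println(elibFragment().reachableFrom("Loan.getDocument.return"));
    }
}
```

Run on this fragment, the object returned by Loan.getDocument reaches Library.addLoan.doc and then Document.addLoan.this, which matches the path described at the bottom-right of Fig. 2.8.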
2.7 Related Work
The OFG and the related flow propagation algorithms are based on research conducted on pointer analysis [3, 21, 47, 49, 60, 68, 81, 86]. The aim of pointer analysis is to obtain a static approximation of any points-to relationship that may hold at run-time between pointers and program locations. Similarly, when Object-Oriented programs are considered, the relationship between reference variables and objects is analyzed.
Pointer analysis algorithms can be divided into flow/context sensitive [21, 47, 60] and flow/context insensitive [3, 81]. Flow/context sensitive algorithms produce fine grained and accurate results, in that a points-to relationship is determined that holds at every program statement. Moreover, different invocation contexts can be distinguished. However, the computational complexity involved in these approaches is high, and in practice their performance does not scale to large software systems. Flow/context insensitive algorithms have lower complexity and scale well. On the other hand, they produce results that hold for the whole program, and the points-to relationships they derive cannot
be distinguished by statement or invocation context. Flow/context sensitive analyses are defined with reference to the control flow graph  of a program, while flow/context insensitive algorithms define the analysis semantics at the statement level. The algorithm most similar to ours is . Originally described for the C language, it has recently been extended to Java [49, 68]. Unlike the approach followed in this book, no explicit data structure, such as the OFG, is used in  as a support for the flow propagation: data flows are represented as set-inclusion constraints. The improvement of a control flow insensitive pointer analysis obtained by introducing object sensitivity was proposed in , where the possibility of parameterizing the degree of object sensitivity is also discussed.
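The difference between the two families can be seen on a three-statement program. The sketch below is a toy illustration under strong assumptions (straight-line code, local variables only, no method calls) and does not reproduce any of the cited algorithms; PointsToSketch, Stmt and the allocation site labels A@1 and B@3 are hypothetical names. The flow-sensitive variant processes statements in order and overwrites the previous value of the assigned variable, while the flow-insensitive variant ignores statement order and accumulates facts until nothing changes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model with hypothetical names; not one of the cited algorithms.
class PointsToSketch {
    // Either an allocation "lhs = new <site>" or a copy "lhs = rhs".
    static final class Stmt {
        final String lhs;
        final String rhs;       // allocation site label, or source variable
        final boolean isAlloc;
        Stmt(String lhs, String rhs, boolean isAlloc) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.isAlloc = isAlloc;
        }
    }

    // p = new A (site A@1); q = p; p = new B (site B@3)
    static List<Stmt> example() {
        return List.of(new Stmt("p", "A@1", true),
                       new Stmt("q", "p", false),
                       new Stmt("p", "B@3", true));
    }

    // Objects contributed by one statement, given the current points-to sets.
    private static Set<String> incoming(Stmt s, Map<String, Set<String>> pts) {
        Set<String> in = new TreeSet<>();
        if (s.isAlloc) {
            in.add(s.rhs);
        } else {
            in.addAll(pts.getOrDefault(s.rhs, Set.of()));
        }
        return in;
    }

    // Flow-insensitive: one result for the whole program, computed as a fixpoint.
    static Map<String, Set<String>> flowInsensitive(List<Stmt> program) {
        Map<String, Set<String>> pts = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Stmt s : program) {
                Set<String> in = incoming(s, pts);
                changed |= pts.computeIfAbsent(s.lhs, k -> new TreeSet<>()).addAll(in);
            }
        }
        return pts;
    }

    // Flow-sensitive, restricted to straight-line code: returns the state
    // holding after the last statement; each assignment overwrites its target.
    static Map<String, Set<String>> flowSensitiveAtEnd(List<Stmt> program) {
        Map<String, Set<String>> pts = new HashMap<>();
        for (Stmt s : program) {
            pts.put(s.lhs, incoming(s, pts));
        }
        return pts;
    }

    public static void main(String[] args) {
        System.out.println("insensitive: " + new TreeMap<>(flowInsensitive(example())));
        System.out.println("sensitive:   " + new TreeMap<>(flowSensitiveAtEnd(example())));
    }
}
```

For this program the flow-insensitive result reports that both p and q may refer to either object, whereas the flow-sensitive result at the end of the program reports that q refers only to the object allocated at A@1 and p only to the one allocated at B@3.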
3 Class Diagram
The class diagram is the most important and most widely used description of an Object Oriented system. It shows the static structure of the core classes that are used to build a system. The most relevant features (attributes and methods) of each class are provided in the class diagram, together with the optional indication of some of their properties (visibility, type, etc.). Moreover, the class diagram shows the relationships that hold among the classes in a system. This gives a static view of the structural connections that have been designed to allow communication and interaction among the classes. Thus, the class diagram provides a very informative summary of many design decisions about the system’s organization.
Recovery of the class diagram from the source code is a difficult task. The decision about what elements to show/hide profoundly affects the usability of the diagram. Moreover, interclass relationships carry semantic information that cannot be inferred just from the analysis of the code, being strongly dependent on the domain knowledge and on the design rationale.
A basic algorithm for the recovery of the class diagram can be obtained by a purely syntactic analysis of the source code, provided that a precise definition of the interclass relationships is given. For example, an association can be inferred when a class attribute stores a reference to another class.
One problem of the basic algorithm for the recovery of the class diagram is that declared types are an approximation of the classes actually instantiated in a program, due to inheritance and interfaces. An OFG based algorithm can be defined to improve the accuracy of the class diagram extracted from the code, in the presence of subclassing and interface implementation.
Another problem of the basic algorithm is related to the usage of weakly typed containers. Associations determined from the types of the container declarations are in fact not meaningful, since they do not specify the type of the contained objects. It is possible to recover information about the contained objects by exploiting a flow analysis defined on the OFG.
The basic rules for the reverse engineering of the class diagram are given in Section 3.1. Accuracy of the associations in the presence of inheritance and interfaces is discussed in Section 3.2, where an algorithm is provided to improve the results of a purely syntactic analysis. The problems related to the usage of weakly typed containers and an OFG based algorithm to address them are described in Section 3.3. Recovery of the class diagram is conducted on the eLib application in Section 3.4. Related work is discussed in the last section of this chapter.
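The limitation of declared types in the presence of weakly typed containers can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not the recovery algorithm of this chapter: the classes Loan and User are simplified stand-ins for the eLib classes of Appendix A, DeclaredTypeAssociations is a hypothetical helper, and Java reflection is used here in place of a source code parser. It reports one association per attribute, using only the attribute's declared type.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

// Simplified stand-ins for eLib classes (assumption: the real ones are richer).
class Loan {
    User user;   // declared type User: association Loan -> User
}

@SuppressWarnings({"rawtypes", "unchecked"})
class User {
    Collection loans = new LinkedList();   // weakly typed container of Loan objects

    void addLoan(Loan loan) {
        loans.add(loan);
    }
}

// Hypothetical helper: association targets taken from declared attribute types only.
class DeclaredTypeAssociations {
    static List<String> associationsOf(Class<?> cls) {
        List<String> result = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (!f.getType().isPrimitive()) {
                result.add(cls.getSimpleName() + " -> " + f.getType().getSimpleName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(associationsOf(Loan.class));
        System.out.println(associationsOf(User.class));
    }
}
```

For Loan the declared type gives the expected target, User. For User the only target found is Collection, although the container actually holds Loan objects. Recovering the link from User to Loan requires knowing which objects flow into User.loans, which is the information a flow analysis on the OFG provides.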
3.1 Class Diagram Recovery
The elements displayed in a class diagram are the classes in the system under analysis. Internal class features, such as attributes and methods, can also be displayed. Properties of the displayed features, such as the type of attributes, the parameters of methods, their visibility and scope (object vs. class scope), can be indicated as well. This information can be directly obtained by analyzing the syntax of the source code. Available tools for Object Oriented design typically offer a facility for the recovery of class diagrams from the code, which include this kind of syntactic information.
Fig. 3.1. Information gathered from the code of class User.
Fig. 3.1 shows the UML representation recovered from the source code of class User, belonging to the eLib example (see Appendix A). The first compartment below the class name shows the attributes (userCode, fullName, etc.). Static attributes (nextUserCodeAvailable) are underlined. Class operations are in the bottom compartment. The first entry is the constructor, while the other methods provide the exported functionalities of this class.
Relationships among classes are used to indicate either the presence of abstraction mechanisms or the possibility of accessing features of another class. Generalization and realization relationships are examples of abstraction mechanisms commonly used in Object Oriented programming that can be shown in a class diagram. Aggregation, association and dependency relationships are displayed in a class diagram to indicate that a class has access to resources (attributes or operations) from another class.
A generalization relationship connects two classes when one inherits features (attributes and methods) from the other. The subclass can add further features and can redefine inherited methods (overriding). A realization relationship connects a class to an interface if the class implements all methods declared in the interface. Users of this class are assured that the operations in the realized interface are actually available. Generalization and realization relationships satisfy the substitutability principle: in every place in the program where a location of the superclass/interface type is declared and used, an instance of any subclass/class realizing the interface can actually occur.
Relationships of access kind hold between pairs of classes each time one class possesses a way to reference the other. Conceptually, access relationships can be categorized by relative strength. A quite strong relationship is the aggregation. A class is related to another class by an aggregation relationship if the latter is a part of the former. This means that the existence of an object of the first class requires that one or more objects of the other class do also exist, in that they are an integral part of the first object.
Participants in aggregation relationships may have their own independent life, but it is not possible to conceive the whole (first class) without adding also the parts (second class). An even stronger relationship is the composition. It is a form of aggregation in which the parts and the whole have the same lifetime, in that the parts, possibly created later, cannot survive after the death of the whole.
A weaker relationship among classes than the aggregation is the association. Two classes are connected by a (bidirectional) association if there is the possibility to navigate from an object instantiating the first class to an object instantiating the second class (and vice versa). Unidirectional associations exist when only one-way navigation is possible. Navigation from an object to another one requires that a stable reference exists in the first object toward the other one. In this way, the second object can be accessed at any time from the first one.
An even weaker relationship among classes is the dependency. A dependency holds between two classes if any change in one class (the target of
the dependency) might affect the dependent class. The typical case is a class that uses resources from another class (e.g., invoking one of its methods). Of course, aggregation and association are subsumed by dependency.
3.1.1 Recovery of the inter-class relationships
From the implementation point of view, there is no substantial difference between aggregation and association. Both relationships are typically implemented as a class attribute referencing other objects. Attributes of container type are used whenever the multiplicity of the target objects is greater than one. In principle, it would be possible to approximately distinguish between composition and aggregation by analyzing the lifetime of the referenced objects. However, in practice implementations of the two relation variants have a large overlap.
In the implementation, dependencies that are not associations or aggregations can be distinguished from the latter because they are accesses to features of another class performed through program locations that, unlike class attributes, are less stable. For example, a local variable or a method parameter may be used to access an object of another class and invoke one of its methods. In such cases, the reference to the accessed object is not stable, being stored in a temporary variable. Nevertheless, any change in the target class potentially affects the user class, thus there is a dependency.
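The distinction between stable and temporary references can be sketched as a small classifier. What follows is only an approximation under stated assumptions: it uses Java reflection on compiled classes instead of parsing source code, so local variables are not visible and only method parameters contribute dependencies. The classes Searchable, Item, Book, Shelf and Printer and the helper RelationshipRecovery are hypothetical and do not belong to eLib. Attribute types are reported as associations (aggregations cannot be told apart), parameter types not already covered by an attribute are reported as dependencies, and the class declaration supplies generalization and realization.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical mini-model used only to exercise the classifier.
interface Searchable { boolean matches(String key); }
class Item { }
class Shelf { }
class Printer { }
class Book extends Item implements Searchable {
    Shelf shelf;                                   // stable reference: association
    public boolean matches(String key) { return false; }
    void print(Printer out) { }                    // temporary reference: dependency
}

class RelationshipRecovery {
    // Only classes of the analyzed program are of interest, not library types.
    private static boolean isProgramClass(Class<?> t) {
        return !t.isPrimitive() && !t.isArray() && !t.getName().startsWith("java.");
    }

    static Set<String> recover(Class<?> cls) {
        Set<String> rels = new TreeSet<>();
        Class<?> sup = cls.getSuperclass();
        if (sup != null && sup != Object.class) {
            rels.add("generalization " + sup.getSimpleName());
        }
        for (Class<?> itf : cls.getInterfaces()) {
            rels.add("realization " + itf.getSimpleName());
        }
        Set<Class<?>> attributeTypes = new HashSet<>();
        for (Field f : cls.getDeclaredFields()) {
            if (isProgramClass(f.getType())) {
                attributeTypes.add(f.getType());
                rels.add("association " + f.getType().getSimpleName());
            }
        }
        for (Method m : cls.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue;   // skip compiler-generated methods
            }
            for (Class<?> p : m.getParameterTypes()) {
                // A dependency is reported only when no association already covers it.
                if (isProgramClass(p) && !attributeTypes.contains(p)) {
                    rels.add("dependency " + p.getSimpleName());
                }
            }
        }
        return rels;
    }

    public static void main(String[] args) {
        System.out.println(recover(Book.class));
    }
}
```

For Book the sketch reports an association to Shelf, a dependency on Printer, a generalization to Item and a realization of Searchable. Because it relies on declared types only, it inherits the inaccuracies that arise with inheritance, interfaces and weakly typed containers.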
Table 3.1 summarizes the inter-class relationships and the rules for their recovery. Generalization and realization are easily determined from the class declaration, by looking for the keywords extends and implements, respectively. The declared type of the program locations (attributes, local variables, method parameters) involved in associations (including aggregations) and dependencies is used to infer the target of such relationships. In the next two