This article is a spin-off piece from the paper “Do We Need Improved Code Quality Metrics?” by myself and Prof. Diomidis Spinellis.
Earlier, I wrote an article about issues with code quality metrics; in another article I showed how all the variants of the LCOM metric misrepresent simple cases of class cohesion. In this article, I first summarize the deficiencies in the present set of LCOM variants (LCOM1, LCOM2, LCOM3, LCOM4, LCOM5) and then present YALCOM, which addresses the issues of the existing LCOM variants.
Deficiencies in existing LCOM variants
- LCOM1-4 take into account only the instance attributes of a class, ignoring any static attributes. Although the dynamic properties of a static attribute differ from the instance attributes (for instance, static attributes can be accessed without creating an object), these attributes are part of the class they reside in. In the context of a metric that measures the similarity among class members, their dynamic property is irrelevant. Therefore, ignoring static attributes while assessing cohesion is inappropriate.
- The existing LCOM algorithms fail to distinguish the cases where the metric cannot be measured from the perfectly cohesive cases, because they always emit the lowest metric value in the former cases. For instance, LCOM2 reports 0 not only when the type under measurement is completely cohesive, but also when the type is an interface or a utility class, i.e., a class with no attributes. Such an approach gives the user the illusion that all cases with a metric value of zero are cohesive, while in reality the algorithm was not given enough information to measure the metric and fails to communicate this to the user.
- Method invocations within a class show that methods are working together to achieve a goal and thus must be considered while computing LCOM. However, LCOM1-3 and LCOM5 do not consider method invocations to compute the metric.
- Furthermore, the existing LCOM implementations focus on common attribute access among methods within a class; however, they ignore common attribute access where the attribute is defined in a superclass. Classes are extensions of their superclasses, and it is very common to elevate data and method members to superclasses to avoid duplication among siblings. Hence, two methods that access the same attribute or invoke the same method defined in a superclass contribute to cohesion and thus must be considered while computing the metric.
- Lastly, Fenton and Pfleeger stated that a metric may follow a suitable measurement scale (such as nominal, ratio, and absolute) depending on the aspect being measured. LCOM1-4 measure cohesion on an absolute scale that may emit an arbitrarily large number as the metric value, making it almost impossible for the user to gain any insight from it. For instance, given that m is the number of methods of a class, the maximum value that LCOM2 may produce is (m * (m-1))/2, which could be a considerable number for large classes. To facilitate metric interpretation and comparison, a bounded concept such as cohesion is better represented by a normalized value.
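To get a feel for how quickly the LCOM2 upper bound of (m * (m-1))/2 grows, the following minimal sketch evaluates it for a few class sizes. The function name `lcom2_upper_bound` is hypothetical and only illustrates the arithmetic; it does not compute LCOM2 itself.

```python
def lcom2_upper_bound(m: int) -> int:
    """Largest value LCOM2 may produce for a class with m methods.

    This is the number of distinct method pairs, (m * (m - 1)) / 2.
    """
    if m < 0:
        raise ValueError("number of methods cannot be negative")
    return m * (m - 1) // 2


for m in (10, 50, 100):
    print(m, lcom2_upper_bound(m))
# prints:
# 10 45
# 50 1225
# 100 4950
```

A class with 100 methods can therefore score anywhere between 0 and 4950, which is hard to compare against a 10-method class whose maximum is 45.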
YALCOM takes a type, i.e., a class or an interface, as input. It returns -1 when it finds that the metric is not computable; otherwise, it returns an LCOM value in the range [0, 1]. The metric is not computable when the number of methods is zero or when the analyzed type is an interface.
The algorithm creates a graph in which the methods and attributes of the class are the vertices. Attributes from superclasses that are accessible from the class are also included. Relationships among the methods and attributes, i.e., field accesses and method invocations, form the edges. For example, if a method m1 accesses attributes a1 and a2 and also calls method m2, then the node corresponding to m1 has edges to the nodes representing a1, a2, and m2.
Once the graph is constructed for the input class, the algorithm finds the disconnected subgraphs of methods. If there is exactly one such subgraph, then all the attributes and methods are connected to each other and hence the class is perfectly cohesive (and thus assigned 0 as the metric value). If there is more than one, the class contains several islands of functionality and hence is not cohesive; the higher the number of such subgraphs, the poorer the cohesion. In that case, we compute the metric by dividing the number of disconnected subgraphs by the number of methods in the class. Since the number of disconnected subgraphs cannot exceed the number of methods (the extreme case being when no method is associated with any other method in the class), the maximum value that the algorithm can produce is 1.
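The steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation from our repository: the function name `yalcom`, its parameters, and the input representation (member names as strings, relationships as pairs) are assumptions made for the example. It also assumes that method and attribute names are distinct, that only subgraphs containing at least one method are counted, and that relationships pointing outside the class are ignored.

```python
def yalcom(methods, attributes, edges, is_interface=False):
    """Sketch of YALCOM for one type.

    methods      -- names of the methods of the type
    attributes   -- names of its attributes, including accessible
                    superclass attributes (assumed input format)
    edges        -- (method, member) pairs, one per field access or
                    method invocation
    is_interface -- True when the analyzed type is an interface
    """
    methods = set(methods)
    # Not computable: interfaces and types without methods.
    if is_interface or not methods:
        return -1

    vertices = methods | set(attributes)
    adjacency = {vertex: set() for vertex in vertices}
    for source, target in edges:
        # Assumption: skip relationships to members outside the class.
        if source in adjacency and target in adjacency:
            adjacency[source].add(target)
            adjacency[target].add(source)

    # Count connected components that contain at least one method.
    seen = set()
    components = 0
    for start in methods:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    if components == 1:
        return 0.0  # perfectly cohesive
    return components / len(methods)


# m1 uses a1 and a2 and calls m2: a single connected subgraph.
print(yalcom(["m1", "m2"], ["a1", "a2"],
             [("m1", "a1"), ("m1", "a2"), ("m1", "m2")]))  # prints 0.0

# Three islands ({m1, m2, a1}, {m3, a2}, {m4}) over four methods.
print(yalcom(["m1", "m2", "m3", "m4"], ["a1", "a2"],
             [("m1", "a1"), ("m2", "a1"), ("m3", "a2")]))  # prints 0.75
```

Returning -1 for interfaces and method-less types keeps "not measurable" visibly separate from the perfectly cohesive value of 0.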
Validation and comparison
To establish whether the commonly used set of LCOM metrics sufficiently captures the cohesion aspect of abstractions, we handcrafted eight classes/interfaces representing different cases. They are designed to cover various common cases involving the interplay of method calls, fields (their type, i.e., a class or an interface, and their accesses), and inheritance that impact class cohesion and may reveal the deficiencies of the existing algorithms to compute LCOM.
The details of each of these cases, as well as the performance of all the existing LCOM metrics and the proposed metric, are presented in our paper. The source code that implements the considered variants of LCOM and YALCOM can be found on GitHub.