CRAFT: A Framework for Evaluating Software Clustering Results in the Absence of Benchmark Decompositions
Brian S. Mitchell and Spiros Mancoridis
Department of Mathematics & Computer Science
Drexel University, Philadelphia, PA, USA
{bmitchel,smancori}@mcs.drexel.edu
Abstract
Software clustering algorithms are used to create high-level views of a system's structure using source code-level artifacts. Software clustering is an active area of research that has produced many clustering algorithms. However, we have seen very little work that investigates how the results of these algorithms can be evaluated objectively in the absence of a benchmark decomposition, or without the active participation of the original designers of the system.

Ideally, for a given system, an agreed-upon reference (benchmark) decomposition of the system's structure would exist, allowing the results of various clustering algorithms to be compared against it. Since such benchmarks seldom exist, we seek alternative methods to gain confidence in the quality of results produced by software clustering algorithms.

In this paper we present a tool that supports the evaluation of software clustering results in the absence of a benchmark decomposition.
1. Introduction & Background
Maintaining large and complex software systems is a daunting task. Poor software maintenance practices often lead to a system's premature retirement, or necessitate significant restructuring in order to keep the source code in manageable condition. There are many causes for these poor maintenance practices. Little can be done about practical issues such as limited maintenance budgets, lack of access to the original developers and designers of the system, and short development schedules. However, if software maintainers have a thorough understanding of the underlying system structure, it is more likely that changes to the source code can be performed without deviating from the system design and architecture. Additionally, if the source code structure is documented, new maintenance developers will require less time to become productive.
The best way to gain insight into a system's structure is to have access to accurate documentation. Unfortunately, system documentation is often not up to date, if it exists at all. One way that researchers have addressed this problem is by developing software clustering algorithms and tools that produce high-level views of a system's structure directly from its source code. Because clustering techniques use a variety of criteria to decompose a software system into clusters, it is not surprising that these algorithms produce different results when applied to the same system.
The above issues further complicate things for software practitioners who are trying to understand the structure of complex systems. A manual analysis of the source code is tedious, and may not even be possible, because of its large number of components and dependencies. Clustering algorithms automate this analysis.
Many of the clustering techniques published in the literature can be categorized by the way they establish clusters. Hutchens and Basili [7] developed an algorithm that clusters procedures into modules by measuring the interaction between pairs of procedures. Schwanke et al. [16, 17] introduced the notion of using design principles such as low coupling and high cohesion to create clusters. Choi and Scacchi [4] describe a clustering technique based on maximizing the cohesiveness of clusters by evaluating the exchange of resources between modules. Hausi Müller et al. [15] implemented several software clustering heuristics in the Rigi tool that (a) measure the relative strength between interfaces, (b) identify omnipresent modules, and (c) use similarity of module names. Clustering based on similar patterns in implementation information (e.g., module file names) has been investigated by Anquetil et al. [2, 3]. Concept analysis [10, 19, 1] has also been explored in software clustering research. Our research is based on using optimization techniques [12, 5, 11, 14] to determine clusters.
Now that a plethora of clustering approaches exist, the validation of clustering results is starting to attract interest from the Software Engineering research community. Many of the clustering techniques published in the literature present case studies where the results are evaluated by the authors or by the developers of the system being studied. This evaluation technique is very subjective. Moreover, the literature often does not describe the types of systems for which a clustering technique does not perform well. For example, our clustering technique, named Bunch [12, 5, 11, 14], forms clusters based on maximizing the cohesiveness of the clusters, while minimizing the inter-cluster coupling. While we think that this technique is good for many types of systems, it is not suitable for all systems. As an example, Bunch may not provide good results for systems whose architecture is event-driven (e.g., UI code), or whose architecture loads system resources dynamically.¹

¹ This problem is interesting and has received very little formal attention by researchers. We are investigating how dynamic program behavior can be integrated into our clustering techniques.
The issues described above should be considered, especially when the decompositions produced by different clustering algorithms differ dramatically. Recently, researchers have begun developing infrastructure to evaluate clustering techniques by proposing similarity measurements [2, 3, 13]. These measurements enable the results of clustering algorithms to be compared to each other, and preferably to be compared to an agreed-upon "benchmark" standard. A high similarity value should provide confidence that the clustering algorithm is producing good decompositions.
The remainder of this paper is dedicated to techniques for evaluating the results of clustering algorithms when no standard benchmark exists. We present a tool we developed that provides a framework for evaluating clustering techniques. We hope that our tool will be of value to researchers who are creating or evaluating clustering algorithms.
2. Establishing Confidence in Software Clustering Results
In this section we address the problem of how to gain confidence in a software clustering result when a reference benchmark standard is not available for comparison. Although significant research emphasis has been placed on clustering, we have seen little work on measuring the effectiveness of these techniques. The following list outlines the current state of practice for evaluating software clustering results:
• Software clustering research often revisits the same set of test cases for validation (e.g., linux, mosaic, etc.). This is appropriate because the structure of these systems is well understood. However, for industry adoption of software clustering technology, researchers must show that these tools can be applied to a variety of systems successfully.
• There are many clustering tools and techniques. There is value to this diversity, as certain clustering techniques produce better results for some types of systems than others. However, no work has been done to help guide end-users to those techniques that work best for their type of system.

• The interpretation of software clustering results tends to focus on the overall result (i.e., the entire system), and not on partial results (e.g., a subsystem), which may also be of value.

• Software Engineering researchers have been investigating similarity measurements [2, 3, 13] to quantify the level of agreement between pairs of clustering results. Researchers tend to focus on using these measurements to compare clustering results directly; however, we know of no work that tries to find common patterns in clustering results that are produced by a family of different clustering algorithms.
• The importance of evaluating clustering techniques has been investigated by Koschke and Eisenbarth [9]. Their paper describes several techniques for comparing the results of a clustering algorithm to a reference decomposition. The authors also describe a consensus-driven approach for creating a reference decomposition. However, their approach for establishing the reference decomposition is manual, requiring teams of software engineers.
To address some of the problems mentioned above, we propose a framework that supports a process for automatically creating views of a system's structure that are based on common patterns produced by various clustering algorithms. The initial step in our process clusters a system many times using different clustering algorithms. For each clustering run, all of the modules that appear in the same cluster are recorded. Using this information, our framework presents consolidated views of the system structure to the user. By using a collection of clustering algorithms we "average" the different clustering results and present views based on common agreement across the various clustering algorithms. In the next section we introduce our framework, which we have implemented and made available to researchers on the world wide web at s.drexel.edu/bunch/CRAFT.

Figure 1. The User Interface of the CRAFT Tool
3. Our Framework: CRAFT
In Figure 1 we illustrate the user interface of our clustering analysis tool, which we call CRAFT (Clustering Results Analysis Framework and Tools). The main window is organized into two sections. The section at the top of the window collects general parameters, and the section at the bottom of the window accepts thresholds that apply to a particular analysis service. Our tool currently supports two such services, Confidence Analysis and Impact Analysis, which are discussed later in this section.
The overall goal of the CRAFT framework is to expose common patterns in the results produced by different clustering algorithms. By highlighting the common patterns, we gain confidence that agreement across a collection of clustering algorithms is likely to reflect the underlying system structure. We are also interested in identifying modules/classes that tend to drift across many clusters, as modifications to these modules/classes may have a large impact on the overall system structure.
As shown in Figure 2, the CRAFT architecture consists of the following subsystems:

• The User Interface: CRAFT obtains information from the user to guide the collection, processing and visualization of clustering results.

• Clustering Services: The actual clustering process is externalized to a dynamically loadable clustering driver. The clustering driver is responsible for partitioning a graph that represents the components and dependencies of a software system into a set of non-overlapping clusters. The CRAFT framework invokes the clustering driver multiple times based on the value provided by the user in the Number of Runs entry field.

• Data Analysis Services: The results of the clustering activity are stored in a relational database. For each clustering run, every pair of modules in the system is recorded to indicate if the modules in the pair are in the same or different clusters. The repository of clustering results supports two types of analyses: Confidence Analysis and Impact Analysis. Confidence Analysis produces a decomposition of the system based on common trends in the clustering data. Impact Analysis examines each module in the system to determine how it relates to all of the other modules in the system with respect to its placement into clusters. Confidence Analysis helps users gain confidence in a clustering result when no reference decomposition exists. Impact Analysis helps users perform local analyses, which is much simpler than analyzing the entire decomposition of a system.

• Visualization: Once the results of the clustering analysis have been produced, our tool presents the results in diagrammatic form.
We next present a small example illustrating the capabilities of the CRAFT framework, followed by a detailed description of the above-mentioned services.
Figure 2. The Architecture of the CRAFT Tool
3.1. A Small Example
In Figure 3 we illustrate a small example to help explain the usage of the CRAFT framework. The Module Dependency Graph (MDG), which is illustrated at the upper-left corner of the figure, shows a system consisting of 5 modules {M1, M2, M3, M4, M5} with 6 dependencies between the modules. The MDG is a generic, language-independent representation of the structure of the system's source code components. This representation includes all of the modules (classes) in the system and the set of dependencies that exist between the modules (classes).
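For concreteness, an MDG is typically just a small text file. The snippet below sketches what the input for a five-module system of this shape might look like, assuming a Bunch-style MDG format of one dependency per line (source module, then target module, optionally followed by a weight); the exact edge set shown is illustrative and is not claimed to match Figure 3.

    M1 M2
    M1 M3
    M2 M3
    M3 M4
    M3 M5
    M4 M5

With an input of this form, the clustering drivers described in Section 3.3 can partition the graph repeatedly without any language-specific knowledge of the system being analyzed.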
Intuitively, based on the dependencies between the modules, we expect that {M1, M2} and {M4, M5} should appear in the same cluster. We also expect that module M3 is equally likely to appear in the {M1, M2} or {M4, M5} cluster, since it is isomorphic to both of them.
We executed CRAFT with a setting that clustered the example system 100 times. The data from the clustering results repository is shown on the bottom-left corner of Figure 3. Each relation in the repository indicates the frequency that a pair of modules appears in the same cluster. Visual inspection of this data confirms that modules M1 and M2, along with modules M4 and M5, appear in the same cluster 100% of the time. The data also shows that module M3 appears in the {M1, M2} cluster 57% of the time and in the {M4, M5} cluster 43% of the time.
The results of the Impact Analysis are shown in the center of Figure 3. To conserve space, we have configured the CRAFT user interface to show only the modules that exceed an upper threshold (as shown by the ↑ icon) that was selected by the user. In Section 3.4.2 we describe additional features of the Impact Analysis service, inclusive of the thresholds that can be specified by the user. The Impact Analysis results clearly show that modules M1 and M2 always appear together, modules M4 and M5 always appear together, and that module M3 does not appear in any particular cluster consistently (the icon for M3 cannot be expanded).
On the right side of Figure 3 we show the results of the Confidence Analysis. The partition of the system shown in the Confidence Analysis result is consistent with our intuitive decomposition, as well as with the view produced by the Impact Analysis service. Note that the edges shown in Figure 3 do not depict dependencies in the system being studied. The edges instead represent the frequency percentage that the two modules appear in the same cluster based on all of the clustering runs. The cluster containing only module M3 does not necessarily indicate that this module appears alone in any of the clustering results. In the Confidence Analysis visualization, singleton clusters should be interpreted as modules that do not appear in any particular cluster consistently.
Given such a small example, the value of having multiple views of the clustering results data is not apparent. In Section 4, where we present the results of a case study on a non-trivial example, the benefits of the two views become more obvious. Now that a simple example has been presented, we return to discussing the architecture and services of CRAFT.
3.2. The User Interface
The user interface of our tool, which is shown in Figure 1, collects information that is necessary to analyze the results of a collection of clustering algorithms. The key information collected on the user interface is the text file name of the MDG, the number of times the clustering step will be performed (described in Section 3.3), the name of a Java Bean that will perform the clustering activities, and the thresholds that are used by the data analysis and visualization services. The tab at the bottom of the window is used to select the analysis service to be performed, and to collect the parameters associated with each analysis service.
Figure 3. A Small Example. The figure shows the MDG graph, the Clustering Results Repository, the Impact Analysis results visualization, and the Confidence Analysis visualization. The repository data for the example is: {M1, M2, 100}, {M1, M3, 57}, {M1, M4, 0}, {M1, M5, 0}, {M2, M3, 57}, {M2, M4, 0}, {M2, M5, 0}, {M3, M4, 43}, {M3, M5, 43}, {M4, M5, 100}.
3.3. The Clustering Service
Prior to using our tool, a clustering driver class must be created and packaged as a Java Bean. The responsibility of this driver is to accept as input: (a) the file name of the MDG, (b) the name of an output file that the clustering driver is expected to produce, (c) the current run number, and (d) the collection of the parameters specified on the CRAFT user interface. The clustering driver encapsulates the algorithm(s) used to cluster the system. The input file is provided in our MDG file format, and the output file must adhere to our SIL file format. Furthermore, the clustering driver must extend the AnalysisTool.IClusterHelper interface, which is included in the distribution of our tool. The clustering analysis tool, programmer and user documentation, along with associated file format specifications, can be obtained online at the Drexel University Software Engineering Research Group (SERG) web page at s.drexel.edu/bunch/CRAFT.

The need to create an external clustering driver is consistent with our philosophy of developing tools that can be used by other researchers in the Software Engineering community. Our approach allows researchers to create clustering drivers in the Java programming language. Clustering tools not developed in Java can also be integrated into our framework using Java's JNI [8] capability to wrap tools developed in other programming languages. For the convenience of the user, we have created and included two drivers with our tool; a sketch of what a minimal custom driver might look like appears after the list below. Both drivers are based on algorithms supported by Bunch. Specifically:
• class bunchexper.ClusterHelper: This clustering driver uses the Bunch NAHC clustering algorithm [12, 11, 14]. All of the algorithms supported by Bunch use a randomized optimization approach to form clusters. Thus, repeated runs of Bunch, with the same algorithm, rarely produce the exact same result.
• class bunchexper.ClusterHelperExt: Based on the number of clustering runs specified by the user, this driver applies the Bunch NAHC clustering algorithm 25% of the time, the Bunch SAHC algorithm 25% of the time, our generic hill-climbing algorithm 40% of the time, and our Genetic Algorithm [5] 10% of the time. The generic hill-climbing algorithm can be tuned using configuration parameters to behave like any algorithm on the continuum between our NAHC and SAHC hill-climbing algorithms [12]. Overall, 10 of the Bunch clustering algorithms are included in the bunchexper.ClusterHelperExt driver. In addition to the NAHC, SAHC and Genetic algorithms, 7 different configurations of the generic hill-climbing algorithm are used.
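The following is a minimal sketch of a custom clustering driver. The exact methods of the AnalysisTool.IClusterHelper interface are not given in this paper, so we model a hypothetical stand-in with the four inputs listed above; the SIL output syntax used here is likewise an assumption, not the authoritative format.

import java.io.*;
import java.util.*;

// Hypothetical stand-in for CRAFT's AnalysisTool.IClusterHelper interface;
// the real interface ships with the CRAFT distribution and may differ.
interface IClusterHelper {
    void cluster(String mdgFile, String silFile, int runNumber,
                 Map<String, String> uiParams) throws IOException;
}

// A toy driver that assigns modules to random clusters. A real driver
// would encapsulate an actual clustering algorithm at step 2, and could
// rotate between several algorithms based on runNumber, as
// bunchexper.ClusterHelperExt does with its 25/25/40/10 split.
public class RandomPartitionDriver implements IClusterHelper {
    private final Random rnd = new Random();

    public void cluster(String mdgFile, String silFile, int runNumber,
                        Map<String, String> uiParams) throws IOException {
        // Step 1: collect module names from the MDG file
        // (assumed format: one "source target [weight]" line per dependency).
        Set<String> modules = new TreeSet<String>();
        BufferedReader in = new BufferedReader(new FileReader(mdgFile));
        String line;
        while ((line = in.readLine()) != null) {
            String[] tok = line.trim().split("\\s+");
            if (tok.length >= 2) { modules.add(tok[0]); modules.add(tok[1]); }
        }
        in.close();

        // Step 2: partition the modules (placeholder: random assignment).
        int k = Math.max(1, modules.size() / 2);
        Map<Integer, List<String>> clusters = new TreeMap<Integer, List<String>>();
        for (String m : modules) {
            int c = rnd.nextInt(k);
            if (!clusters.containsKey(c)) clusters.put(c, new ArrayList<String>());
            clusters.get(c).add(m);
        }

        // Step 3: write the partition; the SIL syntax below is assumed.
        PrintWriter out = new PrintWriter(new FileWriter(silFile));
        for (Map.Entry<Integer, List<String>> e : clusters.entrySet()) {
            StringBuilder sb = new StringBuilder("SS(cluster" + e.getKey() + ") = ");
            for (int i = 0; i < e.getValue().size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(e.getValue().get(i));
            }
            out.println(sb.toString());
        }
        out.close();
    }
}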
3.4. The Data Analysis Services
Once the clustering activity finishes, the data associated with each clustering run is stored in the Clustering Results Repository. Next, the data is processed by our analysis tools to support the construction of useful views of the data. Our goal is to consolidate the diversity in the results produced by the set of clustering tools so that common patterns can be identified. This approach has shown that good decompositions can be produced in the presence of a set of clustering algorithms. Instead of relying on a benchmark decomposition to evaluate the result, the result produced by CRAFT is likely to be good because the set of clustering algorithms "reached a consensus".
As mentioned above, the first step performed during data analysis is the consolidation of the data that has been saved in the clustering results repository. Let S = {M1, M2, ..., Mn} be the set of all modules in the system. For each distinct pair of modules (Mi, Mj), where 1 ≤ i, j ≤ |S| and Mi ≠ Mj, we calculate the frequency (α) with which this pair appears in the same cluster. Given that the clustering driver executes Π independent runs, the number of times that a pair of modules can appear in the same cluster is bounded by 0 ≤ α ≤ Π. The principle that guides our analysis is that the closer α is to Π, the more likely it is that the pair of modules belongs to the same cluster. During the first step of the data analysis, the consolidated data is saved in the Clustering Results Repository so that it can be used by our data analysis tools.
Given a system consisting of n modules that has been clustered Π times, let αij be the number of times that modules Mi and Mj appear in the same cluster. We also define C[m1, m2, α] as the schema² for the data in the clustering repository that includes all of the {Mi, Mj, αij} triples. Each {Mi, Mj, αij} relation represents the number of times that module Mi and module Mj appear in the same cluster.

² Given a relation c for the schema C, we use the notation c[m1, m2] to represent the projection of m1 and m2 on C. For example, given the relation c = {parser, scanner, 10}, c[m1, m2] = {parser, scanner}.
For convenience we also define D to be a view on C such that all of the triples in D are sorted in descending order based on D[α].
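To make this bookkeeping concrete, the sketch below computes the {Mi, Mj, αij} triples from a list of clustering runs and sorts them into the view D. This is our own illustrative code, not CRAFT's implementation; each run is represented as a map from module name to cluster id, and pairs that never co-occur (α = 0) are simply omitted, which does not affect any analysis thresholded at β > 0.

import java.util.*;

public class CoOccurrence {
    // One {Mi, Mj, alpha} triple from the schema C described above.
    public static final class Triple {
        public final String m1, m2;
        public final int alpha;
        public Triple(String m1, String m2, int alpha) {
            this.m1 = m1; this.m2 = m2; this.alpha = alpha;
        }
        public String toString() { return "{" + m1 + ", " + m2 + ", " + alpha + "}"; }
    }

    // runs: one map per clustering run, sending each module to its cluster id.
    // Returns the triples sorted in descending order of alpha (the view D).
    public static List<Triple> buildD(List<Map<String, Integer>> runs) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Map<String, Integer> run : runs) {
            List<String> mods = new ArrayList<String>(run.keySet());
            Collections.sort(mods);  // canonical order so (Mi, Mj) == (Mj, Mi)
            for (int i = 0; i < mods.size(); i++)
                for (int j = i + 1; j < mods.size(); j++)
                    if (run.get(mods.get(i)).equals(run.get(mods.get(j)))) {
                        // Module names are identifiers, so a space-joined key is safe.
                        String key = mods.get(i) + " " + mods.get(j);
                        Integer prev = counts.get(key);
                        counts.put(key, prev == null ? 1 : prev + 1);
                    }
        }
        List<Triple> d = new ArrayList<Triple>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] pair = e.getKey().split(" ");
            d.add(new Triple(pair[0], pair[1], e.getValue()));
        }
        Collections.sort(d, new Comparator<Triple>() {  // descending by alpha
            public int compare(Triple a, Triple b) { return b.alpha - a.alpha; }
        });
        return d;
    }
}

For the small example of Figure 3, running buildD over the 100 stored partitions would reproduce the repository triples shown in the figure, such as {M1, M2, 100} and {M1, M3, 57}.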
Now that the above definitions have been presented, we turn our attention to describing the two data analysis services provided by CRAFT.
3.4.1. Confidence Analysis Tool
The Confidence Analysis Tool (CAT) processes the tuples in the Clustering Results Repository to produce a reference decomposition of a software system. Let β represent a user-defined threshold value, which is set on the user interface (see the left side of Figure 1). Given the ordered set of tuples in the Clustering Results Repository (i.e., set D), the user-specified threshold β, and the set of all of the modules in the system (i.e., set S), we create the reference decomposition by applying Algorithm 1.

As Algorithm 1 shows, the first step in the CAT process involves forming an initial cluster by taking the relation from D with the largest α value. The closure of the modules in this relation is then calculated, and new modules are added to the cluster by searching for additional relations in D such that one of the modules in the triple is already contained in the newly formed cluster, and the α value exceeds the user-defined threshold β.

Algorithm 1: Confidence Analysis Algorithm
ConfidenceAnalysis(Set: D, S; Threshold: β)
    Let P be a new partition of the system.
    foreach relation d ∈ D, sorted decreasing by d[α], where d[α] ≥ β do
        if (d[M1] or d[M2]) is in S then
            1. Create a new cluster C and add it to partition P. Add the
               modules from relation d (d[M1] and/or d[M2]) to C that are
               in S. Remove the modules that have just been added to the
               new cluster C from S.
            2. CloseCluster: Search for additional relations r ∈ D that
               have r[α] ≥ β. Select the relation with the highest r[α]
               value such that one module from r is in C and the other
               module from r is in S. Add the module from r that is in S
               to C. Once added to C, remove this module from S. Repeat
               this process until no new modules can be added to C.
        end
    end
    foreach module s remaining in S do
        Create a new cluster C and add it to P. Add module s to cluster C.
    end
    return (partition P)
When adding modules to clusters, care is taken so that each module in set S is assigned to no more than one cluster; otherwise we would not have a valid partition of the MDG. After the closure of the cluster is calculated, a new cluster is created and the process repeats. Eventually, all modules that appear in relations in D that have an α value that exceeds the user-defined threshold β will be assigned to a cluster. In some cases, however, there may be modules for which no relation in D exists with a large enough α value to be assigned to a cluster. In these instances, for each remaining unassigned module in S, a new cluster is created containing a single module. This condition is not an indication that the module belongs to a singleton cluster, but instead is meant to indicate that the module is somewhat "unstable" in its placement, as it tends not to get assigned to any particular cluster.
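Read as code, Algorithm 1 is straightforward. The sketch below is our own transcription into Java, reusing the Triple type from the co-occurrence sketch above (both assumed to live in the same package); it is illustrative, not CRAFT's actual implementation.

import java.util.*;

public class ConfidenceAnalysisTool {
    // Transcription of Algorithm 1: d is the view D (sorted descending by
    // alpha), s is the module set S, beta is the user-defined threshold β.
    public static List<Set<String>> analyze(List<CoOccurrence.Triple> d,
                                            Set<String> s, int beta) {
        Set<String> unassigned = new HashSet<String>(s);
        List<Set<String>> p = new ArrayList<Set<String>>();  // the partition P
        for (CoOccurrence.Triple t : d) {
            if (t.alpha < beta) break;  // d is sorted, so we can stop here
            if (!unassigned.contains(t.m1) && !unassigned.contains(t.m2)) continue;
            // Step 1: seed a new cluster with the unassigned endpoints of t.
            Set<String> c = new HashSet<String>();
            if (unassigned.remove(t.m1)) c.add(t.m1);
            if (unassigned.remove(t.m2)) c.add(t.m2);
            // Step 2 (CloseCluster): repeatedly pull in the highest-alpha
            // relation bridging the cluster and the unassigned set.
            boolean grew = true;
            while (grew) {
                grew = false;
                for (CoOccurrence.Triple r : d) {
                    if (r.alpha < beta) break;
                    String candidate = null;
                    if (c.contains(r.m1) && unassigned.contains(r.m2)) candidate = r.m2;
                    else if (c.contains(r.m2) && unassigned.contains(r.m1)) candidate = r.m1;
                    if (candidate != null) {
                        unassigned.remove(candidate);
                        c.add(candidate);
                        grew = true;
                        break;  // rescan from the top to keep highest-alpha-first
                    }
                }
            }
            p.add(c);
        }
        // Modules never assigned become singleton clusters ("unstable" modules).
        for (String m : unassigned)
            p.add(new HashSet<String>(Collections.singletonList(m)));
        return p;
    }
}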
The final step performed by the CAT tool is to compare the decomposition that it produced with each of the decompositions that were generated during the clustering process. This is accomplished by measuring the similarity between the CAT result and each individual clustering result.