- GlueTheos: Automating the Retrieval and Analysis of Data from Publicly Available Software Repositories by Robles, Gonzalez-Barahona, Ghosh
For efficient, large-scale data mining of publicly available information about libre (free, open source) software projects, automating the retrieval and analysis processes is a must. A system implementing such automation must take into account the many kinds of repositories with interesting information (each with its own structure and access methods), and the many kinds of analysis that can be applied to the retrieved data. In addition, such a system should be capable of interfacing with and reusing as much existing software for both retrieving and analyzing data as possible.
As a proof of concept of what such a system could look like, we started some time ago to implement the GlueTheos system, featuring a modular, flexible architecture which has already been used in several of our studies of libre software projects. In this paper we show its structure, how it can be used, and how it can be extended.
- Using CVS historical information to understand how students develop software by Liu, Stroulia, Wong, German
Software engineering courses are expected to teach students a wide
range of knowledge, including software development methodologies, tools,
work habits, collaboration skills, and a good sense of scheduling.
In this paper, we present a method to track the progress of the
students in the development of a term project using the historical
information stored in their CVS repository. This information is
analyzed and presented to the instructor in a variety of forms. The
goal of this analysis is, first, to understand how students interact,
and second, to find out if there is any correlation between their
grades and the nature of their collaboration. Understanding these
factors will allow instructors to detect potential problems early in
the course, so they can concentrate their help on the teams that need
it the most.
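As a rough illustration of the kind of analysis this abstract describes, the sketch below relates a simple collaboration measure to team grades. It is not the authors' method: the measure (share of commits made by a team's most active member), the function names `pearson` and `dominant_share`, and all team names, author names, and grades are hypothetical and made up for the example.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def dominant_share(authors):
    """Fraction of a team's commits made by its most active member."""
    counts = Counter(authors)
    return max(counts.values()) / len(authors)


# Hypothetical data: the author of each commit, grouped by team,
# as it might be extracted from a CVS log, plus made-up team grades.
commits_by_team = {
    "team_a": ["ann", "bob", "ann", "bob"],
    "team_b": ["cat", "cat", "cat", "dan"],
    "team_c": ["eve", "eve", "eve", "eve"],
}
grades = {"team_a": 90, "team_b": 80, "team_c": 70}

teams = sorted(commits_by_team)
shares = [dominant_share(commits_by_team[t]) for t in teams]
r = pearson(shares, [grades[t] for t in teams])
print(f"correlation between dominant share and grade: {r:.2f}")
```

With real course data, a strongly negative value would suggest that teams dominated by one committer tend to receive lower grades, although a handful of teams is far too few to draw conclusions.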
- Database Techniques for the Analysis and Exploration of Software Repositories by Alonso, Devanbu, Gertz
In a typical software engineering project there is a large and diverse
body of documents that a development team produces, including
requirement documents, specifications, designs, code, and bug reports.
Documents have different formats and are managed in several repositories. The heterogeneity among document formats and the diversity of repositories often make it infeasible to query and explore the repositories in a transparent fashion during the phases of the software development process.
In this paper, we present a framework for the analysis and exploration of software repositories. Our approach applies database techniques to integrate and manage different documents produced by a team. Tools that exploit the database functionality then allow for the processing of complex queries against a document collection to extract trends and analyze correlations, which provide important insights into the software development process.
We present a prototype implementation using the Apache Web-server project as a case study.
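To make the idea of querying integrated repository data concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the framework from the paper: the two-table schema (`commits`, `bug_reports`), the column names, and the sample rows are assumptions invented for illustration.

```python
import sqlite3

# In-memory database standing in for an integrated document store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE commits (
        id INTEGER PRIMARY KEY, file TEXT, author TEXT, day TEXT
    );
    CREATE TABLE bug_reports (
        id INTEGER PRIMARY KEY, file TEXT, opened TEXT
    );
    """
)

# Hypothetical rows, as if imported from a version control log
# and an issue tracker.
conn.executemany(
    "INSERT INTO commits (file, author, day) VALUES (?, ?, ?)",
    [
        ("server.c", "alice", "2004-01-05"),
        ("server.c", "bob", "2004-01-09"),
        ("cgi.c", "alice", "2004-02-01"),
    ],
)
conn.executemany(
    "INSERT INTO bug_reports (file, opened) VALUES (?, ?)",
    [("server.c", "2004-01-10")],
)

# One query across both sources: commits and bug reports per file.
query = """
    SELECT c.file,
           COUNT(DISTINCT c.id) AS n_commits,
           COUNT(DISTINCT b.id) AS n_bugs
    FROM commits AS c
    LEFT JOIN bug_reports AS b ON b.file = c.file
    GROUP BY c.file
    ORDER BY c.file
"""
rows = conn.execute(query).fetchall()
for file, n_commits, n_bugs in rows:
    print(file, n_commits, n_bugs)
```

Once documents from different repositories share one schema, questions such as "which files change often and also attract bug reports" reduce to ordinary joins and aggregates.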
- Empirical Project Monitor: A Tool for Mining Multiple Project Data by Ohira, Yokomori, Sakai, Matsumoto, Inoue, Torii
Project management for effective software process improvement must be based on quantitative data. However, because data collection for measurement is costly and requires collaboration with developers, it is difficult to collect coherent, quantitative data continuously and to use the data for software process improvement. In this paper, we describe Empirical Project Monitor (EPM), which automatically collects and measures data from three kinds of repositories found in widely used software development support systems: configuration management systems, mailing list managers, and issue tracking systems. By presenting integrated measurement results graphically, EPM helps developers and managers keep projects under control in real time.