The bug prediction dataset provides (or lets you compute) a number of metrics that can be used to build generalized linear regression models predicting, at the class level, the number of post-release defects. The performance of these models can then be evaluated by comparing their predictions against the actual post-release defects provided as part of the dataset.
In particular, the following metrics can be used (or computed) as predictors, and novel ones can be designed and computed as well:
- Change metrics (from CVS change logs), as proposed by Moser et al.
- CK metrics, as proposed by Basili et al.
- Object oriented metrics (e.g. number of methods, number of attributes)
- Number of previous defects, as proposed by Kim et al.
- Complexity of code change, as proposed by Hassan
- Churn of CK and object oriented metrics, as proposed by D'Ambros et al.
- Entropy of CK and object oriented metrics, as proposed by D'Ambros et al.
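As a rough illustration of the workflow described above (fit a regression model on one metric, predict post-release defects per class, then compare predictions against the actual values), the following is a minimal sketch and not the procedure used by the dataset authors. It makes several assumptions: the numbers are invented for illustration and are not taken from the dataset; ordinary least squares with a single predictor stands in for a full generalized linear model; and Spearman rank correlation is one possible way to compare predicted and actual defects, chosen here for the example.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical per-class data (NOT from the dataset): one predictor
# (number of previous defects) and the post-release defect count.
train_previous_defects = [0, 1, 2, 3, 4]
train_post_release = [0, 1, 1, 3, 3]

intercept, slope = fit_simple_ols(train_previous_defects, train_post_release)

# Hypothetical held-out classes used to evaluate the fitted model.
test_previous_defects = [5, 0, 2, 1]
test_post_release = [4, 0, 2, 0]

predicted = [intercept + slope * x for x in test_previous_defects]
rho = spearman(predicted, test_post_release)
print(f"slope={slope:.3f} intercept={intercept:.3f} spearman={rho:.3f}")
```

In practice each class would contribute a whole vector of the metrics listed above rather than a single number, and a statistics package would normally replace the hand-written fitting code.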
All the listed defect prediction techniques, and their application to the bug prediction dataset, are described in detail in the paper:
An Extensive Comparison of Bug Prediction Approaches
Marco D'Ambros, Michele Lanza, Romain Robbes
In Proceedings of MSR 2010 (7th IEEE Working Conference on Mining Software Repositories), to be published. IEEE CS Press, 2010.