Rand index
From Wikipedia, the free encyclopedia
The Rand index or Rand measure is a measure of the similarity between two data clusterings.
Definition
Given a set of n objects S = {O1, ..., On} and two partitions of S that we want to compare, X = {X1, ..., Xr} and Y = {Y1, ..., Ys}, where the subsets within each partition are pairwise disjoint and their union is equal to S, we can compute the following values:
- a is the number of pairs of elements in S that are in the same subset in X and in the same subset in Y,
- b is the number of pairs of elements in S that are in different subsets in X and in different subsets in Y,
- c is the number of pairs of elements in S that are in the same subset in X but in different subsets in Y,
- d is the number of pairs of elements in S that are in different subsets in X but in the same subset in Y.
Intuitively, one can think of a + b as the number of agreements between X and Y and c + d as the number of disagreements between X and Y. Since every pair of elements falls into exactly one of the four cases, a + b + c + d is the total number of pairs, n(n - 1)/2. The Rand index, R, is then the fraction of pairs on which X and Y agree:
$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$
The Rand index has a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of points and 1 indicating that the two clusterings are exactly the same.
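The definition can be illustrated with a minimal Python sketch. It assumes each clustering is given as a list of cluster labels, one per object, in the same object order; the function name rand_index and the sample labelings are hypothetical and chosen only for illustration. It counts pairs by brute force, which is adequate for small n but not an efficient implementation.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Return (a, b, c, d, R) for two clusterings given as label lists.

    labels_x[i] and labels_y[i] are the cluster labels assigned to
    object i by clusterings X and Y (an assumed input format).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same n objects")
    if len(labels_x) < 2:
        raise ValueError("need at least two objects to form a pair")

    a = b = c = d = 0
    # Visit every unordered pair of objects exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # together in X and together in Y
        elif not same_x and not same_y:
            b += 1  # separated in X and separated in Y
        elif same_x:
            c += 1  # together in X, separated in Y
        else:
            d += 1  # separated in X, together in Y

    return a, b, c, d, (a + b) / (a + b + c + d)


# Six objects: X = {0,1,2}, {3,4,5} and Y = {0,1}, {2,3}, {4,5}.
labels_x = [0, 0, 0, 1, 1, 1]
labels_y = [0, 0, 1, 1, 2, 2]
a, b, c, d, r = rand_index(labels_x, labels_y)
print(a, b, c, d)  # 2 8 4 1
print(round(r, 3))  # 0.667, i.e. (2 + 8) / 15
```

Only the grouping matters, not the label values: relabeling the clusters of either partition leaves the result unchanged, so two labelings that describe the same partition give R = 1.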
References
- W. M. Rand, Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, pp. 846–850 (1971).
- K. Y. Yeung, W. L. Ruzzo, Details of the Adjusted Rand index and Clustering algorithms. Bioinformatics.