public class BisectingKMeans
extends Object
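A bisecting k-means algorithm. It starts from a single cluster that contains all points, then iteratively finds divisible clusters on the bottom level and bisects each of them using k-means, until there are k leaf clusters in total or no leaf clusters are divisible.
The bisecting steps of clusters on the same level are grouped together to increase parallelism.
If bisecting all divisible clusters on the bottom level would result in more than k leaf clusters, larger clusters get higher priority.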
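The priority rule above can be illustrated with a small, Spark-independent sketch. It is not the library's implementation: the class `BisectPrioritySketch`, the method `chooseClustersToBisect`, and the representation of clusters by their point counts are all hypothetical names chosen for illustration. It assumes each bisection adds exactly one leaf, so at most `k - currentLeaves` clusters can be split on a level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: not part of the Spark API.
public class BisectPrioritySketch {

    /**
     * Picks which divisible leaf clusters to bisect on the current level.
     * Clusters are represented only by their sizes (number of points).
     * Each bisection replaces one leaf with two, i.e. adds one leaf,
     * so at most (k - currentLeaves) bisections fit within the budget.
     */
    public static List<Long> chooseClustersToBisect(
            List<Long> divisibleSizes, int currentLeaves, int k) {
        int budget = Math.max(0, k - currentLeaves);
        List<Long> sorted = new ArrayList<>(divisibleSizes);
        // Larger clusters get higher priority.
        sorted.sort(Comparator.reverseOrder());
        int count = Math.min(budget, sorted.size());
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        // Three divisible leaves, but k = 4 leaves room for only one more split.
        List<Long> chosen = chooseClustersToBisect(Arrays.asList(10L, 50L, 30L), 3, 4);
        System.out.println(chosen); // prints [50]
    }
}
```

With three divisible leaves and k = 4, only the largest cluster is split; with k = 6 or more, all three would be.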
param: k the desired number of leaf clusters (default: 4). The actual number could be smaller if there are no divisible leaf clusters.
param: maxIterations the max number of k-means iterations to split clusters (default: 20)
param: minDivisibleClusterSize the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster (default: 1)
param: seed a random seed (default: hash value of the class name)
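The two readings of minDivisibleClusterSize can be sketched as follows. This is an illustration of the documented rule only, not the library's code: the class `MinDivisibleSizeSketch` and its methods are hypothetical, the proportion is assumed to be taken relative to the total number of input points, and rounding up to a whole number of points is also an assumption.

```java
// Hypothetical sketch: not part of the Spark API.
public class MinDivisibleSizeSketch {

    /**
     * Converts minDivisibleClusterSize into a minimum point count.
     * Values >= 1.0 are read as an absolute number of points;
     * values < 1.0 are read as a proportion of totalPoints (assumption).
     * Rounding up with Math.ceil is an assumption of this sketch.
     */
    public static long minPoints(double minDivisibleClusterSize, long totalPoints) {
        if (minDivisibleClusterSize >= 1.0) {
            return (long) Math.ceil(minDivisibleClusterSize);
        }
        return (long) Math.ceil(minDivisibleClusterSize * totalPoints);
    }

    /** A cluster meets the size requirement if it has at least minPoints points. */
    public static boolean isDivisible(
            long clusterSize, double minDivisibleClusterSize, long totalPoints) {
        return clusterSize >= minPoints(minDivisibleClusterSize, totalPoints);
    }

    public static void main(String[] args) {
        System.out.println(minPoints(1.0, 1000L));         // absolute count
        System.out.println(minPoints(0.25, 1000L));        // proportion of 1000 points
        System.out.println(isDivisible(249L, 0.25, 1000L)); // just under the threshold
    }
}
```

Under these assumptions, the default value of 1 places no effective size restriction on a non-empty cluster, while a value such as 0.25 stops splitting once a cluster holds less than a quarter of the data.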
See also: Steinbach, Karypis, and Kumar, A comparison of document clustering techniques, KDD Workshop on Text Mining, 2000.
http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf
Constructor and Description |
---|
BisectingKMeans() Constructs with the default configuration. |
Modifier and Type | Method and Description |
---|---|
int | getK() Gets the desired number of leaf clusters. |
int | getMaxIterations() Gets the max number of k-means iterations to split clusters. |
double | getMinDivisibleClusterSize() Gets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster. |
long | getSeed() Gets the random seed. |
BisectingKMeansModel | run(JavaRDD<Vector> data) Java-friendly version of run(). |
BisectingKMeansModel | run(RDD<Vector> input) Runs the bisecting k-means algorithm. |
BisectingKMeans | setK(int k) Sets the desired number of leaf clusters (default: 4). |
BisectingKMeans | setMaxIterations(int maxIterations) Sets the max number of k-means iterations to split clusters (default: 20). |
BisectingKMeans | setMinDivisibleClusterSize(double minDivisibleClusterSize) Sets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster (default: 1). |
BisectingKMeans | setSeed(long seed) Sets the random seed (default: hash value of the class name). |
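public BisectingKMeans()
Constructs with the default configuration.

public BisectingKMeans setK(int k)
Sets the desired number of leaf clusters (default: 4).
k - (undocumented)

public int getK()
Gets the desired number of leaf clusters.

public BisectingKMeans setMaxIterations(int maxIterations)
Sets the max number of k-means iterations to split clusters (default: 20).
maxIterations - (undocumented)

public int getMaxIterations()
Gets the max number of k-means iterations to split clusters.

public BisectingKMeans setMinDivisibleClusterSize(double minDivisibleClusterSize)
Sets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster (default: 1).
minDivisibleClusterSize - (undocumented)

public double getMinDivisibleClusterSize()
Gets the minimum number of points (if >= 1.0) or the minimum proportion of points (if < 1.0) of a divisible cluster.

public BisectingKMeans setSeed(long seed)
Sets the random seed (default: hash value of the class name).
seed - (undocumented)

public long getSeed()
Gets the random seed.

public BisectingKMeansModel run(RDD<Vector> input)
Runs the bisecting k-means algorithm.
input - RDD of vectors

public BisectingKMeansModel run(JavaRDD<Vector> data)
Java-friendly version of run().
data - (undocumented)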