public class MultivariateOnlineSummarizer extends java.lang.Object implements MultivariateStatisticalSummary, scala.Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector
format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged to produce a statistical summary of the corresponding joint dataset.
A numerically stable algorithm is implemented to compute the mean and variance of instances:
Reference: variance-wiki
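As a rough illustration of what a numerically stable online update and a merge of two summaries can look like, the following is a minimal single-column sketch using a Welford-style running update and a pairwise combination of moments. The class OnlineMoments and its members are hypothetical names chosen for this example; this is not the Spark implementation, which operates on vectors and also supports weights.

```java
// Illustrative sketch only; OnlineMoments is a hypothetical class, not part of Spark.
final class OnlineMoments {
    private long n;       // number of samples seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Welford-style update for one new value; returns this for chaining. */
    OnlineMoments add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return this;
    }

    /** In-place merge of another accumulator; this object is modified and returned. */
    OnlineMoments merge(OnlineMoments other) {
        if (other.n == 0) {
            return this;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Combine the squared deviations before the mean and count are updated.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
        return this;
    }

    long count() { return n; }

    double mean() { return mean; }

    /** Unbiased sample variance; this sketch returns 0.0 for fewer than two samples. */
    double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }

    public static void main(String[] args) {
        OnlineMoments left = new OnlineMoments().add(1.0).add(2.0);
        OnlineMoments right = new OnlineMoments().add(3.0).add(4.0);
        left.merge(right); // left now summarizes all four values
        System.out.println(left.count() + " " + left.mean() + " " + left.variance());
    }
}
```

Merging this way gives the same count, mean, and variance as adding all values to a single accumulator, up to floating-point rounding, which is what makes per-partition summaries combinable.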
Zero elements (including explicit zero values) are skipped when calling add(),
giving a time complexity of O(nnz) instead of O(n) for each column.
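One way to picture the zero-skipping idea: accumulate statistics over nonzero entries only, keep a per-column nonzero count next to the total sample count, and account for the skipped zeros when a statistic is read. The sketch below assumes a sample arrives as parallel index and value arrays; SparseColumnStats and its methods are hypothetical names for illustration and do not mirror Spark's internals.

```java
// Illustrative sketch only; SparseColumnStats is a hypothetical class, not part of Spark.
final class SparseColumnStats {
    private final double[] sum;    // per-column sum over nonzero entries
    private final double[] minNz;  // per-column minimum over nonzero entries
    private final double[] maxNz;  // per-column maximum over nonzero entries
    private final long[] nnz;      // per-column count of nonzero entries
    private long n;                // total number of samples added

    SparseColumnStats(int dim) {
        sum = new double[dim];
        minNz = new double[dim];
        maxNz = new double[dim];
        nnz = new long[dim];
        for (int j = 0; j < dim; j++) {
            minNz[j] = Double.POSITIVE_INFINITY;
            maxNz[j] = Double.NEGATIVE_INFINITY;
        }
    }

    /** Adds one sample given as parallel index/value arrays; the loop touches stored entries only. */
    SparseColumnStats add(int[] indices, double[] values) {
        for (int k = 0; k < indices.length; k++) {
            double v = values[k];
            if (v == 0.0) {
                continue; // explicitly stored zeros are skipped as well
            }
            int j = indices[k];
            sum[j] += v;
            nnz[j]++;
            if (v < minNz[j]) minNz[j] = v;
            if (v > maxNz[j]) maxNz[j] = v;
        }
        n++;
        return this;
    }

    /** Zeros contribute nothing to the sum, so dividing by the total count is enough. */
    double mean(int j) { return n == 0 ? 0.0 : sum[j] / n; }

    /** If some sample had a zero in column j, zero competes with the nonzero minimum. */
    double min(int j) { return nnz[j] < n ? Math.min(minNz[j], 0.0) : minNz[j]; }

    /** If some sample had a zero in column j, zero competes with the nonzero maximum. */
    double max(int j) { return nnz[j] < n ? Math.max(maxNz[j], 0.0) : maxNz[j]; }

    long numNonzeros(int j) { return nnz[j]; }

    public static void main(String[] args) {
        SparseColumnStats stats = new SparseColumnStats(3)
            .add(new int[] {0, 2}, new double[] {2.0, -1.0})
            .add(new int[] {0, 1}, new double[] {4.0, 0.0});
        System.out.println(stats.mean(0) + " " + stats.min(2) + " " + stats.max(2));
    }
}
```

The design choice here is to defer the correction for zeros to read time, so the cost of each add() depends only on the number of stored entries.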
For weighted instances, the unbiased estimate of variance is defined using reliability
weights (see https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights).
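For reference, the reliability-weights estimator described on the linked page divides the weighted sum of squared deviations by V1 - V2 / V1, where V1 is the sum of the weights and V2 is the sum of the squared weights. The sketch below computes it in two passes over plain arrays for clarity; ReliabilityWeightedVariance is a hypothetical helper, and the summarizer itself works online rather than in two passes.

```java
// Illustrative sketch only; ReliabilityWeightedVariance is a hypothetical helper, not part of Spark.
final class ReliabilityWeightedVariance {
    /**
     * Returns sum_i w_i * (x_i - mu)^2 / (V1 - V2 / V1),
     * where mu is the weighted mean, V1 = sum_i w_i and V2 = sum_i w_i^2.
     * Returns 0.0 when the denominator is not positive (for example, a single sample).
     */
    static double variance(double[] x, double[] w) {
        double v1 = 0.0;
        double v2 = 0.0;
        double weightedSum = 0.0;
        for (int i = 0; i < x.length; i++) {
            v1 += w[i];
            v2 += w[i] * w[i];
            weightedSum += w[i] * x[i];
        }
        double denom = v1 - v2 / v1;
        if (!(denom > 0.0)) {
            return 0.0; // also covers NaN when all weights are zero
        }
        double mu = weightedSum / v1;
        double ss = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            ss += w[i] * d * d;
        }
        return ss / denom;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] unit = {1.0, 1.0, 1.0, 1.0};
        // With unit weights the denominator is n - 1, the usual unbiased sample variance.
        System.out.println(variance(x, unit));
    }
}
```

With all weights equal to 1, V1 - V2 / V1 reduces to n - 1, so the estimator agrees with the ordinary unbiased sample variance.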
Constructor and Description |
---|
MultivariateOnlineSummarizer() |
Modifier and Type | Method and Description |
---|---|
MultivariateOnlineSummarizer | add(Vector sample): Add a new sample to this summarizer, and update the statistical summary. |
long | count(): Sample size. |
Vector | max(): Maximum value of each dimension. |
Vector | mean(): Sample mean of each dimension. |
MultivariateOnlineSummarizer | merge(MultivariateOnlineSummarizer other): Merge another MultivariateOnlineSummarizer, and update the statistical summary. |
Vector | min(): Minimum value of each dimension. |
Vector | normL1(): L1 norm of each dimension. |
Vector | normL2(): L2 (Euclidean) norm of each dimension. |
Vector | numNonzeros(): Number of nonzero elements in each dimension. |
Vector | variance(): Unbiased estimate of sample variance of each dimension. |
public MultivariateOnlineSummarizer add(Vector sample)
Parameters: sample - The sample in dense/sparse vector format to be added into this summarizer.
public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
(The merge is performed in place; as a result, this object will be modified.)
Parameters: other - The other MultivariateOnlineSummarizer to be merged.
public Vector mean()
Specified by: mean in interface MultivariateStatisticalSummary
public Vector variance()
Specified by: variance in interface MultivariateStatisticalSummary
public long count()
Specified by: count in interface MultivariateStatisticalSummary
public Vector numNonzeros()
Specified by: numNonzeros in interface MultivariateStatisticalSummary
public Vector max()
Specified by: max in interface MultivariateStatisticalSummary
public Vector min()
Specified by: min in interface MultivariateStatisticalSummary
public Vector normL2()
Specified by: normL2 in interface MultivariateStatisticalSummary
public Vector normL1()
Specified by: normL1 in interface MultivariateStatisticalSummary