object ChiSquareTest
Chi-square hypothesis testing for categorical data.
See Wikipedia for more information on the Chi-squared test.
- Annotations
- @Since("2.2.0")
- Source
- ChiSquareTest.scala
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def test(dataset: DataFrame, featuresCol: String, labelCol: String, flatten: Boolean): DataFrame
- dataset
DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol
Name of features column in dataset, of type Vector (VectorUDT)
- labelCol
Name of label column in dataset, of any numerical type
- flatten
If false, the returned DataFrame contains a single Row; otherwise, it contains one row per feature.
- Annotations
- @Since("3.1.0")
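As a rough illustration of the flatten flag, the sketch below runs the test with flatten = true on a small hand-made dataset. It assumes Spark MLlib 3.1.0 or later is on the classpath and that the snippet runs as a script or in a REPL. The local SparkSession, the sample values, and the column names "label" and "features" are illustrative choices, not part of this API.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ChiSquareFlattenSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: a numeric label and a two-element feature vector.
val df = Seq(
  (0.0, Vectors.dense(0.5, 10.0)),
  (0.0, Vectors.dense(1.5, 20.0)),
  (1.0, Vectors.dense(1.5, 30.0)),
  (0.0, Vectors.dense(3.5, 30.0)),
  (0.0, Vectors.dense(3.5, 40.0)),
  (1.0, Vectors.dense(3.5, 40.0))
).toDF("label", "features")

// flatten = true: one result row per feature instead of a single Row.
val perFeature = ChiSquareTest.test(df, "features", "label", flatten = true)
perFeature.show()
```

With two features in the vector, the flattened result should contain two rows, which can be easier to filter or join than the single-row form.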
- def test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame
Conduct Pearson's independence test for every feature against the label. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared statistic is computed. All label and feature values must be categorical.
The null hypothesis is that the occurrence of the outcomes is statistically independent.
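To make the contingency-matrix step concrete, here is a minimal plain-Scala sketch of Pearson's statistic for a single feature. It is not Spark's implementation; the object name ChiSquareSketch and its two helper methods are hypothetical and exist only for illustration.

```scala
object ChiSquareSketch {
  /** Count (feature, label) pairs into a contingency table:
    * one row per distinct feature value, one column per distinct label value. */
  def contingency(pairs: Seq[(Double, Double)]): Array[Array[Double]] = {
    val featureValues = pairs.map(_._1).distinct.sorted
    val labelValues = pairs.map(_._2).distinct.sorted
    val counts = pairs.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
    featureValues.map { f =>
      labelValues.map(l => counts.getOrElse((f, l), 0.0)).toArray
    }.toArray
  }

  /** Pearson's statistic: the sum over cells of (observed - expected)^2 / expected,
    * where expected = rowTotal * colTotal / grandTotal.
    * Returns (statistic, degreesOfFreedom). */
  def pearson(table: Array[Array[Double]]): (Double, Int) = {
    val rowTotals = table.map(_.sum)
    val colTotals = table.transpose.map(_.sum)
    val total = rowTotals.sum
    val statistic = (for {
      i <- table.indices
      j <- table(i).indices
    } yield {
      val expected = rowTotals(i) * colTotals(j) / total
      val diff = table(i)(j) - expected
      diff * diff / expected
    }).sum
    (statistic, (rowTotals.length - 1) * (colTotals.length - 1))
  }
}
```

For example, calling ChiSquareSketch.pearson(ChiSquareSketch.contingency(pairs)) on the six pairs (0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (3.5, 0.0), (3.5, 0.0), (3.5, 1.0) builds a 3 x 2 table and gives a statistic of 0.75 with 2 degrees of freedom.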
- dataset
DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
- featuresCol
Name of features column in dataset, of type Vector (VectorUDT)
- labelCol
Name of label column in dataset, of any numerical type
- returns
DataFrame containing the test result for every feature against the label. This DataFrame will contain a single Row with the following fields:
pValues: Vector
degreesOfFreedom: Array[Int]
statistics: Vector
Each of these fields has one value per feature.
- Annotations
- @Since("2.2.0")
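A minimal usage sketch for the three-argument overload follows, reading the fields of the single result Row by the names listed above. It assumes Spark MLlib is on the classpath and that the snippet runs as a script or in a REPL. The local SparkSession, the sample values, and the column names "label" and "features" are illustrative assumptions.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.SparkSession

// Local session for illustration only; spark-shell already provides one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ChiSquareTestSketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: a numeric label and a two-element feature vector.
val df = Seq(
  (0.0, Vectors.dense(0.5, 10.0)),
  (0.0, Vectors.dense(1.5, 20.0)),
  (1.0, Vectors.dense(1.5, 30.0)),
  (0.0, Vectors.dense(3.5, 30.0)),
  (0.0, Vectors.dense(3.5, 40.0)),
  (1.0, Vectors.dense(3.5, 40.0))
).toDF("label", "features")

// The result is a single Row; each field holds one value per feature.
val result = ChiSquareTest.test(df, "features", "label").head()
val pValues = result.getAs[Vector]("pValues")
val degreesOfFreedom = result.getAs[Seq[Int]]("degreesOfFreedom")
val statistics = result.getAs[Vector]("statistics")

println(s"pValues = $pValues")
println(s"degreesOfFreedom = ${degreesOfFreedom.mkString("[", ", ", "]")}")
println(s"statistics = $statistics")
```

Reading fields by name rather than by position keeps the snippet independent of the column order in the result.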
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)