pyspark.sql.functions.tuple_sketch_estimate_double
- pyspark.sql.functions.tuple_sketch_estimate_double(col)
Returns the estimated number of distinct keys in a DataSketches TupleSketch with double summaries.
New in version 4.2.0.
- Parameters
- col
Column or column name
The column containing a binary TupleSketch representation.
- Returns
Column
The estimated cardinality.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10.0), (2, 20.0), (2, 30.0)], ["key", "value"])
>>> df.agg(sf.tuple_sketch_estimate_double(
...     sf.tuple_sketch_agg_double("key", "value"))).show()
+--------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_sketch_agg_double(key, value, 12, sum))|
+--------------------------------------------------------------------------+
|                                                                       2.0|
+--------------------------------------------------------------------------+
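To illustrate what "estimated number of distinct keys" means, the following pure-Python sketch mimics the general idea behind a theta-style sketch with double summaries. It is a toy model, not the DataSketches or Spark implementation. The helper names (`hash_to_unit`, `build_toy_tuple_sketch`, `estimate_distinct`), the SHA-256 hash, and the capacity `k=4096` are all assumptions made for illustration. The capacity was chosen on the further assumption that the `12` in the column header above is a base-2 logarithm of the nominal entry count.

```python
import hashlib


def hash_to_unit(key):
    """Map a key to a pseudo-uniform float in [0, 1).

    Illustrative hash only; not the hash function DataSketches uses.
    """
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_toy_tuple_sketch(rows, k=4096):
    """Retain at most k smallest key hashes, summing the double value per key.

    Returns (entries, theta): entries maps hash -> summed value, and theta is
    the sampling threshold (1.0 while the sketch still holds every key).
    """
    entries = {}
    theta = 1.0
    for key, value in rows:
        h = hash_to_unit(key)
        if h >= theta:
            continue  # key falls outside the sampled region
        entries[h] = entries.get(h, 0.0) + value
        if len(entries) > k:
            # Over capacity: evict the largest hash and lower theta to it.
            theta = max(entries)
            del entries[theta]
    return entries, theta


def estimate_distinct(entries, theta):
    """Estimate distinct keys as retained entries divided by theta."""
    return len(entries) / theta


rows = [(1, 10.0), (2, 20.0), (2, 30.0)]
entries, theta = build_toy_tuple_sketch(rows)
print(estimate_distinct(entries, theta))  # 2.0
```

With only two distinct keys the toy sketch never reaches capacity, so `theta` stays at 1.0 and the estimate equals the exact count, matching the `2.0` in the example above. Once the number of distinct keys exceeds `k`, the result becomes an approximation.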