pyspark.pandas.Index.spark.transform
- spark.transform(func)
- Applies a function that takes and returns a Spark column. This lets you apply Spark functions and column APIs natively to the Spark column used internally by a Series or Index. The output Spark column must have the same length as the input.
- Note
- The input and output must have the same length, so aggregate Spark functions such as count do not work.
- Parameters
- func : function
- Function to use for transforming the data by using Spark columns. 
 
- Returns
- Series or Index
 
- Raises
- ValueError
- If the output from the function is not a Spark column.
 
- Examples
>>> import pyspark.pandas as ps
>>> from pyspark.sql.functions import log
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6

>>> df.a.spark.transform(lambda c: log(c))
0    0.000000
1    0.693147
2    1.098612
Name: a, dtype: float64

>>> df.index.spark.transform(lambda c: c + 10)
Index([10, 11, 12], dtype='int64')

>>> df.a.spark.transform(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64