public class SQLContext extends Object implements Logging, scala.Serializable
As of Spark 2.0, this is replaced by SparkSession
. However, we are keeping the class
here for backward compatibility.
Modifier and Type | Class and Description |
---|---|
class |
SQLContext.implicits$
(Scala-specific) Implicit methods available in Scala for converting
common Scala objects into
DataFrame s. |
Modifier and Type | Method and Description |
---|---|
Dataset<Row> |
baseRelationToDataFrame(BaseRelation baseRelation)
Convert a
BaseRelation created for external data sources into a DataFrame . |
void |
cacheTable(String tableName)
Caches the specified table in-memory.
|
void |
clearCache()
Removes all cached tables from the in-memory cache.
|
Dataset<Row> |
createDataFrame(JavaRDD<?> rdd,
Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
|
Dataset<Row> |
createDataFrame(JavaRDD<Row> rowRDD,
StructType schema)
|
Dataset<Row> |
createDataFrame(java.util.List<?> data,
Class<?> beanClass)
Applies a schema to a List of Java Beans.
|
Dataset<Row> |
createDataFrame(java.util.List<Row> rows,
StructType schema)
:: DeveloperApi ::
Creates a
DataFrame from a java.util.List containing Row s using the given schema. |
Dataset<Row> |
createDataFrame(RDD<?> rdd,
Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
|
<A extends scala.Product> |
createDataFrame(RDD<A> rdd,
scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
Creates a DataFrame from an RDD of Product (e.g.
|
Dataset<Row> |
createDataFrame(RDD<Row> rowRDD,
StructType schema)
|
<A extends scala.Product> |
createDataFrame(scala.collection.Seq<A> data,
scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
Creates a DataFrame from a local Seq of Product.
|
<T> Dataset<T> |
createDataset(java.util.List<T> data,
Encoder<T> evidence$5)
Creates a
Dataset from a java.util.List of a given type. |
<T> Dataset<T> |
createDataset(RDD<T> data,
Encoder<T> evidence$4)
Creates a
Dataset from an RDD of a given type. |
<T> Dataset<T> |
createDataset(scala.collection.Seq<T> data,
Encoder<T> evidence$3)
Creates a
Dataset from a local Seq of data of a given type. |
void |
dropTempTable(String tableName)
Drops the temporary table with the given table name in the catalog.
|
Dataset<Row> |
emptyDataFrame()
Returns a
DataFrame with no rows or columns. |
ExperimentalMethods |
experimental()
:: Experimental ::
A collection of methods that are considered experimental, but can be used to hook into
the query planner for advanced functionality.
|
scala.collection.immutable.Map<String,String> |
getAllConfs()
Return all the configuration properties that have been set (i.e.
|
String |
getConf(String key)
Return the value of Spark SQL configuration property for the given key.
|
String |
getConf(String key,
String defaultValue)
Return the value of Spark SQL configuration property for the given key.
|
SQLContext.implicits$ |
implicits()
Accessor for nested Scala object
|
boolean |
isCached(String tableName)
Returns true if the table is currently cached in-memory.
|
ExecutionListenerManager |
listenerManager()
An interface to register custom
QueryExecutionListener s
that listen for execution metrics. |
SQLContext |
newSession()
Returns a
SQLContext as new session, with separated SQL configurations, temporary
tables, registered functions, but sharing the same SparkContext , cached data and
other things. |
Dataset<Row> |
range(long end)
Creates a
DataFrame with a single LongType column named id , containing elements
in a range from 0 to end (exclusive) with step value 1. |
Dataset<Row> |
range(long start,
long end)
Creates a
DataFrame with a single LongType column named id , containing elements
in a range from start to end (exclusive) with step value 1. |
Dataset<Row> |
range(long start,
long end,
long step)
Creates a
DataFrame with a single LongType column named id , containing elements
in a range from start to end (exclusive) with a step value. |
Dataset<Row> |
range(long start,
long end,
long step,
int numPartitions)
Creates a
DataFrame with a single LongType column named id , containing elements
in an range from start to end (exclusive) with an step value, with partition number
specified. |
DataFrameReader |
read()
Returns a
DataFrameReader that can be used to read non-streaming data in as a
DataFrame . |
DataStreamReader |
readStream()
Returns a
DataStreamReader that can be used to read streaming data in as a DataFrame . |
void |
setConf(java.util.Properties props)
Set Spark SQL configuration properties.
|
void |
setConf(String key,
String value)
Set the given Spark SQL configuration property.
|
SparkContext |
sparkContext() |
SparkSession |
sparkSession() |
Dataset<Row> |
sql(String sqlText)
Executes a SQL query using Spark, returning the result as a
DataFrame . |
StreamingQueryManager |
streams()
Returns a
StreamingQueryManager that allows managing all the
StreamingQueries active on this context. |
Dataset<Row> |
table(String tableName)
Returns the specified table as a
DataFrame . |
String[] |
tableNames()
Returns the names of tables in the current database as an array.
|
String[] |
tableNames(String databaseName)
Returns the names of tables in the given database as an array.
|
Dataset<Row> |
tables()
Returns a
DataFrame containing names of existing tables in the current database. |
Dataset<Row> |
tables(String databaseName)
Returns a
DataFrame containing names of existing tables in the given database. |
UDFRegistration |
udf()
A collection of methods for registering user-defined functions (UDF).
|
void |
uncacheTable(String tableName)
Removes the specified table from the in-memory cache.
|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public SQLContext.implicits$ implicits()
public SparkSession sparkSession()
public SparkContext sparkContext()
public SQLContext newSession()
SQLContext
as new session, with separated SQL configurations, temporary
tables, registered functions, but sharing the same SparkContext
, cached data and
other things.
public ExecutionListenerManager listenerManager()
QueryExecutionListener
s
that listen for execution metrics.public void setConf(java.util.Properties props)
props
- (undocumented)public void setConf(String key, String value)
key
- (undocumented)value
- (undocumented)public String getConf(String key)
key
- (undocumented)public String getConf(String key, String defaultValue)
defaultValue
.
key
- (undocumented)defaultValue
- (undocumented)public scala.collection.immutable.Map<String,String> getAllConfs()
public ExperimentalMethods experimental()
public Dataset<Row> emptyDataFrame()
DataFrame
with no rows or columns.
public UDFRegistration udf()
The following example registers a Scala closure as UDF:
sqlContext.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1)
The following example registers a UDF in Java:
sqlContext.udf().register("myUDF",
(Integer arg1, String arg2) -> arg2 + arg1,
DataTypes.StringType);
public boolean isCached(String tableName)
tableName
- (undocumented)public void cacheTable(String tableName)
tableName
- (undocumented)public void uncacheTable(String tableName)
tableName
- (undocumented)public void clearCache()
public <A extends scala.Product> Dataset<Row> createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
rdd
- (undocumented)evidence$1
- (undocumented)public <A extends scala.Product> Dataset<Row> createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
data
- (undocumented)evidence$2
- (undocumented)public Dataset<Row> baseRelationToDataFrame(BaseRelation baseRelation)
BaseRelation
created for external data sources into a DataFrame
.
baseRelation
- (undocumented)public Dataset<Row> createDataFrame(RDD<Row> rowRDD, StructType schema)
DataFrame
from an RDD
containing Row
s using the given schema.
It is important to make sure that the structure of every Row
of the provided RDD matches
the provided schema. Otherwise, there will be runtime exception.
Example:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val schema =
StructType(
StructField("name", StringType, false) ::
StructField("age", IntegerType, true) :: Nil)
val people =
sc.textFile("examples/src/main/resources/people.txt").map(
_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
val dataFrame = sqlContext.createDataFrame(people, schema)
dataFrame.printSchema
// root
// |-- name: string (nullable = false)
// |-- age: integer (nullable = true)
dataFrame.createOrReplaceTempView("people")
sqlContext.sql("select name from people").collect.foreach(println)
rowRDD
- (undocumented)schema
- (undocumented)public <T> Dataset<T> createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3)
Dataset
from a local Seq of data of a given type. This method requires an
encoder (to convert a JVM object of type T
to and from the internal Spark SQL representation)
that is generally created automatically through implicits from a SparkSession
, or can be
created explicitly by calling static methods on Encoders
.
== Example ==
import spark.implicits._
case class Person(name: String, age: Long)
val data = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19))
val ds = spark.createDataset(data)
ds.show()
// +-------+---+
// | name|age|
// +-------+---+
// |Michael| 29|
// | Andy| 30|
// | Justin| 19|
// +-------+---+
data
- (undocumented)evidence$3
- (undocumented)public <T> Dataset<T> createDataset(RDD<T> data, Encoder<T> evidence$4)
Dataset
from an RDD of a given type. This method requires an
encoder (to convert a JVM object of type T
to and from the internal Spark SQL representation)
that is generally created automatically through implicits from a SparkSession
, or can be
created explicitly by calling static methods on Encoders
.
data
- (undocumented)evidence$4
- (undocumented)public <T> Dataset<T> createDataset(java.util.List<T> data, Encoder<T> evidence$5)
Dataset
from a java.util.List
of a given type. This method requires an
encoder (to convert a JVM object of type T
to and from the internal Spark SQL representation)
that is generally created automatically through implicits from a SparkSession
, or can be
created explicitly by calling static methods on Encoders
.
== Java Example ==
List<String> data = Arrays.asList("hello", "world");
Dataset<String> ds = spark.createDataset(data, Encoders.STRING());
data
- (undocumented)evidence$5
- (undocumented)public Dataset<Row> createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
DataFrame
from a JavaRDD
containing Row
s using the given schema.
It is important to make sure that the structure of every Row
of the provided RDD matches
the provided schema. Otherwise, there will be runtime exception.
rowRDD
- (undocumented)schema
- (undocumented)public Dataset<Row> createDataFrame(java.util.List<Row> rows, StructType schema)
DataFrame
from a java.util.List
containing Row
s using the given schema.
It is important to make sure that the structure of every Row
of the provided List matches
the provided schema. Otherwise, there will be runtime exception.
rows
- (undocumented)schema
- (undocumented)public Dataset<Row> createDataFrame(RDD<?> rdd, Class<?> beanClass)
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
rdd
- (undocumented)beanClass
- (undocumented)public Dataset<Row> createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
rdd
- (undocumented)beanClass
- (undocumented)public Dataset<Row> createDataFrame(java.util.List<?> data, Class<?> beanClass)
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
data
- (undocumented)beanClass
- (undocumented)public DataFrameReader read()
DataFrameReader
that can be used to read non-streaming data in as a
DataFrame
.
sqlContext.read.parquet("/path/to/file.parquet")
sqlContext.read.schema(schema).json("/path/to/file.json")
public DataStreamReader readStream()
DataStreamReader
that can be used to read streaming data in as a DataFrame
.
sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
sparkSession.readStream.schema(schema).json("/path/to/directory/of/json/files")
public void dropTempTable(String tableName)
tableName
- the name of the table to be unregistered.public Dataset<Row> range(long end)
DataFrame
with a single LongType
column named id
, containing elements
in a range from 0 to end
(exclusive) with step value 1.
end
- (undocumented)public Dataset<Row> range(long start, long end)
DataFrame
with a single LongType
column named id
, containing elements
in a range from start
to end
(exclusive) with step value 1.
start
- (undocumented)end
- (undocumented)public Dataset<Row> range(long start, long end, long step)
DataFrame
with a single LongType
column named id
, containing elements
in a range from start
to end
(exclusive) with a step value.
start
- (undocumented)end
- (undocumented)step
- (undocumented)public Dataset<Row> range(long start, long end, long step, int numPartitions)
DataFrame
with a single LongType
column named id
, containing elements
in an range from start
to end
(exclusive) with an step value, with partition number
specified.
start
- (undocumented)end
- (undocumented)step
- (undocumented)numPartitions
- (undocumented)public Dataset<Row> sql(String sqlText)
DataFrame
. The dialect that is
used for SQL parsing can be configured with 'spark.sql.dialect'.
sqlText
- (undocumented)public Dataset<Row> table(String tableName)
DataFrame
.
tableName
- (undocumented)public Dataset<Row> tables()
DataFrame
containing names of existing tables in the current database.
The returned DataFrame has two columns, tableName and isTemporary (a Boolean
indicating if a table is a temporary one or not).
public Dataset<Row> tables(String databaseName)
DataFrame
containing names of existing tables in the given database.
The returned DataFrame has two columns, tableName and isTemporary (a Boolean
indicating if a table is a temporary one or not).
databaseName
- (undocumented)public StreamingQueryManager streams()
StreamingQueryManager
that allows managing all the
StreamingQueries
active on this
context.
public String[] tableNames()
public String[] tableNames(String databaseName)
databaseName
- (undocumented)