Version: v2

Scala Spark Client

Supported Environments

Currently, Spark integration is supported only in On-Prem Cluster Mode and AWS environments.


You can download the JAR from the link below.

Quick Start

Setup Project

```
canner-client-test
├── lib
│   └── canner-spark-client-assembly_<version>.jar
├── src/main/scala/test
│   └── QuickStart.scala
├── project
│   └──
└── build.sbt
```

Please make sure the downloaded JAR is placed under the `lib/` directory.

```scala
name := "canner-client-test"

version := "0.1"

scalaVersion := "2.12.12"

mainClass in (Compile, run) := Some("test.QuickStart")

libraryDependencies += "com.canner" % "canner-spark-client" % "1.0.0" from "file:///<path-to-client-test>/canner-client-test/lib/canner-spark-client-assembly_<version>.jar"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
```

Count Sample Program

If you use Canner Enterprise distributed mode on AWS, you need to create an AWS user with permission to access Canner Enterprise's S3 bucket, and then put that user's <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> into the code below.

If you use standalone mode, please ask your administrator for the <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY>.
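Rather than hardcoding the credentials in source, you may prefer to read them from environment variables. A minimal sketch (the `Credentials` helper below is illustrative, not part of the Canner client API):

```scala
// Minimal sketch: read AWS credentials from environment variables
// instead of hardcoding them in source. `Credentials` is an
// illustrative helper, not part of the Canner client.
object Credentials {
  // Fall back to a visible placeholder so a missing variable is easy to spot.
  def fromEnv(name: String): String =
    sys.env.getOrElse(name, s"<$name not set>")

  def accessKey: String = fromEnv("AWS_ACCESS_KEY_ID")
  def secretKey: String = fromEnv("AWS_SECRET_ACCESS_KEY")
}
```

You can then pass `Credentials.accessKey` and `Credentials.secretKey` to the `spark.hadoop.fs.s3a.access.key` and `spark.hadoop.fs.s3a.secret.key` settings instead of literal strings.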

```scala
package test

import org.apache.spark.sql.SparkSession
import com.canner.Canner

object QuickStart {

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder()
      // If you're using standalone mode, please set the endpoint to the object storage URL.
      // If you're using SSL and encounter a certificate issue, add
      // `-Dcom.amazonaws.sdk.disableCertChecking=true` to work around it.
      //.config("spark.hadoop.fs.s3a.endpoint", "https://<my-endpoint-name>:9000")
      .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
      .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .master("local[*]")
      .getOrCreate()

    // entrypoint is the hostname of Canner
    val entrypoint = ""
    val workspace = "<workspace ID>"
    // TOKEN is your Canner personal access token
    val TOKEN = "<personal access token>"

    val canner = Canner(entrypoint, TOKEN, workspace, spark)

    // Execute a SQL query on Canner and get a `DataFrame` back
    val df = canner.genQuery("select * from lineitem")

    print("df count: " + df.count())
  }
}
```