
Scala Spark Client

Supported environments

Currently, Spark integration is supported only in On-Prem Cluster Mode and in AWS environments.

Installation

You can download the JAR from the link below.

Quick Start

Setup Project

Folders

/canner-client-test
├── lib
│   └── canner-spark-client-assembly_<version>.jar
├── src/main/scala/test
│   └── QuickStart.scala
├── project
│   └── build.properties
└── build.sbt

Please make sure the downloaded JAR is placed under lib/.
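
The tree above also lists project/build.properties, which pins the sbt launcher version, but its contents are not shown here. A minimal example follows; the exact version is an assumption, so use any 1.x sbt release compatible with Scala 2.12:

/project/build.properties
sbt.version=1.3.13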

/build.sbt
name := "canner-client-test"

version := "0.1"

scalaVersion := "2.12.12"

mainClass in (Compile, run) := Some("test.QuickStart")

libraryDependencies += "com.canner" % "canner-spark-client" % "1.0.0" from "file:///<path-to-client-test>/canner-client-test/lib"

// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
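
With the JAR under lib/ and build.sbt in place, you can build and launch the sample from the project root with the standard sbt commands:

sbt compile   # resolves dependencies and compiles the sources
sbt run       # runs test.QuickStart, as configured by mainClass above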

Count Sample Program

If you use Canner Enterprise distributed mode on AWS, you need to create an AWS user with permission to access Canner Enterprise's S3 bucket, and then fill that user's <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY> into the code below.

If you use Standalone mode, please ask the administrator for the <AWS_ACCESS_KEY_ID> and <AWS_SECRET_ACCESS_KEY>.
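
Hardcoding credentials in source files makes them easy to leak. A minimal sketch of reading them from environment variables instead, assuming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported in the shell that launches the job:

// Read the AWS credentials from the environment instead of hardcoding them.
// Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported beforehand.
val accessKey = sys.env.getOrElse("AWS_ACCESS_KEY_ID",
  sys.error("AWS_ACCESS_KEY_ID is not set"))
val secretKey = sys.env.getOrElse("AWS_SECRET_ACCESS_KEY",
  sys.error("AWS_SECRET_ACCESS_KEY is not set"))

// Then pass them to the SparkSession builder in place of the literal placeholders:
//   .config("spark.hadoop.fs.s3a.access.key", accessKey)
//   .config("spark.hadoop.fs.s3a.secret.key", secretKey)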

/src/main/scala/test/QuickStart.scala
package test

import org.apache.spark.sql.SparkSession
import com.canner.Canner

object QuickStart {
  val TOKEN = "<CANNER_PERSONAL_ACCESS_TOKEN>"

  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder()
      // If you're using standalone mode, please specify the endpoint of your object storage URL.
      // If you're using SSL and encounter a certificate issue, add
      // `-Dcom.amazonaws.sdk.disableCertChecking=true` to work around it.
      //.config("spark.hadoop.fs.s3a.endpoint", "https://<my-endpoint-name>:9000")
      .config("spark.hadoop.fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
      .config("spark.hadoop.fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .master("local[*]")
      .getOrCreate()

    // entrypoint is the hostname of your Canner deployment
    val entrypoint = "https://test.canner.your.host.com"
    val workspace = "<workspace ID>"

    val canner = Canner(entrypoint, TOKEN, workspace, spark)

    // Execute a SQL query on Canner and get a `DataFrame` back
    val df = canner.genQuery("select * from lineitem")

    println("--------------------")
    println("df count: " + df.count())
    println("--------------------")
  }
}
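
Since genQuery returns an ordinary Spark DataFrame, the full DataFrame API applies to the result. A short sketch of further processing; the column name below is an assumption based on the TPC-H lineitem schema queried above:

// Inspect the schema and a sample of rows.
df.printSchema()
df.show(10)

// Standard DataFrame transformations work as usual.
// `l_returnflag` is an assumed column from the TPC-H lineitem table.
val returned = df.filter(df("l_returnflag") === "R")
println("returned rows: " + returned.count())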