Python SDK
Installation
pip install canner-python-client
Constructor
The constructor arguments depend on the deployment mode. The two examples below follow this order:
- Normal Python Env (Standalone Mode)
- Normal Python Env (Cluster Mode)
import canner

client = canner.client.bootstrap(
    endpoint='https://web.default.myname.apps.cannerdata.com/web',
    workspace_id='444e8753-a4c0-4875-bdc0-834c79061d56',
    token='Y2xpZW50XzA0OTgzODM4LWNhZjktNGNmZi1hNDA4LWFkZDY3ZDc5MjIxNjo2N2YyNGY5OWEzYjFiZTEyZTg2MDI2MmMzNGQzZDRiYQ=='
)
| Name | Type | Description |
|---|---|---|
| endpoint | string | The Canner Enterprise URL up to and including /web, e.g. https://web.default.myname.apps.cannerdata.com/web. |
| token | string | Personal access token. See Get Personal Access Token. |
| workspace_id | string | Each workspace has a unique ID, which is shown in the URL of the workspace. For example, https://web.default.myname.apps.cannerdata.com/web/workspaces/444e8753-a4c0-4875-bdc0-834c79061d56 |
client = canner.client.bootstrap(
    endpoint='https://web.default.myname.apps.cannerdata.com',
    workspace_id='444e8753-a4c0-4875-bdc0-834c79061d56',
    token='Y2xpZW50XzA0OTgzODM4LWNhZjktNGNmZi1hNDA4LWFkZDY3ZDc5MjIxNjo2N2YyNGY5OWEzYjFiZTEyZTg2MDI2MmMzNGQzZDRiYQ=='
)
| Name | Type | Description |
|---|---|---|
| endpoint | string | The Canner Enterprise URL, e.g. https://web.default.myname.apps.cannerdata.com. |
| token | string | Personal access token. See Get Personal Access Token. |
| workspace_id | string | Each workspace has a unique ID, which is shown in the URL of the workspace. For example, https://web.default.myname.apps.cannerdata.com/workspaces/444e8753-a4c0-4875-bdc0-834c79061d56 |
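The parameters in the tables above can be kept out of source code. The following is a minimal sketch, not part of the SDK: it pulls the workspace ID out of a workspace URL and collects the bootstrap arguments from environment variables. The helper names and the variable names CANNER_ENDPOINT, CANNER_WORKSPACE_URL, and CANNER_TOKEN are assumptions made for this example.

```python
import os
from urllib.parse import urlparse


def workspace_id_from_url(workspace_url: str) -> str:
    """Return the path segment that follows 'workspaces' in a workspace URL."""
    segments = [s for s in urlparse(workspace_url).path.split("/") if s]
    try:
        return segments[segments.index("workspaces") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no workspace ID found in {workspace_url!r}") from None


def bootstrap_kwargs(env=os.environ) -> dict:
    """Collect bootstrap() arguments from environment variables.

    The variable names are hypothetical; use whatever suits your deployment.
    """
    return {
        "endpoint": env["CANNER_ENDPOINT"],
        "workspace_id": workspace_id_from_url(env["CANNER_WORKSPACE_URL"]),
        "token": env["CANNER_TOKEN"],
    }


# Usage (requires the SDK and a reachable Canner Enterprise instance):
# import canner
# client = canner.client.bootstrap(**bootstrap_kwargs())
```

Reading the token from the environment keeps the personal access token out of version control; the URL helper works for both the /web/workspaces/... and /workspaces/... URL shapes shown above.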
Saved SQL Operations
List Saved Query
list_saved_query(): Array<string>
Lists the titles of all SQL statements saved in a workspace.
queries = client.list_saved_query()
print(queries)
# ['select_users1', 'select_users2']
Use Saved Query
use_saved_query(title: string, data_format: data_format, cache_refresh: boolean, cache_ttl: number): Query
Executes a saved query and returns a Query object, which provides methods for working with the data once execution has completed. For example, you can get all the data through query.get_all().
- title: The title of the saved SQL query to execute.
- data_format: The format of the data returned by the query. There are currently three types: list, df, and np. The default is list.
- cache_refresh: True or False; the default is False. Whether to refresh the existing cached data. To read cached data, leave this option set to False.
- cache_ttl: The maximum age, in seconds, of cached data that may be returned. The default is 86400 (cached data up to one day old is accepted). Only applies when cache_refresh=False.
queries = client.list_saved_query()
# ['select_users1', 'select_users2']
query = client.use_saved_query(queries[0], data_format='list')
query.wait_for_finish()
data = query.get_all()
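The steps above (list the titles, run one, wait, fetch) can be wrapped in a single helper. This is a sketch under assumptions: run_saved_query, max_cache_age, and timeout are names invented for this example, and client is expected to be any object that exposes list_saved_query and use_saved_query as documented in this section.

```python
def run_saved_query(client, title, data_format="list", max_cache_age=86400, timeout=60):
    """Run a saved query by title and return all of its data.

    Raises KeyError when the title is not among the saved queries, so a typo
    fails early instead of surfacing as a server-side error.
    """
    titles = client.list_saved_query()
    if title not in titles:
        raise KeyError(f"saved query {title!r} not found; available: {titles}")
    query = client.use_saved_query(
        title,
        data_format=data_format,
        cache_refresh=False,       # accept cached data...
        cache_ttl=max_cache_age,   # ...no older than this many seconds
    )
    query.wait_for_finish(timeout=timeout)
    return query.get_all()


# Usage, with a client created by canner.client.bootstrap(...):
# data = run_saved_query(client, 'select_users1', max_cache_age=3600)
```

Passing cache_refresh=False together with a smaller cache_ttl is one way to accept cached data only when it is recent enough for the task at hand.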
Query Operations
Generate Query
gen_query(sql: string, data_format: data_format): Query
Executes the given SQL statement and returns a Query object, which provides methods for working with the data once execution has completed. For example, you can get all the data through query.get_all().
- sql: The SQL statement to execute.
- data_format: The format of the data returned by the query. There are currently three types: list, df, and np. The default is list.
query = client.gen_query('select * from canner.myworkspace.users', data_format='list')
query.wait_for_finish()
data = query.get_all()
Query
Query is the object returned by client.gen_query and client.use_saved_query.
example: use_saved_query
queries = client.list_saved_query()
# ['select_users1', 'select_users2']
query = client.use_saved_query(queries[0])
query.wait_for_finish()
data = query.get_all()
example: gen_query
query = client.gen_query('select * from canner.myworkspace.users')
query.wait_for_finish()
data = query.get_all()
Properties
columns
Column information.
query.columns
# [{'name': 'status',
#   'type': 'varchar',
#   'typeSignature': {'rawType': 'varchar',
#                     'typeArguments': [],
#                     'literalArguments': [],
#                     'arguments': [{'kind': 'LONG_LITERAL', 'value': 2147483647}]}
# }]
row_count
Total number of rows.
query.row_count
# 100
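The column metadata can be used to label row values. The sketch below is illustrative only: rows_to_dicts is a hypothetical helper, the sample columns and rows are made up, and it assumes each row is a plain sequence of values in column order with no header row. If the data you fetched already carries the column names, strip them first.

```python
def rows_to_dicts(columns, rows):
    """Pair each row's values with the column names taken from query.columns."""
    names = [col["name"] for col in columns]
    records = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
        records.append(dict(zip(names, row)))
    return records


# Made-up sample data shaped like the query.columns output above.
columns = [{"name": "status", "type": "varchar"}, {"name": "total", "type": "bigint"}]
rows = [["active", 3], ["closed", 7]]
print(rows_to_dicts(columns, rows))
# [{'status': 'active', 'total': 3}, {'status': 'closed', 'total': 7}]
```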
Get Data
wait_for_finish(timeout=5, period=3)
SQL execution is asynchronous. Before obtaining data or any result-related information such as columns or row_count, you must first make sure that the SQL execution has completed.
- timeout: The maximum time to wait, in seconds.
- period: The interval, in seconds, between checks of whether the query has completed.
get_all()
Returns all data. When the data format is list or df, the columns are included.
get_first(limit: number)
Returns the first limit rows; the default is one. When the data format is list or df, the columns are included.
get_last(limit: number)
Returns the last limit rows; the default is one. When the data format is list or df, the columns are included.
get(limit: number, offset: number)
Returns part of the data according to the given limit and offset. When the data format is list or df, the columns are included.
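For large results, get(limit, offset) can be combined with row_count to read the data one page at a time instead of calling get_all(). A minimal sketch, assuming a hypothetical helper named iter_pages and a query that has already finished:

```python
def iter_pages(query, page_size=1000):
    """Yield successive results of query.get(limit, offset) until row_count is covered.

    Call query.wait_for_finish() first so that row_count is available.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = query.row_count
    for offset in range(0, total, page_size):
        # The last page may be shorter than page_size.
        yield query.get(min(page_size, total - offset), offset)


# Usage, with a Query returned by client.gen_query or client.use_saved_query:
# query.wait_for_finish(timeout=60)
# for page in iter_pages(query, page_size=500):
#     handle(page)  # handle is a placeholder for your own processing
```

Because list and df results include the columns, each page may carry its own column information; how you merge pages depends on the data format in use.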
Data Format
There are currently three data formats available:
- list: Python list, the default value
- df: pandas DataFrame
- np: numpy.ndarray
You can specify it in client.gen_query or client.use_saved_query
query = client.use_saved_query('query_name', data_format="list")
print(type(query.get_first()))
# <class 'list'>
query = client.use_saved_query('query_name', data_format="df")
print(type(query.get_first()))
# <class 'pandas.core.frame.DataFrame'>
query = client.use_saved_query('query_name', data_format="np")
print(type(query.get_first()))
# <class 'numpy.ndarray'>
It is also possible to change query.data_format at any time to use a different data_format.
query = client.use_saved_query('query_name')
query.data_format = 'list'
print(type(query.get_first()))
# <class 'list'>
query.data_format = 'df'
print(type(query.get_first()))
# <class 'pandas.core.frame.DataFrame'>
query.data_format = 'np'
print(type(query.get_first()))
# <class 'numpy.ndarray'>
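Since data_format is a plain string, a typo such as 'dataframe' is easy to make. The sketch below checks the value against the three documented formats before switching. The helper first_rows_as and the constant SUPPORTED_FORMATS are assumptions for illustration; how the SDK itself reacts to an unknown format is not covered here.

```python
SUPPORTED_FORMATS = ("list", "df", "np")


def first_rows_as(query, data_format, limit=1):
    """Validate data_format, switch query.data_format, and return the first rows."""
    if data_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported data_format {data_format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    query.data_format = data_format
    return query.get_first(limit)


# Usage, with a finished Query:
# preview = first_rows_as(query, 'df', limit=5)
```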