The Databricks connector supports three connection modes. The form changes with the mode you pick: the common fields are always present, and a mode-specific section appears below them.
| Mode | When to use it |
| --- | --- |
| SQL Warehouse | Query through a Databricks SQL Warehouse. Recommended for most users. See SQL warehouse types. |
| Spark Connect | Query through an interactive cluster with Spark Connect. See Compute. |
| Delta Lake | Read Delta tables directly from object storage (S3, Azure Blob, or GCS) without going through a Databricks cluster. See What is Delta Lake?. |

For all modes, find your endpoint, warehouse ID, and cluster ID in the Databricks UI under Connection details for compute resources.

Common fields

These appear in every mode:
| Field | Required | Stored as | Notes |
| --- | --- | --- | --- |
| Mode | Yes | Config | One of SQL Warehouse, Spark Connect, Delta Lake. |
| Endpoint | Yes | Config | Workspace hostname, e.g. dbc-a1b2345c-d6e7.cloud.databricks.com. Don't include https://. See Connection details. |
| Use SSL | Yes | Config | true or false. Almost always true. |
| Authentication | Yes | Config | Personal Access Token or Service Principal (see below). |

Authentication

Choose one of the two:
| Auth type | Fields | Vendor docs |
| --- | --- | --- |
| Personal Access Token | Personal Access Token (starts with dapi, stored as a secret) | Personal access token authentication |
| Service Principal | Client ID (config), Client Secret (secret) | Service principals for Databricks automation |
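
If you want to sanity-check a token before saving the connector, a minimal sketch like the following works against Databricks' SCIM "Me" endpoint. It assumes the requests package; the hostname and token are placeholders.

```python
# Minimal PAT check; HOST and TOKEN are placeholders, not real values.
import requests

HOST = "dbc-a1b2345c-d6e7.cloud.databricks.com"  # the Endpoint field, no https://
TOKEN = "dapi..."                                # placeholder; keep real PATs in a secret store

resp = requests.get(
    f"https://{HOST}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
print(resp.status_code)  # 200 = token is valid; 401 matches the Common problems table below
if resp.ok:
    print(resp.json().get("userName"))
```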

Mode-specific fields

SQL Warehouse

| Field | Required | Notes |
| --- | --- | --- |
| SQL Warehouse ID | Yes | The 16-character ID of the warehouse, e.g. 2b4e24cff378fb24. Find it under SQL Warehouses → your warehouse → Connection details, documented at Get connection details for a Databricks SQL warehouse. |
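
To confirm the endpoint, warehouse ID, and token outside Summation, a short sketch with the official databricks-sql-connector package (pip install databricks-sql-connector) looks like this; all values below are placeholders:

```python
# Connectivity check for SQL Warehouse mode; every value here is a placeholder.
from databricks import sql

with sql.connect(
    server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",  # Endpoint field
    http_path="/sql/1.0/warehouses/2b4e24cff378fb24",          # built from the SQL Warehouse ID
    access_token="dapi...",                                    # PAT placeholder
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_schema()")
        print(cursor.fetchone())
```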

Spark Connect

| Field | Required | Notes |
| --- | --- | --- |
| Cluster ID | Yes | e.g. 1234-567890-abcde123. Find it under Compute → your cluster → ⋮ More → View JSON, or in the cluster URL. See Get connection details for a Databricks compute resource. |
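
The equivalent check for Spark Connect mode uses the databricks-connect package (pip install databricks-connect). This is a sketch with placeholder values, and the cluster must already be running:

```python
# Connectivity check for Spark Connect mode; values are placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://dbc-a1b2345c-d6e7.cloud.databricks.com",
    token="dapi...",                    # PAT placeholder
    cluster_id="1234-567890-abcde123",  # Cluster ID field
).getOrCreate()

print(spark.range(5).count())  # trivial query to prove the session works
```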

Delta Lake

In Delta Lake mode, Summation reads files directly from your object store. You’ll see one extra field plus an Object Store picker.
| Field | Required | Notes |
| --- | --- | --- |
| Client Timeout | Optional | e.g. 30s. |
| Object Store | Yes | AWS S3, Azure Blob, or Google Cloud Storage. |

The fields shown after that depend on which object store you pick:

AWS S3

| Field | Required | Stored as | Notes |
| --- | --- | --- | --- |
| AWS Region | Optional | Config | e.g. us-west-2. See AWS Regions and Zones. |
| AWS Endpoint | Optional | Config | Custom S3 endpoint, e.g. s3.us-west-2.amazonaws.com. |
| AWS Access Key ID | Yes | Secret | See Managing access keys for IAM users. |
| AWS Secret Access Key | Yes | Secret | |
| Allow HTTP | Optional | Config | true or false. Default false. |
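
To check that the S3 credentials can actually read the table, a sketch with the open-source deltalake package (pip install deltalake) is enough. Summation's own reader may differ; this only validates that the keys can reach the path. The bucket, path, and keys below are placeholders:

```python
# Credential check for Delta Lake mode on S3; all values are placeholders.
from deltalake import DeltaTable

dt = DeltaTable(
    "s3://example-bucket/path/to/delta-table",
    storage_options={
        "AWS_REGION": "us-west-2",
        "AWS_ACCESS_KEY_ID": "AKIA...",   # placeholder
        "AWS_SECRET_ACCESS_KEY": "...",   # placeholder
    },
)
print(dt.version(), dt.schema())  # succeeds only if the storage policy allows reads
```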

Azure Blob

| Field | Required | Stored as | Notes |
| --- | --- | --- | --- |
| Azure Storage Account Name | Yes | Config | e.g. myaccount. See Storage account overview. |
| Azure Storage Endpoint | Optional | Config | e.g. blob.core.windows.net. |
| Azure Authentication | Yes | Config | Account Key, Service Principal, or SAS Key. See Authorize access to Azure Blob Storage. |
| Azure Storage Account Key | If Account Key | Secret | See Manage account access keys. |
| Azure Storage Client ID | If Service Principal | Secret | |
| Azure Storage Client Secret | If Service Principal | Secret | |
| Azure Storage SAS Key | If SAS Key | Secret | See Grant limited access with SAS. |

Google Cloud Storage

| Field | Required | Stored as | Notes |
| --- | --- | --- | --- |
| Google Service Account Path | Yes | Config | Path to a service account JSON file, e.g. /path/to/service-account.json. See Create and delete service account keys. |
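
For the other two stores, the same deltalake sketch applies with different storage_options. The key names follow delta-rs conventions (check the deltalake docs for your version), and all values are placeholders:

```python
# Placeholder storage_options for Azure Blob and GCS, mirroring the S3 example above.
azure_options = {
    "AZURE_STORAGE_ACCOUNT_NAME": "myaccount",
    "AZURE_STORAGE_ACCOUNT_KEY": "...",  # or the SAS / service principal fields instead
}
gcs_options = {
    "GOOGLE_SERVICE_ACCOUNT": "/path/to/service-account.json",
}
# DeltaTable("abfss://container@myaccount.dfs.core.windows.net/table", storage_options=azure_options)
# DeltaTable("gs://example-bucket/table", storage_options=gcs_options)
```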

Adding datasets

For SQL Warehouse and Spark Connect, browse Unity Catalog catalogs, schemas, and tables. Source references use the form:

```
databricks:catalog.schema.table
```

For Delta Lake, point at s3://, abfss://, or gs:// Delta table paths directly.
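
For illustration, references in each mode might look like this (the Unity Catalog path uses Databricks' built-in samples catalog; the storage paths are placeholders):

```
databricks:samples.nyctaxi.trips
s3://example-bucket/delta/events
abfss://container@myaccount.dfs.core.windows.net/delta/events
gs://example-bucket/delta/events
```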

Common problems

| Error or symptom | Likely cause |
| --- | --- |
| 401 Unauthorized | PAT is expired, or the service principal doesn't have workspace access. Regenerate or re-grant. |
| Cluster ... is not running | Spark Connect requires the cluster to be running. Use a SQL Warehouse for serverless / on-demand workloads. |
| Delta Lake mode can't read files | Storage credentials or IAM are wrong. A Databricks PAT alone isn't enough; the storage policy must permit reads of the path. |