# Databricks

> Connect Summation to Databricks via SQL Warehouse, Spark Connect, or direct Delta Lake.

The Databricks connector supports three connection **modes**. The form changes with the mode you pick: the common fields always appear, followed by a mode-specific section.

| Mode              | When to use it                                                                                                                                                                            |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **SQL Warehouse** | Query through a Databricks SQL Warehouse. **Recommended for most users.** See [SQL warehouse types](https://docs.databricks.com/aws/en/compute/sql-warehouse/).                           |
| **Spark Connect** | Query through an interactive cluster with Spark Connect. See [Compute](https://docs.databricks.com/aws/en/compute/).                                                                      |
| **Delta Lake**    | Read Delta tables directly from object storage (S3, Azure Blob, or GCS) without going through a Databricks cluster. See [What is Delta Lake?](https://docs.databricks.com/aws/en/delta/). |

Every mode needs the workspace endpoint; SQL Warehouse also needs a warehouse ID, and Spark Connect a cluster ID. To find these values in the Databricks UI, see [Connection details for compute resources](https://docs.databricks.com/aws/en/integrations/compute-details).

## Common fields

These appear in every mode:

| Field              | Required | Stored as | Notes                                                                                                                                                                                   |
| ------------------ | -------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Mode**           | Yes      | Config    | One of *SQL Warehouse*, *Spark Connect*, *Delta Lake*.                                                                                                                                  |
| **Endpoint**       | Yes      | Config    | Workspace hostname, e.g. `dbc-a1b2345c-d6e7.cloud.databricks.com`. Don't include `https://`. See [Connection details](https://docs.databricks.com/aws/en/integrations/compute-details). |
| **Use SSL**        | Yes      | Config    | `true` or `false`. Almost always `true`.                                                                                                                                                |
| **Authentication** | Yes      | Config    | *Personal Access Token* or *Service Principal* (see below).                                                                                                                             |
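
If you copy the workspace URL from a browser, the `https://` prefix has to come off before the value goes into **Endpoint**. A minimal Python sketch of that cleanup, assuming a hypothetical helper named `normalize_endpoint` (it is not part of Summation or Databricks):

```python
from urllib.parse import urlsplit


def normalize_endpoint(raw: str) -> str:
    """Return a bare hostname for the Endpoint field (hypothetical helper)."""
    value = raw.strip()
    # urlsplit only recognizes the host when a "//" separator is present,
    # so add one for values that are already bare hostnames.
    if "://" not in value:
        value = "//" + value
    # .hostname drops the scheme, any port, and any path, and lowercases the host.
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"no hostname found in {raw!r}")
    return host


print(normalize_endpoint("https://dbc-a1b2345c-d6e7.cloud.databricks.com/"))
```

The same call leaves an already-bare hostname unchanged, so it is safe to run on whatever the user pasted.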

### Authentication

Choose one of the two:

| Auth type                 | Fields                                                             | Vendor docs                                                                                                 |
| ------------------------- | ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- |
| **Personal Access Token** | **Personal Access Token** (starts with `dapi`, stored as a secret) | [Personal access token authentication](https://docs.databricks.com/aws/en/dev-tools/auth/pat)               |
| **Service Principal**     | **Client ID** (config), **Client Secret** (secret)                 | [Service principals for Databricks automation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-m2m) |
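
The table above amounts to a small rule: each auth type requires its own set of fields. As a rough sketch, assuming hypothetical helper names (`missing_auth_fields`, `looks_like_pat`) and a plain dictionary keyed by the field labels on this page, a pre-flight check might look like this:

```python
from typing import Dict, List

# Field labels taken from the Authentication table; the structure is illustrative.
REQUIRED_AUTH_FIELDS = {
    "Personal Access Token": ["Personal Access Token"],
    "Service Principal": ["Client ID", "Client Secret"],
}


def missing_auth_fields(auth_type: str, fields: Dict[str, str]) -> List[str]:
    """List required fields that are absent or empty for the chosen auth type."""
    if auth_type not in REQUIRED_AUTH_FIELDS:
        raise ValueError(f"unknown auth type: {auth_type!r}")
    return [name for name in REQUIRED_AUTH_FIELDS[auth_type] if not fields.get(name)]


def looks_like_pat(token: str) -> bool:
    """Cheap sanity check: the token should start with 'dapi'."""
    return token.startswith("dapi")


print(missing_auth_fields("Service Principal", {"Client ID": "my-client-id"}))
```

A prefix check like `looks_like_pat` only catches paste mistakes; it says nothing about whether the token is still valid.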

## Mode-specific fields

### SQL Warehouse

| Field                | Required | Notes                                                                                                                                                                                                                                                                                                                                |
| -------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **SQL Warehouse ID** | Yes      | The 16-character ID of the warehouse, e.g. `2b4e24cff378fb24`. Find it under **SQL Warehouses → your warehouse → Connection details**, documented at [Get connection details for a Databricks SQL warehouse](https://docs.databricks.com/aws/en/integrations/compute-details#get-connection-details-for-a-databricks-sql-warehouse). |
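
The Databricks **Connection details** tab shows an HTTP path rather than the bare ID. Assuming that path has the form `/sql/1.0/warehouses/<warehouse-id>` (the format Databricks documents for SQL warehouses), a hypothetical helper can pull out the ID and check the 16-character length mentioned above:

```python
def warehouse_id_from_http_path(http_path: str) -> str:
    """Extract the warehouse ID from a SQL warehouse HTTP path (hypothetical helper)."""
    prefix = "/sql/1.0/warehouses/"  # assumed path format, see lead-in
    path = http_path.strip()
    if not path.startswith(prefix):
        raise ValueError(f"not a SQL warehouse HTTP path: {http_path!r}")
    warehouse_id = path[len(prefix):].strip("/")
    # This page describes the ID as 16 characters long.
    if len(warehouse_id) != 16:
        raise ValueError(f"expected a 16-character ID, got {warehouse_id!r}")
    return warehouse_id


print(warehouse_id_from_http_path("/sql/1.0/warehouses/2b4e24cff378fb24"))
```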

### Spark Connect

| Field          | Required | Notes                                                                                                                                                                                                                                                                                                        |
| -------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Cluster ID** | Yes      | e.g. `1234-567890-abcde123`. Find it under **Compute → your cluster → ⋮ More → View JSON**, or in the cluster URL. See [Get connection details for a Databricks compute resource](https://docs.databricks.com/aws/en/integrations/compute-details#get-connection-details-for-a-databricks-compute-resource). |

### Delta Lake

In Delta Lake mode, Summation reads files directly from your object store. You'll see one extra field plus an **Object Store** picker.

| Field              | Required | Notes                                              |
| ------------------ | -------- | -------------------------------------------------- |
| **Client Timeout** | Optional | e.g. `30s`.                                        |
| **Object Store**   | Yes      | *AWS S3*, *Azure Blob*, or *Google Cloud Storage*. |

The fields shown after that depend on which object store you pick:

#### AWS S3

| Field                     | Required | Stored as | Notes                                                                                                                                     |
| ------------------------- | -------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| **AWS Region**            | Optional | Config    | e.g. `us-west-2`. See [AWS Regions and Zones](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html). |
| **AWS Endpoint**          | Optional | Config    | Custom S3 endpoint, e.g. `s3.us-west-2.amazonaws.com`.                                                                                    |
| **AWS Access Key ID**     | Yes      | Secret    | See [Managing access keys for IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).               |
| **AWS Secret Access Key** | Yes      | Secret    |                                                                                                                                           |
| **Allow HTTP**            | Optional | Config    | `true` or `false`. Default `false`.                                                                                                       |

#### Azure Blob

| Field                           | Required             | Stored as | Notes                                                                                                                                                                                   |
| ------------------------------- | -------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Azure Storage Account Name**  | Yes                  | Config    | e.g. `myaccount`. See [Storage account overview](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview).                                                      |
| **Azure Storage Endpoint**      | Optional             | Config    | e.g. `blob.core.windows.net`.                                                                                                                                                           |
| **Azure Authentication**        | Yes                  | Config    | *Account Key*, *Service Principal*, or *SAS Key*. See [Authorize access to Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-portal). |
| **Azure Storage Account Key**   | If Account Key       | Secret    | See [Manage account access keys](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage).                                                                   |
| **Azure Storage Client ID**     | If Service Principal | Secret    |                                                                                                                                                                                         |
| **Azure Storage Client Secret** | If Service Principal | Secret    |                                                                                                                                                                                         |
| **Azure Storage SAS Key**       | If SAS Key           | Secret    | See [Grant limited access with SAS](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview).                                                                       |
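
The "Required" column above is conditional on the **Azure Authentication** choice. A minimal sketch of that rule, assuming a hypothetical `missing_azure_fields` helper and a dictionary keyed by the field labels in the table:

```python
from typing import Dict, List

# Labels copied from the Azure Blob table; the helper itself is illustrative.
ALWAYS_REQUIRED = ["Azure Storage Account Name", "Azure Authentication"]
REQUIRED_BY_AUTH = {
    "Account Key": ["Azure Storage Account Key"],
    "Service Principal": ["Azure Storage Client ID", "Azure Storage Client Secret"],
    "SAS Key": ["Azure Storage SAS Key"],
}


def missing_azure_fields(form: Dict[str, str]) -> List[str]:
    """Return required Azure Blob fields that are absent or empty."""
    missing = [name for name in ALWAYS_REQUIRED if not form.get(name)]
    auth = form.get("Azure Authentication")
    if auth:
        if auth not in REQUIRED_BY_AUTH:
            raise ValueError(f"unknown Azure Authentication value: {auth!r}")
        # Only the fields for the selected auth type become required.
        missing += [name for name in REQUIRED_BY_AUTH[auth] if not form.get(name)]
    return missing


form = {
    "Azure Storage Account Name": "myaccount",
    "Azure Authentication": "Service Principal",
    "Azure Storage Client ID": "my-client-id",
}
print(missing_azure_fields(form))
```

**Azure Storage Endpoint** is optional in every case, so the check never reports it.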

#### Google Cloud Storage

| Field                           | Required | Stored as | Notes                                                                                                                                                                          |
| ------------------------------- | -------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Google Service Account Path** | Yes      | Config    | Path to a service account JSON file, e.g. `/path/to/service-account.json`. See [Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete). |

## Adding datasets

For SQL Warehouse and Spark Connect, browse Unity Catalog catalogs, schemas, and tables. Source references use the form:

```
databricks:catalog.schema.table
```
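
To illustrate how such a reference breaks down, here is a small Python sketch. The `parse_source_ref` helper and `TableRef` type are hypothetical, and the sketch assumes that none of the three names contains a dot or needs quoting:

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_source_ref(ref: str) -> TableRef:
    """Split 'databricks:catalog.schema.table' into its three parts."""
    prefix = "databricks:"
    if not ref.startswith(prefix):
        raise ValueError(f"expected a reference starting with {prefix!r}: {ref!r}")
    parts = ref[len(prefix):].split(".")
    # Exactly three non-empty parts: catalog, schema, table.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {ref!r}")
    return TableRef(*parts)


ref = parse_source_ref("databricks:main.sales.orders")
print(ref.catalog, ref.schema, ref.table)
```

The names `main`, `sales`, and `orders` are placeholders; substitute your own catalog, schema, and table.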

For Delta Lake, point at `s3://`, `abfss://`, or `gs://` Delta table paths directly.
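
Each path scheme implies an object store. The sketch below assumes that `s3://` corresponds to *AWS S3*, `abfss://` to *Azure Blob*, and `gs://` to *Google Cloud Storage*; that pairing is inferred from this page rather than from confirmed Summation behavior, and `object_store_for` is a hypothetical helper:

```python
from urllib.parse import urlsplit

# Assumed pairing of URL scheme to the Object Store picker value.
SCHEME_TO_STORE = {
    "s3": "AWS S3",
    "abfss": "Azure Blob",
    "gs": "Google Cloud Storage",
}


def object_store_for(path: str) -> str:
    """Return the object store implied by a Delta table path's scheme."""
    scheme = urlsplit(path).scheme  # urlsplit lowercases the scheme
    if scheme not in SCHEME_TO_STORE:
        raise ValueError(f"unsupported Delta table path: {path!r}")
    return SCHEME_TO_STORE[scheme]


# The bucket and table names here are placeholders.
print(object_store_for("s3://my-bucket/tables/events"))
```

A check like this can catch a mismatch between the path and the **Object Store** selection before the connector tries to read any files.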

## Common problems

| Error or symptom                 | Likely cause and fix                                                                                                                     |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `401 Unauthorized`               | The PAT has expired, or the service principal doesn't have workspace access. Generate a new token or re-grant workspace access.          |
| `Cluster ... is not running`     | Spark Connect requires the cluster to be running. Start the cluster, or use a SQL Warehouse for serverless / on-demand workloads.        |
| Delta Lake mode can't read files | Storage credentials or IAM permissions are wrong. A Databricks PAT alone isn't enough: the storage policy must permit reads of the path. |
