Databricks

Overview

The Databricks destination streams logs from a Praxis collector into a Databricks Unity Catalog table in two steps:

Uploading batches of log records as files (default: NDJSON gzip) into a Unity Catalog volume.
Triggering a COPY INTO statement against a SQL warehouse to ingest those files into a target table.

The destination is logs-first; the table schema is owned by the user. The COPY INTO step is optional — set copy.mode=upload_only to keep just the file-drop step (useful when an external Databricks Workflow / DLT pipeline does the load).
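As a rough illustration of the second step, the sketch below builds a `COPY INTO` statement that points at a staged directory in a Unity Catalog volume. The helper names `volume_path` and `copy_into_sql` are hypothetical, and the exact SQL the exporter issues is not documented here; the statement only uses the standard Databricks `COPY INTO ... FROM ... FILEFORMAT` form.

```python
def volume_path(catalog: str, schema: str, name: str, subpath: str = "") -> str:
    """Return the /Volumes/<catalog>/<schema>/<volume>/<subpath> path for staged files."""
    base = f"/Volumes/{catalog}/{schema}/{name}"
    subpath = subpath.strip("/")
    return f"{base}/{subpath}" if subpath else base


def copy_into_sql(table: str, source_path: str) -> str:
    """Build a COPY INTO statement for JSON files (illustrative, not the exporter's exact SQL)."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        "FILEFORMAT = JSON"
    )


path = volume_path("telemetry", "logs", "praxis_staging", "dt=2024-05-17/hr=09/")
print(copy_into_sql("telemetry.logs.praxis_logs", path))
```

With `copy.mode=upload_only`, only the path-building half applies; the statement is never issued.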

Supported types: Logs

Authentication

Credential type	Token kind	When to use
Databricks PAT (`databrickspat`)	Workspace user PAT, sent as `Authorization: Bearer <pat>`	Quick start; fine for non-prod / single-user installs.
OAuth2 (`oauth2`)	Service-principal client credentials grant against the workspace	Production; use a service principal so the token is rotatable and the SP can be granted least-privilege Unity Catalog permissions.

The exporter rejects any other credential type at config time.
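The sketch below shows how the two credential types might translate into HTTP requests. The credential field names (`token`, `client_id`, `client_secret`) and the `build_auth` helper are assumptions made for illustration; the `/oidc/v1/token` path and `all-apis` scope follow the Databricks OAuth machine-to-machine flow. No network call is made.

```python
from urllib.parse import urlencode


def build_auth(credential: dict, endpoint: str) -> dict:
    """Describe the request material for a credential (hypothetical field names)."""
    kind = credential.get("type")
    if kind == "databrickspat":
        # PAT: sent directly as a bearer token on every request.
        return {"headers": {"Authorization": f"Bearer {credential['token']}"}}
    if kind == "oauth2":
        # Client credentials grant: exchange the service principal's
        # client ID and secret for a short-lived access token.
        return {
            "token_url": endpoint.rstrip("/") + "/oidc/v1/token",
            "body": urlencode({"grant_type": "client_credentials", "scope": "all-apis"}),
            "basic_auth": (credential["client_id"], credential["client_secret"]),
        }
    # Mirrors the config-time rejection of any other credential type.
    raise ValueError(f"unsupported credential type: {kind!r}")


pat = build_auth({"type": "databrickspat", "token": "dapi-example"}, "https://example.cloud.databricks.com")
print(pat["headers"]["Authorization"])
```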

Basic Configuration

Parameter	Type	Default	Required	Description
`endpoint`	string	—	Yes	Workspace URL, e.g. `https://adb-1234567890.0.azuredatabricks.net`.
`warehouse_id`	string	—	Yes (unless `copy.mode=upload_only`)	SQL Warehouse ID that runs the `COPY INTO`. Find it under SQL Warehouses → ⓘ.
`target.catalog`	string	—	Yes	Target Unity Catalog catalog.
`target.schema`	string	—	Yes	Target schema within the catalog.
`target.table`	string	—	Yes	Target table. The exporter does not issue DDL — pre-create the table with the schema you want.
`volume.catalog`	string	—	Yes	Catalog holding the staging volume.
`volume.schema`	string	—	Yes	Schema holding the staging volume.
`volume.name`	string	—	Yes	Unity Catalog volume name where uploaded files land. The exporter does not create the volume — pre-create it.
`volume.subpath_template`	string	—	No	Subdirectory template under the volume root. Supports time placeholders (e.g. `dt=%Y-%m-%d/hr=%H/`) for partition-friendly layout.
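The placeholders in `volume.subpath_template` look like `strftime` directives. Assuming that is how they are expanded, and assuming UTC timestamps (neither is confirmed by this page), a template renders roughly as follows. `render_subpath` is a hypothetical helper.

```python
from datetime import datetime, timezone


def render_subpath(template: str, when: datetime) -> str:
    """Expand strftime-style placeholders in a subpath template."""
    return when.strftime(template)


ts = datetime(2024, 5, 17, 9, 30, tzinfo=timezone.utc)
print(render_subpath("dt=%Y-%m-%d/hr=%H/", ts))  # dt=2024-05-17/hr=09/
```

A `dt=.../hr=.../` layout keeps each hour's files in their own directory, which makes it easy to reload or delete a bounded time range.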

File format

Parameter	Type	Default	Description
`file.format`	string	`JSON` (NDJSON)	Format of uploaded files.
`file.compression`	string	`gzip`	File compression. Use `none` to disable.
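For reference, the default file format (NDJSON with gzip) can be reproduced with the Python standard library. This is a minimal sketch of the encoding only; the record fields are made up and the exporter's actual field layout is not shown here.

```python
import gzip
import json


def encode_ndjson_gzip(records: list[dict]) -> bytes:
    """Serialize records as one JSON object per line, then gzip the result."""
    lines = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return gzip.compress(lines.encode("utf-8"))


def decode_ndjson_gzip(payload: bytes) -> list[dict]:
    """Inverse of encode_ndjson_gzip, handy for inspecting a staged file."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


sample = [{"severity": "INFO", "body": "started"}, {"severity": "ERROR", "body": "failed"}]
payload = encode_ndjson_gzip(sample)
print(decode_ndjson_gzip(payload) == sample)
```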

COPY INTO control

Parameter	Type	Default	Description
`copy.mode`	string	`auto`	One of `auto`, `external`, `upload_only`. `auto` runs `COPY INTO` after every batch upload. `external` skips the COPY (some other process loads). `upload_only` skips both COPY and the warehouse — files just land in the volume.
`copy.interval_seconds`	int	exporter default	Minimum interval between `COPY INTO` triggers. Coalesces multiple uploads into one COPY.
`copy.on_error`	string	exporter default	Pass-through to `COPY INTO ... ON_ERROR`. Values per Databricks SQL docs (`abort`, `continue`).
`copy.wait_timeout`	int	exporter default	Seconds to wait for the COPY statement to complete.
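The interaction between `copy.mode` and `copy.interval_seconds` can be pictured with a small coalescing helper. This is a sketch under assumptions: the class name `CopyCoalescer` is hypothetical, timestamps are passed in explicitly, and a real implementation would also need a timer to flush uploads that arrive during the quiet interval.

```python
from typing import Optional


class CopyCoalescer:
    """Decide whether an upload should trigger COPY INTO right away."""

    MODES = ("auto", "external", "upload_only")

    def __init__(self, mode: str, interval_seconds: float):
        if mode not in self.MODES:
            raise ValueError(f"copy.mode must be one of {self.MODES}, got {mode!r}")
        self.mode = mode
        self.interval = interval_seconds
        self.last_copy: Optional[float] = None
        self.pending = False  # True while uploads are waiting for the next COPY

    def on_upload(self, now: float) -> bool:
        """Record an upload at time `now`; return True if COPY INTO should run."""
        if self.mode != "auto":
            return False  # external / upload_only never trigger a COPY
        self.pending = True
        if self.last_copy is None or now - self.last_copy >= self.interval:
            self.last_copy = now
            self.pending = False
            return True
        return False


c = CopyCoalescer("auto", interval_seconds=60)
print([c.on_upload(t) for t in (0, 10, 59, 60)])  # [True, False, False, True]
```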

Advanced

Parameter	Type	Default	Description
`advanced.max_batch_rows`	int	exporter default	Cap on records per uploaded file.
`advanced.max_batch_bytes`	int	exporter default	Cap on uncompressed bytes per uploaded file. Larger files are split.
`advanced.timeout`	int	exporter default	HTTP request timeout in seconds (for both file upload and COPY).
`advanced.retry_on_failure`	bool	`true`	Enable automatic retries on send failures.
`advanced.backpressure_queue`	bool	`true`	Enable a sending queue (with optional disk backing).
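To make the two batch caps concrete, here is one way records could be split into files so that neither `advanced.max_batch_rows` nor `advanced.max_batch_bytes` is exceeded. The function `split_batches` is hypothetical, and measuring size as the UTF-8 length of each JSON line is an assumption about how uncompressed bytes are counted.

```python
import json


def split_batches(records: list[dict], max_rows: int, max_bytes: int) -> list[list[dict]]:
    """Group records into batches bounded by row count and uncompressed size."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_bytes = 0
    for record in records:
        size = len(json.dumps(record).encode("utf-8")) + 1  # +1 for the newline
        # Close the current batch if adding this record would break either cap.
        if current and (len(current) >= max_rows or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(record)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


records = [{"i": i} for i in range(5)]
print([len(b) for b in split_batches(records, max_rows=2, max_bytes=1_000_000)])  # [2, 2, 1]
```

In this sketch a single record larger than `max_bytes` still forms its own batch rather than being dropped.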

TLS (`advanced.tls`)

Parameter	Type	Default	Description
`insecure_skip_verify`	bool	`false`	Skip TLS server cert verification. Not for production.
`ca_file`	string	—	Custom CA bundle for the workspace endpoint.

Retry settings (`advanced.retry_on_failure_settings`)

Parameter	Type	Default	Description
`initial_interval`	int	exporter default	Initial backoff in seconds.
`max_interval`	int	exporter default	Maximum backoff in seconds.
`max_time_elapsed`	int	exporter default	Maximum total seconds spent retrying a batch.
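The three retry settings describe a capped exponential backoff. The sketch below computes the resulting delay schedule, assuming the delay doubles after each attempt and no jitter is applied; the multiplier is an assumption, since this page does not state the growth factor or whether randomization is used.

```python
def backoff_schedule(initial_interval: float, max_interval: float,
                     max_time_elapsed: float, multiplier: float = 2.0) -> list[float]:
    """Return the waits (in seconds) between retries until the time budget runs out."""
    delays: list[float] = []
    delay = float(initial_interval)
    elapsed = 0.0
    while elapsed + delay <= max_time_elapsed:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * multiplier, float(max_interval))  # cap at max_interval
    return delays


schedule = backoff_schedule(initial_interval=5, max_interval=60, max_time_elapsed=600)
print(schedule[:5], len(schedule), sum(schedule))
```

With the values used in the example configuration (5, 60, 600), the delays grow 5, 10, 20, 40 and then stay at 60 seconds until the 600-second budget is exhausted.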

Backpressure queue settings (`advanced.backpressure_queue_settings`)

Parameter	Type	Default	Description
`queue_size`	int	exporter default	Maximum buffered batches.
`number_of_consumers`	int	exporter default	Worker count draining the queue.
`enable_disk_backed_queue`	bool	`true`	Persist the queue to a `file_storage` extension at `${COL_HOME}/metadata/databricks/<node_name>` so it survives collector restarts. Default flipped to `true` in v0.3.

Pre-flight: catalog / schema / volume / table

-- Run as a workspace admin
USE CATALOG <catalog>;
USE SCHEMA <schema>;

CREATE VOLUME IF NOT EXISTS <volume_name>;

CREATE TABLE IF NOT EXISTS <table_name> (
  timestamp           TIMESTAMP,
  observed_timestamp  TIMESTAMP,
  severity            STRING,
  body                VARIANT,
  attributes          VARIANT,
  resource_attributes VARIANT,
  scope_name          STRING,
  trace_id            STRING,
  span_id             STRING
);

-- Service principal needs WRITE VOLUME on the volume, MODIFY on the table,
-- and USE SCHEMA / USE CATALOG on the parent schema and catalog
GRANT WRITE VOLUME ON VOLUME  <catalog>.<schema>.<volume_name> TO `<service_principal>`;
GRANT MODIFY       ON TABLE   <catalog>.<schema>.<table_name>  TO `<service_principal>`;
GRANT USE SCHEMA   ON SCHEMA  <catalog>.<schema>               TO `<service_principal>`;
GRANT USE CATALOG  ON CATALOG <catalog>                        TO `<service_principal>`;

Example Configuration

{
  "endpoint": "https://adb-1234567890.0.azuredatabricks.net",
  "warehouse_id": "1234abcd56ef7890",

  "target": {
    "catalog": "telemetry",
    "schema": "logs",
    "table": "praxis_logs"
  },

  "volume": {
    "catalog": "telemetry",
    "schema": "logs",
    "name": "praxis_staging",
    "subpath_template": "dt=%Y-%m-%d/hr=%H/"
  },

  "file": {
    "format": "JSON",
    "compression": "gzip"
  },

  "copy": {
    "mode": "auto",
    "interval_seconds": 60,
    "on_error": "continue",
    "wait_timeout": 120
  },

  "advanced": {
    "max_batch_rows": 50000,
    "max_batch_bytes": 16777216,
    "timeout": 60,
    "retry_on_failure": true,
    "retry_on_failure_settings": {
      "initial_interval": 5,
      "max_interval": 60,
      "max_time_elapsed": 600
    },
    "backpressure_queue": true,
    "backpressure_queue_settings": {
      "queue_size": 1000,
      "number_of_consumers": 4,
      "enable_disk_backed_queue": true
    },
    "tls": {
      "insecure_skip_verify": false
    }
  }
}
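Before deploying a configuration, it can help to check the required parameters from the Basic Configuration table. The sketch below is a hypothetical pre-check, not the exporter's own validation; it encodes only the rules stated on this page, including that `warehouse_id` is required unless `copy.mode=upload_only`.

```python
REQUIRED = [
    "endpoint",
    "target.catalog", "target.schema", "target.table",
    "volume.catalog", "volume.schema", "volume.name",
]
COPY_MODES = ("auto", "external", "upload_only")


def lookup(config: dict, dotted: str):
    """Resolve a dotted key such as 'target.catalog'; return None if absent."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    errors = [f"missing required parameter: {p}" for p in REQUIRED if not lookup(config, p)]
    mode = lookup(config, "copy.mode") or "auto"  # documented default is auto
    if mode not in COPY_MODES:
        errors.append(f"copy.mode must be one of {COPY_MODES}, got {mode!r}")
    elif mode != "upload_only" and not config.get("warehouse_id"):
        errors.append("warehouse_id is required unless copy.mode=upload_only")
    return errors


config = {
    "endpoint": "https://adb-1234567890.0.azuredatabricks.net",
    "target": {"catalog": "telemetry", "schema": "logs", "table": "praxis_logs"},
    "volume": {"catalog": "telemetry", "schema": "logs", "name": "praxis_staging"},
}
print(validate(config))
```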

Limitations

Logs only. Metrics and traces are not supported.
No DDL. The exporter never creates catalogs, schemas, volumes, or tables. Pre-provision everything (script above).
At-least-once. Network retries during file upload or COPY INTO may produce duplicate rows. If exactly-once is required, add a unique key to the table and deduplicate downstream with MERGE INTO.
One target per destination node. Routing to multiple tables requires multiple destination nodes.
Warehouse cost. Every COPY INTO starts the SQL warehouse or keeps it running. Use copy.interval_seconds to coalesce uploads, and consider a serverless warehouse with aggressive auto-stop.
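For the at-least-once limitation, one possible deduplication key is a content hash computed per record, which a downstream MERGE INTO could match on. This is only a sketch: the `record_id` name is hypothetical and not part of the table schema shown earlier, and hashing the whole record means two genuinely identical log lines would collapse into one.

```python
import hashlib
import json


def record_id(record: dict) -> str:
    """Derive a deterministic ID from a record's content (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"timestamp": "2024-05-17T09:30:00Z", "severity": "INFO", "body": "started"}
b = {"body": "started", "severity": "INFO", "timestamp": "2024-05-17T09:30:00Z"}
print(record_id(a) == record_id(b))  # True
```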
