Merged
236 changes: 107 additions & 129 deletions README.md
@@ -1,148 +1,126 @@
# databricks-jdbc
Repository for Java connector for Databricks
# Databricks JDBC Driver
> **Collaborator:** I think we should add our public documentation link here for more details.
>
> **Collaborator Author:** Done.


**Status**: In Development
The Databricks JDBC driver implements the JDBC interface, providing connectivity to a Databricks SQL warehouse.
Please refer to [Databricks documentation](https://docs.databricks.com/aws/en/integrations/jdbc-oss/) for more
information.

The Databricks JDBC driver implements the JDBC interface, providing connectivity to a Databricks SQL warehouse.
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

## Getting started
You can install the Databricks JDBC driver by adding the following to your `pom.xml`:
## Prerequisites

```pom.xml
Databricks JDBC is compatible with Java 11 and higher. CI testing runs on Java versions 11, 17, and 21.

## Installation

### Maven

Add the following dependency to your `pom.xml`:

```xml
<dependency>
<groupId>com.databricks</groupId>
<artifactId>databricks-jdbc</artifactId>
<version>1.0.4-oss</version>
</dependency>
```
Databricks JDBC is compatible with Java 11 and higher. CI testing runs on Java versions 11, 17, and 21.
## Instructions for building
From the development or main branch, run `mvn clean package`.

The jar file is generated as `target/databricks-jdbc-oss-jar-with-dependencies.jar`.

## Authentication
The JDBC driver supports the following authentication modes:

1. Personal Access Tokens: Set `AuthMech=3` in the connection string to use a personal access token, supplied through the `PWD` property.
2. OAuth2: Set `AuthMech=11` to use OAuth2. OAuth2 is supported only on Azure and AWS.
   - Access Token: Set `Auth_Flow=0` to pass through an access token using the `PWD` property.
   - Client Credentials: Set `Auth_Flow=1` to use the machine-to-machine OAuth flow.
   - Browser-based OAuth: Set `Auth_Flow=2` to use the user-to-machine OAuth flow.

## Integration Tests
> **Collaborator:** Is this covered anywhere? Otherwise, we can keep the part about running the tests here and move the full documentation to a Confluence page.
>
> **Collaborator Author:** There is a separate page for integration tests in the docs folder.

The project includes a suite of integration tests located in
`src/test/java/com/databricks/jdbc/integration/fakeservice/tests`. Each test runs against a fake-service for each
corresponding production service, namely `SQL_EXEC`/`SQL_GATEWAY` and `DBFS`. The [fake-service](./src/test/java/com/databricks/jdbc/integration/fakeservice/FakeServiceExtension.java)
is based on the open-source project [WireMock](https://wiremock.org/). The tests can run in the following
fake-service modes, controlled by the environment variable <u>`FAKE_SERVICE_TEST_MODE`</u>:

1. `RECORD`: In this mode, the fake-service will record the responses from the production service and save them to the
corresponding directory in `/src/test/resources/`. This mode is useful for updating the responses when the contract with
the production service changes.
2. `REPLAY` (default): In this mode, the fake-service will replay the recorded responses saved in the corresponding
directory in `/src/test/resources/`. This mode is useful for running the tests without connecting to the production
service.
3. `DRY`: In this mode, the tests will run against the production service and the fake-service will simply act as a
pass-through proxy, meaning it neither records nor replays the responses. This mode is useful for debugging and
authoring the tests.

### Running Integration Tests
The driver supports both SQL-Execution (default) and Thrift clients. Integration tests can run with either client,
selected by setting the environment variable <u>`USE_THRIFT_CLIENT`</u> to `true` (Thrift) or `false` (SQL-Execution).
By default, tests use the SQL-Execution client. Depending on this setting, either the `SQL_EXEC` or `SQL_GATEWAY`
(Thrift) fake-service is used, and test properties such as `HTTP_PATH`, `DATABRICKS_HOST`, `CATALOG`, `SCHEMA`,
etc., are loaded accordingly.
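The following is a hypothetical sketch of how the two environment variables described above could be interpreted. It is not the project's `FakeServiceExtension` code; the class and method names are illustrative, and it only mirrors the documented defaults (`REPLAY` mode, SQL-Execution client with the `SQL_EXEC` fake-service) and the rule that `RECORD` and `DRY` need `DATABRICKS_TOKEN`.

```java
import java.util.Locale;

// Hypothetical illustration only; not the project's fake-service implementation.
final class FakeServiceSettings {

    enum Mode { RECORD, REPLAY, DRY }

    enum FakeService { SQL_EXEC, SQL_GATEWAY }

    // FAKE_SERVICE_TEST_MODE is parsed case-insensitively (the example commands use
    // "replay" and "record") and defaults to REPLAY when unset or blank.
    // Unknown values fail fast with IllegalArgumentException from Enum.valueOf.
    static Mode mode(String raw) {
        if (raw == null || raw.isBlank()) {
            return Mode.REPLAY;
        }
        return Mode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    // USE_THRIFT_CLIENT=true selects the Thrift client and the SQL_GATEWAY fake-service;
    // anything else (including unset) keeps the SQL-Execution client and SQL_EXEC.
    static FakeService service(String useThriftClient) {
        return Boolean.parseBoolean(useThriftClient) ? FakeService.SQL_GATEWAY : FakeService.SQL_EXEC;
    }

    // RECORD and DRY reach the production service, so they need a personal access token.
    static boolean needsToken(Mode mode) {
        return mode != Mode.REPLAY;
    }

    public static void main(String[] args) {
        Mode mode = mode(System.getenv("FAKE_SERVICE_TEST_MODE"));
        FakeService service = service(System.getenv("USE_THRIFT_CLIENT"));
        System.out.println(mode + " against " + service + ", token required: " + needsToken(mode));
    }
}
```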

Running [connection](./src/test/java/com/databricks/jdbc/integration/fakeservice/tests/ConnectionIntegrationTests.java)
tests in `REPLAY` mode using `SQL_GATEWAY`:

### Build from Source

1. Clone the repository.
2. Run the following command:
   ```bash
   mvn clean package
   ```
3. The jar file is generated as `target/databricks-jdbc-<version>.jar`.
4. The test coverage report is generated in `target/site/jacoco/index.html`.

## Usage

### Connection String

```
USE_THRIFT_CLIENT=true FAKE_SERVICE_TEST_MODE=replay mvn -Dtest=com.databricks.jdbc.integration.fakeservice.tests.ConnectionIntegrationTests test
jdbc:databricks://<host>:<port>;transportMode=http;ssl=1;AuthMech=3;httpPath=<path>;UID=token;PWD=<token>
```
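A minimal Java sketch of using the connection string format above with a personal access token. The host, HTTP path, and the `DATABRICKS_TOKEN` environment variable are placeholders chosen for illustration, and the sketch assumes the driver jar is on the classpath and registers itself with `DriverManager` through the standard JDBC 4 service mechanism.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: open a PAT-authenticated connection and run a trivial query.
final class PatConnectionExample {

    // Builds a URL in the documented shape:
    // jdbc:databricks://<host>:<port>;transportMode=http;ssl=1;AuthMech=3;httpPath=<path>;UID=token;PWD=<token>
    static String buildPatUrl(String host, int port, String httpPath, String token) {
        return String.format(
                "jdbc:databricks://%s:%d;transportMode=http;ssl=1;AuthMech=3;httpPath=%s;UID=token;PWD=%s",
                host, port, httpPath, token);
    }

    // Runs SELECT 1 and returns the value; resources are closed by try-with-resources.
    static int selectOne(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (!rs.next()) {
                throw new SQLException("SELECT 1 returned no rows");
            }
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: the token is supplied through an environment variable, not hard-coded.
        String token = System.getenv("DATABRICKS_TOKEN");
        if (token == null) {
            System.err.println("Set DATABRICKS_TOKEN to a personal access token first.");
            return;
        }
        String url = buildPatUrl("your-databricks-host", 443,
                "/sql/1.0/warehouses/your-warehouse-id", token);
        System.out.println("SELECT 1 -> " + selectOne(url));
    }
}
```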

Running all tests in `REPLAY` mode using `SQL_EXEC`:
### Authentication
> **Collaborator:** We can add a link to the public documentation for configuration properties, for readers who want more detail on authentication.
>
> **Collaborator Author:** It seems that our public documentation doesn't include all authentication types, and we also plan to exclude them from the README. Therefore, I believe this comment is now unnecessary.


The JDBC driver supports the following authentication methods:

#### Personal Access Token (PAT)

Use `AuthMech=3` for personal access token authentication:

```
USE_THRIFT_CLIENT=false FAKE_SERVICE_TEST_MODE=replay mvn -Dtest=*IntegrationTests test
AuthMech=3;UID=token;PWD=<your_token>
```
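As an alternative to embedding the token in the URL, the same `AuthMech=3;UID=token;PWD=<your_token>` settings can be expressed as `java.util.Properties` and passed to `DriverManager.getConnection(String, Properties)`. This is a sketch under an assumption: it presumes the driver reads these keys from connection properties as well as from the URL, which this README does not state explicitly. The class name and base URL are illustrative.

```java
import java.util.Properties;

// Sketch: keep credentials out of the URL string (and out of any log line that prints it).
// Assumes the driver honours AuthMech/UID/PWD when they arrive via Properties.
final class PatPropertiesExample {

    // Base URL without credentials; host and warehouse path are placeholders.
    static final String BASE_URL =
            "jdbc:databricks://your-databricks-host:443;transportMode=http;ssl=1;"
                    + "httpPath=/sql/1.0/warehouses/your-warehouse-id";

    // Properties equivalent of "AuthMech=3;UID=token;PWD=<your_token>".
    static Properties patProperties(String token) {
        if (token == null || token.isBlank()) {
            throw new IllegalArgumentException("token must be non-empty");
        }
        Properties props = new Properties();
        props.setProperty("AuthMech", "3");
        props.setProperty("UID", "token");
        props.setProperty("PWD", token);
        return props;
    }
}
```

Usage: `DriverManager.getConnection(PatPropertiesExample.BASE_URL, PatPropertiesExample.patProperties(System.getenv("DATABRICKS_TOKEN")))`.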

To run tests in either `RECORD` or `DRY` mode, set a personal access token in the <u>`DATABRICKS_TOKEN`</u> environment
variable.
#### OAuth2 Authentication

Use `AuthMech=11` for OAuth2-based authentication. Several OAuth flows are supported:

##### Token Passthrough

Use an existing OAuth access token directly:

```
AuthMech=11;Auth_Flow=0;Auth_AccessToken=<your_access_token>
```

##### OAuth Client Credentials (Machine-to-Machine)

Configure the standard OAuth client credentials flow:

Running [execution](./src/test/java/com/databricks/jdbc/integration/fakeservice/tests/ExecutionIntegrationTests.java)
tests in `RECORD` mode using `SQL_EXEC`:
```
DATABRICKS_TOKEN=<personal-access-token> USE_THRIFT_CLIENT=false FAKE_SERVICE_TEST_MODE=record mvn -Dtest=com.databricks.jdbc.integration.fakeservice.tests.ExecutionIntegrationTests test
AuthMech=11;Auth_Flow=1;OAuth2ClientId=<client_id>;OAuth2Secret=<client_secret>
```
This will replace the recorded responses with the new responses from the production services.

## Logging

The driver supports both [SLF4J](https://www.slf4j.org/) and [JUL](https://docs.oracle.com/javase/8/docs/api/java/util/logging/package-summary.html) logging frameworks.

- __SLF4J__: SLF4J logging can be enabled by setting the system property `-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER`.
Customers need to provide an SLF4J binding implementation and a corresponding configuration file on the classpath,
which gives them the freedom to adapt the JDBC logging to their needs.
Example of using SLF4J with Log4j2, with dependencies in `pom.xml` and configuration in `log4j2.xml`:

```
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
</dependency>
```

```
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<!-- Console appender for default logging -->
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n"/>
</Console>
</Appenders>

<Loggers>
<!-- Root logger to catch any logs that don't match other loggers -->
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
```

- __Java Util Logging (JUL)__: JUL logging can be enabled by setting the system property
`-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER`. By default, the JDBC driver uses the JUL logging framework, which
provides an out-of-the-box logging implementation without dependencies external to the JDK. There are two ways to
configure JUL logging in the JDBC driver:
- __JDBC URL__: Standard logging parameters, namely `logLevel`, `logPath`, `logFileSize` (MB), and `logFileCount`, can
be passed in the JDBC URL. Example:

```
jdbc:databricks://your-databricks-host:443;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/your-warehouse-id;UID=token;logLevel=DEBUG;logPath=/path/to/dir;logFileSize=10;logFileCount=5
```

- __Configuration File__: The logging properties can also be set in a `logging.properties` file. The file should be
present in the classpath. Example:

```
handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level=INFO
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.pattern=/path/to/dir/databricks-jdbc.log
java.util.logging.FileHandler.limit=10000000
java.util.logging.FileHandler.count=5
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
```

Optional parameters:
- `AzureTenantId`: Azure tenant ID for Azure Databricks (default: null). If set, the driver includes refreshed
Azure Active Directory (AAD) Service Principal OAuth tokens with every request.

##### Browser-Based OAuth

Interactive browser-based OAuth flow with PKCE:

```
AuthMech=11;Auth_Flow=2
```

Optional parameters:
- `OAuth2ClientId`: Client ID for OAuth2 (default: `databricks-cli`)
- `OAuth2RedirectUrlPort`: Port(s) for the redirect URL (default: `8020`)
- `EnableOIDCDiscovery`: Enable OIDC discovery (default: `1`)
- `OAuthDiscoveryURL`: OIDC discovery endpoint (default: `/oidc/.well-known/oauth-authorization-server`)
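The three OAuth flows above differ only in `Auth_Flow` and the parameters that accompany it. The sketch below assembles those parameter strings; the class, enum, and method names are hypothetical, and only the property names and values (`AuthMech`, `Auth_Flow`, `Auth_AccessToken`, `OAuth2ClientId`, `OAuth2Secret`, `AzureTenantId`) come from this README. The resulting string would be appended to a base URL such as the one in the connection string section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for the AuthMech=11 parameter strings shown above.
final class OAuthParams {

    enum Flow {
        TOKEN_PASSTHROUGH(0), CLIENT_CREDENTIALS(1), BROWSER(2);

        final int code;

        Flow(int code) {
            this.code = code;
        }
    }

    // Joins key=value pairs with ';' in insertion order, matching the connection-string syntax.
    static String join(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner(";");
        params.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    static Map<String, String> base(Flow flow) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("AuthMech", "11");
        params.put("Auth_Flow", Integer.toString(flow.code));
        return params;
    }

    static String tokenPassthrough(String accessToken) {
        Map<String, String> params = base(Flow.TOKEN_PASSTHROUGH);
        params.put("Auth_AccessToken", accessToken);
        return join(params);
    }

    // azureTenantId is optional; pass null to omit it.
    static String clientCredentials(String clientId, String clientSecret, String azureTenantId) {
        Map<String, String> params = base(Flow.CLIENT_CREDENTIALS);
        params.put("OAuth2ClientId", clientId);
        params.put("OAuth2Secret", clientSecret);
        if (azureTenantId != null) {
            params.put("AzureTenantId", azureTenantId);
        }
        return join(params);
    }

    // Browser flow with only the required parameters; optional ones such as
    // OAuth2RedirectUrlPort could be added to the map before joining.
    static String browser() {
        return join(base(Flow.BROWSER));
    }
}
```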

### Logging

The driver supports both SLF4J and Java Util Logging (JUL) frameworks:

- **SLF4J**: Enable with `-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER`
- **JUL**: Enable with `-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER` (default)

For detailed logging configuration options, see [Logging Documentation](./docs/LOGGING.md).

## Running Tests
> **Collaborator** (@gopalldb, Apr 24, 2025): We can also add the JVM option for `java.nio` under the section on using the driver:
> `--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED`
>
> **Collaborator Author:** Done.

> **Collaborator:** Will this cover running the fake-service tests as well?
>
> **Collaborator Author:** For fake-service tests, I have added a separate, detailed page in the docs folder.


Basic test execution:

```bash
mvn test
```

**Note**: A change in JDK 16 introduced a compatibility issue with the Apache Arrow library used by the JDBC driver, so
runtime errors may occur when using the driver with JDK 16 or later. To avoid these errors, restart your application
with the following JVM option:

```
--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
```
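To confirm at runtime that the option took effect, an application can ask the module system whether `java.nio` is open to the unnamed module, which is what the `ALL-UNNAMED` target grants when the driver and Arrow are loaded from the classpath. This diagnostic is a sketch and not part of the driver; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;

// Diagnostic sketch: detect whether the --add-opens option above is in effect.
final class ArrowNioCheck {

    // True if java.base opens java.nio to the unnamed module this class lives in
    // (classes loaded from the classpath belong to the unnamed module).
    static boolean nioOpenToUnnamedModule() {
        Module javaBase = ByteBuffer.class.getModule();
        Module unnamed = ArrowNioCheck.class.getModule();
        return javaBase.isOpen("java.nio", unnamed);
    }

    // JDK 9-15 permitted this reflective access by default (with a warning),
    // so the option only matters from JDK 16 onward.
    static boolean addOpensNeeded() {
        return Runtime.version().feature() >= 16 && !nioOpenToUnnamedModule();
    }

    public static void main(String[] args) {
        if (addOpensNeeded()) {
            System.err.println("Restart the JVM with "
                    + "--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED");
        } else {
            System.out.println("java.nio access for Arrow looks fine.");
        }
    }
}
```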

For more detailed information about integration tests and fake services, see [Testing Documentation](./docs/testing.md).

## Documentation

For more information, see the following resources:
- [Integration Tests Guide](./docs/testing.md)
- [Logging Configuration](./docs/LOGGING.md)
94 changes: 94 additions & 0 deletions docs/LOGGING.md
@@ -0,0 +1,94 @@
# Logging Configuration

The Databricks JDBC driver supports both [SLF4J](https://www.slf4j.org/) and [JUL](https://docs.oracle.com/javase/8/docs/api/java/util/logging/package-summary.html) logging frameworks.

## SLF4J Logging

SLF4J logging can be enabled by setting the system property:
```
-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER
```

You need to provide an SLF4J binding implementation and a corresponding configuration file on the classpath. This gives you the freedom to adapt the JDBC logging to your specific needs.
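If passing a `-D` flag is inconvenient, the same system property can be set from application code. This is a sketch under an assumption: it presumes the driver reads `com.databricks.jdbc.loggerImpl` when it initializes logging, so the property must be set before the driver is first used. This page documents only the `-D` form, and the class name is illustrative.

```java
// Sketch: choose the SLF4J backend from code, without overriding an explicit -D setting.
// Assumes the property is read when the driver first initializes its logging.
final class LoggerSelection {

    static final String LOGGER_IMPL_PROPERTY = "com.databricks.jdbc.loggerImpl";

    // Sets SLF4JLOGGER only if no value was passed on the command line,
    // then returns whichever value is now in effect.
    static String selectSlf4jIfUnset() {
        if (System.getProperty(LOGGER_IMPL_PROPERTY) == null) {
            System.setProperty(LOGGER_IMPL_PROPERTY, "SLF4JLOGGER");
        }
        return System.getProperty(LOGGER_IMPL_PROPERTY);
    }
}
```

Call `LoggerSelection.selectSlf4jIfUnset()` early in `main`, before opening the first connection.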

### Example: Using SLF4J with Log4j2

Add the following dependencies to your `pom.xml`:

```xml
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
</dependency>
```

Create a `log4j2.xml` configuration file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<!-- Console appender for default logging -->
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n"/>
</Console>
</Appenders>

<Loggers>
<!-- Root logger to catch any logs that don't match other loggers -->
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>
```

## Java Util Logging (JUL)

JUL logging is enabled by default, or can be explicitly set with:
```
-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER
```

There are two ways to configure JUL logging:

### 1. JDBC URL Parameters

Standard logging parameters can be passed in the JDBC URL:

```
jdbc:databricks://your-databricks-host:443;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/your-warehouse-id;UID=token;logLevel=DEBUG;logPath=/path/to/dir;logFileSize=10;logFileCount=5
```

Available parameters:
- `logLevel`: Logging level (e.g., DEBUG, INFO)
- `logPath`: Directory path for log files
- `logFileSize`: Maximum size of each log file in MB
- `logFileCount`: Maximum number of log files to keep
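The four parameters above can be appended to any existing connection URL. The helper below is a hypothetical sketch; the parameter names and the MB unit for `logFileSize` come from this page, and the validation rule (positive size and count) is an assumption made for illustration.

```java
// Hypothetical helper: append the JUL logging parameters to a connection URL.
final class LoggingUrl {

    static String withLogging(String baseUrl, String logLevel, String logPath,
                              int logFileSizeMb, int logFileCount) {
        if (logFileSizeMb <= 0 || logFileCount <= 0) {
            throw new IllegalArgumentException("logFileSize and logFileCount must be positive");
        }
        // Properties are separated by ';'; avoid doubling it if the base URL already ends with one.
        String separator = baseUrl.endsWith(";") ? "" : ";";
        return baseUrl + separator
                + "logLevel=" + logLevel
                + ";logPath=" + logPath
                + ";logFileSize=" + logFileSizeMb
                + ";logFileCount=" + logFileCount;
    }
}
```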

### 2. Configuration File

Logging properties can also be set in a `logging.properties` file in the classpath:

```properties
handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level=INFO
java.util.logging.FileHandler.level=ALL
java.util.logging.FileHandler.pattern=/path/to/dir/databricks-jdbc.log
java.util.logging.FileHandler.limit=10000000
java.util.logging.FileHandler.count=5
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.ConsoleHandler.level=ALL
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
```
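This page does not say how the driver picks up `logging.properties`, so the sketch below shows an application-side option: loading the resource from the classpath and applying it with the standard `LogManager.readConfiguration(InputStream)` API. Note that `readConfiguration` resets all existing JUL handlers and levels before applying the new properties. The class name is illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.logging.LogManager;

// Sketch: explicitly apply a logging.properties resource found on the classpath.
final class ClasspathLoggingConfig {

    // Returns false if the resource is missing; otherwise replaces the JUL configuration.
    static boolean loadFromClasspath(String resourceName) throws IOException {
        // try-with-resources skips close() when the resource is null.
        try (InputStream in = ClasspathLoggingConfig.class.getClassLoader()
                .getResourceAsStream(resourceName)) {
            if (in == null) {
                return false;
            }
            apply(in);
            return true;
        }
    }

    // Resets existing handlers and levels, then reads the new properties.
    static void apply(InputStream in) throws IOException {
        LogManager.getLogManager().readConfiguration(in);
    }
}
```

Usage: `ClasspathLoggingConfig.loadFromClasspath("logging.properties")` at application startup.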