databricks · jayantsing-db · Apr 28, 2025 · Apr 23, 2025 · Apr 28, 2025 · gopalldb
diff --git a/README.md b/README.md
@@ -1,148 +1,126 @@
-# databricks-jdbc
-Repository for Java connector for Databricks
+# Databricks JDBC Driver
 
-**Status**: In Development
+The Databricks JDBC driver implements the JDBC interface, providing connectivity to a Databricks SQL warehouse.
+See the [Databricks documentation](https://docs.databricks.com/aws/en/integrations/jdbc-oss/) for more
+information.
 
-The Databricks JDBC driver implements the JDBC interface providing connectivity to a Databricks SQL warehouse
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 
-## Getting started
-You can install Databricks JDBC driver by adding the following to your `pom.xml`:
+## Prerequisites
 
-```pom.xml
+Databricks JDBC is compatible with Java 11 and higher. CI testing runs on Java versions 11, 17, and 21.
+
+## Installation
+
+### Maven
+
+Add the following dependency to your `pom.xml`:
+
+```xml
 <dependency>
   <groupId>com.databricks</groupId>
   <artifactId>databricks-jdbc</artifactId>
   <version>1.0.4-oss</version>
 </dependency>
 ```
-Databricks JDBC is compatible with Java 11 and higher. CI testing runs on Java versions 11, 17, and 21.
-## Instructions for building
-From development or main branch, run `mvn clean package`
-
-The jar file is generated as target/databricks-jdbc-oss-jar-with-dependencies.jar
-
-## Authentication
-The JDBC driver supports following modes for authentication:
-
-1. Personal Access Tokens: Set AuthMech=3 in connection string to use Personal Access Tokens, which can be set using PWD property.
-2. OAuth2: Set AuthMech=11 for using OAuth2. We only support Azure and AWS as cloud providers for OAuth2.
-   - Access Token: Set Auth_Flow=0 for providing passthrough access token using PWD property.
-   - Client Credentials: Set Auth_Flow=1 for using Machine-to-machine OAuth flow.
-   - Browser based OAuth: Set Auth_Flow=2 for using User-to-machine OAuth flow.
-
-## Integration Tests
-The project includes a suite of integration tests located in the
-`src/test/java/com/databricks/jdbc/integration/fakeservice/tests`. Each test runs against a set of fake-services
-corresponding to each production service, namely `SQL_EXEC`/`SQL_GATEWAY` and `DBFS`. The [fake-service](./src/test/java/com/databricks/jdbc/integration/fakeservice/FakeServiceExtension.java)
-is based on the open-source project [WireMock](https://wiremock.org/). The tests can be run in the following
-fake-service modes controlled by the environment variable <u>`FAKE_SERVICE_TEST_MODE`</u>:
-
-1. `RECORD`: In this mode, the fake-service will record the responses from the production service and save them to the
-   corresponding directory in `/src/test/resources/`. This mode is useful for updating the responses when contract with
-   the production service changes.
-2. `REPLAY` (default): In this mode, the fake-service will replay the recorded responses saved in the corresponding
-   directory in `/src/test/resources/`. This mode is useful for running the tests without connecting to the production
-   service.
-3. `DRY`: In this mode, the tests will run against the production service and the fake-service will simply act as a
-   pass-through proxy, meaning it neither records nor replays the responses. This mode is useful for debugging and
-   authoring the tests.
-
-### Running Integration Tests
-The driver supports both SQL-Execution (default) and Thrift clients. Integration tests can be executed using either the
-SQL-Execution or Thrift client, determined by setting the environment variable <u>`USE_THRIFT_CLIENT`</u> to `true` or
-`false`. By default, tests run using the SQL-Execution client. Depending on the environment, either the `SQL_EXEC` or
-`SQL_GATEWAY` (Thrift) fake-service is used, and test properties such as `HTTP_PATH`, `DATABRICKS_HOST`, `CATALOG`,
-`SCHEMA`, etc., are loaded accordingly.
-
-Running [connection](./src/test/java/com/databricks/jdbc/integration/fakeservice/tests/ConnectionIntegrationTests.java)
-tests in `REPLAY` mode using `SQL_GATEWAY`:
+
+### Build from Source
+
+1. Clone the repository
+2. Run the following command:
+   ```bash
+   mvn clean package
+   ```
+3. The jar file is generated as `target/databricks-jdbc-<version>.jar`
+4. The test coverage report is generated in `target/site/jacoco/index.html`
+
+## Usage
+
+### Connection String
+
 ```
-USE_THRIFT_CLIENT=true FAKE_SERVICE_TEST_MODE=replay mvn -Dtest=com.databricks.jdbc.integration.fakeservice.tests.ConnectionIntegrationTests test
+jdbc:databricks://<host>:<port>;transportMode=http;ssl=1;AuthMech=3;httpPath=<path>;UID=token;PWD=<token>
 ```
 
-Running all tests in `REPLAY` mode using `SQL_EXEC`:
+### Authentication
+
+The JDBC driver supports the following authentication methods:
+
+#### Personal Access Token (PAT)
+
+Use `AuthMech=3` for personal access token authentication:
+
 ```
-USE_THRIFT_CLIENT=false FAKE_SERVICE_TEST_MODE=replay mvn -Dtest=*IntegrationTests test
+AuthMech=3;UID=token;PWD=<your_token>
 ```
 
-To run tests in either `RECORD` or `DRY` mode, set a personal access token in the <u>`DATABRICKS_TOKEN`</u> environment
-variable.
+#### OAuth2 Authentication
+
+Use `AuthMech=11` for OAuth2-based authentication. Several OAuth flows are supported:
+
+##### Token Passthrough
+
+Direct use of an existing OAuth token:
+
+```
+AuthMech=11;Auth_Flow=0;Auth_AccessToken=<your_access_token>
+```
+
+##### OAuth Client Credentials (Machine-to-Machine)
+
+Configure the standard OAuth client credentials flow:
 
-Running [execution](./src/test/java/com/databricks/jdbc/integration/fakeservice/tests/ExecutionIntegrationTests.java)
-tests in `RECORD` mode using `SQL_EXEC`:
 ```
-DATABRICKS_TOKEN=<personal-access-token> USE_THRIFT_CLIENT=false FAKE_SERVICE_TEST_MODE=record mvn -Dtest=com.databricks.jdbc.integration.fakeservice.tests.ExecutionIntegrationTests test
+AuthMech=11;Auth_Flow=1;OAuth2ClientId=<client_id>;OAuth2Secret=<client_secret>
 ```
-This will replace the recorded responses with the new responses from the production services.
-
-## Logging
-
-The driver supports both [SLF4J](https://www.slf4j.org/) and [JUL](https://docs.oracle.com/javase/8/docs/api/java/util/logging/package-summary.html) logging frameworks.
-
-- __SLF4J__: SLF4J logging can be enabled by setting the system property `-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER`.
-  Customers need to provide the SLF4J binding implementation and corresponding configuration file in the classpath.
-  The intention is to give freedom to customers to adapt the JDBC logging as per their needs.
-  Example of using SLF4J with Log4j2; dependencies and configuration in `pom.xml` and `log4j2.xml` respectively:
-
-  ```
-  <dependency>
-    <groupId>org.apache.logging.log4j</groupId>
-    <artifactId>log4j-slf4j2-impl</artifactId>
-    <version>${log4j.version}</version>
-  </dependency>
-  <dependency>
-    <groupId>org.apache.logging.log4j</groupId>
-    <artifactId>log4j-core</artifactId>
-    <version>${log4j.version}</version>
-  </dependency>
-  <dependency>
-    <groupId>org.apache.logging.log4j</groupId>
-    <artifactId>log4j-api</artifactId>
-    <version>${log4j.version}</version>
-  </dependency>
-  ```
-
-  ```
-   <?xml version="1.0" encoding="UTF-8"?>
-   <Configuration status="WARN">
-       <Appenders>
-           <!-- Console appender for default logging -->
-           <Console name="Console" target="SYSTEM_OUT">
-               <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n"/>
-           </Console>
-       </Appenders>
-
-       <Loggers>
-           <!-- Root logger to catch any logs that don't match other loggers -->
-           <Root level="info">
-               <AppenderRef ref="Console"/>
-           </Root>
-       </Loggers>
-   </Configuration>
-  ```
-
-- __Java Util Logging (JUL)__: JUL logging can be enabled by setting the system property
-  `-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER`. By default, JDBC driver uses the JUL logging framework. The intention is
-  to provide an out-of-the-box logging implementation without dependencies external to the JDK. There are two ways to
-  configure JUL logging in the JDBC driver:
-  - __JDBC URL__: Standard logging parameters namely, `logLevel`, `logPath`, `logFileSize` (MB), and `logFileCount`can
-    be passed in the JDBC URL. Example:
-
-    ```
-    jdbc:databricks://your-databricks-host:443;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/your-warehouse-id;UID=token;logLevel=DEBUG;logPath=/path/to/dir;logFileSize=10;logFileCount=5
-    ```
-
-  - __Configuration File__: The logging properties can also be set in a `logging.properties` file. The file should be
-    present in the classpath. Example:
-
-    ```
-    handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler
-    .level=INFO
-    java.util.logging.FileHandler.level=ALL
-    java.util.logging.FileHandler.pattern=/path/to/dir/databricks-jdbc.log
-    java.util.logging.FileHandler.limit=10000000
-    java.util.logging.FileHandler.count=5
-    java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
-    java.util.logging.ConsoleHandler.level=ALL
-    java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
-    ```
+
+Optional parameters:
+- `AzureTenantId`: Azure tenant ID for Azure Databricks (default: null). If set, the driver will include refreshed
+  Azure Active Directory (AAD) Service Principal OAuth tokens with every request.
+
+##### Browser-Based OAuth
+
+Interactive browser-based OAuth flow with PKCE:
+
+```
+AuthMech=11;Auth_Flow=2
+```
+
+Optional parameters:
+- `OAuth2ClientId`: Client ID for OAuth2 (default: databricks-cli)
+- `OAuth2RedirectUrlPort`: Ports for redirect URL (default: 8020)
+- `EnableOIDCDiscovery`: Enable OIDC discovery (default: 1)
+- `OAuthDiscoveryURL`: OIDC discovery endpoint (default: /oidc/.well-known/oauth-authorization-server)
+
+### Logging
+
+The driver supports both SLF4J and Java Util Logging (JUL) frameworks:
+
+- **SLF4J**: Enable with `-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER`
+- **JUL**: Enable with `-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER` (default)
+
+For detailed logging configuration options, see [Logging Documentation](./docs/LOGGING.md).
+
+## Running Tests
+
+Basic test execution:
+
+```bash
+mvn test
+```
+
+**Note**: JDK 16 introduced a change that is incompatible with the Apache Arrow library used by the JDBC
+driver, so runtime errors may occur when using the driver with JDK 16 or later. To avoid these errors, start your
+application or driver with the following JVM option:
+
+```
+--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
+```
+
+For more detailed information about integration tests and fake services, see [Testing Documentation](./docs/testing.md).
+
+## Documentation
+
+For more information, see the following resources:
+- [Integration Tests Guide](./docs/testing.md)
+- [Logging Configuration](./docs/LOGGING.md)
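
Review note on the README hunk above: the connection-string format and the `AuthMech` fragments are documented separately, so a short sketch of how they might combine from Java may help readers. This is an illustration, not driver code. The class `DatabricksUrlSketch`, its method names, and the environment variable names are hypothetical, and the position of the OAuth keys within the full URL is an assumption (only the fragment is given in the README). The `main` method needs the `databricks-jdbc` dependency on the classpath and a reachable SQL warehouse.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper, not part of the driver: assembles the URL formats shown in the README. */
final class DatabricksUrlSketch {

    private DatabricksUrlSketch() {}

    /** Personal access token: AuthMech=3, UID is the literal string "token", PWD is the token. */
    static String patUrl(String host, int port, String httpPath, String token) {
        return "jdbc:databricks://" + host + ":" + port
                + ";transportMode=http;ssl=1;AuthMech=3;httpPath=" + httpPath
                + ";UID=token;PWD=" + token;
    }

    /** OAuth client credentials (machine-to-machine): AuthMech=11 with Auth_Flow=1. */
    static String clientCredentialsUrl(String host, int port, String httpPath,
                                       String clientId, String clientSecret) {
        // Assumption: the OAuth fragment can be placed anywhere among the URL properties.
        return "jdbc:databricks://" + host + ":" + port
                + ";transportMode=http;ssl=1;AuthMech=11;Auth_Flow=1;httpPath=" + httpPath
                + ";OAuth2ClientId=" + clientId + ";OAuth2Secret=" + clientSecret;
    }

    public static void main(String[] args) throws SQLException {
        // Environment variable names are assumptions chosen for this sketch.
        String host = System.getenv("DATABRICKS_HOST");
        String httpPath = System.getenv("DATABRICKS_HTTP_PATH");
        String token = System.getenv("DATABRICKS_TOKEN");
        if (host == null || httpPath == null || token == null) {
            System.out.println("Set DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN to connect.");
            return;
        }
        // Standard JDBC: the driver is located through DriverManager once it is on the classpath.
        try (Connection conn = DriverManager.getConnection(patUrl(host, 443, httpPath, token));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

Keeping URL assembly in small pure functions makes the format easy to unit-test without a warehouse; reading the token from the environment avoids hard-coding secrets in source.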
diff --git a/docs/LOGGING.md b/docs/LOGGING.md
@@ -0,0 +1,94 @@
+# Logging Configuration
+
+The Databricks JDBC driver supports both [SLF4J](https://www.slf4j.org/) and [JUL](https://docs.oracle.com/javase/8/docs/api/java/util/logging/package-summary.html) logging frameworks.
+
+## SLF4J Logging
+
+SLF4J logging can be enabled by setting the system property:
+```
+-Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER
+```
+
+You need to provide an SLF4J binding implementation and corresponding configuration file in the classpath. This gives you the freedom to adapt the JDBC logging to your specific needs.
+
+### Example: Using SLF4J with Log4j2
+
+Add the following dependencies to your `pom.xml`:
+
+```xml
+<dependency>
+  <groupId>org.apache.logging.log4j</groupId>
+  <artifactId>log4j-slf4j2-impl</artifactId>
+  <version>${log4j.version}</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.logging.log4j</groupId>
+  <artifactId>log4j-core</artifactId>
+  <version>${log4j.version}</version>
+</dependency>
+<dependency>
+  <groupId>org.apache.logging.log4j</groupId>
+  <artifactId>log4j-api</artifactId>
+  <version>${log4j.version}</version>
+</dependency>
+```
+
+Create a `log4j2.xml` configuration file:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<Configuration status="WARN">
+    <Appenders>
+        <!-- Console appender for default logging -->
+        <Console name="Console" target="SYSTEM_OUT">
+            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n"/>
+        </Console>
+    </Appenders>
+
+    <Loggers>
+        <!-- Root logger to catch any logs that don't match other loggers -->
+        <Root level="info">
+            <AppenderRef ref="Console"/>
+        </Root>
+    </Loggers>
+</Configuration>
+```
+
+## Java Util Logging (JUL)
+
+JUL logging is enabled by default and can also be selected explicitly with:
+```
+-Dcom.databricks.jdbc.loggerImpl=JDKLOGGER
+```
+
+There are two ways to configure JUL logging:
+
+### 1. JDBC URL Parameters
+
+Standard logging parameters can be passed in the JDBC URL:
+
+```
+jdbc:databricks://your-databricks-host:443;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/your-warehouse-id;UID=token;logLevel=DEBUG;logPath=/path/to/dir;logFileSize=10;logFileCount=5
+```
+
+Available parameters:
+- `logLevel`: Logging level (e.g., DEBUG, INFO)
+- `logPath`: Directory path for log files
+- `logFileSize`: Maximum size of each log file in MB
+- `logFileCount`: Maximum number of log files to keep
+
+### 2. Configuration File
+
+Logging properties can also be set in a `logging.properties` file in the classpath:
+
+```properties
+handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler
+.level=INFO
+java.util.logging.FileHandler.level=ALL
+java.util.logging.FileHandler.pattern=/path/to/dir/databricks-jdbc.log
+java.util.logging.FileHandler.limit=10000000
+java.util.logging.FileHandler.count=5
+java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
+java.util.logging.ConsoleHandler.level=ALL
+java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
+```
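
Review note on the "Configuration File" section above: the doc says `logging.properties` is read from the classpath but does not show how it is applied. The sketch below uses only the standard `java.util.logging.LogManager` API to load such a file explicitly at application startup. It is an assumption-labeled illustration: the class `JulConfigSketch` and its methods are hypothetical, and whether the driver performs an equivalent load on its own is not shown in this diff.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Hypothetical helper, not part of the driver: applies JUL settings from a properties stream. */
final class JulConfigSketch {

    private JulConfigSketch() {}

    /** Replaces the current JUL configuration and returns the resulting root logger level. */
    static Level apply(InputStream config) {
        try {
            LogManager.getLogManager().readConfiguration(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Logger.getLogger("").getLevel();
    }

    /** Loads a properties resource (for example "logging.properties") from the classpath. */
    static Level applyFromClasspath(String resource) {
        InputStream in = JulConfigSketch.class.getClassLoader().getResourceAsStream(resource);
        if (in == null) {
            throw new IllegalArgumentException("Not found on classpath: " + resource);
        }
        try (InputStream stream = in) {
            return apply(stream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Console-only variant of the sample file, so running the sketch writes no log files.
        String config = String.join("\n",
                "handlers=java.util.logging.ConsoleHandler",
                ".level=INFO",
                "java.util.logging.ConsoleHandler.level=ALL",
                "java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter",
                "");
        Level root = apply(new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));
        Logger.getLogger("com.example.demo").info("JUL configured, root level " + root);
    }
}
```

Calling `JulConfigSketch.applyFromClasspath("logging.properties")` before opening a connection would apply the file from the section above. Note that `readConfiguration` resets existing handlers, so it is best called once, early in startup.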