
Commit 3444068

Task/69025 data lake (#510)
* Adds Data Lake documentation - Introduces documentation for the Data Lake feature, covering blob storage connections, catalog exploration, open file formats (Avro/Parquet), data replay, and data lake sink. This documentation aims to provide users with comprehensive information on how to connect to blob storage, explore datasets, understand the underlying data formats, replay data back to Kafka, and configure a sink for persisting data.
* Groups managed services - Refactors the navigation configuration to group related managed services under a common "Services" heading. This improves organization and readability of the documentation.
* Renames "Data Lake" to "Quix Lake" in documentation - Updates documentation to reflect the renaming of "Data Lake" to "Quix Lake". This includes updating titles, descriptions, and references to the storage layer across various documentation pages.
* Formats Quixlake API documentation - Updates the Quixlake API documentation to use proper Markdown headings for sections like Catalog, Data, Data Deletion, Metadata, and Security. This improves readability and organization of the documentation.
* Data Lake docs improvements
* Some more tweaks
* more
* last change

---------

Co-authored-by: Patrick Mira <[email protected]>
1 parent d22a679 commit 3444068

30 files changed: +988 −6 lines
Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
---
title: Blob storage connections
description: Connect your cluster to external object storage (S3, GCS, Azure Blob, MinIO) to enable Quix Lake.
---

# Blob storage connections

Connect your cluster to a bucket/container so Quix can enable **Quix Lake** or any other managed service that requires a blob storage connection.

![Connections list](../../images/blob-storage/connections-list-running.png)

!!! important "One connection per cluster"
    Each **cluster** supports **one** blob storage connection.
    You can configure different connections for different clusters.

???+ info "Quix Lake at a glance"
    **Summary** - Quix Lake persists Kafka topic data as **Avro/Parquet** in your own bucket (S3, GCS, Azure Blob, MinIO), partitioned for fast discovery and full-fidelity **Replay**.

    **Why it exists** - Preserve exact Kafka messages (timestamps, headers, partitions, offsets, gaps) with indexed metadata so **Catalog**, **Replay**, **Sinks**, and future services operate on open formats you control.

    **Key properties**

    - **Portable** - open Avro & Parquet
    - **Efficient** - Hive-style partitions + Parquet metadata
    - **Flexible** - historical + live workflows
    - **Replay** - preserves order, partitions, timestamps, headers, gaps

    **Flow** - **Ingest** (Avro) → **Index** (Parquet metadata) → **Discover** (Data Catalog & Metadata API) → **Replay** (full fidelity back to Kafka) → **Use** (explore, combine historical + live, run queries/export).

    [Learn more about Quix Lake →](../quix-cloud/quixlake/overview.md)

## Create a connection

1. **Settings → Blob storage connections → Create**
2. Pick **Cluster**, set **Display name**, choose **Provider**, and fill in the fields
3. **Test connection** (see below)
4. **Save**

## Test before saving

![Testing connection](../../images/blob-storage/test-connecting.png)

When you click **Test connection**, Quix runs a short round-trip check to make sure your details are correct and that the platform can both see and use your storage. A sketch of an equivalent check you can run yourself follows the steps below.

**Here’s what happens:**

1. **Connect** - Quix creates a storage client using the details you entered.
2. **Upload** - it writes a small temporary file into a `tmp/` folder in your bucket or container.
3. **Check visibility** - it confirms the file shows up in the storage listing.
4. **Query** - it runs a simple check to ensure the file is discoverable for later Quix Lake operations.
5. **Clean up** - the temporary file is deleted so your storage stays tidy.

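The following is a minimal sketch of such a round trip using `boto3` against an S3-compatible bucket. It is an illustration only, not the code Quix runs; the bucket name, region, and credentials are placeholders.

```python
import uuid

import boto3

# Placeholder details - use the same values you would enter in the Quix form.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="eu-west-1",
)

bucket = "your-quix-lake-bucket"
key = f"tmp/connection-test-{uuid.uuid4()}.txt"

# 1-2. Connect and upload a small temporary object.
s3.put_object(Bucket=bucket, Key=key, Body=b"quix connection test")

# 3. Confirm the object shows up in the storage listing.
listed = s3.list_objects_v2(Bucket=bucket, Prefix="tmp/")
assert any(obj["Key"] == key for obj in listed.get("Contents", [])), "object not visible"

# 4. Read the object's metadata to confirm it is discoverable.
head = s3.head_object(Bucket=bucket, Key=key)
print("uploaded", head["ContentLength"], "bytes - connection looks healthy")

# 5. Clean up the temporary object.
s3.delete_object(Bucket=bucket, Key=key)
```

A failure at any of these calls corresponds to the ✗ you would see in the dialog; for example, an `AccessDenied` error from `put_object` points at missing write permissions.
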
**Success**

Each step is shown in the dialog. Successful steps are marked with a ✓, and you’ll see confirmation when everything checks out.

**Failure**

If a step fails, you’ll see ✗ next to it along with the reason (for example, “Access denied” or “Wrong region”). This makes it easy to fix permissions or update your settings.

![Access denied example](../../images/blob-storage/test-error.png)

## Providers

=== "Amazon S3"
63+
64+
1. Log in to the **AWS Management Console**.
65+
2. Go to **IAM**.
66+
3. Open **Users**.
67+
4. Select an existing user or click **Add user** to create a new one.
68+
5. **Permissions**
69+
- In the **Permissions** tab, attach a policy that allows bucket access.
70+
6. **Security credentials**
71+
- Open the **Security credentials** tab.
72+
- Click **Create access key**.
73+
7. **Save credentials**
74+
- Copy the **Access Key ID** and **Secret Access Key** (the secret appears only once).
75+
8. Copy the information into the Quix S3 form.
76+
9. Click **Test Connection**.
77+
78+
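    The policy in step 5 only needs to cover the one bucket used by the connection. The exact actions required can vary; a minimal sketch, assuming a hypothetical bucket `your-quix-lake-bucket` and IAM user `quix-lake`, attached with `boto3` (you can equally paste the policy JSON in the IAM console):

    ```python
    import json

    import boto3

    # Hypothetical names - replace with your own bucket and IAM user.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": "arn:aws:s3:::your-quix-lake-bucket",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": "arn:aws:s3:::your-quix-lake-bucket/*",
            },
        ],
    }

    # Assumes your own admin credentials are available in the environment.
    iam = boto3.client("iam")
    iam.put_user_policy(
        UserName="quix-lake",
        PolicyName="quix-lake-bucket-access",
        PolicyDocument=json.dumps(policy),
    )
    ```
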
=== "Google Cloud Storage (GCS)"
79+
80+
1. **Ensure access**
81+
- Have Google Cloud project owner or similar permissions where your bucket resides or will be created.
82+
- Create a service account and assign it to the bucket with R/W (e.g., `roles/storage.objectAdmin`) or equivalent minimal object roles.
83+
2. **Open Cloud Storage settings**
84+
- In the Google Cloud Console, go to **Storage → Settings**.
85+
3. **Interoperability tab**
86+
- Select **Interoperability**.
87+
- If disabled, click **Enable S3 interoperability**.
88+
4. **Create (or view) keys**
89+
- Under **Access keys for service accounts**, click **Create key** and follow the process to assign one to the service account.
90+
5. **Save credentials**
91+
- Copy the **Access key** and **Secret** (the secret is shown only once).
92+
- Paste this information into the Quix S3 connector form.
93+
6. Click **Test Connection**.
94+
95+
96+
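    The interoperability (HMAC) keys behave like S3 credentials against the `https://storage.googleapis.com` endpoint, which is why they fit the same connector form. A minimal sketch with `boto3`; the bucket name and keys are placeholders:

    ```python
    import boto3

    # Placeholder HMAC credentials from the Interoperability tab.
    gcs = boto3.client(
        "s3",
        endpoint_url="https://storage.googleapis.com",
        aws_access_key_id="GOOG1E_HMAC_ACCESS_KEY",
        aws_secret_access_key="HMAC_SECRET",
        region_name="auto",  # placeholder region for request signing
    )

    # List a few objects to confirm the keys and bucket are usable.
    response = gcs.list_objects_v2(Bucket="your-quix-lake-bucket", MaxKeys=5)
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])
    ```
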
=== "Azure Blob Storage"
97+
98+
1. **Ensure access**
99+
- Your Azure user must have at least the **Storage Blob Data Contributor** role (or higher).
100+
- Open the **Azure Portal** and go to your **Storage account**.
101+
2. **Navigate to credentials**
102+
- In the left menu, expand **Security + networking**.
103+
- Click **Access keys**.
104+
3. **Copy credentials**
105+
- Note the **Storage account name**.
106+
- Copy **Key1** (or **Key2**) value.
107+
- Paste the information into the Quix Azure Blob connector form.
108+
4. Click **Test Connection**.
109+
110+
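    If you want to sanity-check the account name and key before entering them, here is a minimal sketch with the `azure-storage-blob` package; the container name is a placeholder:

    ```python
    from azure.storage.blob import BlobServiceClient

    # Placeholder values - use your storage account name and the key1/key2 value.
    account_name = "yourstorageaccount"
    account_key = "YOUR_ACCOUNT_KEY"

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )

    # Upload, list, and delete a small test blob in a hypothetical container.
    container = service.get_container_client("quix-lake")
    container.upload_blob("tmp/connection-test.txt", b"quix connection test", overwrite=True)
    print([b.name for b in container.list_blobs(name_starts_with="tmp/")])
    container.delete_blob("tmp/connection-test.txt")
    ```
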
=== "MinIO (S3-compatible)"
111+
112+
1. **Ensure access**
113+
- Your MinIO user or role must include permissions to create and list access keys (e.g., `consoleAdmin` or a custom PBAC policy).
114+
2. **Log in** to the MinIO Console.
115+
3. **Go to Access keys**
116+
- Select **Access keys** in the left menu.
117+
4. **Create a new key**
118+
- Click **Create access key** to generate an **Access Key** and **Secret Key**.
119+
5. **Save credentials**
120+
- Copy the **Access Key** and **Secret Key** - the secret is shown only once.
121+
6. Copy the information into the Quix MinIO connector form.
122+
7. Click **Test Connection**.
123+
124+
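    Because MinIO speaks the S3 API, the same keys can be checked with any S3 client pointed at your MinIO endpoint. A minimal sketch with `boto3`; the endpoint URL, bucket, and keys are placeholders:

    ```python
    import boto3
    from botocore.config import Config

    # Placeholder endpoint and credentials for your MinIO deployment.
    minio = boto3.client(
        "s3",
        endpoint_url="https://minio.example.com:9000",
        aws_access_key_id="YOUR_MINIO_ACCESS_KEY",
        aws_secret_access_key="YOUR_MINIO_SECRET_KEY",
        region_name="us-east-1",  # placeholder region
        config=Config(s3={"addressing_style": "path"}),  # MinIO typically uses path-style URLs
    )

    # Confirm the bucket is reachable with these keys.
    print(minio.list_objects_v2(Bucket="your-quix-lake-bucket", MaxKeys=5).get("KeyCount", 0))
    ```
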
## Security & operations

- Dedicated principals per connection (IAM user / Service Account / MinIO user)
- Scope credentials to **one** bucket/container
- Rotate keys regularly; store secrets securely (a rotation sketch follows this list)
- Consider server-side encryption and access logging

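On AWS, keys can be rotated without downtime by creating the new key first, updating the Quix connection, and only then deleting the old key. A minimal sketch with `boto3`, assuming a hypothetical IAM user `quix-lake`; equivalent flows exist for GCS HMAC keys, Azure key1/key2, and MinIO access keys:

```python
import boto3

iam = boto3.client("iam")
user = "quix-lake"  # hypothetical dedicated IAM user for this connection

# 1. Create the replacement key and record the new credentials.
new_key = iam.create_access_key(UserName=user)["AccessKey"]
print("new AccessKeyId:", new_key["AccessKeyId"])
print("new SecretAccessKey:", new_key["SecretAccessKey"])  # store this securely

# 2. Update the blob storage connection in Quix with the new key,
#    then re-run Test connection to confirm it works.

# 3. Delete the old key once the connection is healthy.
old_key_id = "AKIA_OLD_KEY_ID"  # placeholder for the key being retired
iam.delete_access_key(UserName=user, AccessKeyId=old_key_id)
```
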
## See more

* [What is Quix Lake](../quixlake/overview.md) - what it is and why it exists
* [Open format](../quixlake/open-format.md) - layout and schemas (Avro, Parquet)
* [Quix Lake - API](../quixlake/api.md) - browse, search, and manage datasets
* [Quix Lake - Sink](./sink.md) - persist topics to your bucket/container
* [Quix Lake - Replay](./replay.md) - re-run datasets back to Kafka

docs/quix-cloud/managed-services/dynamic-configuration.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
The **Dynamic Configuration Manager** is a managed service for handling
**large, versioned configuration files** related to devices, sensors, or
physical assets.

These configurations often change in real time (e.g., updates to
equipment parameters, IoT sensor mappings, or lab/test system setups),
but are **too large to send through Kafka directly**.
