
Commit 646685d

Merge branch 'master' into testcontainers
2 parents 1bc22e8 + 54e3756 commit 646685d

36 files changed: +597 -182 lines changed

.github/workflows/build.yml

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ jobs:
         run: ./gradlew check
         env:
           GRADLE_OPTS: '-Dorg.gradle.daemon=false'
+          NXF_SMOKE: 1

       - name: Publish
         if: failure()

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -2,4 +2,5 @@
 .idea
 .nextflow
 build
-dist
+dist
+out

Makefile

Lines changed: 67 additions & 3 deletions
@@ -1,7 +1,71 @@
+config ?= compileClasspath
+
+ifdef module
+mm = :${module}:
+else
+mm =
+endif
+
+clean:
+	rm -rf .nextflow*
+	rm -rf work
+	rm -rf build
+	rm -rf plugins/*/build
+	./gradlew clean
+
+compile:
+	./gradlew compileGroovy
+	@echo "DONE `date`"
+
+
+check:
+	./gradlew check
+
+
 #
 # Show dependencies try `make deps config=runtime`, `make deps config=google`
 #
-deps: FORCE
-	./gradlew -q :plugins:nf-sqldb:dependencies --configuration runtimeClasspath
+deps:
+	./gradlew -q ${mm}dependencies --configuration ${config}
+
+deps-all:
+	./gradlew -q dependencyInsight --configuration ${config} --dependency ${module}
+
+#
+# Refresh SNAPSHOTs dependencies
+#
+refresh:
+	./gradlew --refresh-dependencies
+
+#
+# Run all tests or selected ones
+#
+test:
+ifndef class
+	./gradlew ${mm}test
+else
+	./gradlew ${mm}test --tests ${class}
+endif
+
+assemble:
+	./gradlew assemble
+
+#
+# generate build zips under build/plugins
+# you can install the plugin copying manually these files to $HOME/.nextflow/plugins
+#
+buildPlugins:
+	./gradlew copyPluginZip
+
+#
+# Upload JAR artifacts to Maven Central
+#
+upload:
+	./gradlew upload
+
+
+upload-plugins:
+	./gradlew plugins:upload
 
-FORCE: ;
+publish-index:
+	./gradlew plugins:publishIndex

NOTICE

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 NEXTFLOW SQLDB PLUGIN

-Copyright 2020-2022, Seqera Labs
+Copyright 2020-2025, Seqera Labs


 This software includes source code and libraries developed by:
@@ -10,7 +10,7 @@ This software includes source code and libraries developed by:
   Licensed under Apache License, Version 2.0

   Nextflow
-  Copyright 2020-2022, Seqera Labs
+  Copyright 2020-2025, Seqera Labs
   Licensed under the Apache License, Version 2.0

   Logback

README.md

Lines changed: 92 additions & 82 deletions
@@ -1,133 +1,139 @@
 # SQL DB plugin for Nextflow

-This plugin provides an extension to implement built-in support for SQL DB access and manipulation in Nextflow scripts.
+This plugin provides support for interacting with SQL databases in Nextflow scripts.

-It provides the ability to create a Nextflow channel from SQL queries and to populate database tables.
-The current version provides out-of-the-box support for the following databases:
+The following databases are currently supported:

+* [AWS Athena](https://aws.amazon.com/athena/) (Setup guide [here](docs/aws-athena.md))
+* [DuckDB](https://duckdb.org/)
 * [H2](https://www.h2database.com)
-* [MySQL](https://www.mysql.com/)
+* [MySQL](https://www.mysql.com/)
 * [MariaDB](https://mariadb.org/)
 * [PostgreSQL](https://www.postgresql.org/)
 * [SQLite](https://www.sqlite.org/index.html)
-* [DuckDB](https://duckdb.org/)
-* [AWS Athena](https://aws.amazon.com/athena/)
-
+
 NOTE: THIS IS A PREVIEW TECHNOLOGY, FEATURES AND CONFIGURATION SETTINGS CAN CHANGE IN FUTURE RELEASES.

-This repository only holds plugin artefacts. Source code is available at this [link](https://github.com/nextflow-io/nextflow/tree/master/plugins/nf-sqldb).
+## Getting started

-## Get started
-
-Make sure to have Nextflow `22.08.1-edge` or later. Add the following snippet to your `nextflow.config` file.
+This plugin requires Nextflow `22.08.1-edge` or later. You can enable the plugin by adding the following snippet to your `nextflow.config` file:

-```
+```groovy
 plugins {
-    id 'nf-sqldb@0.5.0'
+    id 'nf-sqldb'
 }
 ```

-The above declaration allows the use of the SQL plugin functionalities in your Nextflow pipelines.
-See the section below to configure the connection properties with a database instance.
-

 ## Configuration

-The target database connection coordinates are specified in the `nextflow.config` file using the
-`sql.db` scope. The following are available
+You can configure any number of databases under the `sql.db` configuration scope. For example:

-| Config option | Description |
-|--- |--- |
-| `sql.db.'<DB-NAME>'.url` | The database connection URL based on Java [JDBC standard](https://docs.oracle.com/javase/tutorial/jdbc/basics/connecting.html#db_connection_url).
-| `sql.db.'<DB-NAME>'.driver` | The database driver class name (optional).
-| `sql.db.'<DB-NAME>'.user` | The database connection user name.
-| `sql.db.'<DB-NAME>'.password` | The database connection password.
-
-For example:
-
-```
+```groovy
 sql {
     db {
         foo {
-            url = 'jdbc:mysql://localhost:3306/demo'
-            user = 'my-user'
-            password = 'my-password'
-        }
+            url = 'jdbc:mysql://localhost:3306/demo'
+            user = 'my-user'
+            password = 'my-password'
+        }
     }
 }
-
 ```

-The above snippet defines SQL DB named *foo* that connects to a MySQL server running locally at port 3306 and
-using `demo` schema, with `my-name` and `my-password` as credentials.
+The above example defines a database named `foo` that connects to a MySQL server running locally at port 3306 and
+using the `demo` schema, with `my-user` and `my-password` as credentials.

-## Available operations
+The following options are available:

-This plugin adds to the Nextflow DSL the following extensions that allows performing of queries and populating database tables.
+`sql.db.'<DB-NAME>'.url`
+: The database connection URL based on the [JDBC standard](https://docs.oracle.com/javase/tutorial/jdbc/basics/connecting.html#db_connection_url).
+
+`sql.db.'<DB-NAME>'.driver`
+: The database driver class name (optional).
+
+`sql.db.'<DB-NAME>'.user`
+: The database connection user name.
+
+`sql.db.'<DB-NAME>'.password`
+: The database connection password.
+
+## Dataflow Operators
+
+This plugin provides the following dataflow operators for querying from and inserting into database tables.

 ### fromQuery

-The `fromQuery` factory method allows for performing a query against a SQL database and creating a Nextflow channel emitting
-a tuple for each row in the corresponding result set. For example:
+The `fromQuery` factory method queries a SQL database and creates a channel that emits a tuple for each row in the corresponding result set. For example:

-```
+```nextflow
 include { fromQuery } from 'plugin/nf-sqldb'

-ch = channel.fromQuery('select alpha, delta, omega from SAMPLE', db: 'foo')
+channel.fromQuery('select alpha, delta, omega from SAMPLE', db: 'foo').view()
 ```

 The following options are available:

-| Operator option | Description |
-|--- |--- |
-| `db` | The database handle. It must must a `sql.db` name defined in the `nextflow.config` file.
-| `batchSize` | Performs the query in batches of the specified size. This is useful to avoid loading the complete resultset in memory for query returning a large number of entries. NOTE: this feature requires that the underlying SQL database to support `LIMIT` and `OFFSET` capability.
-| `emitColumns` | When `true` the column names in the select statement are emitted as first tuple in the resulting channel.
+`db`
+: The database handle. It must be defined under `sql.db` in the Nextflow configuration.
+
+`batchSize`
+: Query the data in batches of the given size. This option is recommended for queries that may return a large result set, so that the entire result set is not loaded into memory at once.
+: *NOTE:* this feature requires that the underlying SQL database supports `LIMIT` and `OFFSET`.
+
+`emitColumns`
+: When `true`, the column names in the `SELECT` statement are emitted as the first tuple in the resulting channel.

 ### sqlInsert

-The `sqlInsert` operator provided by this plugin allows populating a database table with the data emitted
-by a Nextflow channels and therefore produced as result by a pipeline process or an upstream operator. For example:
+The `sqlInsert` operator collects the items in a source channel and inserts them into a SQL database. For example:

-```
+```nextflow
 include { sqlInsert } from 'plugin/nf-sqldb'

 channel
     .of('Hello','world!')
     .map( it -> tuple(it, it.length) )
     .sqlInsert( into: 'SAMPLE', columns: 'NAME, LEN', db: 'foo' )
-
 ```

-The above example creates and performs the following two SQL statements into the database with name `foo` as defined
-in the `nextflow.config` file.
+The above example executes the following SQL statements against the database `foo` (as defined in the Nextflow configuration).

-```
+```sql
 INSERT INTO SAMPLE (NAME, LEN) VALUES ('HELLO', 5);
 INSERT INTO SAMPLE (NAME, LEN) VALUES ('WORLD!', 6);
 ```

-NOTE: the target table (e.g. `SAMPLE` in the above example) must be created ahead.
+*NOTE:* the target table (e.g. `SAMPLE` in the above example) must be created beforehand.

 The following options are available:

-| Operator option | Description |
-|-------------------|--- |
-| `db` | The database handle. It must must a `sql.db` name defined in the `nextflow.config` file.
-| `into` | The database table name into with the data needs to be stored.
-| `columns` | The database table column names to be filled with the channel data. The column names order and cardinality must match the tuple values emitted by the channel. The columns can be specified as a `List` object or a comma-separated value string.
-| `statement` | The SQL `insert` statement to be performed to insert values in the database using `?` as placeholder for the actual values, for example: `insert into SAMPLE(X,Y) values (?,?)`. When provided the `into` and `columsn` parameters are ignored.
-| `batchSize` | The number of insert statements that are grouped together before performing the SQL operations (default: `10`).
-| `setup` | A SQL statement that's executed before the first insert operation. This is useful to create the target DB table. NOTE: the underlying DB should support the *create table if not exist* idiom (i.e. the plugin will execute this statement every time the script is run).
+`db`
+: The database handle. It must be defined under `sql.db` in the Nextflow configuration.

-## Query CSV files
+`into`
+: The target table for inserting the data.

-The SQL plugin includes the [H2](https://www.h2database.com/html/main.html) database engine that allows the query of CSV files
-as DB tables using SQL statements.
+`columns`
+: The database table column names to be filled with the channel data. The column names order and cardinality must match the tuple values emitted by the channel. The columns can be specified as a list or as a string of comma-separated values.

-For example, create CSV file using the snippet below:
+`statement`
+: The SQL `INSERT` statement to execute, using `?` as a placeholder for the actual values, for example: `insert into SAMPLE(X,Y) values (?,?)`. The `into` and `columns` options are ignored when this option is provided.

-```
+`batchSize`
+: Insert the data in batches of the given size (default: `10`).
+
+`setup`
+: A SQL statement that is executed before inserting the data, e.g. to create the target table.
+: *NOTE:* the underlying database should support the *create table if not exist* idiom, as the plugin will execute this statement every time the script is run.
+
+## Querying CSV files
+
+This plugin supports the [H2](https://www.h2database.com/html/main.html) database engine, which can query CSV files like database tables using SQL statements.
+
+For example, create a CSV file using the snippet below:
+
+```bash
 cat <<EOF > test.csv
 foo,bar
 1,hello
@@ -137,27 +143,31 @@ foo,bar
 EOF
 ```

-To query this file in a Nextflow script use the following snippet:
+Then query it in a Nextflow script:

 ```nextflow
-channel
-    .sql
-    .fromQuery("SELECT * FROM CSVREAD('test.csv') where foo>=2;")
-    .view()
+include { fromQuery } from 'plugin/nf-sqldb'
+
+channel
+    .fromQuery("SELECT * FROM CSVREAD('test.csv') where foo>=2;")
+    .view()
 ```

+The `CSVREAD` function provided by the H2 database engine allows you to query any CSV file in your filesystem. As shown in the example, you can use standard SQL clauses like `SELECT` and `WHERE` to define your query.
+
+## Caveats

-The `CSVREAD` function provided by the H2 database engine allows the access of a CSV file in your computer file system,
-you can replace `test.csv` with a CSV file path of your choice. The `foo>=2` condition shows how to define a filtering
-clause using the conventional SQL WHERE constrains.
+Like all dataflow operators in Nextflow, the operators provided by this plugin are executed asynchronously.

-## Important
+In particular, data inserted using the `sqlInsert` operator is *not* guaranteed to be available to any subsequent queries using the `fromQuery` operator, as it is not possible to make a channel factory operation dependent on some upstream operation.

-This plugin is not expected to be used to store and access a pipeline status in a synchronous manner during the pipeline
-execution.

-This means that if your script has a `sqlInsert` operation followed by a successive `fromQuery` operation, the query
-may *not* contain previously inserted data due to the asynchronous nature of Nextflow operators.
+## Development

-The SQL support provided by this plugin is meant to be used to fetch DB data from a previous run or to populate DB tables
-for storing or archival purpose.
+#### Publish artifacts to Maven repo
+
+Use the following command:
+
+```
+./gradlew plugins:nf-sqldb:publishMavenPublicationToMavenRepository
+```

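For illustration, the configuration options documented in the updated README can be combined as in the following sketch. The second database `bar`, its H2 in-memory URL, and the `org.h2.Driver` class are assumed example values:

```groovy
// nextflow.config -- illustrative sketch only
sql {
    db {
        foo {
            url = 'jdbc:mysql://localhost:3306/demo'
            user = 'my-user'
            password = 'my-password'
        }
        // a second database using the optional `driver` setting;
        // the in-memory H2 URL and driver class are assumed examples
        bar {
            url = 'jdbc:h2:mem:demo'
            driver = 'org.h2.Driver'
        }
    }
}
```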
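Likewise, a sketch of `fromQuery` with the `batchSize` and `emitColumns` options described above, assuming the `foo` database and `SAMPLE` table from the README examples:

```nextflow
include { fromQuery } from 'plugin/nf-sqldb'

// Fetch rows in batches of 100 and emit the column names as the first tuple
// (illustrative sketch; assumes the `foo` database and SAMPLE table shown earlier).
channel
    .fromQuery('select alpha, delta, omega from SAMPLE', db: 'foo', batchSize: 100, emitColumns: true)
    .view()
```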
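And a sketch of `sqlInsert` with the `setup` and `batchSize` options; the `CREATE TABLE` statement is an assumed example. As the Caveats section above explains, a later `fromQuery` on the same table is not guaranteed to see these rows, because the operators run asynchronously.

```nextflow
include { sqlInsert } from 'plugin/nf-sqldb'

// Create the target table if needed, then insert the tuples in batches of 2
// (illustrative sketch; assumes the `foo` database shown earlier).
channel
    .of('alpha', 'beta', 'gamma')
    .map( it -> tuple(it, it.size()) )
    .sqlInsert(
        into: 'SAMPLE',
        columns: 'NAME, LEN',
        db: 'foo',
        batchSize: 2,
        setup: 'CREATE TABLE IF NOT EXISTS SAMPLE (NAME VARCHAR(64), LEN INT)'
    )
```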
act.sh

Lines changed: 0 additions & 25 deletions
This file was deleted.
