An implementation of SDQL (Semi-ring Dictionary Query Language) in Scala.
- For more details, refer to the OOPSLA'22 paper or the more recent A semi-ring dictionary query language for data science.
- For comparison,
<
and<=
are used instead of>
and>=
. - The following syntactic sugar constructs are not supported: array construction
[| x1, ..., xn |]
and key-set of dictionarydom
.
Generate datasets as follows:
# or clone it elsewhere
git clone https://github.com/edin-dal/tpch-dbgen
cd tpch-dbgen
make
./dbgen -s 1 -vf
# or move them elsewhere and create a symlink
# (careful: macOS aliases are not symlinks!)
mv *.tbl ../datasets/tpch
cd ..
Note: for the interpreter you will want a smaller scale factor like -s 0.01
.
The data loader does not expect TPCH tables to have end-of-line |
characters.
Strip them:
sed -i 's/|$//' datasets/tpch/*.tbl
On macOS:
sed -i '' 's/|$//' datasets/tpch/*.tbl
You can check everything works by running the tests:
sbt test
To automatically run sbt test
before each push, configure the local git hooks in hooks
:
git config core.hooksPath hooks
These are slower end-to-end tests: they generate, compile/interpret, and check the results of full queries.
First, comment out the global -l
options in build.sbt
.
You can then run the optional tests using these commands:
# fast test (< 1 min)
sbt "testOnly * -- -n TestTPCH0_01"
# slower test (> 1 min)
sbt "testOnly * -- -n TestTPCH1"
These tests will compare their results against a ground truth we provide.
Set it up as follows:
# or clone it elsewhere
git clone https://github.com/edin-dal/sdql-benchmark
# create a symlink from the path expected by the tests
ln -s sdql-benchmark/results results
Note: make sure you also have the required files in your datasets
folder.
sbt
run compile <path> <sdql_files>*
For example, to run compiled TPCH Q1 and Q6:
sbt
run compile progs/tpch q1.sdql q6.sdql
Note: compilation requires clang++
and clang-format
to be installed.
progs/tpch-interpreter
.
sbt
run interpret <path> <sdql_files>*
For example, to run TPCH Q6, first make sure that the folder datasets/tpch
contains TPCH tables (with a small scale
factor). Then, run the following command:
sbt
run interpret progs/tpch-interpreter q6.sdql
Or as a one-liner: sbt "run interpret progs/tpch-interpreter q6.sdql"
To cite SDQL, use the following BibTex:
@article{DBLP:journals/pacmpl/ShaikhhaHSO22,
author = {Amir Shaikhha and
Mathieu Huot and
Jaclyn Smith and
Dan Olteanu},
title = {Functional collection programming with semi-ring dictionaries},
journal = {Proc. {ACM} Program. Lang.},
volume = {6},
number = {{OOPSLA1}},
pages = {1--33},
year = {2022},
url = {https://doi.org/10.1145/3527333},
doi = {10.1145/3527333},
timestamp = {Tue, 10 Jan 2023 16:19:51 +0100},
biburl = {https://dblp.org/rec/journals/pacmpl/ShaikhhaHSO22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Depending on your usecase, the following papers are also relevant:
- SDQLpy, a python embedding of SDQL for query processing
@inproceedings{DBLP:conf/cc/ShahrokhiS23,
author = {Hesam Shahrokhi and
Amir Shaikhha},
editor = {Clark Verbrugge and
Ondrej Lhot{\'{a}}k and
Xipeng Shen},
title = {Building a Compiled Query Engine in Python},
booktitle = {Proceedings of the 32nd {ACM} {SIGPLAN} International Conference on
Compiler Construction, {CC} 2023, Montr{\'{e}}al, QC, Canada,
February 25-26, 2023},
pages = {180--190},
publisher = {{ACM}},
year = {2023},
url = {https://doi.org/10.1145/3578360.3580264},
doi = {10.1145/3578360.3580264},
timestamp = {Mon, 20 Feb 2023 14:39:08 +0100},
biburl = {https://dblp.org/rec/conf/cc/ShahrokhiS23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
- SDQLite, a subset of SDQL for (sparse) tensor algebra
@article{DBLP:journals/pacmmod/SchleichSS23,
author = {Maximilian Schleich and
Amir Shaikhha and
Dan Suciu},
title = {Optimizing Tensor Programs on Flexible Storage},
journal = {Proc. {ACM} Manag. Data},
volume = {1},
number = {1},
pages = {37:1--37:27},
year = {2023},
url = {https://doi.org/10.1145/3588717},
doi = {10.1145/3588717},
timestamp = {Thu, 15 Jun 2023 21:57:49 +0200},
biburl = {https://dblp.org/rec/journals/pacmmod/SchleichSS23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
- Forward-mode Automatic Differentiation for SDQLite
@inproceedings{DBLP:conf/cgo/ShaikhhaHH24,
author = {Amir Shaikhha and
Mathieu Huot and
Shideh Hashemian},
editor = {Tobias Grosser and
Christophe Dubach and
Michel Steuwer and
Jingling Xue and
Guilherme Ottoni and
ernando Magno Quint{\~{a}}o Pereira},
title = {A Tensor Algebra Compiler for Sparse Differentiation},
booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
{CGO} 2024, Edinburgh, United Kingdom, March 2-6, 2024},
pages = {1--12},
publisher = {{IEEE}},
year = {2024},
url = {https://doi.org/10.1109/CGO57630.2024.10444787},
doi = {10.1109/CGO57630.2024.10444787},
timestamp = {Mon, 11 Mar 2024 13:45:28 +0100},
biburl = {https://dblp.org/rec/conf/cgo/ShaikhhaHH24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}