Commit 940c673

Add basic example with netflix movie and tv series data
* Showcase the usage of SQLMesh
* Use tests to validate model behaviour with test data
* Use audits to validate data correctness
1 parent bdb5c1f commit 940c673

20 files changed, +20801 -0 lines changed

.gitignore

+273
@@ -0,0 +1,273 @@
+### JetBrains template
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+.idea
+
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn. Uncomment if using
+# auto-import.
+# .idea/artifacts
+# .idea/compiler.xml
+# .idea/jarRepositories.xml
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+# *.iml
+# *.ipr
+
+# CMake
+cmake-build-*/
+
+# Mongo Explorer plugin
+.idea/**/mongoSettings.xml
+
+# File-based project format
+*.iws
+
+# IntelliJ
+out/
+
+# mpeltonen/sbt-idea plugin
+.idea_modules/
+
+# JIRA plugin
+atlassian-ide-plugin.xml
+
+# Cursive Clojure plugin
+.idea/replstate.xml
+
+# SonarLint plugin
+.idea/sonarlint/
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+com_crashlytics_export_strings.xml
+crashlytics.properties
+crashlytics-build.properties
+fabric.properties
+
+# Editor-based Rest Client
+.idea/httpRequests
+
+# Android studio 3.1+ serialized cache file
+.idea/caches/build_file_checksums.ser
+
+### PyCharm template
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
+
+# User-specific stuff
+
+# AWS User-specific
+
+# Generated files
+
+# Sensitive or high-churn files
+
+# Gradle
+
+# Gradle and Maven with auto-import
+# When using Gradle or Maven with auto-import, you should exclude module files,
+# since they will be recreated, and may cause churn. Uncomment if using
+# auto-import.
+# .idea/artifacts
+# .idea/compiler.xml
+# .idea/jarRepositories.xml
+# .idea/modules.xml
+# .idea/*.iml
+# .idea/modules
+# *.iml
+# *.ipr
+
+# CMake
+
+# Mongo Explorer plugin
+
+# File-based project format
+
+# IntelliJ
+
+# mpeltonen/sbt-idea plugin
+
+# JIRA plugin
+
+# Cursive Clojure plugin
+
+# SonarLint plugin
+
+# Crashlytics plugin (for Android Studio and IntelliJ)
+
+# Editor-based Rest Client
+
+# Android studio 3.1+ serialized cache file
+
+### Python template
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
+.pdm-python
+.pdm-build/
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+
+# some additional excludes for this project
+/logs/
+/.cache/
+/sqlmesh.db
+/uv.lock

.python-version

+1
@@ -0,0 +1 @@
+3.12

README.md

+81
@@ -0,0 +1,81 @@
+# Setup guide
+
+## DuckDB CLI
+
+**Install the DuckDB CLI (on macOS)**
+```bash
+brew install duckdb
+```
+
+**Install the DuckDB CLI (on Windows)**
+```bash
+winget install DuckDB.cli
+```
+
+## Python environment
+
+I've used Python 3.12 for this project.
+
+**Set up the project with pip**
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install "sqlmesh[llm,postgres,web]"
+```
+
+**Set up the project with uv**
+```bash
+uv venv -p python3.12
+source .venv/bin/activate
+uv sync
+```
+
+## Postgres as state database (optional)
+
+If you want to try out Postgres as a dedicated state database for SQLMesh, you can use `compose.yml` to start one Docker
+container for Postgres and one for Adminer (http://localhost:8080), a web UI for connecting to the instance.
+
+If you don't want to use this setup, simply remove the `state_connection` block from `config.yaml`:
+```diff
+ gateways:
+   local:
+     connection:
+       type: duckdb
+       database: sqlmesh.db
+-    state_connection:
+-      type: postgres
+-      host: localhost
+-      port: 5432
+-      user: sqlmesh
+-      password: sqlmesh
+-      database: sqlmesh
+```
+SQLMesh then falls back to the DuckDB `connection` as its state store.
+
+**Install Docker Desktop (on macOS)**
+```bash
+brew install --cask docker
+```
+
+**Install Docker Desktop (on Windows)**
+Follow the guide at https://docs.docker.com/desktop/setup/install/windows-install/
+
+**Start the Postgres and Adminer containers in the background**
+```bash
+docker compose up -d
+```
+
+# Development workflow
+
+```bash
+sqlmesh plan
+```
+
+## DuckDB as storage database
+
+Query tables with the DuckDB CLI:
+```bash
+duckdb sqlmesh.db "SELECT * FROM imdb.netflix"
+duckdb sqlmesh.db "SELECT * FROM imdb.trends_by_year"
+duckdb sqlmesh.db "SELECT * FROM imdb.trends_by_country"
+```
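The README starts Postgres with `docker compose up -d` and then moves straight on to `sqlmesh plan`. Right after `docker compose up -d` the container may still be initialising, and depending on how Docker forwards the published port, a plain TCP connect to `localhost:5432` can succeed before Postgres itself is ready. Below is a minimal, standard-library-only sketch of a readiness check one could run in between, assuming the host and port published in `compose.yml`. Instead of a bare connect it sends the PostgreSQL protocol's `SSLRequest` message, which a running server answers with a single `S` or `N` byte. The helper names `postgres_answers` and `wait_for_postgres` are hypothetical and not part of SQLMesh or Docker.

```python
import socket
import struct
import time

# PostgreSQL SSLRequest message: Int32 length (8) followed by Int32 code 80877103.
SSL_REQUEST = struct.pack("!ii", 8, 80877103)


def postgres_answers(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if the peer on host:port answers an SSLRequest like Postgres does."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SSL_REQUEST)
            # A Postgres server replies with one byte: b"S" (SSL possible) or b"N" (no SSL).
            # An empty reply means the peer (e.g. a port forwarder) closed the connection.
            return sock.recv(1) in (b"S", b"N")
    except OSError:  # refused, reset or timed out (socket.timeout is an OSError)
        return False


def wait_for_postgres(host: str = "localhost", port: int = 5432,
                      timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll postgres_answers() until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if postgres_answers(host, port):
            return True
        time.sleep(interval)
    return False
```

With its defaults, `wait_for_postgres()` polls `localhost:5432` for up to 30 seconds. The check only shows that a Postgres server process is answering on the port; the official postgres image also ships `pg_isready`, which is the more conventional tool for this (e.g. via `docker compose exec db pg_isready`).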

audits/.gitkeep

Whitespace-only changes.

audits/assert_not_null_country.sql

+9
@@ -0,0 +1,9 @@
+AUDIT (
+  name assert_not_null_country
+);
+
+SELECT
+  *
+FROM @this_model
+WHERE
+  country IS NULL

audits/assert_not_null_genre.sql

+9
@@ -0,0 +1,9 @@
+AUDIT (
+  name assert_not_null_genre
+);
+
+SELECT
+  *
+FROM @this_model
+WHERE
+  genre IS NULL
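Both audits follow the SQLMesh pattern of selecting the rows that violate a condition: an audit passes when its query returns zero rows and fails as soon as it returns any. The sketch below illustrates that pass/fail rule under loud assumptions. It uses the standard-library `sqlite3` module as a stand-in for DuckDB, replaces `@this_model` by naive string substitution (SQLMesh renders the macro itself), and runs against a hypothetical `netflix` table with an assumed `title` column and made-up rows. `run_audit` is a hypothetical helper, not a SQLMesh API.

```python
import sqlite3

# The two audits from audits/*.sql, inlined as strings for this sketch.
AUDITS = {
    "assert_not_null_country": "SELECT * FROM @this_model WHERE country IS NULL",
    "assert_not_null_genre": "SELECT * FROM @this_model WHERE genre IS NULL",
}


def run_audit(conn: sqlite3.Connection, audit_sql: str, model_table: str) -> list[tuple]:
    """Run one audit and return the offending rows (an empty list means it passed)."""
    # Naive text substitution standing in for SQLMesh's rendering of @this_model.
    query = audit_sql.replace("@this_model", model_table)
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the netflix model: sqlite3 instead of DuckDB,
# an assumed `title` column and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE netflix (title TEXT, country TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO netflix VALUES (?, ?, ?)",
    [
        ("Title A", "United States", "Drama"),
        ("Title B", None, "Comedy"),  # violates assert_not_null_country
    ],
)

for name, sql in AUDITS.items():
    bad_rows = run_audit(conn, sql, "netflix")
    status = "passed" if not bad_rows else f"FAILED with {len(bad_rows)} row(s)"
    print(f"{name}: {status}")
# prints:
# assert_not_null_country: FAILED with 1 row(s)
# assert_not_null_genre: passed
```

Returning the offending rows rather than a boolean mirrors why the audits use `SELECT *`: the rows that come back are exactly the ones to inspect when an audit fails.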

compose.yml

+16
@@ -0,0 +1,16 @@
+services:
+  db:
+    image: postgres
+    restart: always
+    shm_size: 128mb
+    environment:
+      POSTGRES_USER: sqlmesh
+      POSTGRES_PASSWORD: sqlmesh
+    ports:
+      - "5432:5432"
+
+  adminer:
+    image: adminer
+    restart: always
+    ports:
+      - "8080:8080"
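The optional Postgres state store only works if the `state_connection` block in the README's `config.yaml` lines up with this file. One non-obvious step: `compose.yml` never sets `POSTGRES_DB`, and in that case the official postgres image creates a database named after `POSTGRES_USER`, which is why `database: sqlmesh` exists without extra setup. The following is a small sketch of that cross-check, with the values copied by hand from the two files. The function names are hypothetical, and the port parsing only understands the short `HOST:CONTAINER` syntax used here.

```python
# Values copied from compose.yml (service "db") and from the README's
# state_connection block.
COMPOSE_DB_ENV = {"POSTGRES_USER": "sqlmesh", "POSTGRES_PASSWORD": "sqlmesh"}
COMPOSE_DB_PORTS = ["5432:5432"]
STATE_CONNECTION = {
    "type": "postgres",
    "host": "localhost",
    "port": 5432,
    "user": "sqlmesh",
    "password": "sqlmesh",
    "database": "sqlmesh",
}


def effective_postgres_settings(env: dict[str, str], ports: list[str]) -> dict:
    """Apply the official postgres image's documented defaults to a service's env."""
    user = env.get("POSTGRES_USER", "postgres")
    return {
        "user": user,
        "password": env.get("POSTGRES_PASSWORD"),
        # Without POSTGRES_DB, the image creates a database named after the user.
        "database": env.get("POSTGRES_DB", user),
        # Short "HOST:CONTAINER" syntax only; keep host ports mapped to container port 5432.
        "host_ports": [
            int(p.split(":")[0])
            for p in ports
            if p.count(":") == 1 and p.split(":")[1] == "5432"
        ],
    }


def mismatches(state: dict, compose: dict) -> list[str]:
    """List the differences between state_connection and the compose service."""
    problems = []
    for key in ("user", "password", "database"):
        if state.get(key) != compose[key]:
            problems.append(f"{key}: config has {state.get(key)!r}, compose has {compose[key]!r}")
    if state.get("port") not in compose["host_ports"]:
        problems.append(f"port {state.get('port')} is not published by compose.yml")
    return problems


compose = effective_postgres_settings(COMPOSE_DB_ENV, COMPOSE_DB_PORTS)
print(mismatches(STATE_CONNECTION, compose) or "state_connection matches compose.yml")
```

For the values in this commit the script prints `state_connection matches compose.yml`. Changing, say, the published host port to `5433:5432` without updating `port:` in `config.yaml` would show up as a mismatch.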
