Releases: chdb-io/chdb
v1.0.2
What's Changed
- Fix dbapi init issue and add dbapi basic tests by @Daniel-Robbins in #147
- TPCH and Movielens DNN examples in jupyter notebook by @Daniel-Robbins in #143
- Build py38 py39 py310 on macOS 12 by @Daniel-Robbins in #149
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
- feat: get latest tag by git rev-list by @nmreadelf in #142
- Lower the log level of "Lowered xxx cache size" by @Daniel-Robbins in #145
New Contributors
- @Daniel-Robbins made their first contribution in #145
Full Changelog: https://github.com/chdb-io/chdb/commits/v1.0.1
v1.0.0
v1.0.0rc3
v1.0.0rc2
v1.0.0rc1
v0.16.0rc2
v0.16.0rc1
chdb Release Summary
chdb 0.16 is based on ClickHouse 23.10.
Query Enhancements
- Vector Addition: python3 -m chdb "SELECT [1, 2, 3] + [4, 5, 6]"
- Omit file() Function: python3 -m chdb "SELECT * from '/home/Clickhouse/bench/hits_0.parquet' limit 10"
- NumPy as Input Format:
  - Support for NumPy as an input format, with queries such as SELECT * FROM 'data.npy'.
- Parquet Optimizations:
  - Writing Parquet files is now multi-threaded and 10x faster, almost the same speed as reading.
  - Parquet filter pushdown: when reading Parquet files, row groups (chunks of the file) are skipped based on the WHERE condition and the min/max values in each column.
  - Optimized reading of small row groups in Parquet by batching them together.
- Condition Pushdown for ORC:
  - Data skipping indices are used for ORC, similarly to Parquet.
- PRQL Support:
  - Added support for PRQL as a query language.
- urlCluster Function:
  - Added the urlCluster table function.
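The Parquet filter pushdown described above can be pictured with a small pure-Python sketch. It is not chdb or ClickHouse code: the RowGroup class and the scan_greater_than helper are hypothetical names made up for illustration, and the only thing modelled is the idea of skipping a row group when its min/max statistics prove that no row can satisfy a condition such as WHERE value > threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with column statistics."""
    min_value: int
    max_value: int
    rows: List[int]


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Emulate `WHERE value > threshold`, skipping groups via min/max statistics.

    Returns the matching values and the number of row groups that were
    skipped without looking at their rows.
    """
    matches: List[int] = []
    skipped = 0
    for group in row_groups:
        # If even the largest value in the group fails the condition,
        # no row in the group can match, so the group is not read at all.
        if group.max_value <= threshold:
            skipped += 1
            continue
        matches.extend(v for v in group.rows if v > threshold)
    return matches, skipped


groups = [
    RowGroup(min_value=1, max_value=10, rows=[1, 5, 10]),
    RowGroup(min_value=11, max_value=20, rows=[11, 15, 20]),
    RowGroup(min_value=21, max_value=30, rows=[21, 25, 30]),
]
rows, skipped = scan_greater_than(groups, threshold=18)
print(rows, skipped)
```

In this toy data only the first group is skipped. The second group still has to be read because its maximum exceeds the threshold, even though just one of its rows matches.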
New Features
- Introduced arrayFold for applying a lambda function to multiple arrays.
- Extended support for asynchronous inserts with external data via the native protocol.
- Introduced function jsonMergePatch for merging JSON strings.
- Continued support for Kusto Query Language dialect with Phase 1 implementation.
- Introduced a new SQL function arrayRandomSample for sampling elements from an input array.
- Added support for dropping cache for Protobuf format with SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf].
- Conditions on arguments of a table with a space-filling curve in its key can now be used for indexing.
- New setting force_optimize_projection_name checks that a projection is used in the query.
- Added aggregation function lttb using the Largest-Triangle-Three-Buckets algorithm for downsampling data.
- CHECK TABLE query has better performance and usability, supporting checking particular parts.
- Introduced function byteSwap for reversing the bytes of unsigned integers.
- Added functions formatQuery and formatQuerySingleLine for formatted SQL query output.
- Introduced DWARF input format for reading debug symbols from an ELF file.
- Introduced SHOW SETTING setting_name as a simpler version of SHOW SETTINGS.
- Added fields substreams and filenames to the system.parts_columns table.
- Introduced a setting create_table_empty_primary_key_by_default for default ORDER BY ().
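To give a rough feel for three of the functions listed above, here is a pure-Python sketch of what byteSwap, arrayFold and jsonMergePatch compute. These are illustrations, not the ClickHouse implementations: the helper names byte_swap, array_fold and json_merge_patch are hypothetical, the explicit width_bytes parameter is an assumption (SQL infers the width from the integer type), and the merge follows RFC 7386 semantics, which the function name suggests but these release notes do not spell out.

```python
import json
from typing import Any, Callable, Sequence


def byte_swap(value: int, width_bytes: int) -> int:
    """Reverse the byte order of an unsigned integer of the given width."""
    return int.from_bytes(value.to_bytes(width_bytes, "big"), "little")


def array_fold(fn: Callable[..., Any], arrays: Sequence[Sequence[Any]], initial: Any) -> Any:
    """Fold fn(acc, x1, ..., xn) element-wise over equally sized arrays."""
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same length")
    acc = initial
    for items in zip(*arrays):
        acc = fn(acc, *items)
    return acc


def _merge(target: Any, patch: Any) -> Any:
    """RFC 7386 merge: objects merge recursively, null deletes, others replace."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = _merge(result.get(key), value)
    return result


def json_merge_patch(*documents: str) -> str:
    """Merge JSON strings left to right; later documents win on conflicts."""
    merged: Any = {}
    for doc in documents:
        merged = _merge(merged, json.loads(doc))
    return json.dumps(merged)


print(byte_swap(3351772109, 4))  # 0xC7C7FBCD -> 0xCDFBC7C7
print(array_fold(lambda acc, x, y: acc + x * y, [[1, 2, 3], [4, 5, 6]], 0))
print(json_merge_patch('{"a": 1}', '{"name": "joey"}', '{"name": "zoey"}'))
```

The array_fold call computes a dot product (1*4 + 2*5 + 3*6), which shows why folding over several arrays at once is useful.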
Performance Improvements
- Fixed contention on Context lock, significantly improving performance for short-running concurrent queries.
- Improved the performance of inverted index creation by 30%.
- Optimized memory consumption for external aggregation with many temporary files.
- Added option query_plan_preserve_num_streams_after_window_functions to preserve the number of streams after evaluating window functions.
- Released more streams if data is small, optimizing resource usage.
- Optimized RoaringBitmaps before serialization.
- Optimized inverted index posting lists to use the smallest possible representation.
- Set a reasonable size for the marks cache for secondary indices by default.
- Avoided unnecessary reconstruction of index granules when reading skip indexes.
- Cached CAST function in set during execution to improve the performance of function IN when set element type doesn't match column type.
- Improved write performance to EmbeddedRocksDB tables.
- Improved overall resilience for ClickHouse in case of many parts within a partition.
- Reduced memory consumption during loading of hierarchical dictionaries.
- All dictionaries now support the setting dictionary_use_async_executor.
- Prevented excessive memory usage when deserializing AggregateFunctionTopKGenericData.
- Reduced CPU consumption for AsyncMetrics threads on a Keeper with lots of watches.
- Experimental inverted indexes now do not store tokens with too many matches, saving space.
- Improved write performance to hierarchical dictionaries.
v0.15.0
What's Changed
- Enable hdfs, avro and rapidJson/simdJson by @nmreadelf in #123
Full Changelog: v0.14.2...v0.15.0
