Skip to content

Capability of pushing down the query and only return the final result set from data source by session #3203

Closed
@RugratsJ

Description

@RugratsJ

Currently many Presto (JDBC based) connectors such as mysql, postgres, Oracle are single threaded. When doing aggregation calculation, Presto will do a full table scan, then do the aggregation on all records returned from the data source, which is very slow in some case. For example, for TPCH 100G orders 150M records table stored in Postgresql, if we do the following count aggregation query on presto, it took 8 minutes 30 seconds to return the result.

presto> select o_custkey, count() from pgtpch100g.public.orders where to_char(o_orderdate, 'mm/yyyy') = '03/1997' group by o_custkey having count() > 4;
o_custkey | _col1
-----------+-------
10843207 | 5
13299838 | 5
1809103 | 5
12675628 | 5
12038404 | 5
(5 rows)

Query 20200322_190724_00019_njknr, FINISHED, 5 nodes
Splits: 177 total, 177 done (100.00%)
8:30 [150M rows, 0B] [294K rows/s, 0B/s]

While in postgresql itself, it only took 5 seconds for the results to return.

tpch100g=# EXPLAIN ANALYSE select o_custkey, count() from orders where to_char(o_orderdate, 'mm/yyyy') = '03/1997' group by o_custkey having count() > 4;
QUERY PLAN
Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=0 width=0) (actual time=5134.805..5134.806 rows=5 loops=1)
Task Count: 32
Tasks Shown: One of 32
-> Task
Node: host=myhost port=5432 dbname=tpch100g
-> Finalize GroupAggregate (cost=122383.11..125236.20 rows=7492 width=16) (actual time=1396.918..1396.918 rows=0 loops=1)
Group Key: o_custkey
Filter: (count(*) > 4)
Rows Removed by Filter: 55564
-> Gather Merge (cost=122383.11..124808.74 rows=19534 width=16) (actual time=1356.274..1378.625 rows=58563 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=121383.09..121554.01 rows=9767 width=16) (actual time=1354.141..1363.419 rows=19521 loops=3)
Group Key: o_custkey
-> Sort (cost=121383.09..121407.51 rows=9767 width=8) (actual time=1354.135..1356.468 rows=20041 loops=3)
Sort Key: o_custkey
Sort Method: quicksort Memory: 1721kB
Worker 0: Sort Method: quicksort Memory: 1699kB
Worker 1: Sort Method: quicksort Memory: 1704kB
-> Parallel Seq Scan on orders_102808 orders (cost=0.00..120735.84 rows=9767 width=8) (actual time=0.305..1346.003 rows=20041 loops=3)
Filter: (to_char((o_orderdate)::timestamp with time zone, 'mm/yyyy'::text) = '03/1997'::text)
Rows Removed by Filter: 1542604
Planning Time: 0.086 ms
Execution Time: 1397.120 ms
Planning Time: 1.381 ms
Execution Time: 5134.832 ms
(26 rows)

In the above example, it's 8.5 minutes in Presto vs 5 seconds in Postgresql Citus cluster. It's a big difference.

It will be nice to have a switch that can be turned on/off in each session, that we could decide if we want to have Presto to return the full table do the aggregation, or we could push down to the data source to do the aggregation, and only returns the final results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions