Skip to content

Athena read_sql_query provides completely wrong results for qmark style parametrized queries with cache enabled #2956

@scimas

Description

@scimas

Describe the bug

Exactly as stated in the title. If you're querying Athena using a qmark style query and use the query cache, and you change the actual parameter values between two query runs, aws-sdk-pandas/awswrangler retrieves the result of the previous query run with incorrect parameter values.

How to Reproduce

import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT distinct id FROM my_database.my_table WHERE id in (?, ?)",
    database="my_database",
    ctas_approach=False,
    unload_approach=False,
    params=["id1", "id2"],
    paramstyle="qmark",
    workgroup="primary",
    athena_cache_settings={"max_cache_seconds": 300}
)
print(df)

df2 = wr.athena.read_sql_query(
    sql="SELECT distinct id FROM my_database.my_table WHERE id in (?, ?)",
    database="my_database",
    ctas_approach=False,
    unload_approach=False,
    params=["id3", "id4"],
    paramstyle="qmark",
    workgroup="primary",
    athena_cache_settings={"max_cache_seconds": 300}
)
print(df2)

this with a database and table should be enough to reproduce the issue.

Expected behavior

Since the parameters to the query have changed, a fresh query should be run.

Your project

No response

Screenshots

No response

OS

Amazon Linux 2023

Python version

3.12

AWS SDK for pandas version

layer:AWSSDKPandas-Python312-Arm64:12

Additional context

My guess is that this is happening due to aws-sdk-pandas using a hand-rolled cache implementation rather than just letting Athena handle the cache through the ResultReuseConfiguration of StartQueryExecution. As far as I know, Athena does not populate the query parameter values in the GetQueryExecution output. So you get the parametrized query without the parameter values and check the current query against it, which is incorrect.

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions