
[VL] Gluten 1.2 reads incorrect decimal(27,2) field values from a Parquet v1 file #7932

Open
liuchunhua opened this issue Nov 13, 2024 · 5 comments
Labels: bug, triage

Comments

@liuchunhua

Backend

VL (Velox)

Bug description

Write

The file was generated with vanilla Spark 3.2.4:

create table bigdata.tmp_decimal_col(
amt_int int,
amt_bigint long,
amt_decimal8 decimal(8, 2),
amt_decimal16 decimal(16, 2),
amt_decimal24 decimal(24,2),
amt_decimal27 decimal(27, 2),
amt_decimal32 decimal(32, 2),
amt_decimal38 decimal(38, 2)
);

insert into bigdata.tmp_decimal_col values(2322, 223243243224, 
2343.33, 343423423333.12, 
3242343243243232423.00,
3242343243243232423.00,
3242343243243232423.00,
3242343243243232423.00);

Read

Reading the file back with Spark 3.4.4 + Gluten + Velox returns a wrong value for the decimal(27,2) column:
spark.read.parquet("/user/hive/warehouse/bigdata.db/tmp_decimal_col/part-00000-b1079bc2-7027-4b3b-99cb-83a187c30883-c000.snappy.parquet").show()

+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+
|amt_int| amt_bigint|amt_decimal8| amt_decimal16| amt_decimal24| amt_decimal27| amt_decimal32| amt_decimal38|
+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+
| 2322|223243243224| 2343.33|343423423333.12|32423432432432324...|90700136833857145...|32423432432432324...|32423432432432324...|
+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+
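
For reference, the physical Parquet type of each column can be checked from the file footer. A minimal spark-shell sketch (using the parquet-hadoop classes that ship with Spark; the path is the same file as above):

// Sketch: print the Parquet footer schema to see each column's physical type.
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

val path = new Path("/user/hive/warehouse/bigdata.db/tmp_decimal_col/" +
  "part-00000-b1079bc2-7027-4b3b-99cb-83a187c30883-c000.snappy.parquet")
val reader = ParquetFileReader.open(
  HadoopInputFile.fromPath(path, spark.sparkContext.hadoopConfiguration))
// Decimals with precision > 18 are written as FIXED_LEN_BYTE_ARRAY; the printed
// schema shows the byte width chosen for each decimal column.
println(reader.getFooter.getFileMetaData.getSchema)
reader.close()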

Gluten version

Backend: Velox
Backend Branch: HEAD
Backend Revision: 88856e6b139c761e7876b1cd3b29e8dad236d8c7
Backend Revision Time: 2024-08-20 16:45:45 +0800
GCC Version: gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
Gluten Branch: branch-1.2
Gluten Build Time: 2024-08-21T13:12:51Z
Gluten Repo URL: https://github.com/apache/incubator-gluten
Gluten Revision: c82af60
Gluten Revision Time: 2024-08-21 16:12:39 +0800
Gluten Version: 1.2.0
Hadoop Version: 2.7.4
Java Version: 1.8
Scala Version: 2.12.15
Spark Version: 3.4.2

Spark version

Spark-3.4.x

Spark configurations

No response

System information

No response

Relevant logs

No response

liuchunhua added the bug and triage labels on Nov 13, 2024
@FelixYBW
Contributor

Can we try Velox directly? @rui-mo

@rui-mo
Contributor

rui-mo commented Nov 14, 2024

@liuchunhua @FelixYBW Sure. I will take a look.

@liujiayi771
Contributor

I can reproduce this issue with Spark 3.3.1.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_412)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.parquet("/user/hive/warehouse/bigdata.db/tmp_decimal_col/part-00000-4bb6f91f-c039-493c-8617-10f289fc063e-c000.snappy.parquet").show()
+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+
|amt_int|  amt_bigint|amt_decimal8|  amt_decimal16|       amt_decimal24|       amt_decimal27|       amt_decimal32|       amt_decimal38|
+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+
|   2322|223243243224|     2343.33|343423423333.12|32423432432432324...|90700136833857145...|32423432432432324...|32423432432432324...|
+-------+------------+------------+---------------+--------------------+--------------------+--------------------+--------------------+

@liujiayi771
Contributor

liujiayi771 commented Nov 14, 2024

The latest code should have fixed this issue. The root cause was incorrect int128_t parsing logic in IntDecoder, introduced with the INT96 timestamp read support. The problem is only triggered by decimal(27,s) and decimal(28,s) columns.
cc @rui-mo @FelixYBW.
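
As an aside on why only precision 27 and 28 are affected: Parquet stores decimals with precision above 18 as FIXED_LEN_BYTE_ARRAY, sized to the smallest signed two's-complement width that holds 10^p - 1. A quick sketch (illustration only, not Gluten/Velox code) shows that precisions 27 and 28 are exactly the ones needing 12 bytes, the same width as an INT96 timestamp, which is presumably how they ended up on the INT96 decoding path:

// Illustration: minimum FIXED_LEN_BYTE_ARRAY width per decimal precision.
def minBytesForPrecision(p: Int): Int = {
  val maxUnscaled = BigInt(10).pow(p) - 1
  var bytes = 1
  // An n-byte signed two's-complement value can hold at most 2^(8n-1) - 1.
  while (maxUnscaled > BigInt(2).pow(8 * bytes - 1) - 1) bytes += 1
  bytes
}

(24 to 30).foreach(p => println(s"decimal($p,_) -> ${minBytesForPrecision(p)} bytes"))
// decimal(26,_) -> 11 bytes
// decimal(27,_) -> 12 bytes
// decimal(28,_) -> 12 bytes
// decimal(29,_) -> 13 bytes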

@rui-mo
Contributor

rui-mo commented Nov 14, 2024

@liuchunhua You could cherry-pick facebookincubator/velox@da39954, which replaces the original INT96 timestamp reader support in OAP/Velox, to fix this issue. Thanks.
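
For anyone on a source build of Gluten 1.2, applying that fix before rebuilding could look roughly like the following (sketch only; the remote and branch names are assumptions, and the cherry-pick may need conflict resolution against the 1.2 branch):

# In the OAP/velox checkout used by the Gluten 1.2 build:
git remote add fb https://github.com/facebookincubator/velox.git
git fetch fb main
git cherry-pick da39954
# ...then rebuild the Gluten Velox backend bundle against the patched Velox.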
