[DSIP-78][Data Quality] Suggest remove data quality module #16728
Comments
+1. If we don't remove this, we should rewrite it; it is currently blocking the integration of the code base. |
I generally agree with it. Data quality is an important module of our project: we not only have a data quality task type but also a data quality result sub-module in the UI. If we want to remove it, we should earn enough +1 votes in our community. |
Yes, I think this operation should earn enough +1 votes.
|
+1 |
+1, I think data quality should be stripped out of DS and maintained as an independent plug-in. |
+1, the current data quality task type is barely used in our team. |
+1 |
-1 For me. We use data quality in our use case, and I can maintain or rewrite this module. |
You can create a new DSIP issue and put the full design details in it for discussion. |
Reminder: this is not a vote yet; it is currently a discussion. Please give your opinions in as much detail as possible, as this will help the community make a better choice. By the way, voting should take place on the dev mailing list. |
+1. In fact, some things can be added as plug-ins, and DS should focus on its core functions. If DS wants to build an ecosystem but does what other projects are already good at, and does it worse than they do, DS will be dissed by users. Moreover, other projects will not want to work with DS because of the overlap in functions. This is not a good idea. |
I have a question: some users already run this module in their production environment. If we remove the DQ module, how can they upgrade their environment? |
We'll add it to incompatible change docs. |
+1 |
+1, I support the removal of this module. The code quality is poor and there are multiple CVE vulnerabilities, which are fatal problems. At the same time, data quality management is closely tied to business data and is not a core strength of our platform. |
-1 For me |
+1 |
-1 For me |
-1 For me |
+1. It is too dependent on external components and falls outside DS's own field. I support removal. |
Our team uses data quality, although it's hard to use. So I would prefer a rewrite instead of a removal. |
-1 For me. I agree with moving the DQ code base out of the dolphinscheduler main project into a separate ecosystem project, similar to the relationship between Flink CDC, the Flink Kafka connector, and Apache Flink. https://github.com/apache/flink-cdc The question is NOT whether to vote to remove the DQ module; rather, we should refactor the DQ module to improve its code quality and make it extensible enough to support various DQ tasks. It cannot be denied that DQ is a good application scenario for dolphinscheduler. Unfortunately, the current DQ task can NOT pass its result to downstream tasks, so it can NOT fulfil the requirement of automatic data reconciliation. |
-1, it depends on the Hadoop/Spark framework and has too many bugs. |
I think this module should not be removed. Data quality is a very important part of big data, and the current mainstream big data products all have a data quality module. If it is removed, developers may need to write their own programs to perform quality checks, which is very painful for them. Our team is using data quality, although it's hard to use, so I hope it will be rewritten instead of deleted. |
Search before asking
Motivation
The current data quality task type is barely usable. Since version 3.2.0 it has not worked properly, which effectively leaves it abandoned. Searching the issue list, I found that the data quality task has many open bug reports that no one is maintaining, as well as many CVEs. Most importantly, data quality is tightly coupled to the current code base, so dependencies can't be optimized, the binary package size can't be reduced, and the code maintenance cost is extremely high. I therefore suggest removing it.
Design Detail
No response
Compatibility, Deprecation, and Migration Plan
No response
Test Plan
No response
Code of Conduct