Update pyTigerGraphLoading.py to add support on direct data loading #260
Add function to support data loading from a string directly instead of a file.
We auto-generate the docs from docstrings, so we need a full docstring for runLoadingJobWithData. Is the data
parameter a true string or a bytestring (which is what we read the file as in runLoadingJobWithFile)?
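To make the str-versus-bytes question concrete, here is a minimal sketch of how a loader could accept either type. `to_bytes` is a hypothetical helper used only for illustration, not part of pyTigerGraph, and the UTF-8 default is an assumption.

```python
import os
import tempfile


def to_bytes(data, encoding="utf-8"):
    """Return `data` as bytes, encoding it first if it is a str.

    Hypothetical helper for illustration; not part of pyTigerGraph.
    """
    if isinstance(data, bytes):
        return data
    if isinstance(data, str):
        return data.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(data).__name__}")


# Reading a file in binary mode yields bytes, the same type to_bytes returns.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"1,Alice\n2,Bob\n")
    path = tmp.name
try:
    with open(path, "rb") as f:
        from_file = f.read()
finally:
    os.remove(path)

# The same content supplied as an in-memory string.
from_string = to_bytes("1,Alice\n2,Bob\n")
```

Normalizing to bytes at the boundary would let a file-based and a string-based entry point share the same upload path, whichever type the docstring ends up documenting.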
FILENAME definition will be updated to point to the data received.

NOTE: The argument `USING HEADER="true"` in the GSQL loading job may not be enough to
load the file correctly. Remove the header from the data file before using this function.
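One way to follow that note when the data is already an in-memory string is sketched below. `drop_header` is a hypothetical helper, not a pyTigerGraph function, and it assumes the header occupies exactly the first line (no newline embedded in a quoted header field).

```python
def drop_header(data: str) -> str:
    """Return `data` without its first line.

    Hypothetical helper; assumes the header is exactly the first line.
    """
    # partition splits at the first newline only; `rest` is everything after it.
    _, sep, rest = data.partition("\n")
    # No newline at all means the string held only a header line.
    return rest if sep else ""


csv_text = "id,name\n1,Alice\n2,Bob\n"
body = drop_header(csv_text)
```

Because the split happens at the first `\n`, input with `\r\n` line endings keeps its remaining rows unchanged.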
The comment is confusing; the same goes for runLoadingJobWithFile.
So the header should be removed before calling these two functions. If the loading job still has USING HEADER=true, will the first line be ignored?
The original function does not support it, so I have not changed it yet.
Actually, I'd prefer to support HEADER=true in these two functions so that users can provide the parameters according to the loading job. @parkererickson-tg do you have any background on why it might not load correctly with HEADER specified?
It is a long-standing bug in the DDL system.
header=False will be required in df.to_csv() in this case.
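As a sketch of what a header-less payload looks like, the snippet below builds a CSV string with only the standard library. `rows_to_csv` is a hypothetical helper, not part of pyTigerGraph. With pandas, calling `df.to_csv(header=False)` without a path likewise returns a string, and `index=False` additionally omits the index column.

```python
import csv
import io


def rows_to_csv(rows, sep=","):
    """Serialize rows to a CSV string with no header line.

    Hypothetical helper for illustration only.
    """
    buf = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n".
    writer = csv.writer(buf, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


data = rows_to_csv([(1, "Alice"), (2, "Bob")])
```

The resulting string contains data rows only, so every line is meant to be loaded.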
OK, make it explicit that the loading job definition should use USING HEADER=false.
HEADER=false is actually the default behavior
Right. I mean the user should not set USING HEADER=true in the loading job in this case? Otherwise they will lose one row?
Can we get some unit tests on this too?
Unit Test: FAILURE, Jenkins_job:http://192.168.99.101:30080/job/mlwb_build/1232/
Unit Test: SUCCESS, e2e Test: SKIPPED, Jenkins_job:http://192.168.99.101:30080/job/mlwb_build/1234/
QE Approved
@parkererickson-tg where do we put the unit test? Is there an example?
Looks like we actually are missing tests on our entire loading job execution functionality... here is a test file for our vertex functions: https://github.com/tigergraph/pyTigerGraph/blob/master/tests/test_pyTigerGraphVertex.py. We put test fixtures like GSQL files here: https://github.com/tigergraph/pyTigerGraph/blob/master/tests/fixtures/create_query_simple.gsql.
@chengbiao-jin It would be nice if you could add the support for async functionality as well, as I just merged that PR today, which was a pretty large refactor. If you don't have bandwidth, I can probably pick it up this week.
I'll find some time to work on it tomorrow.
On Mon, Oct 28, 2024 at 1:27 PM Parker Erickson wrote:
> @chengbiao-jin It would be nice if you could add the support for async functionality as well, as I just merged that PR today, which was a pretty large refactor. If you don't have bandwidth, I can probably pick it up this week.