Add input_file() to stream input from a file object #120

Open
Mortal wants to merge 1 commit into mwilliamson:master from Mortal:input_file

Mortal commented Jun 3, 2025

I would love to use jq for streaming JSON input instead of using e.g. ijson, but the current jq.py interface only supports providing the entire input in one go.

Instead of calling jv_parser_set_buf once in __cinit__, call it several times like in jv_file.c.

Since the UTF-8 backtracking code in jv_unicode.h is not shipped with jq, the interface only supports text-mode files, so that Python guarantees we never pass jq data containing a split UTF-8 code point.
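
The text-mode argument above can be illustrated with a minimal sketch. It is not the code from this PR: `iter_utf8_chunks` and its `chunk_size` parameter are hypothetical names, and the step that would hand each chunk to the jq parser is left out. It only shows why chunks read from a text-mode file always encode to whole UTF-8 sequences.

```python
import io
from typing import Iterator, TextIO


def iter_utf8_chunks(fileobj: TextIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield UTF-8 encoded chunks read from a text-mode file object.

    A text-mode read() returns whole characters, so encoding each chunk
    separately can never cut a multi-byte UTF-8 sequence in half.
    (Hypothetical helper for illustration, not part of jq.py.)
    """
    while True:
        text = fileobj.read(chunk_size)  # at most chunk_size characters
        if not text:  # empty string signals end of file
            break
        yield text.encode("utf-8")


# Usage: even with one character per chunk, every chunk is valid UTF-8.
source = io.StringIO('{"name": "Zoë", "price": "5 €"}')
for chunk in iter_utf8_chunks(source, chunk_size=1):
    chunk.decode("utf-8")  # would raise UnicodeDecodeError on a split sequence
```

Each yielded chunk is what a streaming interface could pass to the parser in turn, rather than passing the whole input in a single call.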

Mortal (Author) commented Jun 3, 2025

Hi there. Besides text-mode files, I also looked into supporting binary-mode files (commit 46e54e7), but I decided to keep that out of this PR: it requires copying the "utf8 backtracking" function from jq, and in Python it is usually not a big problem to require the user to provide 'str' instead of 'bytes'.

PS. Sorry for not opening an issue first (per the contribution guidelines) - I didn't read the contribution guidelines before starting to implement this ...
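
For readers unfamiliar with the backtracking step that binary-mode support would need, here is a rough Python sketch of the idea. It is an illustration only, not a port of jq's function or of commit 46e54e7; `split_complete_utf8` is a hypothetical name. Given a raw byte chunk, it holds back a trailing incomplete UTF-8 sequence so that the bytes can be prepended to the next chunk.

```python
from typing import Tuple


def split_complete_utf8(buf: bytes) -> Tuple[bytes, bytes]:
    """Split buf into (complete, remainder).

    `remainder` is a trailing, not yet complete UTF-8 sequence (0 to 3 bytes)
    that should be prepended to the next chunk. Invalid input is passed
    through unchanged so that the consumer can report the error.
    """
    n = len(buf)
    i = n - 1
    # Walk back over at most three continuation bytes (0b10xxxxxx).
    while i >= 0 and n - i <= 3 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    if i < 0:
        return buf, b""  # empty, or no lead byte in sight
    lead = buf[i]
    if lead & 0x80 == 0x00:
        need = 1  # ASCII
    elif lead & 0xE0 == 0xC0:
        need = 2
    elif lead & 0xF0 == 0xE0:
        need = 3
    elif lead & 0xF8 == 0xF0:
        need = 4
    else:
        return buf, b""  # not a valid lead byte: pass through
    have = n - i  # bytes available for the last sequence
    if have < need:
        return buf[:i], buf[i:]  # hold back the incomplete tail
    return buf, b""


# Usage: cut the encoded text at an arbitrary byte offset.
data = "5 €".encode("utf-8")  # the euro sign takes three bytes
complete, remainder = split_complete_utf8(data[:3])
```

With a text-mode file, Python's own decoder already does this bookkeeping, which is why the PR can avoid it.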

hansthen commented Jun 3, 2025

Does this support the inputs statement in jq? See https://jqlang.org/manual/#inputs.

Mortal (Author) commented Jun 3, 2025

Hi @hansthen, this PR doesn't implement support for the inputs statement, since that would require supplying an implementation for 'input' using the jq_set_input_cb function somehow. In jq's CLI program itself that's implemented using jq_util, which isn't really available right now in jq.py - so I think it would require a big inversion of control in jq.py to implement input (and thus inputs).
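
The "inversion of control" mentioned above can be pictured with a toy Python sketch. Nothing here calls jq or jq.py; `run_push`, `run_pull`, and `make_input_cb` are hypothetical names used only to contrast the two styles. In the push style the caller hands values to the program, while in the pull style the program receives a callback and decides for itself when to ask for the next input, which is the role an input callback plays for `input` and `inputs`.

```python
from typing import Callable, Iterable, Iterator, Optional


def run_push(values: Iterable[int]) -> Iterator[int]:
    """Push style: the caller drives and the program sees one value at a time."""
    for value in values:
        yield value * 2


def run_pull(next_input: Callable[[], Optional[int]]) -> Iterator[int]:
    """Pull style: the program itself asks for more input via a callback.

    Summing until the callback returns None is loosely analogous to a
    program that drains all remaining inputs.
    """
    total = 0
    while (value := next_input()) is not None:
        total += value
    yield total


def make_input_cb(values: Iterable[int]) -> Callable[[], Optional[int]]:
    """Wrap an iterable in a callback that returns None when exhausted."""
    iterator = iter(values)
    return lambda: next(iterator, None)


doubled = list(run_push([1, 2, 3]))
summed = list(run_pull(make_input_cb([1, 2, 3])))
```

Supporting the pull style means the library, not the caller, must own the input source, which is why it would be a larger change than the streaming interface in this PR.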

hansthen commented Jun 3, 2025

Okay, good to know then.

spbnick (Contributor) commented Jun 4, 2025

Oh, nice, somebody else seems to need what we needed! We've been maintaining a fork with support for reading streams of JSON values for a while now after failing to come to a good agreement with @mwilliamson. It also doesn't implement support for inputs, but we've been using it quite a bit.

Here's the latest changeset on top of 1.7.0: https://github.com/kernelci/jq.py/releases/tag/1.7.0.post1

I've been trying sporadically to get some changes upstream, but running out of time/steam. Sorry, Michael! Now, though, I don't work on that project anymore, so that's not likely to continue. Hope you find something useful there.
