issue:998 Correctly identify the first chunk across pause/resume #1107

daniele-pini · 2025-09-02T14:40:29Z

Brief description

This PR fixes spurious header recognition across calls to parser.pause() / parser.resume().

Issue #998 occurred because, when those functions were invoked, the header recognition code ran again on rows other than the first.

daniele-pini · 2025-09-02T14:47:09Z

papaparse.js

 		this.parseChunk = function(chunk, isFakeChunk)
 		{
+			var notFirstChunk = !this.isFirstChunk;
+


Explaining the changes to facilitate the review. When parsing chunks, we need to track whether a previous chunk has already been processed, so that we know whether header recognition should trigger.

This information already exists in the ChunkStreamer as the isFirstChunk property. We invert it before passing it down so that the default is a falsy value. This default is used, for example, at this point:

PapaParse/papaparse.js

Lines 1350 to 1355 in b10b87e

	var preview = new Parser({
		comments: comments,
		delimiter: delim,
		newline: newline,
		preview: 10
	}).parse(input);
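
To illustrate why the inverted flag gives a convenient default, here is a minimal sketch that isolates the condition from the diff. The helper name shouldRecognizeHeader and its argument list are hypothetical and only for illustration; in the PR the check lives inline in the parser. The sketch assumes that a caller which omits the flag passes undefined.

```javascript
// Hypothetical helper, not part of PapaParse: it isolates the condition
// used in the diff so the effect of an omitted flag is easy to see.
function shouldRecognizeHeader(config, notFirstChunk, dataLength, headerParsed) {
	return Boolean(config.header && !notFirstChunk && dataLength && !headerParsed);
}

var config = { header: true };

// A caller that omits the flag: undefined is falsy, so the chunk is
// treated as the first one and header recognition is allowed.
var omitted = shouldRecognizeHeader(config, undefined, 3, false);

// Explicit values, as passed down from the chunk streamer in this sketch.
var firstChunk = shouldRecognizeHeader(config, false, 3, false);
var laterChunk = shouldRecognizeHeader(config, true, 3, false);

console.log(omitted, firstChunk, laterChunk); // true true false
```

Had the flag been passed as isFirstChunk instead, an omitted argument would read as "not the first chunk" and such callers would never recognize a header.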

NOTE: I think the first chunk does not necessarily contain the header, although it usually does. It depends on how big the header is and whether skipFirstNLines was used. In general, the header could appear in a later chunk.

This is also a problem in the current implementation, but fixing it would require further refactoring that I'm not comfortable carrying out myself. I would like to leave it to a future PR.

I think the right way to refactor this would be to move the headerParsed variable out of the Parser class, because that one gets reinitialized all the time. A "parseContext" variable initialized in the highest-level parse function could be used for that.
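
A rough sketch of what such a shared context could look like follows. Everything here is hypothetical: createParseContext and parseChunkWithContext are illustrative names, not PapaParse functions, and rows are assumed to be already split into arrays of fields. The point is only that header state kept outside the per-chunk parser survives across chunks, including the case where the header arrives in a later chunk.

```javascript
// Hypothetical sketch, not PapaParse code: header state lives in a context
// object created once per top-level parse, instead of inside each parser.
function createParseContext() {
	return { headerParsed: false, fields: null };
}

// Takes rows that are already split into fields and returns keyed records.
function parseChunkWithContext(rows, context) {
	var data = rows.slice();
	// The first non-empty chunk supplies the header, whichever chunk that is.
	if (!context.headerParsed && data.length) {
		context.fields = data.shift();
		context.headerParsed = true;
	}
	return data.map(function (row) {
		var record = {};
		context.fields.forEach(function (name, i) {
			record[name] = row[i];
		});
		return record;
	});
}

var context = createParseContext();
var first = parseChunkWithContext([], context); // chunk with no rows yet
var second = parseChunkWithContext([['a', 'b'], ['1', '2']], context);
var third = parseChunkWithContext([['3', '4']], context);

console.log(first.length, second[0], third[0]);
```

With this shape, neither baseIndex nor a first-chunk flag is needed for header detection, at the cost of threading the context through every parse call.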

daniele-pini · 2025-09-02T14:54:10Z

papaparse.js

-				if (config.header && !baseIndex && data.length && !headerParsed)
+				if (config.header && !notFirstChunk && data.length && !headerParsed)


The notFirstChunk variable gets passed down the parsing call chain to this point. Previously, baseIndex was used to decide whether to skip header recognition; we now use the explicit notFirstChunk parameter.

The baseIndex (i.e. the cursor position in the streamed file at the start of the chunk) was probably intended for this role, but it doesn't work for the "fake chunks" generated when pausing and resuming. In particular, the first real chunk has a baseIndex of 0, and the "fake chunks" inside it also have a baseIndex of 0, which caused header recognition to trigger multiple times.
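
The difference between the two checks can be shown with a toy model. This is not PapaParse code: countHeaderRuns and the call records are made up for illustration, and the model assumes, as noted in the earlier comment, that per-parser state such as headerParsed does not carry over between calls.

```javascript
// Toy model, not PapaParse code: counts how many parse calls would run
// header recognition under a given check.
function countHeaderRuns(calls, shouldParseHeader) {
	var runs = 0;
	calls.forEach(function (call) {
		if (shouldParseHeader(call)) runs++;
	});
	return runs;
}

// One real chunk at offset 0, followed by two "fake chunks" produced by
// pause/resume inside that same chunk. All three share baseIndex 0.
var calls = [
	{ baseIndex: 0, notFirstChunk: false }, // first real chunk
	{ baseIndex: 0, notFirstChunk: true },  // fake chunk after resume
	{ baseIndex: 0, notFirstChunk: true }   // fake chunk after resume
];

// Old check: relies on baseIndex being 0 only for the first chunk.
var oldRuns = countHeaderRuns(calls, function (c) { return !c.baseIndex; });
// New check: relies on the explicit flag.
var newRuns = countHeaderRuns(calls, function (c) { return !c.notFirstChunk; });

console.log(oldRuns, newRuns); // 3 1
```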

Correctly identify the first chunk across pause/resume

ef88fe4


