-
Notifications
You must be signed in to change notification settings - Fork 1.2k
issue:998 Correctly identify the first chunk across pause/resume #1107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -507,6 +507,8 @@ License: MIT | |
|
|
||
| this.parseChunk = function(chunk, isFakeChunk) | ||
| { | ||
| var notFirstChunk = !this.isFirstChunk; | ||
|
|
||
| // First chunk pre-processing | ||
| const skipFirstNLines = parseInt(this._config.skipFirstNLines) || 0; | ||
| if (this.isFirstChunk && skipFirstNLines > 0) { | ||
|
|
@@ -530,7 +532,7 @@ License: MIT | |
| // Rejoin the line we likely just split in two by chunking the file | ||
| var aggregate = this._partialLine + chunk; | ||
| this._partialLine = ''; | ||
| var results = this._handle.parse(aggregate, this._baseIndex, !this._finished); | ||
| var results = this._handle.parse(aggregate, this._baseIndex, !this._finished, notFirstChunk); | ||
|
|
||
| if (this._handle.paused() || this._handle.aborted()) { | ||
| this._halted = true; | ||
|
|
@@ -1080,7 +1082,7 @@ License: MIT | |
| * and ignoreLastRow parameters. They are used by streamers (wrapper functions) | ||
| * when an input comes in multiple chunks, like from a file. | ||
| */ | ||
| this.parse = function(input, baseIndex, ignoreLastRow) | ||
| this.parse = function(input, baseIndex, ignoreLastRow, notFirstChunk) | ||
| { | ||
| var quoteChar = _config.quoteChar || '"'; | ||
| if (!_config.newline) | ||
|
|
@@ -1111,7 +1113,7 @@ License: MIT | |
|
|
||
| _input = input; | ||
| _parser = new Parser(parserConfig); | ||
| _results = _parser.parse(_input, baseIndex, ignoreLastRow); | ||
| _results = _parser.parse(_input, baseIndex, ignoreLastRow, notFirstChunk); | ||
| processResults(); | ||
| return _paused ? { meta: { paused: true } } : (_results || { meta: { paused: false } }); | ||
| }; | ||
|
|
@@ -1458,7 +1460,7 @@ License: MIT | |
| var cursor = 0; | ||
| var aborted = false; | ||
|
|
||
| this.parse = function(input, baseIndex, ignoreLastRow) | ||
| this.parse = function(input, baseIndex, ignoreLastRow, notFirstChunk) | ||
| { | ||
| // For some reason, in Chrome, this speeds things up (!?) | ||
| if (typeof input !== 'string') | ||
|
|
@@ -1740,7 +1742,7 @@ License: MIT | |
| /** Returns an object with the results, errors, and meta. */ | ||
| function returnable(stopped) | ||
| { | ||
| if (config.header && !baseIndex && data.length && !headerParsed) | ||
| if (config.header && !notFirstChunk && data.length && !headerParsed) | ||
|
Comment on lines
-1743
to
+1745
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The The |
||
| { | ||
| const result = data[0]; | ||
| const headerCount = Object.create(null); // To track the count of each base header | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explaining the changes to facilitate the review. When parsing chunks, we want parsing to keep track of whether some previous chunk was already processed or not. This is basically to know whether header recognition should trigger.
We already have this information in the
ChunkStreameras theisFirstChunkproperty. We need to invert it before passing it down in order to make the default a falsy value. This is used for example at this point:PapaParse/papaparse.js
Lines 1350 to 1355 in b10b87e
NOTE: I think the first chunk may not necessarily contain the header, although it usually does. It depends on how big the header is and whether
skipFirstNLineswas used. In general, we could find the header on following chunks.This is also a problem in current implementation, but fixing that would require further refactoring that I'm not comfortable carrying out myself. I would like to leave this problem to a future PR.
The correct way to refactor all this really right, I think, should be to move the headerParsed variable out of the
Parserclass somehow, because that one gets reinitialized all the time. A "parseContext" variable initialized in the highest level parse function could be used for that.