
BUG: Can't handle multiline strings? #2726

@alexkreidler

Describe the bug
qsv select fails when it encounters a quoted CSV field that spans multiple lines (an opening quote, text containing one or more newlines, then a closing quote).
qsv stats crashes with SIGSEGV (Address boundary error).

To Reproduce
Steps to reproduce the behavior:

Create a file simple.csv with this data:

id,date_created,date_modified,judges,date_filed,date_filed_is_approximate,slug,case_name_short,case_name,case_name_full,scdb_id,scdb_decision_direction,scdb_votes_majority,scdb_votes_minority,source,procedural_history,attorneys,nature_of_suit,posture,syllabus,headnotes,summary,disposition,history,other_dates,cross_reference,correction,citation_count,precedential_status,date_blocked,blocked,filepath_json_harvard,filepath_pdf_harvard,docket_id,arguments,headmatter
"1553297","2014-10-30 18:18:48.065126+00","2024-11-02 17:57:19.548229+00","Cecelia H. Goetz","1985-10-29","f","ambico-inc-v-aic-photo-inc-in-re-aic-photo-inc","In re AIC","Ambico, Inc. v. AIC Photo, Inc. (In Re AIC Photo, Inc.)","In the Matter of AIC PHOTO, INC., Et Al., Debtor. AMBICO, INC., Plaintiff, v. AIC PHOTO, INC., Defendant","",,,,"LU","","Patterson, Belknap, Webb & Tyler, New York City by Scott Horton, for debtors., Windels, Marx, Davies & Ives, New York City by Christopher T. Ragucci, for Ambi-co., Zalkin, Rodin & Goodman, New York City by Andrew D. Gottfried, for Manufacturers Hanover Trust.","","","","","","","","","","","5","Published",,"f","law.free.cap.br.57/56.6097565.json","harvard_pdf/1553297.pdf","1638530","","<parties id=\"b112-20\">
    In the Matter of AIC PHOTO, INC., et al., Debtor. AMBICO, INC., Plaintiff, v. AIC PHOTO, INC., Defendant.
   </parties><docketnumber id=\"Ag36\">
    Bankruptcy No. 185-50388-21.
   </docketnumber><docketnumber id=\"Aif\">
    Adv. No. 185-0055.
   </docketnumber><court id=\"ABf\">
    United States Bankruptcy Court, E.D. New York.
   </court><decisiondate id=\"AE5\">
    Oct. 29, 1985.
   </decisiondate><br><attorneys id=\"b113-8\">
<span citation-index=\"1\" class=\"star-pagination\" label=\"57\">
     *57
     </span>
    Patterson, Belknap, Webb &amp; Tyler, New York City by Scott Horton, for debtors.
   </attorneys><br><attorneys id=\"b113-9\">
    Windels, Marx, Davies &amp; Ives, New York City by Christopher T. Ragucci, for Ambi-co.
   </attorneys><br><attorneys id=\"b113-10\">
    Zalkin, Rodin &amp; Goodman, New York City by Andrew D. Gottfried, for Manufacturers Hanover Trust.
   </attorneys>"
"10154826","2024-10-22 16:55:01.985413+00","2024-10-22 16:55:02.172053+00","","2008-01-10","f","state-v-woods","Woods","State v. Woods","","",,,,"C","","","","","","","","","","","","","0","Unpublished","2024-10-22","t","","","69293379","",""
"10154827","2024-10-22 16:55:02.320424+00","2024-10-22 16:55:02.457429+00","","2008-01-10","f","state-v-weathersbee","Weathersbee","State v. Weathersbee","","",,,,"C","","","","","","","","","","","","","0","Unpublished","2024-10-22","t","","","69293380","",""
"10154828","2024-10-22 16:55:02.684449+00","2024-10-22 16:55:02.816383+00","","2008-01-10","f","state-v-walker","Walker","State v. Walker","","",,,,"C","","","","","","","","","","","","","0","Unpublished","2024-10-22","t","","","69293381","",""

Running the following command produces this error:

qsv select -o simple_opinions.csv id,date_created,date_modified,judges,date_filed,date_filed_is_approximate,slug,case_name_short,case_name,case_name_full,scdb_id,scdb_decision_direction,scdb_votes_majority,scdb_votes_minority,source,nature_of_suit,other_dates,citation_count,precedential_status,date_blocked,blocked,filepath_json_harvard,filepath_pdf_harvard,docket_id simple.csv
csv error: CSV error: record 2 (line: 3, byte: 1232): found record with 9 fields, but the previous record has 36 fields

Selecting only the first four columns prints the first record and then fails with the same error:

qsv select 1-4 simple.csv
id,date_created,date_modified,judges
1553297,2014-10-30 18:18:48.065126+00,2024-11-02 17:57:19.548229+00,Cecelia H. Goetz
csv error: CSV error: record 2 (line: 3, byte: 1232): found record with 9 fields, but the previous record has 36 fields
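A possible explanation for these field counts (an assumption, not confirmed in this issue): the headmatter field escapes its inner quotes as \" instead of doubling them as "" (the RFC 4180 convention). A parser that expects doubled quotes treats the quote right after the first backslash as the end of the quoted field, so the record ends at the first embedded newline and each following physical line is read as a record of its own. Line 3 of simple.csv ("In the Matter of AIC PHOTO, INC., et al., ...") split on its commas yields exactly 9 fields, which matches the message. The sketch below uses Python's standard csv module, not qsv, on a made-up three-column sample to show the same effect; `SAMPLE` and `field_counts` are hypothetical names.

```python
import csv
import io

# Hypothetical three-column miniature of simple.csv (names and values are made up).
# The last field is a quoted multiline value whose inner quotes are
# backslash-escaped (\") instead of doubled ("") as RFC 4180 expects.
SAMPLE = (
    'id,slug,headmatter\n'
    '"1","a-v-b","<parties id=\\"p1\\">\n'
    '    In the Matter of A, INC., et al., Debtor.\n'
    '   </parties>"\n'
    '"2","c-v-d",""\n'
)


def field_counts(text, **dialect):
    """Return the number of fields in each record parsed from text."""
    return [len(row) for row in csv.reader(io.StringIO(text), **dialect)]


# RFC 4180 style (Python's default): the quote after the backslash closes the
# quoted field, so the record ends at the first embedded newline and the
# following physical lines become records of their own.
rfc4180 = field_counts(SAMPLE)

# Backslash-escape style: \" is read as a literal quote, so the whole
# multiline value stays inside one field.
backslash = field_counts(SAMPLE, escapechar="\\", doublequote=False)

print("RFC 4180 parsing: ", rfc4180)    # [3, 3, 4, 1, 3]
print("backslash escapes:", backslash)  # [3, 3, 3]
```

If this is the cause, the trouble comes from the quote escaping rather than from multiline fields as such: the backslash-aware parse keeps both embedded newlines inside a single field.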

qsv table simple.csv works fine, although the output is hard to read because there are too many columns and long values are not truncated.

qsv stats simple.csv
fish: Job 1, 'qsv stats simple.csv' terminated by signal SIGSEGV (Address boundary error)

I'm not sure what causes this crash.

Expected behavior
qsv should parse the multiline field correctly and write the selected columns to simple_opinions.csv.

Desktop:

  • OS: Ubuntu 24.04.2
  • qsv 4.0.0-mimalloc-apply;Luau 0.663;-4-4;18.73 GiB-0 B-22.75 GiB-23.42 GiB (aarch64-unknown-linux-gnu compiled with Rust 1.86) compiled
