Handling CSVs with only CR newlines #150

Open
lbordowitz opened this issue Aug 21, 2019 · 1 comment

@lbordowitz

A user has uploaded a CSV that uses only carriage return (CR, or \r) characters as line endings. The current SourceLineReader.readLineWithTerminator reads the header row successfully and then discards the rest of the file. We want every row in the CSV to be returned.

We run something like this:

import java.nio.channels.Channels
import scala.io.Source
import com.github.tototoshi.csv.CSVReader

// get a blob from Google Cloud Platform storage
val spreadsheetSource = Source.fromInputStream(Channels.newInputStream(blob.reader()))
val reader = CSVReader.open(spreadsheetSource)
val lines = reader.all()
// lines: List[List[String]] = List(List("First Name", " Last Name", " email"))

This is despite the fact that the file we're reading from has five lines. I have also tried this with Source.fromFile, and there's no difference.

I created the file from a normal CSV with LF-style line endings, and then ran this bash command:

$ tr '\n' '\r' < fnln.csv >fnln.cr.csv
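
For anyone who wants to reproduce this without a shell, the same conversion can be sketched in Scala. This is only an illustration: `CrOnlyFixture`, `toCrOnly`, and the sample rows are hypothetical names and data, and only the header row is taken from the output above.

```scala
// Hypothetical test fixture, not part of scala-csv.
object CrOnlyFixture {
  // Mirrors `tr '\n' '\r'`: every LF becomes a CR.
  // Assumes the input uses LF-only line endings, as in the report above.
  def toCrOnly(lfText: String): String = lfText.replace('\n', '\r')

  // Header taken from the issue; the data row is made up for illustration.
  val sampleLf: String =
    "First Name, Last Name, email\n" +
    "Ada, Lovelace, ada@example.com\n"

  val sampleCr: String = toCrOnly(sampleLf)
}
```

The resulting string can be fed to a reader through `Source.fromString` to exercise the CR-only case.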

Side note: why can't we use Source's built-in getLines function? Is there a reason that we need the line terminator in each string?

@tototoshi
Owner

The current implementation of scala-csv recognizes only \n and \r\n as line terminators and does not support a lone \r.
This is simply because I had never encountered a system that treated '\r' as a line terminator, and I thought those two were enough. But I should probably support it.

> Side note: why can't we use Source's built-in getLines function? Is there a reason that we need the line terminator in each string?

The difference between SourceLineReader.readLineWithTerminator and Source#getLines is whether it gets rid of newline codes or not.
To parse a csv field that contains multiline text, I need to preserve newline codes. Source#getLines doesn't fit this case.
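
To make the distinction concrete, here is a minimal sketch of a line reader that keeps the terminator and also accepts a lone \r. It is not the actual scala-csv implementation: `TerminatorPreservingReader` and `readAll` are hypothetical names, and the use of `java.io.PushbackReader` is an assumption made for the example.

```scala
import java.io.{PushbackReader, StringReader}

// Hypothetical helper, not part of scala-csv.
object TerminatorPreservingReader {

  /** Reads the next line including its terminator ("\n", "\r\n" or a lone "\r").
    * Returns None at end of input. */
  def readLineWithTerminator(in: PushbackReader): Option[String] = {
    val sb = new StringBuilder
    var done = false
    while (!done) {
      val c = in.read()
      if (c == -1) {
        done = true // end of input; the last line may have no terminator
      } else {
        sb.append(c.toChar)
        if (c == '\n') {
          done = true
        } else if (c == '\r') {
          // Peek one character: "\r\n" is a single terminator, a lone "\r" is too.
          val next = in.read()
          if (next == '\n') sb.append('\n')
          else if (next != -1) in.unread(next)
          done = true
        }
      }
    }
    if (sb.isEmpty) None else Some(sb.toString)
  }

  /** Convenience wrapper for the example: splits a whole string into lines. */
  def readAll(text: String): List[String] = {
    val in = new PushbackReader(new StringReader(text))
    Iterator
      .continually(readLineWithTerminator(in))
      .takeWhile(_.isDefined)
      .map(_.get)
      .toList
  }
}
```

With this sketch, `TerminatorPreservingReader.readAll("a,b\rc,d\r")` yields two lines that still end in \r. Because the terminator is kept, a parser that finds an unclosed quote at the end of one line can append the next line and recover the original multiline field, which is not possible once the terminators have been stripped.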
