This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Doesn't handle whitespace outside of quotes values correctly #19

Open
tomdz opened this issue Jun 4, 2013 · 6 comments

Comments

@tomdz
tomdz commented Jun 4, 2013

When parsing a CSV file like:

"foo", "bar", "baz"
"baz", "foo", "bar"

the CSV parser will get confused and give me back exactly two values:

foo

and

 bar, baz
baz, foo, bar

(note the leading space here).

According to RFC 4180, these spaces should be considered part of the value, i.e. it should return 'foo', ' bar', ' baz' and 'baz', ' foo', ' bar'.
Alternatively, perhaps via a feature, it could trim the whitespace outside of quoted strings, i.e. return 'foo', 'bar', 'baz' and 'baz', 'foo', 'bar'.
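
To make the two expected behaviours concrete, here is a minimal stand-alone sketch in plain Java. It is not Jackson's implementation and uses no Jackson API; the class `LenientCsvLine`, its `split` method, and the `trimOutsideQuotes` flag are hypothetical names made up for illustration. It assumes a comma delimiter, a single line of input, and `""` as the escape for a quote inside a quoted section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Jackson.
public class LenientCsvLine {

    /**
     * Splits a single CSV line on commas.
     * Whitespace outside quotes is kept as part of the value when
     * trimOutsideQuotes is false, and stripped from both ends of the
     * value when it is true. Whitespace inside quotes is always kept.
     */
    public static List<String> split(String line, boolean trimOutsideQuotes) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean sawQuote = false; // has the current field contained a quoted section?
        int protectedLen = 0;     // field length right after the last closing quote

        // Iterate one position past the end and treat it as a final delimiter.
        for (int i = 0; i <= line.length(); i++) {
            boolean atEnd = (i == line.length());
            char c = atEnd ? ',' : line.charAt(i);

            if (inQuotes && !atEnd) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;
                    protectedLen = field.length();
                }
            } else if (c == '"') {
                inQuotes = true;
                sawQuote = true;
            } else if (c == ',') {
                if (trimOutsideQuotes) {
                    // Strip trailing whitespace, but never cut into quoted content.
                    while (field.length() > protectedLen
                            && Character.isWhitespace(field.charAt(field.length() - 1))) {
                        field.setLength(field.length() - 1);
                    }
                }
                fields.add(field.toString());
                field.setLength(0);
                sawQuote = false;
                protectedLen = 0;
            } else if (trimOutsideQuotes && !sawQuote && field.length() == 0
                    && Character.isWhitespace(c)) {
                // Leading whitespace before any content: skip it.
            } else {
                field.append(c);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"foo\", \"bar\", \"baz\"";
        System.out.println(split(line, false)); // spaces kept as part of the values
        System.out.println(split(line, true));  // spaces outside quotes trimmed
    }
}
```

With `trimOutsideQuotes` set to `false` the first line of the example yields three values with the leading spaces preserved; with `true` it yields the three bare values. In both modes the line is split into three fields rather than two.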

@cowtowncoder
Member

Quick note: trimming is already supported with CsvParser.TRIM_SPACES, see: http://fasterxml.github.io/jackson-dataformat-csv/javadoc/2.2.0/com/fasterxml/jackson/dataformat/csv/CsvParser.Feature.html#TRIM_SPACES

But I'll see what's up with eating of spaces...

@cowtowncoder
Member

Hmmh. I am guessing that some spaces are missing from the example, due to Markdown?
If so, could you add an example that uses, say, underscores to denote where the spaces are? I need to write a unit test to verify what gives; it should be an easy thing to solve.

@cowtowncoder
Member

Actually it looks like I can reproduce this on my own.

@cowtowncoder
Member

Hmmh. Reading through RFC 4180, I do not see a definition of whether spaces outside quotes would be allowed in the way described. But I think it would make sense to handle them in an intuitive way.

FWIW, enabling TRIM_SPACES should solve your specific problem, I think, until I fix the issue for the un-trimmed case.

I assume that spaces outside of quotes should be trimmed anyway; it does not make sense to leave them.

@qrlodhi
qrlodhi commented Mar 13, 2015

Any update on this issue?

I have the same issue. This specific case (where the delimiter is a comma) is solved by using CsvParser.TRIM_SPACES as stated above, but that feature messes things up when the input delimiter is a space. I could use two different mappers for the different delimiters, but then the indexes of the fields change whenever the delimiter changes. So it would be nice to see these spaces handled by the Jackson CSV parser.

@cowtowncoder
Member

Unfortunately no update yet. I realize this is an important feature, and hope to address it.
Interesting note on spaces, thank you for mentioning this; I hadn't thought this would be commonly done.


3 participants