Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #157: split function accounting for quotes #158

Merged
merged 13 commits into from
Jul 10, 2024

Conversation

mikyll
Copy link
Contributor

@mikyll mikyll commented Jul 6, 2024

Added Util.splitCSVLine() function that splits a CSV-formatted string into an array of tokens, accounting for quotes (single and double) and escape characters.

Added Util.splitCSVLine() function that splits a CSV-formatted string into an array of tokens, accounting for quotes (single and double) and escape characters.
@casbin-bot
Copy link
Member

@Edmond-J-A @rushitote please review

@CLAassistant
Copy link

CLAassistant commented Jul 6, 2024

CLA assistant check
All committers have signed the CLA.

@hsluoyz
Copy link
Member

hsluoyz commented Jul 7, 2024

@mikyll fix CI errors:

image

Warning: "src/util/Util.lua:263:24: (W311) value assigned to variable sep is unused"
src/util/Util.lua Show resolved Hide resolved
mikyll added 7 commits July 8, 2024 19:40
- renamed from splitCSVLine to splitEnhanced, since it's not compliant to RFC 4180 and it could be  deceiving;
- added optional parameters;
- now throws error if quotes are not closed;
- fixed its behaviour to comply with Casbin documentation (see the note at https://casbin.org/docs/policy-storage#loading-policy-from-a-csv-file and issue 886 at casbin/casbin):
  "If your file contains commas and double quotes, you should enclose the field in double quotes and double any embedded double quotes."

Therefore I removed the extra behaviour related to single quotes ' and escape character \ and refactored the function.
@mikyll
Copy link
Contributor Author

mikyll commented Jul 8, 2024

@leeqvip I made some fixes to Util functions and added unit tests (also added an example use case and the related unit test about #157)

mikyll added 4 commits July 9, 2024 16:10
- check if the last field is a quoted field
- throwing error when there are extra characters after the double quote that closes the quoted field.
- support for quotes in last field;
- throws an exception if there are other characters after the double quote that closes the quoted field.
(uniform to Util.split() )
(uniform to Util.split() )
@mikyll
Copy link
Contributor Author

mikyll commented Jul 9, 2024

Summary

Differences

Old Util.split(str, delimiter, x) function

This function was used for simply splitting a CSV-formatted line, into an array of fields, separated by delimiter (up to x number of fields). Fields were also trimmed to remove leading/trailing whitespaces.

New Util.splitEnhanced(str, delimiter, trimFields) function

This function does the same as Util.split(), but it adds the following features:

  1. quoted fields - fields can be wrapped in double quotes (") to escape comma characters that are part of the field (useful for example in regexMatch, to handle {N,M} pattern quantifiers);

  2. double quotes in quoted fields - quoted fields can contain double quotes, which are escaped through another double quotes character ("").

    This complies with what is stated in the documentation (Policy Storage | Casbin):

    Show/Hide

    NOTE
    If your file contains commas, you should wrap them in double quotes. For example:

    p, alice, "data1,data2", read    --correct
    p, alice, data1,data2, read        --incorrect (the whole phrase "data1,data2" should be wrapped in double quotes)
    

    If your file contains commas and double quotes, you should enclose the field in double quotes and double any embedded double quotes.

    p, alice, data, "r.act in (""get"", ""post"")"        --correct
    p, alice, data, "r.act in ("get", "post")"            --incorrect (you should use "" to escape "")
    

    Related issue: casbin#886

  1. policy syntax errors:
    • if a quoted field presents extra trailing chracters, after the closing double quotes, the function throws the error Quoted fields cannot have extra characters outside double quotes.
    • if there are unmatched quotes, the function throws the error Unmatched quotes.

New Example

  • file examples/basic_model_with_regex.conf - implements a simple ACL model with regex for each field of the policy rules;
  • file examples/basic_policy_with_regex.csv - shows a simple usage of this model with a rule that includes the pattern quantifier {N,M} (it includes a comma escaped through the quoted field, so it's a way to show the correct behaviour of the Util.splitEnhanced() function).

Tests

util_spec

  • test isOnlyWhitespaces - verifies the correct behaviour of Util.isOnlyWhitespaces();
  • test splitEnhanced - verifies the correct behaviour of Util.splitEnhanced(). In particular, the usage of quotedFields, escaped double quotes and errors;

enforcer_spec

  • test regexMatch - verifies the regexMatch behaviour of the new example, which exploits the new "quoted fields" feature.

@leeqvip leeqvip merged commit f40b2ea into casbin:master Jul 10, 2024
9 checks passed
Copy link

🎉 This PR is included in version 1.41.9 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

mikyll added a commit to mikyll/apisix that referenced this pull request Jul 11, 2024
Allow to escape commas in policy rules.

Especially useful for {N,M} pattern quantifier, which would otherwise throw an error (invalid policy size), since it considers the comma as a CSV field separator.

Issue: casbin/lua-casbin#157
Fixed by: casbin/lua-casbin#158
@mikyll mikyll deleted the split_csv_line branch November 28, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants