Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add header option support to the CSV module #4295

Merged
merged 8 commits into from
Feb 4, 2025
Merged

Conversation

oleiade
Copy link
Member

@oleiade oleiade commented Jan 29, 2025

What?

Adds support for a header option to the csv module's option. When set to true, the header option leads the module's parse function as well as the Parser.next method to return records as objects with the header corresping row name as the property key, and value as the value.

const file = await open('data.csv');
const csvRecords = await csv.parse(file, { header: true })

export default async function() {
	console.log(csvRecords[scenario.iterationInTest])
}

Would print:

INFO[0000] {"age":"72","firstname":"fariha","lastname":"ehlenfeldt"}  source=console
INFO[0000] {"age":"50","firstname":"qudratullah","lastname":"gillfillan"}  source=console
INFO[0000] {"age":"41","firstname":"jeleah","lastname":"rodina"}  source=console
INFO[0000] {"age":"99","firstname":"thaisia","lastname":"nowalk"}  source=console
INFO[0000] {"age":"75","firstname":"joey-lynn","lastname":"wilsford"}  source=console
INFO[0000] {"age":"81","firstname":"tudur","lastname":"granville"}  source=console
INFO[0000] {"age":"56","firstname":"aytek","lastname":"umber"}  source=console
INFO[0000] {"age":"30","firstname":"aynoor","lastname":"hisaw"}  source=console
INFO[0000] {"age":"31","firstname":"fiadh-rose","lastname":"iravani"}  source=console
INFO[0000] {"age":"70","firstname":"annely","lastname":"ooley"}  source=console

As opposed to, for reference, treating records as array without the header option set:

INFO[0000] ["firstname","lastname","age"]                source=console
INFO[0000] ["fariha","ehlenfeldt","72"]                  source=console
INFO[0000] ["qudratullah","gillfillan","50"]             source=console
INFO[0000] ["jeleah","rodina","41"]                      source=console
INFO[0000] ["thaisia","nowalk","99"]                     source=console
INFO[0000] ["joey-lynn","wilsford","75"]                 source=console
INFO[0000] ["tudur","granville","81"]                    source=console
INFO[0000] ["aytek","umber","56"]                        source=console
INFO[0000] ["aynoor","hisaw","30"]                       source=console
INFO[0000] ["fiadh-rose","iravani","31"]                 source=console

Note: that the same behavior can be observed when using a Parser's next method with the header option set to true.

Why?

This PR tackles #4284, and directly addresses a request from the team behind k6 studio, and one of the project's issues k6-studio/#363.

It also nicely complements the feature set of the csv parser in general, and aligns its options-set with papaparse's.

Checklist

  • I have performed a self-review of my code.
  • I have added tests for my changes.
  • I have run linter locally (make lint) and all checks pass.
  • I have run tests locally (make tests) and all tests pass.
  • I have commented on my code, particularly in hard-to-understand areas.

Related PR(s)/Issue(s)

k6/#4284
k6-studio/#363

@oleiade oleiade requested a review from a team as a code owner January 29, 2025 09:56
@oleiade oleiade requested review from codebien and olegbespalov and removed request for a team January 29, 2025 09:56
@codebien
Copy link
Contributor

Adds support for a header option to the csv module's option.

I find the DX introduced here not optimal. To be honest, without reading the docs, my intuition would be that headers: true just includes the headers row.

I would prefer a clearer name, something like headerAsFields. Or, overall, I would prefer a different approach by using a different method. For example, parseAsObject().

@oleiade
Copy link
Member Author

oleiade commented Jan 29, 2025

I see where you're coming from @codebien, and I think I agree, I reused the name paparse uses, but in hindsight, I also find it a bit confusing 👍🏻

Giving this a thought, I propose we rename it to asObjects instead, and document explicitly that the header will be used as the "mapping" reference. I would prefer it to headerAsFields that doesn't express enough that the output type changes 🙇🏻

@oleiade
Copy link
Member Author

oleiade commented Jan 29, 2025

Also heads-up @going-confetti 👋🏻

@oleiade oleiade force-pushed the feature/csv-header branch from 1c41c35 to 9b66e3b Compare January 29, 2025 10:31
@codebien
Copy link
Contributor

codebien commented Jan 29, 2025

Giving this a thought, I propose we rename it to asObjects instead

Yes, this is better. I like it.

@oleiade Now, another question as a devil's advocate. Are we sure that it shouldn't be the default option? Why do we think an array of arrays is better for the default?

@allansson
Copy link
Contributor

Giving this a thought, I propose we rename it to asObjects instead

// users.csv
// Username,First name,Last name
// jdoe,john,doe
// immicky,mickey,mouse

const arrayOfArrays = csv.parse("./users.csv", { rowsAs: "array" })
const arrayOfObjects = csv.parse("./users.csv", { rowsAs: "object" })

// Future possibility:
const customMapping = csv.parse("./users.csv", {
  // Might seem similar to just doing `array.map` but this would avoid having to
  // iterate the list twice.
  rowsAs: ([userName, firstName, lastName]) => {
    return {
      userName,
      fullName: `${firstName} ${lastName}`
    }
  }
})

@oleiade Now, another question as a devil's advocate. Are we sure that it shouldn't be the default option? Why do we think an array of arrays is better for the default?

"If only this was an array of arrays", said no one ever.

@oleiade
Copy link
Member Author

oleiade commented Jan 31, 2025

@codebien @allansson

I think the approach you folks suggest makes a ton of sense, looking around about this, I actually noticed that Deno's 2.0 csv module took a similar approach:

import { parse, stringify } from "jsr:@std/csv";
let text = `
url,views,likes
https://deno.land,10,7
https://deno.land/x,20,15
https://deno.dev,30,23
`;
let data = parse(text, {
  skipFirstRow: true,
  strip: true,
});
console.log(data[0].url); // https://deno.land
console.log(data[0].views); // 10
console.log(data[0].likes); // 7
text = `
https://deno.land,10,7
https://deno.land/x,20,15
https://deno.dev,30,23
`;
data = parse(text, {
  columns: ["url", "views", "likes"],
});
console.log(data[0].url); // https://deno.land
console.log(data[0].views); // 10
console.log(data[0].likes); // 7
const obj = [
  { mascot: "dino", fans: { old: 100, new: 200 } },
  { mascot: "bread", fans: { old: 5, new: 2 } },
];
const csv = stringify(obj, {
  columns: [
    "mascot",
    ["fans", "new"],
  ],
});
console.log(csv);
// mascot,new
// dino,200
// bread,2

Note that by default rows are treated as maps (arrays are also objects in js after all), with the option to provide the column names yourself if you want. I find it aligns pretty well with your example @allansson too.

I really like this direction, but I suggest we open an issue, and address such a redesign separately, because that would be a big chunk of work, and as to not block this PR which would bring immediate value already.

What's your take, do you folks agree, or would you say we should completely change the design itself?

(bringing @Llandy3d in the loop to give him visibility on this)

@oleiade oleiade force-pushed the feature/csv-header branch from b3cf5b5 to b3208dc Compare January 31, 2025 15:23
@codebien
Copy link
Contributor

What's your take, do you folks agree, or would you say we should completely change the design itself?

@oleiade I'm fine with it. Please, remember to open an issue so we can link it into the epic issue to stabilize the csv module.

@oleiade
Copy link
Member Author

oleiade commented Feb 3, 2025

I have created #4507 as a result @codebien 🙇🏻 (with a couple of todos in the issue that are design aspects we should look into during the implementation)

@oleiade oleiade requested review from codebien and inancgumus and removed request for olegbespalov February 3, 2025 15:47
codebien
codebien previously approved these changes Feb 3, 2025
inancgumus
inancgumus previously approved these changes Feb 3, 2025
Copy link
Member

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Just one suggestion.

internal/js/modules/k6/experimental/csv/module_test.go Outdated Show resolved Hide resolved
@inancgumus inancgumus added this to the v0.57.0 milestone Feb 3, 2025
@oleiade oleiade requested a review from inancgumus February 4, 2025 13:00
@oleiade oleiade merged commit 8799c81 into master Feb 4, 2025
28 checks passed
@oleiade oleiade deleted the feature/csv-header branch February 4, 2025 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants