Port melt to python datatable
datatable is a Python package for reading, manipulating, and otherwise working with data. It occupies the same niche as pandas but is geared more towards large data sets. Python datatable is a sibling of R's data.table and attempts to mimic its core APIs and algorithms. However, several functions in R data.table have not yet been implemented in Python datatable, including the melt function for wide-to-long data reshaping.
The goal of this project is to implement the melt() function for reshaping datasets. Similar functions exist in most data manipulation libraries:
- melt() in R data.table
- pivot_longer() in tidyr
- melt() in pandas
- UNPIVOT in SQL Server
- UNPIVOT in Snowflake
- reshape long in Stata
- proc transpose in SAS
- etc.
The basic premise of the melt() function is that it takes a frame of size n×k and produces a new frame of size nk×2, where one column contains the column names from the original dataset and the other contains all the values from the original dataset.
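The n×k to nk×2 reshaping can be illustrated with a minimal pure-Python sketch. It does not use datatable and is not the proposed implementation: the helper name melt_sketch, the output column names "variable" and "value", and the column-by-column ordering of the output rows are all assumptions made for illustration.

```python
def melt_sketch(columns, var_name="variable", value_name="value"):
    """Reshape a wide table into long form.

    `columns` maps each column name to a list of values (hypothetical
    stand-in for a frame). Returns a two-column table in the same format.
    """
    names = list(columns)
    if len({len(columns[name]) for name in names}) > 1:
        raise ValueError("all columns must have the same number of rows")

    variables = []
    values = []
    # Assumption: output rows are emitted one source column at a time,
    # so each original column becomes a contiguous run of n rows.
    for name in names:
        variables.extend([name] * len(columns[name]))
        values.extend(columns[name])
    return {var_name: variables, value_name: values}


wide = {"A": [1, 2, 3], "B": [4, 5, 6]}  # n = 3 rows, k = 2 columns
long_form = melt_sketch(wide)
print(long_form["variable"])  # ['A', 'A', 'A', 'B', 'B', 'B']
print(long_form["value"])     # [1, 2, 3, 4, 5, 6]
```

Here a 3×2 input yields 6 rows in 2 columns, matching the nk×2 shape described above. A full implementation would also need to handle identifier columns that are carried through unchanged and columns of differing types, which this sketch deliberately leaves out.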
This project requires knowledge of C++, since the majority of datatable code is written in C++.
Completing this project would require the author to submit a Pull Request (or a series of Pull Requests) directly to the datatable repository.
See also:
- the GitHub issue and discussion of the proposed functionality;
- Documentation relevant to the development process for datatable.
The melt() function is one of the most frequently requested features for datatable, and will therefore be a huge benefit to users of the library.
Please get in touch after completing at least one of the tests below.
- EVALUATING MENTOR: Pasha Stetsenko <[email protected]>
- Co-mentor: Toby Dylan Hocking <[email protected]>
Do one or several: completing more of the harder tests makes you more likely to be selected.
- Easy: Implement melting functionality and demonstrate that it works by adding the relevant unit tests;
- Medium: Search for wide-to-long transformation tutorials for different platforms / packages (for example, see the links above), and make sure your melt() function is just as capable. Demonstrate this by creating a datatable tutorial (as part of the official documentation) for using the function.
Students, please post a link to your test results here.