# [SPARK-49383][SQL][PYTHON][CONNECT] Support Transpose DataFrame API
### What changes were proposed in this pull request?

This PR proposes to support transpose as a Scala/Python DataFrame API in both Spark Connect and Classic Spark. Please see https://docs.google.com/document/d/1QSmG81qQ-muab0UOeqgDAELqF7fJTH8GnxCJF4Ir-kA/edit for a detailed design.

### Why are the changes needed?

Transposing data is a crucial operation in data analysis, enabling the transformation of rows into columns. This operation is widely used in tools like pandas and numpy, allowing for more flexible data manipulation and visualization. While Apache Spark supports unpivot and pivot operations, it currently lacks a built-in transpose function. Implementing a transpose operation in Spark would enhance its data processing capabilities, aligning it with the functionality available in pandas and numpy, and further empowering users in their data analysis workflows.

### Does this PR introduce _any_ user-facing change?

Yes, transpose is supported.

**Scala**
```scala
scala> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  x|  y|  z|
+---+---+---+

scala> df.transpose().show()
+---+---+
|key|  x|
+---+---+
|  b|  y|
|  c|  z|
+---+---+

scala> df.transpose($"b").show()
+---+---+
|key|  y|
+---+---+
|  a|  x|
|  c|  z|
+---+---+
```

**Python**
```py
>>> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  x|  y|  z|
+---+---+---+

>>> df.transpose().show()
+---+---+
|key|  x|
+---+---+
|  b|  y|
|  c|  z|
+---+---+

>>> df.transpose(df.b).show()
+---+---+
|key|  y|
+---+---+
|  a|  x|
|  c|  z|
+---+---+
```

**Spark Plan**
```scala
scala> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  x|  y|  z|
+---+---+---+

scala> df.transpose().explain(true)
== Parsed Logical Plan ==
'UnresolvedTranspose a#48: string
+- LocalRelation [a#48, b#49, c#50]

== Analyzed Logical Plan ==
key: string, x: string
Transpose [key#83, x#84], [[b,y], [c,z]], true

== Optimized Logical Plan ==
LocalRelation [key#83, x#84]

== Physical Plan ==
LocalTableScan [key#83, x#84]
```

```python
# empty frame with no column headers
>>> empty_df.show()
++
||
++
++

>>> empty_df.transpose().explain(True)
== Parsed Logical Plan ==
'UnresolvedTranspose
+- LogicalRDD false

== Analyzed Logical Plan ==
Transpose false

== Optimized Logical Plan ==
LocalRelation <empty>

== Physical Plan ==
LocalTableScan <empty>

# empty frame with column headers
>>> empty_df.show()
+-------+-------+-------+
|column1|column2|column3|
+-------+-------+-------+
+-------+-------+-------+

>>> empty_df.transpose().explain(True)
== Parsed Logical Plan ==
'UnresolvedTranspose column1#0: string
+- LogicalRDD [column1#0, column2#1, column3#2], false

== Analyzed Logical Plan ==
key: string
Transpose [key#32], [[column2], [column3]], true

== Optimized Logical Plan ==
LocalRelation [key#32]

== Physical Plan ==
LocalTableScan [key#32]
```

### How was this patch tested?

**Spark Connect**
- Python
  - doctest
  - module: python.pyspark.sql.tests.connect.test_parity_dataframe
    - case: test_transpose
- Proto
  - suite: org.apache.spark.sql.PlanGenerationTestSuite
    - cases: transpose index_column, transpose no_index_column

**Spark Classic**
- Python
  - doctest
  - module: python.pyspark.sql.tests.test_dataframe
    - case: test_transpose
- Scala
  - suite: org.apache.spark.sql.DataFrameTransposeSuite
    - case: all

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47884 from xinrong-meng/transpose_dataframe_api.

Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
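For reference, the snippet below is a minimal PySpark sketch that reproduces the single-row example from the user-facing change section above. It assumes a Spark build that already includes this change (i.e. one that ships `DataFrame.transpose`); the session setup and sample data are illustrative and not part of the patch.

```python
# Minimal sketch, assuming a Spark build that includes SPARK-49383
# (DataFrame.transpose). Session setup and sample data are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transpose-sketch").getOrCreate()

# Single-row frame matching the example in the PR description.
df = spark.createDataFrame([("x", "y", "z")], ["a", "b", "c"])

# Without arguments, the first column ("a") serves as the index column;
# its values become the column headers of the transposed frame.
df.transpose().show()
# +---+---+
# |key|  x|
# +---+---+
# |  b|  y|
# |  c|  z|
# +---+---+

# An explicit index column can be passed instead.
df.transpose(df.b).show()
# +---+---+
# |key|  y|
# +---+---+
# |  a|  x|
# |  c|  z|
# +---+---+

spark.stop()
```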
1 parent 26e59f2 · commit 23bea28 · 30 changed files with 1,010 additions and 155 deletions.