Releases: mars-project/mars

v0.5.0a1

23 May 06:47
ad6eac2
Pre-release

These are the release notes for v0.5.0a1. See here for the complete list of resolved issues and merged PRs.

Changes that break compatibility

  • Calling .execute() no longer returns a numpy ndarray, pandas DataFrame, and so forth; it returns the Mars tensor or DataFrame itself. Only corner data is fetched for display purposes. To convert explicitly, call .to_numpy() to get a numpy ndarray or .to_pandas() to get a pandas DataFrame. For more details, refer to #1201.
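
The new .execute() behavior can be pictured with a toy model in plain Python. This is only a sketch of the pattern described above, not Mars code: the LazyRange class and its to_list() method are hypothetical stand-ins for a Mars tileable and for .to_numpy() / .to_pandas().

```python
class LazyRange:
    """Toy stand-in for a Mars tileable (hypothetical, not part of Mars)."""

    def __init__(self, n):
        self.n = n
        self._data = None  # nothing has been computed yet

    def execute(self):
        # Run the computation, keep the result internally, and return
        # the object itself instead of the materialized data.
        self._data = list(range(self.n))
        return self

    def to_list(self):
        # Explicit conversion; plays the role of .to_numpy() / .to_pandas().
        if self._data is None:
            self.execute()
        return list(self._data)

    def __repr__(self):
        if self._data is None:
            return f"LazyRange(n={self.n}, not executed)"
        if self.n <= 6:
            return repr(self._data)
        # Only the "corners" (head and tail) are used for display.
        head = ", ".join(map(str, self._data[:3]))
        tail = ", ".join(map(str, self._data[-3:]))
        return f"[{head}, ..., {tail}]"


lazy = LazyRange(1000)
result = lazy.execute()

print(result is lazy)         # True: execute() hands back the same object
print(repr(result))           # [0, 1, 2, ..., 997, 998, 999]
print(len(result.to_list()))  # 1000: full data only on explicit conversion
```

In this model, displaying the result touches only a handful of elements, while the full data is produced only when the explicit conversion is called.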

Highlights

  • The Remote API is introduced with preliminary support in #1238; for more details, refer to proposal #1227.
  • Running on Yarn is preliminarily supported in #1210.

New Features

  • Tensor
    • Implements mt.trapz (#1205)
  • DataFrame
    • Add support of {DataFrame,Series}.ewm (#1164)
    • Add dataframe.unique support (#1208)
    • Implement md.to_datetime and support __setitem__ for DataFrame (#1207)
    • Add support for Series.astype and DataFrame.astype (#1224)
  • Learn
    • Support Mars Series in PyTorch Dataset (#1190)
    • Implements mars.learn.metrics.{roc_curve, auc} (#1220)
  • Others
    • Add preliminary support for Yarn (#1210)
    • Add preliminary remote function support (#1238)

Enhancements

  • Make Tileable.execute() return tileable itself, fetching corner data only for correct repr (#1201)
  • Allow some operands to fail fast (#1229)
  • Rename LocalClusterSession to ClusterSession (#1230)

Bug fixes

  • Fix serialization for mars.learn.utils.shuffle (#1192)
  • Fix wrong result of column pruning (#1215)
  • Fix error in starting local cluster with IPython (#1232)

Documentation

  • Add learn docs (#1182)
  • Add translation for learn docs (#1183)
  • Add documentation for DataFrame arithmetic operands (#1191)
  • Add logo in readme and docs (#1213)

Tests

  • Workaround for upgraded tiledb (#1195)

v0.4.0

23 May 08:09
5fecd78

These are the release notes for v0.4.0. See here for the complete list of resolved issues and merged PRs.

These release notes cover only the differences from v0.4.0rc1; for all highlights and changes, refer to the release notes of the pre-releases.

Changes that break compatibility

  • Calling .execute() no longer returns a numpy ndarray, pandas DataFrame, and so forth; it returns the Mars tensor or DataFrame itself. Only corner data is fetched for display purposes. To convert explicitly, call .to_numpy() to get a numpy ndarray or .to_pandas() to get a pandas DataFrame. For more details, refer to #1201.

Highlights

  • The Remote API is introduced with preliminary support in #1239; for more details, refer to proposal #1227.

New Features

  • Tensor
    • Implements mt.trapz (#1223)
  • DataFrame
    • Add support of {DataFrame,Series}.ewm (#1198)
    • Add dataframe.unique support (#1225)
    • Implement md.to_datetime and support __setitem__ for DataFrame (#1226)
    • Add support for Series.astype and DataFrame.astype (#1237)
  • Learn
    • Support Mars Series in PyTorch Dataset (#1194)
    • Implements mars.learn.metrics.{roc_curve, auc} (#1233)
  • Others
    • Add preliminary remote function support (#1239)

Enhancements

  • Tileable.execute() now returns the Tileable itself, and repr displays correctly (#1202)
  • Rename LocalClusterSession to ClusterSession (#1236)

Bug fixes

  • Fix serialization for mars.learn.utils.shuffle (#1193)
  • Fix error in starting local cluster with IPython & latest gevent version (#1234)
  • Fix wrong result of column pruning (#1235)

v0.4.0rc1

25 Apr 14:13
f6ec9c5
Pre-release

These are the release notes for v0.4.0rc1. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Add support for isna, notna and __dir__ (#1125)
    • Add support for md.dropna (#1129)
    • Support groupby.__getitem__ and group by level (#1136)
    • Implement DataFrame nunique (#1137)
    • Implements md.cut (#1139)
    • Add plot and related functions for DataFrame and Series (#1143)
    • Implements {DataFrame, Series}.{shift, tshift} (#1157)
    • Add support of md.expanding (#1160)
    • Implements {DataFrame,Series}.diff (#1174)
    • Support modulo operand for DataFrame (#1176)
    • Add Series.value_counts() support (#1181)
  • Tensor
    • Add support for mt.union1d (#1147)
    • Support Tensor.__setitem__ with bool indexing (#1159)
  • Learn
    • Add support for NearestNeighbors.kneighbors_graph (#1152)
    • Add support for mars.learn.metrics.accuracy_score (#1150)
    • Implements mars.learn.metrics.pairwise.rbf_kernel (#1158)
    • Implements mars.learn.semi_supervised.LabelPropagation (#1163)

Enhancements

  • Refactor GroupBy objects (#1127)

Bug fixes

  • Support md.merge when the column passed as on is in df.index (#1132)
  • Fix tokenizing partial function (#1149)
  • Allow retrieving shape of a groupby object (#1155)

Documentation

  • Add DataFrame docs (#1130)
  • Fix requirements for doc (#1135)
  • Fix rendering numpy-style documentations (#1179)
  • Fix some mistakes in the docs (#1161, thanks @ueshin!)

Tests

  • Check that tileable.nsplits and chunk.shape are consistent (#1108)
  • Add meta checks for groupby (#1144)
  • Allow using pyarrow==0.17.0 (#1172)

v0.3.4

24 Apr 17:43
aa5c436

These are the release notes for v0.3.4. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Add support for isna, notna and __dir__ (#1126)
    • Add support for {DataFrame,Series}.agg (#1128)
    • Add support for md.dropna (#1131)
    • Implements {DataFrame, Series}.{shift, tshift} (#1168)
    • Add plot and related functions for DataFrame and Series (#1166)
    • Implement DataFrame nunique (#1170)
    • Implements {DataFrame,Series}.diff (#1177)
    • Support modulo operand for DataFrame (#1180)
  • Tensor
    • Add support for mt.union1d (#1167)
    • Support Tensor.__setitem__ with bool indexing (#1169)

Bug fixes

  • Support md.merge when the column passed as on is in df.index (#1165)

Tests

  • Check that tileable.nsplits and chunk.shape are consistent (#1133)
  • Allow pyarrow to use 0.17.0 (#1173)

v0.3.3

31 Mar 02:59
66b6274

New Features

  • Implements at and iat for DataFrame (#1105)
  • Implement Series.isin for Series type (#1106)

Enhancements

  • Optimize executor performance when the number of running ops is less than the degree of parallelism (#1099)

Bug fixes

  • Fix validate_axis when input tileable has unknown shape (#1092)
  • Support creating a DataFrame from a dict that contains scalars (#1104)
  • Support slice that can be integer or other types on non-int64 index (#1109)

Tests

  • Check metadata consistency for output chunks and tileables (#1094)

v0.4.0b2

30 Mar 10:50
25971a6
Pre-release

These are the release notes for v0.4.0b2. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Support calling df.agg() with lists or dicts for transform (#1093)
    • Implements at and iat for DataFrame (#1101)
    • Implement Series.isin for Series type (#1058)

Enhancements

  • Optimize executor performance when the number of running ops is less than the degree of parallelism (#1096)

Bug fixes

  • Fix validate_axis when input tileable has unknown shape (#1091)
  • Support creating a DataFrame from a dict that contains scalars (#1098)
  • Support slice that can be integer or other types on non-int64 index (#1103)

Tests

  • Check metadata consistency for output chunks and tileables (#1071)

v0.3.2

21 Mar 09:10
ad5292f

These are the release notes for v0.3.2. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Implement md.{cummax, cummin, cumprod, cumsum} (#1022)
    • Add support for md.fillna (#1031)
    • Add DataFrame.loc support (#1060)
    • Add DataFrame.rolling support (#1061)
    • Add support for GroupBy.{cumcount, cummin, cummax, cumprod, cumsum} (#1072)
    • Support string and datetime methods via Series.str and Series.dt accessor (#1074)
    • Implement dataframe append (#1075)
    • Implement DataFrame.concat and Series.concat (#1078)
    • Add support for DataFrame.sort_values (#1081)
    • Support sort_index for DataFrame and Series (#1082)
    • Add md.date_range support (#1086)
    • Support logical operators on DataFrame and Series (#1088)
    • Implement head/tail based on iloc and fix a bug in getitem (#1089)

Enhancements

  • Use mapjoin to optimize df.merge (#1023)
  • Refactor tiling of DataFrame.iloc with index_lib (#1043)
  • Add sort_range_index parameter in read_csv (#1067)

Bug fixes

  • Standardize RangeIndex for unknown shape DataFrame (#1066)
  • Fix failed cases in distributed mode (#1079)
  • Fix wrong dtypes in df.rechunk (#1083)
  • Fix consistency between tensor metadata and real outputs (#1087)

Tests

  • Fix tests under Python 3.6 as VS2015 is preinstalled (#1015)

v0.4.0b1

20 Mar 19:13
0a02b23
Pre-release

These are the release notes for v0.4.0b1. See here for the complete list of resolved issues and merged PRs.

New Features

  • DataFrame
    • Implement md.{cummax, cummin, cumprod, cumsum} (#1019)
    • Implement dataframe append (#1026)
    • Add support for md.fillna (#1029)
    • Implement DataFrame.concat and Series.concat (#1040)
    • Support groupby.agg with list of functions (#1030)
    • Implement md.{DataFrame,Series,GroupBy}.apply (#1038)
    • Add support for DataFrame.sort_values (#1046)
    • Add DataFrame.loc support (#1042)
    • Add DataFrame.rolling support (#1045)
    • Add support for {DataFrame,Series}.agg (#1054)
    • Support string and datetime methods via Series.str and Series.dt accessor (#1063)
    • Add support for GroupBy.{cumcount, cummin, cummax, cumprod, cumsum} (#1069)
    • Support sort_index for DataFrame and Series (#1053)
    • Add md.date_range support (#1073)
    • Support logical operators on DataFrame and Series (#1056)
    • Implement head/tail based on iloc and fix a bug in getitem (#1057)
  • Others
    • Add support for function serialization (#1048)

Enhancements

  • Use mapjoin to optimize df.merge (#1021)
  • Add sort_range_index parameter in read_csv (#1024)
  • Refactor tiling of DataFrame.iloc with index_lib (#1016)

Bug fixes

  • Fix KNN so that it can accept input with unknown shape (#1033)
  • Support serializing pd.Timestamp and pd.Timedelta (#1065)
  • Fix failed cases in distributed mode (#1062)
  • Fix wrong dtypes in df.rechunk (#1080)
  • Fix failed fit method selection for KNN when input has unknown shape (#1050)
  • Fix consistency between tensor metadata and real outputs (#1085)

Tests

  • Fix tests under Python 3.6 as VS2015 is preinstalled (#1014)

v0.4.0a2

22 Feb 04:55
900cbea
Pre-release

These are the release notes for v0.4.0a2. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Add ability to read and write HDF5 file for tensor (#962)
    • Implements mt.{topk, argsort, argpartition, argtopk} (#946)
    • Support reading and writing in zarr format (#963)
    • Implement imread to read from images (#988)
  • DataFrame
    • Support ufunc for Mars DataFrame (#957)
    • Implements DataFrame.to_csv (#966)
    • Implement dataframe var and std (#977)
    • Implements Series.map (#979)
    • Implements DataFrame dot, mul and pow (#980)
    • Implements describe for DataFrame (#981)
    • Implements md.read_sql_table (#986)
  • Learn
    • Implement PyTorch sampler to improve dataset performance (#970)
    • Support mars.learn.neighbors.NearestNeighbors (#961)
    • Leverage faiss to accelerate k-nearest neighbors calculation (#984)
    • Implement PyTorch sampler for local training (#1010)

Enhancements

  • Refactor tensor indexing (#1011)

Bug fixes

  • Fix tiling in nonzero so that the tensor, rather than the tensor data, is used during the process (#954)
  • Fix cdist(x, y) creating a tensor with wrong nsplits (#960)
  • Fix the wrong RangeIndex in read_csv (#930)
  • Stop detecting GPU when no cuda devices are configured (#973)
  • Fix wrong behavior of mt.random.choice (#976)
  • Make sure all kwargs are numpy types when inferring dtypes (#987)
  • Fix error when chunk_size not provided for md.read_sql_table (#990)
  • Fix wrong result of count_nonzero (#1002)
  • Add dtype property for TensorImread (#1004)
  • Fix error when no device detected by CUDA driver (#1007)

Tests

  • Fix failed unittests due to release of pandas 1.0 (#964)
  • Hotfix opcodes that conflict (#968)

v0.3.1

22 Feb 14:18
30f33a2

These are the release notes for v0.3.1. See here for the complete list of resolved issues and merged PRs.

New Features

  • Tensor
    • Implements mt.{topk, argsort, argpartition, argtopk} (#991)
    • Implement imread to read from images (#997)
  • DataFrame
    • Support ufunc for Mars DataFrame (#967)
    • Implements DataFrame.to_csv (#992)
    • Implements DataFrame dot, mul and pow (#994)
    • Implement dataframe var and std (#996)
    • Implements describe for DataFrame (#998)

Enhancements

  • Refactor tensor indexing (#1012)

Bug fixes

  • Stop detecting GPU when no cuda devices are configured (#975)
  • Fix wrong behavior of choice (#993)
  • Make sure all kwargs are numpy types when inferring dtypes (#995)
  • Fix wrong result of count_nonzero (#1003)
  • Add dtype property for TensorImread (#1005)
  • Fix error when no device detected by CUDA driver (#1008)

Tests

  • Fix failures in Windows tests (#939)
  • Fix failed unittests due to release of pandas 1.0 (#965)