Added new keyword argument tfidf_matrix_dtype (the datatype for the
tf-idf values of the matrix components). Allowed values are np.float32
and np.float64 (used by sparse_dot_topn v0.3.1). Default is np.float32,
which often leads to faster processing but less precision than
np.float64.
CHANGELOG.md
+3 -3
@@ -10,13 +10,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [0.4.1?] - 2021-06-11
 
 ### Added
-[No additions were made]
+
+* Added new keyword argument **`tfidf_matrix_dtype`** (the datatype for the tf-idf values of the matrix components). Allowed values are `np.float32` and `np.float64` (used by the required external package `sparse_dot_topn` version 0.3.1). Default is `np.float32`. (Note: `np.float32` often leads to faster processing and a smaller memory footprint albeit less numerical precision than `np.float64`.)
 
 ### Changed
 
 * Changed dependency on `sparse_dot_topn` from version 0.2.9 to 0.3.1
-* Changed the default value of the keyword argument `max_n_matches` from 20 to the number of strings in `duplicates` (or `master`, if
-`duplicates` is not given).
+* Changed the default value of the keyword argument `max_n_matches` from 20 to the number of strings in `duplicates` (or `master`, if `duplicates` is not given).
 * Changed warning issued when the condition \[`include_zeroes=True` and `min_similarity`≤ 0 and `max_n_matches` is not sufficiently high to capture all nonzero-similarity-matches\] is met to an exception.
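The trade-off between `np.float32` and `np.float64` mentioned in the changelog entry can be illustrated without NumPy. The following is only a rough sketch, assuming that the two dtypes behave like the 4-byte and 8-byte IEEE 754 floats exposed by Python's standard `array` module; the sample weights are made-up values, not output from the package.

```python
from array import array

# Hypothetical tf-idf-like weights (illustrative values only).
weights = [0.1, 0.25, 1.0 / 3.0, 0.7071067811865476]

single = array("f", weights)  # 4-byte floats, comparable to np.float32
double = array("d", weights)  # 8-byte floats, comparable to np.float64

# Storage needed for the same values in each precision.
bytes_single = single.itemsize * len(single)
bytes_double = double.itemsize * len(double)

# Largest rounding error introduced by storing a value in single precision.
max_error = max(abs(s - d) for s, d in zip(single, double))

print(bytes_single, bytes_double)
print(max_error)
```

Single precision halves the storage per value, at the cost of a small rounding error in each stored weight.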
README.md
+1
@@ -134,6 +134,7 @@ All functions are built using a class **`StringGrouper`**. This class can be use
 All keyword arguments not mentioned in the function definitions above are used to update the default settings. The following optional arguments can be used:
 
 * **`ngram_size`**: The amount of characters in each n-gram. Default is `3`.
+* **`tfidf_matrix_dtype`**: The datatype for the tf-idf values of the matrix components. Allowed values are `np.float32` and `np.float64`. Default is `np.float32`. (Note: `np.float32` often leads to faster processing and a smaller memory footprint albeit less numerical precision than `np.float64`.)
 * **`regex`**: The regex string used to clean-up the input string. Default is `"[,-./]|\s"`.
 * **`max_n_matches`**: The maximum number of matches allowed per string in `master`. Default is the number of strings in `duplicates` (or `master`, if `duplicates` is not given).
 * **`min_similarity`**: The minimum cosine similarity for two strings to be considered a match.
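The documented default for `max_n_matches` can be read as a simple fallback rule. The sketch below is an illustration of that rule only: `default_max_n_matches` is a hypothetical helper written for this example and is not part of the package's API.

```python
from typing import Optional, Sequence


def default_max_n_matches(
    master: Sequence[str],
    duplicates: Optional[Sequence[str]] = None,
) -> int:
    """Hypothetical helper mirroring the documented default.

    Returns the number of strings in `duplicates`, or in `master`
    when `duplicates` is not given.
    """
    if duplicates is None:
        return len(master)
    return len(duplicates)


# Only `master` given: the default equals the size of `master`.
only_master = default_max_n_matches(["foo", "bar", "baz"])

# `duplicates` given: the default equals the size of `duplicates`.
with_duplicates = default_max_n_matches(["foo", "bar", "baz"], ["fooo", "bat"])
```

Under this reading, the default no longer caps matches at a fixed 20 per string; it scales with the size of the series being matched against.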