@Lightning11wins
Contributor

@Lightning11wins Lightning11wins commented Nov 25, 2025

Add more tests for the newmalloc, util, xarray, xstring, and clusters libraries.
Add test_utils.h to make tests cleaner to read and write.
Fix a couple of minor bugs in libraries.
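
For readers unfamiliar with this kind of header, here is a minimal sketch of what an expectation macro in a test_utils.h-style file might look like. The thread names EXPECT_NOT_NULL() later on, but its real definition is not shown here, so `SKETCH_EXPECT_NOT_NULL` and `sketch_failures` below are hypothetical names and the reporting format is an assumption.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical failure counter; the real test_utils.h may track results differently. */
static int sketch_failures = 0;

/* Sketch of an EXPECT_NOT_NULL-style check (assumed behavior):
 * report the failing expression and its location, count the failure,
 * and keep running so later checks in the same test still execute. */
#define SKETCH_EXPECT_NOT_NULL(ptr)                                    \
    do {                                                               \
        if ((ptr) == NULL) {                                           \
            fprintf(stderr, "%s:%d: expected %s to be non-NULL\n",     \
                    __FILE__, __LINE__, #ptr);                         \
            sketch_failures++;                                         \
        }                                                              \
    } while (0)
```

A test could call `SKETCH_EXPECT_NOT_NULL(buffer);` after an allocation and return `sketch_failures` as its exit status. Wrapping the body in `do { ... } while (0)` keeps the macro safe to use inside an unbraced `if`/`else`.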

Israel and others added 28 commits October 13, 2025 09:53
Improve edge case logic in comparison functions.
Remove unregister driver function.
Clean up exp_functions.c.
Simplify dataqa_duplicates component in preparation for making it the boundary into our new duplicate system.
Add exp functions: sparse_eql(), ln(), and logn().
Fix bugs in comparison functions.
Make minor tweaks to objdrv_cluster.c.
Modify cluster files to use string keys.
Build vectors fully sparsely.
Add ca_fprint_vector().
Add snprint_llu().
Add exp_fn_trim().
Update exp_fn_cmp().
Organize exp function definitions by group.
Add statistics tracking to cluster driver.
Reduce minimum hint threshold.
Add array handling to ci_xaToTrimmedArray().
Update timer to handle multiple starts and stops properly.
Re-add Levenshtein to exp_functions.
Publish edit_dist() in the cluster library.
Fix mistakes in cluster driver function signatures.
Fix spelling mistakes.
Add detail to an error message in the lexer.
Remove unused .cluster files.
Clean up cluster-schema.cluster.
Clean up other unused junk.
Add known issues to string similarity documentation.
Clean up and organize todos.
Clean up testing code in several files.
…ast commit).

Update tests to pass with this modification.
… caches).

Fix a formatting issue with the stat method.
Fix a missing include in the util.c library.
…le hundred bytes.

Add check_double() to handle functions that return NAN on failure.
Clean up.
…rary.

Round similarity results to avoid floating point errors.
Enable caching for memory allocated in get_cluster_size().
Rename edit_dist() to ca_edit_dist() to match format for public functions.
Rename print_diagnostics() to print_err().
Fix a possible uninitialized read.
Fix memset() not initializing data.
Improve documentation.
Add test_utils.h to make tests cleaner to read and write.
Make UTIL_USE_METRIC in util public.
Add tests for util.
Add tests for xarray.
Add tests for xstring.
Fix undefined behavior in xstring.
Fix a typo in test_util_00.c.
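
Two of the commits above mention check_double() for functions that return NAN on failure and rounding similarity results to avoid floating point errors. The actual implementations are not shown in this thread, so the following is only a sketch under assumptions: `sketch_check_double` and `sketch_round_places` are hypothetical names, and the number of decimal places is an illustrative choice.

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true if the value is usable, or prints a
 * message and returns false if the callee signalled failure with NAN. */
static bool sketch_check_double(double value, const char *what)
{
    if (isnan(value)) {
        fprintf(stderr, "%s failed (returned NAN)\n", what);
        return false;
    }
    return true;
}

/* Hypothetical helper: round to a fixed number of decimal places so that
 * results differing only by floating point noise compare equal.
 * Rounds half away from zero; assumes the scaled value fits in a long long. */
static double sketch_round_places(double value, int places)
{
    double scale = 1.0;
    for (int i = 0; i < places; i++) {
        scale *= 10.0;
    }
    double shifted = value * scale;
    long long nearest = (long long)(shifted < 0.0 ? shifted - 0.5 : shifted + 0.5);
    return (double)nearest / scale;
}
```

With this sketch, `0.1 + 0.2` and `0.3` round to the same value at six decimal places even though they differ as raw doubles.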
@Lightning11wins Lightning11wins self-assigned this Nov 25, 2025
@Lightning11wins
Contributor Author

This PR partially addresses #28.

@Lightning11wins
Contributor Author

Lightning11wins commented Nov 25, 2025

I just realized I missed clusters.h. I'll get to that soon.

Got it, everything has tests except for the k-means algorithm. That's definitely the hardest to test.

Fix edge case: Tests taking longer than 1 second (centrallix-lib test driver).
Expand test_utils.h.
Handle missing edge cases in clusters.c.
Fix memstr_00 test.
@Lightning11wins
Contributor Author

Lightning11wins commented Nov 26, 2025

That should handle most of the functionality in clusters.h, but I'm still missing the clustering and searching functions themselves. Hopefully I'll get to those soon.

Everything is tested except for the k-means algorithm.

Fix missing edge cases in clusters.h.
@Lightning11wins
Contributor Author

I need to do another code review on this branch before Greg reviews it.

@Lightning11wins Lightning11wins marked this pull request as draft December 11, 2025 17:39
Add tests for nmMalloc().
Add EXPECT_NOT_NULL() to test_utils.h.
Modify test runner to use 5s timeout normally, but 90s timeout in Valgrind.
Improve documentation, style, and formatting.
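
The 5s versus 90s timeout choice in the commits above could be sketched roughly as follows. How the real test runner detects Valgrind is not stated in this thread, so the flag string passed in here is an assumption, and `sketch_pick_timeout` is a hypothetical name.

```c
#include <string.h>

#define SKETCH_TIMEOUT_NORMAL_S   5u   /* normal runs */
#define SKETCH_TIMEOUT_VALGRIND_S 90u  /* Valgrind slows tests considerably */

/* Pick a per-test timeout in seconds. valgrind_flag is assumed to be "1"
 * when the run is under Valgrind and NULL or anything else otherwise. */
static unsigned int sketch_pick_timeout(const char *valgrind_flag)
{
    int under_valgrind = (valgrind_flag != NULL && strcmp(valgrind_flag, "1") == 0);
    return under_valgrind ? SKETCH_TIMEOUT_VALGRIND_S : SKETCH_TIMEOUT_NORMAL_S;
}
```

A runner might feed this from an environment variable of its own choosing, for example `alarm(sketch_pick_timeout(getenv("SKETCH_UNDER_VALGRIND")));`, where the variable name is purely illustrative.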
@Lightning11wins
Contributor Author

Styling issues are now corrected; this branch should be ready for review.

@Lightning11wins Lightning11wins marked this pull request as ready for review December 15, 2025 19:42
2 participants