

ck37 commented Sep 6, 2025

What does this PR do?

A minor fix to fit_transform() so that topic_sizes_ is updated in zero-shot topic modeling when the nr_topics parameter is used. The PR also adds tests that confirm the fix and a short changelog entry.

Happy to make further revisions if needed.

Fixes #2384
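The fix described above concerns keeping topic_sizes_ in sync with the topic assignments after topics are reduced to honour nr_topics. The sketch below illustrates that invariant in plain Python. It is not BERTopic's implementation: `reduce_topics` and `compute_topic_sizes` are hypothetical helpers, and the merge mapping is hard-coded instead of being derived from topic similarity.

```python
from collections import Counter


def reduce_topics(topics, merge_map):
    """Hypothetical reduction step: relabel each document via merge_map.

    Labels missing from merge_map (including the -1 outlier label) are kept.
    """
    return [merge_map.get(t, t) for t in topics]


def compute_topic_sizes(topics):
    """Count how many documents are assigned to each topic label."""
    return dict(Counter(topics))


# Toy assignments: topics 0, 1, 2 plus one outlier document (-1).
topics = [0, 0, 1, 2, 2, 2, -1]
topic_sizes = compute_topic_sizes(topics)

# Merge topic 2 into topic 1, as reducing to fewer topics might do.
topics = reduce_topics(topics, {2: 1})

# Sizes computed before the merge are now stale: they still list topic 2.
stale_sizes = topic_sizes

# Recomputing from the reduced assignments keeps sizes and labels in sync.
topic_sizes = compute_topic_sizes(topics)
print(topic_sizes)  # {0: 2, 1: 4, -1: 1}
```

If the sizes are not recomputed after the reduction step, anything built from them (such as a topic overview table) no longer reflects the final topics.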

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

ck37 force-pushed the fix-topic-sizes-zeroshot-bug branch from 97e4d13 to fec0041 (September 6, 2025 17:55)
ck37 force-pushed the fix-topic-sizes-zeroshot-bug branch from fec0041 to 07b4428 (September 6, 2025 18:02)
MaartenGr (Owner)

Thank you for the PR! Regarding the tests, could you explain a bit more about why they were added this way? Specifically, I'm seeing UMAP parameters added in some tests but not in others. Moreover, there are several calls to BERTopic, which can be quite expensive in a testing pipeline, especially when you could run it once instead. There are conftest fixtures available where you can initialize a given model once and reuse it across tests:

https://github.com/MaartenGr/BERTopic/blob/master/tests/conftest.py



Issue this PR may close: Zero-shot topic modeling with nr_topics parameter results in empty topic_sizes_ and get_topic_info()
