Clarification on probs from fit_transform: Single Topic Confidence or Multi-Topic Distribution? #2456

SongLin99 · 2025-11-10T12:47:21Z

Hi all,

I'm using BERTopic for a bibliometric study and have a core question about interpreting the probs output.

I ran: topics, probs = topic_model.fit_transform(docs, embeddings)

In my paper, I calculate a topic's trend (e.g., for "Topic A") by summing, over all documents, each document's probability for Topic A. I justified this by treating probs as a "soft-clustering" distribution (like LDA's), which allows one document to contribute to multiple topics.

A reviewer challenged this, arguing that:

The probs value returned by fit_transform is not a distribution across all topics.

It is merely the membership confidence score for the single topic assigned in the parallel topics array.

Therefore, my method is just a "weighted count" (summing confidences only for documents assigned to "Topic A"), not a true soft assignment (summing all documents' contributions to "Topic A").
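To make the reviewer's distinction concrete, here is a minimal sketch on made-up numbers. The 4x3 matrix and the column order (Topic A, B, C) are hypothetical and not taken from any real model; the point is only that the two sums differ whenever unassigned documents carry some Topic A mass.

```python
# Hypothetical toy data: 4 documents, 3 topics (columns = Topic A, B, C).
# Each row is an illustrative per-document topic distribution.
full_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.10, 0.30, 0.60],
    [0.55, 0.25, 0.20],
]

TOPIC_A = 0  # column index of Topic A in this toy matrix

# Hard assignment: each document gets its highest-probability topic.
assigned = [row.index(max(row)) for row in full_probs]

# Confidence for the assigned topic only (the 1-D case the reviewer describes).
confidence = [max(row) for row in full_probs]

# "Weighted count": only documents assigned to Topic A contribute.
weighted_count = sum(c for t, c in zip(assigned, confidence) if t == TOPIC_A)

# Soft assignment: every document contributes its Topic A probability.
soft_sum = sum(row[TOPIC_A] for row in full_probs)

print("assigned topics:", assigned)
print("weighted count :", round(weighted_count, 2))
print("soft sum       :", round(soft_sum, 2))
```

In this toy case, documents 2 and 3 are assigned elsewhere but still hold 0.40 and 0.10 of Topic A, which the weighted count ignores and the soft sum includes.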

My Question: Is the reviewer correct? Is the probs value from fit_transform strictly the confidence for the single assigned topic?

If so, is there another way (perhaps calculate_probabilities=True?) to get a full probability matrix, allowing me to validly sum all documents' probability scores for a single, specific topic?
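For reference, a sketch of what I have in mind with calculate_probabilities=True. I am assuming, from my reading of the BERTopic docs, that this makes fit_transform return a 2-D array of shape (n_documents, n_topics), that column i corresponds to topic i, and that the outlier topic -1 has no column; please correct me if any of that is wrong. The helper names fit_with_full_probs and soft_topic_weight are my own, not part of BERTopic.

```python
def fit_with_full_probs(docs, embeddings=None):
    """Fit BERTopic while requesting the full document-topic matrix.

    Assumption: with calculate_probabilities=True, `probs` is 2-D
    (n_documents x n_topics) rather than one value per document.
    """
    # Imported lazily so the helper below works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic(calculate_probabilities=True)
    topics, probs = topic_model.fit_transform(docs, embeddings)
    return topic_model, topics, probs


def soft_topic_weight(probs, topic_col):
    """Sum one topic's column over all documents (soft assignment).

    Raises ValueError if `probs` holds one value per document,
    because a per-document confidence cannot be summed per topic.
    """
    total = 0.0
    for row in probs:
        try:
            total += float(row[topic_col])
        except (TypeError, IndexError) as exc:
            raise ValueError(
                "probs must be 2-D with a column for topic_col"
            ) from exc
    return total


# Usage on a hypothetical 2-document, 2-topic matrix:
example = [[0.7, 0.2], [0.4, 0.5]]
print(soft_topic_weight(example, 0))
```

Note that the rows in this sketch do not have to sum to 1. Whether the rows of the real matrix are normalized is something I am not sure about, and it matters for how the sums are interpreted.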

Thanks for your help!

Replies: 0 comments
