@colinmegill
Member

This PR adds endpoints and a user interface for topic prioritization, which allows collective agenda setting and collective attention management in polis. The hierarchical topics are computed from comments in the delphi pipeline, and user selections are persisted to the DB (pending auth integration). The selections will then be fed into comment routing.

@colinmegill
Member Author

Foundations of group informed consensus over mathematically rigorous emergent topics for collective attention management, in lieu of paper for the moment :)

@colinmegill
Member Author

Taking some notes about the intention of this powerful direction... Giving participants highly expressive, highly detailed affordances for setting their own priorities, and the conversation's, is the only path to getting into the millions and tens of millions of comments per conversation, given the limited number of votes. It's possible, even likely, that polis will see more votes with increased relevancy, but the vote count will still be, over time, proportional to submitted comments.

@colinmegill
Member Author

Group informed consensus over ranked topics makes it possible to densify parts of the polis vote matrix, which in turn makes it possible to prioritize certain reports.

Note that this is not importance, which is meant to operate along a different, complementary axis. For those discussing "the arts" and "music", those reports may have 1000s of comments and will still benefit from importance and ranking for substance, which is far more fine-grained than setting the agenda for a city of millions. This coarse graining bridges that gap in the math of scaling.

@colinmegill
Member Author

The question around comment routing, and bias, is of course foundational to the technology. We'll always want random comments, new comments, and globally divisive comments to get in front of users. But still, to scale to millions, there must be tradeoffs. Perhaps after tuning we'll find that 30% topics personalized to the user, 30% globally important, and 30% random with weights (existing algo) is the right ratio. That ratio would be a good thing to test in a paper using data gathered from this feature.
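A minimal sketch of what such a routing mix could look like, assuming each next comment is drawn from one of three pools chosen at random. The pool names, `ROUTING_MIX`, and `pick_source` are hypothetical and not part of this PR. The shares are the untuned guesses from the comment above; as written they sum to 0.9, so the sketch treats them as relative weights.

```python
import random

# Hypothetical shares from the discussion above. They sum to 0.9, so they are
# treated as relative weights rather than exact probabilities.
ROUTING_MIX = {
    "personalized_topics": 0.30,
    "globally_important": 0.30,
    "weighted_random": 0.30,  # stands in for the existing weighted-random algo
}


def normalized(mix=ROUTING_MIX):
    """Rescale the weights so that they sum to 1."""
    total = sum(mix.values())
    return {source: weight / total for source, weight in mix.items()}


def pick_source(rng, mix=ROUTING_MIX):
    """Choose which pool the next comment should be drawn from."""
    sources = list(mix)
    weights = [mix[source] for source in sources]
    # random.choices accepts unnormalised relative weights.
    return rng.choices(sources, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([pick_source(rng) for _ in range(5)])
```

Keeping the mix as plain data would make it easy to vary the ratio per conversation when gathering data for the kind of test described above.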

@colinmegill
Member Author

The topic names will change across runs, as will the umap. So the only option is to project down into the comment space, and we'll store three archetypal comments at the centroid. As more comments are added there will be drift, so users can prioritize and re-prioritize as the conversation grows and they see new comments. These "anchors" in the umap space will help comment routing stay relevant and are durable for the lifetime of a years-long conversation, especially since, as they are reprojected into the topic space later on, the user can remove that topic and drill back down into others.
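A rough sketch of the anchoring idea, assuming each comment has 2D umap coordinates and that "archetypal" means nearest to the topic centroid. The function names and data shapes are hypothetical; the PR's actual storage and selection logic may differ.

```python
import math
from collections import Counter


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def archetypal_comments(embeddings, k=3):
    """Return the ids of the k comments closest to the topic centroid.

    embeddings: dict mapping comment id -> coordinate tuple for one topic.
    """
    center = centroid(list(embeddings.values()))
    ranked = sorted(embeddings, key=lambda cid: math.dist(embeddings[cid], center))
    return ranked[:k]


def current_topic(anchor_ids, assignment):
    """Map stored anchors onto whichever topic holds most of them in a new run.

    assignment: dict mapping comment id -> topic label in the latest run.
    Returns None if none of the anchors appear in the assignment.
    """
    votes = Counter(assignment[cid] for cid in anchor_ids if cid in assignment)
    return votes.most_common(1)[0][0] if votes else None
```

Because only comment ids are stored, a user's priority survives renamed topics and a recomputed umap: the anchors are simply looked up again in the latest run.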

@colinmegill
Member Author

Since what's being stored is comments, the right intuition is group informed consensus over zones of the vector embedding space, semantically, as "priority". This is in contrast to, say, managing attention for advertising.

Collaborator

@tevko tevko left a comment

Looks great overall, just a few default URLs to remove and table scans to refactor. Are we at all concerned about pagination in DynamoDB for these new queries?
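For reference on the pagination question, a minimal sketch of draining a paginated DynamoDB query. A single query response is size-limited, and when more results remain it carries a `LastEvaluatedKey` that has to be passed back as `ExclusiveStartKey`. The helper name `query_all_pages` is hypothetical; `query_fn` is assumed to behave like a boto3 table's `query` method.

```python
def query_all_pages(query_fn, **query_kwargs):
    """Collect every item from a paginated DynamoDB-style query.

    query_fn: callable accepting keyword arguments and returning a dict with
    an "Items" list and, when more pages remain, a "LastEvaluatedKey".
    """
    items = []
    start_key = None
    while True:
        kwargs = dict(query_kwargs)
        if start_key is not None:
            # Resume where the previous page stopped.
            kwargs["ExclusiveStartKey"] = start_key
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break
    return items
```

With boto3 this would be called along the lines of `query_all_pages(table.query, KeyConditionExpression=...)`. For very large conversations, yielding page by page instead of building one list would keep memory bounded.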

colinmegill and others added 23 commits July 4, 2025 09:59
- Remove priority calculation from math pipeline (conversation.py)
  * Delete _importance_metric and _priority_metric methods
  * Remove priority computation from recompute() method
  * Math pipeline now focuses on PCA, clustering, and representativeness

- Add dedicated priority calculation script (502_calculate_priorities.py)
  * Implements PriorityCalculator class with group-based extremity
  * Matches Clojure priority formula: (importance * scaling_factor)^2
  * Retrieves extremity values from Delphi_CommentExtremity table
  * Updates priorities in Delphi_CommentRouting table

- Update pipeline execution order (run_delphi.py, run_delphi.sh)
  * Math pipeline → UMAP pipeline → Extremity calculation → Priority calculation
  * Ensures priorities use group-based extremity instead of PCA-based
  * Maintains separation of concerns between mathematical and priority calculations

This fixes the priority calculation bug where all priorities were 0 due to
missing extremity values, and implements proper group-based extremity usage
as requested for the Pakistan conversation analysis.
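A sketch of the formula shape named above, `(importance * scaling_factor)^2`. Only that outer form comes from this commit message. The bodies of `importance_metric` and `scaling_factor` below, including the smoothing constants and the decay rate, are assumptions for illustration and may not match `502_calculate_priorities.py` or the Clojure code.

```python
def importance_metric(agrees, passes, seen, extremity):
    """Assumed form: smoothed agree rate, discounted by pass rate, boosted by extremity."""
    pass_rate = (passes + 1) / (seen + 2)    # add-one smoothing (assumption)
    agree_rate = (agrees + 1) / (seen + 2)
    return (1 - pass_rate) * (extremity + 1) * agree_rate


def scaling_factor(seen):
    """Assumed form: a boost for rarely seen comments that decays with exposure."""
    return 1 + 8 * 2 ** (-seen / 5)


def priority_metric(agrees, passes, seen, extremity):
    """Outer form from the commit message: (importance * scaling_factor)^2."""
    return (importance_metric(agrees, passes, seen, extremity) * scaling_factor(seen)) ** 2
```

Under these assumed definitions, an extremity of 0 still leaves the `(extremity + 1)` term at 1, so a comment with no votes gets a non-zero priority.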

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Extract priority formulas into polismath/conversation/priority.py
  * Create PriorityCalculator class with static methods for core formulas
  * Port importance_metric and priority_metric from Clojure implementation
  * Add convenience methods: calculate_comment_priority, validate_inputs, explain_priority
  * Pure mathematical logic with no I/O dependencies for better testability

- Refactor umap_narrative/502_calculate_priorities.py to use extracted formulas
  * Rename class from PriorityCalculator to PriorityService (clearer distinction)
  * Import and use PriorityCalculator.calculate_comment_priority()
  * Remove duplicate formula implementations (38 lines removed)
  * Focus service on DynamoDB operations and data orchestration

Benefits:
- Better separation of concerns: formulas vs data processing
- Improved testability: mathematical logic can be unit tested independently
- Enhanced reusability: priority formulas can be used in other contexts
- Cleaner maintainability: formula changes only need to happen in one place

Tested successfully with conversation 36324:
- 807 comments processed
- Priority statistics: min=0, max=3, avg=2.36
- All priorities calculated using group-based extremity values

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@tevko tevko force-pushed the colinmegill/topicmod branch from 7157689 to e993f44 Compare July 4, 2025 15:22
@colinmegill
Member Author

A reasonable "advanced mode" default for topic prioritization is an odd number of "points" to distribute, where the number is proportional to the size of the topic hierarchy, and where it costs more to vote more, in the spirit of QV but with some adaptation to it being hierarchical and across rounds. Spiritually, that direction. It's a lot more clicking though, so it's likely an advanced opt-in mode when implemented, and it is not implemented yet.
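A minimal sketch of that QV-style direction, assuming the standard quadratic cost (n votes on one topic cost n² points) and a hypothetical odd budget of `2 * num_topics + 1`. The hierarchical and multi-round adaptations are not modelled here, and none of these names exist in the PR.

```python
def vote_cost(votes):
    """Quadratic cost: casting n votes on a single topic costs n**2 points."""
    return votes * votes


def point_budget(num_topics):
    """Hypothetical budget: proportional to the topic count and always odd."""
    return 2 * num_topics + 1


def allocation_cost(allocation):
    """Total points spent by an allocation mapping topic -> votes."""
    return sum(vote_cost(v) for v in allocation.values())


def is_affordable(allocation, num_topics):
    """True if the allocation fits within the budget for this many topics."""
    return allocation_cost(allocation) <= point_budget(num_topics)


if __name__ == "__main__":
    # Budget for 4 topics is 9; this allocation costs 4 + 4 + 1 = 9 points.
    print(is_affordable({"arts": 2, "transit": 2, "parks": 1}, num_topics=4))
```

Under this budget, putting 3 votes on a single topic already uses all 9 points for a four-topic level, which is the "costs more to vote more" pressure described above.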

4 participants