Topic prioritization for emergent collective agenda setting across scalably many comments #2083
base: edge
Foundations of group-informed consensus over mathematically rigorous emergent topics for collective attention management, in lieu of a paper for the moment :)
Taking some notes about the intention of this powerful direction: giving participants highly expressive, highly detailed affordances for setting their own priorities and the conversation's priorities is the only path to millions or tens of millions of comments per conversation, given the limited number of votes. With increased relevancy polis will likely see more votes, but over time the vote count will still be proportional to the number of submitted comments.
Allowing group-informed consensus over ranked topics densifies parts of the polis vote matrix, which in turn allows certain reports to be prioritized. Note that this is not importance, which is meant to operate along a different, complementary axis. For those discussing "the arts" and "music", those reports may have thousands of comments and will still benefit from importance and ranking for substance, at a far finer grain than setting the agenda for a city of millions. This coarse graining bridges that gap in the math of scaling.
The question of comment routing, and bias, is of course foundational to the technology. We'll always want random comments, new comments, and globally divisive comments to get in front of users. But to scale to millions, there must be tradeoffs. Perhaps after tuning we'll find that 30% topics personalized to the user, 30% globally important, and 30% random with weights (the existing algo) is the right ratio. This would be a good test for a paper using data gathered from this feature.
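As a rough illustration of the routing mix floated above, here is a minimal Python sketch that draws the next comment from one of three pools. The pool names, the `ROUTING_WEIGHTS` table, and both functions are hypothetical, not part of this PR. The 30/30/30 figures are the untuned guess from the note, and since they are treated as relative weights they do not need to sum to 1.

```python
import random

# Hypothetical source names; the weights are the untuned guess from the note.
# random.choices treats them as relative weights, so no normalization is needed.
ROUTING_WEIGHTS = {
    "personalized_topics": 0.30,
    "globally_important": 0.30,
    "random_weighted": 0.30,
}


def pick_source(weights, rng):
    """Pick a routing source with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def next_comment(pools, weights, rng):
    """Return (source, comment), or None when every pool is empty.

    `pools` maps a source name to its list of candidate comments. The draw
    inside a pool is uniform here; the real "random with weights" pool would
    presumably keep its existing priority-weighted sampling.
    """
    # Only consider sources that still have something to show.
    available = {name: w for name, w in weights.items() if pools.get(name)}
    if not available:
        return None
    source = pick_source(available, rng)
    return source, rng.choice(pools[source])
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which would matter if the ratio were ever tuned against recorded data.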
The topic names will change across runs, as will the UMAP. So the only option is to project down into the comment space, and we'll store three archetypal comments at each topic's centroid. As more comments are added there will be drift, so users can prioritize and re-prioritize as the conversation grows and they see new comments. These "anchors" in the UMAP space will help comment routing stay relevant and are durable for the lifetime of a years-long conversation, especially since, as they are reprojected into the topic space later on, the user can remove that topic and drill back down into others.
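The anchor idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the PR's implementation: the function names are hypothetical, coordinates are taken to be plain tuples (for example 2D UMAP output), and a simple majority vote stands in for whatever reprojection the pipeline actually performs.

```python
import math
from collections import Counter


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def archetypal_comments(embedded, k=3):
    """Return the ids of the k comments closest to the topic centroid.

    `embedded` maps comment id -> coordinates for the comments of one topic.
    """
    center = centroid(list(embedded.values()))
    ranked = sorted(embedded, key=lambda cid: math.dist(embedded[cid], center))
    return ranked[:k]


def current_topic(anchor_ids, assignment):
    """Majority topic label of the stored anchors under a later run.

    `assignment` maps comment id -> topic label in the new run. Returns None
    when none of the anchors appear in it.
    """
    labels = [assignment[cid] for cid in anchor_ids if cid in assignment]
    return Counter(labels).most_common(1)[0][0] if labels else None
```

Because only comment ids are stored, a later run with different topic names can still locate the zone a user prioritized by looking up where those comments landed.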
|
Since what's being stored is comments, the right intuition is group informed consensus over zones of the vector embedding space, semantically, as "priority". This would be in contrast, to, say, managing attention for advertising. |
tevko left a comment
Looks great overall, just a few default URLs to remove and table scans to refactor. Are we at all concerned about pagination in DynamoDB for these new queries?
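On the pagination question: a DynamoDB query returns results one page at a time and signals remaining pages with `LastEvaluatedKey`, which the caller passes back as `ExclusiveStartKey`. A minimal Python sketch of that loop follows. The helper name `query_all_items` is hypothetical, and `query_fn` is assumed to behave like boto3's `Table.query`; it is injected so the sketch runs without AWS access.

```python
def query_all_items(query_fn, **query_kwargs):
    """Collect the items from every page of a DynamoDB query.

    `query_fn` is assumed to return a dict with "Items" and, when more pages
    remain, "LastEvaluatedKey" (the shape boto3's Table.query returns).
    """
    items = []
    kwargs = dict(query_kwargs)
    while True:
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the query where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With a real table this would be called as `query_all_items(table.query, KeyConditionExpression=...)`. Without such a loop, only the first page of a large result set is read.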
- Remove priority calculation from math pipeline (conversation.py)
  * Delete _importance_metric and _priority_metric methods
  * Remove priority computation from recompute() method
  * Math pipeline now focuses on PCA, clustering, and representativeness
- Add dedicated priority calculation script (502_calculate_priorities.py)
  * Implements PriorityCalculator class with group-based extremity
  * Matches Clojure priority formula: (importance * scaling_factor)^2
  * Retrieves extremity values from Delphi_CommentExtremity table
  * Updates priorities in Delphi_CommentRouting table
- Update pipeline execution order (run_delphi.py, run_delphi.sh)
  * Math pipeline → UMAP pipeline → Extremity calculation → Priority calculation
  * Ensures priorities use group-based extremity instead of PCA-based
  * Maintains separation of concerns between mathematical and priority calculations

This fixes the priority calculation bug where all priorities were 0 due to missing extremity values, and implements proper group-based extremity usage as requested for the Pakistan conversation analysis.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Extract priority formulas into polismath/conversation/priority.py
  * Create PriorityCalculator class with static methods for core formulas
  * Port importance_metric and priority_metric from Clojure implementation
  * Add convenience methods: calculate_comment_priority, validate_inputs, explain_priority
  * Pure mathematical logic with no I/O dependencies for better testability
- Refactor umap_narrative/502_calculate_priorities.py to use extracted formulas
  * Rename class from PriorityCalculator to PriorityService (clearer distinction)
  * Import and use PriorityCalculator.calculate_comment_priority()
  * Remove duplicate formula implementations (38 lines removed)
  * Focus service on DynamoDB operations and data orchestration

Benefits:
- Better separation of concerns: formulas vs data processing
- Improved testability: mathematical logic can be unit tested independently
- Enhanced reusability: priority formulas can be used in other contexts
- Cleaner maintainability: formula changes only need to happen in one place

Tested successfully with conversation 36324:
- 807 comments processed
- Priority statistics: min=0, max=3, avg=2.36
- All priorities calculated using group-based extremity values

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
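The split described in this commit, pure formulas on one side and data orchestration on the other, might look roughly like the Python sketch below. The class names come from the commit message, but every signature here is an assumption: the real `priority_metric` is ported from Clojure and its arguments are not shown on this page. The only formula used is the one quoted in the earlier commit message, `(importance * scaling_factor)^2`, and storage is injected as plain callables in place of DynamoDB.

```python
class PriorityCalculator:
    """Pure formulas with no I/O. Signatures are assumed for illustration."""

    @staticmethod
    def validate_inputs(importance, scaling_factor):
        """Reject negative inputs (an assumed rule, not taken from the PR)."""
        return importance >= 0 and scaling_factor >= 0

    @staticmethod
    def priority_metric(importance, scaling_factor):
        """Formula quoted in the commit message: (importance * scaling_factor)^2."""
        return (importance * scaling_factor) ** 2


class PriorityService:
    """Data orchestration only; the formulas live in PriorityCalculator."""

    def __init__(self, load_inputs, save_priority):
        # load_inputs: () -> {comment_id: (importance, scaling_factor)}
        # save_priority: (comment_id, priority) -> None
        self._load_inputs = load_inputs
        self._save_priority = save_priority

    def run(self):
        """Compute and store a priority per comment; return how many were written."""
        written = 0
        for comment_id, (importance, scaling) in self._load_inputs().items():
            if not PriorityCalculator.validate_inputs(importance, scaling):
                continue
            priority = PriorityCalculator.priority_metric(importance, scaling)
            self._save_priority(comment_id, priority)
            written += 1
        return written
```

The testability benefit shows up directly: the calculator can be unit tested with bare numbers, and the service can be tested with an in-memory dict standing in for the tables.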
… logic" This reverts commit cfd829d.
* add polis mod access and remove table scans
* add comments back
* more comment fixes
7157689 to e993f44
A reasonable "advanced mode" default for topic prioritization is an odd number of "points" to distribute, where the number is proportional to the topic hierarchy, and where it costs more to vote more, in the spirit of QV (quadratic voting) but with some adaptation to it being hierarchical and across rounds. Spiritually, that direction. It's a lot more clicking though, so it's likely an advanced opt-in mode when implemented, and it is not implemented yet.
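For reference, the core quadratic voting rule is that casting n votes on one item costs n squared points. Here is a minimal Python sketch of a budget check under that rule. The function names, topic labels, and budget are hypothetical, and the hierarchical and cross-round adaptation mentioned above is not modeled.

```python
def vote_cost(votes):
    """Quadratic cost: casting n votes on one topic costs n * n points."""
    return votes * votes


def total_cost(allocation):
    """Total points spent by an allocation mapping topic -> votes cast."""
    return sum(vote_cost(v) for v in allocation.values())


def within_budget(allocation, budget):
    """True when the allocation can be paid for with `budget` points."""
    return total_cost(allocation) <= budget
```

With a budget of 5 points, a participant could put 2 votes on one topic and 1 on another (cost 4 + 1), but could not put 3 votes on a single topic (cost 9).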
This PR adds endpoints and a user interface for topic prioritization, which allows collective agenda setting and collective attention management in polis. The hierarchical topics are computed from comments in the delphi pipeline, and user selections are persisted (pending auth integration) to the DB. The selections will be fed to comment routing.