Topic prioritization for emergent collective agenda setting across scaleably many comments #2083

colinmegill · 2025-06-28T15:56:37Z

This PR adds endpoints and a user interface for topic prioritization, which allows collective agenda setting and collective attention management in polis. The hierarchical topics are computed from comments in the delphi pipeline, and user selections are persisted to the DB (pending auth integration). Those selections will then be fed to comment routing.

colinmegill · 2025-06-28T16:06:48Z

Foundations of group informed consensus over mathematically rigorous emergent topics for collective attention management, in lieu of paper for the moment :)

colinmegill · 2025-06-28T16:09:11Z

Taking some notes about the intention of this powerful direction... Giving participants highly expressive, highly detailed affordances to set their own priorities, and the conversation's, is the only path to getting into the millions and tens of millions of comments per conversation, given the limited number of votes. With increased relevancy polis may see more votes, and that's even likely, but the number of votes will still be, over time, proportional to submitted comments.

colinmegill · 2025-06-28T16:11:03Z

Group informed consensus over ranked topics makes it possible to densify parts of the polis vote matrix, which in turn allows for prioritizing certain reports.

Note that this is not importance, which is meant to operate along a different, complementary axis. For those discussing "the arts" and "music", those reports may have 1000s of comments and will still benefit from importance and ranking for substance, which is far more fine grained than setting the agenda for a city of millions. This coarse graining bridges that gap in the math of scaling.

colinmegill · 2025-06-28T16:14:43Z

The question around comment routing, and bias, is of course foundational to the technology. We'll always want random comments, and new comments, and globally divisive comments, to get in front of users. But still, to scale to millions, there must be tradeoffs. Perhaps after tuning we'll find that 30% topics personalized to the user, 30% globally important, and 30% random with weights (existing algo) is the right ratio. But this is something that would be a good test for a paper using data gathered from this feature.
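A minimal sketch of how such a routing mix could be sampled, for discussion only. The source names and the `pick_source` helper are hypothetical and not part of this PR. The three figures in the comment above sum to 90%, so the sketch normalizes the weights before sampling rather than guessing where the remainder goes.

```python
import random

# Hypothetical routing mix using the figures floated in the comment above.
# They sum to 0.9, so they are normalized before use.
ROUTING_MIX = {
    "personalized_topics": 0.3,
    "globally_important": 0.3,
    "random_weighted": 0.3,  # stands in for the existing weighted-random algo
}


def normalized(mix):
    """Scale the weights so they sum to 1."""
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix weights must sum to a positive number")
    return {name: weight / total for name, weight in mix.items()}


def pick_source(mix, rng):
    """Pick which routing strategy supplies the next comment."""
    weights = normalized(mix)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(pick_source(ROUTING_MIX, rng))
```

Keeping the ratio in one table like this would make it easy to tune, or to vary per experiment if the ratio is tested for a paper.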

colinmegill · 2025-06-28T16:20:35Z

The topic names will change across runs, as will the umap. So, the only option is to project down into the comment space, and we'll store three archetypal comments at the centroid. As more comments are added, there will be drift, so users can prioritize and re-prioritize as the conversation grows and they see new comments. These "anchors" in the umap space will help comment routing stay relevant, and are durable for the lifetime of a years-long conversation, especially since, as they are reprojected into the topic space later on, the user can remove that topic and drill back down into others.
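One way to read "three archetypal comments at the centroid" is to take the comments whose projected coordinates lie closest to the topic's centroid. The sketch below assumes that reading. The function names and the `comment_id -> coordinates` input shape are illustrative and are not taken from the delphi code.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of equal-length coordinate tuples."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def archetypal_comments(embedded, k=3):
    """Return the ids of the k comments nearest to the topic centroid.

    embedded: dict mapping comment_id -> coordinates in the projected space
    (an assumed input shape, used here only for illustration).
    """
    if not embedded:
        return []
    center = centroid(list(embedded.values()))
    ranked = sorted(embedded, key=lambda cid: math.dist(embedded[cid], center))
    return ranked[:k]
```

Because the stored anchors are comment ids rather than topic names or coordinates, they remain meaningful when a later run produces a different projection.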

colinmegill · 2025-06-28T16:22:00Z

Since what's being stored is comments, the right intuition is group informed consensus over zones of the vector embedding space, semantically, as "priority". This would be in contrast to, say, managing attention for advertising.

tevko

Looks great overall, just a few default urls to remove and table scans to refactor. Are we at all concerned about pagination in dynamo for these new queries?
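On the pagination question: a DynamoDB Query response can be truncated, in which case it carries a `LastEvaluatedKey` that must be passed back as `ExclusiveStartKey` to fetch the next page. A minimal sketch of that loop follows. `query_all_pages` is a hypothetical helper, and `query_fn` stands in for any callable shaped like boto3's `Table.query`; neither is code from this PR.

```python
def query_all_pages(query_fn, **query_kwargs):
    """Collect items across every page of a DynamoDB Query.

    query_fn: a callable shaped like boto3's Table.query, returning a dict
    with "Items" and, when more pages remain, "LastEvaluatedKey".
    """
    items = []
    kwargs = dict(query_kwargs)
    while True:
        page = query_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the query where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

For very large result sets a generator that yields page by page may be preferable to building one list in memory.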

client-report/webpack.dev.js

delphi/create_topic_agenda_table.py

delphi/umap_narrative/502_calculate_priorities.py

- Remove priority calculation from math pipeline (conversation.py)
  * Delete _importance_metric and _priority_metric methods
  * Remove priority computation from recompute() method
  * Math pipeline now focuses on PCA, clustering, and representativeness
- Add dedicated priority calculation script (502_calculate_priorities.py)
  * Implements PriorityCalculator class with group-based extremity
  * Matches Clojure priority formula: (importance * scaling_factor)^2
  * Retrieves extremity values from Delphi_CommentExtremity table
  * Updates priorities in Delphi_CommentRouting table
- Update pipeline execution order (run_delphi.py, run_delphi.sh)
  * Math pipeline → UMAP pipeline → Extremity calculation → Priority calculation
  * Ensures priorities use group-based extremity instead of PCA-based
  * Maintains separation of concerns between mathematical and priority calculations

This fixes the priority calculation bug where all priorities were 0 due to missing extremity values, and implements proper group-based extremity usage as requested for the Pakistan conversation analysis.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
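For readers skimming the thread, the formula named in the commit message above, (importance * scaling_factor)^2, can be sketched as follows. How `importance` and `scaling_factor` are derived (votes, extremity, and so on) is not shown in this thread, so both are treated as plain inputs here. The function name and the non-negativity check are assumptions for illustration and are not the code in 502_calculate_priorities.py.

```python
def priority(importance, scaling_factor):
    """Priority as stated in the commit message: (importance * scaling_factor)^2.

    Both arguments are taken as already-computed, non-negative numbers.
    Rejecting negatives is an assumption of this sketch: squaring would
    otherwise hide a sign error in an upstream value.
    """
    if importance < 0 or scaling_factor < 0:
        raise ValueError("importance and scaling_factor must be non-negative")
    return (importance * scaling_factor) ** 2
```

Note that a zero in either input yields a priority of zero regardless of the other.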

- Extract priority formulas into polismath/conversation/priority.py
  * Create PriorityCalculator class with static methods for core formulas
  * Port importance_metric and priority_metric from Clojure implementation
  * Add convenience methods: calculate_comment_priority, validate_inputs, explain_priority
  * Pure mathematical logic with no I/O dependencies for better testability
- Refactor umap_narrative/502_calculate_priorities.py to use extracted formulas
  * Rename class from PriorityCalculator to PriorityService (clearer distinction)
  * Import and use PriorityCalculator.calculate_comment_priority()
  * Remove duplicate formula implementations (38 lines removed)
  * Focus service on DynamoDB operations and data orchestration

Benefits:
- Better separation of concerns: formulas vs data processing
- Improved testability: mathematical logic can be unit tested independently
- Enhanced reusability: priority formulas can be used in other contexts
- Cleaner maintainability: formula changes only need to happen in one place

Tested successfully with conversation 36324:
- 807 comments processed
- Priority statistics: min=0, max=3, avg=2.36
- All priorities calculated using group-based extremity values

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

colinmegill · 2025-07-18T21:06:11Z

A reasonable "advanced mode" default for topic prioritization is an odd number of "points" to distribute, where the number is proportional to the topic hierarchy, and where it costs more to vote more, in the spirit of QV but with some adaptation to it being hierarchical and across rounds. Spiritually, that direction. It's a lot more clicking though, so it's likely an advanced opt-in mode when implemented. It is not implemented yet.
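To make "it costs more to vote more" concrete, here is a sketch of plain quadratic voting, where casting n votes on one topic costs n^2 points. It does not model the hierarchical or multi-round adaptations mentioned above, which remain undesigned. The function names and the example budget are hypothetical.

```python
def allocation_cost(votes_by_topic):
    """Total points spent under plain quadratic voting: n votes cost n^2."""
    if any(v < 0 for v in votes_by_topic.values()):
        raise ValueError("vote counts must be non-negative")
    return sum(v * v for v in votes_by_topic.values())


def is_affordable(votes_by_topic, budget):
    """True if the allocation fits within the participant's point budget."""
    return allocation_cost(votes_by_topic) <= budget


if __name__ == "__main__":
    # Two votes on one topic and one on another cost 4 + 1 = 5 points.
    ballot = {"the arts": 2, "music": 1}
    print(allocation_cost(ballot), is_affordable(ballot, budget=5))
```

Under this rule, concentrating votes on one topic gets expensive quickly, which nudges participants to spread their points across several topics.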

tevko requested changes Jul 1, 2025

client-report/webpack.dev.js (resolved)

delphi/create_topic_agenda_table.py (outdated, resolved)

delphi/umap_narrative/502_calculate_priorities.py (resolved)

delphi/umap_narrative/502_calculate_priorities.py (outdated, resolved)

colinmegill and others added 23 commits July 4, 2025 09:59

admin mod, prio stub

8acce8e

hclust route checkpoitn

d50edc5

labels working

9102048

remove logging, get towards hclust

5fc1ee4

hclust explration 22

d8b9f29

contour plot

dd7445f

contours

e213ef8

prio take 1

b52a93d

distances

e468ae9

remove dead 600 file

4111e6b

add more defense for small models

8087fc1

topic prioritize mock

8dc48dd

webpack build size

8c7fce8

topic sections

c1d6627

merge fix

8656208

add simple route

46dea73

Revert "Refactor priority calculation: separate formulas from service…

98d9227

… logic" This reverts commit cfd829d.

topic agenda collapse prototype tetris-y

fdfa009

topic agenda setting demo

4ddb689

begin basic bulk upload capability

79c946b

save state

c3e6dab

colinmegill and others added 11 commits July 4, 2025 10:15

split by distance doesn't owrk

6987ce1

sentence transformer model

7ca6b5c

componetize

d2cfed2

drive detail

b84971e

more specifics

92d4e6e

log archetypal comments

d678cc2

dynamo table and node services

486f3ac

Don't crash when we have not votes.

5ef5e15

suppress logging

923cb12

differnet approach for prepended labesl

e5ec4ef

Te arch updates topicmod (#2086)

e993f44

* add polis mod access and remove table scans * add comments back * more comment fixes

tevko force-pushed the colinmegill/topicmod branch from 7157689 to e993f44 Compare July 4, 2025 15:22
