Skip to content

tapdb: optimize QueryUniverseAssetStats Performance with ANALYZE and Materialized Views #1633

@Roasbeef

Description

@Roasbeef

Background

The QueryUniverseAssetStats query in tapdb/sqlc/queries/universe.sql:225-312 is a complex analytical query that:

  • Calculates asset supply by aggregating universe leaves
  • Computes group supply from MSSMT nodes
  • Joins multiple tables including views (genesis_info_view, key_group_info_view)
  • Performs dynamic sorting across 8 different columns with ASC/DESC options

Performance Concerns

  1. Multiple CTEs with aggregations: The query uses asset_supply and group_supply CTEs that perform SUM() operations across large datasets
  2. Complex joins: Joins across 7+ tables/views including universe_leaves, universe_roots, mssmt_nodes, genesis_info_view, key_group_info_view, and universe_stats
  3. Dynamic sorting: The ORDER BY clause has 16 different CASE statements for flexible sorting

Proposed Optimizations

1. ANALYZE Strategy

Implement regular ANALYZE operations to ensure PostgreSQL has accurate statistics:

-- Run after bulk imports or significant data changes
ANALYZE universe_leaves;
ANALYZE universe_roots;
ANALYZE mssmt_nodes;
ANALYZE mssmt_roots;
ANALYZE universe_stats;

-- Consider more frequent targeted ANALYZE for high-churn tables
ANALYZE universe_leaves (leaf_node_key, leaf_node_namespace, universe_root_id);
ANALYZE mssmt_nodes (key, namespace, sum);

2. Materialized View Approach

Create a materialized view to pre-compute the expensive aggregations:

CREATE MATERIALIZED VIEW universe_asset_stats_mv AS
WITH asset_supply AS (
    SELECT SUM(nodes.sum) AS supply, gen.asset_id AS asset_id
    FROM universe_leaves leaves
    JOIN universe_roots roots ON leaves.universe_root_id = roots.id
    JOIN mssmt_nodes nodes ON leaves.leaf_node_key = nodes.key 
        AND leaves.leaf_node_namespace = nodes.namespace
    JOIN genesis_info_view gen ON leaves.asset_genesis_id = gen.gen_asset_id
    WHERE roots.proof_type = 'issuance'
    GROUP BY gen.asset_id
),
group_supply AS (
    SELECT sum AS num_assets, uroots.group_key AS group_key
    FROM mssmt_nodes nodes
    JOIN mssmt_roots roots ON nodes.hash_key = roots.root_hash 
        AND nodes.namespace = roots.namespace
    JOIN universe_roots uroots ON roots.namespace = uroots.namespace_root
    WHERE uroots.proof_type = 'issuance'
)
SELECT 
    asset_supply.supply,
    group_supply.num_assets AS group_supply,
    gen.asset_id,
    gen.asset_tag AS asset_name,
    gen.asset_type,
    gen.block_height AS genesis_height,
    gen.prev_out AS genesis_prev_out,
    group_info.tweaked_group_key AS group_key,
    gen.output_index AS anchor_index,
    gen.anchor_txid,
    universe_stats.total_asset_syncs AS total_syncs,
    universe_stats.total_asset_proofs AS total_proofs
FROM genesis_info_view gen
JOIN asset_supply ON asset_supply.asset_id = gen.asset_id
LEFT JOIN key_group_info_view group_info ON gen.gen_asset_id = group_info.gen_asset_id
LEFT JOIN group_supply ON group_supply.group_key = group_info.x_only_group_key
JOIN universe_stats ON gen.asset_id = universe_stats.asset_id
WHERE universe_stats.proof_type = 'issuance';

-- Create indexes for common access patterns
CREATE INDEX idx_universe_asset_stats_mv_asset_id ON universe_asset_stats_mv(asset_id);
CREATE INDEX idx_universe_asset_stats_mv_asset_name ON universe_asset_stats_mv(asset_name);
CREATE INDEX idx_universe_asset_stats_mv_asset_type ON universe_asset_stats_mv(asset_type);

3. Refresh Strategy

-- Option 1: Periodic refresh (e.g., every hour)
REFRESH MATERIALIZED VIEW CONCURRENTLY universe_asset_stats_mv;

-- Option 2: Event-driven refresh after universe updates
-- Could be triggered after sync operations or new proof insertions

Benefits

  1. Reduced query complexity: Pre-computed aggregations eliminate expensive runtime calculations
  2. Improved response times: Direct queries against materialized view vs. complex joins
  3. Predictable performance: Less variance in query execution times
  4. Concurrent refresh: Using CONCURRENTLY allows reads during refresh

Considerations

  • Storage overhead for materialized view
  • Staleness window between refreshes
  • Need to modify application code to query the materialized view when appropriate

Next Steps

  1. Benchmark current query performance with production-like data volumes
  2. Test materialized view approach in staging environment
  3. Implement ANALYZE automation in maintenance routines
  4. Consider partial indexes on sorting columns if specific sort patterns dominate

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions