Postgres introspector is too slow for databases with many schemas and tables #1707

@rubenferreira97

Description

When using Kysely with a Postgres database containing many schemas and tables, migrateToLatest() takes ~500 seconds to start up, even when no migrations need to be applied.

My production database has ~40 schemas with ~200 tables each (~8,000 tables total). Just calling migrateToLatest() blocks the application for minutes.

Root cause

After profiling, I found two main bottlenecks:

1. #parseTableMetadata is O(n²)

PostgresIntrospector.#parseTableMetadata uses Array.find() to look up each table by name and schema for every column row:

```ts
return columns.reduce<TableMetadata[]>((tables, it) => {
  let table = tables.find(
    (tbl) => tbl.name === it.table && tbl.schema === it.schema,
  )
  // ...
}, [])
```

For each of the n columns, it linearly scans the growing tables array, resulting in O(n²) time complexity. With ~50,000 columns, this in-memory step alone takes ~183 ms per call.
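The Map-based fix can be sketched roughly like this (the column/table shapes are simplified for illustration and the function name is mine, not Kysely's actual internals):

```typescript
interface ColumnRow {
  schema: string
  table: string
  column: string
}

interface TableMetadata {
  schema: string
  name: string
  columns: string[]
}

// Group column rows into tables with O(1) lookups instead of scanning
// the growing accumulator with Array.find() for every row.
function parseTableMetadata(columns: ColumnRow[]): TableMetadata[] {
  // nested Map<schema, Map<table, TableMetadata>>
  const bySchema = new Map<string, Map<string, TableMetadata>>()
  const tables: TableMetadata[] = []

  for (const it of columns) {
    let byTable = bySchema.get(it.schema)
    if (!byTable) {
      byTable = new Map()
      bySchema.set(it.schema, byTable)
    }
    let table = byTable.get(it.table)
    if (!table) {
      table = { schema: it.schema, name: it.table, columns: [] }
      byTable.set(it.table, table)
      // keep first-seen ordering, matching the reduce-based version
      tables.push(table)
    }
    table.columns.push(it.column)
  }
  return tables
}
```

Each row now costs two hash lookups instead of a scan over all tables found so far, which is what turns the quadratic step into a linear one.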

2. #doesTableExist fetches all tables in the database

The migrator's #doesTableExist calls getTables({ withInternalKyselyTables: true }), which introspects every table in every schema and then filters in JavaScript:

```ts
const tables = await this.#props.db.introspection.getTables({
  withInternalKyselyTables: true,
})

return tables.some(
  (it) => it.name === tableName && (!schema || it.schema === schema),
)
```

With 8,000 tables, this fetches tens of thousands of column rows from pg_catalog twice (once for the migration table, once for the lock table), just to check if two specific tables exist. This is the dominant contributor to the ~500s startup.
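One way to avoid the full introspection is a targeted catalog query for just the one table. This sketch only builds the SQL text (tableExistsSql is a hypothetical helper; real code should bind parameters through the dialect rather than interpolate strings):

```typescript
// Sketch: check a single table's existence with a scoped pg_catalog
// query instead of introspecting every table in every schema.
function tableExistsSql(tableName: string, schema?: string): string {
  // naive single-quote escaping; illustration only, use bound params
  const quote = (s: string) => `'${s.replace(/'/g, "''")}'`
  const schemaFilter = schema ? `AND n.nspname = ${quote(schema)}` : ''
  return `
    SELECT EXISTS (
      SELECT 1
      FROM pg_catalog.pg_class c
      JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
      WHERE c.relname = ${quote(tableName)}
      ${schemaFilter}
    ) AS "exists"
  `
}
```

A query like this touches at most a handful of catalog rows, regardless of how many tables the database holds, so the check stays cheap even at 8,000 tables.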

Benchmark: #parseTableMetadata (in-memory only)

| Scenario | Total Columns | Before (ms) | After (ms) | Speedup |
| --- | --- | --- | --- | --- |
| Small (1 schema, 5 tables, 5 cols) | 25 | 0.002 | 0.002 | 1.1x |
| Medium (2 schemas, 20 tables, 10 cols) | 400 | 0.051 | 0.026 | 2.0x |
| Large (3 schemas, 50 tables, 15 cols) | 2,250 | 0.731 | 0.133 | 5.5x |
| XL (5 schemas, 100 tables, 20 cols) | 10,000 | 11.46 | 0.61 | 18.9x |
| XXL (10 schemas, 200 tables, 25 cols) | 50,000 | 182.69 | 2.85 | 64x |

Benchmark: getTables() full roundtrip (SQL → parse)

| Scenario | Specs | Fetch All (ms) | Fetch Filtered, 1 table (ms) | Speedup |
| --- | --- | --- | --- | --- |
| S | 1 schema, 10 tables | 4.8 | 2.5 | 1.9x |
| M | 5 schemas, 50 tables | 5.1 | 2.4 | 2.1x |
| L | 10 schemas, 100 tables | 8.2 | 3.5 | 2.3x |
| XL | 20 schemas, 500 tables | 22.4 | 3.3 | 6.8x |
| XXL | 50 schemas, 1,000 tables | 44.1 | 2.0 | 22x |

Proposed fix

I've split the fix into two PRs:

  1. PR #1705 (perf/pg-parse-table-metadata) — "perf: optimize PostgresIntrospector #parseTableMetadata with Map-based lookups": replaces Array.find() with a nested Map<schema, Map<table, TableMetadata>> for O(1) lookups, reducing the total complexity to O(n). No API changes.

  2. PR #1706 (feat/add-introspector-filtering) — "feat: add introspector filtering": adds optional schemas?: string[] and tables?: string[] filters to DatabaseMetadataOptions, pushing the filtering down to SQL (via WHERE ... IN (...)) in all 4 dialect introspectors. The migrator's #doesTableExist is updated to query only the specific table it needs instead of fetching everything.
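The SQL pushdown in the second PR could look roughly like this (illustrative only: the schemas/tables fields are the proposed additions, and buildMetadataFilter is a hypothetical fragment builder, not Kysely's actual implementation, which would use bound parameters):

```typescript
interface DatabaseMetadataOptions {
  withInternalKyselyTables?: boolean
  schemas?: string[] // proposed: restrict introspection to these schemas
  tables?: string[]  // proposed: restrict introspection to these tables
}

// Build a WHERE fragment that narrows the pg_catalog query to the
// requested schemas/tables, so filtering happens in SQL rather than
// after fetching every row into JavaScript.
function buildMetadataFilter(opts: DatabaseMetadataOptions): string {
  const quote = (s: string) => `'${s.replace(/'/g, "''")}'`
  const clauses: string[] = []
  if (opts.schemas?.length) {
    clauses.push(`n.nspname IN (${opts.schemas.map(quote).join(', ')})`)
  }
  if (opts.tables?.length) {
    clauses.push(`c.relname IN (${opts.tables.map(quote).join(', ')})`)
  }
  return clauses.length ? `WHERE ${clauses.join(' AND ')}` : ''
}
```

With this shape, the migrator can ask for exactly `{ schemas: ['public'], tables: ['kysely_migration'] }` and the database returns only the handful of rows that match.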

Environment

  • Kysely 0.28.x
  • PostgreSQL 16
  • Node.js v25
  • Database: ~40 schemas, ~200 tables each (~8,000 tables, ~50,000+ columns)

Disclosure: This issue was written with AI assistance. The problem identification, solution design, benchmarks, and code review were done manually.
