Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing/Incorrect Index Statistics #132

Open
abdelfattahradwan opened this issue Sep 11, 2024 · 0 comments
Open

Missing/Incorrect Index Statistics #132

abdelfattahradwan opened this issue Sep 11, 2024 · 0 comments

Comments

@abdelfattahradwan
Copy link

abdelfattahradwan commented Sep 11, 2024

I noticed that when using pgvectorscale's indices, they never report an idx_scans number greater than zero.

For example, using the following table:

create table snippets
(
    id          integer                  not null primary key generated always as identity,
    public_id   uuid                     not null unique,
    name        text                     not null,
    description text                     not null,
    content     text                     not null,
    embedding   vector(1536)             not null,
    created_at  timestamp with time zone not null default now(),
    modified_at timestamp with time zone not null default now()
);

create index on snippets (name asc);

create index on snippets using diskann (embedding);

create index on snippets (created_at);

create index on snippets (modified_at);

And running the following query:

explain analyse
select snippets.public_id,
       snippets.name,
       snippets.description,
       snippets.content,
       snippets.created_at,
       snippets.modified_at,
       (snippets.embedding <=> '[-0.03248204,0.00062958454,-0.021850731,...]'::vector(1536)) as distance
from snippets
order by distance
limit 10;

Outputs the following query plan:

Limit  (cost=2.21..4.17 rows=10 width=1133) (actual time=0.248..0.370 rows=10 loops=1)
  ->  Index Scan using snippets_embedding_idx on snippets  (cost=2.21..6.91 rows=24 width=1133) (actual time=0.247..0.368 rows=10 loops=1)
"        Order By: (embedding <=> '[-0.03248204,0.00062958454,-0.021850731,...]'::vector)"
Planning Time: 0.170 ms
Execution Time: 0.390 ms

That tells me that the index is being used. But when I run the following query:

select *
from pg_stat_user_indexes
where indexrelname = 'snippets_embedding_idx'
order by idx_scan desc;

I get the following result:

relid indexrelid schemaname relname indexrelname idx_scan last_idx_scan idx_tup_read idx_tup_fetch
21333 21344 public snippets snippets_embedding_idx 0 null 5269 5269

That, contradictory to the previously mentioned query plan, indicates that the index has never been used.

I observe this behaviour in all my database instances. I even spun up a new container running timescale/timescaledb-ha:pg16 and observed the same behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant