Add "needle in the haystack" queries #8

Open
@valyala

The JSONBench dataset contains fields with a large number of unique values (high-cardinality fields):

  • did (aka user_id)
  • commit.cid (aka commit_id)
  • commit.record.subject.cid

Sometimes it is necessary to find all the rows that contain a particular, rarely seen value of some field, for example all the rows generated by a specific user. The following query does this for JSONBench data:

SELECT count(*) FROM bluesky WHERE data.did = 'did:plc:stwikwzlk2mepaagokthylry'

Another practical query selects the row for a given commit_id:

SELECT * FROM bluesky WHERE data.commit.cid = 'bafyreielfqkpggsdqwtbtg5tyh7iqytp64paevfjbeufnw6kc7sgmjemhm'
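
To make the semantics of these two queries concrete, here is a minimal Python sketch of the same lookups over newline-delimited JSON records. It is only an illustration: the helper names (`get_path`, `find_rows`) and the sample records are hypothetical, and the values are made up rather than taken from the JSONBench dataset.

```python
import json


def get_path(record, dotted_path):
    """Walk a dotted path such as 'commit.cid'; return None if any key is missing."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def find_rows(lines, dotted_path, needle):
    """Full scan: yield every parsed record whose field equals the needle."""
    for line in lines:
        record = json.loads(line)
        if get_path(record, dotted_path) == needle:
            yield record


# Made-up sample records (assumption: one JSON object per line).
SAMPLE_LINES = [
    json.dumps({"did": "did:plc:aaa", "commit": {"cid": "cid-1"}}),
    json.dumps({"did": "did:plc:bbb", "commit": {"cid": "cid-2"}}),
    json.dumps({"did": "did:plc:aaa", "commit": {"cid": "cid-3"}}),
    json.dumps({"did": "did:plc:ccc"}),  # record without a commit object
]

# Counterpart of: SELECT count(*) ... WHERE data.did = '...'
count_for_user = sum(1 for _ in find_rows(SAMPLE_LINES, "did", "did:plc:aaa"))

# Counterpart of: SELECT * ... WHERE data.commit.cid = '...'
rows_for_commit = list(find_rows(SAMPLE_LINES, "commit.cid", "cid-2"))
```

Without an index, a scan like this reads every record to find a handful of matches, which is the access pattern the proposed queries would exercise.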
