Support for MariaDB database #548

Open
wants to merge 1 commit into
base: main

Conversation

@HugoWenTD HugoWenTD commented Oct 18, 2024

MariaDB supports Vector now. https://mariadb.com/kb/en/vector-overview/
Add a new module for benchmarking against the MariaDB 11.8 database server.

@maumueller
Collaborator

Thanks @HugoWenTD. Please make sure to add a test case to the CI as well.

@erikbern
Owner

Are you planning to finish this @HugoWenTD ?

@HugoWenTD
Author

> Are you planning to finish this @HugoWenTD ?

Yes Erik, I'll work on it this week.

@erikbern
Owner

Ok, nice!

@HugoWenTD HugoWenTD force-pushed the mariadb-11.6-vector-preview branch 2 times, most recently from 547c78b to e59fc18 Compare March 23, 2025 07:51
@HugoWenTD HugoWenTD marked this pull request as ready for review March 31, 2025 01:47
    if batch:
        self._cur.executemany(self._sql_insert, batch)

    insert_time = time.time() - start_time

Perhaps it would make sense to time just the "executemany" calls, not the python overhead of creating the batch?

If these numbers are to be compared with other DBs, the operation should be timed the same way. Do the other DBs' timings include the Python overhead, or not?
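To make the suggestion concrete, here is a minimal sketch of timing only the `executemany` calls while still reporting the total. It is not the PR's code: `FakeCursor`, `insert_timed`, and the batch size are hypothetical names and values used for illustration only.

```python
import time


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor; it only records rows."""

    def __init__(self):
        self.rows = []

    def executemany(self, sql, batch):
        self.rows.extend(batch)


def insert_timed(cur, sql, vectors, batch_size=1000):
    """Insert vectors in batches; return (total_seconds, db_seconds)."""
    total_start = time.perf_counter()
    db_seconds = 0.0
    batch = []
    for i, v in enumerate(vectors):
        # Python-side batch building: counted in the total, not in db_seconds.
        batch.append((i, v))
        if len(batch) >= batch_size:
            t0 = time.perf_counter()
            cur.executemany(sql, batch)
            db_seconds += time.perf_counter() - t0
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        t0 = time.perf_counter()
        cur.executemany(sql, batch)
        db_seconds += time.perf_counter() - t0
    total_seconds = time.perf_counter() - total_start
    return total_seconds, db_seconds
```

Reporting both numbers would show how much of the insert time is Python overhead, whichever convention the other modules follow.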

Owner

We don't really benchmark insertions & building the index, so I don't think it's material!

Author

Yeah, it's just a print, but I'll see if I can change it.

@erikbern
Owner

erikbern commented Apr 3, 2025

This is failing in CI for some reason, any idea why? https://github.com/erikbern/ann-benchmarks/actions/runs/14016901547/job/39890033550?pr=548

@HugoWenTD
Author

> This is failing in CI for some reason, any idea why? https://github.com/erikbern/ann-benchmarks/actions/runs/14016901547/job/39890033550?pr=548

Weird. Looks like it's failing with the --batch option. I'll look into it.

ERROR - double free or corruption (!prev)

@HugoWenTD
Author

> This is failing in CI for some reason, any idea why? https://github.com/erikbern/ann-benchmarks/actions/runs/14016901547/job/39890033550?pr=548

I suspect it's related to the ThreadPool and mariadb cursor.

    def batch_query(self, X: numpy.array, n: int) -> None:
        pool = ThreadPool()
        self.res = pool.map(lambda q: self.query(q, n), X)
    def query(self, v, n):
        self._cur.execute(self._sql_search, (self.vector_to_hex(v), n))

        return [id for id, in self._cur.fetchall()]
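In the snippet above, every worker thread in the pool calls `self._cur`, so a single cursor is used concurrently. DB-API drivers generally do not guarantee that this is safe. The sketch below shows one common way around it, a lazily created connection and cursor per thread. It is an illustration under stated assumptions, not the fix that was pushed to this PR: `PerThreadCursor`, `FakeConnection`, `FakeCursor`, and the `workers` parameter are all hypothetical.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor that rejects cross-thread use."""

    def __init__(self):
        self._owner = threading.get_ident()
        self._rows = []

    def execute(self, sql, params):
        # A real driver might crash here rather than raise a clean error.
        assert threading.get_ident() == self._owner, "cursor shared across threads"
        q, n = params
        self._rows = [(q * 10 + i,) for i in range(n)]

    def fetchall(self):
        return self._rows


class FakeConnection:
    """Hypothetical stand-in for a DB-API connection."""

    def cursor(self):
        return FakeCursor()


class PerThreadCursor:
    """Creates one connection and one cursor per thread, on first use."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument connection factory (assumption)
        self._local = threading.local()

    def get(self):
        if not hasattr(self._local, "cur"):
            self._local.conn = self._connect()
            self._local.cur = self._local.conn.cursor()
        return self._local.cur


def batch_query(cursors, sql, queries, n, workers=4):
    """Run all queries on a thread pool; each worker uses its own cursor."""

    def one(q):
        cur = cursors.get()
        cur.execute(sql, (q, n))
        return [row_id for row_id, in cur.fetchall()]

    with ThreadPool(workers) as pool:
        return pool.map(one, queries)
```

With a real driver, `connect` would be a small wrapper around the driver's own connect call, and each query would be converted to the driver's parameter format before `execute`.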

MariaDB supports Vector now. Add new module for benchmark against MariaDB
11.8 database server.
@HugoWenTD HugoWenTD force-pushed the mariadb-11.6-vector-preview branch from e59fc18 to 0291a06 Compare April 10, 2025 16:43
@HugoWenTD
Author

Updated the batch_query. It should work now.

4 participants