Persistence Refactor POC #1011

Draft

Contributor

@flyrain flyrain commented Feb 17, 2025

Experiment:

  1. Added a DAO layer for the namespace business entity (except for reads).
  2. Integrated with existing DAO components (PolarisMetaStoreManager and PolarisMetaStoreSession).
  3. All tests passed, including a manual local run with Spark SQL.

Benefits:

  1. Compatible with the existing backend (FDB), since it is hidden behind the new DAO.
  2. Adding new backends (Postgres/MongoDB) is much easier now; for Postgres especially, we could use a model similar to the Iceberg JDBC catalog.
  3. Allows gradual refactoring to remove old DAO dependencies (PolarisMetaStoreManager and PolarisMetaStoreSession).
  4. Enables parallel development of new backend implementations.

Next Steps:

  1. Define business entities one by one to decouple them from FDB.
  2. Create DAO interfaces for each entity to standardize operations (e.g., CRUD, ID generation).
  3. Expand DAO implementations to support additional backends over time.
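
As a rough illustration of step 2, here is a minimal sketch of a per-entity DAO contract covering CRUD and ID generation. Every name in it (EntityDao, NamespaceInfo, InMemoryNamespaceDao) is hypothetical and not part of this PR or of Polaris; the in-memory map only stands in for a real backend such as FDB or Postgres.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generic DAO contract (an assumption, not the Polaris API).
interface EntityDao<T> {
  long generateId();

  void save(long id, T entity);

  Optional<T> load(long id);

  boolean delete(long id);
}

// Hypothetical stand-in for a business entity.
record NamespaceInfo(String catalog, String name) {}

// Illustrative in-memory backend; a real backend would implement the same interface.
final class InMemoryNamespaceDao implements EntityDao<NamespaceInfo> {
  private final AtomicLong nextId = new AtomicLong(1);
  private final Map<Long, NamespaceInfo> rows = new HashMap<>();

  @Override
  public long generateId() {
    // IDs are handed out sequentially, starting at 1.
    return nextId.getAndIncrement();
  }

  @Override
  public void save(long id, NamespaceInfo entity) {
    // Acts as both create and update.
    rows.put(id, entity);
  }

  @Override
  public Optional<NamespaceInfo> load(long id) {
    return Optional.ofNullable(rows.get(id));
  }

  @Override
  public boolean delete(long id) {
    // Returns true only if something was actually removed.
    return rows.remove(id) != null;
  }
}
```

Callers would depend only on EntityDao, so swapping the backend would not touch the calling code.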

Please check the detailed design doc: https://docs.google.com/document/d/1Vuhw5b9-6KAol2vU3HUs9FJwcgWtiVVXMYhLtGmz53s/edit?usp=sharing

import org.apache.polaris.core.entity.PolarisEntityCore;
import org.apache.polaris.core.persistence.dao.NamespaceDao;

public class PostgresNamespaceDao implements NamespaceDao {
Contributor Author

We could move them into a different module if necessary.

Member

Yes, agreed, FDB and PostgreSQL should each be in their own module.

*/
package org.apache.polaris.core.persistence.dao;

public interface CatalogDao {}
Member

@jbonofre jbonofre Feb 17, 2025

For the persistence DAO, my idea was to have a concrete DAO definition, with an associated record, to help the implementer. The DAO deals with storage operations using a record as the carrier (to avoid coupling with the entity).

Something like:

Suggested change
public interface CatalogDao {}
public record CatalogRecord(String id, String name, String location, ...) {
...
}

Contributor Author

I'm not sure about this. We should refactor the existing service entities instead of creating a new set of them, as I said on the mailing list.

Member

@flyrain you are right about the DAO; my point is more about using the entity in the DAO. I would rather use a record in the DAO to decouple it from the entity: that forces us to decouple the service and storage layers, and the record should be "obvious" to a DAO implementer.

import org.apache.polaris.core.entity.PolarisEntityCore;

public interface NamespaceDao {
void save(NamespaceEntity namespace, List<PolarisEntityCore> catalogPath);
Member

I don't think it's a good idea to have operation definitions in the DAO. The DAO could just be a record (public record NamespaceDao); the storage logic for a DAO lives in the persistence implementation.

The idea of DAO is to "map" business logic object to persistence object, I think it's a paradigm we should keep here.

Contributor Author

The idea of DAO is to "map" business logic object to persistence object

To be clear, NamespaceEntity and catalogPath here should be service-layer entities. I put them here to demo how this works with existing code, but they are not necessarily the final form. The idea of the DAO is to provide an interface so that the implementation can do two-way conversion like the following, while keeping the interface limited to upper-layer entities.

  1. service entities -> persistence entities
  2. persistence entities -> service entities
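
A minimal sketch of that two-way conversion, under stated assumptions: ServiceNamespace, NamespaceRow, ServiceNamespaceDao, and RowBackedNamespaceDao are all hypothetical names invented for illustration, and the "/"-joined path is just one possible storage encoding. The interface mentions only the service-layer type; the row type stays inside the implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical service-layer entity (upper layer).
record ServiceNamespace(List<String> levels, String location) {}

// Hypothetical persistence-layer entity (what the backend stores).
record NamespaceRow(String path, String location) {}

// The interface only exposes service-layer types.
interface ServiceNamespaceDao {
  void save(ServiceNamespace ns);

  Optional<ServiceNamespace> load(List<String> levels);
}

final class RowBackedNamespaceDao implements ServiceNamespaceDao {
  // Stand-in for a real table, keyed by the encoded path.
  private final Map<String, NamespaceRow> table = new HashMap<>();

  // 1. service entity -> persistence entity
  // Assumes no level contains "/" and the level list is non-empty.
  static NamespaceRow toRow(ServiceNamespace ns) {
    return new NamespaceRow(String.join("/", ns.levels()), ns.location());
  }

  // 2. persistence entity -> service entity
  static ServiceNamespace toService(NamespaceRow row) {
    return new ServiceNamespace(List.of(row.path().split("/")), row.location());
  }

  @Override
  public void save(ServiceNamespace ns) {
    NamespaceRow row = toRow(ns);
    table.put(row.path(), row);
  }

  @Override
  public Optional<ServiceNamespace> load(List<String> levels) {
    return Optional.ofNullable(table.get(String.join("/", levels)))
        .map(RowBackedNamespaceDao::toService);
  }
}
```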

Member

Yes, it makes sense. As I said in the previous comment, the DAO, as the storage operation definition, can do the mapping, but I think it should use only storage records (persistence entities) that can be converted to service entities. Separation here is welcome.
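
To make the alternative concrete, here is a rough sketch in which the DAO sees only storage records and the conversion to service entities lives in a separate mapper. NamespaceRecord, NamespaceRecordDao, NamespaceModel, NamespaceMapper, and MapNamespaceRecordDao are hypothetical names used purely for illustration; nothing here is taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage record: the only type the DAO knows about.
record NamespaceRecord(String id, String name) {}

// Hypothetical service-layer entity, deliberately shaped differently (numeric id).
record NamespaceModel(long id, String name) {}

// DAO defined purely in terms of storage records.
interface NamespaceRecordDao {
  void put(NamespaceRecord row);

  Optional<NamespaceRecord> get(String id);
}

// Mapping between layers is kept outside the DAO.
final class NamespaceMapper {
  private NamespaceMapper() {}

  static NamespaceRecord toRecord(NamespaceModel model) {
    return new NamespaceRecord(Long.toString(model.id()), model.name());
  }

  static NamespaceModel toModel(NamespaceRecord row) {
    return new NamespaceModel(Long.parseLong(row.id()), row.name());
  }
}

// Illustrative map-backed implementation standing in for a real backend.
final class MapNamespaceRecordDao implements NamespaceRecordDao {
  private final Map<String, NamespaceRecord> rows = new HashMap<>();

  @Override
  public void put(NamespaceRecord row) {
    rows.put(row.id(), row);
  }

  @Override
  public Optional<NamespaceRecord> get(String id) {
    return Optional.ofNullable(rows.get(id));
  }
}
```

The trade-off in this variant is that the service layer (or a thin adapter) has to call the mapper explicitly, in exchange for a DAO that never imports service types.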

Contributor Author

flyrain commented Feb 17, 2025

The regression test failure should be fixed by #1015.
