Implement a concrete AtomDB to support remote connection to multiple AtomDBs #1037
Labels
Public (Should be cc'ed to the public board)
Description
This issue description is a WIP. The text is not ready to be used yet.
This is also related to #915.
The high-level idea is to allow the user to set up a DAS which, instead of having a DB backend like redis+mongo or mork+mongo, is backed by a list of remote DASes, each one with its own DB backend.
In order to instantiate such a DAS, the user will need to select a new concrete AtomDB type (the one being defined here), a RemoteAtomDB. The idea is to delegate to this RemoteAtomDB everything that's required to manipulate data from the various remote DASes.
So what we need to do here is to design how this RemoteAtomDB should be/work. Let's write pseudo C++ code for it:
class RemoteAtomDB : public AtomDB {
    // Constructor expects a JSON file with everything required to configure the
    // RemoteAtomDB. Basically, this means:
    //   * connection information for each of the remote DBs it's going to connect to.
    RemoteAtomDB(const string& json_file_path);

    // Remote AtomDBs this AtomDB is connected to. Instead of using AtomDB objects
    // directly here, we use RemoteAtomDBPeer objects because we need to keep more
    // state regarding each remote DB than the AtomDB itself. This RemoteAtomDBPeer
    // is basically what we have been calling "CachedAtomDB".
    //
    // Each RemoteAtomDBPeer should have a UID based on the remote AtomDB's
    // connection info.
    map<string, RemoteAtomDBPeer> remote_db;
};
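For illustration only, the JSON configuration file handed to the constructor might look like the sketch below. The issue only requires that it carry connection information for each remote DB; all field names here are hypothetical:

```json
{
  "remote_dbs": [
    { "id": "das-1", "host": "das-node-1.example.com", "port": 8080 },
    { "id": "das-2", "host": "das-node-2.example.com", "port": 8080 }
  ]
}
```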
class RemoteAtomDBPeer : public AtomDB {
    // Each remote peer has 3 AtomDB objects:
    //   * One to hold the remote AtomDB itself
    //   * One in-memory cache
    //   * One regular DB-backed (e.g. redis+mongo) AtomDB to persist newly added atoms.

    // ---------------------------------------------
    // State

    // Cache with relevant Atoms. The idea is that all public AtomDB query API calls
    // should be redirected to this cache, except for query_for_patterns(), which
    // should be redirected to the AtomDB below. In order to be able to answer these
    // queries, the cache must be properly fed with relevant atoms.
    InMemoryDB cache;

    // Remote DB being connected to. This AtomDB is supposed to be used READ-ONLY.
    // query_for_patterns() is delegated to this AtomDB.
    AtomDB atomdb;

    // Any new atom addition/deletion should be redirected to local_persistence as
    // well as to the AtomDB above.
    AtomDB local_persistence;

    // This HandleTrie is supposed to be used as a set (no need to have a Value
    // object). It keeps track of which LinkSchemas have already been fetched.
    HandleTrie fetched_link_templates;
    // ---------------------------------------------
    // AtomDB API

    get_atom() get_node() get_link() {
        // Try to get the atom from cache. If found, return it.
        // Otherwise, try to get it from local_persistence. If found, return it.
        // Otherwise, try to get it from atomdb. If found, return it.
        // Otherwise, return failure to retrieve the atom.
    }
    // This is not exactly the AtomDB API method, as it has an extra flag parameter
    // (which can't be defaulted!!).
    get_matching_atoms(Atom key, bool local_only) {
        // Redirect the call to cache and local_persistence, merging results properly
        // (mind performance). If the local_only flag is set, return.
        // Otherwise, redirect the call to atomdb as well and, again, merge results properly.
    }

    // This is the actual AtomDB API method, which calls the one above passing "false".
    get_matching_atoms(Atom key) {
        get_matching_atoms(key, false);
    }
    query_for_patterns(LinkSchema schema) {
        // Check whether schema has already been fetched. If so, redirect the query
        // to cache and return. Otherwise, redirect the query to atomdb, feed the
        // result to cache, update fetched_link_templates and return.
        //
        // Feeding the answer to cache means iterating through all query answers,
        // extracting the handles in each answer, fetching the corresponding atoms
        // and then feeding these atoms to the cache. This way, if the same query is
        // made again, it's guaranteed to be answered the same way when redirected to
        // the cache. This feeding also allows the cache to answer queries such as
        // get_atom() for elements that are present in the query answers.
    }
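    // A possible shape for the cache-feeding step described above (pseudocode;
    // helper names such as "feed_cache" and "answer.handles" are hypothetical,
    // not part of the API):
    //
    // void feed_cache(const QueryAnswers& answers) {
    //     for (const auto& answer : answers) {
    //         for (const auto& handle : answer.handles) {
    //             cache.add_atom(atomdb.get_atom(handle));
    //         }
    //     }
    // }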
    atom_exists() node_exists() link_exists() {
        // Try to find the atom in cache. If found, return true.
        // Otherwise, try to find it in local_persistence. If found, return true.
        // Otherwise, try to find it in atomdb and return accordingly.
    }

    atoms_exists() nodes_exists() links_exists() {
        // Similar to the above, but each deeper level should be searched only if at
        // least one atom hasn't been found in the level above.
    }
    add_*() {
        // Redirect calls to cache.
    }

    delete_*() {
        // Redirect calls to cache and local_persistence.
    }

    re_index_patterns() {
        // Redirect to cache and local_persistence.
    }
    // ---------------------------------------------
    // Cache policy API

    // There should be an API to fetch atoms explicitly instead of relying on the
    // query answers of query_for_patterns(). This API should be public in order to
    // allow caching policies to be enforced either from internal calls (i.e. called
    // automatically inside this class' methods as queries are made) or from
    // external calls (i.e. some external entity can control the caching policies).
    fetch(LinkSchema);
    release(LinkSchema);

    // Return a number in (0..1) which represents how much RAM is still available to use.
    double available_ram();
    // There should be a thread running in the background which wakes up once per
    // second or so. When awake, this thread should check the available RAM and
    // clean up the cache automatically when available RAM is below a critical value
    // (defined as a static variable parameter). Cleanup means removing things from
    // the cache and, if needed, adding them to local_persistence.
    auto_cleanup();
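    // A possible shape for the cleanup loop (pseudocode; names like "running" and
    // "CRITICAL_AVAILABLE_RAM" are hypothetical, not part of the API above):
    //
    // void cleanup_loop() {
    //     while (running) {
    //         sleep_for(1 /* second */);
    //         if (available_ram() < CRITICAL_AVAILABLE_RAM) {
    //             // Evict entries from cache, persisting to local_persistence
    //             // any atoms that would otherwise be lost.
    //         }
    //     }
    // }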
};
Metadata
Assignees
Type
Projects
Status
In Progress