Commit 277cc46

chhabrakadabra and adchia authored and committed
feat: BigTable online store (feast-dev#3140)
* Initial implementation of BigTable online store.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Attempt to run bigtable integration tests.

  Currently focusing on just getting the tests running locally. I've only built python3.8 requirements.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Got the BigTable tests running in local containers

  Signed-off-by: Abhin Chhabra <[email protected]>

* Set serialization version when computing entity ID

  Signed-off-by: Abhin Chhabra <[email protected]>

* Switch to the recommended layout in bigtable.

  This was recommended by the BigTable dev team. Details of this layout will be added to the documentation in a future commit.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Minor bugfixes.

  - If a row is empty when fetching data, don't process it more.
  - If a task in the threadpool fails, bubble up that failure.
  - If a `created_ts` is not available, use an empty string. `None` does not automatically serialize to bytes.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Move BigTable online store out of contrib

  As per feedback on the PR.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Attempt to run integration tests in CI.

  Provide the GCP project and the bigtable instance ID for the tests to connect to.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Delete tables for entity-less feature views.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Table names should be smaller than 50 characters

  This is BigTable's table length limit and it's causing test failures.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Optimize bigtable reads.

  - Fetch all the rows in one bigtable fetch.
  - Get only the columns that are necessary (using a column regex filter).

  Signed-off-by: Abhin Chhabra <[email protected]>

* dynamodb: switch to `mock_dynamodb`

  The latest rebuilding of requirements has upgraded the `moto` library past the `4.0.0` release, which has a couple of breaking changes. Specifically, the `mock_dynamodb2` decorator has been deprecated. See https://github.com/spulec/moto/blob/master/CHANGELOG.md#400 for more details. The actual PR (getmoto/moto#4919) mentions that it's because the `mock_dynamodb` decorator is now equivalent to the `mock_dynamodb2` decorator.

  Signed-off-by: Abhin Chhabra <[email protected]>

* minor: rename `BigTable` to `Bigtable`

  This matches the GCP docs.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Wrote some Bigtable documentation.

  Closely mirrors the docs for the other online stores.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Bugfix: Deal with missing row keys.

  It looks like the bigtable client will just skip over non-existent row keys.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Fix linting issues.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Generate requirements files.

  - As of version `1.49`, the various python packages in the [grpc repo](https://github.com/grpc/grpc/tree/master/src/python) require `protobuf>=4.21.3`. Unfortunately, this is incompatible with all versions of `tensorflow-metadata` (see [this issue](tensorflow/metadata#37)). And since `piptools` doesn't backtrack during dependency resolution, the requirement files cannot be regenerated without adding an upper limit on these grpc libraries directly in `setup.py`.
  - The previous attempt to upgrade usages of the `mock_dynamodb2` decorator to the newest version failed. Since I'm not an expert in dynamodb, it made sense to just cap the test tool to the version already being used in CI.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Don't bother materializing created timestamp.

  Had a discussion with Danny about whether it's useful to copy this column. He agreed that there's no value to storing this in the online store.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Remove `tensorflow-metadata`.

  Turns out that this dependency is not required. We removed all references to it in [this PR](feast-dev#2063), but did not remove it from `setup.py`. Removing it has caused many of the restrictions imposed in previous commits to be unnecessary.

  Signed-off-by: Abhin Chhabra <[email protected]>

* Minor fix to Bigtable documentation.

  Feedback from Danny mentioned that Bigtable should be able to store multiple versions of the same key and fetch the latest at read time. This makes sense and means that concurrent writes should work just fine.

  Signed-off-by: Abhin Chhabra <[email protected]>

* update roadmap docs

  Signed-off-by: Danny Chiao <[email protected]>

* Fix roadmap doc

  Signed-off-by: Danny Chiao <[email protected]>

* Change link to point to roadmap page

  Signed-off-by: Danny Chiao <[email protected]>

* change order in roadmap

  Signed-off-by: Danny Chiao <[email protected]>

Signed-off-by: Abhin Chhabra <[email protected]>
Signed-off-by: Abhin Chhabra <[email protected]>
Signed-off-by: Danny Chiao <[email protected]>
Co-authored-by: Danny Chiao <[email protected]>
1 parent 3a37242 commit 277cc46

19 files changed, +900 -407 lines changed

README.md

+2 -2
@@ -173,12 +173,12 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [DynamoDB](https://docs.feast.dev/reference/online-stores/dynamodb)
 * [x] [Redis](https://docs.feast.dev/reference/online-stores/redis)
 * [x] [Datastore](https://docs.feast.dev/reference/online-stores/datastore)
+* [x] [Bigtable](https://docs.feast.dev/reference/online-stores/bigtable)
 * [x] [SQLite](https://docs.feast.dev/reference/online-stores/sqlite)
 * [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
+* [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
 * [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
-* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
-* [ ] Bigtable (in progress)
 * **Feature Engineering**
 * [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))
 * [x] Streaming Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit))

docs/SUMMARY.md

+1
@@ -92,6 +92,7 @@
 * [Redis](reference/online-stores/redis.md)
 * [Datastore](reference/online-stores/datastore.md)
 * [DynamoDB](reference/online-stores/dynamodb.md)
+* [Bigtable](reference/online-stores/bigtable.md)
 * [PostgreSQL (contrib)](reference/online-stores/postgres.md)
 * [Cassandra + Astra DB (contrib)](reference/online-stores/cassandra.md)
 * [MySQL (contrib)](reference/online-stores/mysql.md)

docs/getting-started/third-party-integrations.md

+1 -1
@@ -11,7 +11,7 @@ Don't see your offline store or online store of choice here? Check out our guide
 
 ## Integrations
 
-See [Functionality and Roadmap](../../#-functionality-and-roadmap)
+See [Functionality and Roadmap](../roadmap.md)
 
 ## Standards
 

docs/reference/online-stores/README.md

+4
@@ -26,6 +26,10 @@ Please see [Online Store](../../getting-started/architecture-and-components/onli
 [dynamodb.md](dynamodb.md)
 {% endcontent-ref %}
 
+{% content-ref url="bigtable.md" %}
+[bigtable.md](bigtable.md)
+{% endcontent-ref %}
+
 {% content-ref url="postgres.md" %}
 [postgres.md](postgres.md)
 {% endcontent-ref %}

docs/reference/online-stores/bigtable.md

+56
@@ -0,0 +1,56 @@
+# Bigtable online store
+
+## Description
+
+The [Bigtable](https://cloud.google.com/bigtable) online store provides support for
+materializing feature values into Cloud Bigtable. The data model used to store feature
+values in Bigtable is described in more detail
+[here](../../specs/online_store_format.md#google-bigtable-online-store-format).
+
+## Getting started
+
+In order to use this online store, you'll need to run `pip install 'feast[gcp]'`. You
+can then get started with the command `feast init REPO_NAME -t gcp`.
+
+## Example
+
+{% code title="feature_store.yaml" %}
+```yaml
+project: my_feature_repo
+registry: data/registry.db
+provider: gcp
+online_store:
+  type: bigtable
+  project_id: my_gcp_project
+  instance: my_bigtable_instance
+```
+{% endcode %}
+
+The full set of configuration options is available in
+[BigtableOnlineStoreConfig](https://rtd.feast.dev/en/latest/#feast.infra.online_stores.bigtable.BigtableOnlineStoreConfig).
+
+## Functionality Matrix
+
+The set of functionality supported by online stores is described in detail [here](overview.md#functionality).
+Below is a matrix indicating which functionality is supported by the Bigtable online store.
+
+| | Bigtable |
+|-----------------------------------------------------------|----------|
+| write feature values to the online store | yes |
+| read feature values from the online store | yes |
+| update infrastructure (e.g. tables) in the online store | yes |
+| teardown infrastructure (e.g. tables) in the online store | yes |
+| generate a plan of infrastructure changes | no |
+| support for on-demand transforms | yes |
+| readable by Python SDK | yes |
+| readable by Java | no |
+| readable by Go | no |
+| support for entityless feature views | yes |
+| support for concurrent writing to the same key | yes |
+| support for ttl (time to live) at retrieval | no |
+| support for deleting expired data | no |
+| collocated by feature view | yes |
+| collocated by feature service | no |
+| collocated by entity key | yes |
+
+To compare this set of functionality against other online stores, please see the full [functionality matrix](overview.md#functionality-matrix).
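
The "support for concurrent writing to the same key" row in the matrix above rests on the behaviour noted in the commit message: Bigtable can hold several timestamped versions of a cell and return the latest at read time. The toy in-memory model below is only an illustration of that idea, not the Bigtable client API. `VersionedCell`, `write`, and `read_latest` are hypothetical names. `max_versions` mirrors the setting of the same name described in the online store format spec, and the default of 1 follows that spec. Eager trimming is a simplification assumed here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VersionedCell:
    """Toy model of a single cell that keeps timestamped versions.

    Hypothetical helper for illustration; it does not talk to Bigtable.
    """

    max_versions: int = 1
    _versions: List[Tuple[int, bytes]] = field(default_factory=list)

    def write(self, timestamp_micros: int, value: bytes) -> None:
        # Every write adds a version; writers never overwrite each other in place.
        self._versions.append((timestamp_micros, value))
        # Keep the newest versions first and trim the rest.
        # (Assumption: trimming happens immediately in this sketch.)
        self._versions.sort(key=lambda version: version[0], reverse=True)
        del self._versions[self.max_versions:]

    def read_latest(self) -> Optional[bytes]:
        # A read returns the value with the highest timestamp, if any.
        return self._versions[0][1] if self._versions else None


# Two writers race on the same key; the write with the older timestamp lands last.
cell = VersionedCell(max_versions=2)
cell.write(200, b"new")
cell.write(100, b"old")
print(cell.read_latest())
```

Because the read picks the highest timestamp rather than the most recent arrival, the late-arriving older write does not clobber the newer value.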

docs/roadmap.md

+2 -2
@@ -31,12 +31,12 @@ The list below contains the functionality that contributors are planning to deve
 * [x] [DynamoDB](https://docs.feast.dev/reference/online-stores/dynamodb)
 * [x] [Redis](https://docs.feast.dev/reference/online-stores/redis)
 * [x] [Datastore](https://docs.feast.dev/reference/online-stores/datastore)
+* [x] [Bigtable](https://docs.feast.dev/reference/online-stores/bigtable)
 * [x] [SQLite](https://docs.feast.dev/reference/online-stores/sqlite)
 * [x] [Azure Cache for Redis (community plugin)](https://github.com/Azure/feast-azure)
 * [x] [Postgres (contrib plugin)](https://docs.feast.dev/reference/online-stores/postgres)
+* [x] [Cassandra / AstraDB (contrib plugin)](https://docs.feast.dev/reference/online-stores/cassandra)
 * [x] [Custom online store support](https://docs.feast.dev/how-to-guides/adding-support-for-a-new-online-store)
-* [x] [Cassandra / AstraDB](https://docs.feast.dev/reference/online-stores/cassandra)
-* [ ] Bigtable (in progress)
 * **Feature Engineering**
 * [x] On-demand Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1lgfIw0Drc65LpaxbUu49RCeJgMew547meSJttnUqz7c/edit#))
 * [x] Streaming Transformations (Alpha release. See [RFC](https://docs.google.com/document/d/1UzEyETHUaGpn0ap4G82DHluiCj7zEbrQLkJJkKSv4e8/edit))

docs/specs/online_store_format.md

+24 -1
@@ -92,6 +92,29 @@ Other types of entity keys are not supported in this version of the specificatio
 
 ![Datastore Online Example](datastore_online_example.png)
 
+## Google Bigtable Online Store Format
+
+[Bigtable storage model](https://cloud.google.com/bigtable/docs/overview#storage-model)
+consists of massively scalable tables, with each row keyed by a "row key". The rows in a
+table are stored lexicographically sorted by this row key.
+
+We use the following structure to store feature data in Bigtable:
+
+* All feature data for an entity or a specific group of entities is stored in the same
+table. The table name is derived by concatenating the lexicographically sorted names
+of entities.
+* This implementation only uses one column family per table, named `features`.
+* Each row key is created by concatenating a hash derived from the specific entity keys
+and the name of the feature view. Each row only stores feature values for a specific
+feature view. This arrangement also means that feature values for a given group of
+entities are colocated.
+* The columns used in each row are named after the features in the feature view.
+Bigtable is perfectly content being sparsely populated.
+* By default, we store 1 historical value of each feature value. This can be configured
+using the `max_versions` setting in `BigtableOnlineStoreConfig`. This implementation
+of the online store does not have the ability to revert any given value to its old
+self. To use the historical version, you'll have to use custom code.
+
 ## Cassandra/Astra DB Online Store Format
 
 ### Overview
@@ -250,4 +273,4 @@ message BoolList {
 repeated bool val = 1;
 }
 
-```
+```
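
The Bigtable layout added to `docs/specs/online_store_format.md` above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in this commit. The helper names `table_name` and `row_key`, the `-` and `#` separators, the use of SHA-256, and the truncate-plus-digest strategy for long names are all hypothetical. Only the overall shape comes from the spec and the commit message: sorted entity names for the table, an entity-key hash followed by the feature view name for the row key, a single `features` column family, and a 50-character table name limit.

```python
import hashlib
from typing import Dict, List

MAX_TABLE_NAME_LENGTH = 50  # table name limit cited in the commit message
COLUMN_FAMILY = "features"  # the single column family named in the spec


def table_name(entity_names: List[str]) -> str:
    """Join the lexicographically sorted entity names (separator is assumed)."""
    name = "-".join(sorted(entity_names))
    if len(name) <= MAX_TABLE_NAME_LENGTH:
        return name
    # Assumed strategy for long names: keep a prefix and append a short digest
    # so that distinct entity groups still map to distinct tables.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
    prefix = name[: MAX_TABLE_NAME_LENGTH - len(digest) - 1]
    return f"{prefix}_{digest}"


def row_key(entity_key: Dict[str, str], feature_view: str) -> bytes:
    """Concatenate a hash of the entity key with the feature view name."""
    # Sort the key parts so the hash does not depend on dict ordering.
    canonical = "|".join(f"{k}={entity_key[k]}" for k in sorted(entity_key))
    entity_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Hash first, feature view second: rows for one entity share a prefix.
    return f"{entity_hash}#{feature_view}".encode("utf-8")


print(table_name(["driver", "customer"]))  # customer-driver
hourly = row_key({"driver_id": "1001"}, "driver_hourly_stats")
daily = row_key({"driver_id": "1001"}, "driver_daily_stats")
print(hourly.split(b"#")[0] == daily.split(b"#")[0])  # True
```

Putting the entity hash before the feature view name is what gives the colocation the spec mentions: Bigtable sorts rows lexicographically by row key, so all feature views for one entity sit next to each other.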
