|Concept|Definition |
|--------|--|
| Datasets |A dataset is a named table definition consisting of a data source (an external system connection, a blob storage URI, or a file upload) and a schema (a list of column names along with their data types). A dataset version represents actual materialized data created from this definition. Dataset versions are immutable. Datasets can be set up to refresh periodically, which automatically creates new versions from the data source (not applicable to uploads). Every dataset has a table name that is unique to the organization.|
| Feature Groups |A feature group is a named table definition based on a transformation of the features from datasets or other feature groups. Feature group definitions can be specified using ANSI SQL transformations that reference other dataset and feature group table names directly in the SQL statement. They can also be specified using a user-provided Python function that returns a pandas DataFrame. As with datasets, a feature group is only a definition of the transformations, which aren't applied until you create a Feature Group Version to materialize the data. This can be done via the API or on a refresh schedule. |
| Feature | A column in a feature group. |
| Nested Feature Group | A type of Feature Group that supports time-based windowing of data. |
| Feature Group Version | A materialized snapshot of a Feature Group's data. |
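The split between a definition and its versions in the table above can be modeled in a few lines of plain Python. This is a hypothetical sketch for intuition only. `FeatureGroup`, `FeatureGroupVersion`, and `materialize` are illustrative names and are not part of the Abacus.AI client. A definition holds a transformation that runs only when a version is materialized, and each version is a frozen snapshot of the rows produced at that moment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class FeatureGroupVersion:
    """Hypothetical: an immutable, materialized snapshot of a feature group."""
    table_name: str
    version: int
    created_at: datetime
    # Rows are stored as tuples of (column, value) pairs so the snapshot's
    # structure cannot be changed in place after it is created.
    rows: Tuple[Tuple[Tuple[str, object], ...], ...]


@dataclass
class FeatureGroup:
    """Hypothetical: a named definition whose transformation runs only on materialize()."""
    table_name: str
    transform: Callable[[], List[Row]]
    versions: List[FeatureGroupVersion] = field(default_factory=list)

    def materialize(self) -> FeatureGroupVersion:
        # The transformation is applied here, not when the definition is created.
        rows = tuple(tuple(sorted(row.items())) for row in self.transform())
        snapshot = FeatureGroupVersion(
            table_name=self.table_name,
            version=len(self.versions) + 1,
            created_at=datetime.now(timezone.utc),
            rows=rows,
        )
        self.versions.append(snapshot)
        return snapshot


# Usage: later changes to the source data appear only in later versions.
source = [{"item_id": 1, "price": 9.5}]
fg = FeatureGroup("items_fg", lambda: [dict(r) for r in source])
v1 = fg.materialize()
source.append({"item_id": 2, "price": 3.0})
v2 = fg.materialize()
print(v1.version, len(v1.rows), v2.version, len(v2.rows))  # 1 1 2 2
```

Freezing the version object mirrors the "versions are immutable" rule. Re-running `materialize()` is the analogue of creating a new version via the API or on a refresh schedule.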
Datasets can be created via uploads [\[example\]](https://github.com/abacusai), file connectors [\[example\]](https://github.com/abacusai) (blob storage providers such as S3 or GCP Storage), or database connectors [\[example\]](https://github.com/abacusai) (Salesforce, Snowflake, BigQuery, etc.).
For demo purposes, we'll use the file connector, since it supports reading from publicly accessible buckets. You can also verify your own private buckets on the [Connected Services Page](https://abacus.ai/app/profile/connected_services).
When creating a dataset, you must assign a **Feature Group Table Name** which is unique to your organization and used when building derivative Feature Groups.
We'll create two datasets: one containing an event log and the other containing item metadata.
Finally, we can create a feature group from these datasets, specifying which columns we want as features and how to join the two tables. We can do this with ANSI SQL statements or Python functions.
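To illustrate the SQL route, the sketch below uses Python's built-in `sqlite3` as a local stand-in for the platform's SQL engine. The table names `events` and `items`, their columns, and the sample rows are all hypothetical placeholders for the two datasets' Feature Group Table Names. The feature group definition itself is just the `SELECT` statement, which picks the feature columns and joins the event log to item metadata on `item_id`.

```python
import sqlite3

# Hypothetical stand-ins for the two datasets (event log and item metadata).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id TEXT, item_id INTEGER, event_type TEXT, ts TEXT);
CREATE TABLE items  (item_id INTEGER, category TEXT, price REAL);
INSERT INTO events VALUES
  ('u1', 1, 'view',     '2023-01-01T10:00:00'),
  ('u1', 2, 'purchase', '2023-01-01T10:05:00'),
  ('u2', 1, 'purchase', '2023-01-02T09:00:00');
INSERT INTO items VALUES
  (1, 'books', 12.5),
  (2, 'games', 40.0);
""")

# The feature group definition: an ANSI SQL SELECT that references the
# dataset table names directly, chooses the feature columns, and joins them.
FEATURE_GROUP_SQL = """
SELECT e.user_id, e.item_id, e.event_type, e.ts, i.category, i.price
FROM events AS e
LEFT JOIN items AS i ON e.item_id = i.item_id
ORDER BY e.ts
"""

rows = conn.execute(FEATURE_GROUP_SQL).fetchall()
for row in rows:
    print(row)
```

A `LEFT JOIN` keeps every event even if its item has no metadata row. An inner join would silently drop such events. On the platform itself, you would register a statement like this as the feature group definition rather than executing it locally.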
Data can be added to this dataset using the `append_data` API call. If the `updateTimestampKey` attribute is not set, the server's receive timestamp is used as the value for `updateTimestampKey`.
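The timestamp fallback can be sketched locally as follows. This reflects one reading of the rule: a record that arrives without a value for the `updateTimestampKey` column gets the receive time filled in. `append_record`, the `event_ts` column, and the injectable `now` clock are hypothetical and are not the Abacus.AI client's `append_data` signature.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

Row = Dict[str, object]


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def append_record(
    table: List[Row],
    record: Row,
    update_timestamp_key: str,
    now: Callable[[], datetime] = utc_now,
) -> Row:
    """Hypothetical local stand-in for a streaming append (not the Abacus.AI API).

    If the record has no value for `update_timestamp_key`, the time the record
    was received (`now()`) is written into that column before the row is stored.
    """
    row = dict(record)  # copy so the caller's dict is left untouched
    if row.get(update_timestamp_key) is None:
        row[update_timestamp_key] = now()
    table.append(row)
    return row


# Usage with a fixed clock so the result is deterministic.
fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
log: List[Row] = []
a = append_record(log, {"user_id": "u1"}, "event_ts", now=lambda: fixed)
b = append_record(
    log,
    {"user_id": "u2", "event_ts": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    "event_ts",
    now=lambda: fixed,
)
print(a["event_ts"].isoformat())  # 2024-01-01T00:00:00+00:00
print(b["event_ts"].isoformat())  # 2023-06-01T00:00:00+00:00
```

Injecting the clock keeps the fallback testable. In the real service, the "clock" is simply the server's receive time.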
We can specify a `mergeType` option, which can be a `UNION` or an `INTERSECTION`.
Concatenation is useful in production settings when we want either to evolve streaming feature groups or to add online updates to a specific table of a feature group that was developed and initially deployed with offline datasets.
- If a feature group was developed starting with a streaming feature group and we want to replace past data, we can concatenate data up to a certain point with a new batch data feature group.
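The replace-past-data case above can be sketched as a cutoff-based merge. The semantics here are an illustrative assumption, not the platform's implementation. `concatenate_until` is a hypothetical helper, and rows are plain dicts carrying a `ts` timestamp. Batch rows strictly before the cutoff replace the streaming history, while streaming rows at or after the cutoff are kept.

```python
from datetime import datetime
from typing import Dict, List

Row = Dict[str, object]


def concatenate_until(
    batch: List[Row], streaming: List[Row], cutoff: datetime, ts_key: str = "ts"
) -> List[Row]:
    """Hypothetical: replace streaming history before `cutoff` with batch data.

    Rows strictly before the cutoff come from the batch feature group; rows at
    or after the cutoff are kept from the streaming feature group.
    """
    older = [r for r in batch if r[ts_key] < cutoff]
    newer = [r for r in streaming if r[ts_key] >= cutoff]
    # Return a single time-ordered table.
    return sorted(older + newer, key=lambda r: r[ts_key])


# Usage: January history comes from batch, later data from streaming.
batch = [
    {"ts": datetime(2024, 1, 1), "v": "batch-jan"},
    {"ts": datetime(2024, 3, 1), "v": "batch-mar"},
]
streaming = [
    {"ts": datetime(2024, 1, 15), "v": "stream-jan"},
    {"ts": datetime(2024, 2, 20), "v": "stream-feb"},
]
merged = concatenate_until(batch, streaming, cutoff=datetime(2024, 2, 1))
print([r["v"] for r in merged])  # ['batch-jan', 'stream-feb']
```

Using `<` on one side and `>=` on the other guarantees that each timestamp comes from exactly one source, so no row is duplicated or lost at the boundary.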