Fix handling of timestamps when ingesting from CSV #16

daniel-thom · 2024-11-12T21:04:21Z

No description provided.

daniel-thom · 2024-11-12T21:05:16Z

src/chronify/csv_io.py

+    time_config = schema.time_config
+    exprs = []
+    for i, column in enumerate(rel.columns):
+        expr = column
+        if isinstance(time_config, DatetimeRange) and column == time_config.time_column:
+            time_type = rel.types[i]
+            if time_type == duckdb.typing.TIMESTAMP and time_config.start.tzinfo is not None:  # type: ignore
+                expr = f"timezone('{time_config.start.tzinfo.key}', {column}) AS {column}"  # type: ignore
+        exprs.append(expr)


This was the main problem. Now we apply the time zone from the time config to naive TIMESTAMP columns.

It seems like we may not need any of the timezone enums we created in time.py since we can use tzinfo directly.

That might be the case. We may still need them if we accept time zone definitions from user input, such as command-line arguments or configuration files.
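A minimal, standalone sketch of the expression-building logic in the csv_io.py diff above, assuming column types are passed as plain strings rather than `duckdb.typing` objects. `build_select_exprs` is a hypothetical helper, not chronify's API. It reads the zone name from the tzinfo's `key` attribute, which `zoneinfo.ZoneInfo` provides but the base `tzinfo` class and fixed-offset `datetime.timezone` objects do not (one plausible reason the diff needs `# type: ignore`).

```python
from datetime import tzinfo
from typing import Optional, Sequence


def build_select_exprs(
    columns: Sequence[str],
    types: Sequence[str],
    time_column: str,
    tz: Optional[tzinfo],
) -> list[str]:
    """Build SELECT expressions that attach a time zone to a naive time column.

    Hypothetical helper mirroring the csv_io.py diff: if the time column has
    the naive type "TIMESTAMP" and a tzinfo is configured, wrap it in DuckDB's
    timezone(zone, column). All other columns pass through unchanged.
    """
    exprs = []
    for column, column_type in zip(columns, types):
        expr = column
        if column == time_column and column_type == "TIMESTAMP" and tz is not None:
            # zoneinfo.ZoneInfo exposes the IANA name as .key; other tzinfo
            # implementations may not, so fail with a clear message instead
            # of an AttributeError.
            key = getattr(tz, "key", None)
            if key is None:
                raise ValueError(f"Cannot determine an IANA time zone name from {tz!r}")
            expr = f"timezone('{key}', {column}) AS {column}"
        exprs.append(expr)
    return exprs
```

For example, with `ZoneInfo("America/New_York")` as the tzinfo and columns `timestamp` (TIMESTAMP) and `value` (DOUBLE), this returns `["timezone('America/New_York', timestamp) AS timestamp", "value"]`. A column that is already `TIMESTAMP WITH TIME ZONE` is left alone.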

daniel-thom · 2024-11-12T21:06:29Z

src/chronify/store.py

            table.create(self._engine)

        with self._engine.begin() as conn:
            write_database(rel.to_df(), conn, dst_schema)
+            try:
+                check_timestamps(conn, table, dst_schema)


The previous behavior was incorrect. We need to run the checks on the final table and roll back on error.

src/chronify/time_configs.py

daniel-thom · 2024-11-12T21:07:59Z

src/chronify/time_series_checker.py


 from chronify.exceptions import InvalidTable
 from chronify.models import TableSchema
 from chronify.sqlalchemy.functions import read_database
 from chronify.utils.sql import make_temp_view_name


+def check_timestamps(conn: Connection, table: Table, schema: TableSchema) -> None:


The semantics are now different: instead of instantiating the class, the caller calls this function. It receives a connection instead of the engine and metadata because the checks must run inside the same transaction as the write, so that erroneous changes can be rolled back.
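The write-then-check flow described in the store.py and time_series_checker.py comments can be sketched with the standard-library `sqlite3` module standing in for the SQLAlchemy engine used in the PR. `ingest_rows` and this `check_timestamps` are hypothetical simplifications: the check only rejects NULL or duplicate timestamps, and `InvalidTable` is a local stand-in for chronify's exception. The point it illustrates is that the check receives the same connection as the write, so an exception raised by the check rolls back the inserted rows.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


class InvalidTable(Exception):
    """Local stand-in for chronify.exceptions.InvalidTable."""


def check_timestamps(conn: sqlite3.Connection, table: str) -> None:
    """Hypothetical check that runs on the final table inside the caller's transaction."""
    (null_count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE timestamp IS NULL"
    ).fetchone()
    if null_count:
        raise InvalidTable(f"{table} has {null_count} NULL timestamps")
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT timestamp) FROM {table}"
    ).fetchone()
    if total != distinct:
        raise InvalidTable(f"{table} has {total - distinct} duplicate timestamps")


def ingest_rows(
    conn: sqlite3.Connection,
    table: str,
    rows: Iterable[Tuple[Optional[str], float]],
) -> None:
    """Insert rows and validate the resulting table in a single transaction.

    The table name is interpolated directly, so it must come from trusted code.
    """
    # Used as a context manager, a sqlite3 connection commits when the block
    # exits normally and rolls back when it raises, similar in spirit to
    # SQLAlchemy's engine.begin().
    with conn:
        conn.executemany(f"INSERT INTO {table} (timestamp, value) VALUES (?, ?)", rows)
        check_timestamps(conn, table)
```

If `ingest_rows` is called a second time with a timestamp that already exists, it raises `InvalidTable` and the table still contains only the rows from the first call.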

codecov-commenter · 2024-11-12T22:44:11Z

Codecov Report

Attention: Patch coverage is 91.54930% with 12 lines in your changes missing coverage. Please review.

Project coverage is 93.50%. Comparing base (8e4a674) to head (ba28060).

Files with missing lines	Patch %	Lines
src/chronify/models.py	70.00%	9 Missing ⚠️
tests/test_store.py	91.42%	3 Missing ⚠️


@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
- Coverage   94.24%   93.50%   -0.75%     
==========================================
  Files          18       18              
  Lines         782      831      +49     
==========================================
+ Hits          737      777      +40     
- Misses         45       54       +9


lixiliu · 2024-11-13T21:38:15Z

src/chronify/time_series_checker.py

-        filters = [f"{x} IS NOT NULL" for x in schema.time_config.list_time_columns()]
+    def _run_timestamp_checks_on_tmp_table(self, table_name: str) -> None:
+        id_cols = ",".join(self._schema.time_array_id_columns)
+        filters = [f"{x} IS NOT NULL" for x in self._schema.time_config.list_time_columns()]
        where_clause = "AND ".join(filters)


This needs to be " AND ", with a space before and after.

Thanks. Fixed.
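A small illustration of why the separator matters. `not_null_where_clause` is a hypothetical helper written for this example, and the column names are made up.

```python
def not_null_where_clause(columns: list[str]) -> str:
    """Return a WHERE-clause body requiring every listed column to be non-NULL."""
    filters = [f"{column} IS NOT NULL" for column in columns]
    # The separator needs spaces on both sides; "AND ".join(...) would glue the
    # keyword onto the previous filter, e.g. "a IS NOT NULLAND b IS NOT NULL".
    return " AND ".join(filters)
```

For example, `not_null_where_clause(["timestamp", "id"])` returns `"timestamp IS NOT NULL AND id IS NOT NULL"`, while the original `"AND ".join` would have produced `"timestamp IS NOT NULLAND id IS NOT NULL"`, which is not valid SQL.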

lixiliu · 2024-11-13T21:47:48Z

src/chronify/csv_io.py


It seems like we may not need any of the timezone enums we created in time.py since we can use tzinfo directly.

daniel-thom requested review from pesap and lixiliu November 12, 2024 21:04

daniel-thom commented Nov 12, 2024


src/chronify/time_configs.py (resolved)

daniel-thom commented Nov 12, 2024


daniel-thom mentioned this pull request Nov 12, 2024

Fix DatetimeRange timestamp iteration #17

Merged

lixiliu reviewed Nov 13, 2024


daniel-thom added 2 commits November 13, 2024 17:29

Fix handling of timestamps when ingesting from CSV

92ed172

Fix SQL syntax

ba28060

daniel-thom force-pushed the fix/ingest-csv branch from f70ccfb to ba28060 November 14, 2024 00:30
