Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process is exporting the same node/day multiple times per day #1

Open
caparker opened this issue Sep 23, 2022 · 1 comment
Open

Process is exporting the same node/day multiple times per day #1

caparker opened this issue Sep 23, 2022 · 1 comment

Comments

@caparker
Copy link
Collaborator

I think this is due to the way that we are updating the open_data_export_log with new node/days to export. Right now we are doing an upsert that will update an existing row with new data as it comes in. This updates the 'modified_on` and would trigger a new export. For the realtime there is a lot of overlap in the data that comes in for each file. So we are probably updating that export log needlessly many of the time.

@caparker
Copy link
Collaborator Author

I am going to test a solution where we only update the export log using only the values that were actually imported. To do this I am going to add another temp table to the flow. So the new measurements would look like this

WITH inserts AS (
  INSERT INTO measurements (sensors_id, datetime, value)
  SELECT sensors_id
  , datetime
  , value
  FROM tempfetchdata
  ON CONFLICT DO NOTHING
  RETURNING sensors_id, datetime
), inserted as (
   INSERT INTO temp_inserted_measurements (sensors_id, datetime)
   SELECT sensors_id
   , datetime
   FROM inserts
)
SELECT MIN(datetime)
, MAX(datetime)
, COUNT(1)
INTO __inserted_start_datetime
, __inserted_end_datetime
, __inserted_measurements
FROM inserts;

And then we use the inserted data to update the export logs like so

WITH e AS (
INSERT INTO open_data_export_logs (sensor_nodes_id, day, records, measurands, modified_on)
SELECT sn.sensor_nodes_id
, ((m.datetime - '1sec'::interval) AT TIME ZONE (COALESCE(sn.metadata->>'timezone', 'UTC'))::text)::date as day
, COUNT(1)
, COUNT(DISTINCT p.measurands_id)
, MAX(now())
FROM temp_inserted_measurements m --tempfetchdata m
JOIN sensors s ON (m.sensors_id = s.sensors_id)
JOIN measurands p ON (s.measurands_id = p.measurands_id)
JOIN sensor_systems ss ON (s.sensor_systems_id = ss.sensor_systems_id)
JOIN sensor_nodes sn ON (ss.sensor_nodes_id = sn.sensor_nodes_id)
GROUP BY sn.sensor_nodes_id
, ((m.datetime - '1sec'::interval) AT TIME ZONE (COALESCE(sn.metadata->>'timezone', 'UTC'))::text)::date
ON CONFLICT (sensor_nodes_id, day) DO UPDATE
SET records = EXCLUDED.records
, measurands = EXCLUDED.measurands
, modified_on = EXCLUDED.modified_on
RETURNING 1)
SELECT COUNT(1) INTO __exported_days
FROM e;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant