Skip to content

Reduce bookmark rewind window#21

Open
amyfain wants to merge 1 commit intosinger-io:masterfrom
amyfain:reduce-bookmark-rewind
Open

Reduce bookmark rewind window#21
amyfain wants to merge 1 commit intosinger-io:masterfrom
amyfain:reduce-bookmark-rewind

Conversation

@amyfain
Copy link
Contributor

@amyfain amyfain commented Nov 20, 2018

This reduces the bookmark rewind. When it is set at 24 hours and we run the tap in 1 hour intervals, we see the same data pass to us 24 times as it's set right now.

@amyfain amyfain mentioned this pull request Nov 20, 2018

date_obj = datetime.datetime.strptime(raw, DATE_FORMAT)
date_obj = date_obj - datetime.timedelta(days=1)
date_obj = date_obj - datetime.timedelta(hours=6)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reluctant to make this "default" change. If we need to do this, I think the best course of action is to make this a sort of config change where users can pass in their desired time and the tap adjusts accordingly.

@luandy64
Copy link

@cmcarthur Do you know why this was set to 1 day?

@luandy64
Copy link

@amyfain Would it be possible to lower the replication frequency- like run the tap on 3 hour intervals- to combat the duplicate data issue?

@amyfain
Copy link
Contributor Author

amyfain commented Nov 20, 2018

@luandy64 while that's possible, we have our replication frequency set to 1 hour because we want the data as quickly as possible. I'm not sure how to make this a config change, but I think that's a great idea!

@luandy64
Copy link

luandy64 commented Nov 20, 2018

@amyfain We can iterate on this idea- and I invite you to check out the Singer Slack if you aren't there already- but I think it could go like:

  1. Change get_last_record_value_for_table() to have a third argument that defaults to None
  2. In get_last_record_value_for_table() check if the config is present.
  3. If so, pass the value into the datetime.timedelta()
  4. If not, use date_obj = date_obj - datetime.timedelta(days=1)

There might be a better way of doing this, but I'll sleep on that 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants