Sometimes it's useful to run an EL pipeline with zero rows in order to create the target tables but not yet load data into them.
In the SDK for taps, we recently added --test=schema to emit only the SCHEMA messages without emitting any RECORD messages.
For non-SDK taps, it would be helpful to have a similar option that excludes all rows or all rows past the first n records.
Note:
- For safety, we should probably also not pass along any
STATE messages when running in this mode. Since we'd be dropping records intentionally, we would not want to pass a bookmark that implied records had been written which were not actually passed.
- To the extent that the tap is still having to process records, there's still a performance hit from having to read all the records from source. However, most pipelines are constrained on target performance, so by skipping the write process, there should still be significant gains for many/most use cases.
Sometimes it's useful to run an EL pipeline with zero rows in order to create the target tables but not yet load data into them.
In the SDK for taps, we recently added
--test=schemato emit only the SCHEMA messages without emitting any RECORD messages.For non-SDK taps, it would be helpful to have a similar option that excludes all rows or all rows past the first
nrecords.Note:
STATEmessages when running in this mode. Since we'd be dropping records intentionally, we would not want to pass a bookmark that implied records had been written which were not actually passed.