This repository was archived by the owner on May 4, 2021. It is now read-only.

Description
When baseline/filter_hunalign_bitext.py is run, e.g. like this:
nohup cat en-de.sent | ~/DataCollection/baseline/filter_hunalign_bitext.py - en-de.filtered --lang1 en --lang2 de -cld2 -deleted en-de.deleted 2> filter.log &
and the process runs out of memory, cleaning stops, filter.log stays empty, and no en-de.deleted file is written. The root cause is that the deleted segments are stored in memory.
For now, the only indicator of the failure is the missing .deleted file, and the workaround is to allocate more memory and re-run the cleaning.
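
One possible direction for a fix, given that the deleted segments are stored in memory, would be to write each rejected segment to the .deleted file as soon as it is rejected. The sketch below only illustrates that idea and is not taken from filter_hunalign_bitext.py: the function name `filter_pairs`, the `keep` predicate, and the tab-separated toy input are all assumptions made for the example.

```python
import io


def filter_pairs(lines, keep, kept_out, deleted_out=None):
    """Stream sentence pairs and write each one out immediately.

    `keep` is a hypothetical predicate standing in for the real filters
    (language identification etc.). Nothing is accumulated in a list, so
    memory use does not grow with the number of deleted segments.
    """
    n_kept = n_deleted = 0
    for line in lines:
        if keep(line):
            kept_out.write(line)
            n_kept += 1
        else:
            n_deleted += 1
            if deleted_out is not None:
                # Write the rejected pair right away and flush, so the
                # .deleted file has content even if the run dies later.
                deleted_out.write(line)
                deleted_out.flush()
    return n_kept, n_deleted


def demo():
    # Toy input; tab-separated pairs are an assumption for this example.
    pairs = ["Hello\tHallo\n", "\t\n", "Good morning\tGuten Morgen\n"]
    kept, deleted = io.StringIO(), io.StringIO()

    def both_sides_present(line):
        # Toy predicate: keep only pairs where both sides are non-empty.
        return all(part.strip() for part in line.split("\t"))

    counts = filter_pairs(pairs, both_sides_present, kept, deleted)
    return counts, kept.getvalue(), deleted.getvalue()
```

With this shape, a partially written .deleted file would also show how far the cleaning got before a failure, instead of the file being absent altogether.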