Commit

fititnt/hxltm-action#5: data-normalization #1
fititnt committed Nov 11, 2021
1 parent f813581 commit 8ae6918
Showing 5 changed files with 61 additions and 7 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -8,4 +8,7 @@ data/original/terminologies.zip
data/original/tico19-testset.zip
!.gitignore
!README.md
tmp/
tmp/

# temp
data/original/terminology/facebook/*.csv
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,10 @@
# Changelog

## [Unreleased]
### Added
- TODO

## [0.9.0] - 2020-11-11
### Added
- **Fiat lux!**
- Draft of scripts to download data from TICO-19 original sources
8 changes: 5 additions & 3 deletions README.md
@@ -1,8 +1,10 @@
# tico-19-hxltm
**[draft] Public domain datasets from Translation Initiative for COVID-19
on the format HXLTM (Multilingual Terminology in Humanitarian Language Exchange)**
**[draft] Public domain datasets from
[Translation Initiative for COVID-19](tico-19.github.io) on the format
HXLTM (Multilingual Terminology in Humanitarian Language Exchange).**

> TODO: move to @EticaAI organization
> TODO: move to [@EticaAI](https://github.com/EticaAI) organization and
publish on a subdomain.

## License

2 changes: 1 addition & 1 deletion scripts/data-original-download.sh
@@ -10,7 +10,7 @@
#
# OPTIONS: ---
#
# REQUIREMENTS: ---
# REQUIREMENTS: - git
# BUGS: ---
# NOTES: ---
# AUTHORS: Emerson Rocha <rocha[at]ieee.org>
43 changes: 41 additions & 2 deletions scripts/data-original-prepare.sh
@@ -9,7 +9,8 @@
#
# OPTIONS: ---
#
# REQUIREMENTS: ---
# REQUIREMENTS: - rename
# - rsync
# BUGS: ---
# NOTES: ---
# AUTHORS: Emerson Rocha <rocha[at]ieee.org>
@@ -22,14 +23,52 @@
# ==============================================================================
set -e

PWD_NOW=$(pwd)
TMP_DIR="tmp"
DATA_DIR="data"
DATA_ORIGINAL_DIR="data/original"
DATA_ORIGINAL_GIT_DIR="tmp/original-git"

set -x
rsync --archive --verbose "${DATA_ORIGINAL_GIT_DIR}/data/" "$DATA_ORIGINAL_DIR/"
set +x
# set +x


# cd "$DATA_ORIGINAL_DIR/terminologies"
# pwd

# Copy
find "$DATA_ORIGINAL_DIR/terminologies/" -name 'f_*' -type f -exec cp "{}" "$DATA_ORIGINAL_DIR/terminology/facebook" \;

# Rename
# find "$DATA_ORIGINAL_DIR/terminology/facebook/" -name 'f_*' -type f -exec ls "{}" \;

# rename 's/f_//' "$DATA_ORIGINAL_DIR/terminology/facebook/*.csv"

# find "$DATA_ORIGINAL_DIR/terminology/facebook/" -name 'f_*' -type f -exec rename 's/f_//_' "{}" \;


# echo 'oi'
# find f_* -type f | sed -n "s/f_//" | xargs print
# echo 'bye'
# echo 'oi2'
# find f_* -type f -exec sed -n "s/f_//" {} \;
# echo 'bye2'
# echo 'oi2'
# find ./ -type f -exec sed -i -e 's/f_//g' {} \;
# echo 'bye2'
# find f_* -type f -print0
# # ecfind f_* -type f -print0 | xargs --null -I{} mv {} {}_renamed
# echo 'bye3'



# find . -type f |
# sed -n "s/\(.*\)factory\.py$/& \1service\.py/p" |
# xargs -p -n 2 mv

# for


# if [ ! -d "${DATA_ORIGINAL_DIR}/terminology" ]; then
# mkdir "${DATA_ORIGINAL_DIR}/terminology"
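
Note on the copy and rename steps in scripts/data-original-prepare.sh: the commit copies every `f_*` file into `terminology/facebook` and leaves several commented-out attempts at stripping the `f_` prefix afterwards. One possible way to do both in a single pass is sketched below. This is only a sketch, not the author's method: the function name `copy_without_prefix` and its two arguments are hypothetical, and it assumes the prefix should be removed from the file name only. It also creates the destination directory first, since `cp` does not create a missing target directory on its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit).
# Copies every regular file named f_* found under "$1" into the directory
# "$2", dropping the leading "f_" from each copied file name.
copy_without_prefix() {
    src="$1"
    dest="$2"

    # Make sure the destination exists before copying into it.
    mkdir -p "$dest"

    # find passes the matched paths in batches to the inline script;
    # the destination travels along as the first positional argument.
    find "$src" -name 'f_*' -type f -exec sh -c '
        dest="$1"
        shift
        for path in "$@"; do
            name=$(basename "$path")
            # ${name#f_} removes one leading "f_" from the file name.
            cp -- "$path" "$dest/${name#f_}"
        done
    ' sh "$dest" {} +
}
```

Inside the script this could be called as `copy_without_prefix "$DATA_ORIGINAL_DIR/terminologies" "$DATA_ORIGINAL_DIR/terminology/facebook"`. The source files are left untouched, so rerunning the script simply overwrites the copies.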
