Releases: EticaAI/tico-19-hxltm

v1.5.0

12 Nov 17:42

Full Changelog: v1.0.0...v1.5.0

Added

Changed

Fixed

  • data/original/terminology/facebook/{.*-XX.csv -> .*.csv}: Removed the
    unknown language suffix _XX / -XX from the file names of the Facebook
    terminology. If the suffix is meant as "no specific region", it can simply
    be omitted when exchanging data.
    Affected files:
    • en_es-XX, en_fr-XX, en_ja-XX, en_nl-XX, en_no-XX, en_pt-XX,
      en_tl-XX
  • scripts/patch/data-terminology-facebook.diff for
    data/original/terminology/facebook/: As per
    RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files,
    manually applied DQUOTE " around the fields where quoting is not optional,
    that is, fields whose data contains , as text.
  • data/original/terminology/facebook/: in the data content of the field
    targetLang, replaced _ with -.
  • data/original/terminology/facebook/: in the data content of the field
    targetLang, removed the unknown language suffix _XX / -XX.
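
The fixes listed above for the targetLang field and for RFC 4180 quoting can be sketched in Python as follows. This is only an illustration, not the contents of scripts/patch/data-terminology-facebook.diff (which was applied manually): the function names and the sample values are hypothetical, and it assumes the XX placeholder only ever appears as the last subtag.

```python
import csv
import io
import re


def normalize_target_lang(tag: str) -> str:
    """Hypothetical helper: use - as the only delimiter, then drop a trailing XX."""
    tag = tag.replace("_", "-")
    # Assumption: XX is a placeholder only when it is the final subtag.
    return re.sub(r"-XX$", "", tag)


def to_csv_line(fields) -> str:
    """Hypothetical helper: serialize one record, quoting only where required."""
    buf = io.StringIO()
    # QUOTE_MINIMAL wraps a field in DQUOTE only when it contains the
    # delimiter, the quote character, or a line break.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerow(fields)
    return buf.getvalue()


if __name__ == "__main__":
    print(normalize_target_lang("pt_XX"))
    print(normalize_target_lang("pt-BR"))
    print(to_csv_line(["plain text", "text, with a comma"]), end="")
```

Letting a CSV writer decide which fields to quote avoids having to patch such fields by hand when the data is regenerated.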

v1.0.0

11 Nov 21:51

[1.0.0] - 2021-11-11

Added

  • Fiat lux!
  • Draft of scripts to download data from TICO-19 original sources
  • data/original/terminology/facebook: TICO-19 terminology from Facebook
    • Uses data from tico-19/tico-19.github.io/data/terminologies/f_*, with
      the following data normalizations, using f_en-pt_XX.csv to
      en_pt-XX.csv as an example:
      • Restrict - to being the delimiter inside language tags, as per
        IETF Best Current Practice 47
        and common usage in industry.
      • Use a single _ for other types of delimiter when necessary. There is
        no known industry convention for this decision.
        • In the case of language pairs in file names, this means unambiguously
          separating one language code from another.
      • Remove the prefix f_, since it is now inferred from the folder path.
  • data/original/terminology/google: TICO-19 terminology from Google
    • Uses data from tico-19/tico-19.github.io/data/terminologies/g_*, with
      the following data normalizations, using g_en_pt-BR.csv to
      en_pt-BR.csv as an example:
      • Remove the prefix g_, since it is now inferred from the folder path.
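
As a rough sketch of the file name normalizations listed above, the two documented examples can be reproduced in Python. The function names are hypothetical and this is not the project's download script. The Facebook rule assumes every upstream name follows the same pattern as f_en-pt_XX.csv, so that swapping - and _ after dropping the prefix is enough.

```python
def strip_prefix(name: str, prefix: str) -> str:
    """Drop `prefix` from `name` when present; otherwise return `name` unchanged."""
    return name[len(prefix):] if name.startswith(prefix) else name


def normalize_facebook_name(name: str) -> str:
    """Hypothetical rule: f_en-pt_XX.csv becomes en_pt-XX.csv."""
    stem = strip_prefix(name, "f_")
    # Upstream uses - between the two languages and _ inside the tag;
    # the normalized form uses the opposite, so the two characters are swapped.
    return stem.translate(str.maketrans("-_", "_-"))


def normalize_google_name(name: str) -> str:
    """Hypothetical rule: g_en_pt-BR.csv becomes en_pt-BR.csv (prefix only)."""
    return strip_prefix(name, "g_")


if __name__ == "__main__":
    print(normalize_facebook_name("f_en-pt_XX.csv"))
    print(normalize_google_name("g_en_pt-BR.csv"))
```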