decode-html-entities

A Python utility script to detect and decode HTML entities in text-based files (e.g. .html, .txt).
It supports recursive directory traversal, selective decoding, backup creation, and dry-run preview.

✨ Features

  • Detects files containing HTML entities (named or numeric)
  • Converts HTML entities to UTF-8 characters
  • Optionally includes "safe" entities such as &nbsp;, &lt;, and &amp; in detection/decoding
  • Creates a timestamped backup folder next to the target directory
  • Supports dry-run mode to preview changes without modifying files
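To make the distinction between "safe" and other entities concrete, here is a minimal sketch of selective decoding using only the standard library. It is not taken from the script: the names SAFE_ENTITIES, ENTITY_RE, decode_entities, and has_entities are hypothetical, and the contents of the safe set are an assumption based on the examples named above.

```python
import html
import re

# Assumed "safe" set, based on the examples this README names.
# The script's actual list may differ.
SAFE_ENTITIES = {"&nbsp;", "&lt;", "&gt;", "&amp;"}

# Matches named (&eacute;), decimal (&#233;) and hexadecimal (&#xE9;) references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_entities(text, include_safe=False):
    """Decode HTML entities in text, leaving safe ones alone unless asked."""
    def replace(match):
        entity = match.group(0)
        if not include_safe and entity in SAFE_ENTITIES:
            return entity  # keep markup-significant entities as written
        return html.unescape(entity)  # unknown names come back unchanged
    return ENTITY_RE.sub(replace, text)


def has_entities(text, include_safe=False):
    """Report whether decoding would change the text."""
    return decode_entities(text, include_safe) != text


print(decode_entities("caf&eacute; &amp; t&#233;"))
print(decode_entities("caf&eacute; &amp; t&#233;", include_safe=True))
```

In this sketch, numeric spellings of the same characters (for example &#60;) are not in the assumed safe set and would be decoded.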

📦 Installation

Clone the repository and ensure you have Python 3 installed:

git clone https://github.com/ekomateas/decode-html-entities.git
cd decode-html-entities

No external dependencies are required — only Python’s standard library.

🚀 Usage

Run the script with the desired file extension and options:

Preview files that would be modified (excluding safe entities)

python decode_html_entities.py .html --dry-run -d ./site -r

Preview including safe entities

python decode_html_entities.py .html --dry-run --include-safe-entities -d ./site -r

Decode all entities and overwrite files (with backup)

python decode_html_entities.py .html -r --include-safe-entities -d ./site
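The difference between a dry run and a real run can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the function name process is hypothetical, files are assumed to be UTF-8, and for brevity every entity is decoded (the behaviour the commands above request with --include-safe-entities).

```python
import html
from pathlib import Path


def process(directory, extension, recursive=False, dry_run=True):
    """Return the files that contain entities; rewrite them unless dry_run."""
    root = Path(directory)
    pattern = f"*{extension}"
    # rglob descends into subdirectories, glob stays at the top level.
    paths = root.rglob(pattern) if recursive else root.glob(pattern)

    changed = []
    for path in sorted(paths):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        decoded = html.unescape(original)
        if decoded == original:
            continue  # nothing to decode in this file
        changed.append(path)
        if not dry_run:
            path.write_text(decoded, encoding="utf-8")
    return changed
```

With dry_run=True the function only reports which files would change, so it is safe to run before committing to an overwrite.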

⚙️ Options

  • extension — File extension to process (e.g. .html)
  • -d, --directory — Target directory (default: current directory)
  • -r, --recursive — Process subdirectories
  • --include-safe-entities — Also decode safe HTML entities like &nbsp;, &lt;, etc.
  • --dry-run — Preview changes without modifying files
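The options above map naturally onto an argparse parser. The sketch below shows one way they could be declared. The flag names come from this README, but the function name build_parser, the help strings, and the "." default are assumptions rather than the script's actual code.

```python
import argparse


def build_parser():
    """Build a parser for the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Detect and decode HTML entities in text-based files."
    )
    parser.add_argument("extension", help="file extension to process, e.g. .html")
    parser.add_argument("-d", "--directory", default=".",
                        help="target directory (assumed default: current directory)")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="process subdirectories")
    parser.add_argument("--include-safe-entities", action="store_true",
                        help="also decode safe entities such as &nbsp; and &lt;")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview changes without modifying files")
    return parser


# Parse the first command from the Usage section.
args = build_parser().parse_args([".html", "--dry-run", "-d", "./site", "-r"])
print(args)
```

argparse converts the dashes in long option names to underscores, so the flags are read back as args.include_safe_entities and args.dry_run.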

🛡️ Backup

When not in dry-run mode, the script creates a timestamped backup folder next to the target directory so that the original files can be recovered.
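A backup of this kind can be sketched with shutil.copytree. The folder naming scheme, the timestamp format, and the function name backup_directory are all assumptions made for illustration, and the script may name or structure its backups differently.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(target):
    """Copy target into a timestamped sibling folder and return its path."""
    target = Path(target).resolve()
    # Hypothetical naming scheme: <name>_backup_<YYYYmmdd_HHMMSS>
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = target.parent / f"{target.name}_backup_{stamp}"
    # copytree raises FileExistsError if the destination already exists,
    # so an earlier backup is never overwritten silently.
    shutil.copytree(target, backup)
    return backup
```

Placing the copy beside the target rather than inside it keeps a later recursive run from picking up the backed-up files.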

👨‍💻 Author

Developed by Euthymios Komateas with Copilot assistance.

📄 License

MIT License — feel free to use, modify, and distribute.
