A Python utility script to detect and decode HTML entities in text-based files (e.g. .html, .txt).
It supports recursive directory traversal, selective decoding, backup creation, and dry-run preview.
- Detects files containing HTML entities (named or numeric)
- Converts HTML entities to UTF-8 characters
- Optionally includes "safe" entities like `&nbsp;`, `&lt;`, `&amp;` in detection/decoding
- Creates a timestamped backup folder next to the target directory
- Supports dry-run mode to preview changes without modifying files
Clone the repository and ensure you have Python 3 installed:

    git clone https://github.com/ekomateas/decode-html-entities.git
    cd decode-html-entities

No external dependencies are required; the script uses only Python’s standard library.
Run the script with the desired file extension and options:

    python decode_html_entities.py .html --dry-run -d ./site -r
    python decode_html_entities.py .html --dry-run --include-safe-entities -d ./site -r
    python decode_html_entities.py .html -r --include-safe-entities -d ./site

- `extension`: File extension to process (e.g. `.html`)
- `-d, --directory`: Target directory (default: current)
- `-r, --recursive`: Process subdirectories
- `--include-safe-entities`: Also decode safe HTML entities like `&nbsp;`, `&lt;`, etc.
- `--dry-run`: Preview changes without modifying files
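
For readers who want to see how the documented options fit together, here is one way they could be declared with `argparse`. This is a sketch under assumptions: the function name `build_parser` and the help strings are illustrative, and the script's real parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(
        description="Detect and decode HTML entities in text-based files."
    )
    parser.add_argument("extension", help="file extension to process, e.g. .html")
    parser.add_argument("-d", "--directory", default=".",
                        help="target directory (default: current)")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="process subdirectories")
    parser.add_argument("--include-safe-entities", action="store_true",
                        help="also decode safe HTML entities")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview changes without modifying files")
    return parser


# Parse the first example command from the usage section.
args = build_parser().parse_args([".html", "--dry-run", "-d", "./site", "-r"])
```

Note that `argparse` exposes dashed option names with underscores, so the flags are read as `args.dry_run` and `args.include_safe_entities`.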
When not in dry-run mode, the script creates a timestamped backup folder next to the target directory, ensuring safe recovery of original files.
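
A backup step of this kind might look like the sketch below. The function name `create_backup`, the `now` parameter, and the `<name>_backup_<timestamp>` folder naming are assumptions made for illustration; the script's actual naming scheme is not documented here.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_backup(directory, now=None):
    """Copy the target directory to a timestamped sibling folder."""
    target = Path(directory).resolve()
    # Assumed naming scheme: <name>_backup_YYYYMMDD_HHMMSS
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    backup = target.parent / f"{target.name}_backup_{stamp}"
    # copytree raises FileExistsError if the backup folder already exists.
    shutil.copytree(target, backup)
    return backup
```

A caller would skip this function entirely when `--dry-run` is set, since no files are modified in that mode.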
Developed by Euthymios Komateas with Copilot assistance.
MIT License — feel free to use, modify, and distribute.