Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converter minifier / optimizations overlap #37

Open
CosmicHorrorDev opened this issue Dec 1, 2024 · 3 comments · May be fixed by #38
Open

Converter minifier / optimizations overlap #37

CosmicHorrorDev opened this issue Dec 1, 2024 · 3 comments · May be fixed by #38
Assignees

Comments

@CosmicHorrorDev
Copy link
Collaborator

Currently some converter optimizations are always applied through a minifier on the parsed representation that discards duplicate styles from being applied

pub fn ansi_to_html(
mut input: &str,
ansi_regex: &Regex,
four_bit_var_prefix: Option<String>,
) -> Result<String, Error> {
let mut minifier = minifier::Minifier::new(four_bit_var_prefix);

While even more are applied through two regexes on the converted text where the first removes empty tags and the second removes the reapplication of the same tag

const OPT_REGEX_1: &str = r"<span \w+='[^']*'></span>|<b></b>|<i></i>|<u></u>|<s></s>";
const OPT_REGEX_2: &str = "</b><b>|</i><i>|</u><u>|</s><s>";


Three different things of note...

If we teach the minifier how to remove empty tags then it should apply all of the optimizations that the regexes apply (technically it could be even better since the regexes only apply once despite being able to optimize further after the first pass)

Well the fact that the minifier currently always applies some optimizations makes Converter::skip_optimize into only a half-truth. Really we should only apply the minifier when optimizations are enabled, but considering that no one has complained maybe it's a good sign to enable optimizations by default or always enable optimizations (removing the toggle)

If the minifer took the place of both of the optimization regexes then ansi-to-html would be down to just one regex usage (ANSI_REGEX)

const ANSI_REGEX: &str = r"\u{1b}(\[[0-9;?]*[A-HJKSTfhilmnsu]|\(B)";

which is normally my hint to re-evaluate if regex is carrying its weight as a dependency or if it can be replaced with some simpler parser instead of the (relatively heavy) regex crate

@Aloso
Copy link
Owner

Aloso commented Dec 1, 2024

Fully agree. It would be good if we can remove the regex dependency, and writing a parser for the ANSI_REGEX regex by hand should be fairly simple.

@Aloso
Copy link
Owner

Aloso commented Dec 1, 2024

Re. Converter::skip_optimize, I think we can just mark it as #[deprecated]

@CosmicHorrorDev CosmicHorrorDev self-assigned this Dec 1, 2024
@CosmicHorrorDev CosmicHorrorDev linked a pull request Dec 3, 2024 that will close this issue
@CosmicHorrorDev
Copy link
Collaborator Author

On second thought I think it'd be fine to keep the option to skip optimizations. It's not that hard to thread things through to have it working properly again, and it could prove to be a useful toggle for fuzzing to ensure that that the minified output is semantically equivalent to the full output

Applying optimizations without using regexes I'm still 100% on-board with. It'll just be a bit before I have spare time to work on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants