Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache the md5 sum in Style/Supervisor.php and subclasses to make getHashCode cheaper #4426

Closed
wants to merge 1 commit into from

Conversation

mikkokoo
Copy link
Contributor

This pull request introduces a way to cache the md5 sum in Style/Supervisor.php and its subclasses to try and make getHashCode() less expensive to call.

This also introduces the necessity to call updateHashBeforeUse() on those classes when there is a change that affects the hash code to make the hash code update when needed.

Whether this is an acceptable trade-off will be decided by the maintainers.

I assume this is tested by the existing unit tests.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

The getHashCode function in PhpOffice\PhpSpreadsheet\Style\Supervisor and its subclasses is expensive. Conditional formatting causes new Styles to be created more than in other cases. Cacheing the md5sum of those classes and marking it to be updated upon a change makes the getHashCode function way less expensive.

Below are comparisons of execution times without and with this pull request. (the ep.xlsx contains a lot of conditional formatting (it also exits with a FormulaError)

The gain is ~15% for the conditional formatting- heavy xlsx->html and ~7.5% when all the unit tests are run.

git checkout HEAD^
time php test.php -f ../phpspreadsheet-test/stuff/ep.xlsx -p -c -a -t -o e.html

real	1m38,819s
user	1m38,624s
sys	0m0,077s

git checkout master
time php test.php -f ../phpspreadsheet-test/stuff/ep.xlsx -p -c -a -t -o e.html

real	1m23,607s
user	1m23,539s
sys	0m0,064s

git checkout HEAD^
time ./vendor/bin/phpunit --color=always

real	1m4,151s
user	0m55,052s
sys	0m7,844s

git checkout master
time ./vendor/bin/phpunit --color=always

real	0m59,519s
user	0m50,573s
sys	0m7,937s

@oleibman
Copy link
Collaborator

It's an interesting idea, but it cannot be used in its present form. It is a breaking change, and users will have to rewrite all their code to accommodate it. That will not be acceptable.

If you can find a way to make it opt-in, so that users who want this can use it without affecting users who don't want it or are ignorant of it, I might consider it. No guarantee that I will implement it - I don't want to have to deal with problems where user opts in and forgets to update hash.

Or, perhaps you can add a "hash needs updating" property to each class and set it whenever a style property is changed. That is probably more difficult than it seems - if, for example, you change a font color, you would need to set "needs updating" for the color (easy), but also for the font (possibly difficult).

@mikkokoo
Copy link
Contributor Author

No worries. The gain went down after finalizing the conditional formatting (due to stopping at first match as it should) and this feels somewhat dangerous like you said.

@mikkokoo mikkokoo closed this Mar 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants