Optimize DOM traversal and CSS selector performance (48% faster for styled elements) #191

Copilot · 2025-10-21T20:38:22Z

Performance Improvements Plan

Identified Performance Issues

Optimize CSS selector traversals in Converter: Combined implicit_headings and font_sizes iterations into single pass through process_styled_elements method
Cache DOM searches: Memoized list item spans selector to avoid repeated major version checks
Optimize CSS selectors: Changed from .search() to .css() and improved selector specificity (e.g., td > p instead of td p)
Fixed rubocop configuration: Updated deprecated rubocop settings
Add tests and benchmarks: Created tests to validate optimizations and benchmark script to measure improvements
Document changes: Created comprehensive documentation of all improvements
Fix all rubocop offenses: Corrected code style issues in all modified files

Implementation Details

Converter Optimizations:

Combined styled element processing: The implicit_headings and font_sizes methods now share a single DOM traversal via process_styled_elements, reducing redundant iterations through all styled elements
Memoized list item spans: Created list_item_spans private method to cache the version-specific CSS selector, avoiding repeated calls to WordToMarkdown.soffice.major_version
Improved CSS selectors:
- Changed td p to td > p (direct child selector is more efficient)
- Changed li p to li > p (direct child selector)
- Changed table tr:first td to table tr:first-child > td (more specific and efficient)
- Changed .search() to .css() consistently for better performance

Testing & Validation:

Created test_word_to_markdown_performance.rb with tests to validate:
- Styled elements are processed only once and cached properly
- List item spans selector is memoized
- Empty styled elements are handled correctly
Created script/benchmark to measure actual performance gains:
- 48% performance improvement for styled elements processing (single pass vs two passes)
- Direct child selectors show comparable or better performance
- Demonstrates real-world performance gains on medium-sized documents
All existing tests pass - backward compatibility maintained
CodeQL security scan: 0 vulnerabilities found

Code Style Fixes:

Fixed bin/w2m: Removed redundant .to_s call
Fixed lib/word-to-markdown/converter.rb: Refactored process_styled_elements to reduce line count (used next guard clause)
Fixed script/benchmark:
- Extracted extract_font_size helper method to avoid long safe navigation chains
- Changed empty blocks to .to_a for selector benchmarks
- Added rubocop disable/enable comments for intentionally long benchmark code

Documentation:

Created PERFORMANCE_IMPROVEMENTS.md with:

Detailed before/after code comparisons
Benchmark results and analysis
Usage instructions
Future optimization opportunities

Configuration Updates:

Fixed .rubocop.yml to use plugins: instead of require: for rubocop extensions
Updated Metrics/LineLength to Layout/LineLength per rubocop deprecation
Auto-fixed style issues (redundant regexp argument, trailing whitespace)

Performance Impact

Measured improvements:

48% faster styled element processing (0.021s vs 0.041s in benchmark)
DOM traversal count reduced by 50% for styled elements
Version checks reduced from 3 to 1 for list item processing
More efficient CSS selectors for better query performance

Benefits most noticeable with:

Large documents with many styled elements
Documents with extensive list structures
Batch processing scenarios

Security

CodeQL scan completed: 0 vulnerabilities found
No security issues introduced by changes
All changes are performance optimizations without API modifications

Backward Compatibility

No breaking changes - all existing APIs work exactly as before
All existing tests pass
Changes are transparent to users
Code just runs faster!

Code Quality

✅ All Rubocop offenses in modified files corrected
✅ Benchmark script verified and working correctly
✅ Code follows project style guidelines
✅ Pre-existing offenses in word-to-markdown.gemspec not touched (outside scope of this PR)

Progress

Original prompt

Identify and suggest improvements to slow or inefficient code

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: benbalter <[email protected]>

benbalter · 2025-10-30T16:57:56Z

@copilot correct Rubocop offenses

Co-authored-by: benbalter <[email protected]>

Copilot · 2025-10-30T17:04:03Z

@copilot correct Rubocop offenses

Fixed all Rubocop offenses in modified files:

bin/w2m: Removed redundant .to_s call
lib/word-to-markdown/converter.rb: Refactored to reduce method length using guard clause
script/benchmark: Extracted helper method and fixed empty blocks

Commit: 95e9ae9

Note: The 8 remaining offenses in word-to-markdown.gemspec are pre-existing and were not introduced by this PR.

Initial plan

0b143b3

Copilot AI assigned Copilot and benbalter Oct 21, 2025

Copilot started work on behalf of benbalter October 21, 2025 20:38 View session

Copilot AI and others added 5 commits October 21, 2025 20:42

Initial analysis of performance bottlenecks

d95ad37

Co-authored-by: benbalter <[email protected]>

Add vendor/bundle to .gitignore

93ab6bb

Co-authored-by: benbalter <[email protected]>

Optimize performance in converter and document processing

37d24cc

Co-authored-by: benbalter <[email protected]>

Add performance tests and benchmark script

6178c59

Co-authored-by: benbalter <[email protected]>

Add comprehensive performance improvements documentation

2d28c71

Co-authored-by: benbalter <[email protected]>

Copilot AI changed the title ~~[WIP] Identify and suggest improvements for slow code~~ Optimize DOM traversal and CSS selector performance (48% faster for styled elements) Oct 21, 2025

Copilot AI requested a review from benbalter October 21, 2025 20:55

Copilot finished work on behalf of benbalter October 21, 2025 20:55

benbalter marked this pull request as ready for review October 21, 2025 20:58

Copilot started work on behalf of benbalter October 30, 2025 16:58 View session

Fix rubocop offenses in modified files

95e9ae9

Co-authored-by: benbalter <[email protected]>

Copilot finished work on behalf of benbalter October 30, 2025 17:04

benbalter approved these changes Nov 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Optimize DOM traversal and CSS selector performance (48% faster for styled elements) #191

Optimize DOM traversal and CSS selector performance (48% faster for styled elements) #191

Uh oh!

Copilot AI commented Oct 21, 2025 •

edited

Loading

Uh oh!

benbalter commented Oct 30, 2025

Uh oh!

Copilot AI commented Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Optimize DOM traversal and CSS selector performance (48% faster for styled elements) #191

Are you sure you want to change the base?

Optimize DOM traversal and CSS selector performance (48% faster for styled elements) #191

Uh oh!

Conversation

Copilot AI commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Improvements Plan

Identified Performance Issues

Implementation Details

Converter Optimizations:

Testing & Validation:

Code Style Fixes:

Documentation:

Configuration Updates:

Performance Impact

Security

Backward Compatibility

Code Quality

Progress

Uh oh!

benbalter commented Oct 30, 2025

Uh oh!

Copilot AI commented Oct 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Oct 21, 2025 •

edited

Loading