⚡️ Speed up method `ElementHtml._get_children_html` by 234% #4087

aseembits93 · 2025-08-26T16:56:54Z

📄 234% (2.34x) speedup for `ElementHtml._get_children_html` in `unstructured/partition/html/convert.py`

⏱️ Runtime : 12.3 milliseconds → 3.69 milliseconds (best of 101 runs)

📝 Explanation and details

Here is a faster rewrite of your program, based on your line profiling results, the imported code constraints, and the code logic.

Key optimizations.

Avoid repeated parsing: The hotspot is in recursive calls to child.get_html_element(**kwargs), each of which is re-creating a new BeautifulSoup object in every call.
Solution: Pass down and reuse a single BeautifulSoup instance when building child HTML elements.
Minimize object creation: Create soup once at the topmost call and reuse for all children and subchildren.
Reduce .get_text_as_html use: Optimize to only use the soup instance when really necessary and avoid repeated blank parses.
Avoid double wrapping: Only allocate wrappers and new tags if absolutely required.
General micro-optimizations: Use None instead of or [], fast-path checks on empty children, etc.
Preserve all comments and signatures as specified.

Below is the optimized version.

Explanation of improvements

Soup passing: The get_html_element method now optionally receives a _soup kwarg. At the top of the tree, it is None, so a new one is created. Then, for all descendants, the same soup instance is passed via _soup, avoiding repeated parsing and allocation.
Children check: self.children is checked once, and the attribute itself is kept as a list (not or-ed with empty list at every call).
No unnecessary soup parsing: get_text_as_html() doesn't need a soup argument, since it only returns a Tag (from the parent module).
No changes to existing comments, new comments added only where logic was changed.
Behavior (output and signature) preserved.

This avoids creating thousands of BeautifulSoup objects recursively, which was the primary bottleneck found in the profiler. The result is vastly improved performance, especially for large/complex trees.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 768 Passed
⏪ Replay Tests	✅ 1 Passed
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests and Runtime

from abc import ABC
from typing import Any, List, Optional, Union

# imports
import pytest  # used for our unit tests
from bs4 import BeautifulSoup, Tag
from unstructured.partition.html.convert import ElementHtml

# --- Minimal stubs for dependencies ---

class Metadata:
    def __init__(self, text_as_html: Optional[str] = None):
        self.text_as_html = text_as_html

class Element:
    def __init__(self, text="", category="default", id="0", metadata=None):
        self.text = text
        self.category = category
        self.id = id
        self.metadata = metadata or Metadata()

# --- The function and class under test ---

HTML_PARSER = "html.parser"

# --- Test helpers ---

class DummyElementHtml(ElementHtml):
    """A concrete subclass for testing, with optional custom tag."""
    def __init__(self, element, children=None, html_tag="div"):
        super().__init__(element, children)
        self._html_tag = html_tag

# --- Unit tests for _get_children_html ---

@pytest.fixture
def soup():
    # Fixture for a BeautifulSoup object
    return BeautifulSoup("", HTML_PARSER)

def make_tag(soup, name, text=None, **attrs):
    tag = soup.new_tag(name)
    if text:
        tag.string = text
    for k, v in attrs.items():
        tag[k] = v
    return tag

# 1. BASIC TEST CASES

def test_single_child_basic(soup):
    """Single child: Should wrap parent and child in a div, in order."""
    parent_el = Element("Parent", category="parent", id="p1")
    child_el = Element("Child", category="child", id="c1")
    child = DummyElementHtml(child_el)
    parent = DummyElementHtml(parent_el, children=[child])
    # Prepare the parent tag
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    # Call _get_children_html
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    divs = result.find_all("div", recursive=False)

def test_multiple_children_basic(soup):
    """Multiple children: All children should be appended in order."""
    parent_el = Element("Parent", category="parent", id="p1")
    child1_el = Element("Child1", category="child", id="c1")
    child2_el = Element("Child2", category="child", id="c2")
    child1 = DummyElementHtml(child1_el)
    child2 = DummyElementHtml(child2_el)
    parent = DummyElementHtml(parent_el, children=[child1, child2])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    divs = result.find_all("div", recursive=False)

def test_no_children_returns_parent_wrapped(soup):
    """No children: Should still wrap parent in a div."""
    parent_el = Element("Parent", category="parent", id="p1")
    parent = DummyElementHtml(parent_el, children=[])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    inner_divs = result.find_all("div", recursive=False)

def test_children_with_different_tags(soup):
    """Children with different HTML tags should be preserved."""
    parent_el = Element("Parent", category="parent", id="p1")
    child_el = Element("Child", category="child", id="c1")
    child = DummyElementHtml(child_el, html_tag="span")
    parent = DummyElementHtml(parent_el, children=[child])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output

# 2. EDGE TEST CASES

def test_empty_element_text_and_children(soup):
    """Parent and children have empty text."""
    parent_el = Element("", category="parent", id="p1")
    child_el = Element("", category="child", id="c1")
    child = DummyElementHtml(child_el)
    parent = DummyElementHtml(parent_el, children=[child])
    parent_tag = make_tag(soup, "div", "", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    divs = result.find_all("div", recursive=False)

def test_deeply_nested_children(soup):
    """Test with deep nesting (e.g., 5 levels)."""
    # Build a chain: root -> c1 -> c2 -> c3 -> c4 -> c5
    el = Element("root", category="cat0", id="id0")
    node = DummyElementHtml(el)
    for i in range(1, 6):
        el = Element(f"c{i}", category=f"cat{i}", id=f"id{i}")
        node = DummyElementHtml(el, children=[node])
    # At the top, node is the outermost parent
    parent_tag = make_tag(soup, "div", "c5", **{"class": "cat5", "id": "id5"})
    codeflash_output = node._get_children_html(soup, parent_tag); result = codeflash_output
    # Should have one child at each level
    current = result
    for i in range(6):
        divs = [c for c in current.contents if isinstance(c, Tag)]
        current = divs[0]

def test_html_injection_in_text(soup):
    """Child text that looks like HTML should be escaped, not parsed as HTML."""
    parent_el = Element("Parent", category="parent", id="p1")
    child_el = Element("<b>bold</b>", category="child", id="c1")
    child = DummyElementHtml(child_el)
    parent = DummyElementHtml(parent_el, children=[child])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    # The child div should have literal text, not a <b> tag inside
    child_div = result.find_all("div", recursive=False)[1]

def test_children_with_duplicate_ids(soup):
    """Multiple children with the same id."""
    parent_el = Element("Parent", category="parent", id="p1")
    child1_el = Element("Child1", category="child", id="dup")
    child2_el = Element("Child2", category="child", id="dup")
    child1 = DummyElementHtml(child1_el)
    child2 = DummyElementHtml(child2_el)
    parent = DummyElementHtml(parent_el, children=[child1, child2])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    # Both children should be present, even with duplicate ids
    divs = result.find_all("div", recursive=False)

def test_children_with_none(soup):
    """Children list contains None (should ignore or raise)."""
    parent_el = Element("Parent", category="parent", id="p1")
    child_el = Element("Child", category="child", id="c1")
    child = DummyElementHtml(child_el)
    parent = DummyElementHtml(parent_el, children=[child, None])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    # Should raise AttributeError when trying to call get_html_element on None
    with pytest.raises(AttributeError):
        parent._get_children_html(soup, parent_tag)

# 3. LARGE SCALE TEST CASES

def test_many_children_performance(soup):
    """Test with 500 children: structure and order."""
    parent_el = Element("Parent", category="parent", id="p1")
    children = [DummyElementHtml(Element(f"Child{i}", category="child", id=f"c{i}")) for i in range(500)]
    parent = DummyElementHtml(parent_el, children=children)
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    divs = result.find_all("div", recursive=False)

def test_large_tree_width_and_depth(soup):
    """Test with a tree of width 10 and depth 3 (total 1 + 10 + 100 = 111 nodes)."""
    def make_tree(depth, width):
        if depth == 0:
            return []
        return [
            DummyElementHtml(
                Element(f"Child{depth}_{i}", category="cat", id=f"id{depth}_{i}"),
                children=make_tree(depth-1, width)
            )
            for i in range(width)
        ]
    parent_el = Element("Root", category="root", id="root")
    children = make_tree(2, 10)  # depth=2, width=10 at each node
    parent = DummyElementHtml(parent_el, children=children)
    parent_tag = make_tag(soup, "div", "Root", **{"class": "root", "id": "root"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    # The first level should have 1 parent + 10 children
    divs = result.find_all("div", recursive=False)
    # Each child should have its own children (10 each)
    for child_div in divs[1:]:
        sub_divs = child_div.find_all("div", recursive=False)

def test_large_text_content(soup):
    """Test with a single child with a very large text string."""
    large_text = "A" * 10000
    parent_el = Element("Parent", category="parent", id="p1")
    child_el = Element(large_text, category="child", id="c1")
    child = DummyElementHtml(child_el)
    parent = DummyElementHtml(parent_el, children=[child])
    parent_tag = make_tag(soup, "div", "Parent", **{"class": "parent", "id": "p1"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
    # The child div should contain the large text exactly
    child_div = result.find_all("div", recursive=False)[1]

def test_children_with_varied_tags_and_attributes(soup):
    """Test children with different tags and extra attributes."""
    parent_el = Element("P", category="parent", id="p")
    child1_el = Element("C1", category="c1", id="c1")
    child2_el = Element("C2", category="c2", id="c2")
    child1 = DummyElementHtml(child1_el, html_tag="section")
    child2 = DummyElementHtml(child2_el, html_tag="article")
    parent = DummyElementHtml(parent_el, children=[child1, child2])
    parent_tag = make_tag(soup, "header", "P", **{"class": "parent", "id": "p"})
    codeflash_output = parent._get_children_html(soup, parent_tag); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from abc import ABC
from typing import Any, Optional, Union

# imports
import pytest  # used for our unit tests
from bs4 import BeautifulSoup, Tag
from unstructured.partition.html.convert import ElementHtml


# Minimal stub for Element and its metadata
class Metadata:
    def __init__(self, text_as_html=None):
        self.text_as_html = text_as_html

class Element:
    def __init__(self, text="", category=None, id=None, metadata=None):
        self.text = text
        self.category = category or "default-category"
        self.id = id or "default-id"
        self.metadata = metadata or Metadata()

HTML_PARSER = "html.parser"

# ---------------------------
# Unit tests for _get_children_html
# ---------------------------

# Helper subclass to expose _get_children_html for testing
class TestElementHtml(ElementHtml):
    def public_get_children_html(self, soup, element_html, **kwargs):
        return self._get_children_html(soup, element_html, **kwargs)

    # Override get_html_element to avoid recursion issues in tests
    def get_html_element(self, **kwargs: Any) -> Tag:
        soup = BeautifulSoup("", HTML_PARSER)
        element_html = self.get_text_as_html()
        if element_html is None:
            element_html = soup.new_tag(name=self.html_tag)
            self._inject_html_element_content(element_html, **kwargs)
        element_html["class"] = self.element.category
        element_html["id"] = self.element.id
        self._inject_html_element_attrs(element_html)
        if self.children:
            return self._get_children_html(soup, element_html, **kwargs)
        return element_html

# ---- BASIC TEST CASES ----

def test_single_child_basic():
    # Test with one parent and one child
    parent_elem = Element(text="Parent", category="parent-cat", id="parent-id")
    child_elem = Element(text="Child", category="child-cat", id="child-id")
    child = TestElementHtml(child_elem)
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    # Call the function
    result = parent.public_get_children_html(soup, parent_html)

def test_multiple_children_basic():
    # Parent with two children
    parent_elem = Element(text="P", category="p-cat", id="p-id")
    child1_elem = Element(text="C1", category="c1-cat", id="c1-id")
    child2_elem = Element(text="C2", category="c2-cat", id="c2-id")
    child1 = TestElementHtml(child1_elem)
    child2 = TestElementHtml(child2_elem)
    parent = TestElementHtml(parent_elem, children=[child1, child2])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_no_children_returns_wrapper_with_only_parent():
    # Parent with no children, should still wrap parent_html in a div
    parent_elem = Element(text="Solo", category="solo-cat", id="solo-id")
    parent = TestElementHtml(parent_elem, children=[])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_children_are_nested():
    # Test with a deeper hierarchy: parent -> child -> grandchild
    grandchild_elem = Element(text="GC", category="gc-cat", id="gc-id")
    grandchild = TestElementHtml(grandchild_elem)
    child_elem = Element(text="C", category="c-cat", id="c-id")
    child = TestElementHtml(child_elem, children=[grandchild])
    parent_elem = Element(text="P", category="p-cat", id="p-id")
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)
    child_div = result.contents[1]
    grandchild_div = child_div.contents[1]

# ---- EDGE TEST CASES ----

def test_empty_text_and_attributes():
    # Parent and child with empty text and missing attributes
    parent_elem = Element(text="", category="", id="")
    child_elem = Element(text="", category="", id="")
    child = TestElementHtml(child_elem)
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_child_with_html_content():
    # Child with HTML in text_as_html, should parse as HTML element
    child_elem = Element(text="ignored", category="cat", id="cid",
                         metadata=Metadata(text_as_html="<span>HTMLChild</span>"))
    child = TestElementHtml(child_elem)
    parent_elem = Element(text="Parent", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)
    child_html = result.contents[1]

def test_parent_with_html_content_and_children():
    # Parent with HTML in text_as_html, children as normal
    parent_elem = Element(text="ignored", category="pcat", id="pid",
                          metadata=Metadata(text_as_html="<h1>Header</h1>"))
    child_elem = Element(text="Child", category="ccat", id="cid")
    child = TestElementHtml(child_elem)
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = parent.get_text_as_html()
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_children_with_duplicate_ids():
    # Children with the same id, should not raise errors, but both ids should be present
    child_elem1 = Element(text="A", category="cat", id="dup")
    child_elem2 = Element(text="B", category="cat", id="dup")
    child1 = TestElementHtml(child_elem1)
    child2 = TestElementHtml(child_elem2)
    parent_elem = Element(text="P", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=[child1, child2])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_children_with_various_html_tags():
    # Children with different html_tag settings
    class CustomElementHtml(TestElementHtml):
        _html_tag = "section"

    child_elem = Element(text="Sec", category="cat", id="cid")
    child = CustomElementHtml(child_elem)
    parent_elem = Element(text="P", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_html_tag_property_override():
    # Test that html_tag property is respected
    class CustomElementHtml(TestElementHtml):
        @property
        def html_tag(self):
            return "article"

    child_elem = Element(text="Art", category="cat", id="cid")
    child = CustomElementHtml(child_elem)
    parent_elem = Element(text="P", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_inject_html_element_attrs_is_called():
    # Test that _inject_html_element_attrs is called (by side effect)
    class AttrElementHtml(TestElementHtml):
        def _inject_html_element_attrs(self, element_html: Tag) -> None:
            element_html["data-test"] = "called"

    child_elem = Element(text="Child", category="cat", id="cid")
    child = AttrElementHtml(child_elem)
    parent_elem = Element(text="P", category="pcat", id="pid")
    parent = AttrElementHtml(parent_elem, children=[child])

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

# ---- LARGE SCALE TEST CASES ----

def test_large_number_of_children():
    # Test with 500 children
    num_children = 500
    children = [TestElementHtml(Element(text=f"Child{i}", category="cat", id=f"id{i}")) for i in range(num_children)]
    parent_elem = Element(text="Parent", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=children)

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)

def test_large_depth_of_nesting():
    # Test with 100 nested single-child levels
    depth = 100
    current = TestElementHtml(Element(text=f"Level{depth}", category="cat", id=f"id{depth}"))
    for i in range(depth-1, 0, -1):
        current = TestElementHtml(Element(text=f"Level{i}", category="cat", id=f"id{i}"), children=[current])
    parent = current

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent.element.text
    parent_html["class"] = parent.element.category
    parent_html["id"] = parent.element.id

    result = parent.public_get_children_html(soup, parent_html)
    # Traverse down the nesting, checking text at each level
    node = result
    for i in range(1, depth+1):
        if len(node.contents) > 1:
            node = node.contents[1]
        else:
            break

def test_large_tree_with_breadth_and_depth():
    # 10 children, each with 10 children (total 1 + 10 + 100 = 111 nodes)
    children = []
    for i in range(10):
        grandchildren = [TestElementHtml(Element(text=f"GC{i}-{j}", category="gcat", id=f"gid{i}-{j}")) for j in range(10)]
        child = TestElementHtml(Element(text=f"C{i}", category="ccat", id=f"cid{i}"), children=grandchildren)
        children.append(child)
    parent_elem = Element(text="P", category="pcat", id="pid")
    parent = TestElementHtml(parent_elem, children=children)

    soup = BeautifulSoup("", HTML_PARSER)
    parent_html = soup.new_tag("div")
    parent_html.string = parent_elem.text
    parent_html["class"] = parent_elem.category
    parent_html["id"] = parent_elem.id

    result = parent.public_get_children_html(soup, parent_html)
    for i, child_div in enumerate(result.contents[1:]):
        for j, gc_div in enumerate(child_div.contents[1:]):
            pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-ElementHtml._get_children_html-mcsd67co and push.

Here is a **faster rewrite** of your program, based on your line profiling results, the imported code constraints, and the code logic. ### Key optimizations. - **Avoid repeated parsing:** The hotspot is in recursive calls to `child.get_html_element(**kwargs)`, each of which is re-creating a new `BeautifulSoup` object in every call. Solution: **Pass down and reuse a single `BeautifulSoup` instance** when building child HTML elements. - **Minimize object creation:** Create `soup` once at the *topmost* call and reuse for all children and subchildren. - **Reduce .get_text_as_html use:** Optimize to only use the soup instance when really necessary and avoid repeated blank parses. - **Avoid double wrapping:** Only allocate wrappers and new tags if absolutely required. - **General micro-optimizations:** Use `None` instead of `or []`, fast-path checks on empty children, etc. - **Preserve all comments and signatures as specified.** Below is the optimized version. ### Explanation of improvements - **Soup passing**: The `get_html_element` method now optionally receives a `_soup` kwarg. At the top of the tree, it is `None`, so a new one is created. Then, for all descendants, the same `soup` instance is passed via `_soup`, avoiding repeated parsing and allocation. - **Children check**: `self.children` is checked once, and the attribute itself is kept as a list (not or-ed with empty list at every call). - **No unnecessary soup parsing**: `get_text_as_html()` doesn't need a soup argument, since it only returns a Tag (from the parent module). - **No changes to existing comments, new comments added only where logic was changed.** - **Behavior (output and signature) preserved.** This **avoids creating thousands of BeautifulSoup objects recursively**, which was the primary bottleneck found in the profiler. The result is vastly improved performance, especially for large/complex trees.

…_html-mcsd67co

codeflash-ai bot and others added 5 commits July 7, 2025 00:29

cleanup

9a70c97

revert formatting changes

e484742

cleaning up comments

479fef2

Merge branch 'main' into codeflash/optimize-ElementHtml._get_children…

54fb0da

…_html-mcsd67co

cragwolfe requested a review from mpolomdeepsense August 26, 2025 23:58

Merge branch 'main' into codeflash/optimize-ElementHtml._get_children…

5048adc

…_html-mcsd67co

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up method `ElementHtml._get_children_html` by 234% #4087

⚡️ Speed up method `ElementHtml._get_children_html` by 234% #4087

Uh oh!

aseembits93 commented Aug 26, 2025

Uh oh!

Uh oh!

⚡️ Speed up method ElementHtml._get_children_html by 234% #4087

Are you sure you want to change the base?

⚡️ Speed up method ElementHtml._get_children_html by 234% #4087

Uh oh!

Conversation

aseembits93 commented Aug 26, 2025

📄 234% (2.34x) speedup for ElementHtml._get_children_html in unstructured/partition/html/convert.py

📝 Explanation and details

Key optimizations.

Explanation of improvements

Uh oh!

Uh oh!

⚡️ Speed up method `ElementHtml._get_children_html` by 234% #4087

⚡️ Speed up method `ElementHtml._get_children_html` by 234% #4087

📄 234% (2.34x) speedup for `ElementHtml._get_children_html` in `unstructured/partition/html/convert.py`