it('should preserve ordered list numbering with long items that get split', () => {
  const splitter = chunkdown({
    chunkSize: 200,
    maxOverflowRatio: 1.5,
  });
  const text = `1. **First item with very long content.** This item contains substantial text that will exceed the chunk size limit and force the splitter to break it into multiple chunks, which can cause numbering issues if not handled correctly.

2. **Second item with moderate content.** This item has enough content to potentially cause issues but should fit in a single chunk.

3. **Third item with short content.**

4. **Fourth item with extremely long content that will definitely be split.** This is a very detailed item that contains multiple sentences with comprehensive explanations and examples. It includes technical details, step-by-step instructions, and various formatting elements that make it substantially longer than the configured chunk size, ensuring it will be split across multiple chunks during processing.

5. **Fifth item with another very long section.** Similar to item 4, this contains extensive content that will cause the text splitter to break it into multiple chunks, testing whether the ordered list numbering is preserved correctly across these splits.

6. **Sixth item with normal content.**

7. **Seventh item with more long content.** This item also has substantial text that will likely exceed the chunk size and test the numbering preservation functionality in various scenarios.

8. **Eighth item is short.**

9. **Ninth and final item.**`;

  const chunks = splitter.splitText(text);

  // Extract all list item numbers from all chunks (not just those that start chunks)
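The comment above describes collecting every ordered-list number across all chunks, not only the ones that open a chunk. The hunk ends before showing how the test does this, so the following is only a minimal sketch of one possible approach, assuming each chunk is a plain Markdown string. `extractListNumbers` and `coversRange` are hypothetical helpers; they are not part of chunkdown or of this PR.

```typescript
// Hypothetical helper: collect the numeric markers of ordered-list items
// ("1.", "2.", ...) found at the start of any line in any chunk.
function extractListNumbers(chunks: string[]): number[] {
  const numbers: number[] = [];
  for (const chunk of chunks) {
    // `m` lets ^ match at every line start; `g` lets exec() walk all matches.
    const marker = /^[ \t]*(\d+)\.[ \t]/gm;
    let match: RegExpExecArray | null;
    while ((match = marker.exec(chunk)) !== null) {
      numbers.push(Number(match[1]));
    }
  }
  return numbers;
}

// Hypothetical check: every number from 1 to `last` appears at least once.
function coversRange(numbers: number[], last: number): boolean {
  for (let n = 1; n <= last; n++) {
    if (numbers.indexOf(n) === -1) return false;
  }
  return true;
}
```

With helpers like these, a test could assert that `coversRange(extractListNumbers(chunks), 9)` holds for the nine-item list above. Whether a split item repeats its number in continuation chunks is not visible in this hunk, so the sketch deliberately does not assert on duplicates.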
it('should not merge if combined size exceeds maxAllowedSize', () => {
  const splitter = chunkdown({
    chunkSize: 200,
    maxOverflowRatio: 1.2, // Only 240 chars allowed
  });

  const text = `## Main Section

This is a longer main section with substantial introductory content that explains what this section is about in great detail with many words and explanations.

### Child Section 1

This child section also has substantial content that would make the combined size exceed the maximum allowed size when merged with the parent section.
it('should merge sibling sections when parent is too large to merge', () => {
  const splitter = chunkdown({
    chunkSize: 300,
    maxOverflowRatio: 1.5, // 450 chars allowed
  });

  const text = `# Large Parent Section

This is a large parent section with substantial content that takes up significant space. It contains multiple sentences with detailed explanations and examples. This content is designed to be large enough that it cannot merge with its child sections due to size constraints. The parent section alone should be close to or exceed the base chunk size to prevent parent-child merging but allow sibling merging of the children.

## First Child Section

Short content for first child.

## Second Child Section

Short content for second child.

## Third Child Section

Short content for third child.`;

  const chunks = splitter.splitText(text);

  // Should create 2 chunks: large parent separate, siblings merged
  expect(chunks.length).toBe(2);

  // First chunk should be the large parent alone
  expect(chunks[0]).toContain('# Large Parent Section');
  expect(chunks[0]).not.toContain('## First Child Section');

  // Second chunk should contain merged siblings
  expect(chunks[1]).toContain('## First Child Section');
  expect(chunks[1]).toContain('## Second Child Section');
  expect(chunks[1]).toContain('## Third Child Section');
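The inline comments in these tests (`// Only 240 chars allowed`, `// 450 chars allowed`) suggest that the merge budget is the product of `chunkSize` and `maxOverflowRatio`. A minimal sketch of that arithmetic follows. `maxAllowedSize` and `canMerge` are hypothetical helpers written for illustration, and measuring size as raw string length is an assumption; chunkdown may count size differently.

```typescript
// Hypothetical helper: the largest size a merged chunk may reach,
// assuming the budget is chunkSize * maxOverflowRatio.
// Math.round guards against floating-point noise (the rounding is an assumption).
function maxAllowedSize(chunkSize: number, maxOverflowRatio: number): number {
  return Math.round(chunkSize * maxOverflowRatio);
}

// Hypothetical helper: would two pieces of text fit in one chunk if merged?
// Size is taken as string length here, which is an assumption.
function canMerge(a: string, b: string, budget: number): boolean {
  return a.length + b.length <= budget;
}

console.log(maxAllowedSize(200, 1.2)); // budget for the 200 / 1.2 configuration
console.log(maxAllowedSize(300, 1.5)); // budget for the 300 / 1.5 configuration
```

Under this reading, a parent whose text alone is near the budget cannot absorb any child, while several short siblings can still share a chunk.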
it('should merge some siblings but not others based on size constraints', () => {
  const splitter = chunkdown({
    chunkSize: 150,
    maxOverflowRatio: 1.3, // 195 chars allowed
  });

  // Create scenario where parent can't merge with children,
  // and siblings have mixed sizes preventing complete merging
  const text = `# Parent Section

This is a parent section with substantial content that is designed to be large enough to prevent merging with any child sections. The parent section contains multiple detailed sentences with comprehensive explanations and examples that ensure its size exceeds the merge threshold when combined with any child section.

## Small Sibling A

Short content A.

## Small Sibling B

Short content B.

## Large Sibling Section

This is a much larger sibling section with substantial content that contains multiple sentences and detailed explanations that make it too large to merge with the small siblings.

## Small Sibling C

Short content C.`;

  const chunks = splitter.splitText(text);

  // Parent gets split due to size, siblings show selective merging behavior
  // Small siblings A+B merge together, large sibling separate, small sibling C separate
  expect(chunks.length).toBe(7);

  // Key behavior to test: Small siblings A+B merged, but sibling C separate
  // Find the chunk containing small siblings A and B (merged together)
  const siblingABChunk = chunks.find(
    (chunk) =>
      chunk.includes('## Small Sibling A') &&
      chunk.includes('## Small Sibling B'),
  );
  expect(siblingABChunk).toBeDefined();
  expect(siblingABChunk).not.toContain('## Large Sibling Section');
  expect(siblingABChunk).not.toContain('## Small Sibling C');

  // Large sibling should be in separate chunk(s)
  const largeSiblingChunks = chunks.filter((chunk) =>
    chunk.includes('## Large Sibling Section'),
  );
  // At least one chunk must carry the large sibling's heading
  expect(largeSiblingChunks.length).toBeGreaterThan(0);
  // Check each chunk as a string; toContain on the array would compare whole elements
  for (const chunk of largeSiblingChunks) {
    expect(chunk).not.toContain('## Small Sibling A');
    expect(chunk).not.toContain('## Small Sibling B');
    expect(chunk).not.toContain('## Small Sibling C');
  }

  // Small sibling C should be alone
  const siblingCChunk = chunks.find((chunk) =>
    chunk.includes('## Small Sibling C'),
  );
  expect(siblingCChunk).toBeDefined();
  expect(siblingCChunk).not.toContain('## Small Sibling A');
  expect(siblingCChunk).not.toContain('## Small Sibling B');
  expect(siblingCChunk).not.toContain('## Large Sibling Section');
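The grouping this test expects (A and B together, the large sibling apart, C apart) is what a simple greedy pass over adjacent sections would produce. The sketch below illustrates that idea only. It is not chunkdown's implementation, `Section` and `groupAdjacent` are hypothetical names, and the lengths are illustrative values rather than measurements of the test text.

```typescript
// Hypothetical section shape for this sketch only.
interface Section {
  title: string;
  length: number; // size of the section's text in characters (assumed unit)
}

// Greedily group adjacent sections while the combined length stays within `budget`.
// A section that exceeds the budget on its own still forms a group by itself.
function groupAdjacent(sections: Section[], budget: number): string[][] {
  const groups: string[][] = [];
  let current: string[] = [];
  let currentLength = 0;
  for (const section of sections) {
    // Close the open group if adding this section would overflow the budget.
    if (current.length > 0 && currentLength + section.length > budget) {
      groups.push(current);
      current = [];
      currentLength = 0;
    }
    current.push(section.title);
    currentLength += section.length;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

// Illustrative lengths only, with the 195-character budget from the test above.
const groups = groupAdjacent(
  [
    { title: 'Small Sibling A', length: 40 },
    { title: 'Small Sibling B', length: 40 },
    { title: 'Large Sibling Section', length: 180 },
    { title: 'Small Sibling C', length: 40 },
  ],
  195,
);
console.log(groups);
```

Because the pass only ever merges neighbours, Small Sibling C cannot join A and B: the large sibling sits between them and closes the first group.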