Skip to content

Conversation

@sanikolaev
Copy link
Collaborator

Related issue #4128

Changes in this PR prevent boolean optimizations from collapsing field-scoped phrases into an all-fields phrase. The fix keeps optimizations within the same field/zone spec and preserves the spec when new phrase nodes are created.

Changes by area

1) Add field/zone spec hashing helper

File: src/sphinxquery/transform_commonkeywords.cpp

What changed

  • Added a small helper HashSpec(const XQLimitSpec_t & tSpec) that hashes:
    • m_dFieldMask
    • m_iFieldMaxPos
    • m_bZoneSpan
    • m_dZones (if any)

Why

  • The common keyword/phrase transforms group and compare nodes using hashes of phrase content. Without including the field/zone spec, phrases scoped to different fields could be treated as equivalent, which led to cross-field collapsing.

Effect

  • Phrases with different field/zone constraints no longer collide in hash-based grouping.

2) Seed phrase and subphrase hashes with the spec

File: src/sphinxquery/transform_commonkeywords.cpp

What changed

  • sphHashPhrase() now starts its hash from HashSpec(pNode->m_dSpec) rather than 0.
  • sphHashSubphrases() now seeds each subphrase hash with HashSpec(pNode->m_dSpec).

Why

  • Ensures phrase equality checks are spec-aware, preventing different field scopes from being optimized together.

Effect

  • Boolean optimization for common keywords only applies when the field/zone spec matches.

3) Include spec in common-phrase head/tail bigram hashes

File: src/sphinxquery/transform_commonkeywords.cpp

What changed

  • The head/tail bigram hashes for common-phrase detection are now seeded with HashSpec(...) before hashing words.

Why

  • Common-phrase detection uses these bigram hashes to find shared heads/tails across phrases. Without spec in the hash, phrases from different fields could be grouped together.

Effect

  • Common-phrase factoring only happens among phrases with the same field/zone constraints.

4) Preserve field/zone spec when building new phrase nodes

File: src/sphinxquery/transform_commonkeywords.cpp

What changed

  • In MakeTransformCommonPhrase(), new nodes now use the original node’s XQLimitSpec_t instead of XQLimitSpec_t():
    • pCommonPhrase
    • pNewOr
    • pNewPhrase

Why

  • The previous default constructor yields “all fields,” which erased the original field scope during optimization and widened matches.

Effect

  • Optimized trees preserve the original field/zone scope, fixing the incorrect broadening.

Behavior impact

  • Queries like (@f "a b") | (@f2 "a b") will no longer match rows where the phrase appears only in unrelated fields.
  • Optimizations remain intact when phrases share the same field/zone spec.

Files touched

  • src/sphinxquery/transform_commonkeywords.cpp

@sanikolaev sanikolaev requested a review from klirichek January 15, 2026 08:29
- hash field/zone spec in phrase/subphrase grouping
- seed common-phrase bigram hashes with spec
- preserve original spec when creating new phrase nodes

Related issue #4128
That is actually out-of-proof test, covering already tested code and
touching only one case from suggested changes. Let's do it another way.
Internal ReconstructNode wasn't show anything about field limits and
zones/zonespans; however this info is critically useful in tests
Actual test cases now in google test suite
(easier to debug/change), and in ubertests (test 397) - kind of easier
to demonstrate what should happen

ref: #4128
Copy link
Contributor

@klirichek klirichek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note:
'explain query' by default (by design?) doesn't apply boolean optimization. So, results of 'explain' and 'select + show plan' are different. See output of test 397 (added cases).

@sanikolaev sanikolaev merged commit 27a827b into master Jan 28, 2026
115 checks passed
@sanikolaev sanikolaev deleted the issue-4128-or-2-fields branch January 28, 2026 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants