Commit 13d95a9
perf(TextAnalytics): allocation-free tokenize + single-pass termFreq
tokenize(): replaced the replace().split().filter() chain (3 intermediate
allocations per call: one string and two arrays) with a regex exec loop
that emits tokens directly. This is the hot path for SessionLinker (which
tokenizes every saved session), DriftDetector, SmartTitle, and WordCloud,
so the savings add up when processing dozens of sessions with thousands
of messages.
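
The tokenize() change described above can be sketched roughly as follows. This is only an illustration: the diff contents are not available here, so the token pattern, the minimum token length, and the function names `tokenizeChained` and `tokenizeExec` are all assumptions, not the code from this commit.

```javascript
// Hypothetical sketch; the real regex and filter rules in TextAnalytics are not shown.
const TOKEN_RE = /[a-z0-9]+/g; // assumed token pattern
const MIN_LEN = 2;             // assumed minimum token length

// Before (assumed shape): each stage builds an intermediate string or array.
function tokenizeChained(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // intermediate string
    .split(/\s+/)                 // intermediate array
    .filter((w) => w.length >= MIN_LEN);
}

// After (assumed shape): one exec loop pushes tokens straight into the result.
function tokenizeExec(text) {
  const out = [];
  const lower = text.toLowerCase();
  TOKEN_RE.lastIndex = 0; // a global regex keeps state between calls, so reset it
  let m;
  while ((m = TOKEN_RE.exec(lower)) !== null) {
    if (m[0].length >= MIN_LEN) out.push(m[0]);
  }
  return out;
}
```

In this sketch both versions return the same tokens for the same input. The exec loop still produces a match result per token, but it no longer builds the throwaway string and arrays of the chained form.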
termFreq(): merged the counting pass and the max-finding pass into a
single loop, which eliminates one full iteration over all TF map keys.
For large vocabularies (hundreds of unique terms per session) this halves
the object-key iteration overhead.

1 parent 82f15f4 commit 13d95a9
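
A minimal sketch of the single-pass termFreq() idea from the commit message, under stated assumptions: the real function's signature and return shape are not visible here, and the final normalization by the maximum count is a guess at why the maximum is needed at all.

```javascript
// Hypothetical sketch: count terms and track the maximum count in the same loop.
function termFreq(tokens) {
  const counts = Object.create(null); // prototype-free map, so keys like "constructor" are safe
  let max = 0;
  for (const t of tokens) {
    const c = (counts[t] || 0) + 1;
    counts[t] = c;
    if (c > max) max = c; // max is updated while counting; no separate scan over the keys
  }
  // Assumed normalization step: scale each count by the largest count.
  // With no tokens there are no keys, so this loop never divides by zero.
  for (const k in counts) counts[k] /= max;
  return counts;
}
```

Tracking the maximum inside the counting loop costs one comparison per token, and in exchange removes a full walk over the map's keys.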
1 file changed
Lines changed: 32 additions & 8 deletions
Diff: one hunk covering original lines 33112-33132 and new lines 33112-33156 (line contents not preserved).