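Hi Team,

I found that the ik tokenizer outputs incorrect offsets for strings that have been processed by html_filter. BaseCharFilter, the base class of html_filter, holds two arrays, offsets and diffs: the offsets in the stripped text, and the deltas needed to correct them relative to the source string. The ik code (I am using ik 2012 FF hotfix1, from Google Code) does not handle these offsets and diffs, so the offsets it outputs are offsets into the processed string with the HTML tags removed, not into the source string.

I made a fix on my GitHub and tested it roughly; it seems to work. The main changes are in this commit: https://github.com/xpandan/ik-analyzer/commit/7cc797ca78399cdae4f31181970e85db28be4e5d

html_strip itself also has quite a few bugs, so you can test with the mapping filter instead; the principle is the same. Please review my code when you have time. I only started looking into Lucene temporarily for a project, so any advice is welcome.

Best, Dan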
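To illustrate the offsets/diffs idea described above, here is a minimal, self-contained sketch in plain Java. It is not the Lucene or ik source: the class `OffsetCorrectionSketch` and its methods `addCorrection` and `correct` are hypothetical names made up for this example, and the lookup is a simplified linear scan. It only models the principle that each recorded offset in the stripped text carries a cumulative delta that must be added to get back to the source string. In Lucene itself, a tokenizer would normally obtain corrected values by passing its offsets through `Tokenizer.correctOffset(int)`, which is presumably the step the ik code skips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an offset correction map.
// Not the actual BaseCharFilter implementation.
class OffsetCorrectionSketch {
    // Positions in the stripped text where the cumulative delta changes.
    private final List<Integer> offsets = new ArrayList<>();
    // Cumulative number of source characters removed up to that position.
    private final List<Integer> diffs = new ArrayList<>();

    // Record that from stripped offset `off` onward, source = stripped + cumulativeDiff.
    // Entries are assumed to be added in increasing order of `off`.
    void addCorrection(int off, int cumulativeDiff) {
        offsets.add(off);
        diffs.add(cumulativeDiff);
    }

    // Map an offset in the stripped text back to an offset in the source text.
    int correct(int currentOff) {
        int diff = 0;
        for (int i = 0; i < offsets.size(); i++) {
            if (offsets.get(i) > currentOff) {
                break; // later entries apply only to later positions
            }
            diff = diffs.get(i); // last entry at or before currentOff wins
        }
        return currentOff + diff;
    }

    public static void main(String[] args) {
        // Source:   "<b>中文</b>分词"  (11 chars)
        // Stripped: "中文分词"         (4 chars)
        OffsetCorrectionSketch map = new OffsetCorrectionSketch();
        map.addCorrection(0, 3); // "<b>" (3 chars) removed before stripped offset 0
        map.addCorrection(2, 7); // "</b>" (4 more chars) removed before stripped offset 2

        // The token "分词" spans [2, 4) in the stripped text.
        System.out.println("uncorrected start = 2, corrected start = " + map.correct(2));
        System.out.println("uncorrected end   = 4, corrected end   = " + map.correct(4));
    }
}
```

A tokenizer that reports the uncorrected values (2 and 4 here) is pointing into the stripped text, which matches the symptom described in this issue; highlighting against the source string would then mark the wrong characters.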
Original issue reported on code.google.com by [email protected] on 12 Sep 2014 at 10:34