Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]model/messages/api_types: alert_words mark up #1314

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Subhasish-Behera
Copy link
Contributor

@Subhasish-Behera Subhasish-Behera commented Mar 2, 2023

What does this PR do?

Partial fix for #588

Tested?

  • Manually
  • Existing tests (adapted, if necessary)
  • New tests added (for any new behavior)
  • Passed linting & tests (each commit)

Commit flow

first commit:Fetches alert words and registers for any alert_word event and updates the alerts words.
second commit: adds transform_content_alert_words which adds <span class=alert-word> around alert-words

Notes

-> This PR(specifically the second commit) is based on the alert word implementation of the Zulip web client which was later adapted by other clients.
link for the web implementation code: link
-> Other helpful documentations for contributors:
1. regex wrt python
2. re module
-> Whats Done?

  1. fetching and registering of alert_words
  2. finding the alert_word patterns and applying the

-> What needs to be done?

  1. Even though applying span logic is ready but the logic for applying markup for alert words inside of multiple elements is not ready. So basically the markup works for the alert-words without any other tags indie soup2markup. Before that we have to decide whether we want to apply different markup when multiple tags are present for a alert_word or we have to choose the markup from any one of them.

  2. Currently the markup applied for alert-words is same as code tag i.e msg_code as the markup for different zt_themes are yet to be discussed.

  3. Tests are not added as the non-test code is yet to reach a stable state.

@zulipbot zulipbot added the size: L [Automatic label added by zulipbot] label Mar 2, 2023
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Subhasish-Behera Thanks for looking at this other piece of the alert words puzzle 👍

I've focused on the first commit, since we need the alert words first in order to make progress, at least in the application itself.

It'd be good to split the first commit into two parts:

  • getting the initial data
  • updating the data

While forgetting to include the updating wouldn't be great (and we have that with other data right now), alert words aren't often updated, so you should be able to manually test a new first commit easily - and even the highlighting if you wanted. That first commit could be merged sooner on it's own once it looks good.

Please note my point about the alert words being 'private' to the model; while other parts of the code simply access inside the model, ideally we want to use a method for this (and migrate the others too). That would sit in the new first commit, and could form part of a test that the data from the server comes out via that function. For that, I'd suggest you check out the API link I gave, as well as the fixtures in tests/conftest.py.

The updating part does look very simple, and in a data sense it should be, based on the event data, but note that a change in the data should affect how messages are rendered. That is, not just ones that are constructed based on the message content and alert words at the time, but they need updating too. We could ignore that initially, but we shouldn't forget it :)

As a final quick point, if you've not, I'd suggest installing gitlint as per the README. It should complain about your long commit title ;)
That section should also give you pointers on the general format.
If you find you need to say too much, or include too many files, that can indicate the commit can be split to be simpler.

@neiljp neiljp added area: message rendering area: event handling How events from the server are responded to PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback labels Mar 3, 2023
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Subhasish-Behera Let's still address the first commit first, but I just took a look at the second commit as it stands.

I can see how this mirrors web-app code, but we should test it ourselves too, to ensure that it works how we expect in this translated form.

If we can demonstrate this manually and with tests, this should be a good v1 implementation of this feature.

Beyond that, if we have more tests, and can compare what the web app highlights (and tests), we should be able to determine whether we can do the same using this current approach. This will affect if we need to think differently to cover more cases.

We may want to consider an alternative way of formatting this kind of text, and maybe search text too, if it crosses tag/style boundaries.

I expect the web app applies tags dynamically since it uses CSS to style the text, and I'm guessing the spans can overlap? That's complex with urwid - we process the HTML into urwid-styled Text instead.

If so, we might end up considering processing the content into text only, finding matches, then mapping those into adjacent pieces in the HTML or urwid-processed text.

However, let's focus on testing this v1 first :)
(and the v0, ie. the data handling, before that!)

Comment on lines +726 to +728
self.alerted_words,
"has_alert_word" in self.message["flags"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new parameters could be combined into one. If the message doesn't have alert words flagged, then the function doesn't need to know/process them.

@@ -796,14 +806,58 @@ def update_message_author_status(self) -> bool:

return author_is_present

def transform_content_alert_words(content: str, alerted_list: List[str]) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is translated from a server implementation, it would be useful to add a comment to note that, and in the commit text too.

That could refer to the file as it is now, but also a commit ref, so that if code changes later then it is easier to track what has changed in the original code.

Having this function in a commit on it's own first, with detailed tests to cover various cases, would be good before we hook it into the processing in a subsequent commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be great. will do it.
But it's tricky to translate the code changes of js to python directly in this case because of the difference in libraries. In js, the lodash library uses _.escapeRegExp() but in python we have to use re.compile for creating objects and then we call the built-in functions like object.search() etc

@Subhasish-Behera Subhasish-Behera force-pushed the alert8021 branch 7 times, most recently from d79dfa1 to 54c72d4 Compare March 21, 2023 19:35
@zulipbot zulipbot added size: XL [Automatic label added by zulipbot] and removed size: L [Automatic label added by zulipbot] labels Mar 21, 2023
@Subhasish-Behera
Copy link
Contributor Author

@zulipbot add "PR needs review"

@zulipbot zulipbot added the PR needs review PR requires feedback to proceed label Mar 25, 2023
@neiljp neiljp removed the PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback label Mar 29, 2023
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Subhasish-Behera I focused on the first 3 commits, since I think that was the recent focus?

The current overall PR fails when I run it right now, at line 843 of messages.py, compiling a regex (multiple repeat at position 27).

Otherwise, improving your commit text would improve the speed of merging, and the first three seem close, if needing some adjustments.

):
model._handle_alert_words_event(event)

assert model._alert_words != initial_alerted_words
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is always true. But it is more context-dependent. Basically, currently, this test is designed to test the most minimal of changes in the active_conversation_info i.e change(start/stop) of a single user. The inital_alerted_words means just before that change and not at the start of the whole typing event series.
So it is true because of the values that are passed in mark.parametrize i.e it is an approach followed in the test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is true, then it is good to add a suffix to the test function name to indicate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name test__handle_alert_words_event_minimal_change would do the job.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Do we get an event if the changes aren't made?

I'd suggest a double underscore from the function name to any 'meaning'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe by changes you mean no typing in the writebox(of any client).
That would be a event with op=stop. So yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test__handle_alert_words_event_minimal_change sounds bit subjective,so i will be changing it to test__handle_alert_words_event__single_user. Though it's natural as a single event can't inform about more than 1 user in this case.

@neiljp neiljp added PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback and removed PR needs review PR requires feedback to proceed labels Apr 9, 2023
Comment on lines 340 to 342
def get_alert_words(self) -> List[str]:
return self._alert_words.copy()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't comment on this before, but this function placement in the file seems strange too - between two related functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix it asap

Copy link
Collaborator

@neiljp neiljp Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that strictly this could go into a separate commit with a test to eg. ensure that one can't modify the data externally, but this is fine.
(also to indicate it returns the expected data, as a minimal test)

Copy link
Member

@theViz343 theViz343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Subhasish-Behera Thanks for working on this 👍! This seems to work well. There are a few cases I found during testing where it seems to be a bit unpredictable.

  • \something. Prefixing with a backslash seems to not get picked up by the regex if the alert word is \something. If the alert word is just something, it gets picked up when it shouldn't (according to the behavior on the webapp).
  • wqdnasj\ Suffixing with a backslash when the alert word is also wqdnasj\ seems to crash ZT with the following error - re.error: missing ), unterminated subpattern at position 23.

Apart from this, I've left some suggestions for minor code changes in the comments.


clean: str

def replace_callback(match: Match[Union[str, str, str]]) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the typing of match, Union[str, str, str] results in just Union[str] so you could shorten it. Also, a match object's type is generally Union[Match[str], None] for the case that there is no match, which could be in our case because not every alert word is guaranteed to be present.

Copy link
Contributor Author

@Subhasish-Behera Subhasish-Behera Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed upto Match[Union[str]] . I agree that the Match object can be None when the pattern is not found but we know from the flag that an alert word is present. Re the case that not every alert word is present, the code iterates over every alert word as configured by the User and then puts span if it identifies that particular word in the content(so in every time one alert word is used to create the pattern). so None doesn't seem to be a possible value in our case.

Also, if I add None in typing of match , the errors shown is

 error: Item "None" of "Optional[Match[str]]" has no attribute "group"  [union-attr]
          Item "None" of "Optional[Match[str]]" has no attribute "start" 

Comment on lines 829 to 830
before = match.group(1)
word = match.group(2)
after = match.group(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: You could use match.group(1,2,3) here, which returns a tuple consisting of the three values.

Copy link
Contributor Author

@Subhasish-Behera Subhasish-Behera Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only use these 3 have is in here:

 if in_tag:
                return f"{before}{word}{after}"
 return f"{before}<span class='alert-word'>{word}</span>{after}"

I could have directly used match. group(n) but didn't. I think separating them into 3 variables gives better readability
By following this suggestion of match.group(1,2,3), I need it to be stored in some var and break it using the index to use it in the above code which will not clarify any further of what is being done.
So why group it when it can be accessed individually without creating any tuple

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I follow the above correctly, would it help if you used tuple unpacking?

one, two, three = function_returning_tuple_of three_values()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Multiple assignments in one line can do the same thing.

before,word,after = match.group(1,2,3)

Subhasish-Behera added 3 commits June 28, 2023 02:14
The alert words are stored in _alert_words and it's copy can
be accessed using get_alert_words.
…alert_words" added to event actions and corresponding handler is _handle_alert_words_event which updates _alert_words. Added AlertWordEvent in api_types.py Tests are added for _handle_alert_words_event.
stores alert_words using get_alert_words.Two new parameters
for tranform_content- alerted_list and alert_word_present.
if the second argument is true then a call is made to
transform_content_alert_words. There firstly all the alert_words
are converted into a string consisting of 3 parts clean,
befor_punctuation and after_puntuation.Secondly, these modified alerted words
are compiled to be searched in the content and added with the span
using replace_callback.
@neiljp
Copy link
Collaborator

neiljp commented Jul 4, 2023

@Subhasish-Behera Is this awaiting an update from you, or have you addressed the feedback?
Just checking based on my earlier feedback and that from Vishwesh :)
(If so, you can add a 👍 emoji or a comment to specific comments to indicate this)

@neiljp neiljp added the missing feature: user A missing feature for all users, present in another Zulip client label Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: event handling How events from the server are responded to area: message rendering missing feature: user A missing feature for all users, present in another Zulip client PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback size: XL [Automatic label added by zulipbot]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants