The updated pattern is:
pattern = "(?P(\s*\w*)(" + string.joinfields(salutation_opening_statements, "|") + ")+(\s\w*)[.,\xe2:]+\s)"
The original pattern I had in mind was:
pattern = "(\s*\w*)(?P(" + string.joinfields(salutation_opening_statements, "|") + ")+(\s\w*)(\s*\w*)(\s*\w*)(\s*\w*)(\s*\w*)[.,\xe2:]+\s*)"
But it doesn't recognize the earlier texts, so the first one is better.
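\s: one whitespace character (space, tab, newline)
\s*: zero or more whitespace characters
\w: one word character, [a-zA-Z0-9_]
\w*: zero or more word characters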
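A minimal sketch of how the first pattern might be tried out in current Python. The contents of salutation_opening_statements and the group name "salutation" are assumptions for illustration only (neither is given above), and "|".join(...) stands in for the older string.joinfields(..., "|").

```python
import re

# Hypothetical list: the real contents of salutation_opening_statements are not shown.
salutation_opening_statements = ["Dear", "Hello", "Hi"]

# "|".join(...) replaces string.joinfields(..., "|"); re.escape guards
# against any regex metacharacters in the list entries.
alternatives = "|".join(re.escape(s) for s in salutation_opening_statements)

# The group name "salutation" is assumed; the rest follows the first pattern.
pattern = re.compile(
    r"(?P<salutation>(\s*\w*)(" + alternatives + r")+(\s\w*)[.,\xe2:]+\s)"
)

def find_salutation(text):
    """Return the matched salutation, or None if the text has none."""
    m = pattern.search(text)
    return m.group("salutation") if m else None

print(repr(find_salutation("Dear John, thanks for the note.")))  # 'Dear John, '
print(find_salutation("No greeting here."))                      # None

# \w also matches digits and the underscore, not only letters.
print(bool(re.fullmatch(r"\w*", "abc_123")))                     # True
```

Note that (\s\w*) allows only one word after the opening statement, so a salutation such as "Dear Mr Smith, " would not match this version.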