
Commit d7e26a5

Update tutorial.md
1 parent 3b1762a commit d7e26a5

1 file changed (+5, -1 lines)

topics/statistics/tutorials/text_mining_chinese/tutorial.md

@@ -114,6 +114,7 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
 > - In *"Replacement"*:
 > - {% icon param-repeat %} *"Insert Replacement"*
 > - *"Find pattern"*: `\r`
+> - *"Additional sed commands before replacement"*: `:a;N;$!ba;`
 > - {% icon param-repeat %} *"Insert Replacement"*
 > - *"Find pattern"*: `\n`
 > - {% icon param-repeat %} *"Insert Replacement"*
@@ -126,7 +127,10 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
 > > Regular expressions can not only find particular words, as you might be familiar with from regular text editors.
 > > It is more powerful and can find particular patterns, for example, only capitalised words or all numbers.
 > > In this step, we mostly delete unnecessary placeholders.
-> > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later. We delete those by leaving the optional "Replace with" field blank.
+> > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later.
+> > We delete those by leaving the optional "Replace with" field blank.
+> > The additional sed commands before replacement `:a;N;$!ba;` catch all blank spaces with this tool.
+> > It is necessary only once to ensure that particular end-of-line characters are removed consistently.
 > > Similarly, `\n` marks linebreaks. We also delete those by leaving the optional "Replace with" field blank.
 > > The next expression we search for is `\s`. Those are spaces as you see them between words on your computer. We delete those.
 > > As a result, there are no gaps in our text anymore.
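The `:a;N;$!ba;` prefix added in this commit is a well-known sed idiom. `:a` sets a label, `N` appends a newline plus the next input line to the pattern space, and `$!ba` branches back to `a` unless the last line has been read. The net effect is that the whole input ends up in a single pattern space. In sed's default line-by-line mode, each line's trailing newline is stripped before any substitution runs, so a `\n` pattern has nothing to match; after slurping, it does. The Python sketch below is a simplified model of that behaviour, not the "Replace text" tool's implementation. The sample text and the helpers `per_line` and `slurp` are hypothetical, and the sketch assumes each "Find pattern" with an empty "Replace with" acts as a global delete over the text.

```python
import re

def per_line(lines, pattern):
    # Model of sed's default cycle: each line enters the pattern space without
    # its trailing newline, so a pattern such as \n has nothing to match.
    return [re.sub(pattern, "", line) for line in lines]

def slurp(lines):
    # Rough model of ':a;N;$!ba;' (assumes input already split on \n).
    if not lines:
        return ""
    pattern_space = lines[0]            # first line read by sed
    for nxt in lines[1:]:               # '$!ba' loops back until the last line
        pattern_space += "\n" + nxt     # 'N' appends a newline and the next line
    return pattern_space

# Hypothetical sample with Windows line endings (\r\n) and a stray space.
raw = "第一行\r\n第二 行\r\n第三行"
lines = raw.split("\n")                 # sed splits its input on \n only

# Line by line, deleting \n changes nothing: the line breaks survive.
line_result = "\n".join(per_line(lines, r"\n"))

# After slurping, the tutorial's three deletions remove every gap.
text = slurp(lines)
for pattern in (r"\r", r"\n", r"\s"):
    text = re.sub(pattern, "", text)
# text is now "第一行第二行第三行"
```

In this model, the final `\s` pass would already remove `\r` and `\n`, since Python's `\s` matches both. The separate passes simply mirror the order used in the tutorial.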
