Merge pull request #5832 from Sch-Da/main

Update tutorial.md
galaxyproject · Mar 11, 2025 · 6054cbe
2 parents 2a016a1 + cf00861
commit 6054cbe
Showing 1 changed file with 7 additions and 3 deletions.
diff --git a/topics/statistics/tutorials/text_mining_chinese/tutorial.md b/topics/statistics/tutorials/text_mining_chinese/tutorial.md
@@ -109,11 +109,12 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
 
 > <hands-on-title> Cleaning the Text with Regular Expressions </hands-on-title>
 >
-> 1. {% tool [Replace Text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy1) %} with the following parameters:
+> 1. {% tool [Replace Text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy0) %} with the following parameters:
 >    - {% icon param-file %} *"File to process"*: `output` (Input dataset)
 >    - In *"Replacement"*:
 >        - {% icon param-repeat %} *"Insert Replacement"*
 >            - *"Find pattern"*: `\r`
+>            - *"Additional sed commands before replacement"*: `:a;N;$!ba;`
 >        - {% icon param-repeat %} *"Insert Replacement"*
 >            - *"Find pattern"*: `\n`
 >        - {% icon param-repeat %} *"Insert Replacement"*
@@ -126,7 +127,10 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
 >    > Regular expressions can not only find particular words, as you might be familiar with from regular text editors.
 >    > It is more powerful and can find particular patterns, for example, only capitalised words or all numbers.
 >    > In this step, we mostly delete unnecessary placeholders.
->    > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later. We delete those by leaving the optional "Replace with" field blank.
+>    > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later.
+>    > We delete those by leaving the optional "Replace with" field blank.
+>    > The additional sed commands before replacement, `:a;N;$!ba;`, make the tool read the whole file at once instead of line by line, so that end-of-line characters can be matched.
+>    > They only need to be entered once to ensure that end-of-line characters are removed consistently.
 >    > Similarly, `\n` marks linebreaks. We also delete those by leaving the optional "Replace with" field blank.
 >    > The next expression we search for is `\s`. Those are spaces as you see them between words on your computer. We delete those.
 >    > As a result, there are no gaps in our text anymore.
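
Review note on the hunk above: the reason `:a;N;$!ba;` is needed may be easier to see in a small simulation. sed normally handles one line per cycle and strips the trailing newline before applying a substitution, so a `\n` pattern has nothing to match. The commands `:a;N;$!ba;` loop until every line has been appended to the pattern space, after which the newlines are visible to the substitution. The Python sketch below only imitates that behaviour under simplifying assumptions: the function names `sed_per_line` and `sed_whole_buffer` are hypothetical, the input is assumed to end with a newline, and this is not how the Galaxy tool is implemented.

```python
import re


def sed_per_line(text: str, pattern: str, replacement: str = "") -> str:
    """Imitate sed's default cycle: one line at a time, newline stripped."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    # Each line is substituted on its own, then printed with a newline again.
    return "".join(re.sub(pattern, replacement, line) + "\n" for line in lines)


def sed_whole_buffer(text: str, pattern: str, replacement: str = "") -> str:
    """Imitate the effect of ':a;N;$!ba;': all lines joined before substituting."""
    # Only the final newline is stripped; inner newlines stay in the buffer.
    body = text[:-1] if text.endswith("\n") else text
    return re.sub(pattern, replacement, body) + "\n"


sample = "天地\r\n玄黃\r\n"

# Line by line, the \n pattern never matches, so the text is unchanged.
unchanged = sed_per_line(sample, r"\n")

# With the whole file in one buffer, the inner linebreak is removed.
joined = sed_whole_buffer(sample, r"\n")
```

Here `unchanged` equals `sample`, while `joined` has lost the linebreak between the two lines and keeps only the final newline that is added back on output.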
@@ -364,7 +368,7 @@ The last step is to visualise the results within a word cloud. It shows, which c
 
 > <hands-on-title> Task description </hands-on-title>
 >
-> 1. {% tool [Generate a word cloud](toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy0) %} with the following parameters:
+> 1. {% tool [Generate a word cloud](toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy1) %} with the following parameters:
 >    - {% icon param-file %} *"Input file"*: `out_file1` (output of **Cut** {% icon tool %})
 >    - *"Do you want to select a special font?"*: `Select from a list of fonts`: `Noto Sans Traditional Chinese`
 >    - *"Smallest font size to use"*: `8`