Commit

Merge pull request #5832 from Sch-Da/main
Update tutorial.md
shiltemann authored Mar 11, 2025
2 parents 2a016a1 + cf00861 commit 6054cbe
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions topics/statistics/tutorials/text_mining_chinese/tutorial.md
@@ -109,11 +109,12 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
> <hands-on-title> Cleaning the Text with Regular Expressions </hands-on-title>
>
- > 1. {% tool [Replace Text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.3+galaxy1) %} with the following parameters:
+ > 1. {% tool [Replace Text](toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy0) %} with the following parameters:
> - {% icon param-file %} *"File to process"*: `output` (Input dataset)
> - In *"Replacement"*:
> - {% icon param-repeat %} *"Insert Replacement"*
> - *"Find pattern"*: `\r`
+ > - *"Additional sed commands before replacement"*: `:a;N;$!ba;`
> - {% icon param-repeat %} *"Insert Replacement"*
> - *"Find pattern"*: `\n`
> - {% icon param-repeat %} *"Insert Replacement"*
@@ -126,7 +127,10 @@ We will use Regular Expressions in a tool called "Replace text". It contains fou
> > Regular expressions can not only find particular words, as you might be familiar with from regular text editors.
> > It is more powerful and can find particular patterns, for example, only capitalised words or all numbers.
> > In this step, we mostly delete unnecessary placeholders.
- > > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later. We delete those by leaving the optional "Replace with" field blank.
+ > > The first pattern we want to find is `\r`. It catches a specific form of invisible linebreaks that would create unwanted gaps in the comparison later.
+ > > We delete those by leaving the optional "Replace with" field blank.
+ > > The additional sed commands before replacement `:a;N;$!ba;` catch all blank spaces with this tool.
+ > > It is necessary only once to ensure that particular end-of-line characters are removed consistently.
> > Similarly, `\n` marks linebreaks. We also delete those by leaving the optional "Replace with" field blank.
> > The next expression we search for is `\s`. Those are spaces as you see them between words on your computer. We delete those.
> > As a result, there are no gaps in our text anymore.
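
Reviewer note on the added `:a;N;$!ba;` parameter: the snippet below is a rough Python analogy, not the Replace Text tool's actual implementation. It assumes sed's usual behaviour of handing each line to the pattern without its trailing newline, so a `\n` pattern has nothing to match unless all lines are first joined into one buffer, which is what the `:a;N;$!ba;` loop does. The function names and the sample string are hypothetical and serve only as illustration.

```python
import re


def replace_per_line(text: str, pattern: str, replacement: str = "") -> str:
    """Analogy for sed's default cycle: each line is matched without its newline."""
    lines = text.split("\n")
    return "\n".join(re.sub(pattern, replacement, line) for line in lines)


def replace_whole_buffer(text: str, pattern: str, replacement: str = "") -> str:
    """Analogy for the effect of `:a;N;$!ba;`: the whole text is one buffer."""
    return re.sub(pattern, replacement, text)


# Hypothetical sample with Windows-style line endings, a space, and a tab.
sample = "第一行 文本\r\n第二行\t文本\r\n"
patterns = (r"\r", r"\n", r"\s")  # same order as the tutorial's replacements

per_line = sample
for pat in patterns:
    per_line = replace_per_line(per_line, pat)

whole = sample
for pat in patterns:
    whole = replace_whole_buffer(whole, pat)

print(repr(per_line))  # line breaks survive the per-line pass
print(repr(whole))     # one continuous string without gaps
```

In this analogy the per-line pass still contains line breaks after all three replacements, while the whole-buffer pass yields a single gap-free string, which is the result the tutorial step aims for.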
@@ -364,7 +368,7 @@ The last step is to visualise the results within a word cloud. It shows, which c
> <hands-on-title> Task description </hands-on-title>
>
- > 1. {% tool [Generate a word cloud](toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy0) %} with the following parameters:
+ > 1. {% tool [Generate a word cloud](toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy1) %} with the following parameters:
> - {% icon param-file %} *"Input file"*: `out_file1` (output of **Cut** {% icon tool %})
> - *"Do you want to select a special font?": `Select from a list of fonts`: `Noto Sans Traditional Chinese`
> - *"Smallest font size to use"*: `8`