|
38 | 38 | Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
|
39 | 39 | link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
|
40 | 40 | });
|
41 |
| - </script><div id=content class=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=split><a class=header href=#split>split</a></h1><p>The <code>split</code> command is useful to divide the input into smaller parts based on number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts to workaround online transfer size limits.<blockquote><p><img src=./images/info.svg alt=info> Since a lot of output files will be generated in this chapter (often with same filenames), remove these files after every illustration.</blockquote><h2 id=default-split><a class=header href=#default-split>Default split</a></h2><p>By default, the <code>split</code> command divides the input <code>1000</code> lines at a time. Newline character is the default line separator. You can pass a single file or <code>stdin</code> data as the input. Use <code>cat</code> if you need to concatenate multiple input sources.<p>By default, the output files will be named <code>xaa</code>, <code>xab</code>, <code>xac</code> and so on (where <code>x</code> is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not evenly divisible, the last file will contain less than <code>1000</code> lines.<pre><code class=language-bash># divide input 1000 lines at a time |
| 41 | + </script><div id=content class=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=split><a class=header href=#split>split</a></h1><p>The <code>split</code> command is useful to divide the input into smaller parts based on number of lines, bytes, file size, etc. You can also execute another command on the divided parts before saving the results. An example use case is sending a large file as multiple parts as a workaround for online transfer size limits.<blockquote><p><img src=./images/info.svg alt=info> Since a lot of output files will be generated in this chapter (often with same filenames), remove these files after every illustration.</blockquote><h2 id=default-split><a class=header href=#default-split>Default split</a></h2><p>By default, the <code>split</code> command divides the input <code>1000</code> lines at a time. Newline character is the default line separator. You can pass a single file or <code>stdin</code> data as the input. Use <code>cat</code> if you need to concatenate multiple input sources.<p>By default, the output files will be named <code>xaa</code>, <code>xab</code>, <code>xac</code> and so on (where <code>x</code> is the prefix). If the filenames are exhausted, two more letters will be appended and the pattern will continue as needed. If the number of input lines is not evenly divisible, the last file will contain less than <code>1000</code> lines.<pre><code class=language-bash># divide input 1000 lines at a time |
42 | 42 | $ seq 10000 | split
|
43 | 43 |
|
44 | 44 | # output filenames
|
|
155 | 155 | 8 10 57 total
|
156 | 156 | </code></pre><blockquote><p><img src=./images/warning.svg alt=warning> Since the division is based on file size, <code>stdin</code> data cannot be used.</blockquote><pre><code class=language-bash>$ seq 6 | split -n2
|
157 | 157 | split: -: cannot determine file size
|
158 |
| -</code></pre><p>By using <code>K/N</code> as the argument, you can view the <code>K</code>th chunk of <code>N</code> parts on <code>stdout</code>. No output file will be created in this scenario.<pre><code class=language-bash># divided the input into 2 parts |
| 158 | +</code></pre><p>By using <code>K/N</code> as the argument, you can view the <code>K</code>th chunk of <code>N</code> parts on <code>stdout</code>. No output file will be created in this scenario.<pre><code class=language-bash># divide the input into 2 parts |
159 | 159 | # view only the 1st chunk on stdout
|
160 | 160 | $ split -n1/2 greeting.txt
|
161 | 161 | Hi there
|
162 | 162 | Hav
|
163 | 163 | </code></pre><p>To avoid splitting a line, use <code>l/</code> as a prefix. Quoting from the <a href=https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html>manual</a>:<blockquote><p>For <code>l</code> mode, chunks are approximately <code>input size / N</code>. The input is partitioned into <code>N</code> equal sized portions, with the last assigned any excess. If a line starts within a partition it is written completely to the corresponding file. Since lines or records are not split even if they overlap a partition, the files written can be larger or smaller than the partition size, and even empty if a line/record is so long as to completely overlap the partition.</blockquote><pre><code class=language-bash># divide input into 2 parts, don't split lines
|
164 |
| - |
165 | 164 | $ split -nl/2 purchases.txt
|
| 165 | + |
166 | 166 | $ head x*
|
167 | 167 | ==> xaa <==
|
168 | 168 | coffee
|
|
231 | 231 |
|
232 | 232 | $ seq 100 | split -l1 -a1
|
233 | 233 | split: output file suffixes exhausted
|
| 234 | +$ ls x* |
| 235 | +xa xc xe xg xi xk xm xo xq xs xu xw xy |
| 236 | +xb xd xf xh xj xl xn xp xr xt xv xx xz |
234 | 237 | </code></pre><p>You can use the <code>-d</code> option to use numeric suffixes, starting from <code>00</code> (length depends on the <code>-a</code> setting). You can use the long option <code>--numeric-suffixes</code> to specify a different starting number.<pre><code class=language-bash>$ seq 10 | split -l1 -d
|
235 | 238 | $ ls x*
|
236 | 239 | x00 x01 x02 x03 x04 x05 x06 x07 x08 x09
|
|
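The behaviour shown in the hunks above (default 1000-line split, numeric suffixes with `-d`, and suffix exhaustion with `-a1`) can be checked with a minimal sketch like the one below. It assumes GNU coreutils `split` and `seq` are available; the scratch directory and the variable names are illustrative and not part of the commit.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (split, seq). Variable names are illustrative.
set -eu

# Work in a scratch directory so leftover x* files never mix between runs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Default split: 10000 input lines, 1000 lines per file, names start at xaa.
seq 10000 | split
default_count=$(ls x* | wc -l | tr -d ' ')
first_name=$(ls x* | head -n1)
last_file_lines=$(wc -l < "$(ls x* | tail -n1)" | tr -d ' ')
rm -f x*

# Numeric suffixes: one line per file, names x00 through x09.
seq 10 | split -l1 -d
numeric_names=$(ls x* | paste -s -d ' ' -)
rm -f x*

# Single-letter suffixes allow only 26 files; split then fails with a non-zero status.
if seq 100 | split -l1 -a1 2>/dev/null; then
    exhausted=no
else
    exhausted=yes
fi
single_count=$(ls x* | wc -l | tr -d ' ')
rm -f x*

echo "default split: $default_count files, first is $first_name, last has $last_file_lines lines"
echo "numeric suffixes: $numeric_names"
echo "suffixes exhausted: $exhausted after $single_count files"
```

Removing the `x*` files after each step mirrors the chapter's advice to clean up after every illustration, since later runs would otherwise pick up files from earlier ones.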