Commit 9f6b088

Attentions -> Attention
1 parent 8a37c92 commit 9f6b088

16 files changed: +169 -29 lines changed

doc/docs/index.md

-1
This file was deleted.

doc/docs/index.md

+141
@@ -0,0 +1,141 @@
1+
# SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
2+
3+
4+
- **Documents:** [http://bioinf.shenwei.me/seqkit](http://bioinf.shenwei.me/seqkit)
5+
([**Usage**](http://bioinf.shenwei.me/seqkit/usage/),
6+
[**FAQ**](http://bioinf.shenwei.me/seqkit/faq/),
7+
[**Tutorial**](http://bioinf.shenwei.me/seqkit/tutorial/),
8+
and
9+
[**Benchmark**](http://bioinf.shenwei.me/seqkit/benchmark/))
10+
- **Source code:** [https://github.com/shenwei356/seqkit](https://github.com/shenwei356/seqkit)
11+
[![GitHub stars](https://img.shields.io/github/stars/shenwei356/seqkit.svg?style=social&label=Star&?maxAge=2592000)](https://github.com/shenwei356/seqkit)
12+
[![license](https://img.shields.io/github/license/shenwei356/seqkit.svg?maxAge=2592000)](https://github.com/shenwei356/seqkit/blob/master/LICENSE)
13+
- **Latest version:** [![Latest Version](https://img.shields.io/github/release/shenwei356/seqkit.svg?style=flat?maxAge=86400)](https://github.com/shenwei356/seqkit/releases)
14+
[![Github Releases](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/total.svg?maxAge=3600)](http://bioinf.shenwei.me/seqkit/download/)
15+
[![Cross-platform](https://img.shields.io/badge/platform-any-ec2eb4.svg?style=flat)](http://bioinf.shenwei.me/seqkit/download/)
16+
[![Anaconda Cloud](https://anaconda.org/bioconda/seqkit/badges/version.svg)](https://anaconda.org/bioconda/seqkit)
17+
- **[Please cite](#citation):** [![doi](https://img.shields.io/badge/doi-10.1371%2Fjournal.pone.0163962-blue.svg?style=flat)](https://doi.org/10.1371/journal.pone.0163962)
18+
[![Citation Badge](https://api.juleskreuer.eu/citation-badge.php?doi=10.1371/journal.pone.0163962)](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=wHF3Lm8AAAAJ&citation_for_view=wHF3Lm8AAAAJ:zYLM7Y9cAGgC)
19+
- **Others**: [![check in Biotreasury](https://img.shields.io/badge/Biotreasury-collected-brightgreen)](https://biotreasury.rjmart.cn/#/tool?id=10081)
20+
21+
## Features
22+
23+
- **Easy to install** ([download](http://bioinf.shenwei.me/seqkit/download/))
24+
- Providing statically linked executable binaries for multiple platforms (Linux/Windows/macOS, amd64/arm64)
25+
- Lightweight and out-of-the-box, no dependencies, no compilation, no configuration
26+
- `conda install -c bioconda seqkit`
27+
- **Easy to use**
28+
- Ultrafast (see [technical-details](http://bioinf.shenwei.me/seqkit/usage/#technical-details-and-guides-for-use) and [benchmark](http://bioinf.shenwei.me/seqkit/benchmark))
29+
- Seamlessly parsing both FASTA and FASTQ formats
30+
- Supporting (`gzip`/`xz`/`zstd`/`bzip2` compressed) STDIN/STDOUT and input/output files, easily integrated into pipelines
31+
- Reproducible results (configurable random seed in `sample` and `shuffle`)
32+
- Supporting custom sequence ID via regular expression
33+
- Supporting [Bash/Zsh autocompletion](http://bioinf.shenwei.me/seqkit/download/#shell-completion)
34+
- **Versatile commands** ([usages and examples](http://bioinf.shenwei.me/seqkit/usage/))
35+
- Practical functions supported by [38 subcommands](#subcommands)
36+
37+
38+
## Installation
39+
40+
Go to [Download Page](http://bioinf.shenwei.me/seqkit/download) for more download options and changelogs, or
41+
install via conda:
42+
43+
conda install -c bioconda seqkit
44+
45+
## Subcommands
46+
47+
|Category |Command |Function |Input |Strand-sensitivity|Multi-threads|
48+
|:----------------|:-------------------------------------------------------------------|:--------------------------------------------------------------------------------------------|:--------------|:-----------------|:------------|
49+
|Basic operation |[seq](https://bioinf.shenwei.me/seqkit/usage/#seq) |Transform sequences: extract ID/seq, filter by length/quality, remove gaps… |FASTA/Q | | |
50+
| |[stats](https://bioinf.shenwei.me/seqkit/usage/#stats) |Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%… |FASTA/Q | ||
51+
| |[subseq](https://bioinf.shenwei.me/seqkit/usage/#subseq) |Get subsequences by region/gtf/bed, including flanking sequences |FASTA/Q |+ or/and - | |
52+
| |[sliding](https://bioinf.shenwei.me/seqkit/usage/#sliding) |Extract subsequences in sliding windows |FASTA/Q |+ only | |
53+
| |[faidx](https://bioinf.shenwei.me/seqkit/usage/#faidx) |Create the FASTA index file and extract subsequences (with more features than samtools faidx)|FASTA |+ or/and - | |
54+
|                 |[translate](https://bioinf.shenwei.me/seqkit/usage/#translate)      |Translate DNA/RNA to protein sequence                                                        |FASTA/Q        |+ or/and -        |             |
55+
| |[watch ](https://bioinf.shenwei.me/seqkit/usage/#watch ) |Monitoring and online histograms of sequence features |FASTA/Q | | |
56+
| |[scat ](https://bioinf.shenwei.me/seqkit/usage/#scat ) |Real time concatenation and streaming of fastx files |FASTA/Q | ||
57+
|Format conversion|[fq2fa](https://bioinf.shenwei.me/seqkit/usage/#fq2fa) |Convert FASTQ to FASTA format |FASTQ | | |
58+
| |[fx2tab](https://bioinf.shenwei.me/seqkit/usage/#fx2tab) |Convert FASTA/Q to tabular format |FASTA/Q | | |
59+
| |[fa2fq](https://bioinf.shenwei.me/seqkit/usage/#fa2fq) |Retrieve corresponding FASTQ records by a FASTA file |FASTA/Q |+ only | |
60+
| |[tab2fx](https://bioinf.shenwei.me/seqkit/usage/#tab2fx) |Convert tabular format to FASTA/Q format |TSV | | |
61+
| |[convert](https://bioinf.shenwei.me/seqkit/usage/#convert) |Convert FASTQ quality encoding between Sanger, Solexa and Illumina |FASTA/Q | | |
62+
|Searching |[grep](https://bioinf.shenwei.me/seqkit/usage/#grep) |Search sequences by ID/name/sequence/sequence motifs, mismatch allowed |FASTA/Q |+ and - |partly, -m |
63+
| |[locate](https://bioinf.shenwei.me/seqkit/usage/#locate) |Locate subsequences/motifs, mismatch allowed |FASTA/Q |+ and - |partly, -m |
64+
| |[amplicon](https://bioinf.shenwei.me/seqkit/usage/#amplicon) |Extract amplicon (or specific region around it), mismatch allowed |FASTA/Q |+ and - |partly, -m |
65+
| |[fish](https://bioinf.shenwei.me/seqkit/usage/#fish) |Look for short sequences in larger sequences |FASTA/Q |+ and - | |
66+
|Set operation |[sample](https://bioinf.shenwei.me/seqkit/usage/#sample) |Sample sequences by number or proportion |FASTA/Q | | |
67+
| |[rmdup](https://bioinf.shenwei.me/seqkit/usage/#rmdup) |Remove duplicated sequences by ID/name/sequence |FASTA/Q |+ and - | |
68+
| |[common](https://bioinf.shenwei.me/seqkit/usage/#common) |Find common sequences of multiple files by id/name/sequence |FASTA/Q |+ and - | |
69+
| |[duplicate](https://bioinf.shenwei.me/seqkit/usage/#duplicate) |Duplicate sequences N times |FASTA/Q | | |
70+
|                 |[split](https://bioinf.shenwei.me/seqkit/usage/#split)              |Split sequences into files by id/seq region/size/parts (mainly for FASTA)                    |FASTA preferred|                  |             |
71+
| |[split2](https://bioinf.shenwei.me/seqkit/usage/#split2) |Split sequences into files by size/parts (FASTA, PE/SE FASTQ) |FASTA/Q | | |
72+
| |[head](https://bioinf.shenwei.me/seqkit/usage/#head) |Print first N FASTA/Q records |FASTA/Q | | |
73+
| |[head-genome](https://bioinf.shenwei.me/seqkit/usage/#head-genome) |Print sequences of the first genome with common prefixes in name |FASTA/Q | | |
74+
| |[range](https://bioinf.shenwei.me/seqkit/usage/#range) |Print FASTA/Q records in a range (start:end) |FASTA/Q | | |
75+
| |[pair](https://bioinf.shenwei.me/seqkit/usage/#pair) |Patch up paired-end reads from two fastq files |FASTA/Q | | |
76+
|Edit |[replace](https://bioinf.shenwei.me/seqkit/usage/#replace) |Replace name/sequence by regular expression |FASTA/Q |+ only | |
77+
| |[rename](https://bioinf.shenwei.me/seqkit/usage/#rename) |Rename duplicated IDs |FASTA/Q | | |
78+
| |[concat](https://bioinf.shenwei.me/seqkit/usage/#concat) |Concatenate sequences with same ID from multiple files |FASTA/Q |+ only | |
79+
| |[restart](https://bioinf.shenwei.me/seqkit/usage/#restart) |Reset start position for circular genome |FASTA/Q |+ only | |
80+
| |[mutate](https://bioinf.shenwei.me/seqkit/usage/#mutate) |Edit sequence (point mutation, insertion, deletion) |FASTA/Q |+ only | |
81+
| |[sana](https://bioinf.shenwei.me/seqkit/usage/#sana) |Sanitize broken single line FASTQ files |FASTQ | | |
82+
|Ordering         |[sort](https://bioinf.shenwei.me/seqkit/usage/#sort)                |Sort sequences by id/name/sequence/length                                                    |FASTA preferred|                  |             |
83+
|                 |[shuffle](https://bioinf.shenwei.me/seqkit/usage/#shuffle)          |Shuffle sequences                                                                            |FASTA preferred|                  |             |
84+
|BAM processing |[bam](https://bioinf.shenwei.me/seqkit/usage/#bam) |Monitoring and online histograms of BAM record features |BAM | | |
85+
|Miscellaneous |[sum](https://bioinf.shenwei.me/seqkit/usage/#sum) |Compute message digest for all sequences in FASTA/Q files |FASTA/Q | ||
86+
|                 |[merge-slides](https://bioinf.shenwei.me/seqkit/usage/#merge-slides)|Merge sliding windows generated from seqkit sliding                                          |TSV            |                  |             |
87+
88+
Notes:
89+
90+
- Strand-sensitivity:
91+
- `+ only`: only processing on the positive/forward strand.
92+
- `+ and -`: searching on both strands.
93+
- `+ or/and -`: depends on users' flags/options/arguments.
94+
- Multi-threads: the default 4 threads are fast enough for most commands; some commands can benefit from extra threads.
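
The strand-sensitivity labels can be illustrated with a small sketch. It is not SeqKit's implementation: the helper names `revComp` and `findStrands` are hypothetical and the matching is a plain substring test. It only shows that a `+ and -` search also matches the reverse complement of the motif, while a `+ only` search does not.

```go
package main

import (
	"fmt"
	"strings"
)

// revComp returns the reverse complement of a DNA sequence.
// Bases other than A/C/G/T (upper or lower case) are kept unchanged.
// Hypothetical helper for illustration, not part of SeqKit.
func revComp(seq string) string {
	comp := map[byte]byte{
		'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
		'a': 't', 'c': 'g', 'g': 'c', 't': 'a',
	}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		b := seq[len(seq)-1-i]
		if c, ok := comp[b]; ok {
			b = c
		}
		out[i] = b
	}
	return string(out)
}

// findStrands reports the strands on which motif occurs in seq.
// bothStrands=false corresponds to "+ only", true to "+ and -".
// A hit on the negative strand is found by searching the forward
// sequence for the reverse complement of the motif.
func findStrands(seq, motif string, bothStrands bool) []string {
	var hits []string
	if strings.Contains(seq, motif) {
		hits = append(hits, "+")
	}
	if bothStrands && strings.Contains(seq, revComp(motif)) {
		hits = append(hits, "-")
	}
	return hits
}

func main() {
	seq := "AAACCCGT"
	fmt.Println(findStrands(seq, "ACGGG", false)) // forward strand only
	fmt.Println(findStrands(seq, "ACGGG", true))  // both strands
}
```

In this sketch the motif `ACGGG` is absent from the forward strand of `AAACCCGT`, but its reverse complement `CCCGT` is present, so only the both-strands search reports a hit.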
95+
96+
## Citation
97+
98+
**W Shen**, S Le, Y Li\*, F Hu\*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
99+
***PLOS ONE***. [doi:10.1371/journal.pone.0163962](https://doi.org/10.1371/journal.pone.0163962).
100+
<span class="__dimensions_badge_embed__" data-doi="10.1371/journal.pone.0163962" data-style="small_rectangle"></span>
101+
102+
## Contributors
103+
104+
- [Wei Shen](https://github.com/shenwei356)
105+
- [Botond Sipos](https://github.com/botond-sipos): `bam`, `scat`, `fish`, `sana`, `watch`.
106+
- [others](https://github.com/shenwei356/seqkit/graphs/contributors)
107+
108+
## Acknowledgements
109+
110+
We thank [Lei Zhang](https://github.com/jameslz) for testing SeqKit,
111+
and also thank [Jim Hester](https://github.com/jimhester/),
112+
author of [fasta_utilities](https://github.com/jimhester/fasta_utilities),
113+
for advice on early performance improvements of FASTA parsing
114+
and [Brian Bushnell](https://twitter.com/BBToolsBio),
115+
author of [BBMap](https://sourceforge.net/projects/bbmap/),
116+
for advice on naming SeqKit and adding accuracy evaluation in benchmarks.
117+
We also thank Nicholas C. Wu from the Scripps Research Institute,
118+
USA for commenting on the manuscript
119+
and [Guangchuang Yu](http://guangchuangyu.github.io/)
120+
from State Key Laboratory of Emerging Infectious Diseases,
121+
The University of Hong Kong, HK for advice on the manuscript.
122+
123+
We thank [Li Peng](https://github.com/penglbio) for reporting many bugs.
124+
125+
We appreciate [Klaus Post](https://github.com/klauspost) for his fantastic packages (
126+
[compress](https://github.com/klauspost/compress) and [pgzip](https://github.com/klauspost/pgzip)
127+
) which accelerate gzip file reading and writing.
128+
129+
## Contact
130+
131+
[Create an issue](https://github.com/shenwei356/seqkit/issues) to report bugs,
132+
propose new functions or ask for help.
133+
134+
## License
135+
136+
[MIT License](https://github.com/shenwei356/seqkit/blob/master/LICENSE)
137+
138+
## Starchart
139+
140+
<img src="https://starchart.cc/shenwei356/seqkit.svg" alt="Stargazers over time" style="max-width: 100%">
141+

doc/docs/usage.md

+14-14
@@ -468,7 +468,7 @@ Usage
468468
``` text
469469
get subsequences by region/gtf/bed, including flanking sequences.
470470
471-
Attentions:
471+
Attention:
472472
1. Use "seqkit grep" to extract subsets of sequences.
473473
"seqtk subseq seqs.fasta id.txt" is equivalent to
474474
"seqkit grep -f id.txt seqs.fasta"
@@ -696,7 +696,7 @@ Columns:
696696
17. AvgQual average quality
697697
18. GC(%) percentage of GC content
698698
699-
Attentions:
699+
Attention:
700700
1. Sequence length metrics (sum_len, min_len, avg_len, max_len, Q1, Q2, Q3)
701701
count the number of gaps or spaces. You can remove them with "seqkit seq -g":
702702
seqkit seq -g input.fasta | seqkit stats
@@ -826,7 +826,7 @@ Usage
826826
```text
827827
compute message digest for all sequences in FASTA/Q files
828828
829-
Attentions:
829+
Attention:
830830
1. Sequence headers and qualities are skipped, only sequences matter.
831831
2. The order of sequence records does not matter.
832832
3. Circular complete genomes are supported with the flag -c/--circular.
@@ -955,7 +955,7 @@ This command is similar with "samtools faidx" but has some extra features:
955955
3. if you have a large number of IDs, you can use:
956956
seqkit faidx seqs.fasta -l IDs.txt
957957
958-
Attentions:
958+
Attention:
959959
1. The flag -U/--update-faidx is recommended to ensure the .fai file matches the FASTA file.
960960
961961
The definition of region is 1-based and with some custom design.
@@ -1636,7 +1636,7 @@ Usage
16361636
``` text
16371637
search sequences by ID/name/sequence/sequence motifs, mismatch allowed
16381638
1639-
Attentions:
1639+
Attention:
16401640
16411641
0. By default, we match sequence ID with patterns, use "-n/--by-name"
16421642
for matching full name instead of just ID.
@@ -1824,7 +1824,7 @@ Usage
18241824
``` text
18251825
locate subsequences/motifs, mismatch allowed
18261826
1827-
Attentions:
1827+
Attention:
18281828
18291829
1. Motifs could be EITHER plain sequence containing "ACTGN" OR regular
18301830
expression like "A[TU]G(?:.{3})+?[TU](?:AG|AA|GA)" for ORFs.
@@ -2056,7 +2056,7 @@ Usage
20562056
``` text
20572057
extract amplicon (or specific region around it) via primer(s).
20582058
2059-
Attentions:
2059+
Attention:
20602060
1. Only one (the longest) matching location is returned for every primer pair.
20612061
2. Mismatch is allowed, but the mismatch location (5' or 3') is not controlled.
20622062
You can increase the value of "-j/--threads" to accelerate processing.
@@ -2305,7 +2305,7 @@ Usage
23052305
``` text
23062306
remove duplicated sequences by ID/name/sequence
23072307
2308-
Attentions:
2308+
Attention:
23092309
1. When comparing by sequences, both positive and negative strands are
23102310
compared. Switch on -P/--only-positive-strand for considering the
23112311
positive strand only.
@@ -2530,7 +2530,7 @@ If you want to cut a sequence into multiple segments.
25302530
E.g., cutting into segments of 40 bp and keeping the last segment which can be shorter than 40 bp.
25312531
seqkit sliding -g -s 40 -W 40 input.fasta -o out.fasta
25322532
2533-
Attentions:
2533+
Attention:
25342534
1. For the two-pass mode (-2/--two-pass), the flag -U/--update-faidx is recommended to
25352535
ensure the .fai file matches the FASTA file.
25362536
@@ -2791,7 +2791,7 @@ Usage
27912791
```text
27922792
match up paired-end reads from two fastq files
27932793
2794-
Attentions:
2794+
Attention:
27952795
1. The order of headers in the two files should be the same (not shuffled),
27962796
otherwise, it consumes a huge amount of memory for buffering reads.
27972797
2. Unpaired reads are optionally output with the flag -u/--save-unpaired.
@@ -3418,7 +3418,7 @@ Usage
34183418
``` text
34193419
concatenate sequences with same ID from multiple files
34203420
3421-
Attentions:
3421+
Attention:
34223422
1. By default, only sequences with IDs that appear in all files are output.
34233423
Use -f/--full to output all sequences.
34243424
2. If there is more than one sequence with the same ID, we output the Cartesian
@@ -3481,7 +3481,7 @@ Usage
34813481
``` text
34823482
edit sequence (point mutation, insertion, deletion)
34833483
3484-
Attentions:
3484+
Attention:
34853485
34863486
1. Multiple point mutations (-p/--point) are allowed, but only single
34873487
insertion (-i/--insertion) OR single deletion (-d/--deletion) is allowed.
@@ -3694,7 +3694,7 @@ seqkit will write the sequences to temporary files, and create FASTA index.
36943694
36953695
Secondly, seqkit shuffles sequence IDs and extracts sequences by FASTA index.
36963696
3697-
Attentions:
3697+
Attention:
36983698
1. For the two-pass mode (-2/--two-pass), the flag -U/--update-faidx is recommended to
36993699
ensure the .fai file matches the FASTA file.
37003700
@@ -3757,7 +3757,7 @@ seqkit will write the sequences to temporary files, and create FASTA index.
37573757
Secondly, seqkit sorts sequences by header and length information
37583758
and extracts sequences by FASTA index.
37593759
3760-
Attentions:
3760+
Attention:
37613761
1. For the two-pass mode (-2/--two-pass), the flag -U/--update-faidx is recommended to
37623762
ensure the .fai file matches the FASTA file.
37633763

seqkit/cmd/amplicon.go

+1-1
@@ -52,7 +52,7 @@ var ampliconCmd = &cobra.Command{
5252
Short: "extract amplicon (or specific region around it) via primer(s)",
5353
Long: `extract amplicon (or specific region around it) via primer(s).
5454
55-
Attentions:
55+
Attention:
5656
1. Only one (the longest) matching location is returned for every primer pair.
5757
2. Mismatch is allowed, but the mismatch location (5' or 3') is not controlled.
5858
You can increase the value of "-j/--threads" to accelerate processing.
