Commit d914b43

small doc edits
1 parent c07e7d7 commit d914b43

1 file changed

+13
-8
lines changed

README.md

@@ -24,7 +24,9 @@ To run, use the entry point:
 
 
 ## CSV Files
-We support CSV files stored in text, gzip (`.gz`), or bzip2 (`.bz2`) formats.
+We support CSV files stored in text, compressed gzip (`.gz`), or compressed
+bzip2 (`.bz2`) formats.
+
 By default, we attempt to auto-detect the header and delimiter of the CSV file
 via Python's supplied CSV parsing library. The `--query` option will query the
 detected CSV metadata and print to `STDOUT`:
@@ -34,7 +36,8 @@ detected CSV metadata and print to `STDOUT`:
 Found fields:
 ['Date Of Stop', 'Time Of Stop', 'Latitude', 'Longitude', 'Description']
 
-Note that `out.tns` is not touched when querying a CSV file.
+Note that `out.tns` is not touched when querying a CSV file, though it is
+required as a positional argument.
 
 Any number of CSV files can be provided for output, so long as the fields used
 to construct the sparse tensor are found in each file.
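
The hunks above describe reading plain, gzip, or bzip2 CSV files and auto-detecting the header and delimiter with Python's CSV library. A minimal sketch of that idea follows, assuming the compression is chosen by file extension and that detection uses `csv.Sniffer`. The names `open_csv` and `query_csv` are hypothetical and are not taken from the project.

```python
import bz2
import csv
import gzip
import io


def open_csv(path):
    """Open a plain, gzip, or bzip2 CSV file in text mode (chosen by extension)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", newline="")
    return open(path, "r", newline="")


def query_csv(path, sample_size=4096):
    """Return (delimiter, has_header, fields) guessed from a sample of the file.

    Hypothetical helper: csv.Sniffer is heuristic and can guess wrong,
    especially on very short or irregular files.
    """
    with open_csv(path) as handle:
        sample = handle.read(sample_size)
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample)
    has_header = sniffer.has_header(sample)
    first_row = next(csv.reader(io.StringIO(sample), dialect))
    # Without a header, fall back to one-indexed column numbers.
    fields = first_row if has_header else list(range(1, len(first_row) + 1))
    return dialect.delimiter, has_header, fields
```

Calling `query_csv("stops.csv.gz")` would return the guessed delimiter, whether a header row was detected, and the field names, which is roughly the information the `--query` output lists.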
@@ -60,9 +63,10 @@ For more information on file formats, see
 ## Tensor Construction
 ### Mode selection
 Columns of the CSV file (referred to as "fields") are selected using the
-`--field=` flag. If the CSV file has a header, the supplied parameter must
-match a field in the header (but is **not** case sensitive). If the field has
-spaces in the name, simply enclose it in quotes: `--field="time of day"`.
+`--field=` flag (abbreviated `-f`). If the CSV file has a header, the supplied
+parameter must match a field in the header (but is **not** case sensitive).
+Otherwise, the columns are referenced by number and one-indexed. If the field
+has spaces in the name, simply enclose it in quotes: `--field="time of day"`.
 
 
 ### Tensor values
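
The field-selection rule added in the hunk above (case-insensitive header match, otherwise one-indexed column numbers) can be sketched as follows. This is an illustration only: `resolve_field` is a hypothetical name, and returning a zero-based position is an assumption about how a caller would index rows.

```python
def resolve_field(name, header=None):
    """Return the zero-based column position for a --field value.

    With a header, match the name case-insensitively. Without one,
    treat the value as a one-indexed column number.
    """
    if header is not None:
        lowered = [h.lower() for h in header]
        try:
            return lowered.index(name.lower())
        except ValueError:
            raise KeyError(f"field {name!r} not found in header") from None
    column = int(name)
    if column < 1:
        raise ValueError("column numbers are one-indexed and start at 1")
    return column - 1
```

With the header from the `--query` example, `resolve_field("time of stop", header)` selects the second column, and `resolve_field("3")` selects the third column of a headerless file.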
@@ -80,7 +84,8 @@ treated as integers, floats, dates, or other types.
 
 In addition to affecting the ordering of the resulting indices, the type of a
 column affects the mapping of CSV entries to unique indices. For example, one
-may wish to round floats such that `1.38` and `1.38111` map to the same value.
+may wish to round floats such that `1.38` and `1.38111` map to the same index,
+or to map dates `Aug 20` and `August 20` to the same index.
 
 We provide several types which can be specified with the `--type=` flag:
 * `str` => String (default)
@@ -104,7 +109,7 @@ specified, and thus they will map to different indices if the type is `year`.
 
 You can specify multiple fields in the same `--type` instance. For example:
 `--type=userid,itemid,int` would treat the fields `userid` and `itemid` both
-to integers.
+as integers.
 
 
 ### Advanced mode types
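
The two preceding hunks state that a column's type affects both the ordering of indices and which CSV entries share an index, and that one `--type` value can name several fields. A rough sketch of both ideas follows. `parse_type_spec` and `build_index_map` are hypothetical names, and the zero-based, sorted index assignment is an assumption rather than the tool's documented behavior.

```python
def parse_type_spec(spec):
    """Split 'field1,field2,type' into (fields, type_name); the type comes last.

    Assumes the type itself contains no commas.
    """
    *fields, type_name = spec.split(",")
    return fields, type_name


def build_index_map(values, convert):
    """Assign each entry an index based on its converted value.

    Entries that convert to the same value share an index, and indices
    follow the sorted order of the converted values.
    """
    converted = [convert(v) for v in values]
    index_of = {key: i for i, key in enumerate(sorted(set(converted)))}
    return [index_of[c] for c in converted]
```

For the entries `["1.38", "1.38111", "2.5"]`, converting with `str` keeps all three distinct, while converting with `lambda x: round(float(x), 2)` gives the first two the same index.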
@@ -125,7 +130,7 @@ as source code and specifies a custom type. For example,
 --type=cost,"lambda x : float(x) * 1.06"
 
 may be a method of scaling all costs by 6% to account for sales tax. Note that
-all types should take a single parameter which will be an `str` object.
+all type functions take a single parameter which will be an `str` object.
 
 
 
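
The last hunk says a custom type is interpreted as source code and receives a single `str` argument. One way this could work is sketched below, assuming the string is evaluated with Python's `eval`. `make_type_function` and the `BUILTIN_TYPES` table are hypothetical; the README confirms `str` and `int` as type names, while `float` is included here only for illustration.

```python
# Hypothetical subset of named types; not the tool's actual table.
BUILTIN_TYPES = {"str": str, "int": int, "float": float}


def make_type_function(source):
    """Turn a --type value into a one-argument callable that takes a str."""
    if source in BUILTIN_TYPES:
        return BUILTIN_TYPES[source]
    # eval executes arbitrary code, so only use it with trusted input.
    func = eval(source)
    if not callable(func):
        raise TypeError(f"{source!r} does not evaluate to a callable")
    return func
```

Here `make_type_function("lambda x : float(x) * 1.06")` returns a function that parses the CSV entry as a float and scales it by 6%, matching the sales-tax example in the README.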
