@@ -24,7 +24,9 @@ To run, use the entry point:
## CSV Files
- We support CSV files stored in text, gzip (`.gz`), or bzip2 (`.bz2`) formats.
+ We support CSV files stored in text, compressed gzip (`.gz`), or compressed
+ bzip2 (`.bz2`) formats.
+
By default, we attempt to auto-detect the header and delimiter of the CSV file
via Python's supplied CSV parsing library. The `--query` option will query the
detected CSV metadata and print to `STDOUT`:
@@ -34,7 +36,8 @@ detected CSV metadata and print to `STDOUT`:
Found fields:
['Date Of Stop', 'Time Of Stop', 'Latitude', 'Longitude', 'Description']
- Note that `out.tns` is not touched when querying a CSV file.
+ Note that `out.tns` is not touched when querying a CSV file, though it is
+ required as a positional argument.
Any number of CSV files can be provided for output, so long as the fields used
to construct the sparse tensor are found in each file.
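
As a rough illustration of the auto-detection described above, the sketch below uses `csv.Sniffer` from Python's standard library to guess the delimiter and header of a text, gzip, or bzip2 file. The helper names `open_csv` and `query_csv` are hypothetical, and this is not necessarily how the `--query` option is implemented.

```python
import bz2
import csv
import gzip
import io


def open_csv(path):
    """Open a plain, gzip (.gz), or bzip2 (.bz2) CSV file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", newline="")
    return open(path, "r", newline="")


def query_csv(path, sample_size=4096):
    """Guess the delimiter and header fields from the start of the file.

    Hypothetical helper: returns (delimiter, fields), where fields is None
    when no header row is detected.
    """
    with open_csv(path) as handle:
        sample = handle.read(sample_size)

    sniffer = csv.Sniffer()
    # Restricting the candidate delimiters is an assumption that makes the
    # guess more stable on small samples.
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)

    first_row = next(csv.reader(io.StringIO(sample), dialect))
    fields = first_row if has_header else None
    return dialect.delimiter, fields
```

`csv.Sniffer` works from heuristics, so it can guess wrong or raise `csv.Error` on unusual files.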
@@ -60,9 +63,10 @@ For more information on file formats, see
## Tensor Construction
### Mode selection
Columns of the CSV file (referred to as "fields") are selected using the
- `--field=` flag. If the CSV file has a header, the supplied parameter must
- match a field in the header (but is **not** case sensitive). If the field has
- spaces in the name, simply enclose it in quotes: `--field="time of day"`.
+ `--field=` flag (abbreviated `-f`). If the CSV file has a header, the supplied
+ parameter must match a field in the header (but is **not** case sensitive).
+ Otherwise, the columns are referenced by number and one-indexed. If the field
+ has spaces in the name, simply enclose it in quotes: `--field="time of day"`.
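
The matching rules above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper `resolve_field` that turns a `--field` value into a zero-based column position; the tool's own lookup may differ.

```python
def resolve_field(spec, header=None):
    """Return the zero-based column position for a --field value.

    Hypothetical helper. With a header, `spec` is matched against the
    header names without regard to case. Without a header, `spec` is a
    one-indexed column number.
    """
    if header is not None:
        lowered = [name.lower() for name in header]
        try:
            return lowered.index(spec.lower())
        except ValueError:
            raise KeyError(f"field {spec!r} not found in header") from None

    column = int(spec)
    if column < 1:
        raise ValueError("column numbers are one-indexed")
    return column - 1


header = ["Date Of Stop", "Time Of Stop", "Latitude"]
by_name = resolve_field("time of stop", header)  # header match ignores case
by_number = resolve_field("3")                   # third column, no header
```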
### Tensor values
@@ -80,7 +84,8 @@ treated as integers, floats, dates, or other types.
In addition to affecting the ordering of the resulting indices, the type of a
column affects the mapping of CSV entries to unique indices. For example, one
- may wish to round floats such that `1.38` and `1.38111` map to the same value.
+ may wish to round floats such that `1.38` and `1.38111` map to the same index,
+ or to map dates `Aug 20` and `August 20` to the same index.
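
To see why the type changes the mapping, consider this small sketch. The helper `build_index` is hypothetical, and the sorted, zero-based numbering is an assumption made for illustration rather than a description of the tool's output.

```python
def build_index(entries, type_func=str):
    """Map each distinct typed value to an index.

    Hypothetical helper: every raw CSV string is converted with
    `type_func`, duplicates collapse, and the remaining values are
    numbered in sorted order (zero-based here by assumption).
    """
    keys = sorted({type_func(entry) for entry in entries})
    return {key: position for position, key in enumerate(keys)}


entries = ["1.38", "1.38111", "2.5"]

# As strings, all three entries are distinct.
as_strings = build_index(entries)

# Rounded to two decimals, "1.38" and "1.38111" share one index.
as_rounded = build_index(entries, lambda x: round(float(x), 2))
```

Here `as_strings` holds three indices, while `as_rounded` holds two.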
We provide several types which can be specified with the `--type=` flag:
* `str` => String (default)
@@ -104,7 +109,7 @@ specified, and thus they will map to different indices if the type is `year`.
You can specify multiple fields in the same `--type` instance. For example:
`--type=userid,itemid,int` would treat the fields `userid` and `itemid` both
- to integers.
+ as integers.
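
One way to read such a value is to treat the last comma-separated item as the type and everything before it as field names. The sketch below assumes exactly that; `parse_type_option` is a hypothetical helper, and it does not handle type expressions that themselves contain commas.

```python
def parse_type_option(value):
    """Split 'field1,field2,type' into (['field1', 'field2'], 'type').

    Hypothetical helper: the final item is taken as the type and all
    earlier items as field names.
    """
    *fields, type_name = value.split(",")
    if not fields:
        raise ValueError("expected at least one field before the type")
    return fields, type_name


fields, type_name = parse_type_option("userid,itemid,int")
```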
### Advanced mode types
@@ -125,7 +130,7 @@ as source code and specifies a custom type. For example,
--type=cost,"lambda x : float(x) * 1.06"
may be a method of scaling all costs by 6% to account for sales tax. Note that
- all types should take a single parameter which will be an `str` object.
+ all type functions take a single parameter which will be an `str` object.
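
A minimal sketch of how such a source string could become a callable is shown below, assuming it is evaluated with Python's built-in `eval`. The helper `load_type_function` is hypothetical. Note that the function receives the raw CSV entry as a `str`, so the conversion to `float` happens inside the lambda.

```python
def load_type_function(source):
    """Evaluate a source string into a one-argument type function.

    Hypothetical helper. eval() runs arbitrary code, so it should only
    be used on input you trust.
    """
    func = eval(source)
    if not callable(func):
        raise TypeError("type source must evaluate to a callable")
    return func


add_tax = load_type_function("lambda x : float(x) * 1.06")
taxed = add_tax("100")  # the argument arrives as a str, not a number
```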