CSV files, variable number of fields #2188

PSLLSP · 2024-03-29T14:33:36Z

hledger 1.32.3

Current hledger can process only well formed CSV files. It can skip several lines at the start of the file. And a rule end can be used to force skip to the rest of the current file.

I would like to have an option to process CSV files with variable number of fields, like that when field is missing, hledger will not end with an error because field is missing.

there could be an option to define minimal number of mandatory fields
new variable will be defined, that will keep number of fields in the current record; awk has variable NF, something like that

There is rule skip in if block statement, this could be used to skip records that doesn't have correct number of fields or take an action on these records, like adding missing information.

I have CSV records, those have several header lines followed by CSV records those have data I want to process. I can design hledger rule file that process one such file but I cannot design rule file that process those files when joined with cat *.csv > import.csv.

Example:

$ cat import.csv
statement: 2023/01
account: 123456789/4321
owner: John Smith
start date: 1.1.2023
end date: 31.1.2023
opening balance:  100.00 EUR
closing balance:  100.50 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

statement: 2023/02
account: 123456789/4321
owner: John Smith
start date: 1.2.2023
end date: 28.2.2023
opening balance:  100.50 EUR
closing balance:  101.00 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.5";"EUR";"Interest";""

I would like to define that no fields are mandatory (to be able to skip empty lines)
I would like to use if block rule to:

skip records/lines those do not have 6 fields
skip records where field1 (%1) has value ID - to ignore headers
process the rest like regular CSV records

I tried to design import rules for simplified case (all records have 6 fields) and it can be done. I only miss an option to compare number of CSV fields, so I have to use a workarround to find valid CSV records

CSV file for test:

$ cat test-import1.csv
"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.6";"EUR";"Interest";""

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56327";"31.03.2023";"0.7";"";"Interest";"CURRENCY is missing"

import rules:

$ cat test-import1.csv.rules
# hledger import rules

skip 0
separator ;

fields f1,f2,f3,f4,f5,f6

# debug; it prints only records those were accepted and included in the journal...
comment \n%f1;%f2;%f3;%f4;%f5;%f6

# define date format
date-format %d.%m.%Y

# skip header record
if %f1 ID
  skip

# skip records where "volume" field is empty.
# just a test that record can be ignored
if %f3 ^$
  skip

# other test for empty field.
#if ! %f3 .
#  skip

code  %f1
date  %f2
amount  %f3 %f4
description %f5 - %f6

# Currency is missing, define default value
# variable cannot be redefined :-( (%f4 USD)
# workarround - define currency or amount, like "amount %f3 USD"
if %f4 ^$
  amount %f3 USD

$ hledger -f test-import1.csv print
2023-01-31 (54321) Interest -
    ; 54321;31.01.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR

2023-02-28 (56324) Interest -
    ; 56324;28.02.2023;0.6;EUR;Interest;
    expenses:unknown         0.6 EUR
    income:unknown          -0.6 EUR

2023-03-31 (56327) Interest - CURRENCY is missing
    ; 56327;31.03.2023;0.7;;Interest;CURRENCY is missing
    expenses:unknown         0.7 USD
    income:unknown          -0.7 USD

It can even process file where some lines are not valid CSV records:

$ cat test-import2.csv
statement: 2023/01
account: 123456789/4321
owner: John Smith
start date: 1.1.2023
end date: 31.1.2023
opening balance:  100.00 EUR
closing balance:  100.50 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"54321";"31.01.2023";"0.5";"EUR";"Interest";""

statement: 2023/02
account: 123456789/4321
owner: John Smith
start date: 1.2.2023
end date: 28.2.2023
opening balance:  100.50 EUR
closing balance:  101.00 EUR

"ID";"Date";"Volume";"Currency";"Description";"Note"
"56324";"28.02.2023";"0.5";"EUR";"Interest";""

$ ln -sf test-import1.csv.rules test-import2.csv.rules
$ hledger -f test-import2.csv print
2023-01-31 (54321) Interest -
    ; 54321;31.01.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR

2023-02-28 (56324) Interest -
    ; 56324;28.02.2023;0.5;EUR;Interest;
    expenses:unknown         0.5 EUR
    income:unknown          -0.5 EUR

The text was updated successfully, but these errors were encountered:

simonmichael added A-WISH Some kind of improvement request, hare-brained proposal, or plea. csv The csv file format, csv output format, or generally CSV-related. labels Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSV files, variable number of fields #2188

CSV files, variable number of fields #2188

PSLLSP commented Mar 29, 2024 •

edited

Loading

CSV files, variable number of fields #2188

CSV files, variable number of fields #2188

Comments

PSLLSP commented Mar 29, 2024 • edited Loading

PSLLSP commented Mar 29, 2024 •

edited

Loading