From 91b5eab220eb459d565ed8ea2a23642c7bc7f1cd Mon Sep 17 00:00:00 2001
From: johannesjh
Date: Sun, 7 Dec 2025 15:53:07 +0100
Subject: [PATCH 1/2] Update README.rst to document the algorithms used

fixes #139 by documenting the machine learning algorithms used in
smart_importer
---
 README.rst | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/README.rst b/README.rst
index 47ed4f3..4b0627d 100644
--- a/README.rst
+++ b/README.rst
@@ -234,3 +234,39 @@ In your importer code, you can then pass `jieba` to be used as tokenizer:
 
     tokenizer = lambda s: list(jieba.cut(s))
     predictor = PredictPostings(string_tokenizer=tokenizer)
+
+
+Privacy
+-------
+
+smart_importer uses machine learning (artificial intelligence, AI) algorithms in an ethical, privacy-conscious way:
+All data processing happens on the local machine; no data is sent to or retrieved from external servers or the cloud.
+All the code, including the machine learning implementation, is open-source.
+
+Model:
+The machine learning model used in smart_importer is a classification model.
+The goal of the classification model is to predict transaction attributes,
+such as postings/accounts and payee names,
+in order to reduce the manual effort when importing transactions.
+The model is implemented using the open-source `scikit-learn <https://scikit-learn.org>`__ library,
+specifically using scikit-learn's `SVC (support vector machine) <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`__ implementation.
+
+Training data:
+The model is trained on historical transactions from your Beancount ledger.
+This training happens on-the-fly when the import process is started, by reading ``existing_entries`` from the importer.
+The trained model is used locally on your machine during the import process, as follows.
+
+Input:
+The input data are the transactions to be imported.
+Typically, these are transactions with a single posting, where one posting (e.g., the bank account) is known and the other one is missing.
+
+Output:
+The output data are transactions with predicted second postings and/or other predicted transaction attributes.
+
+Accuracy and Feedback Loops:
+The effectiveness of the model depends on the volume and diversity of your historical data — small or homogeneous datasets may result in poor predictions.
+Predictions are made automatically when importing new transactions, but users should always review them for accuracy before committing them to the ledger.
+Users can manually adjust predictions (e.g., change the payee or account) and save the corrected transactions to their ledger.
+These corrections are then used as training data for future predictions, allowing the accuracy to improve over time.
+
+The smart_importer project is fully open source, meaning you can inspect and modify the code as needed.

From 173a873da80a7c8fbf648aa269c3a40d97b3f48a Mon Sep 17 00:00:00 2001
From: johannesjh
Date: Sun, 7 Dec 2025 15:59:57 +0100
Subject: [PATCH 2/2] Update README.rst

trims trailing whitespace to make pre-commit happy
---
 README.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.rst b/README.rst
index 4b0627d..416d464 100644
--- a/README.rst
+++ b/README.rst
@@ -240,18 +240,18 @@ Privacy
 -------
 
 smart_importer uses machine learning (artificial intelligence, AI) algorithms in an ethical, privacy-conscious way:
-All data processing happens on the local machine; no data is sent to or retrieved from external servers or the cloud. 
+All data processing happens on the local machine; no data is sent to or retrieved from external servers or the cloud.
-Model: 
+Model:
 The machine learning model used in smart_importer is a classification model.
-The goal of the classification model is to predict transaction attributes, 
-such as postings/accounts and payee names, 
+The goal of the classification model is to predict transaction attributes,
+such as postings/accounts and payee names,
 in order to reduce the manual effort when importing transactions.
 The model is implemented using the open-source `scikit-learn <https://scikit-learn.org>`__ library,
 specifically using scikit-learn's `SVC (support vector machine) <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`__ implementation.
 
-Training data: 
+Training data:
 The model is trained on historical transactions from your Beancount ledger.
 This training happens on-the-fly when the import process is started, by reading ``existing_entries`` from the importer.
 The trained model is used locally on your machine during the import process, as follows.
@@ -260,13 +260,13 @@ Input:
 The input data are the transactions to be imported.
 Typically, these are transactions with a single posting, where one posting (e.g., the bank account) is known and the other one is missing.
 
-Output: 
+Output:
 The output data are transactions with predicted second postings and/or other predicted transaction attributes.
 
 Accuracy and Feedback Loops:
-The effectiveness of the model depends on the volume and diversity of your historical data — small or homogeneous datasets may result in poor predictions. 
+The effectiveness of the model depends on the volume and diversity of your historical data — small or homogeneous datasets may result in poor predictions.
 Predictions are made automatically when importing new transactions, but users should always review them for accuracy before committing them to the ledger.
-Users can manually adjust predictions (e.g., change the payee or account) and save the corrected transactions to their ledger. 
+Users can manually adjust predictions (e.g., change the payee or account) and save the corrected transactions to their ledger.
 These corrections are then used as training data for future predictions, allowing the accuracy to improve over time.
 
The smart_importer project is fully open source, meaning you can inspect and modify the code as needed.
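
Note for reviewers: the training/prediction flow this patch documents (train an SVC on existing ledger entries, then predict the missing posting's account for new transactions) can be sketched in a few lines of scikit-learn. This is a hedged, self-contained illustration, not smart_importer's actual code; the narration strings, account names, and the bag-of-words feature extraction below are invented for the example.

```python
# Minimal sketch of the documented approach: classify transaction text
# into a target account with scikit-learn's SVC. Hypothetical data; the
# real importer derives features from full Beancount entries.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# "Training data": narrations and the second posting's account,
# as would be read from existing_entries in the ledger.
narrations = [
    "SUPERMARKET FOO 123", "SUPERMARKET FOO 456",
    "MONTHLY RENT PAYMENT", "RENT PAYMENT APRIL",
    "COFFEE SHOP BAR", "COFFEE SHOP BAZ",
]
accounts = [
    "Expenses:Groceries", "Expenses:Groceries",
    "Expenses:Rent", "Expenses:Rent",
    "Expenses:Coffee", "Expenses:Coffee",
]

# Bag-of-words features feeding a support vector classifier.
model = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
model.fit(narrations, accounts)

# "Input": a new transaction where only the bank posting is known;
# "Output": the predicted account for the missing second posting.
predicted = model.predict(["SUPERMARKET FOO 789"])[0]
print(predicted)  # → Expenses:Groceries
```

Everything here runs locally, which is the point of the Privacy section: training and prediction are ordinary in-process scikit-learn calls with no network access.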