Sugaroid uses an sqlite3
-type database for portability.
All the responses are explicitly saved and trained on sugaroid.
Sugaroid has two types of training:
- Supervised training
- Unsupervised training
Supervised training is a list of proper responses, most commonly
collected from the Stanford Question Answering Dataset (Natural)
(SQuAD 2.0 from Stanford NLP, attribution to Rajpurkar & Jia et al. '18). Other
reponses are manually trained from interactions during testing.
All the responses are saved to ~/.config/sugaroid/sugaroid.db
which is opened in read-only mode during production mode to
prevent people from tampering with the dataset. At local testing,
it is possible to teach sugaroid a sequel of responses and this
will appended to the SQL database. Using Naive Bayers
algorithm.
Unsupervised training are a community collected dataset.
The sources of data, are obviously from the community, on its
hosted sugaroid.srevinsaju.me
instance on Microsoft Azure, frontend on AWS. This data are
also appended to the SQL database like
Supervised Training but they are saved
with lesser confidence ( 0.1 * confidence_from_statement
), as
data from community needs to undergo refining.
Sugaroid's backend module is sqlite3
against the conventional
MySQL or MariaDB adapters. sqlite3
was chosen considering its
portability alone. Despite higher IO operations on sqlite3
,
community data collection becomes easier because sqlite3
databases are more or less, a single file. Another problem it
solves is the different ways in which the operating systems
consider the file path to be. Using sqlite3
helps to keep
consistency in case. (For Windows, mysql
is case insensitive,
but on GNU/Linux/UNIX its case sensitive). Using sqlite3
solves
that problem.
Sugaroid collects data from its users which are then used to
train. This is done through cookies, on the first response you
provide to sugaroid (on the web interface), on adding the bot to
your discord channel (on the Discord adapter). However, your data
is completely safe, and is not collected for training purposes
if its (i) self hosted (ii) run as a desktop / command line app.
All data on the desktop version is still appended to your
respective configuration folders, which is, for example, on Linux,
~/.config/sugaroid/sugaroid.db
and on Windows its C:\Users\foobar\AppData\Local\sugaroid\sugaroid.db
.
Note:
AppData
folder is normally hidden on Windows, manually "Show all hidden folders" to see the AppData folder.
There are certain cases when you would like to analyze the data
stored in the database, or you would like to do some debugging.
In all such cases, the path to the sugaroid.db
is very much
useful. All you need is an sqlite3
binary, which is available
for all platforms.
Download
sqlite3
from here
And then, start investigating by
$ sqlite3 ~/.config/sugaroid/sugaroid.db
This will open a prompt, where you can enter most commands;
Apart from the main database, sugaroid
also stores data in
~/.config/sugaroid/sugaroid.db
~/.config/sugaroid/sugaroid.trainer.json
~/.config/sugaroid/sugaroid_internal.db
~/.config/sugaroid/data.json
Along with SQL, we have also used JSON type files for configuration alone.