v1

AIMS:

Test the vanilla Tsetlin Machine (TM)
Test the coalesced TM
- without much pre-processing.
Try with only the “Komiteens tilråding” (“the committee's recommendation”) part of the document.
Baseline:
- all the data
- scikit-learn CountVectorizer, n-grams up to length 2, binarized
- s = 1

Files:

v1/code/preprocessing/simple_bag_of_words.py


{'featurized': converted data,
'labels': labels as numbers,
'idx_:_word': unique words with indices,
'word_:_idx': reverse word map,
'labels_:_labelnum': dict mapping of labels to numbers,
'train_test_split': not split into train-test}
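
As a rough illustration of the dictionary described above, here is a minimal pure-Python sketch. It is not the code in `simple_bag_of_words.py`: the function name, the whitespace tokenization, the binary (present/absent) features, and the use of `None` for the unsplit case are all assumptions made for the example.

```python
def simple_bag_of_words(documents, labels):
    """Build binary bag-of-words features plus lookup tables.

    `documents` is a list of strings and `labels` a list of label strings.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    # Give every unique whitespace-separated token an index.
    vocabulary = sorted({tok for doc in documents for tok in doc.lower().split()})
    word_to_idx = {word: i for i, word in enumerate(vocabulary)}
    idx_to_word = {i: word for word, i in word_to_idx.items()}

    # Map every distinct label to a number.
    label_to_num = {label: i for i, label in enumerate(sorted(set(labels)))}

    # One row per document: 1 if the word occurs in it, else 0.
    featurized = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in doc.lower().split():
            row[word_to_idx[tok]] = 1
        featurized.append(row)

    return {
        "featurized": featurized,
        "labels": [label_to_num[label] for label in labels],
        "idx_:_word": idx_to_word,
        "word_:_idx": word_to_idx,
        "labels_:_labelnum": label_to_num,
        "train_test_split": None,  # assumption: placeholder for "not split"
    }


data = simple_bag_of_words(
    ["skatt og avgift", "skole og skatt"],
    ["finans", "utdanning"],
)
```

Keeping both `idx_:_word` and `word_:_idx` makes it cheap to go from a feature column back to the word it represents when inspecting learned clauses.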

v1/code/preprocessing/countvectorizer_bag_of_words.py


{'featurized': converted data,
'labels': labels as numbers,
'idx_:_word': unique words with indices,
'word_:_idx': reverse word map,
'featurenames_vectorizer': feature names after calling CountVectorizer fit_transform,
'labels_:_labelnum': dict mapping of labels to numbers,
'train_test_split': index of split for train-test}

v1/code/vanillaTM.py

sample1 -> [label1, label2]

is converted to:

sample1 -> label1
sample1 -> label2
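
The conversion above can be sketched as follows. The function name and the list-based inputs are hypothetical; `vanillaTM.py` may implement the expansion differently.

```python
def expand_multilabel(samples, label_lists):
    """Repeat each sample once per label so every row has exactly one label."""
    expanded_samples, expanded_labels = [], []
    for sample, sample_labels in zip(samples, label_lists):
        for label in sample_labels:
            expanded_samples.append(sample)
            expanded_labels.append(label)
    return expanded_samples, expanded_labels


X, y = expand_multilabel(
    ["sample1", "sample2"],
    [["label1", "label2"], ["label3"]],
)
```

One consequence of this design is that identical feature vectors appear with different targets, so a single-label classifier cannot score 100% on the expanded training data.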

Currently fails with the following error:

self.clause_bank[:, :, 0:self.number_of_state_bits_ta - 1] = np.uint32(~0)
OverflowError: Python integer -1 out of bounds for uint32

Possibly because not all labels are represented in the training data.
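
Separately from the label hypothesis, the message itself points at the expression `np.uint32(~0)`. In Python, `~0` evaluates to `-1`, and NumPy 2.x raises `OverflowError` when asked to convert an out-of-range Python integer to `uint32`, whereas older NumPy releases wrapped the value around to 4294967295. The sketch below only demonstrates that behaviour and a few ways to get an all-ones `uint32` that do not depend on the NumPy version. Whether patching that line or pinning an older NumPy is the right fix for this project has not been verified.

```python
import numpy as np

# In plain Python, bitwise NOT of 0 is -1, not an unsigned all-ones value.
minus_one = ~0

# Depending on the installed NumPy version this either wraps or raises.
try:
    np.uint32(minus_one)
    outcome = "wrapped"    # older NumPy: value wraps to 4294967295
except OverflowError:
    outcome = "overflow"   # NumPy 2.x: out-of-range Python int is rejected

# Version-independent ways to build an all-ones uint32.
all_ones_a = np.uint32(np.iinfo(np.uint32).max)
all_ones_b = ~np.uint32(0)          # invert an existing uint32 instead of a Python int
all_ones_c = np.uint32(0xFFFFFFFF)

print(outcome, all_ones_a, all_ones_b, all_ones_c)
```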

v1/code/coalescedTM.py

Further hyperparameter tuning is required.

Current accuracy: 69.5.

The classification report with class-wise precision, recall, and F1 (PRF) is in v1/results/classificationreport_simplebow_coalesced.txt

cair/TsetlinMachineSubjectTaggingPilot
