How to optimize the config for a huge collection of movie audio tracks?

I rewrote the Ruby layer so that only the C code is called. I use the default (stock) config and store the cached acoustic fingerprints in a custom directory with a PHP CLI-only script (only the exec() lines should interest you):

(Temporary files are written to a tmpfs RAM disk.)
```
  foreach($audiocodecs as $i => $audiocodec) {
   $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
   // Skip tracks whose fingerprints are already cached.
   if(file_exists($acoustic))
    continue;

   // Decode audio stream $i to mono, 16 kHz, 32-bit float raw PCM.
   echo "$videofile\n";
   exec("ffmpeg -loglevel quiet -i \"$videofile\" -map 0:a:$i -ac 1 -ar 16000 -f f32le -acodec pcm_f32le \"$TMPRAW\"");

   // Print the fingerprints, gzip them, then move the result into the cache directory.
   echo "$acoustic\n";
   exec("Olaf/bin/olaf_c print \"$TMPRAW\" \"$videofile\" | gzip > \"$TMPGZ\" && mv \"$TMPGZ\" \"$acoustic\"", $output);

   unlink($TMPRAW);
  }
```

To build the B+ tree, I use a loop that loads every audio track (average movie duration: about 1h40 to 2h):
```
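  foreach($audiocodecs as $i => $audiocodec) {
   $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
   // Only tracks with cached fingerprints can be stored.
   if(!file_exists($acoustic))
    continue;

   echo "$videofile\n";
   // exec() appends to an existing array, so reset $output for every track;
   // otherwise each CSV would also contain the lines of all previous tracks.
   $output = [];
   exec("gunzip -c \"$acoustic\"", $output);
   // Prefix each fingerprint line with "1/1" and the track identifier.
   $content = "";
   foreach($output as $line)
    $content .= "1/1,$videobase.$i,$line\n";
   file_put_contents($TMPCSV, $content);
   echo "$acoustic\n";
   exec("Olaf/bin/olaf_c store_cached \"$TMPCSV\"");
  }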
```
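
For reference, here is a minimal sketch of the same prefixing step done with PHP's zlib stream functions instead of `exec("gunzip -c ...")`. It assumes the zlib extension is available. The helper name `prefix_fingerprints` is hypothetical (it is not part of Olaf), and the `1/1,<track>` prefix is simply the one used in the loop above. Reading line by line means no array is shared between tracks, so lines from one track cannot leak into the CSV of the next.

```php
<?php
// Hypothetical helper (not part of Olaf): stream a gzipped fingerprint CSV
// into a plain CSV file, putting "$prefix," in front of every line.
// Returns the number of lines written.
function prefix_fingerprints(string $gzPath, string $csvPath, string $prefix): int
{
    $in = gzopen($gzPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $gzPath");
    }
    $out = fopen($csvPath, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new RuntimeException("cannot open $csvPath");
    }

    $count = 0;
    while (($line = gzgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // ignore blank lines
        }
        fwrite($out, $prefix . ',' . $line . "\n");
        $count++;
    }

    gzclose($in);
    fclose($out);
    return $count;
}
```

Inside the loop it would be called as `prefix_fingerprints($acoustic, $TMPCSV, "1/1,$videobase.$i");` right before the `store_cached` call.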

I can generate fingerprints for all the audio tracks of my movies, and the .gz files are all lightweight, but integrating them into the B+ tree is too slow and the database becomes far too large, even on my machine (i9, 64 GB DDR5, PCIe SSD) :(

How can you store 340 days of audio (around 800 GB of MP3s) in a 15 GB database? Mine grows much faster than that with your default config https://github.com/JorenSix/Olaf/blob/master/src/olaf_config.c, and I have about 4000 days (more than 10 years!) of sound to index. Game over, lol. Extrapolating from your result, this should fit in a database of roughly 150 to 180 GB, which would be fine for me, but that is not what happens: the database quickly grows past several terabytes, and rebuilding the B+ tree from all the lightweight fingerprint .csv.gz files moves at a snail's pace.
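
The extrapolation above can be written out as a small sketch. It assumes the database size scales linearly with audio duration, which is only an assumption. The figures are the ones quoted above (340 days in 15 GB, 4000 days to index), and the function name `scaled_db_size_gb` is hypothetical.

```php
<?php
// Hypothetical helper: scale a reference database size linearly
// with the amount of audio (assumption: size is proportional to duration).
function scaled_db_size_gb(float $refDays, float $refDbGb, float $targetDays): float
{
    return $refDbGb / $refDays * $targetDays;
}

// 340 days of audio in 15 GB, extrapolated to 4000 days.
printf("%.0f GB\n", scaled_db_size_gb(340, 15, 4000)); // prints "176 GB"
```

Under that assumption the whole collection lands well under 200 GB, so a database of several terabytes is more than ten times larger than the linear estimate.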
