How to optimize the config for a huge collection of movie audio tracks? #48

@ServeurpersoCom

I rewrote the Ruby layer so that it calls only the C code. I use the default (stock) config and cache the acoustic fingerprints in a custom directory with a PHP CLI-only script (only the exec() lines should be of interest to you):

(Temporary files live on a tmpfs RAM disk.)

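  foreach($audiocodecs as $i => $audiocodec) {
   // One cached fingerprint file per audio track, keyed by file size and track index.
   $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
   if(file_exists($acoustic))
    continue;

   // Decode audio track $i to mono, 16 kHz, 32-bit float raw PCM.
   echo "$videofile\n";
   exec("ffmpeg -loglevel quiet -i \"$videofile\" -map 0:a:$i -ac 1 -ar 16000 -f f32le -acodec pcm_f32le \"$TMPRAW\"");

   // Print the fingerprints, gzip them, and move the result into place only if that succeeded.
   echo "$acoustic\n";
   exec("Olaf/bin/olaf_c print \"$TMPRAW\" \"$videofile\" | gzip > \"$TMPGZ\" && mv \"$TMPGZ\" \"$acoustic\"", $output);

   unlink($TMPRAW);
  }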

To build the B+ tree, I loop over all the audio tracks (about 1h40 to 2h each, the average movie duration) and load them one by one:

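  foreach($audiocodecs as $i => $audiocodec) {
   $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
   if(!file_exists($acoustic))
    continue;

   echo "$videofile\n";
   // exec() appends to an existing array, so reset $output for each track;
   // otherwise every CSV also contains the lines of all previous tracks.
   $output = [];
   exec("gunzip -c \"$acoustic\"", $output);
   // Prepend "1/1,$videobase.$i," to every cached fingerprint line.
   $content = "";
   foreach($output as $line)
    $content .= "1/1,$videobase.$i,$line\n";
   file_put_contents($TMPCSV, $content);
   echo "$acoustic\n";
   exec("Olaf/bin/olaf_c store_cached \"$TMPCSV\"");
  }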

I can generate fingerprints for all the audio tracks of my movies, and all the resulting .gz files are lightweight. But integrating them into the B+ tree is too slow and the database becomes far too large, even on my machine (i9, 64 GB DDR5, PCIe SSD) :(

How can you store 340 days of audio (around 800 GB of MP3s) inside a 15 GB database? Mine grows much faster than this with your default config https://github.com/JorenSix/Olaf/blob/master/src/olaf_config.c and I have about 4000 days (more than 10 years!) of sound to index. Game over, lol. Extrapolating from your result, everything should fit in a database on the order of 150 to 180 GB, which would be fine for me. But that is not the case: the database quickly grows to multiple terabytes, and rebuilding the B+ tree from all the lightweight fingerprint .csv.gz files is as slow as a snail.
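
As a rough check of that estimate, assuming the database size scales linearly with the duration of the indexed audio (my own assumption, not something I found stated for Olaf), the 15 GB for 340 days figure extrapolates to 4000 days like this:

```math
\frac{15\ \text{GB}}{340\ \text{days}} \approx 0.044\ \text{GB/day},
\qquad
4000\ \text{days} \times 0.044\ \text{GB/day} \approx 176\ \text{GB}
```

So under that assumption the target size is in the 150 to 180 GB range, about two orders of magnitude or more below the multiple terabytes I am seeing.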
