Description
I rewrote the Ruby layer to call only the C code directly. I use the default (stock) config and store the cached acoustic fingerprints in a custom directory with a PHP CLI-only script (only the exec() lines are of interest). Temporary files live on a tmpfs (RAM) mount:
// For each audio track: decode to raw PCM, fingerprint it, keep the gzipped CSV per track.
foreach ($audiocodecs as $i => $audiocodec) {
    $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
    if (file_exists($acoustic))
        continue; // fingerprint already cached for this track
    echo "$videofile\n";
    // Decode audio track $i to mono, 16 kHz, 32-bit float raw PCM on tmpfs
    exec("ffmpeg -loglevel quiet -i \"$videofile\" -map 0:a:$i -ac 1 -ar 16000 -f f32le -acodec pcm_f32le \"$TMPRAW\"");
    echo "$acoustic\n";
    // Print the fingerprints as CSV, compress them, then move the result into place atomically
    exec("Olaf/bin/olaf_c print \"$TMPRAW\" \"$videofile\" | gzip > \"$TMPGZ\" && mv \"$TMPGZ\" \"$acoustic\"");
    unlink($TMPRAW);
}
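As a side note (not part of my original script): because the paths are interpolated straight into the shell string, a filename containing quotes or spaces would break the exec() call. A minimal sketch using PHP's escapeshellarg(); the helper name olaf_print_cmd() is hypothetical:
// Hypothetical helper: quote the paths before building the olaf_c print pipeline
function olaf_print_cmd($videofile, $TMPRAW, $TMPGZ, $acoustic) {
    $v   = escapeshellarg($videofile);
    $raw = escapeshellarg($TMPRAW);
    $gz  = escapeshellarg($TMPGZ);
    $out = escapeshellarg($acoustic);
    return "Olaf/bin/olaf_c print $raw $v | gzip > $gz && mv $gz $out";
}
// Usage: exec(olaf_print_cmd($videofile, $TMPRAW, $TMPGZ, $acoustic));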
And to build the B+ tree, I loop over the cached fingerprints and load every audio track (average movie duration is about 1h40 to 2h):
foreach ($audiocodecs as $i => $audiocodec) {
    $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
    if (!file_exists($acoustic))
        continue; // no cached fingerprint for this track
    echo "$videofile\n";
    // exec() appends to an existing array, so reset it before each call
    $output = [];
    exec("gunzip -c \"$acoustic\"", $output);
    // Prefix every fingerprint line with the identifier expected by store_cached
    $content = "";
    foreach ($output as $line)
        $content .= "1/1,$videobase.$i,$line\n";
    file_put_contents($TMPCSV, $content);
    echo "$acoustic\n";
    // Insert the cached fingerprints for this track into the B+ tree database
    exec("Olaf/bin/olaf_c store_cached \"$TMPCSV\"");
}
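One thing I might try, as shown below (this is an assumption on my side, not something the script above does, and it relies on olaf_c store_cached accepting a CSV that mixes lines from several track identifiers): concatenate several tracks into one temporary CSV and store them with a single store_cached call, so the database is opened and committed once per batch instead of once per track. A rough sketch with a hypothetical $batchSize:
// Sketch: batch several cached fingerprint files into one store_cached call.
// Uses the same "1/1,<identifier>," line prefix as above; $batchSize is arbitrary.
$batchSize = 50;
$content = "";
$inBatch = 0;
foreach ($audiocodecs as $i => $audiocodec) {
    $acoustic = "$ACOUSTICDIR/$videobase.$filesize.$i.csv.gz";
    if (!file_exists($acoustic))
        continue;
    $output = [];
    exec("gunzip -c \"$acoustic\"", $output);
    foreach ($output as $line)
        $content .= "1/1,$videobase.$i,$line\n";
    $inBatch++;
    if ($inBatch >= $batchSize) {
        file_put_contents($TMPCSV, $content);
        exec("Olaf/bin/olaf_c store_cached \"$TMPCSV\"");
        $content = "";
        $inBatch = 0;
    }
}
if ($content !== "") { // flush the last partial batch
    file_put_contents($TMPCSV, $content);
    exec("Olaf/bin/olaf_c store_cached \"$TMPCSV\"");
}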
I can generate the fingerprints for all the audio tracks of my movies, and all the .gz files are lightweight, but the integration into the B+ tree is far too slow and the database becomes huge, even on my i9 / 64 GB DDR5 / PCIe SSD machine :(
How can you store 340 days of audio (around 800 GB of MP3s) inside a 15 GB database? Mine grows much faster than that with your default config (https://github.com/JorenSix/Olaf/blob/master/src/olaf_config.c), and I have about 4000 days (10 years!) of sound to index, so game over, lol. Extrapolating from your result, everything should fit in roughly a 150 GB database, which would be fine for me, but that is not what I see: the database quickly grows past multiple terabytes, and rebuilding the B+ tree from all the lightweight fingerprints.csv.gz files is as slow as a snail.
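For reference, this is the back-of-envelope extrapolation I mean (a sketch only, scaling the numbers quoted above, not measured values):
// Scale the reported 15 GB for ~340 days of audio up to my ~4000 days
$reported_db_gb = 15;     // database size reported for ~340 days of audio
$reported_days  = 340;
$my_days        = 4000;   // ~10 years of sound to index
$gb_per_day     = $reported_db_gb / $reported_days;   // ~0.044 GB per day of audio
$expected_db_gb = $gb_per_day * $my_days;             // ~176 GB, same order as the ~150 GB estimate
printf("Expected database size: ~%.0f GB\n", $expected_db_gb);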