Given that you are using only 2 symbols, it'd be faster to use a Goertzel filter instead of a full FFT. This would also reduce the dependency on FFTW and simplify the FSK part in both size and complexity :)
more info: https://en.wikipedia.org/wiki/Goertzel_algorithm
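
To give a rough idea of what I mean, here is a minimal sketch in Python (not the project's actual code). The function names `goertzel_power` and `detect_symbol`, and the tone frequencies and sample rate in the usage note, are placeholders I made up for illustration. It assumes one symbol per block of samples and simply picks whichever of the two tones carries more energy:

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of `samples` at the DFT bin nearest `target_freq`."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)

    s_prev = 0.0   # s[i-1]
    s_prev2 = 0.0  # s[i-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s

    # |X(k)|^2 from the last two filter states, no complex math needed
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_symbol(samples, sample_rate, freq0, freq1):
    """Return 0 or 1 depending on which tone has more energy in the block."""
    p0 = goertzel_power(samples, sample_rate, freq0)
    p1 = goertzel_power(samples, sample_rate, freq1)
    return 1 if p1 > p0 else 0
```

For example, with a hypothetical 8000 Hz sample rate and tones at 1200 Hz and 2200 Hz, `detect_symbol(block, 8000, 1200, 2200)` decodes one block. Each symbol costs two passes over the block with one multiply per sample, instead of computing every bin of an FFT.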
Actually, it would be an honor to send you an MR :)