Right now we have built-in HTML-to-TXT conversion based on Tika. Look into options such as https://github.com/mrded/pandoc-as-a-service or, preferably, https://cwiki.apache.org/confluence/display/TIKA/TikaServer to see whether we can support more document conversions.
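As a starting point for evaluating TikaServer, here is a minimal Python sketch that sends a document to its REST API and asks for plain text back. It assumes a TikaServer instance running locally on its default port 9998 and uses the `PUT /tika` endpoint with an `Accept: text/plain` header. The helper names `build_tika_request` and `extract_text` are hypothetical. Check the TikaServer wiki for the other endpoints before relying on this.

```python
import urllib.request

# Assumption: TikaServer is listening on localhost at its default port 9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document: bytes, content_type: str,
                       url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking TikaServer to extract plain text (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=document,
        method="PUT",
        headers={
            "Accept": "text/plain",        # ask for plain-text output
            "Content-Type": content_type,  # hint the input format to Tika
        },
    )


def extract_text(document: bytes, content_type: str,
                 url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the document to TikaServer and return the extracted text (hypothetical helper)."""
    req = build_tika_request(document, content_type, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    html = b"<html><body><p>Hello, Tika</p></body></html>"
    # Requires a running TikaServer; otherwise urlopen raises URLError.
    print(extract_text(html, "text/html"))
```

Since Tika parses many input formats (PDF, DOCX and others), the same call could in principle cover conversions beyond HTML, which is what this issue is meant to find out.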