As pointed out in stan-dev/loo#185, we only need the indices of the M largest weights (the upper tail). So instead of sorting the whole array of weights, we can sort just the upper tail, which is faster, at least for `Vector`. We can do this by replacing `sortperm` with `partialsortperm`.
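
For illustration, here is a minimal sketch of the swap. The helper names `tail_indices_full` and `tail_indices_partial` and the sample weights are hypothetical and are not the package's actual code. Only `sortperm` and `partialsortperm` from Base are assumed.

```julia
# Hypothetical helpers for illustration; `M` is the number of tail weights to keep.

# Old approach: sort every weight, then keep the last M indices.
function tail_indices_full(w::AbstractVector, M::Integer)
    perm = sortperm(w)               # full sort of all indices
    return perm[(end - M + 1):end]   # indices of the M largest weights, ascending
end

# New approach: only compute positions n-M+1 through n of the sorted order.
function tail_indices_partial(w::AbstractVector, M::Integer)
    n = length(w)
    return partialsortperm(w, (n - M + 1):n)
end

w = [0.3, 2.5, 0.1, 1.7, 0.9, 4.2]   # made-up weights with distinct values
M = 3

full_tail = tail_indices_full(w, M)
partial_tail = tail_indices_partial(w, M)

# Both give the indices of the M largest weights in ascending order of weight.
println(full_tail == partial_tail)
```

Note that `partialsortperm` with a range returns a view into its internal index vector rather than a fresh `Vector`, so call `collect` on the result if an independent array is needed.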