So I tried the big file demo, and at the end it sorts the clusters by size with reverse=True. The first (largest) cluster has empty content. I'm not sure whether that is correct, because this cluster contains about half of all the log lines. If it is correct, can anyone explain why?
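To narrow it down, here is a minimal self-contained sketch of one way an empty cluster can end up on top. It assumes (unverified against the demo) that each line's message is taken as the text after a fixed separator such as ": ". Lines without that separator then all produce an empty message and collapse into one large cluster. The sample lines, the separator, and the crude regex "template" below are hypothetical stand-ins, not the demo's actual code.

```python
import re
from collections import Counter

# Hypothetical raw log lines; the ": " header/message separator is an assumption.
raw_lines = [
    "2021-01-01 10:00:00 INFO: user alice logged in",
    "2021-01-01 10:00:01 INFO: user bob logged in",
    "2021-01-01 10:00:02 some line without the separator",
    "2021-01-01 10:00:03 another line without it",
    "2021-01-01 10:00:04 yet another one",
]

# str.partition returns (line, "", "") when the separator is missing,
# so index [2] is an empty string for those lines.
contents = [line.partition(": ")[2] for line in raw_lines]

# Crude stand-in for template mining: mask the user name only.
templates = [re.sub(r"user \w+", "user <*>", c) for c in contents]

# Count lines per template and sort by size, largest first (reverse=True).
clusters = sorted(Counter(templates).items(), key=lambda kv: kv[1], reverse=True)

for template, size in clusters:
    print(size, repr(template))
```

If the real input behaves like this, printing a few raw lines that landed in the empty cluster should show whether they lack whatever header format the demo expects.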