This function comes straight from coding hell. Its main logic consists of a single long, illegible regular expression. The regex should be split into smaller pieces, even if that means duplicating some code.
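
To illustrate the kind of split meant here, a minimal sketch follows. The reviewed pattern itself is not reproduced, so the example assumes a hypothetical task (accepting a date written either as `YYYY-MM-DD` or as `DD.MM.YYYY`); the names `ISO_DATE`, `DOTTED_DATE`, and `parse_date` are made up for this illustration. Instead of one pattern with alternations and optional groups, each format gets its own short regex and its own branch:

```python
import re

# Hypothetical stand-in for the reviewed function: two small patterns,
# one per accepted input format, instead of a single combined regex.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")       # e.g. 2024-01-31
DOTTED_DATE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")  # e.g. 31.01.2024


def parse_date(text):
    """Return (year, month, day) as ints, or None if no format matches."""
    match = ISO_DATE.fullmatch(text)
    if match:
        year, month, day = match.groups()
        return int(year), int(month), int(day)

    # The conversion below repeats the branch above; that duplication is
    # the price paid for keeping each pattern short and readable.
    match = DOTTED_DATE.fullmatch(text)
    if match:
        day, month, year = match.groups()
        return int(year), int(month), int(day)

    return None
```

Each branch can now be read, tested, and changed on its own, which is the trade-off argued for above: a few repeated lines in exchange for patterns that fit on one line each.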