C++ code parser parseXMLString may lose data when UTF-8 characters are fragmented #88

@danny-smit

Description

The generated C++ class IVEFParser has a method parseXMLString with the following signature:

bool Parser::parseXMLString(QString data, bool cont)

The parameter bool cont indicates that the parser must keep any data that does not yet contain a full XML message, so that it can be parsed again on the next call to parseXMLString. This is very useful when the data comes from a TCP stream, which is susceptible to packet fragmentation: there is no guarantee that a complete XML message will be received as a single packet, even if it was sent as a single packet. With bool cont = true, the parseXMLString method aims to be safe against packet fragmentation.

However, there is a (design) flaw in this method: the data argument is passed as a QString. A QString is a character-encoding-aware string, as opposed to raw bytes (e.g. QByteArray, std::string or char*). This can lead to a data parsing error, as follows.

Take, for example, XML that is sent as UTF-8 data over a TCP connection. As noted above, TCP is susceptible to packet fragmentation. This means that the data sent on one end may be split and received in multiple pieces on the other end. This "cutting" can occur between any two bytes that are sent. UTF-8 is a multibyte encoding that uses up to 4 bytes per character, so fragmentation can split a single character: for a 4-byte character, for instance, the first 2 bytes may end up in the first packet and the last 2 bytes in the second packet. This is normally not a problem, as long as the bytes are collected and appended properly. The bool cont flag aims to take care of collecting and appending, but this does not work with a QString.

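Currently, the contents of packet 1 are converted and added to a QString. However, the last two bytes of that packet are only half a character, so they cannot be decoded and are rejected during the conversion to QString. The character is lost even though the remaining two bytes are appended afterwards, because those two bytes are not a valid character on their own either.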
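To illustrate the point without depending on Qt, here is a minimal sketch in standard C++. The function name isStructurallyValidUtf8 and the sample message are made up for this example and are not part of the generated parser. The check is deliberately simplified: it only looks at lead and continuation bytes and does not reject overlong encodings or surrogates. It shows that neither fragment decodes on its own, while the concatenated raw bytes do.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only.
// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes. Overlong forms and surrogates are NOT checked.
bool isStructurallyValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        if (c < 0x80)                len = 1;  // ASCII
        else if ((c & 0xE0) == 0xC0) len = 2;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4;  // 11110xxx
        else return false;                     // stray continuation or invalid lead byte

        if (i + len > s.size()) return false;  // sequence is cut off at the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected 10xxxxxx
        }
        i += len;
    }
    return true;
}
```

For a message such as "<n>\xF0\x9F\x98\x80</n>" (one 4-byte character inside a tag) split after the fifth byte, the check fails for each half and succeeds for the two halves joined together, which is why the joining has to happen before any decoding.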

Therefore, collecting and appending should be done on the raw bytes, for example in a QByteArray. The signature could look like:

bool IVEFParser::parseXMLString(const QByteArray& data, bool cont)

Although some indication of the expected encoding should probably be added.
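As a rough sketch of what byte-level collecting could look like, the following standard C++ example buffers raw bytes and only hands out the prefix that ends on a UTF-8 character boundary. The names Utf8ChunkBuffer, append, pendingBytes and incompleteUtf8Tail are hypothetical and do not exist in the generated code. The sketch assumes the stream is UTF-8 and uses std::string as the byte container in place of QByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes at the end of buf that belong to a
// truncated UTF-8 sequence, or 0 if buf ends on a character boundary.
std::size_t incompleteUtf8Tail(const std::string& buf)
{
    const std::size_t n = buf.size();
    // A lead byte can be at most 3 bytes before the end of a truncated sequence.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(buf[n - back]);
        if ((c & 0xC0) == 0x80) continue;       // continuation byte, keep looking back

        std::size_t need = 1;                   // ASCII or invalid byte: treat as complete
        if ((c & 0xE0) == 0xC0)      need = 2;
        else if ((c & 0xF0) == 0xE0) need = 3;
        else if ((c & 0xF8) == 0xF0) need = 4;
        return need > back ? back : 0;          // truncated only if bytes are missing
    }
    return 0;  // no lead byte nearby: leave validation to the decoder
}

// Hypothetical buffer that collects raw bytes across calls.
class Utf8ChunkBuffer {
public:
    // Appends a chunk and returns all bytes that end on a character boundary.
    // A trailing partial character stays buffered until the next call.
    std::string append(const std::string& chunk)
    {
        pending_ += chunk;
        const std::size_t tail = incompleteUtf8Tail(pending_);
        assert(tail <= pending_.size());
        std::string complete = pending_.substr(0, pending_.size() - tail);
        pending_.erase(0, complete.size());
        return complete;
    }

    std::size_t pendingBytes() const { return pending_.size(); }

private:
    std::string pending_;
};
```

In a Qt-based parser the same idea would presumably apply to a QByteArray member, with the complete prefix converted via QString::fromUtf8 only after the boundary check. Simply buffering all raw bytes until a full XML message is available and decoding once would work as well, and may be simpler.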
