Fucked Up Text in Code Snippet

The story of how I lost an hour of my life because of a weird unicode copy-&-paste character in a code snippet.

Today I took my first steps with Qt QML/Qt Quick. I read a bit about it, picked a very simple example and tried to build it. Even though it was not much text, for a start I simply copied a few lines from an article into my text files and tried to build them…

Then I suddenly hit an inexplicable error during compilation:

Generating qrc_qml.cpp
CUSTOMBUILD : RCC Parse error : 'C:/devel/_test/QML-Test/src/qml.qrc' Line: 4 Column: 29 [Invalid XML name.] [...]

Huh, Invalid XML name? 😕

Looking at the mere six lines of the QRC file, I couldn’t see anything weird. So I checked and double-checked the syntax, looked for typos and at other places in the build process – but couldn’t find anything suspicious. I searched for 30 minutes or so (probably more…) on the internet for that error; read and re-read the Qt and CMake documentation – still no enlightenment.

Then, while trying out all the possibilities, I changed the text in the file a bit more – and discovered something odd…

The slashes marked red are the ones from the copied text; the slash marked green is one I added manually later, while playing around. Only by chance did they align vertically so perfectly that I could spot a slight mismatch:

The file/text

At first, I thought my eyes were just too tired, but after looking at it in a hex editor, I understood what the cause of the error was.

Hex editor view: Bad

Hex editor view: Good

Usually, I have the habit of first pasting text that I copy from websites into a plain text editor before I insert or process it further (to sanitize it and to get rid of weird font and encoding artifacts – Hah! 😒 ). I did do that for this snippet as well, but I normally use UTF-8 encoded text files. A UTF-8 file can represent any Unicode character, so the paste kept the odd one intact, and by doing that, I guess, I dragged this weird character over… 😢

Some more explanation:
The red-marked slashes were encoded as 0xE2 0x81 0x84, which is the UTF-8 encoding of the Unicode character FRACTION SLASH (U+2044). That is not the common and simple SLASH (SOLIDUS) (U+002F, the single byte 0x2F), which RCC would have expected and accepted! 😮 😠
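In hindsight, a few lines of script would have found the culprit much faster than my eyes did. Here is a minimal Python sketch that lists every non-ASCII character in a piece of text together with its position and Unicode name. The function name find_non_ascii and the sample line are made up for illustration; the sample is not the content of my real QRC file, and a tool like RCC may count columns differently.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, character, Unicode name) for each non-ASCII character.

    Line and column are 1-based and counted in characters, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, ch, name))
    return hits


# Hypothetical line for illustration only (not taken from the real file).
# "\u2044" is FRACTION SLASH, the lookalike that replaced the normal "/".
sample = "<file>qml\u2044main.qml</file>"

# The UTF-8 encoding of U+2044 is the byte sequence seen in the hex editor.
print("\u2044".encode("utf-8").hex())  # e28184

for line_no, col_no, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, column {col_no}: U+{ord(ch):04X} {name}")
```

To check a real file, something like find_non_ascii(open(path, encoding="utf-8").read()) should do, assuming the file really is UTF-8. For files that legitimately contain non-ASCII text, the list would of course need some filtering.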

So, after replacing those three faulty characters, all was fine again. But what an irritation and waste of time it was… 😩