Two different spaces

While working on the reporting system, I needed to pass a test that compared .xlsx spreadsheets built with our system against those created with BIRT. Scripts compared the reports row by row. I identified and resolved all the apparent differences, but the test still failed.

The cells in both files looked the same. They contained numbers with spaces as thousands separators.

| 58 971 713 | 58 971 713 | Are they different?

I decided to copy the values into my IDE and compare them with Ctrl + F. The search should have found two matches, but it found only one. The cell values were not the same! I started comparing the strings one symbol at a time and found that the difference was in the spaces.

A quick test written in JS showed the difference:

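// Value from the BIRT report; the non-breaking spaces are written as
// \u00A0 escapes so that they survive copying.
'58\u00A0971\u00A0713'.charCodeAt(2);
// > 160

// Value from our report, with regular spaces.
'58 971 713'.charCodeAt(2);
// > 32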

BIRT used a non-breaking space as the thousands separator (which was correct), while my reporting system used a regular space.
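
The symbol-by-symbol comparison described above can be automated. The following is only a minimal sketch: `firstDifference` and `describeChar` are hypothetical helper names, not part of the reporting system or of BIRT, and the two sample strings are rebuilt from the cell values shown earlier.

```javascript
// Hypothetical helper: returns the index of the first position where
// two strings differ, or -1 if the strings are identical.
function firstDifference(a, b) {
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) {
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: formats the UTF-16 code unit at `index` as U+XXXX.
function describeChar(text, index) {
  if (index < 0 || index >= text.length) {
    return 'none';
  }
  const hex = text.charCodeAt(index).toString(16).toUpperCase();
  return 'U+' + hex.padStart(4, '0');
}

const fromBirt = '58\u00A0971\u00A0713'; // non-breaking spaces
const fromOurSystem = '58 971 713';      // regular spaces

const diffIndex = firstDifference(fromBirt, fromOurSystem);
console.log(diffIndex, describeChar(fromBirt, diffIndex), describeChar(fromOurSystem, diffIndex));
// prints: 2 U+00A0 U+0020
```

Printing the code point rather than the character itself is the important part, because both characters render as the same blank.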

So there are two kinds of space character: the space and the non-breaking space. They look identical, but they are different symbols and do not compare as equal. The non-breaking space is used to prevent an automatic line break at its position.

| Name | Decimal code | NCR | HEX code | Unicode |
|---|---|---|---|---|
| Space ( ) | 32 | `&#32;` | 20 | U+0020 |
| Non-breaking space ( ) | 160 | `&#160;` | A0 | U+00A0 |

I didn’t know about this because I hadn’t worked with typography much. After I changed the separator symbol, all the tests turned green.
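
The post does not show how the separator was changed in the reporting system, so the snippet below is only an illustration of the idea. It assumes non-negative integers, and `groupThousands` is a hypothetical name.

```javascript
const NBSP = '\u00A0'; // non-breaking space, U+00A0

// Hypothetical helper: groups the digits of a non-negative integer in
// threes, using a non-breaking space as the separator by default.
function groupThousands(value, separator = NBSP) {
  // Insert the separator at every position that is followed by a
  // multiple of three digits up to the end of the number.
  return String(value).replace(/\B(?=(\d{3})+(?!\d))/g, separator);
}

console.log(groupThousands(58971713) === '58\u00A0971\u00A0713'); // true
console.log(groupThousands(58971713) === '58 971 713');           // false
console.log(groupThousands(58971713, ' ') === '58 971 713');      // true
```

Keeping the separator in a named constant makes the choice visible in code, which the literal character never is.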

Conclusion

Different symbols can look the same (for example, Latin c and Cyrillic с). The space is one of them too: it can be a regular space or a non-breaking space.

Even if these symbols look the same, they are different to computers and software. This detail, invisible to the human eye, can be the cause of an error.
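
The Latin and Cyrillic example can be checked the same way as the spaces. In this small sketch both letters are written as escapes so that the difference is visible in the source, and `toCodePoints` is a hypothetical helper name.

```javascript
const latinC = '\u0063';     // Latin small letter c
const cyrillicEs = '\u0441'; // Cyrillic small letter es

// Hypothetical helper: lists the code points of a string as U+XXXX labels.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(latinC === cyrillicEs);    // false
console.log(toCodePoints(latinC));     // [ 'U+0063' ]
console.log(toCodePoints(cyrillicEs)); // [ 'U+0441' ]
```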

BTW, there are more types of spaces, but they don’t look like the common space. Here they are:

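| Name | Decimal code | NCR | HEX code | Unicode |
|---|---|---|---|---|
| En space ( ) | 8194 | `&#8194;` | 2002 | U+2002 |
| Em space ( ) | 8195 | `&#8195;` | 2003 | U+2003 |
| Narrow non-breaking space ( ) | 8239 | `&#8239;` | 202F | U+202F |
| Figure space ( ) | 8199 | `&#8199;` | 2007 | U+2007 |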
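
To spot any of the spaces from the two tables in a string, a lookup by code point is enough. This is a rough sketch and not something from the original test scripts: `SPACE_NAMES`, `listSpaces`, and `normalizeSpaces` are hypothetical names, and only the six characters listed in this post are covered.

```javascript
// The space characters from the two tables, keyed by code point.
const SPACE_NAMES = new Map([
  [0x0020, 'Space'],
  [0x00A0, 'Non-breaking space'],
  [0x2002, 'En space'],
  [0x2003, 'Em space'],
  [0x2007, 'Figure space'],
  [0x202F, 'Narrow non-breaking space'],
]);

// Hypothetical helper: reports the position and name of every listed
// space character found in the text.
function listSpaces(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const name = SPACE_NAMES.get(text.charCodeAt(i));
    if (name !== undefined) {
      found.push({ index: i, name });
    }
  }
  return found;
}

// Hypothetical helper: replaces the listed special spaces with a
// regular space, for comparisons where the kind of space is irrelevant.
function normalizeSpaces(text) {
  return text.replace(/[\u00A0\u2002\u2003\u2007\u202F]/g, ' ');
}

console.log(listSpaces('58\u00A0971 713'));
// [ { index: 2, name: 'Non-breaking space' }, { index: 6, name: 'Space' } ]
console.log(normalizeSpaces('58\u00A0971\u2007713') === '58 971 713'); // true
```

Normalizing is only appropriate when the kind of space does not matter. In the report comparison above it did matter, so there the right fix was to emit the correct character rather than to hide the difference.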