
Tips: Arduino/Teensy + Visual Studio + Special characters èåæäøö€µ° / ISO-8859-1/-15 / UTF-8

Here is a summary of some experiments with accented and special characters on the Nextion display, using an Arduino/Teensy microcontroller and the Visual Studio IDE (with the Visual Micro plugin).

See the attached PDF file. The document is too long to paste directly into this post.


If anyone has a different experience or better code for the conversion function (lookup table), please share it with us :-)


Here are some useful sources: 

http://playground.arduino.cc/Code/UTF-8 

http://playground.arduino.cc/Main/Utf8ascii 

Visual Studio / Micro: http://www.visualmicro.com/page/User-Guide.aspx?doc=Non-ASCII.html  

ZI Font Editor v. -0.08: http://support.iteadstudio.com/support/discussions/topics/11000003343/page/2  

 

Here is the code from chapter 3.3.2:  


// [In global variables and functions]

// ========== ========== ========== ========== ========== ========== 
// Conversion of text from UTF-8 to ISO-8859-xx

// Character to print if character code is not recognized
byte charError = 0x5F;  // 0x5F = "_"
                        // 0x3F = "?"
                        // 0x20 = " "

// Lookup tables
// Called by function utf8ToByte_2()

// UTF-8:
// List of special characters with UTF-8 codes that don't start with 0xC2 or 0xC3
byte utf8Characters[][4] = {
  { 0xC5, 0x92 },  // Œ  C5 92  U+0152
  { 0xC5, 0x93 },  // œ  C5 93  U+0153
  { 0xC5, 0xA0 },  // Š  C5 A0  U+0160
  { 0xC5, 0xA1 },  // š  C5 A1  U+0161
  { 0xC5, 0xB8 },  // Ÿ  C5 B8  U+0178
  { 0xC5, 0xBD },  // Ž  C5 BD  U+017D
  { 0xC5, 0xBE },  // ž  C5 BE  U+017E
  { 0xE2, 0x82, 0xAC },  // €  E2 82 AC  U+20AC
  { 0xEF, 0xBF, 0xBD }   // �  EF BF BD  U+FFFD
};

// ISO-8859-1:
// Characters must be in the same order as in the table utf8Characters
// Set charError if the character is not used in this charset
byte byteIso8859_1[] = {
  charError,  // Œ  C5 92  U+0152
  charError,  // œ  C5 93  U+0153
  charError,  // Š  C5 A0  U+0160
  charError,  // š  C5 A1  U+0161
  charError,  // Ÿ  C5 B8  U+0178
  charError,  // Ž  C5 BD  U+017D
  charError,  // ž  C5 BE  U+017E
  charError,  // €  E2 82 AC  U+20AC
  charError   // �  EF BF BD  U+FFFD
};

// ISO-8859-15:
// Characters must be in the same order as in the table utf8Characters
// Set charError if the character is not used in this charset
byte byteIso8859_15[] = {
  0xBC,  // Œ  C5 92  U+0152
  0xBD,  // œ  C5 93  U+0153
  0xA6,  // Š  C5 A0  U+0160
  0xA8,  // š  C5 A1  U+0161
  0xBE,  // Ÿ  C5 B8  U+0178
  0xB4,  // Ž  C5 BD  U+017D
  0xB8,  // ž  C5 BE  U+017E
  0xA4,  // €  E2 82 AC  U+20AC
  charError  // �  EF BF BD  U+FFFD
};

// TEST
// Windows-1252:
// Characters must be in the same order as in the table utf8Characters
// Set charError if the character is not used in this charset
byte byteWindows1252[] = {
  0x8C,  // Œ  C5 92  U+0152
  0x9C,  // œ  C5 93  U+0153
  0x8A,  // Š  C5 A0  U+0160
  0x9A,  // š  C5 A1  U+0161
  0x9F,  // Ÿ  C5 B8  U+0178
  0x8E,  // Ž  C5 BD  U+017D
  0x9E,  // ž  C5 BE  U+017E
  0x80,  // €  E2 82 AC  U+20AC
  charError  // �  EF BF BD  U+FFFD
};

// ========== 
// Function (2)
// Search for matching UTF-8 code in table utf8Characters
// and return corresponding 1-byte character in ISO-8859-xx
// bytesIn[] = Array of incoming bytes with UTF-8 encoding
// Function called by function utf8ToByte()
byte utf8ToByte_2(const byte bytesIn[], const uint16_t charset) {
  // bytesIn is received as a pointer, so sizeof(bytesIn) would give the
  // pointer size, not the number of bytes in the sequence.
  // Derive the sequence length from the UTF-8 lead byte instead.
  uint8_t len;
  if ((bytesIn[0] & 0xE0) == 0xC0) len = 2;       // 110xxxxx => 2-byte character
  else if ((bytesIn[0] & 0xF0) == 0xE0) len = 3;  // 1110xxxx => 3-byte character
  else return (charError);  // 4-byte or malformed sequence: not in the table

  // Number of rows in the table
  // (sizeof(utf8Characters) alone is the size in bytes, not in rows)
  const uint16_t rows = sizeof(utf8Characters) / sizeof(utf8Characters[0]);

  for (uint16_t i = 0; i < rows; i++)
  {
    // Compare each byte of the sequence with row utf8Characters[i]
    uint8_t flagMatch = 1;
    for (uint8_t j = 0; j < len; j++)
    {
      if (bytesIn[j] != utf8Characters[i][j])
      {
        flagMatch = 0;
        break;  // break for j
      }
    }  // end for
    // If all bytes are equal, return 1-byte code according to ISO-8859-xx
    if (flagMatch == 1)
    {
      switch (charset)
      {
        case 1:  // ISO-8859-1
          if (i < sizeof(byteIso8859_1)) return (byteIso8859_1[i]);
          else return (charError);
        case 15:  // ISO-8859-15
          if (i < sizeof(byteIso8859_15)) return (byteIso8859_15[i]);
          else return (charError);
        case 1252:  // Windows-1252
          if (i < sizeof(byteWindows1252)) return (byteWindows1252[i]);
          else return (charError);
        default:
          return (charError);
      }  // end switch (charset)
    }  // end if (flagMatch)
    // else next for
  }  // end for rows in utf8Characters
  // else if no matching character found
  return (charError);
}  // end function utf8ToByte_2

// ========== 
// Function (3)
// **** UTF-8-Decoder: convert UTF-8-string to ISO-8859-xx / Windows-1252 ****
// based on http://playground.arduino.cc/Main/Utf8ascii  

static byte byte1;  // Last byte buffer
static byte byte2;  // Second last byte buffer

// Convert a single character from UTF-8 to ISO-8859-1/-15 or Windows-1252
// Parameter: charset = 1 / 15 / 1252: Charset to use for conversion:
//   * 1 = ISO-8859-1
//   * 15 = ISO-8859-15
//   * 1252 = Windows-1252
// Return "0" if a byte has to be ignored
// Originally: byte utf8ascii(byte ascii) {
byte utf8ToByte(const byte byteIn, const uint16_t charset) {
  // if byteIn between 0x00 and 0x7F  =>  1-byte character
  // if byteIn between 0x80 and 0xFF  =>  part of a 2-/3-/4-byte character

  if (byteIn < 128)  // Standard ASCII-byte 0..0x7F handling  
  {
    byte1 = 0;
    byte2 = 0;
    return(byteIn);
  }

  // else byteIn >= 128
  // get previous input
  byte lastByte = byte1;   // get last byte
  byte lastByte2 = byte2;  // get second last byte
  // Decrement byte order for next loop
  byte2 = byte1;   // last byte will be second last byte in next loop
  byte1 = byteIn;  // current byte will be last byte in next loop

  // if lastByte between 0xC0 and 0xDF  =>  2-byte character
  // if lastByte between 0x80 and 0xBF  =>  3-byte or 4-byte character
  if ((lastByte >= 0xC0) && (lastByte <= 0xDF)) {
    switch (lastByte)     // conversion depending on previous UTF-8-byte
    {
      case 0xC2: 
        byte1 = 0; byte2 = 0;
        return (byteIn); break;  // Remove C2, keep only byteIn
      case 0xC3: 
        byte1 = 0; byte2 = 0;
        return (byteIn | 0xC0); break;  // Change bits 6 and 7
        // Ä = C3 84 ==> 84 | C0 = 1000.0100 | 1100.0000 = 1100.0100 = C4 = Ä
      default:
        byte1 = 0; byte2 = 0;
        //return (utf8ToByte_2({ lastByte, byteIn }, charset));
        byte bytesIn[] = { lastByte, byteIn };
        return (utf8ToByte_2(bytesIn, charset));
        //return (utf8ToByte_2b(bytesIn, charset));  // TEST
    }  // end switch lastByte
  }  // end if lastByte 0xC0-0xDF
  // else if lastByte between 0x80 and 0xBF  =>  3-byte or 4-byte characters
  else if ((lastByte >= 0x80) && (lastByte <= 0xBF)) {
    byte1 = 0; byte2 = 0;
    byte bytesIn[] = { lastByte2, lastByte, byteIn };
    return (utf8ToByte_2(bytesIn, charset));
    //return (utf8ToByte_2b(bytesIn, charset));  // TEST

  }  // end if lastByte

  return (0);  // otherwise: return zero (byte has to be ignored)
}  // end function utf8ToByte

// ========== 
// Function (4)
// Convert Arduino String object from UTF-8 to ISO-8859-1/-15 / Windows-1252
// Function called by Serialxx.print()
// Originally: String utf8ascii(String s)
// in http://playground.arduino.cc/Main/Utf8ascii  
String utf8ToByte_Str(String s, const uint16_t charset)
{
  String r="";
  char c;
  for (int i=0; i<s.length(); i++)
  {
    c = utf8ToByte(s.charAt(i), charset);
    if (c!=0) r+=c;
  }
  return r;
}

// ========== 
// Function (5)
// In place conversion of C string from UTF-8 to ISO-8859-1/-15 / Windows-1252
// The converted string is shorter than UTF-8
// Originally: void utf8ascii(char* s)
// in http://playground.arduino.cc/Main/Utf8ascii  
void utf8ToByte_CStr(char* s, const uint16_t charset)
{
  int k=0;
  char c;
  for (int i=0; i<strlen(s); i++)  // strlen(s) = without '\0'
  {
    c = utf8ToByte(s[i], charset);
    if (c!=0)
    {
      s[k++]=c;  // s[k]=c; k++;
      // i = 0 ==> k = 0
      // k is incremented AFTER char c has been stored in s[k]
      // If c==0, no character is stored and k is not incremented
    }
  }
  s[k]='\0';  // '\0' terminator, k <= strlen(s)
}

// ========== 
// (7)

#define SerialNxtn Serial2  // <<<<<<<<<<<<<<<<<<<<<<<<< Adapt to your project 

// Charset to use for conversion of text to Nextion display:
//   * 1  = ISO-8859-1
//   * 15 = ISO-8859-15
uint16_t charsetNxtn = 15;  // ISO-8859-15

// Function to change "txt" attribute of a Nextion component
// Parameters:
//   * NxtnCompName: Nextion component name ("t0", "bLight"...). ASCII only (codes 0 to 127).
//   * text: Text to set in the component. With special characters. UTF-8.
void Nxtn_SetTxt(const char *NxtnCompName, const char *text)
{
  SerialNxtn.print(NxtnCompName);
  SerialNxtn.print(".txt=");
  SerialNxtn.write(0x22);  // = SerialNxtn.print("\"");
  //SerialNxtn.print(text);
  SerialNxtn.print(utf8ToByte_Str(text, charsetNxtn));
  SerialNxtn.write(0x22);  // "
  SerialNxtn.write(0xFF);
  SerialNxtn.write(0xFF);
  SerialNxtn.write(0xFF);
}

// ========== ==========
// [In setup() or loop()]

Nxtn_SetTxt("t0", "I går var det snø og -5° i Genève."); 
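
A different way to organise the conversion, given only as a rough sketch: decode each UTF-8 sequence to its Unicode code point first, then map the code point to an ISO-8859-15 byte. The code below is plain C++ using std::string so it can be tried on a desktop compiler; it is not the code from the PDF, and the names kCharError, Latin9Entry, codePointToLatin9, utf8ToLatin9 and buildSetTxtFrame are made up for this illustration. It does not reject overlong UTF-8 forms. Porting it to Arduino String or char buffers is left open.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

const uint8_t kCharError = 0x5F;  // '_' , same fallback as charError in the post

// One row pairs a Unicode code point with its ISO-8859-15 byte, so the two
// cannot get out of order (hypothetical layout, not from the original post).
struct Latin9Entry { uint16_t codePoint; uint8_t latin9; };

// The eight positions where ISO-8859-15 differs from ISO-8859-1.
const Latin9Entry kLatin9Table[] = {
  {0x20AC, 0xA4}, {0x0160, 0xA6}, {0x0161, 0xA8}, {0x017D, 0xB4},
  {0x017E, 0xB8}, {0x0152, 0xBC}, {0x0153, 0xBD}, {0x0178, 0xBE},
};

uint8_t codePointToLatin9(uint32_t cp) {
  for (const Latin9Entry &e : kLatin9Table) {
    if (e.codePoint == cp) return e.latin9;
  }
  // U+00A4, U+00A6 ... lost their slot in ISO-8859-15.
  for (const Latin9Entry &e : kLatin9Table) {
    if (e.latin9 == cp) return kCharError;
  }
  if (cp <= 0xFF) return static_cast<uint8_t>(cp);  // same byte as in ISO-8859-1
  return kCharError;
}

std::string utf8ToLatin9(const std::string &in) {
  std::string out;
  size_t i = 0;
  while (i < in.size()) {
    const uint8_t b = static_cast<uint8_t>(in[i]);
    uint32_t cp = 0;
    size_t extra = 0;  // number of continuation bytes expected
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else {  // stray continuation byte or invalid lead byte
      out += static_cast<char>(kCharError);
      ++i;
      continue;
    }
    ++i;
    bool ok = true;
    for (size_t k = 0; k < extra; ++k) {
      if (i >= in.size() || (static_cast<uint8_t>(in[i]) & 0xC0) != 0x80) {
        ok = false;  // truncated sequence
        break;
      }
      cp = (cp << 6) | (static_cast<uint8_t>(in[i]) & 0x3F);
      ++i;
    }
    out += static_cast<char>(ok ? codePointToLatin9(cp) : kCharError);
  }
  return out;
}

// Bytes that a call like Nxtn_SetTxt() would put on the serial line:
// name.txt="text" followed by three 0xFF bytes.
std::string buildSetTxtFrame(const std::string &component,
                             const std::string &utf8Text) {
  std::string frame = component + ".txt=\"" + utf8ToLatin9(utf8Text) + "\"";
  frame.append(3, static_cast<char>(0xFF));
  return frame;
}
```

Keeping the code point and the target byte in the same row means a second charset (for example Windows-1252) would need its own table, but no table can silently drift out of step with another.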

 

I hope this can help. 


1 Comment

Hey Raphael,


Thanks for the shout out to the ZI Font Editor.


My understanding may make things either simpler or more complex, so here goes.


The Nextion handles characters one byte at a time (except with indexed fonts).

In almost all cases with MCUs, it is a byte that is transferred to the Nextion.

Each byte triggers a lookup:

    Subtract decimal 32 from the byte value (0 to 255).

    If the result is not negative, locate the font data matrix at that index and render it.


In most .zi fonts, the ASCII set is generated for codes 32 to 126 (95 characters),

while the ISO-8859-X set is generated for codes 32 to 255 (224 characters).


It is possible to create a truncated font set that contains only the digits,

ending at the last digit, code 57: from 32 to 57, which is 26 characters.

The digit codes 48 to 57 still follow the Nextion lookup above,

so digits will render, but letters will fail, as expected.
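
The lookup and the font ranges described above can be modelled roughly as follows. This is only a sketch of the behaviour as I describe it here, not Nextion firmware code; glyphSlot and glyphCount are made-up names, and the first/last codes are passed in as parameters.

```cpp
#include <cstdint>

// Number of glyphs in a font generated for codes firstCode..lastCode.
int glyphCount(int firstCode, int lastCode) {
  return lastCode - firstCode + 1;
}

// Slot of the font data matrix for byte b, or -1 when the font has no glyph
// for it (below the first code, or beyond the last generated code).
int glyphSlot(uint8_t b, int firstCode, int lastCode) {
  const int slot = static_cast<int>(b) - firstCode;
  if (slot < 0 || b > lastCode) return -1;
  return slot;
}
```

With these definitions, glyphCount(32, 126) is 95 and glyphCount(32, 255) is 224. For the truncated font, glyphSlot('7', 32, 57) is 23, while glyphSlot('A', 32, 57) is -1 because 65 lies past the last generated code.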


On the MCU level, everything is bytes.

This is why, with ISO-8859, the "character" itself is not looked up.

In different regions, byte 240 means different things,

but to the Nextion it is merely 240 - 32 -> render matrix 208.


Which glyph sits in that matrix is fixed by the ISO-8859 part chosen in the HMI settings.

After that point it is static inside the TFT file; it doesn't change.

When the same byte is sent assuming a different region's ISO-8859 part, it will display differently,

because 240 means different things between regions.
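
To make the point about byte 240 concrete, here is a small illustrative table of what that one byte stands for in a few ISO-8859 parts. The function name codePointOfByte240 is made up for this example and only a handful of parts are listed.

```cpp
#include <cstdint>

// Unicode code point that byte 0xF0 (decimal 240) represents in some
// ISO-8859 parts. Returns 0 for parts not listed in this sketch.
uint32_t codePointOfByte240(int isoPart) {
  switch (isoPart) {
    case 1:  return 0x00F0;  // ð  ISO-8859-1  (Latin-1)
    case 2:  return 0x0111;  // đ  ISO-8859-2  (Latin-2)
    case 5:  return 0x2116;  // №  ISO-8859-5  (Cyrillic)
    case 7:  return 0x03C0;  // π  ISO-8859-7  (Greek)
    case 15: return 0x00F0;  // ð  ISO-8859-15 (Latin-9)
    default: return 0;
  }
}
```

In every case the byte on the wire is the same, so the MCU has to send bytes in the same ISO-8859 part that the font in the TFT file was generated for.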


