FILEFLEX PROGRAMMER MANUAL

CHAPTER 20
Character Translation

If you're using a language whose character set includes special characters (e.g., accented letters, umlauts, and other diacritical marks), you may run into a problem when moving documents from Macintosh to Windows or vice versa. Although ASCII cleanly defines the US English characters "a-zA-Z", the values assigned to special characters are not uniform across platforms.

FileFlex user Antonio Lucena of Madrid, Spain describes the same conversion issue as it applies to DOS and Windows files:

"The problem is that Windows uses a different character set than MS-DOS (and the databases created with dBASE). MS-DOS uses the OEM character set, and Windows uses ANSI. For example, in OEM, a diacritical "e" is numbered 130, but in ANSI, the same "e" is numbered 233. The same problem appears when you take a document (with diacritical vowels in it) made with the EDIT tool from MS-DOS and try to open it with the WRITE tool from Windows without converting it first."
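
The numbers in the message above can be checked outside FileFlex. The following is a small Python sketch, not FileFlex code. It assumes that "OEM" means code page 437, that "ANSI" means code page 1252, and that the Macintosh side uses the Mac Roman encoding; none of these code page names appear in the message itself.

```python
# Byte value of the accented "e" (e-acute) in three legacy single-byte encodings.
# Assumption: OEM = cp437, ANSI = cp1252, Macintosh = mac_roman.
E_ACUTE = "\u00e9"

oem = E_ACUTE.encode("cp437")[0]       # MS-DOS OEM character set
ansi = E_ACUTE.encode("cp1252")[0]     # Windows ANSI character set
mac = E_ACUTE.encode("mac_roman")[0]   # classic Macintosh character set

print(oem, ansi, mac)  # 130 233 142
```

The same letter has three different byte values, which is why a file moved between these platforms needs a translation step.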

Note: The above message illustrates the value of the FileFlex discussion groups. Another user had discovered the translation problem, and because the resulting dialog was made public via the FileFlex discussion boards, available from FileFlex HQ at http://www.component-net.com, Antonio was able to see the message and contribute his feedback. With feedback from him and others, we were able to identify the need for the DBTranslateChars function described below.

To use FileFlex character-level translation, you develop a translation table that describes the new and old values and pass it to FileFlex along with a container of characters to be translated.

Setting up a character translation table is straightforward. You need to build a string consisting of 256 characters. The position in the string corresponds to the value of the old character, and the character stored at that position is the new character. For example, to translate character value 130 to 233, place character 233 at position 130.

Note: FileFlex considers the first character in the string to be "position 0". Also note that you cannot place a 0 into any character position. To leave a character untranslated, place either that character's own value or the value 255 into its position.
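
The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the table semantics, not how FileFlex is implemented; the function name `translate` is hypothetical, and the sketch does not model any special meaning FileFlex may give to the value 255.

```python
def translate(data: bytes, table: bytes) -> bytes:
    """Replace each byte with the table entry found at that byte's value."""
    if len(table) != 256:
        raise ValueError("translation table must have exactly 256 entries")
    return bytes(table[b] for b in data)


# Start from a table where every value maps to itself,
# with 255 in position 0 because a 0 entry is not allowed.
table = bytearray(range(256))
table[0] = 255
table[130] = 233  # old value 130 becomes new value 233

result = translate(bytes([72, 130]), bytes(table))
print(list(result))  # [72, 233]
```

Python sequences are indexed from 0, so the position in the table is the character value itself. In the scripts in this chapter, strings are indexed from 1, which is why they use `i+1`.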

Creating Character Translation Utility Scripts

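The best way to create the character translation table is to write a simple utility script. Here's an example script that builds a table containing every character value in order, so that each character translates to itself (position 0 holds 255, because a 0 is not allowed):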

on buildTranslateTable_ASCIIX
  global ASCIIX
  put "" into theTable
  repeat with i = 0 to 255
    if i = 0 then
      put numToChar(255) after theTable -- use 255 in byte 0
    else
      put numToChar(i) after theTable -- position in table
    end if
  end repeat
  put theTable into ASCIIX
end buildTranslateTable_ASCIIX

Note that the handler is named "buildTranslateTable_ASCIIX". We've developed a convention in which the routine that builds a translation table is named "buildTranslateTable_" followed by the name of the translation itself. To clearly identify character translation tables, we also end every translation table name with an X ("X" for "Xlate", an often-used abbreviation for "translate"). The translation table is placed in a global variable with the same name as the translation. So, for a translation table that converts to Windows diacriticals, we recommend naming the handler "buildTranslateTable_WinCharX" and the global variable containing the translation table "WinCharX".
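
To give an idea of what a table such as WinCharX might contain, here is a Python sketch that derives an OEM-to-ANSI table from Python's built-in codecs. It is not FileFlex code. It assumes OEM means code page 437 and ANSI means code page 1252, and the function name `build_oem_to_ansi_table` is hypothetical. Characters with no ANSI equivalent (box-drawing characters, for example) are simply left unchanged, which is a simplification.

```python
def build_oem_to_ansi_table() -> bytes:
    """Build a 256-entry table mapping cp437 byte values to cp1252 byte values."""
    table = bytearray(256)
    for value in range(256):
        char = bytes([value]).decode("cp437")
        try:
            table[value] = char.encode("cp1252")[0]
        except UnicodeEncodeError:
            table[value] = value  # no ANSI equivalent: leave the value unchanged
    table[0] = 255  # a 0 entry is not allowed, so use 255 in position 0
    return bytes(table)


WIN_CHAR_X = build_oem_to_ansi_table()
print(WIN_CHAR_X[130])                  # 233
print(b"caf\x82".translate(WIN_CHAR_X))  # b'caf\xe9'
```

The values in the resulting table could then be copied into a handler that follows the naming convention above.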

Here's an example routine that converts upper case to lower case (and the reverse):

on buildTranslateTable_CaseReverseX
  global CaseReverseX, ASCIIX
  buildTranslateTable_ASCIIX
  put ASCIIX into theTable
  -- map upper case A-Z (65-90) to lower case a-z (97-122)
  repeat with i = 65 to 90
    put numToChar(i+32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  -- map lower case a-z (97-122) to upper case A-Z (65-90)
  repeat with i = 97 to 122
    put numToChar(i-32) into char i+1 of theTable
  end repeat
  put theTable into CaseReverseX
end buildTranslateTable_CaseReverseX
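
One property of a case-reversing table is that applying it twice gives back the original text. The Python sketch below mirrors the handler above to show this; it is an illustration only, and the names `build_case_reverse_table` and `CASE_REVERSE` are hypothetical.

```python
def build_case_reverse_table() -> bytes:
    """Build a 256-entry table that swaps ASCII upper and lower case."""
    table = bytearray(range(256))
    table[0] = 255                 # a 0 entry is not allowed
    for i in range(65, 91):        # 'A'..'Z' become 'a'..'z'
        table[i] = i + 32
    for i in range(97, 123):       # 'a'..'z' become 'A'..'Z'
        table[i] = i - 32
    return bytes(table)


CASE_REVERSE = build_case_reverse_table()
swapped = b"FileFlex".translate(CASE_REVERSE)
print(swapped)                          # b'fILEfLEX'
print(swapped.translate(CASE_REVERSE))  # b'FileFlex'
```

No `+1` offset is needed here because Python indexes from 0.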

The above routine reverses the case, so an upper case "A" becomes a lower case "a" and vice versa. To create a routine that always converts to upper case, make both sets of characters upper case. Likewise, to create a routine that always converts to lower case, make both sets of characters lower case. Here's an UpperX routine:

on buildTranslateTable_UpperX
  global UpperX, ASCIIX
  buildTranslateTable_ASCIIX
  put ASCIIX into theTable
  -- map lower case a-z (97-122) to upper case A-Z (65-90)
  repeat with i = 97 to 122
    put numToChar(i-32) into char i+1 of theTable
    -- using i+1 above because strings begin at 1, not 0
  end repeat
  put theTable into UpperX
end buildTranslateTable_UpperX
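
The lower case counterpart follows the same pattern with the other range of characters. Here is a Python sketch of what such a table would contain; it is not FileFlex code, and the names `build_lower_table` and `LOWER` are hypothetical.

```python
def build_lower_table() -> bytes:
    """Build a 256-entry table that maps ASCII upper case to lower case."""
    table = bytearray(range(256))
    table[0] = 255             # a 0 entry is not allowed
    for i in range(65, 91):    # 'A'..'Z' become 'a'..'z'
        table[i] = i + 32
    return bytes(table)        # 'a'..'z' already map to themselves


LOWER = build_lower_table()
print(b"Hello, World".translate(LOWER))  # b'hello, world'
```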

WARNING: Make absolutely certain you fill in all 256 bytes. If you don't, FileFlex could scan beyond the end of the translation table, with unpredictable results, and your program could terminate abnormally.

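Because a short table or a stray 0 entry can cause trouble, it may be worth checking a table before using it. The following Python sketch shows the two checks implied by this chapter (exactly 256 entries, no 0 values). The function name `check_table` is hypothetical; the same checks could be written in your own scripting language.

```python
def check_table(table: bytes) -> list:
    """Return a list of problems found in a translation table (empty if none)."""
    problems = []
    if len(table) != 256:
        problems.append(f"table has {len(table)} entries, expected 256")
    zero_positions = [i for i, value in enumerate(table) if value == 0]
    if zero_positions:
        problems.append(f"zero value at position(s) {zero_positions}")
    return problems


good = bytearray(range(256))
good[0] = 255
print(check_table(bytes(good)))        # []
print(check_table(bytes(range(256))))  # ['zero value at position(s) [0]']
print(check_table(b"abc"))             # ['table has 3 entries, expected 256']
```
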
Translating Characters Using FileFlex

You can use FileFlex to translate character sets within a text container using the DBTranslateChars function. DBTranslateChars takes two parameters: the string to be translated and the pre-built translation table described above. It returns the translated string:

put DBTranslateChars(myString,CaseReverseX) into newString

Here's a sample handler that builds the table and then translates the contents of a field in place:

on mouseUp
  global CaseReverseX
    
  buildTranslateTable_CaseReverseX
  put DBTranslateChars(field "text data",CaseReverseX) into field "text data"
end mouseUp

Standalone Case Conversion Functions

FileFlex also lets you convert the case of standalone strings.

The function DBUpper will convert a string from lower case to upper case using the standard ASCII upper case conversion. Here's how to call DBUpper:

put DBUpper(string) into newString

Likewise, DBLower will convert a string from upper case to lower case using the standard ASCII lower case conversion. Here's how to call DBLower:

put DBLower(string) into newString
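
A conversion limited to standard ASCII changes only the letters a-z and A-Z. The Python sketch below illustrates what that means for accented characters, using Python's own ASCII-only byte string methods. It is an assumption that DBUpper and DBLower leave non-ASCII characters untouched in the same way; the manual does not say.

```python
# "café" stored as Windows ANSI (cp1252) bytes; the accented "e" is byte 0xE9.
text = "caf\u00e9".encode("cp1252")

# bytes.upper() and bytes.lower() convert ASCII letters only.
print(text.upper())          # b'CAF\xe9'
print(b"CAF\xc9".lower())    # b'caf\xc9'
```

If accented characters must change case too, a translation table passed to DBTranslateChars can cover them.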
