Thread: Dealing with UTF-8 in BRAINscript

  1. #1
    Contributor
    Join Date
    Jan 2013
    Location
    Boston, MA
    Posts
    15

    Dealing with UTF-8 in BRAINscript

    Hello, everyone.

    Questions about manipulating UTF-8 strings within nodes have been coming up more often lately, so I’d like to share a few tips with the community.
    1. Before you can manipulate UTF-8 data, make sure those fields are stored as the unicode type, not the string type. To check, open the BRD viewer and look at the field in question: the data type is shown directly under the field name. A field of type “string” may display properly in the BRD viewer, but operations on it will not work.
    2. UTF-8 encoded field names are not allowed and will cause a node to fail.
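
    As a rough illustration of why the field type in tip 1 matters, here is a small Python sketch (Python, not BRAINscript). It assumes that a “string” field behaves like a sequence of raw bytes while a unicode field behaves like a sequence of characters; that is an assumption made for the analogy, not a statement about how the nodes are implemented.

```python
# Analogy only: Python str stands in for the unicode type,
# Python bytes stands in for the "string" type.
text = "只依三"                # unicode string: 3 characters
raw = text.encode("utf-8")    # byte string: the same text as UTF-8 bytes

print(len(text))   # number of characters
print(len(raw))    # number of bytes, which is larger for non-ASCII text
print(text[:1])    # slicing by character keeps a whole character
print(raw[:1])     # slicing by byte cuts a character in the middle
```

    Both values can look identical when displayed, but length, slicing, and comparison give different results once the data is treated as bytes.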

    Let’s say you have taken care of points 1 and 2 above and now want to add a filter node that excludes certain records based on the contents of a column. Normally, you would just use this BRAINscript:
    Code:
    emit * where columnA <> "my_string"
    If the string contains non-ASCII (UTF-8 encoded) characters, you’ll have a problem because you can’t enter it directly in the script. For example:
    Code:
    emit * where columnA <> "只依三"
    This will not work because the script does not support UTF-8 literals at the moment.

    The solution is to use the Unicode Converter tool in BRE. To do so:
    1. Open BRE
    2. Go to the View menu and select Unicode Converter
    3. Paste the UTF-8 string you want to use in the script (in our example, “只依三”) and click Convert
    4. You’ll get output that looks something like “\u53ea\u4f9d\u4e09”. Those are the hexadecimal code points of the characters you entered.
    5. Go back to your filter node and paste that string wherever you would use a normal string (e.g. in functions, variables, etc.)
    Code:
    emit * where columnA <> u"\u53ea\u4f9d\u4e09"
    Note the “u” before the string. It is needed to indicate that what follows should be treated as Unicode.
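
    If you don’t have BRE at hand, the conversion in step 4 can be approximated outside the tool. The Python sketch below is not the Unicode Converter itself; the function names are made up for illustration, and it assumes every character (including plain ASCII) is written as a four-digit \u escape, which may differ from what the tool actually emits.

```python
def to_unicode_escapes(text):
    # Build one \uXXXX escape per character (hypothetical helper).
    parts = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            # Four hex digits cannot hold this character; how the real
            # tool handles such input is not known, so refuse it here.
            raise ValueError("character outside the Basic Multilingual Plane")
        parts.append("\\u{:04x}".format(code_point))
    return "".join(parts)


def brainscript_literal(text):
    # Wrap the escapes in the u"..." form used in the filter above.
    return 'u"' + to_unicode_escapes(text) + '"'


print(brainscript_literal("只依三"))
```

    For the example string this prints u"\u53ea\u4f9d\u4e09", which matches the literal used in the filter above.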


    Last edited by pdespot; 06-11-2013 at 05:44 PM.

  2. #2

    Hi!

    How do I get the BRD viewer to display UTF-8 data correctly after the unicode conversion?

    How do I get the Excel output to work with UTF-8 data?

    Thanks in advance!

  3. #3

    Hello, aop.

    The BRD viewer has a separate setting for which character set to use. To change it:
    1) Open the BRD viewer
    2) Go to the Preference menu and select "Set codepages"
    3) That will bring up a dialog box in which you can select your character set
    4) Enter "utf-8" without the quotes and click OK
    5) That should correctly display the UTF-8 characters
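
    To see why the viewer’s character set matters, here is a short Python sketch (not the BRD viewer’s actual code) that decodes the same bytes with two different character sets. Latin-1 is used only as an example of a wrong setting; the viewer’s default is not stated in this thread.

```python
raw = "只依三".encode("utf-8")        # the bytes as stored in the data

as_utf8 = raw.decode("utf-8")        # decoded with the matching character set
as_latin1 = raw.decode("latin-1")    # decoded with a mismatched character set

print(as_utf8)            # the original three characters
print(repr(as_latin1))    # one garbled character per byte instead
```

    The data is the same in both cases; only the interpretation chosen for display changes.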

  4. #4

    It does work for the original UTF-8 data, but after applying the unicode function to the field, it does not display correctly with that codepage.

  5. #5

    Interesting. Could you please post the graph with the node making that change and some sample data?

  6. #6

    I attached a zip below with graph and demo data.