
Thread: Removing accented characters

  1. #1

    Removing accented characters

    I am trying to remove accented characters and replace them with their corresponding non-accented character. I am trying the following:

    <field>.str().regexSubstitute("ÚŕŰŔ","e").regexSubstitute("┴┬─└┼├","A").regexSubstitute("ă","C").regexSubstitute("đ","D").regexSubstitute("╔╩╦╚","E").regexSubstitute("â","F").regexSubstitute("═╬¤╠","I").regexSubstitute("Đ","N").regexSubstitute("ËďÍĎěŇ","O").regexSubstitute("Ő","S").regexSubstitute("┌█▄┘","U").regexSubstitute("Ţč","Y").regexSubstitute("Ä","Z").regexSubstitute("ßÔńÓňŃ","a").regexSubstitute("š","c").regexSubstitute("â","f").regexSubstitute("Ýţ´ý","i").regexSubstitute("˝","n").regexSubstitute("ˇ˘÷˛°§","o").regexSubstitute("Ü","s").regexSubstitute("˙űب","u").regexSubstitute("ř ", "y").regexSubstitute("×","z")

    First, it seems I have to add str() at the start of the substitution chain, or else I get a UTF-8 character error (I am not sure why). Second, the substitution does not seem to work in all instances: in some cases the characters are correctly replaced (ă is replaced with C), but in others they are not (Ú remains and is not replaced with e).

    Can anyone see/advise what I am doing wrong?

    Thank you in advance

  2. #2
    Lavastorm Employee
    Join Date: Nov 2012
    Location: Warrington, UK
    Posts: 200


    Your replacement of "ă" with "C" works because "ă" is the only character in the pattern for that particular regexSubstitute() operator.

    The substitution of "ÚŕŰŔ" with "e" fails to match because your pattern specifies that the entire four-character string "ÚŕŰŔ" must be present. To match any one of the characters, change the pattern to insert an 'or' (|) between each accented character: "Ú|ŕ|Ű|Ŕ".
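    The difference between the two patterns can be seen in any regex engine. The following is a minimal sketch in Python rather than in the Lavastorm expression language, so re.sub stands in for regexSubstitute() and the sample string is made up for illustration. The character-class form "[...]" is an additional alternative that works in Python; whether it behaves the same way in regexSubstitute() is an assumption to verify.

```python
import re

# Hypothetical sample value: two lone accented characters, then the full run.
sample = "Ú and ŕ and ÚŕŰŔ"

# A pattern of four characters matches only that exact four-character sequence.
whole_sequence = re.sub("ÚŕŰŔ", "e", sample)

# Alternation matches any one of the characters, wherever it occurs.
alternation = re.sub("Ú|ŕ|Ű|Ŕ", "e", sample)

# A character class is an equivalent, more compact way to say "any one of these".
char_class = re.sub("[ÚŕŰŔ]", "e", sample)

print(whole_sequence)  # Ú and ŕ and e
print(alternation)     # e and e and eeee
print(char_class)      # e and e and eeee
```

    Only the last occurrence is replaced by the first pattern, which mirrors the behaviour described in the question: single-character patterns work, multi-character ones appear to be ignored.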

    Can you provide an example of the data that is causing the UTF-8 error? For example, open the data viewer, select all or just the relevant rows/columns, then use 'Edit' -> 'Copy as Static Data' and paste the data into your reply to this post.

    You previously asked a similar question back in April. Was there an issue with using the solutions provided in that thread? If you require additional character substitutions, you can modify the '_Acc_Chars' and '_Reg_Chars' strings as described in the reply.

    http://community.lavastorm.com/threa...0466#post10466
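
    The linked reply is not reproduced here, so the following is only a guess at the general idea behind a pair of strings such as '_Acc_Chars' and '_Reg_Chars': each accented character is paired by position with its plain replacement. It is a minimal Python sketch, not the Lavastorm solution itself; ACC_CHARS, REG_CHARS, and remove_accents are illustrative names, and the character lists are a small example set rather than a complete one.

```python
# Accented characters and their replacements, paired by position.
# Both strings must have the same length; extend them together.
ACC_CHARS = "ÀÁÂÃÄÅÇÈÉÊËàáâãäåçèéêë"
REG_CHARS = "AAAAAACEEEEaaaaaaceeee"

# str.maketrans raises ValueError if the two strings differ in length,
# which catches a mismatched edit to either list.
TABLE = str.maketrans(ACC_CHARS, REG_CHARS)


def remove_accents(text: str) -> str:
    """Replace each character found in ACC_CHARS with its REG_CHARS partner."""
    return text.translate(TABLE)


print(remove_accents("Çà et là, élève"))  # Ca et la, eleve
```

    A single lookup table like this avoids chaining one substitution per target letter, and adding a new character means appending one entry to each string.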
