UFT8 and lang chars


Post by roltel » June 19th, 2012, 3:27 pm

Hello again.

I was able to upload my AIML files at last, and my bot is running OK, I guess.

We use Portuguese characters because our bot is intended for Portuguese speakers. I have no problem uploading our AIML files. Nevertheless, when I look at the MySQL aiml table, the pt-PT characters appear garbled. If I correct them manually in the database, everything goes in and out of the bot correctly, for example: "olá" - "como está você?".

I thought I could fix the garbled characters using phpMyAdmin, but there are 10,000 lines...

But first I tested downloading that AIML file and re-uploading it, and the characters were garbled again.

What would you advise me to do?
roltel
Casual Member
 
Posts: 14
Joined: March 24th, 2012, 4:17 am

Re: UFT8 and lang chars

Post by GeekCaveCreations » June 19th, 2012, 8:04 pm

I've been looking into this problem for a while now (since version 1, in fact), and have yet to come up with a solution. The 'special characters' you've shown in your example (lower case 'a' with an accent, and lower case 'e' with a circumflex) should not be a problem, since they're both within the UTF-8 and Latin-1 character sets (Program O uses UTF-8 for its character encoding), and I've no idea why they're giving grief. I used to think that it was a problem with what are called multibyte characters, but none of the letters in your examples are multibyte, so I'm not sure what's happening.

If you can do me a favor, and post a few sample AIML categories here that are giving you problems, I'll build a special testing AIML file, and use it to see what can be done. Maybe I can use them to trace what's breaking, and can go from there.
Comforting the disturbed, and disturbing the comfortable
Chat with Morti
GeekCaveCreations
Safe, Reliable Insanity, Since 1961
 
Posts: 1115
Joined: April 18th, 2011, 10:52 pm
Location: Nevada, USA
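
A note on the multibyte question above: in UTF-8, "á" and "ê" are each encoded as two bytes, while in Latin-1 they take one byte each. The sketch below is in Python purely for illustration (Program O itself is PHP), and the helper names `byte_lengths` and `misread_as_latin1` are hypothetical. It shows the byte counts and simulates a layer that receives UTF-8 bytes but decodes them as Latin-1. Whether that is what happened in Program O is an assumption; the thread does not confirm the cause.

```python
def byte_lengths(ch: str) -> tuple:
    """Return (UTF-8 length, Latin-1 length) in bytes for a character."""
    return len(ch.encode("utf-8")), len(ch.encode("latin-1"))


def misread_as_latin1(text: str) -> str:
    """Simulate a layer that gets UTF-8 bytes but decodes them as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# Characters from the example in this thread, plus plain ASCII for contrast.
for ch in ["á", "ê", "a"]:
    print(ch, byte_lengths(ch))

# What the bot's greeting looks like after the simulated misread.
print(misread_as_latin1("olá"))
```

With this sketch, "olá" comes out as "olÃ¡", which is the typical look of UTF-8 text read as Latin-1.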

Re: UFT8 and lang chars

Post by roltel » June 20th, 2012, 4:02 am

Thanks a lot for the reply, I really appreciate it.

Here are some categories you can try and test:

<category><pattern>* QUE * PLANTA</pattern><template>E um organismo vivo pertencente ao ramo vegetal. É e um ser vivo que depende de agua e luz e um bom solo para viver.</template></category>
<category><pattern>QUAIS SÃO AS LEIS DA TERMODINÂMICA</pattern><template>Não sou físico mas acho que tem a ver com calor, entropia e conservação de energia certo?</template></category>
<category><pattern>O QUE CAUSA UM TUMOR</pattern><template>Câncro.</template></category>
<category><pattern>* É COMPRIMENTO DE ONDA</pattern><template>Comprimento de onda é a relação inversa de frequência.</template></category>
<category><pattern>* É * TERMODINÂMICA</pattern><template>É o ramo da física que se debruça sobre a transformação do calor de e para outras forma de energia e a leis que regulam essas conversões de energia.</template></category>

Thanks in advance
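
One way to check a file like this outside the bot is to parse it and see whether the accented text survives. The sketch below uses Python's standard `xml.etree.ElementTree` for illustration (it is not Program O's PHP upload code); the wrapping `<aiml>` element, the XML declaration, and the function name `read_categories` are assumptions added for the example. It also shows that a file whose declared encoding does not match its actual bytes parses without error but yields garbled text.

```python
import xml.etree.ElementTree as ET

# One category from the post above, wrapped in an assumed <aiml> root
# with an assumed XML declaration.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<aiml>"
    "<category><pattern>O QUE CAUSA UM TUMOR</pattern>"
    "<template>Câncro.</template></category>"
    "</aiml>"
)


def read_categories(xml_bytes: bytes) -> list:
    """Return (pattern, template) pairs parsed from raw AIML bytes."""
    root = ET.fromstring(xml_bytes)
    return [
        (cat.findtext("pattern"), cat.findtext("template"))
        for cat in root.iter("category")
    ]


# Bytes and declaration agree: the text comes back intact.
good = read_categories(SAMPLE.encode("utf-8"))

# Same UTF-8 bytes, but the declaration claims ISO-8859-1:
# the parser trusts the declaration and produces garbled text.
mismatched = read_categories(
    SAMPLE.replace("UTF-8", "ISO-8859-1").encode("utf-8")
)

print(good)
print(mismatched)
```

If a file parses cleanly here but still arrives garbled in the database, the problem is more likely in the upload or database connection step than in the file itself.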

Re: UFT8 and lang chars

Post by programo » June 20th, 2012, 2:11 pm

Thank you for the test AIML, that will definitely help.
:D
programo
Site Admin
 
Posts: 306
Joined: April 4th, 2011, 4:46 pm
Location: Nottingham UK

Re: UFT8 and lang chars

Post by roltel » August 25th, 2012, 10:00 am

I spent hours and hours trying to figure this out. I'm not a programmer, but I know a little PHP.

Could you give me a hand in trying to figure this out?

For now, the most important thing would be to import those special characters correctly. I tried changing the MySQL encoding on the tables and the database, but nothing happened.

I looked for PHP code that deals with this problem and came up with something like: $str = str_replace(chr(130), ',', $str); // baseline single quote

Can you try to make a patch with a charset replacement or something? Or can you guide me on how I should do it?

Thanks!

references for my study:
http://php.net/manual/en/function.chr.php
http://forums.digitalpoint.com/showthread.php?t=550665
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
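
Replacing single bytes with `str_replace(chr(...))` has to be repeated for every affected character. If the stored text is UTF-8 that was decoded as Latin-1 somewhere along the way (an assumption; the thread does not establish this), a whole-string round trip can undo it in one step. The sketch below is Python for illustration, and `repair_mojibake` is a hypothetical helper, not part of Program O.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Latin-1'.

    Returns the input unchanged when it does not fit that pattern.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 encodable, or the bytes are not valid UTF-8:
        # assume the text was not garbled in this particular way.
        return text


# Garbled form of the greeting from the first post, written with escapes.
garbled = "como est\u00c3\u00a1 voc\u00c3\u00aa?"
print(repair_mojibake(garbled))
```

Text that is already correct, such as "olá", is returned unchanged because its Latin-1 bytes are not valid UTF-8. MySQL's `latin1` is closer to Windows-1252 than to ISO-8859-1, so some garbled characters may need a different codec; this sketch leaves those untouched.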

Re: UFT8 and lang chars

Post by programo » September 7th, 2012, 1:42 pm

I will be sure to look into this as soon as possible... hopefully over the weekend

Re: UFT8 and lang chars

Post by roltel » September 12th, 2012, 3:38 am

Thanks a lot. If we manage to fix this and use our encodings, we will start developing in PHP/MySQL rather than the Java tools! :)

We would really appreciate such a fix. Thanks in advance!

Re: UFT8 and lang chars

Post by roltel » November 8th, 2013, 2:04 am

I was unable to make any relevant change myself. But now I have reinstalled the latest version, and finally my special language characters are working.

Portuguese AIML tested with success. Good work.

