
[SOLVED] Comparing UTF-8 encoded text in PHP with text queried from a MySQL db


dsaba


OK, I realize that different programs do things differently. I don't know if you would classify MySQL as a "program", but MySQL and PHP are two separate entities, call them whatever you like. My point is that I THINK they treat strings differently and have different methods of parsing strings.

I am storing text from many different languages in my MySQL db with charset utf-8 and collation utf8_bin.

All the text being parsed in the PHP script is also encoded in UTF-8, as the script is saved in the UTF-8 charset.

Now I want to allow users to register with my website and be able to choose usernames in foreign alphabets.

If I am going to do this, I need to be positive that I can successfully compare usernames from any language encoded in UTF-8 and test them for uniqueness.

For example, fredrico, who is from Spain, signs up at my website. When he chooses his Spanish username, I will query the MySQL db for WHERE username='$fredrico'

I need to be positive that $fredrico, compared to any other UTF-8 encoded string, will only find a match with a string made of the same Spanish characters.

In other words, for this to work, I need to know whether the letter aleph in Hebrew, encoded in UTF-8, is unique and will never equal a character from another alphabet.

Is this true?

-Thank you. Read carefully if you don't understand, and ask if you still don't get it.


Aleph in Hebrew encoded in UTF-8 will never equal some other character encoded in UTF-8. But be aware that characters that look the same or have the same meaning may have different UTF-8 encodings, for example an accented letter stored as a single precomposed code point versus a base letter followed by a combining accent. As long as your users enter their username the same way every time, you will have no problem.

Take a look here for the gory details: http://www.unicode.org/reports/tr15/#Introduction
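
To make the point about different encodings concrete, here is a minimal sketch in Python (used only for illustration; in PHP the intl extension's Normalizer class provides the same kind of normalization). The helper name `canonical` is made up for this example, and choosing NFC as the form to store is an assumption, not something the thread prescribes.

```python
import unicodedata

# Two ways to write "é": one precomposed code point vs. "e" + combining accent.
precomposed = "\u00e9"   # é as U+00E9
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# They render identically, but the UTF-8 bytes differ, so a byte-wise
# comparison (which is what a binary collation does) sees two different strings.
print(precomposed.encode("utf-8"))  # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'
print(precomposed == decomposed)    # False


def canonical(name: str) -> str:
    """Hypothetical helper: bring a username to NFC before storing or comparing."""
    return unicodedata.normalize("NFC", name)


# After normalization both spellings compare equal.
print(canonical(precomposed) == canonical(decomposed))  # True
```

If usernames are normalized once on registration and again on every lookup, the "enter it the same way every time" condition no longer depends on the user's keyboard or operating system.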

 

So yes, you will not get false positives (matching a name that it shouldn't match), but it's possible to get false negatives (not matching a name that should match). A good example is the German "ss", "ae", "oe", "ue", which are identical in meaning to ß, ä, ö, ü but are encoded differently.

This does mean that if one person signs up as "Günther" and another as "Guenther", they will be considered different. Which is probably OK, as they are visually different, although they represent the same name.
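
As a quick check of the "Günther" versus "Guenther" case, the sketch below (Python, for illustration only) compares the two spellings under each of the four Unicode normalization forms. It assumes nothing beyond the standard library, and it only shows that normalization alone does not merge the two names.

```python
import unicodedata

umlaut = "Günther"
digraph = "Guenther"

# Normalization unifies different encodings of the *same* characters;
# it does not turn the transliteration "ue" into "ü".
results = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    results[form] = (
        unicodedata.normalize(form, umlaut) == unicodedata.normalize(form, digraph)
    )

print(results)  # every form maps to False
```

So these two sign-ups stay distinct unless the site adds its own transliteration rules on top.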


I don't quite get what you mean by "false negatives".

What you are talking about with German is irrelevant. That is transliterated German, and to me that would not be classified as German script or a separate language.

Transliterated means another script written in Latin characters, so to me it will be classified as English.

For example, the "chet" letter in Hebrew is transliterated to "ch", like you see in the word Chanukah.

But this information is irrelevant to what I originally asked. I'm not checking for transliteration and am not matching transliterations to any language counterpart.

Well yeah, I had done some research before and this is what I thought the answer was. So yes, a UTF-8 representation of a character is unique and will never match the UTF-8 representation of another character in any language.

So if I search for א among UTF-8 representations of the alphabets of 20 different languages, it will only find a match with another א, and never anything else.
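
The uniqueness claim for א can be checked by brute force. This is a small Python sketch for illustration only; it encodes every code point that UTF-8 can represent and looks for any that share aleph's byte sequence.

```python
aleph = "\u05d0"  # א, HEBREW LETTER ALEF

# UTF-8 maps each code point to exactly one byte sequence.
aleph_bytes = aleph.encode("utf-8")
print(aleph_bytes)  # b'\xd7\x90'

# Encode every code point and keep those whose bytes equal aleph's.
# Surrogates (U+D800..U+DFFF) are skipped because they cannot be encoded in UTF-8.
collisions = [
    cp
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF and chr(cp).encode("utf-8") == aleph_bytes
]
print([hex(cp) for cp in collisions])  # ['0x5d0']
```

Only U+05D0 itself matches. Look-alike characters such as ℵ (U+2135 ALEF SYMBOL) are separate code points with their own bytes, so a byte-wise comparison treats them as different from the Hebrew letter.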

 

I just figured I'd post this to see if anyone else with experience in this topic might know something I hadn't already presumed.

Topic solved.

