
UTF-8 search strange situation


filoaman
Solved by filoaman


Hi all,

I have a very strange situation here...

I am trying to search a string for a UTF-8 term. My search term is "Den lille fløyten".

If I use this code:

$string = "Den Lille Fløyten"; // The string to search
$phrase = "Den lille fløyten"; // The term to search
$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // This gives me true (1)

everything is OK!
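For reference, the working search above can be wrapped in a small reusable function. This is a minimal sketch, not the poster's code: the name `phraseMatches` is hypothetical, and it additionally passes `'/'` to `preg_quote()` so a slash in the term cannot break the pattern delimiter. It assumes the PHP source file and both strings are UTF-8.

```php
<?php
// Hypothetical helper: true if $phrase occurs in $haystack as a whole phrase,
// ignoring case and treating any run of whitespace as equivalent.
function phraseMatches(string $phrase, string $haystack): bool
{
    // Escape regex metacharacters, including the '/' delimiter.
    $quoted = preg_quote($phrase, '/');
    // Let any whitespace run in the phrase match any whitespace run in the text.
    $flexible = preg_replace('/\s+/u', '\s+', $quoted);
    // 'u' treats pattern and subject as UTF-8; 'i' makes the match case-insensitive.
    $pattern = '/\b' . $flexible . '\b/ui';
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $haystack) === 1;
}

var_dump(phraseMatches('Den lille fløyten', 'Den Lille Fløyten')); // bool(true)
```

Because the result is compared with `=== 1`, an error (for example a subject that is not valid UTF-8 under the `u` flag) is reported as "no match" rather than slipping through as a truthy value.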

But if I put my search term in an array (which I have to do) and then search using that array element as the term, I always get a false result:
$SearchTerms=array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den Lille Fløyten"; // The string to search
$phrase = $SearchTerms[2]; // The term to search as part of an array
$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // This gives me FALSE (0)!!!!!!!

Any ideas?

Thanks in advance.


Try this:

header('Content-Type: text/plain; charset=utf-8');
$SearchTerms=array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den Lille Fløyten"; // The string to search
$phrase = $SearchTerms[2]; // The term to search as part of an array

echo $phrase."\n";

$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // Run on its own, this gives 1 (match)

echo $result;

Result:

Den lille fløyten
1

Here are some examples of finding the term within an array:

<?php
$SearchTerms = array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den lille fløyten"; // The term to search for

// Escape regex metacharacters and add the u flag so UTF-8 text is handled correctly
$pattern = '/\b(' . preg_quote($string, '/') . ')\b/ui';

// using in_array() (exact, case-sensitive match)
if (in_array($string, $SearchTerms)) {
    echo $string . " found <br />";
} else {
    echo $string . " not found <br />";
}

// using preg_match()
$found = array();
foreach ($SearchTerms as $str) {
    if (preg_match($pattern, $str, $matches)) {
        $found[] = $matches[1];
    }
}
print_r($found);
echo "<br />";

// using preg_grep() (keeps the original array keys)
$found2 = preg_grep($pattern, $SearchTerms);
print_r($found2);

?>

This outputs:

Den lille fløyten found
Array ( [0] => Den lille fløyten )
Array ( [2] => Den lille fløyten )
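Note that `in_array()` compares exactly, so it finds "Den lille fløyten" but would not find "Den Lille Fløyten" (the capitalization from the original question). A minimal sketch of a case-insensitive exact lookup, assuming the mbstring extension is available; the function name `findTermIgnoreCase` is hypothetical. `mb_strtolower()` is used because it lowercases non-ASCII letters such as "Ø" correctly for UTF-8 input.

```php
<?php
// Hypothetical helper: case-insensitive exact lookup for UTF-8 strings.
// Returns the key of the first matching element, or false if there is none.
function findTermIgnoreCase(string $needle, array $haystack)
{
    $lowerNeedle = mb_strtolower($needle, 'UTF-8');
    foreach ($haystack as $key => $term) {
        if (mb_strtolower($term, 'UTF-8') === $lowerNeedle) {
            return $key;
        }
    }
    return false;
}

$SearchTerms = array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv');
var_dump(findTermIgnoreCase('Den Lille Fløyten', $SearchTerms)); // int(2)
var_dump(in_array('Den Lille Fløyten', $SearchTerms));           // bool(false)
```

As with `array_search()`, a match at key 0 is falsy, so callers should compare the result with `=== false`.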


Well, thank you all for your answers.

First of all, I should say that this code is part of a bigger and more complex script. After reading the answers above, I tried some of the suggestions and managed to get the result I wanted, but only when I ran this particular part of the code as a standalone script, not as part of my whole script.

After hours of testing and experimentation, I finally found the problem:

When I create the "$string" search term manually, everything works great!

But when I create the "$string" search term using a 'mysql_query', the script doesn't work.

It makes no difference whether I build the "$SearchTerms" array with 'mysql_query' or not, but the code fails whenever I extract the "$string" search term with a 'mysql_query'!

And of course this happens ONLY when the search term contains UTF-8 (non-ASCII) characters; when I search for ASCII terms, everything works fine.

Here is the 'mysql_query' code I use to extract the "$string" search term, for inspection and ideas:
// Note: the mysql_* functions were removed in PHP 7; new code should use mysqli or PDO.
$stringArray = array();
$result = mysql_query("SELECT * FROM foo WHERE foo2='foo3' ");
while ($row = mysql_fetch_array($result)) {
    $stringArray[] = $row["foo4"];
}
$string = $stringArray[$x]; // where $x is the position of the UTF-8 search term I want to use

  • Solution

Yes, in the end the problem was that my "$string" search term wasn't really UTF-8.


I made the mistake of using (I really can't remember why...) the expression "htmlentities($foo);" when I originally imported my data into MySQL.


So although the search term looked like proper UTF-8 text on screen when I printed it with "print" or "echo" (the browser renders HTML entities back into the original characters), the stored data wasn't actually plain UTF-8 text.
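A minimal sketch of this failure mode, assuming `htmlentities()` is called with the UTF-8 charset (the exact entities produced depend on the charset argument). The stored value contains the ASCII entity `&oslash;` instead of the character "ø", so a browser displays it correctly while a regex search for "fløyten" cannot match it.

```php
<?php
// What htmlentities() does to the search term before it is stored.
$original = 'Den lille fløyten';
$stored   = htmlentities($original, ENT_QUOTES, 'UTF-8');
echo $stored, "\n"; // Den lille fl&oslash;yten (a browser renders this as "fløyten")

// Searching with the stored value fails: "&oslash;" is not the character "ø".
$haystack = 'Den Lille Fløyten';
$pattern  = '/\b' . preg_replace('/\s+/', '\s+', preg_quote($stored, '/')) . '\b/ui';
var_dump(preg_match($pattern, $haystack)); // int(0)
```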


 


I used a small function to check the encoding of my "$string" search term and realized that it wasn't UTF-8.
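The poster's own function isn't shown. A hypothetical diagnostic along the same lines might look like this, assuming the mbstring extension is available. One caveat worth knowing: a string full of HTML entities is pure ASCII and therefore technically valid UTF-8, so `mb_check_encoding()` alone would not flag it; the sketch checks for entities separately.

```php
<?php
// Hypothetical diagnostic: classifies a string as invalid UTF-8, text containing
// HTML entities, plain ASCII, or UTF-8 with multibyte (non-ASCII) characters.
function describeEncoding(string $s): string
{
    if (!mb_check_encoding($s, 'UTF-8')) {
        return 'not valid UTF-8';
    }
    // Named (&oslash;), decimal (&#248;) or hex (&#xF8;) entities.
    if (preg_match('/&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i', $s)) {
        return 'contains HTML entities';
    }
    if (mb_check_encoding($s, 'ASCII')) {
        return 'plain ASCII';
    }
    return 'UTF-8 with non-ASCII characters';
}

echo describeEncoding('Den lille fløyten'), "\n";        // UTF-8 with non-ASCII characters
echo describeEncoding('Den lille fl&oslash;yten'), "\n"; // contains HTML entities
echo describeEncoding("Den lille fl\xF8yten"), "\n";     // not valid UTF-8 (an ISO-8859-1 byte)
```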


I then checked the database and realized that the data stored there wasn't UTF-8 either.


So I modified the MySQL insertion code to remove the "htmlentities" call, and the data is now inserted into MySQL as UTF-8!
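The poster fixed this by re-inserting the data without `htmlentities()`. For rows that are already stored with entities, one alternative is to decode them with `html_entity_decode()` using the same flags and charset. This is a sketch only, assuming the values were encoded with `ENT_QUOTES` and UTF-8 and that the PHP source file is saved as UTF-8; writing the decoded values back to the database is not shown.

```php
<?php
// Recover the literal UTF-8 text from a value that was stored with htmlentities().
$stored   = 'Den lille fl&oslash;yten';
$restored = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
echo $restored, "\n"; // Den lille fløyten

// With the real characters restored, the original search matches again.
$pattern = '/\b' . preg_replace('/\s+/', '\s+', preg_quote($restored, '/')) . '\b/ui';
var_dump(preg_match($pattern, 'Den Lille Fløyten')); // int(1)
```

Escaping for HTML is usually better done at output time (for example with `htmlspecialchars()` when echoing), so the database keeps the plain text that searches rely on.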


 


The script works just fine now!


 


Thank you.

