
UTF-8 search strange situation


filoaman


Hi all

I have a very strange situation here.

I'm trying to search a string for a UTF-8 term. My search term is "Den lille fløyten".

If I use this code:

$string = "Den Lille Fløyten"; // The string to search
$phrase = "Den lille fløyten"; // The term to search
$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // This gives me true (1)

Everything is OK!
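To see why this snippet matches, it helps to look at the pattern it builds. Below is a minimal sketch that wraps the same three steps in a hypothetical helper, `buildPhrasePattern()` (my name, not from the thread). It also passes '/' to preg_quote() so that a slash inside a search term would be escaped too:

```php
<?php
// Hypothetical helper: builds the same kind of pattern as the snippet above.
function buildPhrasePattern(string $phrase): string
{
    // Escape regex metacharacters (and the '/' delimiter) in the term.
    $quoted = preg_quote($phrase, '/');
    // Any run of whitespace in the term may match any run of whitespace in the text.
    $quoted = preg_replace('/\s+/', '\s+', $quoted);
    // \b = word boundary, u = pattern and subject are UTF-8, i = case-insensitive.
    return '/\b' . $quoted . '\b/ui';
}

$p = buildPhrasePattern("Den lille fløyten");
echo $p, "\n";                                      // /\bDen\s+lille\s+fløyten\b/ui
echo preg_match($p, "Den Lille Fløyten"), "\n";     // 1
echo preg_match($p, "Den   lille\nfløyten"), "\n";  // 1 (spaces and newline both match \s+)
```

The word boundaries stop a term from matching inside a longer word. For example, the pattern for "Sølv" does not match "Sølvberg".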

But if I put my search term in an array (which I have to do) and then search using that array element as the term, I always get a false result:
$SearchTerms=array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den Lille Fløyten"; // The string to search
$phrase = $SearchTerms[2]; // The term to search as part of an array
$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // This gives me 0 (no match)!

Any ideas?

Thanks in advance.

https://forums.phpfreaks.com/topic/280982-utf-8-search-strange-situation/

Try:

header('Content-Type: text/plain; charset=utf-8');
$SearchTerms=array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den Lille Fløyten"; // The string to search
$phrase = $SearchTerms[2]; // The term to search as part of an array

echo $phrase."\n";

$phrase = preg_replace('/\s+/', '\s+', preg_quote($phrase));
$p = '/\b' . $phrase . '\b/ui';
$result = preg_match($p, $string); // Returns 1 here (see the output below)

echo $result;

Results:

Den lille fløyten
1

Some examples of finding a term within an array:

<?php
$SearchTerms = array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv'); // my array with search terms
$string = "Den lille fløyten"; // The term to look for

// Using in_array(): exact, case-sensitive comparison of whole elements
if (in_array($string, $SearchTerms)) {
    echo $string . " found <br />";
} else {
    echo $string . " not found <br />";
}

// Escape the term so regex metacharacters in it are matched literally;
// u = UTF-8 mode, i = case-insensitive
$pattern = '/\b(' . preg_quote($string, '/') . ')\b/ui';

// Using preg_match() in a loop
$found = array();
foreach ($SearchTerms as $str) {
    if (preg_match($pattern, $str, $matches)) {
        $found[] = $matches[1];
    }
}
print_r($found);
echo "<br />";

// Using preg_grep(): keeps the original array keys
$found2 = preg_grep($pattern, $SearchTerms);
print_r($found2);

?>

Returns:

Den lille fløyten found
Array ( [0] => Den lille fløyten )
Array ( [2] => Den lille fløyten )
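The examples above look for $string inside each array element. The original question runs the other way: each term from $SearchTerms is looked for inside $string. Here is a sketch of that direction that reuses the pattern-building steps from the question. The helper name `findTermsIn()` is hypothetical:

```php
<?php
// Hypothetical helper: returns every search term that occurs as a whole phrase in $text.
function findTermsIn(array $terms, string $text): array
{
    $hits = array();
    foreach ($terms as $key => $term) {
        $quoted  = preg_replace('/\s+/', '\s+', preg_quote($term, '/'));
        $pattern = '/\b' . $quoted . '\b/ui';
        // Compare with === 1: with /u, preg_match() returns false for invalid UTF-8,
        // and that should not count as a match.
        if (preg_match($pattern, $text) === 1) {
            $hits[$key] = $term; // keep the original key, like preg_grep() does
        }
    }
    return $hits;
}

$SearchTerms = array('Høstnatt på Fjellskogen', 'Langt Innpå Skoga', 'Den lille fløyten', 'Sølv');
print_r(findTermsIn($SearchTerms, "Den Lille Fløyten"));
```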

Well, thank you all for the answers.

First of all, I should say that this code is part of a bigger, more complex script. After reading the answers above I tried some of the suggestions and got the result I wanted, but only when I ran this particular part of the code as a standalone script, not as part of the whole script.

After hours of testing and experimentation I finally found the problem:

When I create the "$string" search term manually, everything works great!

But when I get the "$string" search term from a 'mysql_query', the script doesn't work.

It makes no difference whether I build the "$SearchTerms" array with 'mysql_query' or not, but the code fails whenever I extract the "$string" search term with 'mysql_query'.

And of course this happens ONLY when the search term contains non-ASCII UTF-8 characters; when I search for ASCII terms everything works fine.

Here is the 'mysql_query' I use to extract the "$string" search term, for inspection and ideas:

$stringArray=array();
$result = mysql_query("SELECT * FROM foo WHERE foo2='foo3' ");
while ($row = mysql_fetch_array($result)) {
    array_push($stringArray, $row["foo4"]);
}
$string=$stringArray[x]; // where 'x' is the position of the UTF-8 search term i like to use

Yes, in the end the problem was that my "$string" search term wasn't UTF-8.


I made the mistake of using this expression when I originally imported my data into MySQL (I really can't remember why...): "htmlentities($foo);".

So although the search term looked right on screen when I used "print" or "echo", the stored data wasn't the real UTF-8 text. htmlentities() turns characters such as "ø" into HTML entities (with a UTF-8 charset, "ø" becomes "&oslash;"), and the browser then displays them as "ø".
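Here is a small sketch of why the stored value could look right and still not match. It assumes the import called htmlentities() with UTF-8 as the charset, which is the default since PHP 5.4. Older versions defaulted to ISO-8859-1 and would produce different entities.

```php
<?php
// Assumption: the import used htmlentities() with a UTF-8 charset.
$term   = "Den lille fløyten";                      // the term as typed
$stored = htmlentities($term, ENT_QUOTES, 'UTF-8'); // what ended up in the database

echo $stored, "\n";                                 // Den lille fl&oslash;yten
var_dump($stored === $term);                        // bool(false)

// Building the search pattern from the stored value, as in the question
$p = '/\b' . preg_replace('/\s+/', '\s+', preg_quote($stored, '/')) . '\b/ui';
echo preg_match($p, "Den Lille Fløyten"), "\n";      // 0: the pattern contains "&oslash;", not "ø"

// A pure-ASCII term has nothing for htmlentities() to convert,
// so it is stored unchanged and still matches.
var_dump(htmlentities("Den lille", ENT_QUOTES, 'UTF-8') === "Den lille"); // bool(true)
```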


I used a small function to detect the encoding of my "$string" search term and realized that it wasn't UTF-8.
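The thread doesn't show the function that was used. As a hypothetical stand-in, this sketch uses only core PCRE and string functions to tell the cases apart: invalid UTF-8, text with leftover HTML entities, UTF-8 with real non-ASCII characters, and plain ASCII. Note that an entity-encoded term is pure ASCII, so a detector that reports "ASCII" rather than "UTF-8" for it is consistent with what happened here.

```php
<?php
// Hypothetical helper; not the function used in the thread.
function describeTerm(string $s): string
{
    // With /u, preg_match() fails (returns false) on malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        return 'not valid UTF-8';
    }
    // Decoding changes the string only if it contains entities such as "&oslash;".
    if (html_entity_decode($s, ENT_QUOTES, 'UTF-8') !== $s) {
        return 'contains HTML entities';
    }
    // Any byte above 0x7F means real multibyte UTF-8 characters are present.
    if (preg_match('/[^\x00-\x7F]/', $s) === 1) {
        return 'UTF-8 with non-ASCII characters';
    }
    return 'plain ASCII';
}

echo describeTerm("Den lille fløyten"), "\n";        // UTF-8 with non-ASCII characters
echo describeTerm("Den lille fl&oslash;yten"), "\n"; // contains HTML entities
echo describeTerm("Den lille fl\xF8yten"), "\n";     // not valid UTF-8 (ISO-8859-1 byte for ø)
```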


I checked the database and found that the data stored there wasn't UTF-8 either.


So I changed the MySQL insertion code to remove the "htmlentities" call, and the data is now stored in MySQL as UTF-8!
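Here is a sketch of the idea behind this fix. The database calls are left out so it runs on its own, and the function names are hypothetical. It assumes the common practice of storing the raw text and escaping only when printing to a page; htmlspecialchars() is enough for that and leaves "ø" alone. It also shows how values saved earlier through htmlentities() could be decoded back once with html_entity_decode():

```php
<?php
// Hypothetical helpers; the fix in the thread was simply to stop calling
// htmlentities() before the INSERT, so MySQL stores the raw UTF-8 text.

// Escape for HTML only at the moment a value is printed into a page.
function forHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// One-off repair for values that were already saved through htmlentities().
function repairLegacyValue(string $stored): string
{
    return html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
}

$fixed = repairLegacyValue('Den lille fl&oslash;yten');
echo $fixed, "\n";                                              // Den lille fløyten
echo preg_match('/\bDen\s+lille\s+fløyten\b/ui', $fixed), "\n"; // 1
echo forHtml('Sølv & gull'), "\n";                              // Sølv &amp; gull
```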




The script works just fine now!




Thank you.


Archived

This topic is now archived and is closed to further replies.
