can't find items with apostrophe in them.


The Little Guy

I have the following code in a function ($start/$end are passed in parameters):

<?php
$start = preg_quote($start, "/");
$end = preg_quote($end, "/");
$good = (bool)preg_match_all($search = "/$start(.+)$end/isU", $data, $matches);
print_r($matches);
?>

 

$data comes from the output of CURL

 

when I use my function like so:

<?php
myFunc("I've", "Thanks for sharing.");

// Within the data:
// I’ve been visiting your blog for a while now and I always find a gem in your new posts.  Thanks for sharing.
?>

 

My print_r() comes back with an empty array, and I don't understand why. I can find all kinds of other things, but when I search for something containing an apostrophe it doesn't match, even though the text does exist in the data. What should I do?


Those two apostrophes are different: one is straight, one is smart. That's the only problem I see.
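
To make the difference visible, here is a minimal sketch that dumps the bytes behind each apostrophe. It assumes the page data is UTF-8 and that PHP 7.0 or later is available (for the "\u{...}" escape); the variable names are only for illustration.

```php
<?php
// Hypothetical demo strings: the needle is typed with a straight apostrophe,
// the haystack contains U+2019 as it would appear in UTF-8 page data.
$needle = "I've";
$haystack = "I\u{2019}ve been visiting your blog";

// bin2hex() exposes the raw bytes behind each apostrophe.
$needleHex = bin2hex(substr($needle, 1, 1));     // straight apostrophe: one byte
$haystackHex = bin2hex(substr($haystack, 1, 3)); // smart apostrophe: three bytes in UTF-8

echo $needleHex, "\n";   // 27
echo $haystackHex, "\n"; // e28099

// The bytes differ, so a literal search for the needle finds nothing.
$found = strpos($haystack, $needle);
var_dump($found);        // bool(false)
```

The same mismatch is what makes the regular expression come back empty.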

Got a few options open to you: could alter the data, could make a more complex expression to allow for character variants...

 

By the way, if you're simply checking for whether one string comes before another, strpos is a better choice than regular expressions.

I will have to make a more complex expression. Any suggestions?

 

I am not making a function to check if one string comes before another; I am making a function that gets the text between two strings and returns all occurrences that were found.

The easy way out would be to build a set of characters equivalent to something. For example, a character set for apostrophes.

 

If you wanted to say that "i" and "I" were equivalent, you could do something like this:

$start = "I've";
$end = "Thanks for sharing";
$data = "I've been visiting your blog for a while now and I always find a gem in your new posts. Thanks for sharing.";

function myFunc($start, $end, $data) {
	// strings where each character is "equivalent" to every other
	$equiv = array("iI");

	// escape the search strings; "/" is the delimiter used in the patterns below
	$start = preg_quote($start, "/");
	$end = preg_quote($end, "/");

	foreach ($equiv as $chars) {
		// character class matching any character of the group, e.g. [iI]
		$class = "[" . preg_quote($chars, "/") . "]";
		// replace each character of the group with the whole class
		$start = preg_replace("/$class/u", $class, $start);
		$end = preg_replace("/$class/u", $class, $end);
	}

	// $start = "[iI]'ve"; $end = "Thanks for shar[iI]ng";
	$count = preg_match_all("/$start(.+?)$end/isu", $data, $matches);
	print_r($matches);
}

myFunc($start, $end, $data);

The idea is that you replace each of those characters in the $start and $end strings with PCRE character sets.
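
Applied to the apostrophe problem, the same idea might look like the sketch below. The function name findBetween and the particular set of apostrophe variants (straight, U+2018, U+2019) are assumptions for illustration, as is UTF-8 input and PHP 7.0 or later for the "\u{...}" escapes.

```php
<?php
// Hypothetical helper: returns every piece of text found between $start and
// $end, treating the listed apostrophe variants as interchangeable.
function findBetween(string $start, string $end, string $data): array
{
    // Assumed set of equivalents: straight apostrophe, U+2018, U+2019.
    $class = "['\u{2018}\u{2019}]";

    // Escape the search strings; "/" is the delimiter used below.
    $start = preg_quote($start, "/");
    $end = preg_quote($end, "/");

    // Swap each apostrophe variant for a class that matches any of them.
    $start = preg_replace("/$class/u", $class, $start);
    $end = preg_replace("/$class/u", $class, $end);

    // Lazy match between the two markers, case-insensitive, UTF-8 mode.
    preg_match_all("/$start(.+?)$end/isu", $data, $matches);

    // $matches[1] holds the captured text; fall back to an empty list on failure.
    return $matches[1] ?? [];
}

// Data with a smart apostrophe, searched with a straight one.
$data = "I\u{2019}ve been visiting your blog for a while now. Thanks for sharing.";
$result = findBetween("I've", "Thanks for sharing.", $data);
print_r($result);
```

Because the class is substituted into the search strings rather than the data, the fetched text itself is left untouched.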

 

Note the /u flag I added. That's for UTF-8 mode. Since smart quotes aren't part of normal ASCII, they have to be represented in some other character encoding. While the data could be in a single-byte encoding such as Windows-1252 or one of a thousand others, PCRE only provides support for UTF-8...
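
A small sketch of why the flag matters inside a character class, assuming UTF-8 strings and PHP 7.0 or later for the "\u{...}" escape. Without /u the class is a set of individual bytes; with /u it is a set of characters.

```php
<?php
// U+2019 (right single quotation mark) is three bytes in UTF-8.
$smart = "\u{2019}";

// Without /u the class holds three separate bytes, so it can only consume
// one byte of the three-byte subject and the anchored match fails.
$bytesMode = preg_match("/^[$smart]$/", $smart);

// With /u pattern and subject are decoded as UTF-8: one character each.
$utf8Mode = preg_match("/^[$smart]$/u", $smart);

var_dump($bytesMode); // int(0)
var_dump($utf8Mode);  // int(1)

// Byte mode can also match the wrong thing: U+201C (left double quote)
// shares its first two bytes with U+2019.
$falseHit = preg_match("/[$smart]/", "\u{201C}");
$noHit = preg_match("/[$smart]/u", "\u{201C}");

var_dump($falseHit);  // int(1)
var_dump($noHit);     // int(0)
```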

I found this, what do you think of it:

 

public function convert_smart_quotes($string) {
	$search = array(chr(145), chr(146), chr(147), chr(148), chr(151), chr(150), chr(133), chr(149));
	$replace = array("'", "'", '"', '"', '--', '-', '...', "•");
	return str_replace($search, $replace, $string);
}

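That will work if the $string is in the right text encoding; if not, you could be replacing the wrong characters. Those chr() values are the Windows-1252 positions for the smart quotes, dashes, ellipsis and bullet (strict ISO 8859-1/Latin1 has control characters in that range). If the string isn't in Windows-1252, you'll need to convert it first.

mb_convert_encoding($string, "Windows-1252")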
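
If the cURL output is already UTF-8, the same replacement could instead be written against the UTF-8 characters directly, which skips the conversion step. This is only a sketch: the function name is hypothetical, UTF-8 input is an assumption, and the "\u{...}" escapes need PHP 7.0 or later.

```php
<?php
// Hypothetical UTF-8 counterpart of convert_smart_quotes().
function convertSmartQuotesUtf8(string $string): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2014}" => "--",  // em dash
        "\u{2013}" => "-",   // en dash
        "\u{2026}" => "...", // ellipsis
    ];

    // strtr() with an array replaces each key once, without re-scanning
    // the text it has already replaced.
    return strtr($string, $map);
}

$clean = convertSmartQuotesUtf8("I\u{2019}ve \u{201C}read\u{201D} it\u{2026}");
echo $clean, "\n"; // I've "read" it...
```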

But now you're altering the data to suit your code. I'm of the mindset that it's better to leave user input as-is and deal with it without making modifications.

Archived

This topic is now archived and is closed to further replies.
