
Can't find items with apostrophes in them


The Little Guy


I have the following code in a function ($start and $end are passed in as parameters):

<?php
$start = preg_quote($start, "/");
$end = preg_quote($end, "/");
$good = (bool)preg_match_all($search = "/$start(.+)$end/isU", $data, $matches);
print_r($matches);
?>

$data comes from the output of cURL.

When I use my function like so:

<?php
myFunc("I've", "Thanks for sharing.");

// Within the data:
// I’ve been visiting your blog for a while now and I always find a gem in your new posts.  Thanks for sharing.
?>

My print_r() comes back with an empty array, and I don't understand why. I can find all kinds of things, but when I search for something containing an apostrophe, it doesn't work, even though the text does exist. What should I do?


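Those two apostrophes are different: the one in your search string ("I've") is a straight apostrophe ('), while the one in the data ("I’ve") is a smart (curly) apostrophe (’). That's the only problem I see.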
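To see the difference for yourself, compare the bytes. This is a minimal sketch assuming the page is UTF-8 encoded (common for cURL output, but check the response's Content-Type); `\u{2019}` is PHP 7's escape for the right single quotation mark ’.

```php
<?php
$straight = "'";        // U+0027, what you typed in the search string
$smart    = "\u{2019}"; // U+2019, what the blog post actually contains

// The straight apostrophe is one byte; the smart one is three bytes in UTF-8
echo bin2hex($straight), "\n"; // 27
echo bin2hex($smart), "\n";    // e28099

// Byte for byte, "I've" and "I’ve" are therefore different strings
var_dump("I{$straight}ve" === "I{$smart}ve"); // bool(false)
```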

You've got a few options: you could alter the data, or you could write a more complex expression that allows for character variants.
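As a sketch of the second option for this one search, assuming UTF-8 data: a hand-written character class that accepts either apostrophe. The /u modifier matters here, because without it the class is treated as four separate bytes rather than two characters.

```php
<?php
$data = "I\u{2019}ve been visiting your blog for a while now and I always find a gem in your new posts.  Thanks for sharing.";

// ['’] accepts either apostrophe; /u makes ’ a single character in the class
$found = preg_match_all("/I['\u{2019}]ve(.+?)Thanks for sharing\\./isu", $data, $matches);

// Without /u the class is four single bytes (', E2, 80, 99), so it consumes
// only the first byte of ’ and "ve" no longer lines up: no match
$withoutU = preg_match_all("/I['\u{2019}]ve(.+?)Thanks for sharing\\./is", $data);

print_r($matches[1]);
var_dump($found, $withoutU); // int(1) int(0)
```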

 

By the way, if you're simply checking whether one string comes before another, strpos() is a better choice than regular expressions.
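A minimal strpos() version might look like the following; `textBetween` is a hypothetical helper name. Note that strpos() also compares bytes exactly, so it is simpler and faster than a regex but does not get around the apostrophe mismatch by itself.

```php
<?php
// Hypothetical helper: return the text between the first $start and the
// next $end after it, or null if either marker is missing
function textBetween(string $data, string $start, string $end): ?string
{
    $from = strpos($data, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // skip past the start marker itself

    $to = strpos($data, $end, $from); // only look for $end after $start
    if ($to === false) {
        return null;
    }
    return substr($data, $from, $to - $from);
}

$data = "I\u{2019}ve been visiting your blog. Thanks for sharing.";

var_dump(textBetween($data, "I've", "Thanks"));         // NULL: straight ' is not in $data
var_dump(textBetween($data, "I\u{2019}ve", "Thanks")); // the text in between
```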


The easy way out would be to build sets of characters that are equivalent to each other, for example a character set for apostrophes.

If you wanted to say that "i" and "I" were equivalent, you could do something like this:

$start = "I've";
$end = "Thanks for sharing";
$data = "I've been visiting your blog for a while now and I always find a gem in your new posts. Thanks for sharing.";

function myFunc($start, $end, $data) {
	// Strings in which each character is "equivalent" to every other
	$equiv = array("iI");
	// Escape regex metacharacters (including the "/" delimiter) in each set
	$equiv = array_map(function ($chars) { return preg_quote($chars, "/"); }, $equiv);

	$start = preg_quote($start, "/");
	$end = preg_quote($end, "/");

	// Replace each equivalent character with a character class of its set
	foreach ($equiv as $chars) {
		$start = preg_replace("/[$chars]/u", "[$chars]", $start);
		$end = preg_replace("/[$chars]/u", "[$chars]", $end);
	}

	// $start = "[iI]'ve"; $end = "Thanks for shar[iI]ng";
	$count = preg_match_all("/$start(.+?)$end/isu", $data, $matches);
	print_r($matches);
}

myFunc($start, $end, $data);

The idea is that you replace each of those characters in the $start and $end strings with PCRE character sets.

Note the /u flag I added; it turns on UTF-8 mode. Smart quotes aren't part of ASCII, so they have to be represented in some other encoding. You could use a single-byte encoding such as Windows-1252 or one of a thousand others, but the only multibyte encoding PHP's PCRE functions support is UTF-8, and /u is what enables it.
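Applied to the apostrophe problem itself, the same equivalence-set idea could look like this sketch. `equivPattern` is a hypothetical name, and it swaps the replace-after-quoting loop above for a per-character build, which avoids double-escaping if a set ever contains a regex metacharacter. It assumes UTF-8 input.

```php
<?php
// Hypothetical helper: build a regex fragment (for a "/"-delimited pattern)
// matching $text, where every character in an equivalence set may be
// replaced by any other character of that set. Expects UTF-8 strings.
function equivPattern(string $text, array $equivSets): string
{
    // Map each character to the character class of its set
    $classFor = [];
    foreach ($equivSets as $set) {
        $class = '[' . preg_quote($set, '/') . ']';
        foreach (preg_split('//u', $set, -1, PREG_SPLIT_NO_EMPTY) as $c) {
            $classFor[$c] = $class;
        }
    }

    // Walk $text one code point at a time: use the class, or escape the char
    $pattern = '';
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $c) {
        $pattern .= $classFor[$c] ?? preg_quote($c, '/');
    }
    return $pattern;
}

$apostrophes = ["'\u{2019}"]; // straight ' and right single quotation mark ’
$start = equivPattern("I've", $apostrophes);               // I['’]ve
$end   = equivPattern("Thanks for sharing.", $apostrophes); // Thanks for sharing\.

$data = "I\u{2019}ve been visiting your blog for a while now. Thanks for sharing.";
preg_match_all("/$start(.+?)$end/isu", $data, $matches);
print_r($matches[1]);
```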


I found this; what do you think of it?

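// Replaces Windows-1252 "smart" punctuation with plain equivalents:
// 145/146 = ‘ ’, 147/148 = “ ”, 151 = —, 150 = –, 133 = …, 149 = •
public function convert_smart_quotes($string){
	$search = array(chr(145), chr(146), chr(147), chr(148), chr(151), chr(150), chr(133), chr(149));
	$replace = array("'", "'", '"', '"', '--', '-', '...', "•");
	return str_replace($search, $replace, $string);
}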


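That will only work if $string is in the right text encoding; otherwise you could be replacing the wrong characters. Byte values 133 and 145-151 are the ellipsis, smart quotes, bullet and dashes in Windows-1252, not in ISO-8859-1 (Latin-1), so if the data is in some other encoding, such as UTF-8, you'll need to convert it first:

mb_convert_encoding($string, "Windows-1252", "UTF-8")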

But now you're altering the data to suit your code. I'm of the mindset that it's better to leave user input as-is and deal with it without making modifications.
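If you do decide to normalize anyway and the cURL output is UTF-8, a sketch that works on the UTF-8 characters directly avoids the encoding conversion step. The function name `convert_smart_quotes_utf8` is hypothetical; the mapping mirrors the one above, except the bullet, whose UTF-8 form is already •.

```php
<?php
// Hypothetical UTF-8 counterpart of convert_smart_quotes(): same mapping,
// keyed on Unicode characters instead of Windows-1252 byte values
function convert_smart_quotes_utf8(string $string): string
{
    return strtr($string, [
        "\u{2018}" => "'",   // ‘ left single quotation mark
        "\u{2019}" => "'",   // ’ right single quotation mark
        "\u{201C}" => '"',   // “ left double quotation mark
        "\u{201D}" => '"',   // ” right double quotation mark
        "\u{2014}" => '--',  // — em dash
        "\u{2013}" => '-',   // – en dash
        "\u{2026}" => '...', // … horizontal ellipsis
    ]);
}

$data = "I\u{2019}ve been visiting your blog\u{2026} Thanks for sharing.";
echo convert_smart_quotes_utf8($data), "\n"; // I've been visiting your blog... Thanks for sharing.
```

strtr() with an array tries the longest keys first and never rescans its own output, so a replacement can't be replaced again.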

