Regex, why double array?

Ninjakreborn · July 11, 2010

<?php

$url = file_get_contents('http://search.yahoo.com/search;_ylt=A0geu5jCGDhIk6gAr6pXNyoA?p=link:' . $domain);

$preg = preg_match_all("=<span class\=\"url\">(.*)</span>=siU", $url, $results);

?>

This simply goes and gets everything from a string that matches <span class="url">whatever</span>

It grabs everything inside the span tags. This works great, but why is it doubling the array?

It returns $results as $array[0] and $array[1], with 0 and 1 both holding duplicates, so all of 0 has the text I want

but the same text is in 1. Why did it create it that way?

I'm mostly trying to figure this out so I can understand regular expressions better.

wildteen88 · July 11, 2010

If you view the source of the output page, you'll see you have two distinct arrays.

$array[0] will contain both the HTML and the URLs, e.g.

Array
(
    [0] => Array
        (
            [0] => <span class="url">http://some url here</span>
             .... rest of matches
        )
)

Whereas $array[1] will contain just the URLs (what the (.*) group captured), e.g.

Array
(
    [1] => Array
        (
            [0] => http://some url here
             .... rest of matches
        )
)

Hope that helps you understand what preg_match_all is doing.
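
To make the two sub-arrays concrete, here is a minimal sketch that runs the same kind of pattern against a hard-coded string instead of the live Yahoo page. The sample HTML and the variable names are made up for illustration, and ~ is used as the delimiter so the = inside the tag needs no escaping. It also shows the PREG_SET_ORDER flag, which groups results per match rather than per capture group, in case that layout is easier to loop over.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$html = '<span class="url">http://example.com/a</span> text '
      . '<span class="url">http://example.com/b</span>';

// Default flag is PREG_PATTERN_ORDER:
// $results[0] holds the full matches, $results[1] holds capture group 1.
$count = preg_match_all('~<span class="url">(.*)</span>~siU', $html, $results);

print_r($results[0]); // whole matches, span tags included
print_r($results[1]); // only what (.*) captured

// PREG_SET_ORDER: one entry per match, each with [0] full match and [1] group 1.
preg_match_all('~<span class="url">(.*)</span>~siU', $html, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo $set[1], "\n";
}
```

In a browser the two print_r dumps can look identical because the span tags get rendered, which is why viewing the page source (or wrapping the dump in htmlspecialchars) makes the difference visible.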

Ninjakreborn · July 11, 2010

Very much so, thank you very much.

Oh and Wildteen, I have not seen you in forever. It's nice to see you around here again.

Hope things are going well for you.

Thanks again for the help. I am really trying to jump into this regex stuff, and I'm finally starting to write some of my

own. After all this time I'm finally finding out it's very helpful with some of the stuff I have had to do.

Thanks again.

Ninjakreborn · July 11, 2010

<?php
$temp = preg_match_all("\bInlinks\b \(([:digit:])\)", $url, $test);
?>

OK, I can't get this to work. I want to look through a giant string and return one instance of this:

"Inlinks (123)", where the 123 could be any number, depending on how many results there are.

I am getting a strange error:

Warning: preg_match_all() [function.preg-match-all]: Delimiter must not be alphanumeric or backslash in /users1/functions.php on line 51

I also tried the above regular expression without the \b's around the text, because I thought you could match a word directly.

So when the regex finds the word Inlinks followed by a number in parentheses, I want to grab that number. My logic for this was:

match the word Inlinks followed by () (which is why I escaped the first set), then get any numeric characters inside that. But instead I get this

error. Any advice would be appreciated.

wildteen88 · July 11, 2010

You forgot the delimiters (they define the start/end of your regex pattern).

I tend to use ~ as the delimiter. You'll want to wrap [:digit:] within square brackets too, otherwise you'll receive an error. You'll also need to add the one-or-more quantifier (the + symbol) after [:digit:] to match one or more digits, otherwise your pattern will only match a single digit and not several.

Fixed code

$temp = preg_match_all("~\bInlinks\b \(([[:digit:]+])\)~", $url, $test);
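
As a small illustration of the delimiter point alone, the sketch below runs the same pattern body with and without delimiters. The sample string and variable names are invented, and the @ operator is only there to keep the warning out of the output.

```php
<?php
$text = 'Inlinks (123)'; // made-up sample text

// No delimiters: the first character of the pattern is taken as the delimiter,
// and a backslash is not allowed there, so the call fails with a warning.
$bad = @preg_match('\bInlinks\b', $text);   // false

// Same body wrapped in ~ delimiters: the pattern compiles and matches.
$good = preg_match('~\bInlinks\b~', $text); // 1

var_dump($bad, $good);
```

Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter; picking one that does not occur in the pattern avoids extra escaping.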

Ninjakreborn · July 11, 2010

OK, I appreciate that. I made some notes about the delimiters and the other things you said. I tried it and it wasn't working, so I cut it down to a test case

and it still wasn't working. Then I reduced the pattern a little to make it more specific, and it still wasn't working. Here is the current pattern and test case:

<?php

$temp = preg_match_all("~\(([[:digit:]+])\)~", 'test whatever (1234) whatever test', $test);
echo '<pre>';
print_r($test);
echo '</pre>';
?>

I basically want to grab the numbers between the (). So in this case it should just be the start delimiter, then the escaped (, then inside that grab

all the digits, then the escaped ) and the closing delimiter. Any advice is appreciated. This test case returns an empty array. I will expand it myself to make sure

Inlinks is in front of it once I get the test case working, but there must be something I am doing wrong here.

Thanks again.

wildteen88 · July 11, 2010

Whoops, my bad. I meant to say add the + operator after [[:digit:]], not [:digit:]. I forgot the extra [].
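
The difference between the two placements of + can be checked against the test string from the post above. This is only a sketch: the variable names are invented, and the last part swaps in preg_match and the \d shorthand as one possible way to grab a single "Inlinks (N)" occurrence.

```php
<?php
$text = 'test whatever (1234) whatever test'; // test string from the post above

// [[:digit:]+] is ONE character class containing the digits and a literal "+",
// so it matches exactly one character between the parentheses.
preg_match_all('~\(([[:digit:]+])\)~', $text, $wrong);

// [[:digit:]]+ applies the + quantifier to the class: one or more digits.
preg_match_all('~\(([[:digit:]]+)\)~', $text, $right);

print_r($wrong[1]); // empty, because "(1234)" has four characters inside
print_r($right[1]); // holds the string "1234"

// For a single occurrence, preg_match stops at the first match.
if (preg_match('~\bInlinks\b \((\d+)\)~', 'Inlinks (123)', $m)) {
    echo $m[1], "\n"; // the captured number as a string
}
```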

Ninjakreborn · July 11, 2010

Oh ok. I got it adjusted and now it's even working with the Inlinks part as well.

Thanks again, I appreciate it.
