Ok so here is my preg match so far, but I am getting an error that says something to do with a delimiter.

preg_match('/somedomain.com/name-meaning/(^)">/',$pagedata,$matches);
print_r($matches);

 

I really have no clue what I am doing. Could anyone fix that regex for me, and tell me what is wrong with it? Thanks!

The / characters in the middle of the pattern should be \/ because / is a special character here (the delimiter).  Are you trying to capture the XXXX part into a variable?

 

First: preg_match stops after the first match, so it returns at most one result.

preg_match_all is the better choice when you want every match.

I'm no regex wiz, but I think if you change each / inside the pattern to \/ you should get the matches you want.

Oh yeah, and replace (^) with .+

So you probably want:

 

preg_match_all('/somedomain.com\/name-meaning\/.+">/',$pagedata,$matches);
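For anyone reading later, here is a minimal sketch of the delimiter problem and two ways around it. The sample $pagedata string and the names in it are made up for illustration, and the ([^"]+) capture group is only my guess at what the original (^) was aiming for. It is a bit stricter than .+ because it stops at the first closing quote.

```php
<?php
// Hypothetical sample input; the real $pagedata comes from the scraped page.
$pagedata = '<a href="http://somedomain.com/name-meaning/john">John</a>'
          . '<a href="http://somedomain.com/name-meaning/mary">Mary</a>';

// 1) Original pattern: the second "/" ends the pattern, so everything after
//    it is read as modifiers and preg_match() fails (false plus a warning).
$broken = @preg_match('/somedomain.com/name-meaning/(^)">/', $pagedata, $m);
var_dump($broken); // bool(false)

// 2) Escape the inner slashes (and the dot, so it only matches a literal ".").
preg_match_all('/somedomain\.com\/name-meaning\/([^"]+)">/', $pagedata, $a);

// 3) Or pick a delimiter that does not occur in the pattern, e.g. "~".
preg_match_all('~somedomain\.com/name-meaning/([^"]+)">~', $pagedata, $b);

print_r($a[1]); // the captured names: john, mary
```

Both working variants fill index 1 of the result with the captured part, so which one to use is mostly a matter of readability.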

OK thanks, that one is working now. Now I am trying a second part, which I will now (try to) describe below.

 

Second part

I need a new regex. Please ignore the previous posts; they were for the first regex. I now have a second one that is causing me problems.

 

So here is the raw text that this script will be chewing through:

<a href="/browse/letter/a?page=2">2</a>  
<a href="/browse/letter/a?page=3">3</a>  
<a href="/browse/letter/a?page=4">4</a>  
<a href="/browse/letter/a?page=5">5</a> 
<a href="/browse/letter/a?page=6">6</a>  
<a href="/browse/letter/a?page=7">7</a>  
<a href="/browse/letter/a?page=8">8</a>  
<a href="/browse/letter/a?page=9">9</a> 
<a href="/browse/letter/a?page=10">10</a>  
<a href="/browse/letter/a?page=11">11</a>  
<a href="/browse/letter/a?page=12">12</a>  
<a href="/browse/letter/a?page=13">13</a>  
<a href="/browse/letter/a?page=14">14</a>  
<a href="/browse/letter/a?page=15">15</a>

(there are no newline characters, I added them just so that the code above does not get squished into a single line)

 

I want to extract how many pages there are. So I would want the result array to be 2,3,4,...,15

 

So from my understanding, I am looking for something that starts with browse/letter/a?page= and ends with "

 

..right?

 

 

I now tried to change my delimiter to "~", here is what I have so far:

 

$regex = "~browse~/letter~/$letter?page=(.*)\"~Us";
echo $regex."\n\n";
preg_match($regex,$page,$matches);
print_r($matches);

 

But I am getting an error

Unknown modifier '/' 

 

Thanks for the help before! How about this one?

$ is still a special character.

 

I got:

preg_match_all("/browse\/letter\/a\?page=[0-9]+/",$page,$match);

to match and assemble an array such that:

 

Array ( [0] => Array ( [0] => browse/letter/a?page=2 [1] => browse/letter/a?page=3 [2] => browse/letter/a?page=4 [3] => browse/letter/a?page=5 [4] => browse/letter/a?page=6 [5] => browse/letter/a?page=7 [6] => browse/letter/a?page=8 [7] => browse/letter/a?page=9 [8] => browse/letter/a?page=10 [9] => browse/letter/a?page=11 [10] => browse/letter/a?page=12 [11] => browse/letter/a?page=13 [12] => browse/letter/a?page=14 [13] => browse/letter/a?page=15 ) )

 

I'm trying to figure out how to assemble the array of numbers.
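One way to get just the numbers, sketched under the assumption that $page looks like the markup posted earlier (shortened here): put parentheses around the digits and read index 1 of the result instead of index 0.

```php
<?php
// Shortened stand-in for the scraped markup posted above (assumption).
$page = '<a href="/browse/letter/a?page=2">2</a>  '
      . '<a href="/browse/letter/a?page=3">3</a>  '
      . '<a href="/browse/letter/a?page=15">15</a>';

// Parentheses around [0-9]+ make the digits capture group 1.
preg_match_all('/browse\/letter\/a\?page=([0-9]+)/', $page, $match);

// $match[0] holds the full matches, $match[1] only the captured digits.
$pages = array_map('intval', $match[1]);
print_r($pages); // 2, 3, 15
```

The intval() step is optional; it just turns the captured strings into integers.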

 

Changing the delimiter to a character other than the forward slash is always a good idea when working with paths, because otherwise you have to keep escaping forward slashes. The OP's code didn't work because a ~ was put in front of every forward slash inside the pattern as well. PCRE treats the second ~ as the closing delimiter, so the / that follows it is read as a modifier, hence the "Unknown modifier '/'" error. All that was needed was to replace the first and last characters (hence delimiters) and drop the escaping in front of the inner forward slashes, leaving...

 

$regex = "~browse/letter/$letter?page=(.*)\"~Us";

 

As andrewgauger has pointed out, $ is a metacharacter, BUT I believe that is irrelevant in this instance. I assume that by using $letter you want a character stored in a variable to be inserted there. Since this is a double-quoted string, PHP will have replaced $letter with its value before the PCRE engine receives the pattern. Note also that the ? before page needs escaping, otherwise it makes the preceding character optional. To capture the numbers, all you need to do is add a capture group around the number part, so combining andrewgauger's pattern with the OP's....

 

$regex = "~browse/letter/$letter\?page=([0-9]+)\"~";
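A small sketch to make the interpolation point concrete. It assumes $letter holds a single letter such as "a". The preg_quote() call is an extra precaution I'm adding here, not something from the posts above.

```php
<?php
$letter = 'a'; // assumed value; in the OP's script it is set elsewhere

// PHP substitutes $letter while building the string, so PCRE never sees "$".
$regex = "~browse/letter/$letter\?page=([0-9]+)\"~";
echo $regex, "\n"; // ~browse/letter/a\?page=([0-9]+)"~

// If the variable could ever contain regex metacharacters, escape it first.
// The second argument tells preg_quote() to escape the delimiter as well.
$safe   = preg_quote($letter, '~');
$regex2 = "~browse/letter/{$safe}\?page=([0-9]+)\"~";

preg_match($regex2, '<a href="/browse/letter/a?page=7">7</a>', $m);
echo $m[1], "\n"; // 7
```

For a plain letter the two patterns come out identical; preg_quote() only makes a difference once the variable contains characters such as "." or "?".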

Thanks for the corrections, and the very detailed explanation. It is no longer telling me about an error, but it is still not printing out the data that I wanted. Just to be safe, I printed out the contents of $page on the screen, as well as printed out the contents of $regex, as well as the resulting array.

 

RELEVANT $page DATA COPIED FROM "view source" of the output of my code(This is all on one line. I only added new lines to make it easier for you to see):

<p> 
1  
<a href="/browse/letter/a?page=2">2</a>  
<a href="/browse/letter/a?page=3">3</a>  
<a href="/browse/letter/a?page=4">4</a> 
<a href="/browse/letter/a?page=5">5</a>  
<a href="/browse/letter/a?page=6">6</a>  
<a href="/browse/letter/a?page=7">7</a>  
<a href="/browse/letter/a?page=8">8</a>  
<a href="/browse/letter/a?page=9">9</a>  
<a href="/browse/letter/a?page=10">10</a>  
<a href="/browse/letter/a?page=11">11</a>  
<a href="/browse/letter/a?page=12">12</a>  
<a href="/browse/letter/a?page=13">13</a>  
<a href="/browse/letter/a?page=14">14</a>  
<a href="/browse/letter/a?page=15">15</a>  
<a href="/browse/letter/a?page=2">next»</a> 
</p>

 

Here is the printed out $regex

~browse/letter/a\?page=([0-9]+)"~

 

And here is the printed out $matches result

Array
(
    [0] => browse/letter/a?page=2"
    [1] => 2
)

 

And once again, here is all my code:

echo $page;
echo "AFTERPAGE\n\n\n<br/><br/>\n\n";
//$regex = "~browse~/letter~/".$letter."?page=.*\">(.*)<~/a>~Us";
$regex = "~browse/letter/$letter\?page=([0-9]+)\"~";
echo $regex."\n\n";
preg_match($regex,$page,$matches);
print_r($matches);

 

Oh, and if it matters, $page is fetched using curl. I'm fairly sure that part works fine, because the printed output is the same as what I see when I view the URL I'm scraping.

 

Soo, does anyone still see a problem? I want the resulting array to contain 2,3,4,...,14,15. But it's not!
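The printed result above looks consistent with preg_match() stopping after the first match, as mentioned earlier in the thread. Here is a rough sketch of the preg_match_all() version, assuming $letter is "a" and $page matches the posted markup (shortened here):

```php
<?php
$letter = 'a'; // assumed value
// Shortened copy of the posted markup, including the trailing "next" link.
$page = '<p> 1 <a href="/browse/letter/a?page=2">2</a> '
      . '<a href="/browse/letter/a?page=3">3</a> '
      . '<a href="/browse/letter/a?page=4">4</a> '
      . '<a href="/browse/letter/a?page=2">next»</a> </p>';

$regex = "~browse/letter/$letter\?page=([0-9]+)\"~";

// preg_match() returns after the first hit ...
preg_match($regex, $page, $first);
// ... while preg_match_all() keeps scanning to the end of the subject.
preg_match_all($regex, $page, $all);

// $all[1] holds every captured number. The "next" link repeats page 2,
// so drop duplicates before using the list.
$pages = array_values(array_unique(array_map('intval', $all[1])));
print_r($pages);        // 2, 3, 4
echo max($pages), "\n"; // highest linked page
```

With the full markup, $pages should come out as 2 through 15, and max($pages) gives the number of the last page.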

To capture the numbers, all you need to do is add a capture group around the number part, so combining andrewgauger's pattern with the OP's....

 

$regex = "~browse/letter/$letter\?page=([0-9]+)\"~";

 

Thanks, use () to designate a capture group. Got it!
