
How to extract some data using regex?


fapapfap


 

So I would like to scrape a page to make a list of URLs.  The source has lots of URLs in the following format:

 

<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=98">TEXT</a>

It carries on in exactly the same format. Can someone help me with the code I need to scrape the page and extract just the links into a list using regex?

https://forums.phpfreaks.com/topic/253924-how-extract-some-data-using-regex/

Seems to work:

<?php

$pattern = '~<a href="http:\/\/www\.site\.com\/page\.php\?action=p\&id=[0-9]+">[a-zA-Z]+<\/a>~i';

$text = 'So I would like to scrape a page to make a list of URLs.  The source has lots of URLs in the following format:

<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, <a href="http://www.site.com/page.php?action=p&id=98">TEXT</a>            and carries on in exactly the same format.  Can someone help me with the code I need to scrape the page and extract just the links into a list using regex?';

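if (preg_match_all($pattern, $text, $matches)) {
    // print_r() only returns its output as a string when the second argument is true;
    // htmlspecialchars() keeps the matched tags visible instead of rendering them as links.
    echo '<pre>' . htmlspecialchars(print_r($matches, true)) . '</pre>';
}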
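Since the question asks for just the links, one possible variation is to wrap the URL part of the pattern in a capture group, so that the URLs end up on their own in `$matches[1]`. This is only a sketch: the `$html` and `$urls` variable names are made up for illustration, the sample string is the one from the question, and it assumes every link follows exactly the `page.php?action=p&id=N` format shown there.

```php
<?php

// Sample input copied from the question; in practice this would be the page source.
$html = '<a href="http://www.site.com/page.php?action=p&id=99">TEXT</a>, '
      . '<a href="http://www.site.com/page.php?action=p&id=97">TEXT</a>, '
      . '<a href="http://www.site.com/page.php?action=p&id=98">TEXT</a>';

// The parentheses capture only the URL inside href="...".
// With ~ as the delimiter, the slashes in the URL do not need escaping.
$pattern = '~<a href="(http://www\.site\.com/page\.php\?action=p&id=[0-9]+)">~i';

// $matches[0] holds the full matches, $matches[1] holds the captured URLs.
$urls = preg_match_all($pattern, $html, $matches) ? $matches[1] : [];

print_r($urls);
```

If the real page source writes the ampersand as `&amp;`, the pattern above would not match as written and the `&` would probably need to become `&(?:amp;)?`.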

 

Thank you so much, perfect!
