Preg match song information

sphinx · October 13, 2011

Hello,

I'm using:

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/Currently Playing:/', $page_contents, $matches);
echo $matches[0];
?>

To try and echo the song onto another website, but all that is displaying is: Currently Playing:

I'm attempting to get data from:

http://zixtyycraft.com/radio/song.php

Many thanks

requinix · October 13, 2011

Load the HTML into a DOMDocument and grab the contents of the first (and only) <b> tag.

sphinx · October 13, 2011

Hi there,

Sorry I'm unsure how to apply this because I generally set it up to look for certain attributes, ie: numbers.

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/<b></b>/', $page_contents, $matches);
echo $matches[0];
?>

Basically I want the contents between the <b> and </b> tags.

Many thanks for your time.

requinix · October 13, 2011

DOM

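$dom = new DOMDocument();
$dom->loadHTMLFile("http://zixtyycraft.com/radio/song.php");

// The HTML parser stores tag names in lowercase, so ask for "b".
$b = $dom->getElementsByTagName("b")->item(0);
// $b is a DOMNode; its text is available as $b->nodeValue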
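
As a rough sketch of how the node's text could then be read, the snippet below parses an inline HTML string instead of the live page. The markup in $html is made up for illustration (an assumption, not the real page's source), and it uses loadHTML() on a string where the post above uses loadHTMLFile() on the URL.

```php
<?php
// Hypothetical sample markup standing in for the remote page.
$html = '<html><body>Currently Playing: <b>Artist - Title</b></body></html>';

$dom = new DOMDocument();
// loadHTML() parses a string; loadHTMLFile() would read a file or URL instead.
$dom->loadHTML($html);

// Fetch the first <b> element, if there is one.
$b = $dom->getElementsByTagName('b')->item(0);

// item(0) returns null when no <b> element exists, so check before reading.
$song = ($b !== null) ? trim($b->textContent) : '';
echo $song, "\n";
```

Checking for null first avoids an error if the page ever comes back without a <b> tag.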

.josh · October 14, 2011

DOM solution is ideal for scraping html.

But to address your problem with the regex: the issue is that you haven't told it to match anything except the literal string "Currently Playing:" or "<b></b>". You need to use things like wildcards and quantifiers to create a pattern. For example, if you want to grab everything within the <b>..</b> tag:

preg_match('~<b>(.*?)</b>~i',$page_contents, $matches);

So ~<b>(.*?)</b>~i is the pattern. Overall the goal here is to use the <b> and </b> as anchors, basically a way to tell the regex engine where in the string to look for something. Then we have (.*?), which will match the stuff between those tags.

~ This is the pattern delimiter. All patterns must be wrapped in a delimiter, because preg_match has optional modifiers you can put within the first argument string. I included a modifier in this pattern so you can see (the "i" at the end). In your code you used / which is fine except if you need to use that character as part of your pattern, you will need to escape it, and closing html tags use /. So if you are making a regex to scrape html, it makes for cleaner patterns to pick some other delimiter.

<b> Match for literal string "<b>". This is to tell the engine where you want to start matching.

( Start of group to capture. Basically, when you wrap part of your pattern in parentheses, you are telling the engine to put what it matches in an additional, separate element in the returned $matches array.

. This is a wildcard. It means to match one of any single character (except newline chars unless you tell it to w/ a modifier)

* This is a quantifier. It says to match 0 or more of any of the previous character or group. So together .* means to match 0 or more of any characters

? This means to make the .* a lazy match. By default quantifiers are greedy. This means that they will match everything they can possibly match in the string and then start giving stuff back in order to satisfy the rest of the pattern. This isn't ideal a lot of times. Consider the string "<b>foo</b><b>bar</b>". If you have ~<b>(.*)</b>~i and your intention is to match stuff between the first "b" tag, this will actually match everything up to the last instance of </b>: "<b>foo</b><b>bar</b>". So ? tells the quantifier not to be greedy, to only match one character at a time until it finds the first instance of the rest of the pattern. So ~<b>(.*?)</b>~i will match "<b>foo</b>".

) End of group to capture.

</b> Match for literal string "</b>". This is to tell the engine where to stop matching.

~ Ending pattern delimiter.

i A pattern modifier. This tells the regex engine to do a case-insensitive match.
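
To see a few of these pieces in action, here is a small sketch comparing the greedy and lazy versions of the pattern, plus the same lazy pattern written with / as the delimiter. The input string is invented for the demonstration and is not taken from the actual page.

```php
<?php
// Hypothetical input with two bold elements, the second in uppercase tags.
$html = '<b>foo</b><B>bar</B>';

// Greedy: .* runs to the end of the string, then backs off to the LAST closing tag.
preg_match('~<b>(.*)</b>~i', $html, $greedy);

// Lazy: .*? expands one character at a time and stops at the FIRST closing tag.
preg_match('~<b>(.*?)</b>~i', $html, $lazy);

// Same lazy pattern with / as the delimiter: the / in </b> has to be escaped.
preg_match('/<b>(.*?)<\/b>/i', $html, $slash);

echo $greedy[1], "\n"; // foo</b><B>bar
echo $lazy[1], "\n";   // foo
echo $slash[1], "\n";  // foo
```

The i modifier is what lets the greedy version treat the uppercase </B> as a closing tag.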

So an example:

$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
preg_match('~<b>(.*?)</b>~i',$page_contents, $matches);
print_r($matches);

This will print out the following:

Array
(
    [0] => <b>The Chemical Brothers - Life Is Sweet (Daft Punk Remix)</b>
    [1] => The Chemical Brothers - Life Is Sweet (Daft Punk Remix)
)

$matches[0] contains the text matched by the full pattern (everything between the ~ pattern delimiters).

$matches[1] contains the text matched by the first captured group, i.e. everything between the parentheses (the (.*?)).
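
In practice it is probably worth checking what preg_match returns before reading $matches, since index 1 won't exist when nothing matches. A minimal sketch: the helper name extract_song and the sample strings are made up for illustration, and the s modifier is added on the assumption that the tag content might span more than one line (it makes . match newlines too).

```php
<?php
// Hypothetical helper: returns the text inside the first <b>...</b>, or null.
function extract_song($html)
{
    // preg_match returns 1 on a match, 0 on no match, and false on error.
    if (preg_match('~<b>(.*?)</b>~is', $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

var_dump(extract_song("Currently Playing: <b>Artist -\nTitle</b>"));
var_dump(extract_song('no bold tag here'));
```

Returning null for "no match" lets the calling code decide what to display when the song can't be found.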
