sphinx Posted October 13, 2011

Hello,

I'm using:

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/Currently Playing:/', $page_contents, $matches);
echo $matches[0];
?>

to try to echo the song onto another website, but all that is displaying is:

Currently Playing:

I'm attempting to get data from: http://zixtyycraft.com/radio/song.php

Many thanks
requinix Posted October 13, 2011

Load the HTML into a DOMDocument and grab the contents of the first (and only) <b> tag.
sphinx Posted October 13, 2011

Hi there,

Sorry, I'm unsure how to apply this, because I generally set it up to look for certain attributes, i.e. numbers.

<?php
$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
$matches = array();
preg_match('/<b></b>/', $page_contents, $matches);
echo $matches[0];
?>

Basically I want the contents between the <b> tags.

Many thanks for your time.
requinix Posted October 13, 2011

DOM:

$dom = new DOMDocument();
$dom->loadHTMLFile("http://zixtyycraft.com/radio/song.php");
// Tag names are lowercased when the HTML is parsed, so look up "b", not "B".
$b = $dom->getElementsByTagName("b")->item(0);
// $b is a DOMNode...
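To round out the DOM approach, here is a minimal sketch of reading the text out of that node. It parses a hard-coded string rather than fetching the URL, so it runs without network access. The sample markup and the "Artist - Title" value are assumptions for illustration, not the real contents of song.php.

```php
<?php
// Stand-in for the page body. The real markup at song.php is an assumption here.
$html = '<html><body>Currently Playing: <b>Artist - Title</b></body></html>';

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Look up the first <b> element (lowercase tag name).
$b = $dom->getElementsByTagName('b')->item(0);

// item(0) returns null when no such element exists, so guard before reading.
$song = ($b !== null) ? trim($b->textContent) : '';

echo $song, "\n"; // Artist - Title
```

To use it against the live page, swapping loadHTML($html) for loadHTMLFile() with the URL should be enough, provided allow_url_fopen is enabled.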
.josh Posted October 14, 2011

The DOM solution is ideal for scraping HTML. But to address your problem with the regex: the issue is that you haven't told it to match anything except the literal string "Currently Playing:" or "<b></b>". You need to use things like wildcards and quantifiers to create a pattern. For example, if you want to grab everything within a <b>..</b> tag:

preg_match('~<b>(.*?)</b>~i', $page_contents, $matches);

So ~<b>(.*?)</b>~i is the pattern. Overall, the goal here is to use <b> and </b> as anchors, basically a way to tell the regex engine where in the string to look for something. Then we have (.*?), which matches the stuff between those tags.

~ This is the pattern delimiter. All patterns must be wrapped in a delimiter, because preg_match has optional modifiers you can put within the first argument string, after the closing delimiter. I included a modifier in this pattern so you can see (the "i" at the end). In your code you used /, which is fine, except that if you need to use that character as part of your pattern you have to escape it, and closing HTML tags use /. So if you are making a regex to scrape HTML, it makes for cleaner patterns to pick some other delimiter.

<b> Match the literal string "<b>". This tells the engine where you want to start matching.

( Start of a group to capture. Basically, when you wrap part of your pattern in parentheses, you are telling the engine to put what it matches in an additional, separate element in the returned $matches array.

. This is a wildcard. It matches any single character (except newline characters, unless you tell it to with the "s" modifier).

* This is a quantifier. It says to match 0 or more of the preceding character or group. So together, .* means to match 0 or more of any character.

? This makes the .* a lazy match. By default, quantifiers are greedy. This means that they will match everything they can possibly match in the string and then start giving stuff back in order to satisfy the rest of the pattern. This often isn't what you want. Consider the string "<b>foo</b><b>bar</b>". If you have ~<b>(.*)</b>~i and your intention is to match the stuff between the "b" tags, this will actually match everything up to the last instance of </b>, that is, the whole string "<b>foo</b><b>bar</b>", and the captured group will be "foo</b><b>bar". So ? tells the quantifier not to be greedy: to match only one character at a time until it finds the first instance of the rest of the pattern. So ~<b>(.*?)</b>~i will match only "<b>foo</b>" and capture "foo".

) End of the group to capture.

</b> Match the literal string "</b>". This tells the engine where to stop matching.

~ Ending pattern delimiter.

i A pattern modifier. This tells the regex engine to do a case-insensitive match.

So an example:

$page_contents = file_get_contents("http://zixtyycraft.com/radio/song.php");
preg_match('~<b>(.*?)</b>~i', $page_contents, $matches);
print_r($matches);

This will print out the following:

Array
(
    [0] => <b>The Chemical Brothers - Life Is Sweet (Daft Punk Remix)</b>
    [1] => The Chemical Brothers - Life Is Sweet (Daft Punk Remix)
)

$matches[0] contains the full match: everything matched by the pattern between the ~ delimiters. $matches[1] contains everything in the first captured group, i.e. whatever the (.*?) between the parentheses matched.
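The greedy versus lazy difference described above can be checked with a small sketch that uses the same "<b>foo</b><b>bar</b>" sample string. The variable names ($greedy, $lazy, $all) are only for illustration, and the preg_match_all call is an addition not used in the thread: it is one way to collect every <b>..</b> pair rather than just the first.

```php
<?php
// Sample string from the explanation above.
$subject = '<b>foo</b><b>bar</b>';

// Greedy: .* grabs as much as it can, then backtracks to the LAST </b>.
preg_match('~<b>(.*)</b>~i', $subject, $greedy);

// Lazy: .*? expands one character at a time and stops at the FIRST </b>.
preg_match('~<b>(.*?)</b>~i', $subject, $lazy);

// preg_match_all repeats the lazy match across the whole string.
// $all[1] holds the captured group from each match.
preg_match_all('~<b>(.*?)</b>~i', $subject, $all);

echo $greedy[1], "\n";             // foo</b><b>bar
echo $lazy[1], "\n";               // foo
echo implode(', ', $all[1]), "\n"; // foo, bar
```

If the page only ever has one <b> tag, plain preg_match with the lazy pattern is all that is needed.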