thebadbad

Members


Posts: 1,613
Joined: October 13, 2007
Last visited: October 30, 2015


Everything posted by thebadbad


regex: replace links captions by url

thebadbad replied to gdfhghjdfghgfhf's topic in Regex Help

I would probably parse each URL with parse_url(), for more reliable results:

<?php
$str = '<a href="http://www.mediafire.com/something">http://www.mediafire.com/something</a> text text text text text text text text text text text <a href="http://www.somelink.com"></a> text text text text text text text <a href="http://megaupload.com/something/blabla">Some link without WWW</a> text text text text text text text text text text text text text text <a title="somename" href="http://www.rapidshare.com/download" style="color:#000000" target="_blank">some confusing link</a> ...........more text here.......... <a href="http://www.microsoft.com"></a> ....more text........... <a href="http://www.4shared.com/download.php?file=myfile"><img src="a_link_with_an_image.gif"></a>';

function _callback($matches) {
    $domains = array('mediafire.com', 'megaupload.com', 'rapidshare.com', '4shared.com');
    $domain = parse_url($matches[2], PHP_URL_HOST);
    //remove any sub domains
    $parts = array_reverse(explode('.', $domain));
    $domain = "{$parts[1]}.{$parts[0]}";
    if (in_array($domain, $domains)) {
        $matches[0] = "[DL]{$matches[0]}[/DL]";
    }
    return $matches[0];
}

$str = preg_replace_callback('~<a\b[^>]+\bhref\s?=\s?([\'"])(.+?)\1[^>]*>.*?</a>~is', '_callback', $str);
echo $str;
?>

If you add a domain with a double TLD (e.g. .co.uk) to the $domains array, you would have to rewrite the code.
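One possible way around the double TLD limitation, as a rough sketch only: keep a short list of known two-part suffixes and take three labels instead of two when the host ends in one of them. The function name registrable_domain() and the suffix list are made up for illustration; a complete solution would probably rely on the Public Suffix List instead.

```php
<?php
// Hypothetical helper, not from the original post: reduce a host name to its
// registrable domain, treating a few hard-coded two-part suffixes specially.
function registrable_domain($host) {
    // Illustrative list only; real-world code should use the Public Suffix List.
    $double_tlds = array('co.uk', 'org.uk', 'com.au', 'co.jp');

    $parts = explode('.', strtolower($host));
    $count = count($parts);
    if ($count < 2) {
        return $host; // nothing to strip, e.g. "localhost"
    }

    $last_two = $parts[$count - 2] . '.' . $parts[$count - 1];
    if ($count >= 3 && in_array($last_two, $double_tlds)) {
        // Host ends in a known two-part suffix: keep one more label.
        return $parts[$count - 3] . '.' . $last_two;
    }
    return $last_two;
}

echo registrable_domain('www.mediafire.com'), "\n";   // mediafire.com
echo registrable_domain('files.example.co.uk'), "\n"; // example.co.uk
```

The result could then be checked against the $domains array with in_array(), in place of the two-label shortcut in the callback.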
- November 12, 2009
- 9 replies
looking for help making links complete paths

thebadbad replied to slushpuppie's topic in Regex Help

If you're looking for a more robust way of translating relative paths to absolute paths, there's a function at http://w-shadow.com/blog/2007/07/16/how-to-extract-all-urls-from-a-page-using-php/. A way to use it:

<?php
function relative2absolute($absolute, $relative) {
    $p = @parse_url($relative);
    if(!$p) {
        //$relative is a seriously malformed URL
        return false;
    }
    if(isset($p["scheme"])) return $relative;

    $parts=(parse_url($absolute));

    if(substr($relative,0,1)=='/') {
        $cparts = (explode("/", $relative));
        array_shift($cparts);
    } else {
        if(isset($parts['path'])){
            $aparts=explode('/',$parts['path']);
            array_pop($aparts);
            $aparts=array_filter($aparts);
        } else {
            $aparts=array();
        }
        $rparts = (explode("/", $relative));
        $cparts = array_merge($aparts, $rparts);
        foreach($cparts as $i => $part) {
            if($part == '.') {
                unset($cparts[$i]);
            } else if($part == '..') {
                unset($cparts[$i]);
                unset($cparts[$i-1]);
            }
        }
    }
    $path = implode("/", $cparts);

    $url = '';
    //isset() avoids a notice when $absolute has no scheme
    if(isset($parts['scheme'])) {
        $url = "$parts[scheme]://";
    }
    if(isset($parts['user'])) {
        $url .= $parts['user'];
        if(isset($parts['pass'])) {
            $url .= ":".$parts['pass'];
        }
        $url .= "@";
    }
    if(isset($parts['host'])) {
        $url .= $parts['host']."/";
    }
    $url .= $path;

    return $url;
}

$raw = preg_replace_callback(
    '~\b(href|src)\s?=\s?([\'"])(.+?)\2~is',
    create_function(
        '$matches',
        'return $matches[1] . \'=\' . $matches[2] . relative2absolute(\'http://www.domain.com/\', $matches[3]) . $matches[2];'
    ),
    $raw
);
?>
- November 10, 2009
- 5 replies
Filtering values in a multidimensional array

thebadbad replied to Wechr's topic in PHP Coding Help

Small alteration of thorpe's code:

<?php
$urls = array();
foreach ($array as $arr) {
    if ($arr['name'] == 'indie') {
        $urls[] = $arr['url'];
    }
}
//see contents of the $urls array
echo '<pre>' . print_r($urls, true) . '</pre>';
?>
- November 9, 2009
- 5 replies
Parsing out inventory numbers

thebadbad replied to blommer's topic in PHP Coding Help

If we first split the string at the Vegetables header, we can then grab each vegetable and amount with a regular expression and then do whatever we want with the data: <?php $html = <<<HTML <HTML> <HEAD> <TITLE>Inventory</TITLE> </HEAD> <BODY> <H2>Inventory</H2> for Monday, December 5, 2009 <A NAME="I1"> Fruits <A NAME="F1"> Apples 10 Pears 5 <A HREF="index.html">Return to home...</A> <HR> Vegetables <A NAME="V1"> Corn Cobs 3 <A NAME="S5795_3">Lettuce Heads 10 <A NAME="S5795_5">Potatoes 3 <A HREF="index.html">Return to home...</A> <HR> </BODY> </HTML> HTML; list(, $html) = explode('Vegetables ', $html, 2); preg_match_all('~([^<]+) \s*([0-9]+) ~i', $html, $matches, PREG_SET_ORDER); //print structured data echo '<table>'; foreach ($matches as $match) { echo "\n\t<tr><td>{$match[1]}</td><td>{$match[2]}</td></tr>"; } echo "\n</table>"; ?>
- November 8, 2009
- 2 replies
grabbing NFL scores

thebadbad replied to chiefrokka's topic in PHP Coding Help

I did see your PM, but the problem is that the script doesn't work for this season, and I haven't had the time to fix or rewrite it yet. I may get around to it at some point.
- November 8, 2009
- 42 replies
[SOLVED] Field - Subtract First Word

thebadbad replied to EternalSorrow's topic in PHP Coding Help

Bad idea, since it removes punctuation.
- November 8, 2009
- 16 replies
Declaring a prefix as a string variable with ' and ", and ~ as delimiter..? Help

thebadbad replied to physaux's topic in Regex Help

Where did your delimiters go in the site4, site5 and site6 prefixes? Your code will fail to work when you mess up the delimiters. To include both single and double quotes in a string, you either have to escape the one used as string delimiter (not to be confused with regex delimiter) or e.g. use the heredoc syntax:

$str = 'resultTitle\' id=\'infopei\'><a href="';
$str = "resultTitle' id='infopei'><a href=\"";

or

$str = <<<HTML
resultTitle' id='infopei'><a href="
HTML;

And I would incorporate the use of preg_quote() instead of your approach, to separate literal text from the regex pattern:

<?php
define('REGEX', '([^\s]*?)'); //quite important to make the quantifier lazy in your case, to end the match at the first occurrence of the suffixes
define('DELIMITER', '~');
define('MODIFIERS', 'i');
$parts = array(
    array(' <a href="', '">'),
    array('<a href="', '">'),
    array(' <a href="', ''),
    array('NONE', ''),
    array('<a href=', '>'),
    array('<h2 class=r><a class=l href="', '">')
);
//test with first prefix and suffix
$pattern = DELIMITER . preg_quote($parts[0][0], DELIMITER) . REGEX . preg_quote($parts[0][1], DELIMITER) . DELIMITER . MODIFIERS;
preg_match($pattern, $data, $match);
echo $match[1];
?>
- November 6, 2009
- 2 replies
[SOLVED] preg_match logical error

thebadbad replied to plznty's topic in PHP Coding Help

Because there's a line break between the two divs. Try to add \s* between them, and you should also make your quantifier lazy by adding a question mark after .* (stopping the match at the first encountered </div> character sequence, not the last).
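Since the original pattern isn't quoted here, the following is only a sketch with made-up markup. It compares a greedy and a lazy quantifier, with \s* absorbing the line break between the two divs.

```php
<?php
// Hypothetical markup: two divs separated by a line break, plus a third div.
$html = "<div class=\"a\">first</div>\n<div class=\"b\">second</div> <div>third</div>";

// Greedy (.*) runs on to the LAST </div> in the string.
preg_match('~<div class="a">.*</div>\s*<div class="b">(.*)</div>~s', $html, $greedy);

// Lazy (.*?) stops at the FIRST </div>; \s* matches the line break between the divs.
preg_match('~<div class="a">.*?</div>\s*<div class="b">(.*?)</div>~s', $html, $lazy);

echo $greedy[1], "\n"; // second</div> <div>third
echo $lazy[1], "\n";   // second
```

The s modifier is only there so the dot could also cross line breaks inside a div; it is not needed for \s* to match the break between them.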
- November 6, 2009
- 2 replies
imagecreatefrompng breaks if theres a space in the url

thebadbad replied to patawic's topic in PHP Coding Help

Just run $name through rawurlencode() (as it seems their system doesn't like plus chars). Else, str_replace(' ', '%20', $name) should work fine.
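To make the difference concrete, here's a small sketch. The sample name and the example.com URL are placeholders, not the ones from the thread.

```php
<?php
// $name stands in for the value from the thread; the base URL below is made up.
$name = 'some user name';

echo urlencode($name), "\n";    // some+user+name   (spaces become plus signs)
echo rawurlencode($name), "\n"; // some%20user%20name

// Build the image URL with the %20 form before handing it to imagecreatefrompng().
$url = 'http://example.com/signature/' . rawurlencode($name) . '.png';
echo $url, "\n";
```

For a name that only contains letters and spaces, str_replace(' ', '%20', $name) gives the same result as rawurlencode().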
- November 5, 2009
- 8 replies
My preg_match isn't working..

thebadbad replied to physaux's topic in Regex Help

That would be the source code you're retrieving. If you 'trust' the URL you're grabbing, you could simply do

<?php
//$data holds the source code of the remote page
preg_match('~ <a href="([^"]*)">~i', $data, $match);
echo $match[1];
?>

Assuming the prefix and suffix actually match with the source code. Else if you want to keep your URL pattern, try this, using a modified version of the pattern you provided (it had some errors/opportunities for improvement):

<?php
preg_match('~ <a href="(https?://[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,6}(?::[0-9]{1,5})?(?:/.*?)?)">~is', $data, $match);
echo $match[1];
?>

@salathe You forgot to add the delimiter as the second parameter to preg_quote().
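To illustrate why that second parameter matters, here's a small sketch with a made-up prefix that happens to contain the delimiter character. Without the second argument the ~ is left unescaped and would end the pattern early.

```php
<?php
// Hypothetical literal prefix containing the regex delimiter (~).
$prefix = '<a href="/~user/';

echo preg_quote($prefix), "\n";      // the ~ is left untouched
echo preg_quote($prefix, '~'), "\n"; // the ~ is escaped as \~

// Hypothetical page source to match against.
$data = '<a href="/~user/page.html">';

$pattern = '~' . preg_quote($prefix, '~') . '([^"]*)">~i';
preg_match($pattern, $data, $match);
echo $match[1], "\n"; // page.html
```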
- November 5, 2009
- 5 replies
[SOLVED] removing html tags except image

thebadbad replied to mraza's topic in Regex Help

Sorry, forgot the rest of the expression. And I just realized that there's no need to run strip_tags() with the second parameter before translating the tags in question to BBCode. Updated code:

<?php
$content = '<div> This is an image <img src="http://image.info/200910/186336.jpg" border="0" alt="" /> </div>';
$replace = array(
    '~<img\b[^>]+\bsrc\s?=\s?([\'"])(.*?)\1[^>]*>~is' => '[img=$2]',
    '~<b\b[^>]*>(.*?)</b>~is' => '[b]$1[/b]'
);
$content = preg_replace(array_keys($replace), $replace, $content);
$content = strip_tags($content);
?>
- November 4, 2009
- 11 replies
[SOLVED] removing html tags except image

thebadbad replied to mraza's topic in Regex Help

< and > function as pattern delimiters in your pattern <img>, thus only the literal img are replaced. Probably doesn't make sense to you, but here's how you could do it:

<?php
$content = '<div> This is an image <img src="http://image.info/200910/186336.jpg" border="0" alt="" /> </div>';
$content = strip_tags($content, '<img>');
$content = preg_replace('~<img\b[^>]+\bsrc\s?=\s?([\'"])(.*?)\1~is', '[img=$2]', $content);
?>

Just ask if you need something explained, and I'm sure a kind soul (if not me) will help you understand.
- November 3, 2009
- 11 replies
[SOLVED] Rearranging txt

thebadbad replied to newbtophp's topic in PHP Coding Help

If the syntax is strictly as in your sample (i.e. with no single quotes in the random text), you could also use a single preg_match_all() call:

<?php
$source = file_get_contents('filename.txt');
preg_match_all('~\'([^\']+)\'~', $source, $matches);
$data = implode('', $matches[1]);
?>
- November 2, 2009
- 8 replies
[SOLVED] Find only certain URLs from page ... regex (semi-complete script)

thebadbad replied to doa24uk's topic in Regex Help

Simplest solution is to run the array through array_unique().
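A minimal sketch with made-up sample data. Note that array_unique() preserves the original keys, so array_values() is added here to reindex the result.

```php
<?php
// Made-up sample data standing in for the matched URLs.
$urls = array(
    'http://site1.com/file.php',
    'http://site2.com/file.php',
    'http://site1.com/file.php',
);

// array_unique() keeps the first occurrence of each value and preserves keys;
// array_values() renumbers the result from 0.
$unique = array_values(array_unique($urls));

echo '<pre>' . print_r($unique, true) . '</pre>';
```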
- October 21, 2009
- 19 replies
How do I automate checking google for changes to search results?

thebadbad replied to Neomech's topic in PHP Coding Help

Google doesn't allow automated searches. But apart from that, you could scrape the result pages (it may be a good idea to set the user agent string before you load the pages with either file_get_contents() or cURL), put the page links into an array and store it, then repeat some other day and compare the arrays with some of the array functions (or MySQL functions if you store the information in a database). The hardest part would be to grab the result links from the source code. But it should be doable with the proper regular expression, or maybe with PHP DOM (but I doubt it, looking at Google's source code). E.g. I first thought that the result link anchors had the exclusive class l (lowercase L), but book search and news results have it too.
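The comparison step might look something like the sketch below. The two arrays are made-up stand-ins for link lists scraped on different days; how they are stored and loaded is left out.

```php
<?php
// Hypothetical result links from two scrapes, in ranking order.
$yesterday = array('http://a.example/', 'http://b.example/', 'http://c.example/');
$today     = array('http://b.example/', 'http://a.example/', 'http://d.example/');

// Links that appeared or disappeared between the two runs.
$added   = array_values(array_diff($today, $yesterday));
$removed = array_values(array_diff($yesterday, $today));

// Links present on both days whose position changed: link => array(old, new).
$moved = array();
foreach (array_intersect($today, $yesterday) as $pos => $link) {
    $old_pos = array_search($link, $yesterday);
    if ($old_pos !== $pos) {
        $moved[$link] = array($old_pos, $pos);
    }
}

echo '<pre>' . print_r(compact('added', 'removed', 'moved'), true) . '</pre>';
```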
- October 21, 2009
- 1 reply
[SOLVED] Find only certain URLs from page ... regex (semi-complete script)

thebadbad replied to doa24uk's topic in Regex Help

Here's an idea:

<?php
$string = 'http://site1.com/file.php http://site5.com/file.php http://www.site2.com/file.php';
//grab every URL
preg_match_all('~https?://[^" ]+~i', $string, $matches);
//filter out the domains not on our whitelist
function _callback($url) {
    $whitelist = array(
        'site1.com',
        'www.site1.com',
        'site2.com',
        'www.site2.com',
        'site3.com',
        'www.site3.com'
    );
    return in_array(parse_url($url, PHP_URL_HOST), $whitelist);
}
$urls = array_filter($matches[0], '_callback');
echo '<pre>' . print_r($urls, true) . '</pre>';
?>

But there's a few problems here. Firstly the regular expression isn't perfect (mainly because it's also supposed to grab 'plain' URLs not part of a HTML tag with delimiting quotes), and secondly the whitelist currently must contain all variants of the URLs, i.e. including subdomains. But I'm sure you can find a function to return the pure domain (it's a bit tricky because you have to take into account 'double TLDs' like .co.uk).

If you don't need to extract 'plain' URLs (see above) from the page, but only URLs from href (and possibly src) attributes, you can use this safer regular expression instead:

'~\b(?:href|src)\s?=\s?([\'"])(.+?)\1~is'

and then feed $matches[2] to array_filter().
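That attribute-only variant could look roughly like this. The page source and the function name is_whitelisted() are made up for the example, and the whitelist is shortened.

```php
<?php
// Hypothetical page source for illustration.
$html = '<a href="http://site1.com/file.php">one</a> '
      . "<img src='http://site5.com/pic.gif'> "
      . '<a class="x" href="http://www.site2.com/file.php">two</a>';

// Group 1 captures the quote character, group 2 the attribute value.
preg_match_all('~\b(?:href|src)\s?=\s?([\'"])(.+?)\1~is', $html, $matches);

// Keep only URLs whose host is on the (shortened) whitelist.
function is_whitelisted($url) {
    $whitelist = array('site1.com', 'www.site1.com', 'site2.com', 'www.site2.com');
    return in_array(parse_url($url, PHP_URL_HOST), $whitelist);
}

// array_filter() preserves keys, so reindex with array_values().
$urls = array_values(array_filter($matches[2], 'is_whitelisted'));
echo '<pre>' . print_r($urls, true) . '</pre>';
```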
- October 21, 2009
- 19 replies
preg_replace() help

thebadbad replied to Anman's topic in PHP Coding Help

If you want to run the function and use its output in the replacement, you would have to use preg_replace_callback() e.g.:

<?php
$message = preg_replace_callback(
    '#\[code=(.*?)\](.*?)\[/code\]#i',
    create_function(
        '$matches',
        'return \'<div class="codeblock">\' . geshify($matches[1], $matches[2]) . \'</div>\';'
    ),
    $message
);
?>
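On current PHP versions the same idea can be written with an anonymous function, since create_function() was deprecated in PHP 7.2 and removed in PHP 8.0. A sketch under assumptions: geshify() isn't shown in the thread, so the version below is only a stand-in stub that escapes its input so the example can run.

```php
<?php
// Stand-in for the thread's geshify() function, which is not shown in the post.
// This stub only wraps the escaped code in a <pre> tag.
function geshify($language, $code) {
    return '<pre class="' . htmlspecialchars($language) . '">' . htmlspecialchars($code) . '</pre>';
}

// Made-up sample message.
$message = 'Example: [code=php]echo 1 < 2;[/code]';

$message = preg_replace_callback(
    '#\[code=(.*?)\](.*?)\[/code\]#i',
    // Anonymous function (PHP 5.3+) instead of create_function().
    function ($matches) {
        return '<div class="codeblock">' . geshify($matches[1], $matches[2]) . '</div>';
    },
    $message
);

echo $message, "\n";
```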
- October 20, 2009
- 1 reply
[SOLVED] Simple Scraper... Weird Output

thebadbad replied to phoenixx's topic in Regex Help

Addition: Forgot to grab the titles. Although my for loop isn't that elegant.

<?php
$page = 1;
$data = array();
while (true) {
    $html = file_get_contents('http://www.mytinyphone.com/ringtones/classical/?page_ring=' . $page++);
    $match_count = preg_match_all('~href="/ringtone/([0-9]+)/"><img[^>]*>(.*?)</a>~is', $html, $matches);
    if ($match_count > 0) {
        for ($i = 0; $i < $match_count; $i++) {
            $data[] = array($matches[1][$i], $matches[2][$i]);
        }
    } else {
        //page doesn't exist
        break;
    }
}
echo '<pre>' . print_r($data, true) . '</pre>';
?>
- October 20, 2009
- 2 replies
[SOLVED] Simple Scraper... Weird Output

thebadbad replied to phoenixx's topic in Regex Help

A way of doing it:

<?php
$page = 1;
$ids = array();
while (true) {
    $html = file_get_contents('http://www.mytinyphone.com/ringtones/classical/?page_ring=' . $page++);
    $match_count = preg_match_all('~href="/ringtone/([0-9]+)/~i', $html, $matches);
    if ($match_count > 0) {
        $ids = array_merge($ids, $matches[1]);
    } else {
        //page doesn't exist
        break;
    }
}
echo '<pre>' . print_r($ids, true) . '</pre>';
?>

Will load all pages though, and thus probably time out, but should work with the appropriate settings (assuming the website in question doesn't cut you off due to too many requests). I'm not too sure about this, but maybe it could be optimized by loading all the page sources into a single string first, and then run a single preg_match_all() on the huge string. Don't know if it'll be more efficient.
- October 20, 2009
- 2 replies
Need to write a cookie with a url var

thebadbad replied to leptoon's topic in PHP Coding Help

Simple. $_GET['ID'] will contain the id, and janusmccarthy already pointed you to the manual page for the setcookie() function.
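As a rough sketch of how the two pieces fit together: the helper name id_from_query(), the cookie name, the 30-day lifetime and the assumption that the id is numeric are all made up for illustration.

```php
<?php
// Hypothetical helper: return the ID from a query array if it looks like a
// plain number, otherwise null. The numeric-only rule is an assumption.
function id_from_query(array $query) {
    if (isset($query['ID']) && is_string($query['ID']) && ctype_digit($query['ID'])) {
        return $query['ID'];
    }
    return null;
}

$id = id_from_query($_GET);

// setcookie() has to be called before any output is sent to the browser.
if ($id !== null && !headers_sent()) {
    // Cookie name "id" and the 30-day lifetime are placeholders.
    setcookie('id', $id, time() + 60 * 60 * 24 * 30, '/');
}
```

On later requests the value would be available as $_COOKIE['id'].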
- October 20, 2009
- 4 replies
Help with lottery style system?

thebadbad replied to N1CK3RS0N's topic in PHP Coding Help

Or alternatively

<?php
$balls = range(1, 36);
shuffle($balls);
echo implode(' ', array_slice($balls, 0, 3));
//or simply access the random numbers via $balls[0], $balls[1] and $balls[2]
?>
- October 19, 2009
- 13 replies
[SOLVED] matching numbers inside ( )

thebadbad replied to Michdd's topic in Regex Help

That's a capital O in his sample, not a zero. Easy mistake to make though.
- October 19, 2009
- 11 replies
[SOLVED] matching numbers inside ( )

thebadbad replied to Michdd's topic in Regex Help

I would probably go with preg_replace_callback() (with nested preg_replace() calls):

<?php
$str = '(S2O3)-2';
$str = preg_replace_callback(
    '~\([^)]*\)~',
    create_function(
        '$matches',
        'return preg_replace(\'~[0-9]+~\', \'$0\', $matches[0]);'
    ),
    $str
);
echo $str;
?>
- October 19, 2009
- 11 replies
Displaying message based on referral

thebadbad replied to acctman's topic in PHP Coding Help

The simple way:

<?php
if (isset($_SERVER['HTTP_REFERER'])) {
    $info = parse_url($_SERVER['HTTP_REFERER']);
    if (isset($info['host'])) {
        echo 'Welcome ' . htmlentities($info['host']) . ' Members.';
    } else {
        //invalid referrer
    }
} else {
    //referrer not set
}
?>
- October 19, 2009
- 3 replies
[SOLVED] Remove only line breaks between [code] [/code]

thebadbad replied to Garrett's topic in Regex Help

Sure. It's always good to point things out, even if they're slightly off topic. People might actually learn something!
- October 19, 2009
- 8 replies
