variable in a function in another function called from a preg_replace_callback


gerkintrigg

I want to get absolute URLs for my screen-scraper app, which fetches HTML and renders it in the browser with a few changes (spelling fixes etc.). To make sure it grabs CSS and images properly, I have played about with different ways of building absolute URLs.

 

I ended up with the following code:

<?php
$url='http://www.google.com/one_level_up/2_levels_up/3levels_up/';
$path = '../../intl/en/images/logo.gif'; #this should display
$path2 = '../../../intl/en/images/logo.gif'; # this should not
$page='<img src="'.$path.'"> Don\'t Display This: <img src="'.$path2.'">';
#------------------------- Function Below --------------------------------
function get_src($tmp_url,$path){
    $tmp_url = rtrim($tmp_url, '/');
    $path = ltrim($path, '\\');
    if(($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/'. str_replace('../', '', $path);
    }
    else{
        $path=str_replace($path,($tmp_url.'/'.$path),$path);
    }
    return $path;
}
?>Display This:
<?php 

#-------------------------- get the right URLs ------------------------------
function real_links($matches){
    return 'src="'.get_src('http://www.google.com/one/two/',$matches[1]).'"';
    # you'll note that the Google url needs to be defined, rather than a variable... why is that?
}
$page=preg_replace_callback('~src="(.*?)"~','real_links',$page);
echo $page;
?>

 

You'll note that the Google url needs to be defined, rather than a variable... why is that? How can I replace it with the $url variable at the top of the code, without causing an error?

 

So you have 2 paths - the url (which is dynamic):

 

http://www.google.com/one_level_up/2_levels_up/3levels_up/

or

http://www.google.com/one_level_up/2_levels_up/

or

http://www.google.com/one_level_up/

 

and the path (which is also dynamic):

 

../../intl/en/images/logo.gif

 

 

So what you're looking for is the full image URL? E.g.:

 

http://www.google.com/one_level_up/2_levels_up/intl/en/images/logo.gif

In my first example I showed how I could get the correct URLs from any web page.

By replacing the $url variable with $_POST['url'], this can easily be changed to react to user input. The problem is that the (currently hard-coded) URL in the callback function stops working as soon as I turn it into a variable.

 

While the web page URL can come from a form post, the URL in the callback currently has to be written into the code itself. I'm not sure whether the syntax is wrong or I am just doing something that's not strictly allowed.

 

To clarify, this line:

return 'src="'.get_src('http://www.google.com/one/two/',$matches[1]).'"';

will not work if it reads like this:

return 'src="'.get_src($tmp_url,$matches[1]).'"';

 

That is because $tmp_url is not defined within real_links(). Since it's a callback and can only have specific arguments passed to it, you may need to use the global keyword so that it can see the $tmp_url variable.
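As an alternative to global, an anonymous function can import the variable with use. This is only a sketch and assumes PHP 5.3 or later, which the thread does not confirm; get_src() is copied from the first post, and the short $page string is a made-up test input.

```php
<?php
// get_src() as posted in the thread (unchanged logic).
function get_src($tmp_url, $path) {
    $tmp_url = rtrim($tmp_url, '/');
    $path = ltrim($path, '\\');
    if (($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/' . str_replace('../', '', $path);
    } else {
        $path = str_replace($path, ($tmp_url . '/' . $path), $path);
    }
    return $path;
}

// Assumed test input, not from the thread.
$url  = 'http://www.google.com/one/two/';
$page = '<img src="../../intl/en/images/logo.gif">';

// "use ($url)" copies $url into the closure's scope, so no global is needed.
$page = preg_replace_callback(
    '~src="(.*?)"~',
    function ($matches) use ($url) {
        return 'src="' . get_src($url, $matches[1]) . '"';
    },
    $page
);

echo $page;
```

With this shape the base URL could just as well come from $_POST['url'], since the closure only sees whatever value is handed to it through use.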

Thanks Thorpe.

 

That looks like a handy tip. I tried it out and read through the documentation for the global keyword, but the following code doesn't strip the parent folders the way get_src() should, and doesn't behave quite like the hard-coded version:

 

<?php
$url='http://www.google.com/one_level_up/2_levels_up/3levels_up/';
$my_url=$url;
$path = '../../intl/en/images/logo.gif'; #this should display
$path2 = '../../../intl/en/images/logo.gif'; # this should not
$page='<img src="'.$path.'"> Don\'t Display This: <img src="'.$path2.'">';
#------------------------- Function Below --------------------------------
function get_src($tmp_url,$path){
    $tmp_url = rtrim($tmp_url, '/');
    $path = ltrim($path, '\\');
    if(($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/'. str_replace('../', '', $path);
    }
    else{
        $path=str_replace($path,($tmp_url.'/'.$path),$path);
    }
    return $path;
}
?>Display This:
<?php 

#-------------------------- get the right URLs ------------------------------
function real_links($matches){
    global $my_url;
    return 'src="'.get_src($my_url,$matches[1]).'"';
}
$page=preg_replace_callback('~src="(.*?)"~','real_links',$page);
echo $page;
?>

Sorry, I edited that last post because I thought it was confusing too...

 

It outputs this code:

Display This:
<img src="http://www.google.com/one_level_up/2_levels_up/3levels_up/intl/en/images/logo.gif"> Don't Display This: <img src="http://www.google.com/one_level_up/2_levels_up/3levels_up/intl/en/images/logo.gif">

 

With $my_url, get_src() doesn't apply the operations it is supposed to perform (and DOES perform when the URL is hard-coded)...
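One thing that might explain the difference, judging only from the code posted: the hard-coded URL (/one/two/) has no underscores, while $url has segments such as one_level_up. The character class [a-z0-9-] in get_src() does not include "_", so preg_replace() would find nothing to strip. The sketch below checks the pattern in isolation; $fixed_pattern is a hypothetical variant with "_" added and is not something from the thread.

```php
<?php
// Both URLs after the rtrim($tmp_url, '/') step in get_src().
$hard_coded = 'http://www.google.com/one/two';
$variable   = 'http://www.google.com/one_level_up/2_levels_up/3levels_up';

// Pattern as get_src() builds it for a path containing two "../" parts.
$pattern = '#(/[a-z0-9-]+){2}$#iD';

// No underscores in the segments, so the last two are stripped.
$a = preg_replace($pattern, '', $hard_coded);

// "_" is not in the character class, so nothing matches and the URL is unchanged.
$b = preg_replace($pattern, '', $variable);

// Hypothetical variant with "_" added to the class (an assumption, not from the thread).
$fixed_pattern = '#(/[a-z0-9_-]+){2}$#iD';
$c = preg_replace($fixed_pattern, '', $variable);

echo $a, "\n", $b, "\n", $c, "\n";
```

If that is the cause, the global keyword is doing its job and the regex is what treats the two URLs differently.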

 

If it's not a global, it returns:

Display This:
<img src="/intl/en/images/logo.gif"> Don't Display This: <img src="/intl/en/images/logo.gif">

which is not right either...

 

I think the global is handy. Can I define the get_src() function as global too?
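On that last question: as far as I know, named functions in PHP are already visible from every scope once they are defined, so global is only needed for variables. A small sketch with made-up names (helper(), caller() and $setting are hypothetical, not from the thread):

```php
<?php
$setting = 'abc';

function helper($value) {
    return strtoupper($value);
}

function caller() {
    global $setting;          // variables need "global" (or must be passed in)
    return helper($setting);  // functions are callable without any declaration
}

echo caller();
```

So get_src() should already be reachable from inside real_links() as it stands.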

Okay, so I still don't know how to get the variable to work, but I removed the need for it by putting the content of the get_src() function inside the callback like this:

#-------------------------- get the right URLs ------------------------------
function real_links($matches){
    global $my_url;
    #------------ function --------------

    $tmp_url = rtrim($my_url, '/');
    $path = ltrim($matches[1], '\\');
    if(($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/'. str_replace('../', '', $path);
    }
    else{
        $path=str_replace($path,($tmp_url.'/'.$path),$path);
    }
    #--------- end function ---------
    return 'src="'.$path.'"';
}
$page=preg_replace_callback('~src="(.*?)"~','real_links',$page);
echo $page;
