variable in a function in another function called from a preg_replace_callback


gerkintrigg

I want to generate absolute URLs for my screen-scraper app, which fetches a page's HTML and renders it in the browser with a few changes (spelling corrections, etc.). To make sure it loads the CSS and images properly, I have tried several ways of converting relative paths to absolute URLs.

 

I ended up with the following code:

<?php
$url = 'http://www.google.com/one_level_up/2_levels_up/3levels_up/';
$path = '../../intl/en/images/logo.gif';     # this should display
$path2 = '../../../intl/en/images/logo.gif'; # this should not
$page = '<img src="'.$path.'"> Don\'t Display This: <img src="'.$path2.'">';
#------------------------- Function Below --------------------------------
function get_src($tmp_url, $path) {
    $tmp_url = rtrim($tmp_url, '/');
    $path = ltrim($path, '\\');
    if (($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/' . str_replace('../', '', $path);
    } else {
        $path = str_replace($path, ($tmp_url.'/'.$path), $path);
    }
    return $path;
}
?>Display This:
<?php

#-------------------------- get the right URLs ------------------------------
function real_links($matches) {
    return 'src="'.get_src('http://www.google.com/one/two/', $matches[1]).'"';
    # you'll note that the Google url needs to be defined, rather than a variable... why is that?
}
$page = preg_replace_callback('~src="(.*?)"~', 'real_links', $page);
echo $page;
?>

 

You'll note that the Google URL has to be hard-coded in real_links() rather than passed in as a variable. Why is that? How can I replace it with the $url variable from the top of the code without causing an error?

 


So you have two parts: the URL (which is dynamic):

 

http://www.google.com/one_level_up/2_levels_up/3levels_up/

or

http://www.google.com/one_level_up/2_levels_up/

or

http://www.google.com/one_level_up/

 

and the path (which is also dynamic):

 

../../intl/en/images/logo.gif

 

 

So what you're looking for is the full image URL? E.g.:

 

http://www.google.com/one_level_up/2_levels_up/intl/en/images/logo.gif
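
For illustration, here is a minimal sketch of combining a base URL and a relative path by walking the path segments instead of using a regex. It assumes the usual relative-URL behaviour, where each "../" climbs one directory from the base, and it assumes PHP 7 or later. The function name resolve_relative() is hypothetical and does not come from this thread. Under that assumption, the three-level base URL listed first combined with "../../intl/en/images/logo.gif" ends up under /one_level_up/.

```php
<?php
// Hypothetical helper (not from the thread): resolve a relative path that may
// start with "../" segments against a base directory URL.
function resolve_relative(string $baseUrl, string $path): string
{
    $parts = parse_url($baseUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];

    // Directory segments of the base URL, e.g. ['one_level_up', '2_levels_up'].
    $dirs = array_values(array_filter(explode('/', $parts['path'] ?? ''), 'strlen'));

    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($dirs);   // climb one level; does nothing once at the root
        } elseif ($segment !== '' && $segment !== '.') {
            $dirs[] = $segment; // descend into a directory or append the file name
        }
    }

    return $root . '/' . implode('/', $dirs);
}

echo resolve_relative(
    'http://www.google.com/one_level_up/2_levels_up/3levels_up/',
    '../../intl/en/images/logo.gif'
), "\n";
// http://www.google.com/one_level_up/intl/en/images/logo.gif
```

This sketch ignores query strings, ports and absolute paths such as "/images/logo.gif"; a real scraper would need to handle those as well.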


In my first example I showed how I could get the correct URLs from any web page.

By replacing the $url variable with $_POST['url'], this can easily be changed to react to user input. The problem is that the (currently hard-coded) URL inside the callback function stops working as soon as I replace it with a variable.

 

While the web page URL can come from a form post, the URL in the callback currently has to be written into the code itself. I'm not sure whether my syntax is wrong or I am doing something that's simply not allowed.

 

To clarify, this line:

return 'src="'.get_src('http://www.google.com/one/two/',$matches[1]).'"';

will not work if it reads like this:

return 'src="'.get_src($tmp_url,$matches[1]).'"';


That is because $tmp_url is not defined within real_links(). Since it's a callback and can only receive specific arguments, you may need to use the global keyword so that it can see the $tmp_url variable.
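
As an alternative to global, here is a minimal sketch that passes the URL into the callback with an anonymous function and a use clause. This assumes PHP 5.3 or later for closures (the scalar type hints need PHP 7). join_url() is a hypothetical, simplified stand-in for get_src() so that the example stays focused on variable scope.

```php
<?php
// Simplified stand-in for the thread's get_src(): it only joins the two parts.
function join_url(string $base, string $path): string
{
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

$url  = 'http://www.google.com/one/two/';
$page = '<img src="images/logo.gif">';

// "use ($url)" copies $url into the closure's scope when the closure is created,
// so the callback can read it without the global keyword.
$page = preg_replace_callback(
    '~src="(.*?)"~',
    function (array $matches) use ($url) {
        return 'src="' . join_url($url, $matches[1]) . '"';
    },
    $page
);

echo $page, "\n";
// <img src="http://www.google.com/one/two/images/logo.gif">
```

The closure keeps the base URL next to the call that needs it, which avoids depending on a specific global variable name.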


Thanks Thorpe.

 

That looks like a handy tip. I tried it out and read the documentation for the global keyword, but the following code doesn't strip the folder levels the way get_src() should, and doesn't behave like the hard-coded version:

 

<?php
$url = 'http://www.google.com/one_level_up/2_levels_up/3levels_up/';
$my_url = $url;
$path = '../../intl/en/images/logo.gif';     # this should display
$path2 = '../../../intl/en/images/logo.gif'; # this should not
$page = '<img src="'.$path.'"> Don\'t Display This: <img src="'.$path2.'">';
#------------------------- Function Below --------------------------------
function get_src($tmp_url, $path) {
    $tmp_url = rtrim($tmp_url, '/');
    $path = ltrim($path, '\\');
    if (($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/' . str_replace('../', '', $path);
    } else {
        $path = str_replace($path, ($tmp_url.'/'.$path), $path);
    }
    return $path;
}
?>Display This:
<?php

#-------------------------- get the right URLs ------------------------------
function real_links($matches) {
    global $my_url;
    return 'src="'.get_src($my_url, $matches[1]).'"';
}
$page = preg_replace_callback('~src="(.*?)"~', 'real_links', $page);
echo $page;
?>


Sorry, I edited that last post because I thought it was confusing too...

 

It outputs this code:

Display This:
<img src="http://www.google.com/one_level_up/2_levels_up/3levels_up/intl/en/images/logo.gif"> Don't Display This: <img src="http://www.google.com/one_level_up/2_levels_up/3levels_up/intl/en/images/logo.gif">

 

With the global $my_url, get_src() doesn't perform the operations on the URL that it is supposed to perform (and DOES perform when the URL is hard-coded)...

 

If it's not a global, it returns:

Display This:
<img src="/intl/en/images/logo.gif"> Don't Display This: <img src="/intl/en/images/logo.gif">

which is not right either...

 

I think the global is handy. Can I define the get_src() function as global too?
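
One likely explanation for the output above is that the two base URLs differ, not that the global fails. The pattern in get_src() only allows letters, digits and hyphens in a directory name, while "one_level_up", "2_levels_up" and "3levels_up" contain underscores. preg_replace() therefore finds nothing to strip and returns the URL unchanged, whereas the hard-coded "http://www.google.com/one/two/" has no underscores and matches. The sketch below is a trimmed-down variant of get_src() with "_" added to the character class; the name get_src_fixed() is hypothetical. On the second question: functions declared at the top level of a PHP script can already be called from inside other functions, so get_src() needs no global declaration.

```php
<?php
// Trimmed-down variant of the thread's get_src(); "_" is added to the character class.
function get_src_fixed(string $tmp_url, string $path): string
{
    $tmp_url = rtrim($tmp_url, '/');
    $levels  = substr_count($path, '../');
    if ($levels > 0) {
        // Strip one trailing directory per "../"; [a-z0-9_-] also matches "2_levels_up".
        $tmp_url = preg_replace("#(/[a-z0-9_-]+){{$levels}}$#iD", '', $tmp_url);
        return $tmp_url . '/' . str_replace('../', '', $path);
    }
    return $tmp_url . '/' . $path;
}

$base = 'http://www.google.com/one_level_up/2_levels_up/3levels_up';

// Original character class: the underscores prevent a match, so the URL is unchanged.
var_dump(preg_replace('#(/[a-z0-9-]+){2}$#iD', '', $base) === $base); // bool(true)

echo get_src_fixed($base . '/', '../../intl/en/images/logo.gif'), "\n";
// http://www.google.com/one_level_up/intl/en/images/logo.gif
```

Directory names containing other characters (dots, percent-encoded bytes, and so on) would still not match this class, so it remains a narrow fix rather than a general URL resolver.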


Okay, so I still don't know how to get the variable to work with get_src(), but I removed the need for it by putting the body of the get_src() function inside the callback, like this:

#-------------------------- get the right URLs ------------------------------
function real_links($matches) {
    global $my_url;
    #------------ function --------------

    $tmp_url = rtrim($my_url, '/');
    $path = ltrim($matches[1], '\\');
    if (($num_of_them = substr_count($path, '../')) > 0) {
        $tmp_url = preg_replace("#(/[a-z0-9-]+){{$num_of_them}}$#iD", '', $tmp_url);
        $path = $tmp_url . '/' . str_replace('../', '', $path);
    } else {
        $path = str_replace($path, ($tmp_url.'/'.$path), $path);
    }
    #--------- end function ---------
    return 'src="'.$path.'"';
}
$page = preg_replace_callback('~src="(.*?)"~', 'real_links', $page);
echo $page;
