
[SOLVED] Partially replace a string with data from the same string...



Hi there, long-time lurker, first-time poster :)

 

I am teaching myself PHP and trying to learn about regular expressions, preg_replace() and the like, but I'm having some trouble figuring out what to do.

 

I'm writing a script to import some old HTML files as blog posts for my website; the files are 300-2000 lines long. I currently use fgets() to read each file one line at a time, clean each line, and write the article to the database in the appropriate format once it is complete. My problem is that many of the links in these posts no longer exist (some date back to the '90s), and as part of the import I want to "fix" these links as follows.

 

Here is an example line I might get:

<p>12. Then use a <a href="http://www.deadsite.com/randompage.html">coping saw</a> to cut the initial grooves onto the mark you just made.</p>

 

What I would like to do is parse each line and rewrite the link like this:

 

<p>12. Then use a <a href="http://www.google.com/search?q=coping+saw">coping saw</a> to cut the initial grooves onto the mark you just made.</p>

 

Basically, I want to take the contents of the link (the part the reader sees) and turn it into a search-query link to, say, Google or Wikipedia. There is no real pattern to how these links are formatted beyond the standard <a href> tag, and a line may contain one link, several, or none.
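A minimal sketch of turning the visible text into a search URL, assuming the Google search?q= format from the example above. searchUrl() is a hypothetical helper name. http_build_query() encodes spaces as "+" (as in coping+saw) and also escapes characters such as "&" that a plain space-to-plus swap would leave in the URL.

```php
<?php
// Hypothetical helper: build a Google search URL from a link's visible text.
// http_build_query() uses urlencode()-style encoding by default, so spaces
// become "+" and characters like "&" or "#" are percent-encoded.
function searchUrl(string $linkText): string
{
    return 'http://www.google.com/search?' . http_build_query(['q' => trim($linkText)]);
}

echo searchUrl('coping saw'), "\n";
// http://www.google.com/search?q=coping+saw
```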

 

In my draft this is a neat two-line function, but of course that only exists in my imagination. Any ideas? I would greatly appreciate it :)

The str_replace() function should work: it replaces every occurrence of the first parameter with the second.

Using regular expressions (preg_replace()) can get more complicated.

 

$input = str_replace("www.old.com", "www.new.com", $input);
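As a sketch of the same suggestion scaled up: str_replace() also accepts an array of search strings and an optional fourth argument that receives the number of replacements made, so several dead domains (placeholder names here) can be redirected in one call. This only helps when the old domain and its replacement are known in advance.

```php
<?php
// Placeholder HTML with two links on hypothetical dead domains.
$input = '<a href="http://www.old.com/a.html">A</a> <a href="http://www.older.com/b.html">B</a>';

$input = str_replace(
    ['www.old.com', 'www.older.com'], // hypothetical dead domains
    'www.new.com',                    // one replacement used for every search string
    $input,
    $count                            // set to the number of replacements performed
);

echo $input, "\n";
echo $count, "\n"; // 2
```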

 

cheers,

tdw


I could use that for the replacement once I have the pieces separated, but I also need a way to extract the link text and the original URL, so that lines such as these:

Random Text <a href="http://randomsiteblah.com/rand4308/">Link Text 1</a> more random text.
Random Characters <a href="http://anotherdomain.com/448random.html">Link Text 2</a> more random text.

become:

Random Text <a href="http://google.com/search?q=Link+Text+1">Link Text 1</a> more random text.
Random Characters <a href="http://google.com/search?q=Link+Text+2">Link Text 2</a> more random text.

Each one is different, but they all follow the same pattern: the URL sits after <a href=" and before ">, and the link text comes after that "> and before the next </a>.
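A sketch of that extraction with preg_match_all(), assuming every href value is double-quoted as in the examples: group 1 captures the URL and group 2 the link text. The pattern is illustrative only and will not cope with every hand-written tag (single-quoted or unquoted attributes, for instance).

```php
<?php
$line = 'Random Text <a href="http://randomsiteblah.com/rand4308/">Link Text 1</a> more random text.';

// <a\s ... href="(URL)" ... >(TEXT)</a>, case-insensitive; (.*?) is lazy so
// each match stops at the first </a> when a line holds several links.
preg_match_all('|<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>|i', $line, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo 'URL:  ', $m[1], "\n"; // http://randomsiteblah.com/rand4308/
    echo 'Text: ', $m[2], "\n"; // Link Text 1
}
```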

This should do what you asked, although I have not tested it extensively.

<?php

function replaceLinks ($string) {

  preg_match_all("|<a[^>]+>(.*)</a>|U", $string, $links, PREG_SET_ORDER);

  foreach ($links as $link) {

    $oldLink = $link[0];
    $linkText = $link[1];

    $searchParams = str_replace(' ', '+', $linkText);
    $newLink = '<a href="http://www.google.com/search?q='.$searchParams.'">'.$linkText.'</a>';

  }
  return str_replace($oldLink, $newLink, $string);
}

$string = '<p>12. Then use a <a href="http://www.deadsite.com/randompage.html">coping saw</a> to cut the initial grooves onto the mark you just made.</p>';

$string = replaceLinks ($string);

echo $string;
//Output:
//
//<p>12. Then use a <a href="http://www.google.com/search?q=coping+saw">coping saw</a> to cut the initial grooves onto the mark you just made.</p>

?>

 

You could also extend the function to extract the URL from the href attribute, test whether the link still works, and replace it only if it does not.
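A rough sketch of that check, using hypothetical helper names hrefOf() and linkIsAlive(). get_headers() makes a real HTTP request for every link, which can be slow across a 2000-line file, and treating 2xx and 3xx status codes as "alive" is an assumption: a dead site that now redirects to a parking page would still pass.

```php
<?php
// Hypothetical helper: pull the href value out of one matched <a ...> tag
// (double quotes assumed). Returns null if the tag has no href.
function hrefOf(string $anchorTag): ?string
{
    return preg_match('|href="([^"]*)"|i', $anchorTag, $m) ? $m[1] : null;
}

// Hypothetical helper: rough liveness test. get_headers() performs an HTTP
// request and returns false if the host cannot be reached; index 0 holds the
// first status line, e.g. "HTTP/1.1 200 OK". 2xx and 3xx count as alive here.
function linkIsAlive(string $url): bool
{
    $headers = @get_headers($url);
    return is_array($headers)
        && isset($headers[0])
        && preg_match('|^HTTP/\S+\s+[23]\d\d|', $headers[0]) === 1;
}

// Possible use inside the foreach loop of replaceLinks(), before $newLink:
//   $url = hrefOf($link[0]);
//   if ($url !== null && linkIsAlive($url)) {
//       continue; // link still works, leave it alone
//   }
```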

 

EDIT: Modified function. The return was out of place and was exiting after the first replacement if there were multiple links!

Ah, that is exactly what I was trying to do; now I see where my error was (in the syntax). Thank you so much for your help :)

 

Also, very good idea about checking the links first! I will do that!

 

Good catch on the multiple links. I will try this in my script for a while and post my results here :)

OK, for some reason this line breaks it:

 

<p>Sample Here: <a href="http://www.cookies.com/index.html">Sample 1</a> <a href="http://www.cookies.com/index.html">Sample 2</a> (Mirror)</p>

 

Returns

 

<p>Sample Here: <a href="http://www.cookies.com/index.html">Sample 1</a> | <a href="http://www.google.com/search?q=Low+Quality">Sample 2</a> (Mirror)</p>

 

Across the files I'm processing, it works on some lines and fails on others, seemingly at random.

 

Here is my stripped-down debug script:

 

<?php

$pointer = @fopen("processme.html", "r");
if ($pointer) {
    while (!feof($pointer)) {
        $theLine = fgets($pointer, 4096);
        $theLine = replaceLinks($theLine); //keep apart for debugging
        echo $theLine;
    }
    fclose($pointer);
}



function replaceLinks ($string) {

  preg_match_all("|<a[^>]+>(.*)</a>|U", $string, $links, PREG_SET_ORDER);

  foreach ($links as $link) {

    $oldLink = $link[0];
    $linkText = $link[1];

    $searchParams = str_replace(' ', '+', $linkText);
    $newLink = '<a href="http://www.google.com/search?q='.$searchParams.'">'.$linkText.'</a>';

  }
  return str_replace($oldLink, $newLink, $string);
}

?> 
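One more possible cause of misses, separate from the loop problem: fgets($pointer, 4096) hands the regex one line (at most 4095 bytes) at a time, so a link whose text wraps onto the next line never appears whole. A small demonstration with a made-up wrapped link, using the same pattern as replaceLinks():

```php
<?php
// A made-up link whose text wraps onto a second line.
$html = "<p>Use a <a href=\"http://www.deadsite.com/x.html\">coping\nsaw</a> here.</p>\n";

// Split into lines, roughly as fgets() delivers them: no line holds a whole link.
$perLine = 0;
foreach (explode("\n", $html) as $line) {
    $perLine += preg_match_all('|<a[^>]+>(.*)</a>|U', $line);
}

// On the whole string, the "s" modifier lets "." match the newline as well.
$whole = preg_match_all('|<a[^>]+>(.*)</a>|Us', $html);

echo $perLine, ' ', $whole, "\n"; // 0 1
```

In the import script that would mean reading the file with file_get_contents() and running the replacement once on the whole string, with the s modifier added to the pattern.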

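OK, the str_replace() call needs to move inside the loop, separate from the return. In the version you posted, $oldLink and $newLink are overwritten on every pass, so only the last link on a line gets replaced. Corrected function below. I also added trim() to the link text when building the query parameters.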

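<?php

// Replace every <a ...>text</a> in $string with a Google search link for "text".
function replaceLinks ($string) {

  // The U modifier makes the pattern ungreedy, so each match stops at the first </a>.
  preg_match_all("|<a[^>]+>(.*)</a>|U", $string, $links, PREG_SET_ORDER);

  foreach ($links as $link) {

    $oldLink = $link[0];   // the whole original tag, e.g. <a href="...">coping saw</a>
    $linkText = $link[1];  // just the visible text, e.g. coping saw
    $searchParams = str_replace(' ', '+', trim($linkText));
    $newLink = '<a href="http://www.google.com/search?q='.$searchParams.'">'.$linkText.'</a>';
    // Replace inside the loop so every link on the line is rewritten, not just the last one.
    $string = str_replace($oldLink, $newLink, $string);

  }
  return $string;
}

?>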
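An alternative sketch using preg_replace_callback(), which rewrites each match where it is found and so avoids the separate str_replace() pass altogether. It keeps the same pattern, but builds the query with http_build_query() (so link text containing & or # does not break the URL) and strips tags such as <b> from the text used in the query. The name replaceLinksCallback() is hypothetical, chosen so it does not clash with replaceLinks().

```php
<?php
// Hypothetical variant of replaceLinks(): the callback receives each match
// ($m[0] = whole tag, $m[1] = link text) and returns its replacement.
function replaceLinksCallback(string $string): string
{
    return preg_replace_callback(
        '|<a[^>]+>(.*)</a>|U',
        function (array $m): string {
            // Query uses plain text only; the visible link text is kept as-is.
            $query = http_build_query(['q' => trim(strip_tags($m[1]))]);
            return '<a href="http://www.google.com/search?' . $query . '">' . $m[1] . '</a>';
        },
        $string
    ) ?? $string; // preg_replace_callback() returns null on a regex error
}

echo replaceLinksCallback('<p>Sample Here: <a href="http://www.cookies.com/index.html">Sample 1</a> <a href="http://www.cookies.com/index.html">Sample 2</a> (Mirror)</p>'), "\n";
```

Both links in the problem line are rewritten, each with its own text as the query.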
