
Please help with my extraction


EchoFool


Hey,

I have a script that is meant to grab links of a specific type.

<?php



// Need to somehow extract all links from this text
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';
?>

 

 

The URLs are always the same character length, but there is no telling how the user will type them out, so I'm looking for a way to extract the 3 URLs based on domain.com.

The end result should be an array of the 3 URLs that belong to domain.com.

I tried explode(), but someone may type the links without spaces between them, in which case splitting on spaces fails. Any ideas?


<?php
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

// Split on the domain; every piece after the first begins with "/?d=" and the 8-character ID
$new = explode('domain.com', $link);

// Put back the prefix that explode() stripped off
$new1 = 'http://www.domain.com'.$new[1];
$new2 = 'http://www.domain.com'.$new[2];
$new3 = 'http://www.domain.com'.$new[3];

// Each link is exactly 33 characters long, so cut off whatever follows it
echo substr($new1, 0, 33),'<br>';
echo substr($new2, 0, 33),'<br>';
echo substr($new3, 0, 33);
?>
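
A regular expression is another option that does not depend on hard-coding three array indexes. The following is only a sketch: it assumes, based on the sample string, that every link is exactly `http://www.domain.com/?d=` followed by 8 letters or digits, and the function name `extractDomainLinks` is made up for illustration.

```php
<?php
// Hypothetical helper: returns every domain.com link found in $text.
// Assumption: the ID after "?d=" is always exactly 8 letters or digits.
function extractDomainLinks(string $text): array
{
    $pattern = '~http://www\.domain\.com/\?d=[A-Za-z0-9]{8}~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // index 0 holds the full matches
}

$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

$urls = extractDomainLinks($link);
print_r($urls);
```

Because the pattern stops after 8 ID characters, two links typed back to back with no space between them are still returned as separate array entries.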

Hi, thank you for the reply.

Quick question: if I want to edit $new1, $new2 and $new3 and then put them back into the original string, how would I do that?

My intention is to extract the domain links, check whether they are valid, then add an <img src="tick.jpg"> in front of each link and put each link on its own line.

This should result in:

 

<?php
$newlinks = '[quote]
<img src="tick.jpg"/> http://www.domain.com/?d=03WO6WPC
random text, 
<img src="tick.jpg"/> http://www.domain.com/?d=0334fWPC
<img src="tick.jpg"/> http://www.domain.com/?d=03WV4SPC
[/quote]';
?>

 

 

What would you suggest?

 

I know of no way to check if a URL exists with PHP. fopen() should work, since it should return false if the URL does not exist, but I had no luck with it. Try it yourself and see if it works for you.

As for putting the img in front, no problem.

 

<?php
$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

// Split on the full link prefix so that no stray "http://www." is left in the pieces
$new = explode('http://www.domain.com', $link);

// Re-attach the prefix to each piece, with the image in front
$new1 = '<img src="tick.jpg">http://www.domain.com'.$new[1];
$new2 = '<img src="tick.jpg">http://www.domain.com'.$new[2];
$new3 = '<img src="tick.jpg">http://www.domain.com'.$new[3];

// $new[0] is the text before the first link
$new_again = $new[0].$new1.$new2.$new3;

echo 'To test if this works - see below.<br><br>';
echo $new_again;
?>
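
For the "each link on its own line" part, a callback-based replace may be easier to extend than fixed array indexes. This is a rough sketch under assumptions: the 8-character ID pattern is inferred from the sample string, `markLinks` is a made-up name, and the validity check is passed in as a placeholder callable because the thread has not settled on how to test a link.

```php
<?php
// Hypothetical helper: puts each domain.com link on its own line and
// prefixes it with the tick image when $isValid($url) returns true.
// Assumption: IDs are always 8 letters or digits, as in the sample string.
function markLinks(string $text, callable $isValid): string
{
    $pattern = '~http://www\.domain\.com/\?d=[A-Za-z0-9]{8}~';
    $marked = preg_replace_callback($pattern, function (array $m) use ($isValid) {
        $tick = $isValid($m[0]) ? '<img src="tick.jpg"/> ' : '';
        return "\n" . $tick . $m[0] . "\n";
    }, $text);

    // Collapse the empty line left wherever two links were adjacent.
    return preg_replace('~\n\s*\n~', "\n", $marked);
}

$link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

// Placeholder check that accepts every link; swap in a real check here.
$result = markLinks($link, function (string $url): bool {
    return true;
});

echo $result;
```

The line breaks here are plain newline characters, so when the string is echoed into an HTML page something like nl2br() would be needed to make them visible.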

No, 'domain.com' will not be part of the output array, since we're exploding with it as the boundary string. Just echo out the array to see.

What worries me is this part of your question:

"My intentions is to extract the domain links..check if they are valid"

I hope that by 'valid' you mean 'does the site exist'. If it is a question about existence, then you should have no problem with the code.

I will interject here. Try the following:

 

<?PHP

  //## Link String
  $link = '[quote]http://www.domain.com/?d=03WO6WPC random text, http://www.domain.com/?d=0334fWPChttp://www.domain.com/?d=03WV4SPC[/quote]';

  //## Explode string to get urls, also set the $urlArray array
  $linkArray = explode('domain.com/',$link);
  $urlArray  = array();

  //## Allows us to check each link individually
  foreach($linkArray AS $url) {
    //## If the $url variable contains the "?d=" [GET query variable] we process it
    if(strstr($url,'?d=')) {
      $checkURL = 'http://www.domain.com/?d='.substr($url,3,8); //## The ID after "?d=" is 8 characters long

      //## Fetch the URL headers
      $urlHeaders = @get_headers($checkURL);

      //## If all is okay the URL exists, if not then it doesn't
      if($urlHeaders !== false && in_array('HTTP/1.1 200 OK', $urlHeaders)) {
        $urlArray[] = array('URL'=>$checkURL, 'EXISTS'=>' -> Does Exist');
      } else {
        $urlArray[] = array('URL'=>$checkURL, 'EXISTS'=>' -> Does Not Exist');
      }
    }
  }

  //## Print out the URLs and status, do whatever with these results
  echo '<pre>';
  print_r($urlArray);
  echo '</pre>';

?>

 

Regards, PaulRyan.

@Paul,

  I think you will run into the same problem using 'get_headers' as I did with 'fopen'. Some sites that do not exist take you to a different website (mostly one asking if you want to buy that domain name), so you get a positive hit on a nonexistent site. I know of no way to differentiate between the two.
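
One partial idea, offered as a sketch and not a full answer: when get_headers() follows a redirect, the array it returns normally contains the status line of each response in order, so the first status line shows whether the URL answered directly or sent the visitor elsewhere. The helper names `firstStatusCode` and `answeredDirectly` are made up, and the sample arrays are invented for illustration. A parked domain that serves its sales page with a plain 200 would still pass this check.

```php
<?php
// Hypothetical helper: returns the status code of the FIRST response in a
// header list shaped like the array get_headers() returns, or null if none.
function firstStatusCode(array $headers): ?int
{
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('~^HTTP/\S+\s+(\d{3})~', $line, $m)) {
            return (int) $m[1];
        }
    }
    return null;
}

// Hypothetical helper: true only when the URL answered 200 itself,
// without first redirecting to some other address.
function answeredDirectly(array $headers): bool
{
    return firstStatusCode($headers) === 200;
}

// Made-up sample data for illustration, not captured from a real request.
$redirected = ['HTTP/1.1 302 Found', 'Location: http://parked.example/', 'HTTP/1.1 200 OK'];
$direct     = ['HTTP/1.1 200 OK', 'Content-Type: text/html'];

var_dump(answeredDirectly($redirected)); // bool(false)
var_dump(answeredDirectly($direct));     // bool(true)
```

With a live URL the array would come from get_headers(), after first checking that the call did not return false.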
