
open html http:// file and retrieve <title>


TravisT


Hello, I am new to the forum and somewhat new to PHP. Nice to meet you all. This is the first time I have really scripted with PHP, so I'm still learning which tools I have and what things are called in PHP.

 

I have a list of urls and as I loop through each one, I'd like to be able to get information from the webpage. The <title> would be a good start. I also want to know the best way for me to compare data I have.

 

I'll show the basic code below. I successfully go through each URL in this text file and put it in a <ul><li> list just fine. So if $url == http://www.youtube.com/file, what is the normal way to check whether the word "youtube" is in $url?

 

I found preg_match(), but I think I'm approaching the whole thing wrong because I get no output. I am an intermediate to somewhat advanced scripter in other languages similar to PHP; I just need to learn how the normal things are done in PHP.

 

So I'd like to compare the string "youtube" to the variable $url, and I would like to be able to grab the title or other info from the page at $url. Here is what I have so far. (Recent research showed me how I should do this with an XML file, so I will probably change the .txt to .xml.) Can you please tell me what to look for? I have been searching and can't really find a comprehensive answer.

 

I changed the whole page to an echo while trying to fix something last night. Before, it was written like this:

 

<?php
$true = true; // stands in for whatever condition the page checks
if ($true) {
    $var = 'value';
?>
<p>The value is <?php echo $var; ?>.</p>
<?php
}
?>

 

/index.php

<?php
include 'include/header.html';


echo "<div id='wrapper'>
	<div id='left'>
		<div class='article'>
<br />
<p>";
echo "Today is " . date("l") . ", the " . date("jS") . " of " . date("F") . ".";
$lines = file('data/news.txt');
if ($lines){
foreach ($lines as $line_num => $line) {

// file() keeps the trailing newline on each line, so trim it before escaping
$url = htmlspecialchars(trim($line));

//Now I have url. I want to check the url and get the <title> & misc. data.
//if youtube is in $url {html code to embed youtube};
//my attempt was $x = file($url); but I got a lot of 404 and 403 errors.

//now I fill html.
echo "<ul id='menu1' class='auroramenu'>
<li><a href='#'>Story ".$line_num."</a> <a style='display: none;' class='aurorashow' href='#'></a> <a style='display: inline;' class='aurorahide' href='#'></a>
		<ul> <br />
			<p>".$url."</p><br />
  				<li style='text-align:right;'><a href='".$url."' target='_blank'>Read the story.</a>  </li> 
		</ul> 
	</li>
</ul>";
}
		}
		echo "</p><br />
		</div>
	</div>
<div id='right'>";
include 'include/sidebar.html';
echo "</div><br class='clr' /></div><br />";
include 'include/footer.html';
?>

 

Thank you for your help.

 


Here's a way I came up with. If anyone has better or faster methods than this, I'd love to hear them.

 

I parse the URL to find the host, then match against that, because the word youtube or youtube.com could easily turn up in any other part of a URL.

Example would be:

http://mysite.com/out.php?url=http://www.youtube.com/movies
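To make the difference concrete, here is a minimal sketch comparing a search over the whole URL with a search over the host only. The helper names containsWord and hostContainsWord are made up for illustration; parse_url() and stripos() are standard PHP functions.

```php
<?php
// Hypothetical helper names; only parse_url() and stripos() are PHP built-ins.

// Naive check: is the word anywhere in the URL string?
function containsWord($url, $word) {
    return stripos($url, $word) !== false;
}

// Host-only check: parse the URL first, then look at the host part alone.
function hostContainsWord($url, $word) {
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && stripos($host, $word) !== false;
}

$redirect = 'http://mysite.com/out.php?url=http://www.youtube.com/movies';
$direct   = 'http://www.youtube.com/movies';

var_dump(containsWord($redirect, 'youtube'));     // true, although the host is mysite.com
var_dump(hostContainsWord($redirect, 'youtube')); // false
var_dump(hostContainsWord($direct, 'youtube'));   // true
```

For a plain "is this word in the string" test, stripos() is usually enough and preg_match() is not needed.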

 

Stripping the protocol, exploding on the / character, taking $variable[0], and then running preg_match on that also works.
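Roughly, that alternative could look like the sketch below. hostByExplode is a hypothetical name for those steps, and it assumes the URL has no port or user:password part in front of the host.

```php
<?php
// hostByExplode() is a hypothetical name for the steps described above.
function hostByExplode($url) {
    // Strip the protocol (http:// or https://), if there is one.
    $stripped = preg_replace('#^https?://#i', '', trim($url));
    // Explode on "/" and keep the first piece, which is the host.
    $parts = explode('/', $stripped);
    return strtolower($parts[0]);
}

$host = hostByExplode('http://www.youtube.com/movies');
echo $host, "\n"; // www.youtube.com

// Match against the host only, so a youtube.com link in the query string is ignored.
var_dump(preg_match('/youtube\.com$/', $host) === 1); // true
var_dump(preg_match('/youtube\.com$/', hostByExplode('http://mysite.com/out.php?url=http://www.youtube.com/movies')) === 1); // false
```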

 

If you want the results to display quickly on a page and the order doesn't matter, look into multi-curl.
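As a rough idea of what multi-curl looks like, here is a sketch that fetches several pages in parallel. fetchAll is a hypothetical helper name and the 8 second timeout is just an assumption; the curl_* calls come from PHP's cURL extension, which has to be installed.

```php
<?php
// fetchAll() is a hypothetical helper; the curl_* calls are from PHP's cURL extension.
function fetchAll(array $urls, $timeout = 8) {
    $multi = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Run all transfers at once until none is still active.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi, 1.0); // wait for activity instead of spinning
        }
    } while ($active && $status === CURLM_OK);

    // Collect the bodies under the same keys as the input array.
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // empty when the request failed
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}

// Example call (needs network access, so it is left commented out):
// $pages = fetchAll(array('http://www.youtube.com/'));
```

Because the keys are preserved, each result can be matched back to the URL it came from.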

The script below is the simple method; it should find most titles, but not all.

 

<?php

//check if youtube function
function checkYoutube($inserturl) {
    $inserturl = strtolower(trim($inserturl));
    if (substr($inserturl, 0, 5) != "http:") {
        $inserturl = "http://$inserturl";
    }
    $parsedUrl = parse_url($inserturl);
    $host = trim($parsedUrl['host'] ? $parsedUrl['host'] : array_shift(explode('/', $parsedUrl['path'], 2)));

    $checkhost = "youtube.com";
    // match
    if (preg_match("/$checkhost/i", $inserturl)) {
        return TRUE;
    } else {
        return FALSE;
    }
}

//read a file
$my_file = "urls.txt"; //change file name to yours
if (file_exists($my_file)) {
    $data = file($my_file);
    $total = count($data);
    echo "<br />Total urls: $total<br />";
    foreach ($data as $line) {
        if ($line != "" && checkYoutube($line) == TRUE) {
            $url = trim($line);
            //making sure any url has the http protocol
            if (substr($url, 0, 5) != "http:") {
                $url = "http://$url";
            }

            //using curl is better for more options, setting the timeout matters for speed versus accuracy
            $context = stream_context_create(array(
                'http' => array(
                    'timeout' => 8
                )
            ));
            //get the content from url
            $the_contents = @file_get_contents($url, 0, $context);
            //alive or dead condition
            if (empty($the_contents)) {
                $status = "dead";
                $color = "#FF0000";
                $title = $url;
            } else {
                $status = "alive";
                $color = "#00FF00";
                preg_match("/<title>(.*)<\/title>/Umis", $the_contents, $title);
                $title = $title[1];
                //$title = htmlspecialchars($title, ENT_QUOTES); //saving data to database
            }

            //show results on page
            echo "<a style='font-size: 20px; background-color: #000000; color: $color;' href='$url' TARGET='_blank'>$title</a><br />";
        }
    }
} else {
    echo "Can't locate $my_file";
}
?>

I made a slight error: I was checking the entire URL rather than just the host.

 

I made the changes here.

 

For anyone wanting to use this, just make a text file named urls.txt in the same folder as this script.

Place the URLs one per line.

 

<?php

//make sure a url has a protocol, adding http:// only when none is there (https urls are left alone)
function addScheme($url) {
    $url = trim($url);
    if (!preg_match('#^https?://#i', $url)) {
        $url = "http://$url";
    }
    return $url;
}

//check if youtube function
function checkYoutube($inserturl) {
    $host = parse_url(strtolower(addScheme($inserturl)), PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    $checkhost = "youtube.com";
    //match the end of the host only, so youtube.com.example.org or notyoutube.com do not pass
    return (bool) preg_match('/(^|\.)' . preg_quote($checkhost, '/') . '$/', $host);
}

//read a file
$my_file = "urls.txt"; //change file name to yours
if (file_exists($my_file)) {
    //drop the newlines and the blank lines while reading
    $data = file($my_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $total = count($data);
    echo "<br />Total urls: $total<br />";

    //using curl is better for more options, setting the timeout matters for speed versus accuracy
    $context = stream_context_create(array(
        'http' => array(
            'timeout' => 8
        )
    ));

    foreach ($data as $line) {
        if (trim($line) == "" || !checkYoutube($line)) {
            continue;
        }
        //making sure any url has a protocol
        $url = addScheme($line);

        //get the content from url
        $the_contents = @file_get_contents($url, false, $context);

        //alive or dead condition, the url is the fallback when no title is found
        $title = $url;
        if (empty($the_contents)) {
            $status = "dead";
            $color = "#FF0000";
        } else {
            $status = "alive";
            $color = "#00FF00";
            if (preg_match("/<title[^>]*>(.*)<\/title>/Uis", $the_contents, $matches)) {
                $title = trim(html_entity_decode($matches[1], ENT_QUOTES));
            }
        }

        //show results on page, escaping everything that came from outside
        $safe_url = htmlspecialchars($url, ENT_QUOTES);
        $safe_title = htmlspecialchars($title, ENT_QUOTES);
        echo "<a style='font-size: 20px; background-color: #000000; color: $color;' href='$safe_url' target='_blank'>$safe_title</a><br />";
    }
} else {
    echo "Can't locate $my_file";
}
?>
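
The regex is the reason the script finds most titles but not all. If that becomes a problem, one alternative is to let PHP's DOM extension parse the page. This is only a sketch: titleFromHtml is a made-up helper name, and it assumes the DOM extension is enabled and that the page either declares its charset or is plain ASCII.

```php
<?php
// titleFromHtml() is a hypothetical helper; DOMDocument comes with PHP's DOM extension.
function titleFromHtml($html) {
    if (trim($html) === '') {
        return null; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null; // the page has no <title>
    }
    // Collapse runs of whitespace, since titles are often split across lines.
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

$html = "<html><head><TITLE lang='en'>\n  Funny &amp; short clips\n</TITLE></head><body><p>Hi</p></body></html>";
echo titleFromHtml($html), "\n"; // Funny & short clips
var_dump(titleFromHtml('<html><body><p>No title here</p></body></html>')); // NULL
```

The returned title is already entity-decoded, so it still needs htmlspecialchars() before being echoed into a page.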
