
Parsing Text File


gigantorTRON


Hello,

I'm working with a text file that contains the names of websites and their corresponding URLs in the following fashion:

<a href="www.cnn.com">CNN News</a>

Each site and title is separated by line breaks. I'm not very experienced with fopen, fread, etc. and was wondering if anyone could give me pointers on how to go about reading the text file line by line and saving the URL and title in separate columns in a database.

 

Oh, and one more thing. The categories of these sites are also included every so often. I was hoping to save the category in a third field. Ex:

 

News (\n new line here)
<a href="www.cnn.com">CNN News</a> (\n)
<a href="www.bbc.com">BBC News </a>

etc.

 

Thanks!


Here's your pointer: file().

Then you could do this (read comments):

 

<?php
$array = file('filename.txt');
foreach ($array as $key => $value) {
	$url = preg_replace("/.*href=\"(.*)\">.*/", "$1", $value, 1); //retrieve URLs
	$title = preg_replace("/.*\">(.*)<\/a>.*/", "$1", $value, 1); //retrieve titles
	mysql_query("INSERT INTO `table` (`url`, `title`) VALUES ('$url', '$title')"); //insert into database
}
?>

 

Dunno about your last question. Didn't test the code, hope it works.

 

EDIT: Just tested without the query part, had a small error in the last preg_replace; it's corrected now.
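
A side note for anyone following along: the two preg_replace calls can also be written as one preg_match with two capture groups, which has the added benefit of telling you whether the line was a link at all. This is only a minimal sketch, assuming every link line looks like the sample in the question; parse_link_line is a hypothetical helper name, not something from the posts above.

```php
<?php
// Hypothetical helper: returns [url, title] for a link line, or null for any other line.
function parse_link_line(string $line): ?array
{
    // [^"]* stops at the closing quote of href; .*? stops at the first </a>.
    if (preg_match('/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/i', $line, $m) !== 1) {
        return null;
    }
    // trim() removes stray spaces such as the one in "BBC News ".
    return [trim($m[1]), trim($m[2])];
}

var_dump(parse_link_line('<a href="www.cnn.com">CNN News</a>' . "\n"));
var_dump(parse_link_line("News\n")); // category lines contain no anchor
```

Because the pattern never has to match the end of the line, the trailing line break from file() does not end up in either value.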


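And, to remove the extra whitespace at the end of the titles/URLs (it's the line break that file() leaves on every line), add the s modifier to the pattern, i.e. the first preg_replace parameter, so that the dot also matches the line break: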

 

<?php
$array = file('filename.txt');
if ($array) {
	foreach ($array as $key => $value) {
		$url = preg_replace("/.*href=\"(.*)\">.*/s", "$1", $value, 1); //retrieve URLs
		$title = preg_replace("/.*\">(.*)<\/a>.*/s", "$1", $value, 1); //retrieve titles
		mysql_query("INSERT INTO `table` (`url`, `title`) VALUES ('$url', '$title')"); //insert into database
	}
}
?>

 

I also added a check for the file before retrieving the strings.
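
Another way to deal with the line breaks is to have file() drop them while reading, using its FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags. A small sketch follows; the temporary file and its contents are made up purely so the example runs on its own.

```php
<?php
// Build a throwaway sample file (assumed contents, modelled on the question).
$path = tempnam(sys_get_temp_dir(), 'links');
file_put_contents(
    $path,
    "News\n<a href=\"www.cnn.com\">CNN News</a>\n\n<a href=\"www.bbc.com\">BBC News </a>\n"
);

// FILE_IGNORE_NEW_LINES strips the line break from each element;
// FILE_SKIP_EMPTY_LINES then drops lines that are left empty.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

// file() returns false on failure, so compare with === before looping.
if ($lines === false) {
    exit("Could not read $path\n");
}
print_r($lines);
```

Lines that contain only spaces are not considered empty by file(), so a trim() on each line is still worth keeping if the file might have them.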


I won't stop, will I? ;)

 

Made sure that the code I posted won't insert wrong stuff into the database when the category lines are passed:

 

<?php
$array = file('filename.txt');
if ($array) {
	foreach ($array as $key => $value) {
		if (strpos($value, "</a>") === false) {continue;} //skip the category lines
		$url = preg_replace("/.*href=\"(.*)\">.*/s", "$1", $value, 1); //retrieve URLs
		$title = preg_replace("/.*\">(.*)<\/a>.*/s", "$1", $value, 1); //retrieve titles
		mysql_query("INSERT INTO `table` (`url`, `title`) VALUES ('$url', '$title')"); //insert into database
	}
}
?>


Here's a slight mod to the above code to get the categories as well:

 

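<?php
$array = file('filename.txt');
if ($array) {
	$cat = ''; //links listed before the first category get an empty category
	foreach ($array as $key => $value) {
		if (trim($value) == '') {continue;} //skip blank lines so they don't reset the category
		if (strpos($value, "</a>") === false) {
			$cat = trim($value); //no link on this line, so it is a category name
		}
		else {
			$url = trim(preg_replace("/.*href=\"(.*)\">.*/s", "$1", $value, 1)); //retrieve URLs
			$title = trim(preg_replace("/.*\">(.*)<\/a>.*/s", "$1", $value, 1)); //retrieve titles
			//escape the values so a quote in a title or category can't break the query
			$sql = sprintf("INSERT INTO `table` (`cat`, `url`, `title`) VALUES ('%s', '%s', '%s')",
				mysql_real_escape_string($cat), mysql_real_escape_string($url), mysql_real_escape_string($title));
			mysql_query($sql); //insert into database
		}
	}
}
?>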
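
For readers on a current PHP version, where the old mysql_* functions are no longer available, the same idea can be sketched with PDO and a prepared statement. Treat this as one possible arrangement rather than the thread's solution: parse_site_lines and insert_sites are hypothetical helper names, the sites table and its columns are assumptions, and the in-memory SQLite database only stands in for your real MySQL connection so the sketch can run by itself.

```php
<?php
// Hypothetical helper: turns the lines of the text file into cat/url/title rows.
function parse_site_lines(array $lines): array
{
    $rows = [];
    $cat  = ''; // links that appear before any category get an empty one
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (preg_match('/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/i', $line, $m) === 1) {
            $rows[] = ['cat' => $cat, 'url' => trim($m[1]), 'title' => trim($m[2])];
        } else {
            $cat = $line; // any other non-empty line is treated as a category
        }
    }
    return $rows;
}

// Hypothetical helper: inserts every row through one prepared statement.
// Table and column names (sites, cat, url, title) are assumptions.
function insert_sites(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare('INSERT INTO sites (cat, url, title) VALUES (:cat, :url, :title)');
    foreach ($rows as $row) {
        $stmt->execute([
            ':cat'   => $row['cat'],
            ':url'   => $row['url'],
            ':title' => $row['title'],
        ]);
    }
    return count($rows);
}

// Assumption: SQLite in memory as a stand-in; use your own DSN and credentials instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE sites (cat TEXT, url TEXT, title TEXT)');

$rows = parse_site_lines([
    "News\n",
    "<a href=\"www.cnn.com\">CNN News</a>\n",
    "<a href=\"www.bbc.com\">BBC News </a>\n",
]);
insert_sites($pdo, $rows);
```

With a real file, the array passed to parse_site_lines would simply be the result of file('filename.txt'). The bound parameters take care of quoting, so titles containing apostrophes need no manual escaping.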

