I'm taking over development of a friend's site that has been plain HTML until now. He has about 1000 pages of HTML news and doesn't want to lose anything in the move to PHP. Obviously I don't want to spend days or weeks going through each separate HTML file, copying and pasting bits from old news articles, so I was thinking about using something along the lines of preg_replace, which I've used a couple of times for things like BB code. The only things that seem to stay constant in the files are the tags around things like "posted by" and "date", so I was thinking of something like this (the source would be copied and pasted into a form, then this would process it):

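// the pasted source arrives as a string in $_POST, so file() is not needed
$input = $_POST['input'];
$parts = explode("|", $input);

// example patterns (assumed): [url]link[/url] and [url=link]text[/url]
$find = array(
"/\[url\](.*?)\[\/url\]/i",
"/\[url=(.*?)\](.*?)\[\/url\]/i"
);

$replace = array(
"<a href=\"\\1\">\\1</a>",
"<a href=\"\\1\">\\2</a>"
);

// preg_replace accepts arrays for the patterns, the replacements and the subject
$output = preg_replace($find, $replace, $parts);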

Obviously this is just a quick example of a way to process BB code. My question is: is there any way to save the values matched by the "(.*?)" groups to a CSV or text file, so I can load them into a DB once they're all processed?

Thanks in advance.

Yes, by using preg_match_all:

http://www.php.net/preg_match_all

Then just play with the matches array.
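To make that a bit more concrete, here is a rough sketch of pulling the captured groups out with preg_match_all and writing them to a CSV file with fputcsv. The sample HTML, the pattern and the variable names ($html, $pattern, $rows, $csvPath) are made up for illustration, since I haven't seen the real pages; you would need to adjust the pattern to whatever tags actually surround "posted by" and "date" in the old files.

```php
<?php
// Hypothetical sample of an old news page; the real markup will differ.
$html = '<b>Posted by:</b> Alice <i>Date:</i> 2004-05-01'
      . '<b>Posted by:</b> Bob <i>Date:</i> 2004-05-02';

// Assumed pattern: one capture group per field you want to keep.
$pattern = '/<b>Posted by:<\/b>\s*(.*?)\s*<i>Date:<\/i>\s*(\d{4}-\d{2}-\d{2})/s';

// PREG_SET_ORDER groups the captures per match:
// $m[0] is the whole match, $m[1] the author, $m[2] the date.
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = array();
foreach ($matches as $m) {
    $rows[] = array($m[1], $m[2]);
}

// Write one CSV line per match; fputcsv takes care of quoting.
// A temp file is used here only so the sketch runs anywhere.
$csvPath = tempnam(sys_get_temp_dir(), 'news');
$fh = fopen($csvPath, 'w');
foreach ($rows as $row) {
    fputcsv($fh, $row, ',', '"', '\\');
}
fclose($fh);
```

With PREG_SET_ORDER each element of $matches is one article's worth of fields, which maps straight onto one CSV row. Most databases can then import the file directly, or you could skip the CSV step and insert each row as you loop.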

