bschultz Posted July 29, 2011

I have a bunch of pages (recipes) that were hard-coded a few years ago in straight HTML. Now I put every NEW recipe into a database. I'd like to take all of these raw HTML files and insert the data from them into the db.

- The date is on line 15 of the HTML file (May 29, 2007<br>). I need to convert this to a MySQL date format and remove the <br />.
- I need to remove (not insert into the db) lines 1-14.
- The HTML files have <p> instead of <br />. I'd like to replace the <p> with <br />.
- I need to ignore the last three lines (they have the closing </body> and </html> tags and some other pure HTML stuff).

What is the best way to do this?
Ninjakreborn Posted July 29, 2011

1. Crawl the HTML files, and save them in a file locally.
2. Use this: http://stackoverflow.com/questions/215896/how-to-use-php-to-delete-x-number-of-lines-from-the-beginning-of-a-text-file to strip X lines (in your case, 14) from the beginning of the file.
3. Use standard regex (or str_replace) for the string replacements. Then just strip off the last 3 lines (reverse engineer some of what you found in #2, or just str_replace </body> and </html> with an empty string).
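The steps above can be sketched roughly as follows. This is only a sketch under the assumptions stated in the first post (14 header lines, the date on the next line, 3 trailing lines); `extract_recipe` is a hypothetical helper name, not something from the linked answer, and it uses `array_slice` rather than the approach described there.

```php
<?php
// Hypothetical helper: the name and the default line counts are illustrative,
// taken from the numbers mentioned in the first post.
function extract_recipe(array $lines, int $skipTop = 14, int $skipBottom = 3): array
{
    // Drop the header lines and the trailing closing-tag lines.
    // A null length means "to the end", which is needed when $skipBottom is 0.
    $kept = array_slice($lines, $skipTop, $skipBottom > 0 ? -$skipBottom : null);

    // The first remaining line is assumed to be the date, e.g. "May 29, 2007<br>".
    $dateText = trim(str_ireplace(['<br>', '<br />'], '', (string) array_shift($kept)));

    // Convert to MySQL's DATE format (YYYY-MM-DD); null if the text is not parseable.
    $timestamp = strtotime($dateText);
    $mysqlDate = $timestamp === false ? null : date('Y-m-d', $timestamp);

    // Swap the old <p> tags for <br /> in everything that is left.
    $body = str_ireplace('<p>', '<br />', implode('', $kept));

    return ['date' => $mysqlDate, 'body' => $body];
}
```

Calling it as `extract_recipe(file($filename))` would give an array with a `date` and a `body` entry for one page; the insert into the database is left out here.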
bschultz Posted August 2, 2011

Alright, I've figured out the code to do what I need it to, but as I've been looking through the static HTML pages, I've found that the content I want doesn't always start on line 15. The line I want to start with is always the date (formatted like August 01, 2011). How can I remove everything before the date when I don't know what line the date is on?
bschultz Posted August 3, 2011

Still plugging away at this. I've decided to try to search the array for the date (since the .htm page is named by the date, I should be able to find its value in the array). Here's the latest code:

<?php
foreach (glob("/home/briansch/public_html/testing/*.htm") as $filename) { //get all filenames that have a .htm extension
    $file = $filename;
    $find2[] = '.htm'; //remove the extension of the file to start the process of searching the html file for the date
    $replace2[] = '';
    $text2 = str_replace($find2, $replace2, $filename);
    $new_date_line2 = $text2; //now we have the name of the file (which is a date)...with the directory path listed before it
    $find3[] = '/home/briansch/public_html/testing/'; // remove the directory path from the name of the file
    $replace3[] = '';
    $text3 = str_replace($find3, $replace3, $new_date_line2);
    $new_date_line4 = $text3; //now we have the date of the file
    $mysql_date_format2 = date("F j, Y", strtotime($new_date_line4)); /// convert the file name (which is in m-d-y format)
    //echo $mysql_date_format3;
    $lines = file($filename); // put the lines of the file into an array
    $count = count($lines); // used below to remove the last three lines from the array
    //$key = array_search($mysql_date_format2, $lines); // this didn't do anything
    $findme[] = $mysql_date_format2; // tried this with and without the []...no difference...nothing echoed below
    $key = array_search( $findme, $lines );
    echo "<br /><br />Line of Date is - $key<br />--------------<br />";

    /// everything below this line works...
    $date_line = $lines[13]; // I will need to change the array number to the number found above in $key
    $find[] = '<br>';
    $replace[] = '';
    $text = str_replace($find, $replace, $date_line);
    $new_date_line = $text;
    $title_line = $lines[(14)]; // I will need to change the array number to the value of $key + 1
    $find2[] = '</strong>';
    $replace2[] = '';
    $text2 = str_replace($find2, $replace2, $title_line);
    $new_title_line = $text2;
    $mysql_date_format = date("Y-m-d", strtotime($new_date_line));
    foreach ($lines as $line_num => $line) {
        if ($line_num <= 14) {
            echo "";
        } elseif ($line_num >= ($count - 3)) {
            echo "";
        } else {
            $thisline = htmlspecialchars($line);
            $recipe .= $thisline;
        }
    }
    echo "$mysql_date_format<br />";
    echo "$new_title_line<br />";
    echo "$recipe<br /><br />---------------------------------<br /><br /><br /><br />";
}
?>

As you can see in the comments, it's not finding the value of the date in the array. Am I off base in how I'm going about this? Thanks!
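A likely reason the search comes back empty: `array_search()` only matches an element that equals the needle as a whole. An array needle such as `$findme` never equals a string line, and the lines returned by `file()` still carry the trailing `<br>` and newline, so even the plain string version has nothing to match exactly. If the pages really print the day with a leading zero (as in "August 01, 2011"), the `"F j, Y"` format would also need to be `"F d, Y"`. A minimal sketch of a substring search instead, where `find_date_line` is a hypothetical helper name:

```php
<?php
// Hypothetical helper (not from the thread): returns the index of the first
// line that contains $dateText, or null when no line contains it.
function find_date_line(array $lines, string $dateText): ?int
{
    if ($dateText === '') {
        return null; // nothing sensible to search for
    }
    foreach ($lines as $index => $line) {
        // Substring match, so a trailing "<br>" or newline does not matter.
        if (strpos($line, $dateText) !== false) {
            return $index;
        }
    }
    return null;
}

// Made-up sample lines standing in for the result of file($filename).
$sample = ["<html>\n", "August 01, 2011<br>\n", "<strong>Title</strong>\n"];
$key = find_date_line($sample, 'August 01, 2011'); // 1
```

With `$key` in hand, `$lines[$key]` would be the date line and `$lines[$key + 1]` the title, in place of the hard-coded 13 and 14.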