Parsing HTML file to Database


locorecto

Hello guys, thanks for reading my question. I have a set of HTML files with some information that I need to insert into a MySQL database. The data consists of dates and the historical facts that happened on those dates.

 

<p>
    <strong>June 26, 1721</strong> Following the recommendation of
    Rev. Cotton Mather, Dr. Zabdiel Boylston of Boston completes the first
    inoculation against smallpox in the U.S., injecting his own son and two of his
    slaves. 
</p>
<p>
    <strong>1736</strong> In New York, the city almshouse, located on Broadway
    near Park Row, opens an infirmary with six beds. This infirmary grows into
    Bellevue Hospital.
</p>
<p>
    <strong>May 11, 1751</strong> Benjamin Franklin and Dr. Thomas
    Bond receive a charter from the Pennsylvania legislature to open the first
    hospital in the American colonies for the sick poor and the insane. 
</p>
<p>
    <strong>1770</strong> Kings College awards the first M.D. degree in the
    colonies to Robert Tucker.
</p>
<p>
    <strong>June 13, 1771</strong> New York Hospital, the second in
    the colonies after the Pennsylvania Hospital, receives a royal charter from
    King 
</p>
<p>
    George III under the name Society of the Hospital in the City of
    New York in America, later changed to Society of New York Hospital. 
</p>
<p>
    <strong>Oct. 12, 1773</strong> The Public Hospital for Persons of
    Insane and Disordered Minds is established in Williamsburg, Virginia. It was
    the first building in North America devoted solely to the treatment of the
    mentally ill. 
</p>
<p>
    <strong>1791</strong> The Society of New York Hospital opens at a site on Broadway
    between Duane and Worth Streets.
</p>

 

I have used the following PHP script to insert the corresponding data into the DB.

 

$dom_doc = new DOMDocument();
$html_file = file_get_contents('HIA.htm');

$dom_doc->loadHTML($html_file);

$tags_p = $dom_doc->getElementsByTagName('p');

foreach ($tags_p as $key => $tag) {
    $tag_value = $tag->nodeValue;
    $date = $tag->getAttribute('strong');
    $query = "INSERT INTO Milestones(Date, Text) VALUES('$date', '$tag_value')";
    mysql_query($query) or die('Value ' . $tag_value . ' and Date ' . $date . ' could not be inserted. ' . mysql_error());
}
echo "Done";

 

Here is the problem: when I look in the DB, the Date fields are empty. Also, the Text fields include the date at the beginning of the text, as follows:

 

June 26, 1721 Following the recommendation of

    Rev. Cotton Mather, Dr. Zabdiel Boylston of Boston completes the first

    inoculation against smallpox in the U.S., injecting his own son and two of his

    slaves.

 

I would like the Date field to hold the date formatted as m/d/Y, and the Text field to hold only the text of the historical fact, without the date.

 

I appreciate your help in advance.


I FOUND the way to solve this!

 

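<strong> is a child element of each <p>, not an attribute, so getAttribute('strong') always returns an empty string. Grab the first <strong> element inside the paragraph instead: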

 

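$dateNode = $tag->getElementsByTagName('strong');
// Some paragraphs have no <strong> child, so guard against a missing node.
$date = $dateNode->length > 0 ? trim($dateNode->item(0)->nodeValue) : '';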

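To see the difference between the two lookups, here is a minimal sketch run against two shortened paragraphs from the question. The inline $html string and the $rows array are assumptions made for illustration; a real run would load HIA.htm as in the original script.

```php
<?php
// Shortened sample markup based on the question; the second paragraph has no <strong>.
$html = '<p><strong>June 26, 1721</strong> Following the recommendation of Rev. Cotton Mather.</p>'
      . '<p>George III under the name Society of the Hospital.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$rows = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    // getAttribute() only reads attributes such as class="..."; <strong> is a
    // child element, so this lookup comes back as an empty string.
    $viaAttribute = $p->getAttribute('strong');

    // Look up the child element instead; item(0) is null when there is none.
    $strong = $p->getElementsByTagName('strong')->item(0);

    $rows[] = [
        'attribute' => $viaAttribute,
        'date'      => $strong !== null ? trim($strong->nodeValue) : null,
    ];
}

print_r($rows);
```

The second paragraph shows why the null check matters: the source file contains at least one paragraph with no date at all.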
 

Now that you have the date, you can remove it from your $tag_value variable so the text no longer contains it:

 

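$tag_value = trim($tag->nodeValue);
// The date leads the paragraph, so strpos() returns 0; a plain truthiness check would skip it.
if ($date !== '' && strpos($tag_value, $date) === 0) {
    // Cut only the leading date rather than every occurrence of it.
    $tag_value = trim(substr($tag_value, strlen($date)));
}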

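The question also asked for the date as m/d/Y, which the steps above do not cover. Below is a rough sketch of one way to do it, assuming PHP 7.1 or later and only the date shapes visible in the sample ("June 26, 1721", "Oct. 12, 1773", and a bare year such as "1736"). The function name formatMilestoneDate is hypothetical, and returning null for year-only entries is an assumption: those entries have no month or day, so you will need to decide how to store them.

```php
<?php
// Hypothetical helper, not part of the original thread.
// Returns the date as m/d/Y, or null when no full date can be built.
function formatMilestoneDate(string $raw): ?string
{
    $months = [
        'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4,  'may' => 5,  'jun' => 6,
        'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
    ];
    $raw = trim($raw);

    // Month name (full or abbreviated, optional dot), day, comma, four-digit year.
    // A bare year such as "1736" does not match and falls through to null.
    if (!preg_match('/^([A-Za-z]+)\.?\s+(\d{1,2}),\s*(\d{4})$/', $raw, $m)) {
        return null;
    }

    // Match on the first three letters so "June" and "Oct" both resolve.
    $key = strtolower(substr($m[1], 0, 3));
    if (!isset($months[$key])) {
        return null;
    }

    // Reject impossible dates such as "June 31".
    if (!checkdate($months[$key], (int) $m[2], (int) $m[3])) {
        return null;
    }

    return sprintf('%02d/%02d/%04d', $months[$key], (int) $m[2], (int) $m[3]);
}

echo formatMilestoneDate('June 26, 1721'), "\n"; // 06/26/1721
echo formatMilestoneDate('Oct. 12, 1773'), "\n"; // 10/12/1773
var_dump(formatMilestoneDate('1736'));           // NULL
```

Parsing by hand avoids relying on how PHP's date parser treats a lone four-digit number or dates this far in the past.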
 

Once that's done, you should be good to go for the database insert.

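For the insert itself, building the SQL string by interpolation breaks as soon as a fact contains an apostrophe, and the mysql_* functions were removed in PHP 7. Here is a sketch of the same insert using a PDO prepared statement. The function name insertMilestone, the DSN, and the credentials are assumptions for illustration; the table and column names come from the original query. It also assumes the Date column is a text type, since a MySQL DATE column expects Y-m-d rather than m/d/Y.

```php
<?php
// Hypothetical helper: insert one milestone using bound parameters.
function insertMilestone(PDO $pdo, ?string $date, string $text): void
{
    // Placeholders keep quotes in the text from breaking the statement.
    $stmt = $pdo->prepare(
        'INSERT INTO Milestones (Date, Text) VALUES (:date, :text)'
    );
    $stmt->execute([':date' => $date, ':text' => $text]);
}

// Assumed connection details; replace the DSN and credentials with your own.
// $pdo = new PDO('mysql:host=localhost;dbname=history;charset=utf8mb4', 'user', 'secret');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// insertMilestone($pdo, '06/26/1721', $tag_value);
```

Passing null for the date stores SQL NULL, which is one way to handle paragraphs that have no date.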