Jump to content

Special Chars clearing text on XML MySQL Import


patsfans

Recommended Posts

I'm trying to import an XML file into my MYSQL database, and I'm running into an issue where if one of the XML fields has a special character, it's inserting the correct special character into the database but ommitting the  text.

 

Example:

 

<value1>TRADEMARKED TERM™</value1>

 

Inserts just the "™" symbol but all the text before it disappears and doesn't insert into the database.

 

Do you know why it's doing it?  I added the utf information for mysql prior to the insert which fixed it going in as jumbled characters but it didn't solve the problem of the text not going in at all.  Very strange.

 

Here is the code:

 

<?php

// initialize some variables

$currentTag = "";

 

// this array will hold the values for the SQL statement

$values = array();

 

// this array will hold allowed fields/elements

$allowedFields = array("sku", "value1", "value2");

 

// XML file to parse

$xml_file="file.xml";

 

// database parameters

$host = "localhost";

$user = "name";

$pass = "pass";

$db = "db";

$table = "table";

 

// called when parser finds start tag

function startElementHandler($parser, $name, $attributes)

{

global $currentTag;

$currentTag = $name;

}

 

// called when parser finds end tag

function endElementHandler($parser, $name)

{

global $values, $currentTag;

// import database link and table name

global $connection, $table;

 

// if ending <item> tag

// implies end of record

if (strtolower($name) == "xmlfield")

{

// NOTE: I dont know what the indices of $values are,

// so you might need to do a debug print_r($values)

// to find out what they are.

$sku = $values['sku'];

$value1 = $values['value1'] ? $values['value1'] : '';

$value2 = $values['value2'] ? $values['value2'] : '';

 

// generate the query string

$query = "INSERT INTO table ";

$query .= "(sku,value1,value2) ";

$query .= "VALUES('$sku', '$value1', '$value2');";

 

// uncomment for debug

// print $query;

 

// execute query

                        mysql_query("SET CHARACTER SET utf8");

mysql_query("SET NAMES utf8");

$result = mysql_query($query);

 

// reset all internal counters and arrays

$values = array();

$currentTag = "";

}

 

}

 

// called when parser finds cdata

function characterDataHandler($parser, $data)

{

 

global $currentTag, $values, $allowedFields;

// lowercase tag name

$currentTag = strtolower($currentTag);

 

// look for tag in $allowedFields[] array

// to see if it is to be included in query

if (in_array($currentTag, $allowedFields) && trim($data) != "")

{

// add field=>value pairs to $values array

$values[$currentTag] = mysql_escape_string($data);

}

}

 

// initialize parser

$xml_parser = xml_parser_create();

 

// turn off whitespace processing

xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, TRUE);

// turn on case folding

xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);

 

// set callback functions

xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

xml_set_character_data_handler($xml_parser, "characterDataHandler");

 

// open connection to database

$connection = mysql_connect($host, $user, $pass) or die ("Unable to connect!");

mysql_select_db($db) or die ("Unable to select database!");

 

// read XML file

if (!($fp = fopen($xml_file, "r")))

{

die("File I/O error: $xml_file");

}

 

// parse XML

while ($data = fread($fp, 4096))

{

// error handler

if (!xml_parse($xml_parser, $data, feof($fp)))

{

$error_code = xml_get_error_code($xml_parser);

die("XML parser error (error code " . $error_code . "): " . xml_error_string($error_code) . "<br>Error occurred at line " . xml_get_current_line_number($xml_parser));

}

}

 

// all done, clean up!

xml_parser_free($xml_parser);

mysql_close($connection);

 

?>

 

Thanks again, and I appreciate your help.

Just bumping to see if anyone might know the answer to this.  Still haven't quite figured it out yet.

 

The problem seems to be that it's parsing out the text prior to the fields even going into the database, so it seems I need to add some sort of utf command to the parsing script to keep the whole field together before it gets inserted into the database.

Just bumping this again to see if anyone might know the answer.  Does anyone think that changing a setting in the "initialize parser" portion may work or is it in the "set callback functions" that would need to be changed instead?  I've tried a few things so far but haven't found a solution yet...very frustrating.

 

Thanks again in advance for your assistance, and hopefully with your help we can mark this "solved" :D - Ian

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.