Jump to content

Convert UTF8 characters to ISO when exporting from MySQL DB to Excel file


Recommended Posts

Hi.

I'm trying to export the content of MySQL table to Excel file. Exportation is executed with php code below:

<?php

  $DB_TBLName = "reg_form";
  $filename = "excelfile";

  $dbc = mysql_connect('localhost', 'user', 'password')
    or die('Error connecting to MySQL server.');

  $db_name = mysql_select_db('database', $dbc); 

  $query = "SELECT first_name AS 'Pavardė', age AS 'Amžius', etc. etc. etc. FROM $DB_TBLName";
  
  $result = mysql_query($query)
    or die('Error querying database.');

  $file_ending = "xls";
  
header("Content-Type: application/octet-stream; charset=ISO-8859-13");
header("Content-Disposition: attachment; filename=$filename.xls");
header("Pragma: no-cache");
header("Expires: 0");


/******* Formatting for Excel *******/

$sep = "\t";

for ($i = 0; $i < mysql_num_fields($result); $i++) {
	echo mysql_field_name($result,$i) . "\t";
	}
	print("\n");

    while($row = mysql_fetch_row($result))
    {
        $schema_insert = "";
        for($j=0; $j<mysql_num_fields($result);$j++)
        {
            if(!isset($row[$j]))
                $schema_insert .= "NULL".$sep;
            elseif ($row[$j] != "")
                $schema_insert .= "$row[$j]".$sep;
            else
                $schema_insert .= "".$sep;
        }
        $schema_insert = str_replace($sep."$", "", $schema_insert);
 $schema_insert = preg_replace("/\r\n|\n\r|\n|\r/", " ", $schema_insert);
        $schema_insert .= "\t";
        print(trim($schema_insert));
        print "\n";
    }

?>

But the problem is that some language specific characters are not recognized in that excel file. I know that Unicode for Microsoft (Excel) means a little bit different (like everything as usual and everywhere). Then I did some research and I finaly I tried this trick:

$str = "ąčęėįšųūž";
$out = mb_convert_encoding($str, "ISO-8859-13", "UTF-8");
header('Content-type: application/octet-stream');
  header("Content-Disposition: attachment; filename=excelfile.csv");
  header("Pragma: no-cache");
  header("Expires: 0");

echo $out;

This works perfect. But as we can see mb_convert_encoding function works only for strings. But I need to convert all values from my table's columns and fields (see the first code paste above).
The Q is how can I convert all values from my table from UTF8 to ISO-8859-13?

If you change the charset information in the first code block, it should work.

As you're not converting between charsets at all, but still declare it as ISO-8859-13, it explains why those non-ASCII characters comes up corrupted.

Just had a second look at your code, and I see that you're not setting the (correct) charset anywhere. You'll also have to tell the MySQL connection to use UTF-8, otherwise you'll have problems there too.

As for how to "adapt to [your] code" I'm not even sure what you're looking to adapt even. As previously mentioned, you do not need to convert anything, so the code in your second block is unnecessary.

 

PS: You should also be looking to change to MySQLI or PDO, as the old mysql library is outdated and deprecated.

Christian's answer is correct, you don't have to convert anything, just make sure that your tables are on collation utf8_general_ci.

 

Change and add those:

$dbc = mysql_connect('localhost', 'user', 'password')
    or die('Error connecting to MySQL server.');

  $db_name = mysql_select_db('database', $dbc); 

//set charset to utf8
mysql_set_charset('utf8',$dbc);

  $query = "SELECT first_name AS 'Pavardė', age AS 'Amžius', etc. etc. etc. FROM $DB_TBLName";
  
  $result = mysql_query($query)
    or die('Error querying database.');

  $file_ending = "xls";
  // change also that
header("Content-Type: application/octet-stream; charset=utf-8");
Edited by jazzman1

I'm sorry guys.. This doesn't work. I'm totally sure that collation is OK, because I sorted out the problem by myself with correct UTF8 characters in Database two days ago. Check this out. So I'M SURE : ))
Actually, I have no ideas how to fix this because this is the first time when I'm up against charsets settings so I'm still asking any suggestions : )

Sorry, I don't use Microsoft products, but about the second one with a mb_convert_encoding() function try to get all results from mysql database into an array().

 

Something like that:

$str = array("ąčęėįšųūž","ąčęėįšųūž","ąčęėįšųūž");
    
    $out = mb_convert_encoding(implode(',', $str), "ISO-8859-13", "UTF-8");
    
    header('Content-type: application/octet-stream');
    header("Content-Disposition: attachment; filename=excelfile.csv");
    header("Pragma: no-cache");
    header("Expires: 0");
     
    echo $out;

OK. Now my code looks like that:

<?php
$DB_TBLName = "reg_form";
$filename = "excelfile";
$dbc = mysql_connect('localhost', 'user', 'pass')
    or die('Error connecting to MySQL server.');

$db_name = mysql_select_db('database', $dbc); 

mysql_set_charset('ISO-8859-13',$dbc);

$query = "SELECT first_name AS 'Vardas Pavardė', age AS 'Amžius', etc. etc. etc. from $DB_TBLName";

$result = mysql_query($query)
    or die('Error querying database.');

$file_ending = "xls";
  
header("Content-Type: application/octet-stream; charset=ISO-8859-13");
header("Content-Disposition: attachment; filename=$filename.xls");
header("Pragma: no-cache");
header("Expires: 0");


/******* Formatting Excel *******/
$sep = "\t";

//--1. start of printing column names as names of MySQL fields
for ($i = 0; $i < mysql_num_fields($result); $i++) {
	$str = mysql_field_name($result,$i) . "\t";
	$out = mb_convert_encoding($str, "ISO-8859-13", "UTF-8");
	//$out=mb_detect_encoding($str);
	echo $out;
	}
	print("\n");

//--2. start while loop to get data
while($row = mysql_fetch_row($result))
    {
        $schema_insert = "";
        for($j = 0; $j < mysql_num_fields($result);$j++)
        {
            if(!isset($row[$j]))
                $schema_insert .= "NULL".$sep;
            elseif ($row[$j] != "")
			{
				$word = mb_convert_encoding($row[$j], "ISO-8859-13", "ASCII"); 
				//$word = mb_detect_encoding($row[$j]);
				$schema_insert .= $word.$sep;
                		// $schema_insert .= "$row[$j]".$sep; 
			}
            else
                $schema_insert .= "".$sep;
        }
        $schema_insert = str_replace($sep."$", "", $schema_insert);
 	$schema_insert = preg_replace("/\r\n|\n\r|\n|\r/", " ", $schema_insert);
        $schema_insert .= "\t";
        // print(trim($schema_insert));
	$schema_insert = trim($schema_insert);
	$out1 = mb_convert_encoding($schema_insert, "ISO-8859-13", "ASCII");
	// print(trim($out1));
	echo $out1;
        print ("\n"); 
    }
?>

The thing is that only the first part (see the code //--1.) works properly. That's because strings come from the code. BUT there are some problems with the second part (see code //--2.). Still doesn't work. Notice there is a commented line under elseif sentence '$word = mb_detect_encoding($row[$j]);' which can output what type of characters are. When I open excel file it says that all fields (rows) are ASCII format. So that's why I put everywhere (in the second part of the code only) the conversion from ASCII to ISO function. Unfortunately, this doesn't work... WHY?!

Edited by Baltas

You're fetching ISO-formatted text from the DB, as evidenced by this line:

mysql_set_charset('ISO-8859-13',$dbc);
Yet, before adding the content to the excel document, you're converting them from UTF-8 to ISO-8859-13:

$out = mb_convert_encoding($str, "ISO-8859-13", "UTF-8");
I reckon you can see why that is a problem.

 

Either your data is saved as UTF-8 in the database, and you'll need to correct the MySQL charset; Or your data is saved as ISO-8859-13, and you don't need any converting.

 

Why you're converting the text from ISO-8859-13 to ASCII later, I have no idea. Suffice to say it's completely useless, and can even cause problems with displaying some of the special diacritics used by your language.

Preferably there shouldn't be any need for any converting of charsets what-so-ever, since Excel supports UTF-8 just fine.

Now. I suppose I understood that obvious mistake with fetching text from the DB, so I change it to

mysql_set_charset('UTF-8',$dbc);  //because data is saved as UTF-8 in the database???

Why you're converting the text from ISO-8859-13 to ASCII later, I have no idea. Suffice to say it's completely useless, and can even cause problems with displaying some of the special diacritics used by your language.

First of all, I'm converting the text from ASCII to ISO-8859-13 (ref. http://php.net/manual/en/function.mb-convert-encoding.php).
Secondly, I used this line:

$word = mb_detect_encoding($row[$j]);

to have a look what is the type of encoding of that text fetching from the database. This shows me:
post-146506-0-31184500-1365862687_thumb.png

As we can see encoding type is ASCII.

Therefore, I was using that conversion function (from ASCII to ISO-8859-13):

$out = mb_convert_encoding($str, "ISO-8859-13", "ASCII");

Then, if I don't use any conversions excel file refuses those characters...

Edited by Baltas

Ok, let's say that we have the next data in mysql database and all tables are on collation utf8_general_ci.

<?php

date_default_timezone_set("America/Toronto");

//connect to database
define('DB_SERVER', 'localhost');
define('DB_USER', 'jazzman');
define('DB_PASSWORD', 'password');
define('DB_NAME', 'test_database');
 
$conn = mysql_connect (DB_SERVER, DB_USER, DB_PASSWORD) or die(mysql_error());

mysql_select_db (DB_NAME,$conn) or die(mysql_error());

if(!$conn){
    die( "Sorry! There seems to be a problem connecting to our database.");
}

// set charset to utf8
mysql_set_charset('utf8',$conn); 

//create query to select as data from your table
$select = "SELECT * FROM posts";

//run mysql query and then count number of fields
$export = mysql_query ($select)  or die ("Sql error : " . mysql_error( ));

$fields = mysql_num_fields ($export);

//create csv header row, to contain table headers 
//with database field names

for ( $i = 0; $i < $fields; $i++ ) {
	$header = mysql_field_name( $export , $i ) . ",";
}
// set empty data array
$data = array(); 

while( $row = mysql_fetch_assoc($export) ) {

//$data[] = array($row['title'], $row['body']);

$data[] = $row;

}

echo '<pre>'.print_r($data,true).'</pre>';

/*
//create a file and send to browser for user to download
header("Content-type: application/vnd.ms-excel");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename="."test.csv");
print $header."\n". implode(',', $data[0]); 
exit;
*/

Results:

<pre>Array
(
[0] => Array
(
[id] => 1
[title] => The title
[body] => This is the post body.
[created] => 2013-04-10 07:24:15
[modified] =>
)

[1] => Array
(
[id] => 2
[title] => Тест
[body] => съдържание на български
[created] =>
[modified] =>
)

[2] => Array
(
[id] => 3
[title] => Sveiki atvyke
[body] => Sveiki atvyke Sveiki atvykę
[created] =>
[modified] =>
)

[3] => Array
(
[id] => 4
[title] => Ačiū, gerai
[body] => Sveiki atvyke Sveiki atvykę
[created] =>
[modified] =>
)

[4] => Array
(
[id] => 5
[title] => The title
[body] => This is a post body
[created] =>
[modified] =>
)

[5] => Array
(
[id] => 6
[title] => The title from netbeans
[body] => This is a netbeans body
[created] =>
[modified] =>
)

[6] => Array
(
[id] => 7
[title] => Aciu. Ačiū
[body] => Sveiki atvyke Sveiki atvykę
[created] =>
[modified] =>
)

[7] => Array
(
[id] => 8
[title] => Aciu. Ačiū
[body] => Sveiki atvyke Sveiki atvykę
[created] =>
[modified] =>
)

[8] => Array
(
[id] => 9
[title] => Aciu. Ačiū
[body] => Sveiki atvyke Sveiki atvykę
[created] =>
[modified] =>
)

)

</pre>

 

 

As you can see from the log file, we have a content with bulgarian, engish and lithuanian characters.

 

Now let's say we want to get only titles and bodies then save them into .cvs file from array #7 this one with lithuanian characters.

 

Sure if you want to save every bodies and titles you need to loop and explode them. 

 

I've got this result when I try to open the file into my openOffice software.

<?php

date_default_timezone_set("America/Toronto");

//connect to database
define('DB_SERVER', 'localhost');
define('DB_USER', 'jazzman');
define('DB_PASSWORD', 'password');
define('DB_NAME', 'test_database');
 
$conn = mysql_connect (DB_SERVER, DB_USER, DB_PASSWORD) or die(mysql_error());

mysql_select_db (DB_NAME,$conn) or die(mysql_error());

if(!$conn){
    die( "Sorry! There seems to be a problem connecting to our database.");
}

// set charset to utf8
mysql_set_charset('utf8',$conn); 

//create query to select as data from your table
$select = "SELECT * FROM posts";

//run mysql query and then count number of fields
$export = mysql_query ($select)  or die ("Sql error : " . mysql_error( ));

$fields = mysql_num_fields ($export);

//create csv header row, to contain table headers 
//with database field names

for ( $i = 0; $i < $fields; $i++ ) {
	$header = mysql_field_name( $export , $i ) . ",";
}
// set empty data array
$data = array(); 

while( $row = mysql_fetch_assoc($export) ) {
    
$data[] = array($row['title'], $row['body']);

//$data[] = $row;

}

// echo '<pre>'.print_r($data,true).'</pre>';

//create a file and send to browser for user to download
header("Content-type: application/vnd.ms-excel");
header("Content-disposition: csv" . date("Y-m-d") . ".csv");
header( "Content-disposition: filename="."test.csv");
print $header."\n". implode(',', $data[7]); 
exit;

Result:

 

 

 

post-124152-0-92411400-1365869252_thumb.png

Yes, your case is working.. But I've got no ideas where is the problem in my case : )

 

Even strange things happen.. when I try to export data straight from the phpMyAdmin (!) WHY like that even from database?!

 

Well, what charset microsoft exel used for? 

If the database contains erroneous characters, then you've used the wrong charset when inserting the data in the first place. In which case the characters are irrevocably corrupted, and the only thing that can fix that is a (semi-)manual search & replace. You'll need to find out which byte sequence matches what character, and then run the search & replace on all fields, for each of those characters.

Of course, you'll want to make sure that all fields and tables in the DB uses UTF-8 prior to this. So the easiest way to fix this, is to export the data, fix the tables, then search & replace each characters, and finally import the fixed SQL script. Make sure you've added a "USE NAMES utf8" statement at the top of it though, so that your data isn't corrupted again.

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.