[SOLVED] AJAX Chatroom - Character Encoding


kratsg

I have an AJAX chatroom that writes all messages to a text file and then reads them back for other users to view inside a <div>. What is curious is that when the chatroom automatically fetches new messages, character entities like ♥ show up correctly as a heart, but anyone who refreshes the page (in which case it only grabs the last 15 messages) gets garbled characters instead. For example:

 

©

 

when refreshed looks like

 

©

 

or..

 

 

when refreshed looks like

 

∞

 

Below is the code used for posting the messages:

<?php
// $user and $filename come from lines not shown in this excerpt
// strip slashes and remove every HTML/PHP tag from the submitted message
$message = trim(strip_tags(stripslashes($_POST['contents'])));

// swear filter: build one case-insensitive pattern per line of filter.txt
$file = fopen("filter.txt",'r');
$array = array();
while(($line = fgets($file)) !== false){
	$line = trim($line,"\t\n\r\0\x0B");
	if($line !== ''){//an empty pattern would match everywhere
		$array[] = '/'.$line.'/i';
	}
}
fclose($file);

$message = preg_replace($array, '*blocked*', $message);

//reject an unbroken run of 40+ characters unless the message contains a [url] tag
if(!preg_match('/\[url\]/',$message) && preg_match('/\S{40,}/',$message)){
	echo "false";
	return;
}

if(!empty($message)){
	//prepend the new message so the newest line is always first in the file
	$data = $user.": ".$message."\n".file_get_contents($filename);
	$file = fopen($filename,'r+');
	if(fwrite($file,$data)){
		fclose($file);
		echo "true";
		return;
	}
} else {
	echo "false";
	return;
}
?>

**stuff may not match up, I deleted lines that are not going to affect the message posted.

 

When they are in the chatroom, this is what sends the messages back to them:

 

<?php
	$file = file($filename);
	$counters = count($file);
	//nothing new since the last poll: send an empty response
	if($_SESSION['chat_counter'] == $counters){return;}

	$data = null;
	$diff = $counters - $_SESSION['chat_counter'];
	for($i=0;$i<$diff;$i++){
		$counter = $counters-$i;
		if($counter % 2){$color = "user1";} else {$color = "user2";}
		$data .="<span class='numbers'>(#$counter)  </span>[$color]".$file[$i];
	}
	$_SESSION['chat_counter'] = $counters;

	echo $data;
?>

When they refresh: 

<?php
$file = file($filename);
if(!isset($_SESSION['chat_counter'])){$_SESSION['chat_counter'] = 0;}
$_SESSION['chat_counter'] = count($file) - 15;
if(count($file) < 15){$_SESSION['chat_counter'] = 0;}

$color = "user1";
$data = null;
$counters = count($file);
$diff = count($file) - $_SESSION['chat_counter'];
for($i=0;$i<$diff;$i++){
$counter = $counters-$i;
if($counter % 2){$color = "user1";} else {$color = "user2";}
$data .="<span class='numbers'>(#$counter)  </span>[$color]".$file[$i];
}

$data = str_replace($find,$replace,str_replace("\n","</span></span></span><br>",$data));

?>

 

This is the <div> that contains the chatroom messages:

 

<div id='contents' name='contents' style='overflow:scroll;width:700px;height:450px;font-family:tahoma;font-size:15px;border:1px black solid;text-align:left;padding-left:5px;background-image:url(\"images/chatbg.gif\");background-position:center;background-repeat: no-repeat;'>

 

When I post either one of those character entities, it appears normally in the text file as a copyright symbol, etc... so I have to imagine it has to do with the page encoding? I can't seem to fix it though.

 

Thanks ~kratsg


You may want to set the server encoding:

header('Content-Type:text/html; charset=UTF-8');

Make sure in your <head> you define the page as UTF-8.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

 

In your PHP code, whenever you read or write text, convert it with something like iconv (or utf8_encode(), which only converts from ISO-8859-1) so that everything ends up in the same encoding.

 

This should fix 99% of your problems.
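
As a rough sketch of the "same encoding everywhere" advice: the helper below leaves valid UTF-8 untouched and converts anything else with iconv(). The name to_utf8() is hypothetical, and treating every non-UTF-8 input as ISO-8859-1 is an assumption made for illustration; neither comes from the chatroom code. It needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function to_utf8(string $text, string $assumedLegacy = 'ISO-8859-1'): string
{
    // With the /u modifier preg_match() fails on invalid UTF-8, so matching
    // an empty pattern doubles as a validity check.
    if (preg_match('//u', $text) === 1) {
        return $text; // already UTF-8; converting again would double-encode it
    }
    $converted = iconv($assumedLegacy, 'UTF-8', $text);
    return $converted === false ? '' : $converted;
}

// A lone 0xA9 byte is the copyright sign in ISO-8859-1.
echo bin2hex(to_utf8("\xA9")), "\n";     // c2a9
// The same sign already encoded as UTF-8 is passed through unchanged.
echo bin2hex(to_utf8("\xC2\xA9")), "\n"; // c2a9
```

The validity check matters because utf8_encode() converts unconditionally: applied to text that is already UTF-8 it produces exactly the kind of garbage described in the question. It is also deprecated as of PHP 8.2.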


Will setting the headers affect anything else other than character entities?

 

Question: I use mysql_real_escape_string before I store my message and do not need to unescape it afterwards (since magic_quotes is off). In the same way, if I run my text through utf8_encode() before I write it to the file, does that mean I do not need to convert it when I read it back from the file (with utf8_decode or anything else) before sending it to the chatroom?

 

Thanks :-D


It's a good general idea to keep everything in the same encoding, such as UTF-8. Your PHP may automatically use UTF-8 while your AJAX chat uses ISO-8859-1, and UTF-8 read as ISO-8859-1 = mojibake, aka those weird symbols.

 

Your chat will receive UTF-8 from the server and send UTF-8 back, all in a UTF-8 environment, so every base is covered. As long as the file itself is encoded as UTF-8, it should not need to be converted when you read it back.
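
To make the mojibake concrete, here is a small sketch of what happens to a copyright sign stored as UTF-8 when its bytes are read as ISO-8859-1. The variable names are illustrative only. It may also explain why the AJAX updates look fine: XMLHttpRequest falls back to UTF-8 when the response declares no charset, while a full page load with no declared charset can fall back to a legacy encoding. Whether that is what happens in this chatroom is an assumption.

```php
<?php
// The copyright sign as stored in a UTF-8 file: two bytes, 0xC2 0xA9.
$utf8 = "\xC2\xA9";

// Reading those bytes as ISO-8859-1 turns each byte into its own character.
// Converting "from ISO-8859-1" to UTF-8 makes that misreading visible.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";    // c2a9
echo bin2hex($misread), "\n"; // c382c2a9, i.e. the two characters "Â©"
```

The bytes in the file never change; only the decoder's assumption does, which is why the text file can look correct while the refreshed page does not.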


So, will it be sufficient to just use the PHP code

header('Content-Type:text/html; charset=UTF-8');

 

at the top of each php page involved?

 

*Side note: the AJAX submissions use headers; do those have to be modified as well? (My headers do not send any encoding, so it's the default, apparently?)


It's a good idea to send that header first, before any other output, on every page that needs it. Many servers send a default charset, but it is not guaranteed to be UTF-8.

 

The <head>'s meta tag also plays an important role in the encoding: when the HTTP header does not declare a charset, your browser reads the tag and applies that encoding to everything on the web page, and otherwise falls back to a default. The JS request headers do not need an encoding.
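
A minimal sketch of how the refresh page could declare its charset, assuming the chat file is stored as UTF-8 with the newest message first. The helper name render_recent() and the sample line are hypothetical. Note that a charset in the HTTP Content-Type header takes precedence over the meta tag, and that an AJAX response containing only an HTML fragment has no <head>, so the header is the only place to declare it there.

```php
<?php
// Hypothetical helper: HTML for the newest $limit lines of the chat file.
// Assumption: $lines holds UTF-8 text, newest message first.
function render_recent(array $lines, int $limit = 15): string
{
    $html = '';
    foreach (array_slice($lines, 0, $limit) as $line) {
        // Escape as UTF-8 so multibyte characters pass through untouched.
        $html .= htmlspecialchars(rtrim($line, "\r\n"), ENT_QUOTES, 'UTF-8') . "<br>\n";
    }
    return $html;
}

// Declare the charset before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Sample line standing in for file($filename).
echo render_recent(["user: \xC2\xA9\n"]);
```

Escaping every line is a design choice of this sketch, not of the original chatroom, which inserts its own <span> markup; the part that addresses the garbled characters is sending the header before the first byte of output.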
