
Strange subject encoding when writing an IMAP reader in PHP


king.oslo


Hello,

 

I am writing my first imap email reader:

 

<?php
$connection = imap_open('{' . $server . '}', $login, $password);
// FT_UID makes "1:*" a range of UIDs rather than message sequence numbers
$mails = imap_fetch_overview($connection, "1:*", FT_UID);
// object_to_array() is a custom helper (not shown) that turns each overview object into an array
$mails = object_to_array($mails);
$return = '<table width="100%">
<tr>
<td><b>#</b></td>
<td><b>From</b></td>
<td><b>Date / Time</b></td>
<td><b>Subject</b></td>
</tr>';

$size = count($mails); // Number of messages
$cmsg = 0; // Running row number for the table

// Walk the list backwards so the newest message comes first
for ($i = $size - 1; $i >= 0; $i--) {
    $cmsg++;
    $return .= '<tr><td>' . $cmsg . '</td><td>' . $mails[$i]['from'] . '</td><td>' . $mails[$i]['date'] . '</td><td><a href="' . $_SERVER['PHP_SELF'] . '?id=' . $mails[$i]['msgno'] . '">' . $mails[$i]['subject'] . '</a></td></tr>';
}
$return .= '</table>';
print $return;

imap_close($connection);
?>

 

This returns something like this (notice the email-subjects):

#	From				Date / Time				Subject

1	Trusepiker CC			Fri, 04 Sep 2009 03:17:10 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
2	Gmail				Fri, 4 Sep 2009 01:47:10 +0100		=?windows-1252?Q?Gmail_Bekreftelse_=96_Send_e=2Dpost_som_salg=40trusepike?=	=?windows-1252?Q?r=2Eno?=
3	Trusepiker CC			Thu, 03 Sep 2009 20:13:56 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
4	[email protected]		Thu, 03 Sep 2009 04:13:05 +0200		Subject: Betaling avvist.
5	[email protected]		Thu, 03 Sep 2009 03:40:50 +0200		Subject: Betaling avvist.
6	Trusepiker CC			Thu, 03 Sep 2009 03:36:17 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
7	"[email protected]"		Tue, 01 Sep 2009 19:44:12 -0700		You've Added an Additional Email Address
8	"[email protected]"		Tue, 01 Sep 2009 19:44:02 -0700		You've Added an Additional Email Address
9	Marius Jonsson			Wed, 2 Sep 2009 03:33:23 +0100		Test

 

 

 

 

At first I thought I only had to run base64_decode() on the subjects and all would be solved, but with base64_decode() the script returned only strange symbols:

#	From			Date / Time					Subject

1	Trusepiker CC		Fri, 04 Sep 2009 03:17:10 +0200			Q1|)������������Ёͅ���
2	Gmail			Fri, 4 Sep 2009 01:47:10 +0100			�)ݣ5۝�f������z[�׶�,��&��`�Kk�ǩ�G��wh��v�d+�I�
3	Trusepiker CC		Thu, 03 Sep 2009 20:13:56 +0200			Q1|)������������Ёͅ���
4	[email protected]	Thu, 03 Sep 2009 04:13:05 +0200			J��y�Az֥�x��
5	[email protected]	Thu, 03 Sep 2009 03:40:50 +0200			J��y�Az֥�x��
6	Trusepiker CC		Thu, 03 Sep 2009 03:36:17 +0200			Q1|)������������Ёͅ���
7	"[email protected]"	Tue, 01 Sep 2009 19:44:12 -0700			b��x]y֧�b�*'jQ&j)@u�޲
8	"[email protected]"	Tue, 01 Sep 2009 19:44:02 -0700			b��x]y֧�b�*'jQ&j)@u�޲
9	Marius Jonsson		Wed, 2 Sep 2009 03:33:23 +0100			M�-

 

 

1. What do you think I need to do to make these subjects appear normally?

2. Are there any good tutorials on how to treat/decode/format imap information in php? I reckon I will need to format a lot of other things, such as the body of the emails.

 

Thank you!

 

Kind regards,

Marius

To me it looks like the client that sends the message is attaching additional content. As for how to get rid of it, maybe preg_match()? http://php.net/manual/en/function.preg-match.php

 

How do readers such as Gmail and Hotmail treat this? When I view these messages in Gmail and Hotmail, they appear normally.

 

Thanks,

Marius

I thought about that, but unfortunately not all of them start and end with this pattern. Look at email #2.

 

I thought maybe somebody who has written an IMAP reader before had a function that translates all the most common encodings.

Thank you for your suggestions.

 

I guess these things are some kind of encoding (maybe charset information?). Out of only 9 emails, we already face two types of subject encoding:

1: =?UTF-8?B? TEXT ?=
2: =?windows-1252?Q? TEXT ?=

 

After maybe 1 000 000 emails (just to pick a number), users may face tens if not hundreds of different encodings. I thought maybe there was an mb_base64_decode(), a tutorial, or some other easy way to decode these subjects (anyone who has written an email reader will have faced this problem), so that I would not have to write a regexp for every potential charset encoding.
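
For reference, strings of the form =?charset?encoding?text?= are MIME encoded-words (RFC 2047), and PHP ships decoders for them, so a regexp per charset should not be needed. Below is a minimal sketch, assuming the iconv extension is enabled. decode_subject() is a hypothetical helper name, not something from the script above. With the imap extension loaded, imap_utf8() or imap_mime_header_decode() are alternatives.

```php
<?php
// Hypothetical helper: decode a raw Subject header into a UTF-8 string.
function decode_subject(string $raw): string
{
    // CONTINUE_ON_ERROR keeps decoding past malformed encoded-words instead of giving up.
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

    // Fall back to the raw header if decoding fails entirely.
    return $decoded === false ? $raw : $decoded;
}

// An encoded subject from the listing above, and a plain one that should pass through unchanged.
echo decode_subject('=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?='), "\n";
echo decode_subject('Test'), "\n";
```

In the loop that builds the table, the idea would be to wrap $mails[$i]['subject'] in such a helper before printing it.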

 

Sorry, my English is poor, but I hope the essence comes through.

 

Thanks,

Marius

Thanks for that too.


I've found out more in this Norwegian article: http://www.myrvold-data.no/e-post-utf-8-og-base64-encoding/

 

=?charset?encoding?encoded text?=

 

In our case:

=?UTF-8?B? TEXT ?=

 

First =?, then the charset, then ?, then B for a Base64-encoded subject, then ?, then the subject encoded with Base64, and finally ?= to finish the subject. (The extra = just before ?= in our subjects is Base64 padding, not part of the terminator.)
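
As a rough illustration of that layout, the sketch below splits one encoded-word with a regular expression and decodes the B (Base64) case by hand. parse_encoded_word() is a hypothetical name, only a single encoded-word is handled, and the iconv extension is assumed for the charset conversion.

```php
<?php
// Hypothetical helper: split "=?charset?encoding?text?=" into its three parts.
// Returns null if the string is not a single encoded-word.
function parse_encoded_word(string $word): ?array
{
    if (!preg_match('/^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/', $word, $m)) {
        return null;
    }

    return [
        'charset'  => $m[1],
        'encoding' => strtoupper($m[2]), // "B" = Base64, "Q" = quoted-printable style
        'text'     => $m[3],
    ];
}

$parts = parse_encoded_word('=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=');

if ($parts !== null && $parts['encoding'] === 'B') {
    // Base64 gives back the raw bytes in the declared charset...
    $bytes = base64_decode($parts['text']);
    // ...which are then converted to UTF-8 for display.
    echo iconv($parts['charset'], 'UTF-8', $bytes), "\n";
}
```

This also shows why a bare base64_decode() on the whole subject produces garbage: the =?UTF-8?B? prefix and the ?= suffix are not part of the Base64 data.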

 

I will check out http://www.horde.org/download/app/?app=imp when my laptop battery has some more power. It is about to run out.

 

Thanks and good night.

 

Marius

No idea, RaythMistwalker. I am a beginner with computers, but I found some more information:

 

The B in:

=?UTF-8?B? apparently stands for Base64

 

Any idea what kind of encoding the Q stands for here?

=?windows-1252?Q?

 

Wikipedia says: quoted-printable

 

http://en.wikipedia.org/wiki/MIME
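
A minimal sketch of decoding the Q case by hand, using the text part of the Gmail subject above. decode_q_text() is a hypothetical name, and the iconv extension is assumed for converting windows-1252 to UTF-8. In headers, Q encoding is quoted-printable with one extra rule: an underscore stands for a space.

```php
<?php
// Hypothetical helper: decode the text part of a Q-encoded word to UTF-8.
function decode_q_text(string $text, string $charset): string
{
    // "_" stands for a space; "=XX" stands for the byte with hex value XX.
    $bytes = quoted_printable_decode(str_replace('_', ' ', $text));

    // Convert from the declared charset; fall back to the raw bytes on failure.
    $utf8 = iconv($charset, 'UTF-8', $bytes);

    return $utf8 === false ? $bytes : $utf8;
}

echo decode_q_text('Gmail_Bekreftelse_=96_Send_e=2Dpost_som_salg=40trusepike', 'windows-1252'), "\n";
```

Here =96 is the windows-1252 en dash, =2D a hyphen and =40 the @ sign.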

 

Final piece in the puzzle.

 

Thanks,

Marius

