
Strange subject encoding when writing an IMAP reader in PHP


king.oslo


Hello,

 

I am writing my first IMAP email reader:

 

<?php
// $server, $login and $password are set earlier in the script.
$connection = imap_open('{' . $server . '}', $login, $password);
$mails = imap_fetch_overview($connection, "1:*", FT_UID);
// object_to_array() is a helper (not shown) that turns the returned objects into arrays.
$mails = object_to_array($mails);
$return = '<table width="100%">
<tr>
<td><b>#</b></td>
<td><b>From</b></td>
<td><b>Date / Time</b></td>
<td><b>Subject</b></td>
</tr>';

$size = count($mails); // Number of messages
$cmsg = 0; // Running counter for the "#" column

// Walk the array from the end so the last fetched message is listed first
for ($i = $size - 1; $i >= 0; $i--) {
    $cmsg++;
    $return .= '<tr><td>' . $cmsg . '</td><td>' . $mails[$i]['from'] . '</td><td>' . $mails[$i]['date'] . '</td><td><a href="' . $_SERVER['PHP_SELF'] . '?id=' . $mails[$i]['msgno'] . '">' . $mails[$i]['subject'] . '</a></td></tr>';
}
$return .= '</table>';
print $return;

imap_close($connection);
?>

 

This returns something like this (notice the email subjects):

#	From				Date / Time				Subject

1	Trusepiker CC			Fri, 04 Sep 2009 03:17:10 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
2	Gmail				Fri, 4 Sep 2009 01:47:10 +0100		=?windows-1252?Q?Gmail_Bekreftelse_=96_Send_e=2Dpost_som_salg=40trusepike?=	=?windows-1252?Q?r=2Eno?=
3	Trusepiker CC			Thu, 03 Sep 2009 20:13:56 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
4	oslo@kingoslo.com		Thu, 03 Sep 2009 04:13:05 +0200		Subject: Betaling avvist.
5	oslo@kingoslo.com		Thu, 03 Sep 2009 03:40:50 +0200		Subject: Betaling avvist.
6	Trusepiker CC			Thu, 03 Sep 2009 03:36:17 +0200		=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=
7	"service@paypal.co.uk"		Tue, 01 Sep 2009 19:44:12 -0700		You've Added an Additional Email Address
8	"service@paypal.co.uk"		Tue, 01 Sep 2009 19:44:02 -0700		You've Added an Additional Email Address
9	Marius Jonsson			Wed, 2 Sep 2009 03:33:23 +0100		Test

 

 

 

 

At first I thought I only had to run base64_decode() on the subjects and all would be solved, but with base64_decode() the script returned only strange symbols:

#	From			Date / Time					Subject

1	Trusepiker CC		Fri, 04 Sep 2009 03:17:10 +0200			Q1|)������������Ёͅ���
2	Gmail			Fri, 4 Sep 2009 01:47:10 +0100			�)ݣ5۝�f������z[�׶�,��&��`�Kk�ǩ�G��wh��v�d+�I�
3	Trusepiker CC		Thu, 03 Sep 2009 20:13:56 +0200			Q1|)������������Ёͅ���
4	oslo@kingoslo.com	Thu, 03 Sep 2009 04:13:05 +0200			J��y�Az֥�x��
5	oslo@kingoslo.com	Thu, 03 Sep 2009 03:40:50 +0200			J��y�Az֥�x��
6	Trusepiker CC		Thu, 03 Sep 2009 03:36:17 +0200			Q1|)������������Ёͅ���
7	"service@paypal.co.uk"	Tue, 01 Sep 2009 19:44:12 -0700			b��x]y֧�b�*'jQ&j)@u�޲
8	"service@paypal.co.uk"	Tue, 01 Sep 2009 19:44:02 -0700			b��x]y֧�b�*'jQ&j)@u�޲
9	Marius Jonsson		Wed, 2 Sep 2009 03:33:23 +0100			M�-
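The garbage comes from feeding the whole subject to base64_decode(): the =?UTF-8?B? prefix and the ?= suffix are delimiters, not Base64 data, and plain subjects such as "Test" were never encoded at all (decoding "Test" yields the bytes 0x4D 0xEB 0x2D, which is exactly the "M�-" in row 9). A minimal sketch of decoding only the payload of one B-encoded word, assuming the hypothetical helper name decode_b_word() and using iconv() to convert non-UTF-8 charsets:

```php
<?php
// Sketch: decode a single RFC 2047 "B" encoded word by hand.
// Only the text between "?B?" and "?=" is Base64; the "=?charset?B?" prefix
// and the "?=" suffix are delimiters and must not be passed to base64_decode().
// decode_b_word() is a hypothetical name used for illustration.
function decode_b_word(string $word): ?string
{
    // Capture the charset and the Base64 payload; null if $word is not a B word.
    if (!preg_match('/^=\?([^?]+)\?[Bb]\?([A-Za-z0-9+\/=]*)\?=$/', $word, $m)) {
        return null;
    }
    $bytes = base64_decode($m[2], true); // strict mode rejects stray characters
    if ($bytes === false) {
        return null;
    }
    // The bytes are in the declared charset; convert them to UTF-8 for display.
    if (strcasecmp($m[1], 'UTF-8') === 0) {
        return $bytes;
    }
    $utf8 = iconv($m[1], 'UTF-8', $bytes);
    return $utf8 === false ? null : $utf8;
}

echo decode_b_word('=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?='), "\n"; // Jippiii! Ennå et salg!

// Running base64_decode() on a plain subject produces junk bytes:
// "Test" becomes "M", 0xEB, "-", the "M�-" seen in row 9 above.
var_dump(base64_decode('Test') === "M\xEB-"); // bool(true)
```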

 

 

1. What do you think I need to do to make these subjects appear normally?

2. Are there any good tutorials on how to handle, decode and format IMAP data in PHP? I reckon I will need to format a lot of other things too, such as the bodies of the emails.
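For question 2, message bodies use a different mechanism from subjects: each MIME part declares a Content-Transfer-Encoding, which imap_fetchstructure() reports as a numeric "encoding" field. A minimal sketch, assuming the hypothetical helper name decode_body_part() and the encoding numbers documented for PHP's IMAP extension (3 for Base64, 4 for quoted-printable). The decoded bytes are still in the part's own charset, so a charset conversion may be needed afterwards:

```php
<?php
// Sketch: undo the Content-Transfer-Encoding of one message part.
// PHP's IMAP extension numbers the encodings 0=7BIT, 1=8BIT, 2=BINARY,
// 3=BASE64, 4=QUOTED-PRINTABLE, 5=OTHER (constants ENC7BIT ... ENCOTHER).
// decode_body_part() is a hypothetical name used for illustration.
function decode_body_part(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // ENCBASE64
            $decoded = base64_decode($raw); // non-strict: skips the line breaks
            return $decoded === false ? $raw : $decoded;
        case 4: // ENCQUOTEDPRINTABLE
            return quoted_printable_decode($raw);
        default: // 7BIT, 8BIT, BINARY, OTHER: pass through unchanged
            return $raw;
    }
}

// Hypothetical usage for a single-part message, assuming $connection and
// $msgno come from the listing script:
// $structure = imap_fetchstructure($connection, $msgno);
// $text = decode_body_part(imap_fetchbody($connection, $msgno, '1'), $structure->encoding);
```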

 

Thank you!

 

Kind regards,

Marius


Thank you for your suggestions.

 

I guess these strings are some kind of encoding (maybe charset information?). Out of only 9 emails, we already face two types of subject encoding:

1: =?UTF-8?B?TEXT?=
2: =?windows-1252?Q?TEXT?=

 

After maybe 1 000 000 emails (just to pick a number), users may face tens if not hundreds of different encodings. I hoped there was an mb_base64_decode() function, a tutorial, or some other easy way to decode these subjects (anyone who has written an email reader must have faced this problem), so that I would not have to write a regexp for every possible charset and encoding.
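There is no mb_base64_decode(), but PHP already ships RFC 2047 header decoders: iconv_mime_decode() in the iconv extension, mb_decode_mimeheader() in mbstring, and imap_mime_header_decode() / imap_utf8() in the IMAP extension. A minimal sketch using iconv_mime_decode(), assuming UTF-8 output is wanted; the wrapper name decode_subject() is hypothetical:

```php
<?php
// Sketch: let the iconv extension decode RFC 2047 encoded words.
// iconv_mime_decode() handles both B (Base64) and Q encodings and converts
// each word from its declared charset into the target charset (UTF-8 here).
// decode_subject() is a hypothetical wrapper name.
function decode_subject(string $raw): string
{
    $decoded = iconv_mime_decode($raw, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
    // Fall back to the raw header if decoding fails completely.
    return $decoded === false ? $raw : $decoded;
}

$subjects = [
    '=?UTF-8?B?SmlwcGlpaSEgRW5uw6UgZXQgc2FsZyE=?=',
    "You've Added an Additional Email Address", // plain ASCII passes through unchanged
];
foreach ($subjects as $s) {
    echo decode_subject($s), "\n";
}
```

In the listing loop, $mails[$i]['subject'] could be passed through a function like this (and through htmlspecialchars() before it is printed into the HTML table), so the charset handling stays in one place instead of one regexp per charset.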

 

Sorry, my English is poor, but I hope the essence comes through.

 

Thanks,

Marius


Thanks for that too.


I've found out more in this Norwegian article: http://www.myrvold-data.no/e-post-utf-8-og-base64-encoding/

 

=?charset?encoding?encoded text?=

 

In our case:

=?UTF-8?B?TEXT?=

 

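First comes =?, then the charset, then ?, then B (meaning the subject is Base64-encoded), then ?, then the Base64-encoded subject, and finally ?= to end the encoded word. The = just before ?= in the example subjects is Base64 padding, not part of the terminator.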
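The =?charset?encoding?encoded text?= layout maps directly onto a regular expression. A sketch that only splits a header value into its encoded words (it does not decode them), assuming the hypothetical function name parse_encoded_words():

```php
<?php
// Sketch: split a header value into its RFC 2047 encoded words,
// following the =?charset?encoding?encoded text?= layout.
// parse_encoded_words() is a hypothetical name used for illustration.
function parse_encoded_words(string $header): array
{
    // charset: anything but "?"; encoding: B or Q; text: no "?" and no whitespace.
    preg_match_all('/=\?([^?]+)\?([BbQq])\?([^?\s]*)\?=/', $header, $matches, PREG_SET_ORDER);
    $words = [];
    foreach ($matches as $m) {
        $words[] = [
            'charset'  => $m[1],
            'encoding' => strtoupper($m[2]), // "B" = Base64, "Q" = quoted-printable variant
            'text'     => $m[3],
        ];
    }
    return $words;
}

// The Gmail subject from the listing consists of two adjacent encoded words.
$subject = '=?windows-1252?Q?Gmail_Bekreftelse_=96_Send_e=2Dpost_som_salg=40trusepike?= '
         . '=?windows-1252?Q?r=2Eno?=';
print_r(parse_encoded_words($subject));
```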

 

I will check out http://www.horde.org/download/app/?app=imp when my laptop battery has more charge; it is about to run out.

 

Thanks and good night.

 

Marius


No idea, RaythMistwalker; I am a beginner with computers. However, I found some more information:

 

The B in :

=?UTF-8?B? apparently stands for Base64

 

Any idea what kind of encoding the Q stands for here?

=?windows-1252?Q?

 

Wikipedia says: quoted-printable

 

http://en.wikipedia.org/wiki/MIME
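Strictly speaking, RFC 2047's Q encoding is a variant of quoted-printable: =XX still means one hex-encoded byte, but _ stands for a space. A minimal sketch of decoding the Gmail subject by hand, assuming the hypothetical helper name decode_q_word() and iconv() support for windows-1252:

```php
<?php
// Sketch: decode one RFC 2047 "Q" encoded word by hand.
// Q is a variant of quoted-printable: "=XX" is a hex byte and "_" means a space.
// decode_q_word() is a hypothetical name used for illustration.
function decode_q_word(string $charset, string $text): string
{
    // quoted_printable_decode() does not know the "_" rule, so apply it first
    // (a literal underscore in the data would itself be written as "=5F").
    $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
    // Convert from the declared charset (here windows-1252) to UTF-8.
    $utf8 = iconv($charset, 'UTF-8', $bytes);
    return $utf8 === false ? $bytes : $utf8;
}

// The Gmail subject is split into two encoded words; RFC 2047 says whitespace
// between adjacent encoded words is ignored, so the pieces are concatenated.
$subject = decode_q_word('windows-1252', 'Gmail_Bekreftelse_=96_Send_e=2Dpost_som_salg=40trusepike')
         . decode_q_word('windows-1252', 'r=2Eno');
echo $subject, "\n"; // Gmail Bekreftelse – Send e-post som salg@trusepiker.no
```

In windows-1252 the byte 0x96 is an en dash, which is why it shows up as "–" once converted to UTF-8.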

 

Final piece in the puzzle.

 

Thanks,

Marius

