map200uk Posted May 2, 2007
Hi, I've noticed that I sometimes get weird characters stored in my database after reading an audio file's tag data, such as ���������������. Is there an easy way to get rid of them? I was thinking of using a regex, although I'm not sure what the character is: it appears as a [] in Notepad and as a triangle with a ? in the browser.
effigy Posted May 2, 2007
Looks like an encoding issue. What encoding is the data in and how are you decoding it (if at all)? Also, what about your database and the display?
arianhojat Posted May 2, 2007
Could be quotes; I've had trouble with those. Smart quotes, like the left and right quotes Microsoft puts in a Word doc, don't store well in a database, maybe because of the database character set. Anyway, the best option seems to be to loop through the database and convert them into regular quotes, or to convert the value to its entity on the PHP output page. A function to start you off: http://shiflett.org/blog/2005/oct/convert-smart-quotes-with-php
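As a rough illustration of the "convert them into regular quotes" idea, here is a minimal sketch. It is written in Python rather than PHP, it is not the function from the linked post, and the names `SMART_QUOTES` and `straighten_quotes` are made up for this example. It assumes the text has already been decoded into a Unicode string.

```python
# Hypothetical helper: map typographic ("smart") quotes to plain ASCII quotes.
# Assumes `text` is already a decoded Unicode string, not raw bytes.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

# str.maketrans turns the mapping into a table usable by str.translate.
_QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text: str) -> str:
    """Return `text` with smart quotes replaced by their ASCII equivalents."""
    return text.translate(_QUOTE_TABLE)
```

This only helps if smart quotes really are the culprit; it would not change bytes that were decoded with the wrong character set in the first place.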
map200uk Posted May 2, 2007
effigy, encoding? It's just pulled from the MP3 metadata into a string and then put into the database as a varchar, with no special encoding, yet some files seem to work without any problems. I just checked the ID3 tag when I load the file in XMMS and there are no weird characters shown. I can't see why it would do it for only some files, is the thing.
effigy Posted May 2, 2007
Encoding is crucial. I recommend reading this. At what point do the characters look incorrect: When you echo the data pulled from the file? Select them from the database?
map200uk Posted May 2, 2007
When selected from the database, and they are stored in the database with these weird chars. I will do that in a sec. However, can I ask: why would it store 90% OK and do this for a few?
map
effigy Posted May 2, 2007
I'm not sure. What character set is your database/table using?
map200uk Posted May 2, 2007
MySQL charset: UTF-8 Unicode (utf8). I can't figure this one out: when I open the file in a hex editor to check the contents I don't see any extra data. I'm out of ideas :|
effigy Posted May 3, 2007
What is the audio tag format that you're extracting? MP3's ID3v2?
map200uk Posted May 3, 2007
ID3v1, mate. It appears fine when I view the tag data in, say, XMMS or Winamp, and it is only happening for some files :|
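One possibility worth ruling out (a guess, nothing in the thread confirms it) is that the odd characters are padding bytes. An ID3v1 tag is the last 128 bytes of the file, starts with `TAG`, and holds fixed-width fields that are padded with NUL bytes or spaces. A reader that stores the whole 30-byte field would also store the padding, which players typically hide. Below is a minimal Python sketch of reading the tag and trimming the padding; the function names are hypothetical and the values are left as raw bytes on purpose.

```python
import os

ID3V1_SIZE = 128  # an ID3v1 tag occupies the final 128 bytes of the file


def read_id3v1_block(path: str) -> bytes:
    """Return the last 128 bytes of the file at `path` (hypothetical helper)."""
    if os.path.getsize(path) < ID3V1_SIZE:
        raise ValueError("file is too small to contain an ID3v1 tag")
    with open(path, "rb") as handle:
        handle.seek(-ID3V1_SIZE, os.SEEK_END)
        return handle.read(ID3V1_SIZE)


def _trim_padding(field: bytes) -> bytes:
    # Keep everything before the first NUL, then drop trailing space padding.
    return field.split(b"\x00", 1)[0].rstrip(b" ")


def parse_id3v1(tag: bytes) -> dict:
    """Split a 128-byte ID3v1 block into trimmed, still-undecoded fields."""
    if len(tag) != ID3V1_SIZE or tag[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")
    return {
        "title": _trim_padding(tag[3:33]),
        "artist": _trim_padding(tag[33:63]),
        "album": _trim_padding(tag[63:93]),
        "year": _trim_padding(tag[93:97]),
        "comment": _trim_padding(tag[97:127]),
    }
```

Printing `repr()` of each raw field before and after trimming shows exactly which bytes are present, which helps answer the question of where the characters first go wrong.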
effigy Posted May 3, 2007
"ID3v1 also lacked support for internationalization. While nominally the text was supposed to be encoded in ISO-8859-1, in practice the user's local encoding was usually used, and so mojibake are common in ID3v1 tags." -- Wikipedia
Unfortunately, based on this information, there are no strict standards for ID3v1. As a result, you need to detect the encoding (unless you can safely assume that ISO-8859-1 was used all of the time), decode the data, and then re-encode the data into UTF-8 before it goes to the database.
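The detect, decode, re-encode sequence could look roughly like the Python sketch below. The detection step here is only a heuristic chosen for illustration: try strict UTF-8 first, and fall back to ISO-8859-1 if that fails. ISO-8859-1 accepts every byte value, so the fallback never raises an error, but it will also silently misread text that was written in some other local encoding. The function names are hypothetical.

```python
def decode_tag_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw tag bytes, guessing UTF-8 first and `fallback` otherwise."""
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in `fallback`.
        return raw.decode(fallback)


def tag_text_to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the tag text re-encoded as UTF-8, ready to send to the database."""
    return decode_tag_text(raw, fallback).encode("utf-8")
```

For example, `tag_text_to_utf8(b"Bj\xf6rk")` falls back to ISO-8859-1 because `0xF6` is not a valid UTF-8 start byte, while bytes that are already valid UTF-8 pass through unchanged.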
map200uk Posted May 3, 2007
Oh, didn't realise that. Bugger.
effigy Posted May 3, 2007
Do you know how these mp3 files were created? Are there a lot of international artists, and therefore "special characters"? You could try a straight ISO-8859-1 to UTF-8 conversion and see if that works...
map200uk Posted May 3, 2007
As far as I can tell they all seem to be English names (artist/song title); all the audio is in English. Just gonna try that now.
map200uk Posted May 3, 2007
Although surely, if this were the case, it would not echo the incorrect weird chars when the tag is read directly from the MP3?
map200uk Posted May 3, 2007
This is typical: I can't seem to reproduce the error now! Ahhhh.