[SOLVED] Looking for a special character chart/table


gmcmudder

I'm working on a project using a text document that contains some special characters I need to remove.  I know I used to have a Word or Excel document that listed all the characters, but now I can't find it.  Does anyone know where a similar list might be on the web?  A Google search doesn't turn up any tables that contain all of the characters I need to remove, and I can't see the characters in the text document for reference.  Most of the lists I've found online contain only about 125 of the characters that can be typed, and none of the ones I need to scan for.

 

Some of the characters in the original text document that I need to remove are (hard return, unknown), etc.  Any help would be greatly appreciated.


Such as this?

Have you tried a hex editor? Are you on Unix?

 

That's the first table I used, and it doesn't list the characters I'm looking for.  I'm using a hex editor, but for some reason my function isn't finding and removing the characters.

 

I think the problem is that the original text file was created on a Mac and the web server is a Windows-based machine.  Here's one of the special characters in the text document that I can't seem to remove.  I know that if I can find the hex equivalent for it, I can replace it with what needs to be there instead.

 

one of the characters I need to remove - Ê


That is 0xCA.  I just did:

 

<?php

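// Prints "ca" only if this script is saved in a single-byte encoding such as ISO-8859-1.
// In a UTF-8 file "Ê" is two bytes (0xC3 0x8A) and ord() only looks at the first one.
echo dechex(ord("Ê"));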

?>

 

Thanks DarkWater, that's twice now you've helped me.  The project would be much easier if their editors would simply listen to me and not use things like that and hard returns.  They can tell me what the code is and what it's used for on the print machine, but have no clue how not to use them in the text document for their data.

 

Now the question is, if I use that echo command to find the hex code, do I simply put 0x before what it echoes as the result?  Maybe my hex editor isn't that great, because I thought the one I had before this one gave me all that information.  Any recommendations on a hex editor?


Yes, you put 0x in front of it to write it as a hex number in PHP.  Then you can do:

 

$string = str_replace(chr(0xCA), '', $string);

 

I use GHex, but then again, I'm on Ubuntu.  I'm pretty sure XVI32 is what I used on Windows.

 

When working with more than one hex value, would this statement work?

 

$cleantext = str_replace(chr(0x0B), "\n", $contents);

$cleantext = str_replace('chr(0xCA)|chr(somehexvaluehere)', ' ', $cleantext);
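For what it's worth, the second statement above would not match anything: inside quotes, 'chr(0xCA)|chr(somehexvaluehere)' is just literal text, and str_replace() does not treat | as "or". A minimal sketch of one way to handle several bytes at once is to pass str_replace() an array of search values. The sample string and the 0x1F entry below are made-up placeholders for illustration; only 0x0B and 0xCA come from this thread.

```php
<?php
// Hypothetical sample input containing the two bytes named in this thread.
$contents = "First line\x0BSecond\xCAline\x1F";

// Vertical tab (0x0B) becomes a newline, same as the first statement above.
$cleantext = str_replace(chr(0x0B), "\n", $contents);

// An array of search values: each listed byte is replaced by a single space.
// 0x1F is only a placeholder; swap in the real codes as they are identified.
$cleantext = str_replace(array(chr(0xCA), chr(0x1F)), ' ', $cleantext);

echo $cleantext;
```

The replacement can also be an array if each byte needs its own substitute.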


No, I remember his last thread.  He gets some articles or something, and the place where he gets them from has all these weird print characters in it or something.  Last time he had random vertical tabs in there (0x0B), and this solution worked.  Should work again. =P


preg_replace( '/[^\\041-\\176\\s]/', '', $subject )

 

This will remove all characters not on a US keyboard (anything other than printable ASCII and whitespace). If you want to strip vertical tabs too, replace it with this:

 

preg_replace( '/[^\\040-\\176\\r\\n\\t]/', '', $subject )



Yep, you remembered right, the random vertical tabs are the same as line breaks in a normal text file.  Their editing software uses the special characters to insert the articles for printing into the actual news printing machine.  They want to use the same file they use for the news print to enter the article data into a database.

I spoke with one of the editors today, and the developer of the news print software used key values (characters) that he felt weren't used in other programs anymore to create the text document that is fed into the print machine.  Finding and removing all of those characters from the text file has proven to be a bit of a challenge though.  Like finding out what the hex value for a hard return is, without actually seeing one.
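One way to find the codes without seeing the characters is to let PHP list every byte in the file that falls outside printable ASCII. This is only a sketch: the sample string stands in for the real file contents (which would normally come from something like file_get_contents()), and the variable names are made up. A carriage return would show up as 0x0D and a line feed as 0x0A.

```php
<?php
// Hypothetical sample; replace with the contents of the real text file.
$contents = "Line one\x0BLine\xCAtwo\r\n";

$found = array();
// count_chars() in mode 1 returns byte value => number of occurrences,
// listing only the bytes that actually appear in the string.
foreach (count_chars($contents, 1) as $byte => $count) {
    // Report only bytes outside the printable ASCII range 0x20-0x7E.
    if ($byte < 0x20 || $byte > 0x7E) {
        $found[sprintf('0x%02X', $byte)] = $count;
    }
}

print_r($found);
```

Each key in the result can be dropped straight into chr(), e.g. chr(0xCA), for the replace step.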


Let's try this: each character that I need to remove in the text document does a specific task on the news print machine.

 

ie -

 

the hex character chr(0x0B) would be a line break in an article.

While hex character chr(0xCA) is an added space in the line.

 

So I'm figuring out what each special hex character is supposed to represent and then replacing it accordingly.  Anything else will be considered trailing garbage and removed from the text file.

 

That's where I would use something like -

 

preg_replace( '/[^\\041-\\176\\s]/', '', $subject )
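Putting the two steps together might look roughly like the sketch below: translate the bytes whose job is known first, then strip whatever is left over. The function name clean_article() and the sample string are made up for illustration; the two byte meanings are the ones described above.

```php
<?php
// Hypothetical helper; not from any library.
function clean_article($text)
{
    // Step 1: translate the bytes whose task on the print machine is known.
    $text = strtr($text, array(
        chr(0x0B) => "\n", // line break in an article
        chr(0xCA) => ' ',  // added space in the line
    ));

    // Step 2: anything still outside printable ASCII and whitespace
    // is treated as trailing garbage and removed.
    return preg_replace('/[^\\041-\\176\\s]/', '', $text);
}

// Made-up sample: 0x01 stands in for an unidentified leftover byte.
echo clean_article("Top\xCAstory\x0Bcontinues\x01here");
```

The order matters: running the preg_replace() first would delete 0xCA before it could be turned into a space.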


Got it, thanks for the help DarkWater and discomatt.  I ended up using both suggestions and got the text file's information into the database without any errors at all (finally).  XVI32 worked great as well.

 

Now if I can just get them to edit their own text file, check for line errors and make sure that the lines match up.  So an article's line would look like

 

"Earth's top scientist are working on the problem, but have no ideal when they will find a solution to the global warming crisis that we are currently facing."

 

instead of -

 

"Earth's top scientist are working

on the problem,                                              but have no ideal when they will find a solution to the

global    warming crisis that

                                        we are currently

 

facing."

 

This one is solved, thanks again everyone.

