
Get entity numbers (not names) for unique characters like ö? - for XML document


ultrus


Hello,

I created an XML file to import into an InDesign XML workflow. InDesign is tripping on it, though: it doesn't like importing characters like 'ö', nor does it like importing the XML entity name '&ouml;'. It does, however, happily import the entity number '&#246;'.

 

How do I clean up a string in PHP to make this work? Here's what I have so far; it doesn't seem to do anything that I can see (maybe my browser is converting the output before I see the results?):

 

<?php
$foreignString = 'Dr. Wörner';
//This is what I want: Dr. W&#246;rner
//This is what I DON'T want: Dr. W&ouml;rner
//This is what I DON'T want: Dr. Wörner

//referenced from http://www.lazycat.org/php-convert-entities.php
$cleanString = htmlspecialchars(
  html_entity_decode($foreignString, ENT_QUOTES, 'UTF-8'), 
  ENT_QUOTES, 'UTF-8'
);

print $cleanString;
?>

 

Am I on the right track? Any thoughts on how to finalize?

 

Best regards,

 

Chris

No dice? I'm starting to rev up the Google engine on this one again today. I'll post if I find something, but feedback is welcome if you have thoughts. :)

 

Plan B:

I could make my own string replace function based on this table:

http://www.w3schools.com/tags/ref_entities.asp
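
A rough sketch of what that Plan B could look like, in case it helps. The function name `namedToNumeric` and the four entries in the map are only illustrative assumptions; a real version would need every entity from the table that actually shows up in your data.

```php
<?php
// Hypothetical helper: swap a few named entities for their numeric forms.
// The map is deliberately tiny; extend it from the entity table as needed.
function namedToNumeric(string $text): string
{
    $map = [
        '&ouml;'  => '&#246;', // ö, U+00F6
        '&auml;'  => '&#228;', // ä, U+00E4
        '&uuml;'  => '&#252;', // ü, U+00FC
        '&szlig;' => '&#223;', // ß, U+00DF
    ];

    // strtr() replaces all keys in a single pass over the string.
    return strtr($text, $map);
}

echo namedToNumeric('Dr. W&ouml;rner'), "\n"; // Dr. W&#246;rner
```

This only touches entity names that are listed in the map, so anything missing from it passes through unchanged.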

Yeah I was mulling around there for a while. What I ended up doing that worked this morning is something like this:

function replaceEntities($string) {
	// Convert special characters to named entities first
	$string = htmlentities($string);
	// Then swap each named entity for its numeric equivalent
	$string = str_replace(
		array(
			"&quot;",
			"'",
			"..."),
		array(
			"&#34;",
			"&#39;",
			"..."),
		$string);
	return $string;
}

 

Worked great for what I needed. Thanks for the reply. :)
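
For anyone landing here later: another option that avoids maintaining a replacement list is `mb_encode_numericentity()`. This is only a sketch, not what was used above. It assumes the mbstring extension is installed, PHP 7 or later (for the `\u{...}` escape), and UTF-8 input; the function name `toNumericEntities` is made up for the example.

```php
<?php
// Hypothetical helper: turn every non-ASCII character into a decimal
// numeric entity, leaving plain ASCII untouched.
function toNumericEntities(string $text): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside 7-bit ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// "\u{F6}" is ö; written as an escape so the file encoding doesn't matter.
echo toNumericEntities("Dr. W\u{F6}rner"), "\n"; // Dr. W&#246;rner
```

Note that this does not escape `&`, `<` or `>`, so those would still need separate handling if they can appear in the data.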

