How do you replace the comma as a delimiter?



#1 Olney

Olney
  • New Members
  • Pip
  • Newbie
  • 6 posts

Posted 04 October 2006 - 11:35 PM

I have a bit of code that I'm struggling with (I'm not a PHP pro).
I hope it's OK to ask how to do this.

I have an open source script I'm trying to modify that takes tags in the submit form.

The problem is that the delimiter is a comma: " ,"

I want to use it in Japanese, where the double-byte comma is "、" (it would look like "„ÄÅ" if written in ASCII).

Can anyone give me advice on which part of the code to change?

function tags_normalize_string($string) {
	return preg_replace('/[;,]\s+$/', ",",$string);
}

function tags_insert_string($link, $lang, $string, $date = 0) {
	global $db;

	$string = tags_normalize_string($string);
	if ($date == 0) $date=time();
	$words = preg_split('/[,;]+/', $string);
	if ($words) {
		$db->query("delete from tags where tag_link_id = $link");
		foreach ($words as $word) {
			$word=trim($word);
			if (!$inserted[$word] && !empty($word)) {
				$db->query("insert into tags (tag_link_id, tag_lang, tag_words, tag_date) values ($link, '$lang', '$word', from_unixtime($date))");
				$inserted[$word] = true;
			}
		}
		return true;
	}
	return false;

}

function tags_get_string($link, $lang) {
	global $db;

	$counter = 0;
	$res = $db->get_col("select tag_words from tags where tag_link_id=$link and tag_lang='$lang'");
	if (!$res) return false;

	foreach ($db->get_col("select tag_words from tags where tag_link_id=$link and tag_lang='$lang'") as $word) {
		if($counter>0)  $string .= ', ';
		$string .= $word;
		$counter++;
	}
	return $string;
}


I'm almost sure this is where to change the code but not sure which part to change.
I've been going at it by trial & error with no luck.
Thanks in advance if anyone knows.

#2 effigy

effigy
  • Staff Alumni
  • Advanced Member
  • 3,600 posts
  • Location: IL

Posted 05 October 2006 - 03:40 PM

I'm confused; do you want to use the double comma as a delimiter?

<meta charset="utf-8"/>
<pre>
<?php
	// "Double comma" here is U+201E (double low-9 quotation mark); its UTF-8 bytes are E2 80 9E.
	$double_comma = pack('C*', 0xE2, 0x80, 0x9E);
	$string = "A{$double_comma}B{$double_comma}C{$double_comma}";
	$result = preg_split("/$double_comma/", $string, -1, PREG_SPLIT_NO_EMPTY);
	print_r($result);
?>
</pre>

Array
(
    [0] => A
    [1] => B
    [2] => C
)


Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

#3 Olney

Posted 05 October 2006 - 04:17 PM

Thank you for the reply.
No, I don't want to use a double comma.
I would like to use the Japanese double-byte encoded comma.

If I write it in ASCII characters it would look like this:

"„ÄÅ"

So currently this is the delimiter: " ,"
But I'm not sure where to put "„ÄÅ" in the code to make it the delimiter.

Hypothetically, it's like saying that instead of " ," as the delimiter I would like to use " -" or something.

Either I would put the ASCII code or the native Japanese comma in the code, but I'm just not sure where. I'm stumped.



#4 effigy

Posted 05 October 2006 - 05:02 PM

I'm not familiar with this area, but it looks like you want something from the "Multibyte String" suite--maybe mb_convert_encoding.

#5 Olney

Posted 05 October 2006 - 05:45 PM

Thank you, but using mb_convert_encoding would probably screw me up more than just finding out which part of the above code to change.
Even though it's Japanese, the encoding is still UTF-8.

Since Japanese users won't type a Latin comma, I'm just trying to take the Latin comma out of the above code and put the Japanese comma in its place. This would be the delimiter.

Instead of thinking of it as a foreign language, let's say I was trying to change the delimiter in the code from
" ," to " 45"

$words = preg_split('/[,;]+/', $string);
	if ($words) {


So I know somewhere in the code it goes from the above to

$words = preg_split('/[45;]+/', $string);
	if ($words) {


But I'm just not sure what to change in the whole code so that if a user types in
Cars 45 Trucks (instead of Cars, Trucks)

it makes sure it separates the two terms Cars & Trucks.


#6 effigy

Posted 05 October 2006 - 08:00 PM

It took me a while to figure out what the Japanese comma is. How about this?

<meta charset="utf-8">
<pre>
<?php
	echo 'Japanese comma = ', $j_comma = pack('C*', 0xE3, 0x80, 0x81), '<br/>';
	echo 'String delimited with Latin and Japanese comma = ', $string = "A{$j_comma}B,C{$j_comma}", '<br/>';
	// The /u modifier makes the three UTF-8 bytes of $j_comma count as one character in the class.
	$result = preg_split("/[,$j_comma]/u", $string, -1, PREG_SPLIT_NO_EMPTY);
	print_r($result);
?>
</pre>
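
One thing to watch with a multibyte character inside a character class: without the /u modifier, PCRE reads the pattern byte by byte, so [,$j_comma] becomes a set of the single bytes E3, 80 and 81 rather than one character. The sketch below assumes UTF-8 input and uses made-up sample text (あ、い, written as byte escapes). It shows how the byte-wise class can cut other Japanese characters apart, and how adding /u avoids that.

```php
<?php
// Assumption: all input is UTF-8. The sample text is illustrative only.
$j_comma = "\xE3\x80\x81";                              // UTF-8 bytes of U+3001 (、)
$string  = "\xE3\x81\x82" . $j_comma . "\xE3\x81\x84";  // あ、い

// Without /u the class is a set of single bytes (",", E3, 80, 81), so the
// leading bytes of あ and い are swallowed as delimiters as well.
$bytewise = preg_split("/[,$j_comma]+/", $string, -1, PREG_SPLIT_NO_EMPTY);

// With /u the pattern and the subject are both read as UTF-8 characters.
$charwise = preg_split("/[,$j_comma]+/u", $string, -1, PREG_SPLIT_NO_EMPTY);

// Print the pieces as hex so broken bytes are visible.
print_r(array_map('bin2hex', $bytewise));
print_r(array_map('bin2hex', $charwise));
```

With plain ASCII tags such as A, B and C the difference does not show up, which is why it is easy to miss.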
 

#7 Olney

Posted 05 October 2006 - 08:49 PM

I really thank you for taking the time to write the above code,
but I'm still not sure how to adapt it to the code I have.
I don't want to modify the Japanese comma for the entire program, just in the code I posted above.


Looking at your example, do I do this?

function tags_normalize_string($string) {
	return preg_replace('/[,$j_comma]\s+$/', ",",$string);
}

function tags_insert_string($link, $lang, $string, $date = 0) {
	global $db;

	$string = tags_normalize_string($string);
	if ($date == 0) $date=time();
	$words = preg_split('/[,$j_comma]+/', $string);
	if ($words) {
		$db->query("delete from tags where tag_link_id = $link");
		foreach ($words as $word) {
			$word=trim($word);
			if (!$inserted[$word] && !empty($word)) {
				$db->query("insert into tags (tag_link_id, tag_lang, tag_words, tag_date) values ($link, '$lang', '$word', from_unixtime($date))");
				$inserted[$word] = true;
			}
		}
		return true;
	}
	return false;

}

function tags_get_string($link, $lang) {
	global $db;

	$counter = 0;
	$res = $db->get_col("select tag_words from tags where tag_link_id=$link and tag_lang='$lang'");
	if (!$res) return false;

	foreach ($db->get_col("select tag_words from tags where tag_link_id=$link and tag_lang='$lang'") as $word) {
		if($counter>0)  $string .= ', ';
		$string .= $word;
		$counter++;
	}
	return $string;
}


#8 effigy

Posted 05 October 2006 - 08:58 PM

An easier solution is a combination of /u and \x{...}. Try changing your preg_split line from $words = preg_split('/[,;]+/', $string); to $words = preg_split('/[,;\x{3001}]+/u', $string);
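
As a rough, standalone sketch of that suggestion: the helper name split_tags and the sample tags are made up for illustration, and only the pattern comes from the line above. It assumes the submitted string is UTF-8. With /u, preg_split returns false on invalid UTF-8, so that case is handled separately.

```php
<?php
// Hypothetical helper; not part of the original script.
function split_tags($string) {
    // \x{3001} is the Japanese comma (、); /u switches PCRE to UTF-8 mode.
    $words = preg_split('/[,;\x{3001}]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return array(); // e.g. the input was not valid UTF-8
    }
    // Trim ASCII whitespace and drop entries that end up empty.
    return array_values(array_filter(array_map('trim', $words), 'strlen'));
}

// Assumes this file itself is saved as UTF-8.
print_r(split_tags("車、トラック, bus;train"));
```

Because the Latin comma and semicolon stay in the class, tags typed with either kind of comma are split the same way.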

#9 Olney

Posted 05 October 2006 - 10:11 PM

I'm getting a bit more confused, and I really thank you for continuing to explain, but
the original code works completely fine except that it uses an ASCII comma.

I'm realizing there might be any of three places I need to change:

THIS
function tags_normalize_string($string) {
return preg_replace('/[;,]\s+$/', ",",$string);
TO perhaps THIS
function tags_normalize_string($string) {
return preg_replace('/[,$j_comma]\s+$/', ",",$string);


This
$words = preg_split('/[,;]+/', $string);
To THIS
$words = preg_split('/[,$j_comma]+/', $string);


& THIS
if($counter>0)  $string .= ', ';
$string .= $word;
TO
if($counter>0)  $string .= ', '; (Not sure maybe changed?)
$string .= $word;






#10 effigy

Posted 05 October 2006 - 10:25 PM

Ah, I didn't see the counter. In that case, I would stick with the $j_comma method. You need to make sure that your regex is surrounded in double quotes so that the $j_comma variable will interpolate.
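
A minimal sketch of how that could look, with the database calls left out so it runs on its own. The function names tags_split_sketch and tags_join_sketch are hypothetical stand-ins for the split step in tags_insert_string() and the join step in tags_get_string(); the /u modifier is added on the assumption that the tags are UTF-8.

```php
<?php
// Hypothetical stand-ins for the split and join steps; no database involved.
// Assumption: the submitted tags are UTF-8.

function tags_split_sketch($string) {
    $j_comma = "\xE3\x80\x81"; // UTF-8 bytes of U+3001 (、)
    // Double quotes let {$j_comma} interpolate; in single quotes the
    // pattern would contain the literal text $j_comma instead.
    $words = preg_split("/[,;{$j_comma}]+/u", $string);
    if ($words === false) {
        return array(); // invalid UTF-8 input
    }
    $tags = array();
    foreach ($words as $word) {
        $word = trim($word);
        // Skip empty pieces and tags that were already collected.
        if ($word !== '' && !in_array($word, $tags, true)) {
            $tags[] = $word;
        }
    }
    return $tags;
}

function tags_join_sketch(array $tags) {
    $j_comma = "\xE3\x80\x81";
    // Mirrors the ', ' joiner in tags_get_string(), using the Japanese comma.
    return implode($j_comma, $tags);
}

// Assumes this file itself is saved as UTF-8.
$tags = tags_split_sketch("車、トラック、車,bus");
echo tags_join_sketch($tags), "\n";
```

In the real script, the loop body would keep its existing insert query, and the ', ' in tags_get_string() is the only place that decides which comma users see when the tags are shown again.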

#11 Olney

Posted 05 October 2006 - 10:46 PM

Hey, thanks for the help, but I've got a feeling now that this code change is just beyond my level.
I appreciate your time and hope that it helps someone else.



