Regular Expression For Greek Words


strimpak

First of all, this is my first post on this great site, so excuse me if I do not define my problem precisely. I am trying to use the spellChecker extension for Horde Webmail, which relies on aspell. It works like a charm for the English version of the spellchecker, but I want to use it for Greek, and it does not work for Greek words. I debugged the whole source code to find out why. Below I describe how it works.

 

When you click the spellcheck button, it calls aspell on the command line, which returns two arrays: one with the bad words and one with the suggestions. These arrays are converted to a JSON string and passed to a JavaScript function that uses two regular expressions to match every occurrence of each bad word. As I said, these regular expressions work like a charm with English strings. They try to match every bad word and replace it with a <span> element wrapping the word. The problem is that with Greek strings the word matching does not work. Trying to find a solution, I encoded the text as UTF-8, but this failed. After reading a lot on the internet I found a possible solution with XRegExp, but I couldn't make it work.
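For anyone trying to reproduce the symptom, here is a minimal sketch. It assumes the failure comes from JavaScript's \b, which is defined in terms of \w, and \w covers only ASCII letters, digits and underscore. The helper name asciiBoundary and the sample sentences are made up for illustration and are not part of Horde.

```javascript
// Build a whole-word regex using the ASCII-only \b assertion.
// (The sample words contain no regex metacharacters, so no escaping here.)
const asciiBoundary = (word) => new RegExp("\\b" + word + "\\b", "g");

const english = "this is a tset of words";
const greek = "αυτό είναι ένα τεστ λέξεων";

// Between a space and "t" there is a \W -> \w transition, so \b matches.
const englishHit = asciiBoundary("tset").test(english);

// Greek letters are not in \w, so space -> "τ" is \W -> \W: no boundary.
const greekHit = asciiBoundary("τεστ").test(greek);

console.log(englishHit); // true
console.log(greekHit);   // false
```

If that assumption holds, the encoding of the page is not the issue: the word is found, but the boundary check around it can never succeed for Greek letters.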

 

My last resort was to move this work to PHP. So I made an XMLHttpRequest for each bad word and used PHP to do the regexp replacements. But I could not make it work with preg_replace either. Below is the actual JavaScript code:

//node is the bad word


var re_text = '<span index="'+ (i++) + '" class="spellcheckIncorrect">'+ node + '</span>';
//content is the whole string
content = content.replace(new RegExp("(?:^|\\B)" + RegExp.escape(node) + "(?:\\b|$)", 'g'), re_text);

// Go through and see if we matched anything inside a tag (i.e.
// class/spellcheckIncorrect is often matched if using a
// non-English lang).
content = content.replace(new RegExp("(<[^>]*)" + RegExp.escape(re_text) + "([^>]*>)", 'g'), '\$1' + node +'\$2');

 

So I would like to ask for your valuable help in getting around my problem with Greek words. I am not a regular expression specialist, but I understand that this regex tries to match every occurrence of "node" as a whole word, delimited by \b or by the start (^) or end ($) of the whole string, with the g flag making the match global.
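One possible way to express "whole word" without \b is to use Unicode property escapes with lookarounds. The following is only a sketch, assuming a JavaScript engine with ES2018 support (the u flag, \p{...} and lookbehind). The names escapeRegExp and markBadWord are hypothetical; escapeRegExp is a stand-in for the RegExp.escape helper used in the code above.

```javascript
// Escape regex metacharacters in a literal string
// (stand-in for the RegExp.escape helper from the post).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every whole-word occurrence of `node` in the spellcheck span.
// A "word character" here is any Unicode letter, mark, digit or underscore.
function markBadWord(content, node, index) {
  const span = '<span index="' + index + '" class="spellcheckIncorrect">' +
    node + '</span>';
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  const pattern = new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(node) + "(?!" + wordChar + ")",
    "gu"
  );
  // A function replacement avoids "$" in the word being treated specially.
  return content.replace(pattern, () => span);
}

console.log(markBadWord("αυτό είναι ένα τεστ λέξεων", "τεστ", 0));
console.log(markBadWord("τεστάκι", "τεστ", 0)); // unchanged: "τεστ" is only a prefix
```

The negative lookarounds play the role of \b: the match is rejected if a letter, mark or digit sits directly before or after the word, whatever the script.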

 

Can anyone suggest a way to do this, using JavaScript, PHP (I have already implemented passing all the needed information to PHP and back to JavaScript using XHR), or any other approach?

 

 

The server I use for both development and production runs Solaris 10, PHP 5 and Apache 2.

 

Thanks in advance, and sorry if I couldn't describe my problem well.

Edited by strimpak


OK, let me describe what I have done so far.

I passed all the bad words and the content to a PHP script. I encode the word and the content as UTF-8 and do the replacement like this:

$content = preg_replace("/".$node."/", $retext, $content);

This matches Greek patterns and replaces them with the span. I think it's a step forward. Now my problem is how to tell that they are separate words. The regexp /(?:^|\b)greekword(?:\b|$)/g does not work. I think preg_replace has no limit by default, so I dropped the g. Does anyone have an idea how I can do it?
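A different technique that sidesteps word boundaries altogether is to tokenize the text and compare whole tokens against the list of bad words. The sketch below is in JavaScript to match the original code and is only an illustration under assumptions: markBadWords is a hypothetical name, a "word" is taken to be a run of Unicode letters and combining marks, and the span index is incremented per occurrence, which may differ from what the extension expects. PCRE in PHP also understands \p{L} and \p{M} when the pattern uses the u modifier, so the same idea should carry over to preg_replace_callback.

```javascript
// Mark every token that is exactly one of the bad words.
// Tags are matched first and returned untouched, so nothing inside
// an attribute such as class="..." can be wrapped by mistake.
function markBadWords(content, badWords) {
  const bad = new Set(badWords);
  let index = 0;
  return content.replace(/<[^>]*>|[\p{L}\p{M}]+/gu, (token) => {
    if (token.startsWith("<") || !bad.has(token)) {
      return token; // a tag, or a correctly spelled word
    }
    return '<span index="' + (index++) + '" class="spellcheckIncorrect">' +
      token + '</span>';
  });
}

const sample = '<p class="τεστ">ένα τεστ και τεστάκι</p>';
console.log(markBadWords(sample, ["τεστ"]));
```

Because each token is compared as a whole, "τεστάκι" is left alone even though it starts with "τεστ", and a second pass to undo matches inside tags should not be needed.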
