zintani

show the whole array

zintani replied to zintani's topic in Regex Help

Thanks to both of you. The second answer was what I intended.

code help

zintani posted a topic in PHP Coding Help

Hi, can you please tell me why this message is shown when I execute this script?

while($row = mysql_fetch_array($query))
{
    $email = $row['rvalue'];
    echo "$email ";
    {
        $sql = ("SELECT message.body FROM message JOIN recipientinfo ON message.mid = recipientinfo.mid WHERE message.sender = 'john.arnold@enron.com' AND recipientinfo.rvalue = '$email' AND recipientinfo.rtype = 'TO' ");
        $query = mysql_query($sql) or die("Query: {$query} Error: " . mysql_error());
        $show = mysql_num_rows($query);
        echo "$show ";
    }
}

show the whole array

zintani posted a topic in Regex Help

Hi guys,

<?php
$file = "I went back home to see my family as I was studying in China. By the time I arrived home I was so hungry and the weather was cloudy.";
$words = array("see", "reader", "stud", "China", "cloudy", "hungry", "answer", "el", "prefer");
preg_match_all('/\b(' . implode("|", array_map("preg_quote", $words)) . ')/i', $file, $foundwords);
foreach ($foundwords[0] as &$value) {
    echo " $value";
}
$sim = array_count_values($foundwords[0]);
print_r($sim);
print_r(count($sim));
$max = max($sim);
echo " " . $max . " ";
foreach ($sim as $key => $value) {
    $norm = ($value / $max);
    echo $key . " = $norm ";
}
?>

The result of this is as follows:

see stud China hungry cloudy 5 Array ( [see] => 1 [stud] => 1 [China] => 1 [hungry] => 1 [cloudy] => 1 ) 5 1 see = 1 stud = 1 China = 1 hungry = 1 cloudy = 1

My purpose is to show the whole array $words = array("see", "reader", "stud", "China", "cloudy", "hungry", "answer", "el", "prefer"); and to write 0 for the words that don't appear and 1 for the words that do, NOT just what preg_match_all returns.
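One possible way to get a 0/1 entry for every dictionary word is to loop over `$words` itself rather than over the matches. The sketch below is only an illustration under that assumption; the helper name `word_presence` is hypothetical and not part of the original post.

```php
<?php
// Sketch: report every dictionary word, 1 if it occurs in the text, 0 otherwise.
// word_presence() is a hypothetical helper name used only for this example.
function word_presence(array $words, string $text): array
{
    // Quote each word for use inside a '/'-delimited pattern.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    preg_match_all('/\b(' . implode('|', $quoted) . ')/i', $text, $found);

    // Lower-case the matches so the lookup below ignores case.
    $seen = array_flip(array_map('strtolower', $found[1]));

    $result = array();
    foreach ($words as $word) {
        $result[$word] = isset($seen[strtolower($word)]) ? 1 : 0;
    }
    return $result;
}

$file = "I went back home to see my family as I was studying in China. By the time I arrived home I was so hungry and the weather was cloudy.";
$words = array("see", "reader", "stud", "China", "cloudy", "hungry", "answer", "el", "prefer");

foreach (word_presence($words, $file) as $word => $flag) {
    echo "$word = $flag\n";
}
```

Because the result is built from `$words`, words with no match still appear in the output with a value of 0.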

parsing email replies

zintani replied to zintani's topic in Regex Help

Thanks for your help, but I found another corpus that doesn't require parsing email replies.

insertion from for loop

zintani posted a topic in PHP Coding Help

Hello guys, with this code I am able to match two strings, where one of them is fixed and the other changes. That doesn't matter. What does matter is that I want to insert the values into a database table consisting of a number of columns. As you can see in the code, each iteration yields only one value of the whole row to be inserted. I hope I have made my question clear.

<?php
preg_match_all('/\b(' . implode("|", array_map("preg_quote", $dictionary)) . ')/i', $file, $foundwords);
$sim = array_count_values($foundwords[0]);
$max = max($sim);
foreach ($sim as $key => $value) {
    $norm = ($value / $max);
    echo $key . " = $norm ";
}
$foundwords[0] = array_flip($foundwords[0]);
ksort($foundwords[0]);
foreach ($foundwords[0] as $key => $value) {
    echo " $key ";
    // after here, how can I insert all the values in one row?
}
?>
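One way to approach this is to collect the values in an array inside the loop and build a single INSERT statement after the loop has finished. The sketch below assumes, purely for illustration, that each matched word is also a column name; the helper `build_insert` and the table name `word_scores` are hypothetical.

```php
<?php
// Sketch: gather all per-word values first, then build ONE insert for the row.
// build_insert() and the table name "word_scores" are hypothetical.
function build_insert(string $table, array $row): array
{
    $columns = array_keys($row);
    // Backtick-quote the column names (assumes MySQL-style identifiers).
    $quoted = array_map(function ($c) {
        return '`' . str_replace('`', '``', $c) . '`';
    }, $columns);
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    $sql = "INSERT INTO `$table` (" . implode(', ', $quoted) . ") VALUES ($placeholders)";
    return array($sql, array_values($row));
}

$sim = array('china' => 1, 'see' => 2);   // example counts, as from array_count_values()
$max = max($sim);

$row = array();
foreach ($sim as $key => $value) {
    $row[$key] = $value / $max;           // normalised value for this column
}

list($sql, $params) = build_insert('word_scores', $row);
echo $sql, "\n";
// With a PDO connection in $pdo this could be run as:
// $pdo->prepare($sql)->execute($params);
```

Using placeholders keeps the values out of the SQL string, so the statement is executed once per row rather than once per word.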

October 9, 2011

parsing email replies

zintani posted a topic in Regex Help

Hi guys, I am working on a project, and the obstacle blocking my way at the moment is dividing an email into fragments, each of which contains an email topic or a reply. For example: So the email would be divided into these different colours wherever an email header is found, and the results saved into separate files. Another thing: is there a way to extract the headers from this text file? Thanks in advance.

tf-idf code help

zintani posted a topic in Third Party Scripts

Hello, I would like to know how to count identical words in the same document without specifying them, that is, automatically for all the words found in that document.

<?php
/**
 * Information Retrieval
 *
 * Class used to explore information retrieval theory and concepts.
 */
define("DOC_ID", 0);
define("TERM_POSITION", 1);

class IR {

    public $num_docs = 0;
    public $corpus_terms = array();

    /*
     * Show Documents
     *
     * Helper function that shows the contents of your corpus documents.
     *
     * @param array $D document corpus as array of strings
     */
    function show_docs($D) {
        $ndocs = count($D);
        echo $ndocs;
        for($doc_num=0; $doc_num < $ndocs; $doc_num++) {
            ?>
            Document #<?php echo ($doc_num+1); ?>:
            <?php echo $D[$doc_num]; ?>
            <?php
        }
    }

    /*
     * Create Index
     *
     * Creates an inverted index from the supplied corpus documents.
     * Inverted index stored in corpus_terms array.
     *
     * @param array $D document corpus as array of strings
     */
    function create_index($D) {
        $this->num_docs = count($D);
        for($doc_num=0; $doc_num < $this->num_docs; $doc_num++) {
            // zero array containing document terms
            $doc_terms = array();
            // simplified word tokenization process
            $doc_terms = explode(" ", $D[$doc_num]);
            // here is where the indexing of terms to document locations happens
            $num_terms = count($doc_terms);
            for($term_position=0; $term_position < $num_terms; $term_position++) {
                $term = strtolower($doc_terms[$term_position]);
                $this->corpus_terms[$term][]=array($doc_num, $term_position);
            }
        }
    }

    /*
     * Show Index
     *
     * Helper function that outputs inverted index in a standard format.
     */
    function show_index() {
        // sort by key for alphabetically ordered output
        ksort($this->corpus_terms);
        print_r ($this->corpus_terms);
        // output a representation of the inverted index
        foreach($this->corpus_terms AS $term => $doc_locations) {
            echo "$term: ";
            foreach($doc_locations AS $doc_location)
                echo "{".$doc_location[DOC_ID].", ".$doc_location[TERM_POSITION]."} ";
            echo " ";
        }
    }

    /*
     * Term Frequency
     *
     * @param string $term
     * @return frequency of term in corpus
     */
    function tf($term) {
        $term = strtolower($term);
        return count($this->corpus_terms[$term]);
    }

    /*
     * Number Documents With
     *
     * @param string $term
     * @return number of documents with term
     */
    function ndw($term) {
        $term = strtolower($term);
        $doc_locations = $this->corpus_terms[$term];
        $num_locations = count($doc_locations);
        $docs_with_term = array();
        for($doc_location=0; $doc_location < $num_locations; $doc_location++)
            $docs_with_term[$i]++;
        return count($docs_with_term);
    }

    /*
     * Inverse Document Frequency
     *
     * @param string $term
     * @return inverse document frequency of term
     */
    function idf($term) {
        return log(($this->num_docs)/$this->ndw($term));
    }

}

$tf = $ir->tf($term);
?>

This is the second code:

<?php
include "php.php";

$D[0] = "Shipment of gold delivered in a fire delivery";
$D[1] = "Delivery of silver arrived in a silver truck of silver silver";
$D[2] = "Shipment of gold arrived in a silver truck";

$ir = new IR();

echo "Corpus:";
$ir->show_docs($D);

$ir->create_index($D);

echo "Inverted Index:";
$ir->show_index();

$term = "silver";
$tf = $ir->tf($term);
$ndw = $ir->ndw($term);
$idf = $ir->idf($term);
echo "";
echo "Term Frequency of '$term' is $tf ";
echo "Number Of Documents with $term is $ndw ";
echo "Inverse Document Frequency of $term is $idf";
echo "";
?>

Instead of typing the term, how can I do this automatically for all the terms that appear? Furthermore, how can I do it not for all the documents together but one document at a time?

The output should be:

Words of document D[0] // $D[0] = "Shipment of gold delivered in a fire delivery by a delivery man";
shipment = 1 // written one time
of = 1
gold = 1
delivered = 1
in = 1
a = 2
fire = 1
delivery = 2
by = 1
and so on for the others.
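A per-document count like the one described can be sketched without the IR class by tokenizing each document and passing the tokens to `array_count_values()`. This is a minimal sketch, not a change to the class above; the helper name `term_counts` is hypothetical, and splitting on whitespace is an assumed simplification that leaves punctuation attached to words.

```php
<?php
// Sketch: count how often each word occurs in ONE document.
// term_counts() is a hypothetical helper; tokenization is a plain whitespace split.
function term_counts(string $document): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($document)), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($tokens);
}

$D = array();
$D[0] = "Shipment of gold delivered in a fire delivery";
$D[1] = "Delivery of silver arrived in a silver truck of silver silver";
$D[2] = "Shipment of gold arrived in a silver truck";

// Handle the documents one by one, printing every term that appears.
foreach ($D as $doc_num => $document) {
    echo "Words of document D[$doc_num]:\n";
    foreach (term_counts($document) as $term => $count) {
        echo "  $term = $count\n";
    }
}
```

No term has to be typed in by hand, because the keys of the returned array are exactly the words found in that document.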

October 6, 2011

porter stemmer

zintani posted a topic in Third Party Scripts

Hello, here is some code to stem words; it is called the Porter stemmer. It is such a rich piece of code, but the problem is that I couldn't make it work. I tried to retrieve some information from the database, but it didn't work for me. Any help?

<?php
/**
 * Copyright (c) 2005 Richard Heyes (http://www.phpguru.org/)
 *
 * All rights reserved.
 *
 * This script is free software.
 */

/**
 * PHP5 Implementation of the Porter Stemmer algorithm. Certain elements
 * were borrowed from the (broken) implementation by Jon Abernathy.
 *
 * Usage:
 *
 *  $stem = PorterStemmer::Stem($word);
 *
 * How easy is that?
 */
class PorterStemmer
{
    /**
     * Regex for matching a consonant
     * @var string
     */
    private static $regex_consonant = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';

    /**
     * Regex for matching a vowel
     * @var string
     */
    private static $regex_vowel = '(?:[aeiou]|(?<![aeiou])y)';

    /**
     * Stems a word. Simple huh?
     *
     * @param string $word Word to stem
     * @return string Stemmed word
     */
    public static function Stem($word)
    {
        if (strlen($word) <= 2) {
            print_r($word);
        }

        $word = self::step1ab($word);
        $word = self::step1c($word);
        $word = self::step2($word);
        $word = self::step3($word);
        $word = self::step4($word);
        $word = self::step5($word);

        return $word;
    }

    Stem("tjhgkbnlk.");

    /**
     * Step 1
     */
    private static function step1ab($word)
    {
        // Part a
        if (substr($word, -1) == 's') {
               self::replace($word, 'sses', 'ss')
            OR self::replace($word, 'ies', 'i')
            OR self::replace($word, 'ss', 'ss')
            OR self::replace($word, 's', '');
        }

        // Part b
        if (substr($word, -2, 1) != 'e' OR !self::replace($word, 'eed', 'ee', 0)) { // First rule
            $v = self::$regex_vowel;

            // ing and ed
            if (   preg_match("#$v+#", substr($word, 0, -3)) && self::replace($word, 'ing', '')
                OR preg_match("#$v+#", substr($word, 0, -2)) && self::replace($word, 'ed', '')) { // Note use of && and OR, for precedence reasons

                // If one of above two test successful
                if (    !self::replace($word, 'at', 'ate')
                    AND !self::replace($word, 'bl', 'ble')
                    AND !self::replace($word, 'iz', 'ize')) {

                    // Double consonant ending
                    if (    self::doubleConsonant($word)
                        AND substr($word, -2) != 'll'
                        AND substr($word, -2) != 'ss'
                        AND substr($word, -2) != 'zz') {

                        $word = substr($word, 0, -1);

                    } else if (self::m($word) == 1 AND self::cvc($word)) {
                        $word .= 'e';
                    }
                }
            }
        }

        return $word;
    }

    /**
     * Step 1c
     *
     * @param string $word Word to stem
     */
    private static function step1c($word)
    {
        $v = self::$regex_vowel;

        if (substr($word, -1) == 'y' && preg_match("#$v+#", substr($word, 0, -1))) {
            self::replace($word, 'y', 'i');
        }

        return $word;
    }

    /**
     * Step 2
     *
     * @param string $word Word to stem
     */
    private static function step2($word)
    {
        switch (substr($word, -2, 1)) {
            case 'a':
                   self::replace($word, 'ational', 'ate', 0)
                OR self::replace($word, 'tional', 'tion', 0);
                break;

            case 'c':
                   self::replace($word, 'enci', 'ence', 0)
                OR self::replace($word, 'anci', 'ance', 0);
                break;

            case 'e':
                self::replace($word, 'izer', 'ize', 0);
                break;

            case 'g':
                self::replace($word, 'logi', 'log', 0);
                break;

            case 'l':
                   self::replace($word, 'entli', 'ent', 0)
                OR self::replace($word, 'ousli', 'ous', 0)
                OR self::replace($word, 'alli', 'al', 0)
                OR self::replace($word, 'bli', 'ble', 0)
                OR self::replace($word, 'eli', 'e', 0);
                break;

            case 'o':
                   self::replace($word, 'ization', 'ize', 0)
                OR self::replace($word, 'ation', 'ate', 0)
                OR self::replace($word, 'ator', 'ate', 0);
                break;

            case 's':
                   self::replace($word, 'iveness', 'ive', 0)
                OR self::replace($word, 'fulness', 'ful', 0)
                OR self::replace($word, 'ousness', 'ous', 0)
                OR self::replace($word, 'alism', 'al', 0);
                break;

            case 't':
                   self::replace($word, 'biliti', 'ble', 0)
                OR self::replace($word, 'aliti', 'al', 0)
                OR self::replace($word, 'iviti', 'ive', 0);
                break;
        }

        return $word;
    }

    /**
     * Step 3
     *
     * @param string $word String to stem
     */
    private static function step3($word)
    {
        switch (substr($word, -2, 1)) {
            case 'a':
                self::replace($word, 'ical', 'ic', 0);
                break;

            case 's':
                self::replace($word, 'ness', '', 0);
                break;

            case 't':
                   self::replace($word, 'icate', 'ic', 0)
                OR self::replace($word, 'iciti', 'ic', 0);
                break;

            case 'u':
                self::replace($word, 'ful', '', 0);
                break;

            case 'v':
                self::replace($word, 'ative', '', 0);
                break;

            case 'z':
                self::replace($word, 'alize', 'al', 0);
                break;
        }

        return $word;
    }

    /**
     * Step 4
     *
     * @param string $word Word to stem
     */
    private static function step4($word)
    {
        switch (substr($word, -2, 1)) {
            case 'a':
                self::replace($word, 'al', '', 1);
                break;

            case 'c':
                   self::replace($word, 'ance', '', 1)
                OR self::replace($word, 'ence', '', 1);
                break;

            case 'e':
                self::replace($word, 'er', '', 1);
                break;

            case 'i':
                self::replace($word, 'ic', '', 1);
                break;

            case 'l':
                   self::replace($word, 'able', '', 1)
                OR self::replace($word, 'ible', '', 1);
                break;

            case 'n':
                   self::replace($word, 'ant', '', 1)
                OR self::replace($word, 'ement', '', 1)
                OR self::replace($word, 'ment', '', 1)
                OR self::replace($word, 'ent', '', 1);
                break;

            case 'o':
                if (substr($word, -4) == 'tion' OR substr($word, -4) == 'sion') {
                    self::replace($word, 'ion', '', 1);
                } else {
                    self::replace($word, 'ou', '', 1);
                }
                break;

            case 's':
                self::replace($word, 'ism', '', 1);
                break;

            case 't':
                   self::replace($word, 'ate', '', 1)
                OR self::replace($word, 'iti', '', 1);
                break;

            case 'u':
                self::replace($word, 'ous', '', 1);
                break;

            case 'v':
                self::replace($word, 'ive', '', 1);
                break;

            case 'z':
                self::replace($word, 'ize', '', 1);
                break;
        }

        return $word;
    }

    /**
     * Step 5
     *
     * @param string $word Word to stem
     */
    private static function step5($word)
    {
        // Part a
        if (substr($word, -1) == 'e') {
            if (self::m(substr($word, 0, -1)) > 1) {
                self::replace($word, 'e', '');

            } else if (self::m(substr($word, 0, -1)) == 1) {

                if (!self::cvc(substr($word, 0, -1))) {
                    self::replace($word, 'e', '');
                }
            }
        }

        // Part b
        if (self::m($word) > 1 AND self::doubleConsonant($word) AND substr($word, -1) == 'l') {
            $word = substr($word, 0, -1);
        }

        return $word;
    }

    /**
     * Replaces the first string with the second, at the end of the string. If third
     * arg is given, then the preceding string must match that m count at least.
     *
     * @param string $str String to check
     * @param string $check Ending to check for
     * @param string $repl Replacement string
     * @param int $m Optional minimum number of m() to meet
     * @return bool Whether the $check string was at the end
     *              of the $str string. True does not necessarily mean
     *              that it was replaced.
     */
    private static function replace(&$str, $check, $repl, $m = null)
    {
        $len = 0 - strlen($check);

        if (substr($str, $len) == $check) {
            $substr = substr($str, 0, $len);
            if (is_null($m) OR self::m($substr) > $m) {
                $str = $substr . $repl;
            }

            return true;
        }

        return false;
    }

    /**
     * What, you mean it's not obvious from the name?
     *
     * m() measures the number of consonant sequences in $str. if c is
     * a consonant sequence and v a vowel sequence, and <..> indicates arbitrary
     * presence,
     *
     * <c><v> gives 0
     * <c>vc<v> gives 1
     * <c>vcvc<v> gives 2
     * <c>vcvcvc<v> gives 3
     *
     * @param string $str The string to return the m count for
     * @return int The m count
     */
    private static function m($str)
    {
        $c = self::$regex_consonant;
        $v = self::$regex_vowel;

        $str = preg_replace("#^$c+#", '', $str);
        $str = preg_replace("#$v+$#", '', $str);

        preg_match_all("#($v+$c+)#", $str, $matches);

        return count($matches[1]);
    }

    /**
     * Returns true/false as to whether the given string contains two
     * of the same consonant next to each other at the end of the string.
     *
     * @param string $str String to check
     * @return bool Result
     */
    private static function doubleConsonant($str)
    {
        $c = self::$regex_consonant;

        return preg_match("#$c{2}$#", $str, $matches) AND $matches[0]{0} == $matches[0]{1};
    }

    /**
     * Checks for ending CVC sequence where second C is not W, X or Y
     *
     * @param string $str String to check
     * @return bool Result
     */
    private static function cvc($str)
    {
        $c = self::$regex_consonant;
        $v = self::$regex_vowel;

        return preg_match("#($c$v$c)$#", $str, $matches)
            AND strlen($matches[1]) == 3
            AND $matches[1]{2} != 'w'
            AND $matches[1]{2} != 'x'
            AND $matches[1]{2} != 'y';
    }
}
?>

October 3, 2011

read file two and see how many words occur

zintani posted a topic in Regex Help

Hi, I have two files, each of which is a text document. The first one contains a list of words, while the second is a proper text document (it talks about something). I would like to check the second file against the first to see how many of the words in file one are mentioned in file two. For example:

File one: ('see', 'read', 'study', 'China', 'cloudy', 'hungry', 'answer')
File two: (I went back home to see my family as I was studying in China. By the time I arrived home I was so hungry and the weather was cloudy).

The output should be:

see   read   study   cloudy   hungry   answer   China
Yes   No     Yes     Yes      Yes      No       Yes

where Yes means the word appeared in the second file and No means it didn't. Any suggestions or guidance on how to do it, please? I tried to use count, but I am not good at loops; they send my brain into a closed loop. Thanks in advance.
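A possible approach is a single `foreach` over the word list with one `preg_match()` call per word. The sketch below assumes that a word counts as found when a word in the text starts with it, since the expected output lists "study" as Yes for "studying". The helper name `words_found` is hypothetical, and the two inputs are plain strings here; they could equally come from `file_get_contents()`.

```php
<?php
// Sketch: for each word of file one, say whether file two mentions it.
// words_found() is a hypothetical helper; a word matches if a word in the
// text STARTS with it (assumption taken from the "study"/"studying" example).
function words_found(array $words, string $text): array
{
    $result = array();
    foreach ($words as $word) {
        $pattern = '/\b' . preg_quote($word, '/') . '/i';
        $result[$word] = preg_match($pattern, $text) === 1 ? 'Yes' : 'No';
    }
    return $result;
}

$fileOne = array('see', 'read', 'study', 'China', 'cloudy', 'hungry', 'answer');
$fileTwo = "I went back home to see my family as I was studying in China. By the time I arrived home I was so hungry and the weather was cloudy.";

foreach (words_found($fileOne, $fileTwo) as $word => $answer) {
    echo "$word: $answer\n";
}
```

Adding a closing `\b` to the pattern would restrict the check to exact whole words, in which case "study" would no longer match "studying".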

Removing a substring!

zintani replied to zintani's topic in Regex Help

Thanks thorpe, I think I didn't explain my question well. The question is what to do if there is another string, for example: $string1 = "My ambition is to become a professional php programmer. This ambition in my stage could be impossible but with persistent and commitment nothing is impossible. So I wish good luck for everyone." ; $string2 = "My ambition is to become a professional php programmer. I started learning php last year and It was php server side part and now I am working on regex part. This ambition in my stage could be impossible but with persistent and commitment nothing is impossible. So I wish good luck for everyone." ; For both of them I need to start deleting from "nothing is impossible".

Removing a substring!

zintani posted a topic in Regex Help

Hello, thanks to the members of the forum, I was able to make good progress. Now I am trying to delete a specific string of words, for example: $string = "My ambition is to become a professional php programmer. This ambition in my stage could be impossible but with persistent and commitment nothing is impossible. So I wish good luck for everyone." ; I would like to know if it is possible to delete "nothing is impossible" and all the words that follow it, whatever they are. In this example, that is "So I wish good luck for everyone.".
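Cutting a string at a known phrase does not strictly need a regex. One option is to locate the phrase with `stripos()` and keep only what precedes it. The following is a minimal sketch; the helper name `cut_from` is hypothetical.

```php
<?php
// Sketch: drop a marker phrase and everything after it.
// cut_from() is a hypothetical helper; the search is case-insensitive.
function cut_from(string $text, string $marker): string
{
    $pos = stripos($text, $marker);
    if ($pos === false) {
        return $text;                      // marker not present: leave the text alone
    }
    return rtrim(substr($text, 0, $pos));  // keep only the part before the marker
}

$string = "My ambition is to become a professional php programmer. This ambition in my stage could be impossible but with persistent and commitment nothing is impossible. So I wish good luck for everyone.";

echo cut_from($string, "nothing is impossible"), "\n";
```

The same call works for any string that contains the phrase, whatever text comes before or after it.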

comparing two text documents

zintani replied to zintani's topic in Regex Help

Thanks mjdamato, I was using the same idea for the array: $more = array ('/\ba\b/i','/\babout\b/i','/\babove\b/i','/\bacross\b/i','/\bafter\b/i','/\bagain\b/i', '/\bagainst\b/i','/\ball\b/i','/\balmost\b/i','/\balone\b/i','/\balong\b/i','/\balready\b/i', '/\balso\b/i','/\balthough\b/i','/\balways\b/i','/\bamong\b/i','/\ban\b/i','/\band\b/i','/\banother\b/i', '/\bany\b/i','/\banybody\b/i','/\banyone\b/i','/\banything\b/i','/\banywhere\b/i','/\bare\b/i','/\barea\b/i', '/\bareas\b/i') ; which was time consuming. I wanted to add the \b boundaries around each word automatically, and your code does that.

comparing two text documents

zintani replied to zintani's topic in Regex Help

Thanks ManiacDan, the code works just fine. Can I do it for more than one word, for example: to, at, from, the, etc.? echo preg_replace('/\band\b/i','/\bto\b/i','/\bfrom\b/i','/\bat\b/i' '', $a); This is my code, but I got an error message.
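`preg_replace()` takes the pattern as its first argument only, so several words have to be passed either as an array of patterns or as one pattern with alternation. The sketch below uses the second form; the helper name `remove_stop_words` and the sample sentence are made up for illustration.

```php
<?php
// Sketch: remove several stop words with ONE pattern built from an array.
// remove_stop_words() is a hypothetical helper name.
function remove_stop_words(string $text, array $stop_words): string
{
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $stop_words);
    // \b on both sides keeps "and" from matching inside "stand".
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    $stripped = preg_replace($pattern, '', $text);
    // Collapse the gaps left behind by the removed words.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$a = "We stand and walk to the station from home at noon";
echo remove_stop_words($a, array('and', 'to', 'from', 'at', 'the')), "\n";
```

Extending the list only means adding words to the array; the pattern is rebuilt from it each time.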

comparing two text documents

zintani posted a topic in Regex Help

Hello, I would like to ask if there is a way to remove stop words (the, and, is, are, etc.) from a text without damaging other words such as "stand", where the last three letters (and) are removed when a replace function is used. Also, I have managed to retrieve text documents saved in a MySQL database, and I would like to know if there is a way to measure the similarity between those documents.
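For the similarity part, one common measure (named here as a suggestion, not something the thread settled on) is the cosine similarity of the two documents' word-count vectors. The sketch below is a minimal version under that assumption; `term_vector` and `cosine_similarity` are hypothetical helper names, and the tokenizer simply keeps runs of lowercase letters and digits.

```php
<?php
// Sketch: cosine similarity between two texts, based on raw word counts.
// Both helper names are hypothetical; stop words are NOT removed here.
function term_vector(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return array_count_values($m[0]);
}

function cosine_similarity(array $a, array $b): float
{
    // Dot product over the terms the two documents share.
    $dot = 0.0;
    foreach ($a as $term => $count) {
        if (isset($b[$term])) {
            $dot += $count * $b[$term];
        }
    }
    // Euclidean length of a count vector.
    $norm = function (array $v) {
        return sqrt(array_sum(array_map(function ($x) { return $x * $x; }, $v)));
    };
    $denominator = $norm($a) * $norm($b);
    return $denominator == 0.0 ? 0.0 : $dot / $denominator;
}

$doc1 = "Shipment of gold arrived in a silver truck";
$doc2 = "Delivery of silver arrived in a silver truck";
echo cosine_similarity(term_vector($doc1), term_vector($doc2)), "\n";
```

The result lies between 0 (no words in common) and 1 (identical word distributions); removing stop words first usually makes the score more meaningful.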

Send hidden values

zintani replied to zintani's topic in PHP Coding Help

Thank you all, the problem has been solved; it was just a small mistake in the punctuation.
