Pangu Posted August 17, 2015
I have the following PHP snippet to sort text blocks (the source is $narray[] in my example) by topic:
<?php
$narray[]="1 bla blb ala bla bla facebook dfg";
$narray[]="2 b la bl twitter ba la bla bl dfg a";
$narray[]="3 bla sdf asd fb la fg dfg blb ala bla bla clinton";
$narray[]="4 b lad fg bl obama ba la dfg clinton dsf bla bla";
$narray[]="5 bla blb dfg dfg ala bla bla ds fg mircosoft";
$narray[]="6 b la bl Obama bd fg sdf a la bla bla";
$narray[]="7 db la dbl obama bd dfg sdf ad la bla bla";
$narray[]="8 bla df gd sfg blb ala bla bla twitter";
$narray[]="9 s ons ti ges sdf as df";
$narray[]="10 Twitter s ons ti ges sdf as df";
$narray[]="11 s ons ti ges Obama sdf as df";
$narray[]="12 s Clinton ons ti ges sdf as df";

function extractCommonWords($string){
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
    $string = preg_replace('/\s\s+/i', ' ', $string); // collapse runs of whitespace into a single space
    $string = trim($string); // trim the string
    $string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
    $string = strtolower($string); // make it lowercase
    preg_match_all('/\b.*?\b/i', $string, $matchWords);
    $matchWords = $matchWords[0];
    foreach ( $matchWords as $key=>$item ) {
        if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
            unset($matchWords[$key]);
        }
    }
    $wordCountArr = array();
    if ( is_array($matchWords) ) {
        foreach ( $matchWords as $key => $val ) {
            $val = strtolower($val);
            if ( isset($wordCountArr[$val]) ) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }
    arsort($wordCountArr);
    $wordCountArr = array_slice($wordCountArr, 0, 20);
    return $wordCountArr;
}

$anzahlnachrichten = count($narray);
$text = implode(" ", $narray);
echo "Text:<br>",$text,"<br><br>";
$words = extractCommonWords($text);
echo "Found Keywords:<br><font color=red>",implode(', ', array_keys($words)),"</font>";
echo "<br><br>Sort by Keyword:<br>";
$xarray = array();
// iterate over the keywords that were actually found (there may be fewer than 20)
foreach (array_keys($words) as $i2 => $keyword) {
    for ($i=0;$i<$anzahlnachrichten;$i++) {
        if (!isset($narray[$i])) continue; // already assigned to an earlier keyword
        $textx=strtolower($narray[$i]);
        //echo $i, $keyword, "-", $textx,": ";
        if(strpos($textx,$keyword)!==false) {
            unset($narray[$i]);
            $a="<font color=red>";
            $a.=$keyword;
            $a.="</font> :: ";
            $a.=$textx;
            $xarray[$i2][]=$a;
        }
    }
}
var_dump($xarray);
echo "<br>/everything else without keyword:<br>";
var_dump($narray);
?>
What I can't figure out: if one text block ($narray[x]) contains more than one keyword, those keywords should be combined into one group, because I assume they belong to the same topic. How can I combine/group text blocks with the same topic in my script?
In my example, "obama" and "clinton" should be combined: there is text with only "clinton" and text with only "obama", but one text contains both "obama" AND "clinton", so the script should detect that they are both the same topic ("humans").
Any suggestions? Thanks.
Pangu Posted August 17, 2015
The output should be something like this:

obama&clinton:
- 4 b lad fg bl obama ba la dfg clinton dsf bla bla
- 6 b la bl obama bd fg sdf a la bla bla
- 7 db la dbl obama bd dfg sdf ad la bla bla
- 11 s ons ti ges obama sdf as df
- 3 bla sdf asd fb la fg dfg blb ala bla bla clinton
- 12 s Clinton ons ti ges sdf as df

twitter:
- 2 b la bl twitter ba la bla bl dfg a
- 8 bla df gd sfg blb ala bla bla twitter
- 10 twitter s ons ti ges sdf as df

mircosoft:
- 5 bla blb dfg dfg ala bla bla ds fg mircosoft

facebook:
- 1 bla blb ala bla bla facebook dfg

everything else without relevant keyword:
- 9 s ons ti ges sdf as df
Edited August 17, 2015 by Pangu
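The keyword-counting step of the script in the question can be sketched more compactly. This is only an illustration, assuming PHP 7 or later; the function name `countKeywords` and the `$minLength` parameter are hypothetical and not part of the original script. It splits each line on non-alphanumeric characters, drops stop words and short words, and counts the rest.

```php
<?php
// Hypothetical helper (not from the thread): count keywords across all lines.
// Assumes PHP 7+ for scalar type hints and the ?? operator.
function countKeywords(array $lines, array $stopWords, int $minLength = 4): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Lowercase, then split on anything that is not a letter or digit.
        $words = preg_split('/[^a-z0-9]+/', strtolower($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Keep only words that are long enough and not stop words.
            if (strlen($word) >= $minLength && !in_array($word, $stopWords, true)) {
                $counts[$word] = ($counts[$word] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most frequent keywords first
    return $counts;
}

$sample = [
    "4 b lad fg bl obama ba la dfg clinton dsf bla bla",
    "6 b la bl Obama bd fg sdf a la bla bla",
    "12 s Clinton ons ti ges sdf as df",
];
$counts = countKeywords($sample, ['about', 'with']);
// $counts holds 'obama' => 2 and 'clinton' => 2; all other words are too short.
```

The minimum length of 4 mirrors the `strlen($item) <= 3` filter in the question; everything else is a design choice of this sketch.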
Barand Posted August 17, 2015
My 0.02 worth
<?php
$narray[]="1 bla blb ala bla bla facebook dfg";
$narray[]="2 b la bl twitter ba la bla bl dfg a";
$narray[]="3 bla sdf asd fb la fg dfg blb ala bla bla clinton";
$narray[]="4 b lad fg bl obama ba la dfg clinton dsf bla bla";
$narray[]="5 bla blb dfg dfg ala bla bla ds fg mircosoft";
$narray[]="6 b la bl Obama bd fg sdf a la bla bla";
$narray[]="7 db la dbl obama bd dfg sdf ad la bla bla";
$narray[]="8 bla df gd sfg blb ala bla bla twitter";
$narray[]="9 s ons ti ges sdf about as df";
$narray[]="10 Twitter s ons ti ges sdf as df";
$narray[]="11 s ons ti ges Obama sdf as df";
$narray[]="12 s Clinton ons ti ges sdf as df";

$filtered = filter_my_array($narray); // keywords only array
$kwindex = index_keywords($filtered); // index of keywords
$keywords = array_keys($kwindex);
$otheritems = [];
//
// combine indexes
//
foreach ($filtered as $k => $kwarr) {
    if (count($kwarr) == 0) {
        $otheritems[] = $k;
    }
    elseif (count($kwarr) > 1) {
        $newkw = join(' & ', $kwarr);
        $occurs = [];
        foreach ($kwarr as $kw) {
            if (isset($kwindex[$kw])) {
                $occurs = array_merge($occurs, $kwindex[$kw]); // combine individual lists
                unset($kwindex[$kw]); // then remove them
            }
        }
        sort($occurs);
        $kwindex[$newkw] = array_unique($occurs); // add the combined index
    }
}
//
// create highlighting replacement texts
//
$replace = [];
foreach ($keywords as $kw) {
    $replace[] = "<span class='hi'>$kw</span>";
}
//
// create output of the indexed list
//
ksort($kwindex);
$output = '';
foreach ($kwindex as $kw => $items) {
    $output .= "<h4>$kw</h4><ul>";
    foreach ($items as $i) {
        $output .= "<li>" . str_ireplace($keywords, $replace, $narray[$i]) . "</li>\n";
    }
    $output .= "</ul>\n";
}
if (count($otheritems) > 0) {
    $output .= "<h4>Non-keyword items</h4><ul>";
    foreach ($otheritems as $i) {
        $output .= "<li>{$narray[$i]}</li>\n";
    }
    $output .= "</ul>\n";
}

/*******************************************************************************
* helper functions
********************************************************************************/
function filter_my_array($array)
{
    // reduces the lines of text to arrays of the keywords in the line
    $results = [];
    foreach ($array as $k => $str) {
        $str = strtolower($str);
        $a = array_filter(explode(' ', $str), 'remove_noise');
        $results[$k] = $a;
    }
    return $results;
}

function remove_noise($x)
{
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from',
        'how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where',
        'who','will','with','und','the','www');
    return strlen($x) > 3 && !in_array($x, $stopWords);
}

function index_keywords($array)
{
    // gets the line numbers containing each keyword
    $results = [];
    foreach ($array as $k => $kwarr) {
        foreach ($kwarr as $kw) {
            $results[$kw][] = $k;
        }
    }
    return $results;
}
?>
<html>
<head>
<title>Keyword Index</title>
<style type='text/css'>
.hi { font-weight: 700; color: red; }
</style>
</head>
<body>
<?=$output?>
</body>
</html>

Results
<html>
<head>
<title>Keyword Index</title>
<style type="text/css">
.hi { font-weight: 700; color: red; }
</style>
</head>
<body>
<h4>facebook</h4><ul><li>1 bla blb ala bla bla <span class="hi">facebook</span> dfg</li>
</ul>
<h4>mircosoft</h4><ul><li>5 bla blb dfg dfg ala bla bla ds fg <span class="hi">mircosoft</span></li>
</ul>
<h4>obama & clinton</h4><ul><li>3 bla sdf asd fb la fg dfg blb ala bla bla <span class="hi">clinton</span></li>
<li>4 b lad fg bl <span class="hi">obama</span> ba la dfg <span class="hi">clinton</span> dsf bla bla</li>
<li>6 b la bl <span class="hi">obama</span> bd fg sdf a la bla bla</li>
<li>7 db la dbl <span class="hi">obama</span> bd dfg sdf ad la bla bla</li>
<li>11 s ons ti ges <span class="hi">obama</span> sdf as df</li>
<li>12 s <span class="hi">clinton</span> ons ti ges sdf as df</li>
</ul>
<h4>twitter</h4><ul><li>2 b la bl <span class="hi">twitter</span> ba la bla bl dfg a</li>
<li>8 bla df gd sfg blb ala bla bla <span class="hi">twitter</span></li>
<li>10 <span class="hi">twitter</span> s ons ti ges sdf as df</li>
</ul>
<h4>Non-keyword items</h4><ul><li>9 s ons ti ges sdf about as df</li>
</ul>
</body>
</html>
Pangu Posted August 18, 2015
Thanks!! But if I use a different data set:
$narray[]="1 bla web20 blb ala bla bla facebook dfg";
$narray[]="2 b la bl twitter ba la bla bl dfg a";
$narray[]="3 bla sdf asd fb la fg dfg blb ala bla bla clinton";
$narray[]="4 b lad fg bl obama ba la dfg clinton dsf bla bla";
$narray[]="5 bla blb dfg dfg ala bla bla ds fg mircosoft";
$narray[]="6 b la bl Obama bd fg sdf a la bla bla";
$narray[]="7 db la dbl obama bd dfg sdf ad la bla bla";
$narray[]="8 bla df gd sfg blb ala bla bla twitter";
$narray[]="9 s ons ti ges sdf about as df";
$narray[]="10 Twitter s ons web20 ti ges sdf as df";
$narray[]="11 s ons ti ges Obama sdf as df";
$narray[]="12 s Clinton ons ti ges sdf as df";
$narray[]="13 s mircosoft ons facebook ti ges sdf as df";
I get duplicate entries for lines 13 and 10.
How can I get a result like this:

obama & clinton
3 bla sdf asd fb la fg dfg blb ala bla bla clinton
4 b lad fg bl obama ba la dfg clinton dsf bla bla
6 b la bl obama bd fg sdf a la bla bla
7 db la dbl obama bd dfg sdf ad la bla bla
11 s ons ti ges obama sdf as df
12 s clinton ons ti ges sdf as df

twitter & web20-web20 & facebook-mircosoft & facebook
2 b la bl twitter ba la bla bl dfg a
8 bla df gd sfg blb ala bla bla twitter
10 twitter s ons web20 ti ges sdf as df
5 bla blb dfg dfg ala bla bla ds fg mircosoft
13 s mircosoft ons facebook ti ges sdf as df
1 bla web20 blb ala bla bla facebook dfg

Non-keyword items
9 s ons ti ges sdf about as df

In other words: "if a keyword in one group also appears among the keywords of another group, merge the text from both into one group".
Edited August 18, 2015 by Pangu
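The merge rule requested above ("if two groups share a keyword, combine them") has to be applied repeatedly, because a merged group can overlap with groups it did not touch before. The following is a minimal sketch of that idea, not code from the thread; the function name `mergeOverlappingSets` is hypothetical, and the sample sets are the multi-keyword lines of the data set above plus one single-keyword line.

```php
<?php
// Hypothetical helper (not from the thread): merge keyword sets that share
// at least one keyword, repeating until no further merge is possible.
function mergeOverlappingSets(array $sets): array
{
    $sets = array_values(array_filter($sets)); // drop empty sets, reindex
    do {
        $merged = false;
        $n = count($sets);
        for ($i = 0; $i < $n && !$merged; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                if (array_intersect($sets[$i], $sets[$j])) {
                    // Fold set $j into set $i, then restart the scan.
                    $sets[$i] = array_values(array_unique(array_merge($sets[$i], $sets[$j])));
                    array_splice($sets, $j, 1);
                    $merged = true;
                    break;
                }
            }
        }
    } while ($merged);
    return $sets;
}

$sets = [
    ['web20', 'facebook'],      // line 1
    ['twitter'],                // line 2
    ['obama', 'clinton'],       // line 4
    ['twitter', 'web20'],       // line 10
    ['mircosoft', 'facebook'],  // line 13
];
$groups = mergeOverlappingSets($sets);
// $groups[0] is ['web20', 'facebook', 'twitter', 'mircosoft']
// $groups[1] is ['obama', 'clinton']
```

Restarting after every merge is simple but quadratic or worse; for the 10 to 100 sentences typical of this kind of input that should not matter.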
Barand Posted August 18, 2015
"-> how can i get a result like:"
By changing the code to meet the new requirements.
Barand Posted August 19, 2015
Here's a replacement "combine indexes" section of the code:
//
// combine indexes
//
uasort($filtered, function($a,$b) {return count($b) - count($a);});
foreach ($filtered as $i=>$a) {
    foreach ($filtered as $j=>$b) {
        if ($i==$j) continue;
        if (count($a)<2 || count($b)<2) continue;
        if (array_intersect($a, $b)) {
            $filtered[$j] = array_unique(array_merge($a,$b));
        }
    }
}
foreach ($filtered as $k => $kwarr) {
    if (count($kwarr) == 0) {
        $otheritems[] = $k;
    }
    elseif (count($kwarr) > 1) {
        $newkw = join(' & ', $kwarr);
        $occurs = [];
        foreach ($kwarr as $kw) {
            if (isset($kwindex[$kw])) {
                $occurs = array_merge($occurs, $kwindex[$kw]); // combine individual lists
                unset($kwindex[$kw]); // then remove them
            }
        }
        sort($occurs);
        $kwindex[$newkw] = array_unique($occurs); // add the combined index
    }
}
$kwindex = array_filter($kwindex);
Pangu Posted August 23, 2015
Thank you very much, this helps me a lot! Nevertheless, it seems to have problems in some cases with bigger data sets.
Let's say I use this data ("headlines about Donald Trump"):
$narray[]="Trump denounces violence after supporters beat Mexican man";
$narray[]="Doyle: What my dad could teach Donald Trump";
$narray[]="Bush slams Trump, defends using anchor babies";
$narray[]="Coming up Trumps: could a British TV star do a Donald and enter politics?";
$narray[]="Watch Rachel Maddow Explain Donald Trump’s ‘Genius’ Campaign on Tonight Show";
$narray[]="Trump touts making Time cover while taking heat over attack";
$narray[]="First Draft: Today in Politics: Rivals Can No Longer Ignore Donald Trump’s Long Shadow";
$narray[]="Donald Trump insists he’s conservative";
$narray[]="GOP candidates hold dueling town halls";
$narray[]="New York City has no way to fire Donald Trump";
$narray[]="Donald Trump pushes birthright citizenship to forefront of political debate";
$narray[]="Jeb Bush takes fight to Donald Trump in N.H.";
$narray[]="Rand Paul explains why he wants to stop ‘birthright citizenship’";
$narray[]="Trump attacks Facebook over foreigners";
$narray[]="Donald Trump tops GOP field in Florida, Pennsylvania, second in Ohio";
$narray[]="Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush";
$narray[]="While in Vegas, O’Malley makes an appearance in front of Trump’s hotel";
$narray[]="Trump’s immigration plan has GOP rivals on edge";
$narray[]="Donald Trump calls out Mark Zuckerberg on immigration";
$narray[]="Deny citizenship to babies illegal immigrants in US: Donald Trump";
$narray[]="Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty";
$narray[]="Trump: Deny citizenship to babies of people illegally in US";
$narray[]="Trump Says He Would Deport Illegal Immigrants";
$narray[]="From campaign to court: Trump reports for jury duty in NYC";
$narray[]="Donald Trump says he will ‘deport millions of illegal immigrants’";
$narray[]="Trump outlines immigration specifics";
$narray[]="Donald Trump to Iowa boy: ‘I am Batman’";
$narray[]="Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’";
$narray[]="Trump: end ‘birthright citizenship’";
$narray[]="Trump: Deport children of immigrants living illegally in US";
$narray[]="DNC blasts Donald Trump, Jeb Bush for comments about women";
$narray[]="Trump says would raise visa fees to pay for Mexican border wall";
$narray[]="What does Donald Trump think of immigrants, Saudi Arabia and the Iran nuclear deal?";
$narray[]="Donald Trump Releases Plan to Combat Illegal Immigration";
$narray[]="Donald Trump releases his immigration policy on his GOP presidential campaign website";
$narray[]="Donald Trump warns that Iran deal will lead to Nuclear Holocaust";
$narray[]="Trump details domestic, foreign policies, answers critics, matches fellow challengers";
$narray[]="Donald Trump’s legacy of luxury";
$narray[]="Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair";
$narray[]="Donald Trump says he would deport all illegal immigrants as president";
$narray[]="Donald Trump breaks the rules at the Iowa State Fair";
$narray[]="Thanks, Donald, but I don’t want to be ‘cherished’ | Barbara Ellen";
$narray[]="Front-runners skirt the soapbox";
$narray[]="Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair";
$narray[]="Op-Ed Columnist: Introducing Donald Trump, Diplomat";
$narray[]="Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006";
$narray[]="Donald Trump forced to take break from campaign trail for jury service";
$narray[]="Tables turned on Trump’s chief tormentor";
$narray[]="Donald Trump will serve jury duty in NYC next week";
In addition, I add "Donald" and "Trump" to the stop-words array.
I get the following result:
Array (
  [1] => Array (
    [1] => Coming up Trumps: could a British TV star do a Donald and enter politics?
  )
  [2] => Array (
    [2] => Trump details domestic, foreign policies, answers critics, matches fellow challengers
  )
  [3] => Array (
    [3] => Doyle: What my dad could teach Donald Trump
  )
  [4] => Array (
    [4] => Front-runners skirt the soapbox
  )
  [5] => Array (
    [5] => Donald Trump insists he’s conservative
  )
  [6] => Array (
    [6] => Donald Trump to Iowa boy: ‘I am Batman’
    [7] => Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair
    [8] => Donald Trump breaks the rules at the Iowa State Fair
    [9] => Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair
  )
  [7] => Array (
    [10] => Trump touts making Time cover while taking heat over attack
    [11] => Trump attacks Facebook over foreigners
    [12] => Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair
  )
  [8] => Array (
    [13] => Bush slams Trump, defends using anchor babies
    [14] => Watch Rachel Maddow Explain Donald Trump’s ‘Genius’ Campaign on Tonight Show
    [15] => Trump touts making Time cover while taking heat over attack
    [16] => First Draft: Today in Politics: Rivals Can No Longer Ignore Donald Trump’s Long Shadow
    [17] => Donald Trump pushes birthright citizenship to forefront of political debate
    [18] => Jeb Bush takes fight to Donald Trump in N.H.
    [19] => Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush
    [20] => While in Vegas, O’Malley makes an appearance in front of Trump’s hotel
    [21] => Trump’s immigration plan has GOP rivals on edge
    [22] => Donald Trump calls out Mark Zuckerberg on immigration
    [23] => Deny citizenship to babies illegal immigrants in US: Donald Trump
    [24] => Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty
    [25] => Trump: Deny citizenship to babies of people illegally in US
    [26] => Trump Says He Would Deport Illegal Immigrants
    [27] => From campaign to court: Trump reports for jury duty in NYC
    [28] => Donald Trump says he will ‘deport millions of illegal immigrants’
    [29] => Trump outlines immigration specifics
    [30] => Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
    [31] => Trump: Deport children of immigrants living illegally in US
    [32] => DNC blasts Donald Trump, Jeb Bush for comments about women
    [33] => Trump says would raise visa fees to pay for Mexican border wall
    [34] => Donald Trump Releases Plan to Combat Illegal Immigration
    [35] => Donald Trump releases his immigration policy on his GOP presidential campaign website
    [36] => Donald Trump’s legacy of luxury
    [37] => Donald Trump says he would deport all illegal immigrants as president
    [38] => Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006
    [39] => Donald Trump forced to take break from campaign trail for jury service
    [40] => Tables turned on Trump’s chief tormentor
    [41] => Donald Trump will serve jury duty in NYC next week
  )
  [9] => Array (
    [42] => Trump denounces violence after supporters beat Mexican man
    [43] => Trump says would raise visa fees to pay for Mexican border wall
  )
  [10] => Array (
    [44] => Donald Trump pushes birthright citizenship to forefront of political debate
    [45] => Rand Paul explains why he wants to stop ‘birthright citizenship’
    [46] => Trump: Deny citizenship to babies of people illegally in US
    [47] => Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
    [48] => Trump: end ‘birthright citizenship’
    [49] => Trump: Deport children of immigrants living illegally in US
    [50] => Donald Trump says he would deport all illegal immigrants as president
  )
  [11] => Array (
    [51] => Thanks, Donald, but I don’t want to be ‘cherished’ | Barbara Ellen
  )
  [12] => Array (
    [52] => Donald Trump tops GOP field in Florida, Pennsylvania, second in Ohio
  )
  [13] => Array (
    [53] => Bush slams Trump, defends using anchor babies
    [54] => GOP candidates hold dueling town halls
    [55] => Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush
    [56] => DNC blasts Donald Trump, Jeb Bush for comments about women
    [57] => Op-Ed Columnist: Introducing Donald Trump, Diplomat
  )
  [14] => Array (
    [58] => Rand Paul explains why he wants to stop ‘birthright citizenship’
  )
  [15] => Array (
    [59] => What does Donald Trump think of immigrants, Saudi Arabia and the Iran nuclear deal?
    [60] => Donald Trump warns that Iran deal will lead to Nuclear Holocaust
  )
  [16] => Array (
    [61] => New York City has no way to fire Donald Trump
  )
)
Now look at groups [6] and [7]:
[6] => Array (
  [6] => Donald Trump to Iowa boy: ‘I am Batman’
  [7] => Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair
  [8] => Donald Trump breaks the rules at the Iowa State Fair
  [9] => Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair
)
[7] => Array (
  [10] => Trump touts making Time cover while taking heat over attack
  [11] => Trump attacks Facebook over foreigners
  [12] => Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair
)
Entry [7] in group [6] and entry [12] in group [7] are the same headline, i.e. a duplicate!? I can't figure out why. Any suggestions for solving this? Thanks.
Edited August 23, 2015 by Pangu
CroNiX Posted August 23, 2015
http://php.net/manual/en/function.array-unique.php
Pangu Posted August 23, 2015
array_unique alone doesn't work, because in my example all elements from
[6] => Array ( Iowa )
and
[7] => Array ( attacks + Iowa )
should be merged into one group:
[6] => Donald Trump to Iowa boy: ‘I am Batman’
[8] => Donald Trump breaks the rules at the Iowa State Fair
[9] => Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair
+
[11] => Trump attacks Facebook over foreigners
[12] => Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair
The problem seems to be tricky. Any suggestions? Thanks.
Edited August 23, 2015 by Pangu
Pangu Posted August 23, 2015
To put it more simply and generally, see this example.

Keyword combinations found in the titles:
- A & B
- B
- C & B & G
- D
- E & F
- F
- G
- G & H

This should give:
- Topic1: every title containing one or more of the keywords A, B, C, G, H
- Topic2: every title containing the keyword D
- Topic3: every title containing the keywords E and/or F
Edited August 23, 2015 by Pangu
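The A to H example above is a connected-components problem: keywords that appear together in a title are linked, and a topic is every title whose keywords fall in the same linked set. One way to sketch this is with a union-find (disjoint-set) structure. This is an illustration only, assuming PHP 7 or later; `findRoot`, `groupTitles` and the title labels `t1` to `t8` are hypothetical names, not part of the scripts in this thread.

```php
<?php
// Hypothetical union-find helper: $parent maps each keyword to its parent keyword.
function findRoot(array &$parent, string $x): string
{
    if (!isset($parent[$x])) {
        $parent[$x] = $x; // first time seen: the keyword is its own root
    }
    while ($parent[$x] !== $x) {
        $parent[$x] = $parent[$parent[$x]]; // path halving keeps chains short
        $x = $parent[$x];
    }
    return $x;
}

// Hypothetical helper: group titles whose keywords are directly or
// transitively linked. Input maps a title label to its keyword list.
function groupTitles(array $titleKeywords): array
{
    $parent = [];
    // Step 1: link all keywords of one title to that title's first keyword.
    foreach ($titleKeywords as $keywords) {
        $keywords = array_values($keywords);
        foreach ($keywords as $kw) {
            $first = findRoot($parent, $keywords[0]);
            $other = findRoot($parent, $kw);
            if ($first !== $other) {
                $parent[$other] = $first;
            }
        }
    }
    // Step 2: bucket titles by the root of their first keyword.
    $topics = [];
    foreach ($titleKeywords as $title => $keywords) {
        if (!$keywords) {
            continue; // titles without keywords are skipped in this sketch
        }
        $topics[findRoot($parent, reset($keywords))][] = $title;
    }
    return array_values($topics);
}

$titles = [
    't1' => ['A', 'B'], 't2' => ['B'], 't3' => ['C', 'B', 'G'], 't4' => ['D'],
    't5' => ['E', 'F'], 't6' => ['F'], 't7' => ['G'], 't8' => ['G', 'H'],
];
$groups = groupTitles($titles);
// $groups is [['t1','t2','t3','t7','t8'], ['t4'], ['t5','t6']]
```

Unlike a fixed number of merge passes, this links keywords transitively in a single pass over the titles, so the result does not depend on the order of the input.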
Pangu Posted August 23, 2015
This example works, but with the big data set from the beginning it doesn't.. ?:/
Edited August 23, 2015 by Pangu
QuickOldCar Posted August 23, 2015
Time to use a database and do searches or fetch only the data you need.
Pangu Posted August 23, 2015
I have already fetched the data I need from the database, usually about 10-100 sentences (each one is a "$narray[]" entry). Now I want to sort them by script so that sentences with the same topic are grouped together, like above, but working.
Edited August 23, 2015 by Pangu
Barand Posted August 23, 2015
Plan C
<?php
$narray[]="Trump denounces violence after supporters beat Mexican man";
$narray[]="Doyle: What my dad could teach Donald Trump";
$narray[]="Bush slams Trump, defends using anchor babies";
$narray[]="Coming up Trumps: could a British TV star do a Donald and enter politics?";
$narray[]="Watch Rachel Maddow Explain Donald Trump’s ‘Genius’ Campaign on Tonight Show";
$narray[]="Trump touts making Time cover while taking heat over attack";
$narray[]="First Draft: Today in Politics: Rivals Can No Longer Ignore Donald Trump’s Long Shadow";
$narray[]="Donald Trump insists he’s conservative";
$narray[]="GOP candidates hold dueling town halls";
$narray[]="New York City has no way to fire Donald Trump";
$narray[]="Donald Trump pushes birthright citizenship to forefront of political debate";
$narray[]="Jeb Bush takes fight to Donald Trump in N.H.";
$narray[]="Rand Paul explains why he wants to stop ‘birthright citizenship’";
$narray[]="Trump attacks Facebook over foreigners";
$narray[]="Donald Trump tops GOP field in Florida, Pennsylvania, second in Ohio";
$narray[]="Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush";
$narray[]="While in Vegas, O’Malley makes an appearance in front of Trump’s hotel";
$narray[]="Trump’s immigration plan has GOP rivals on edge";
$narray[]="Donald Trump calls out Mark Zuckerberg on immigration";
$narray[]="Deny citizenship to babies illegal immigrants in US: Donald Trump";
$narray[]="Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty";
$narray[]="Trump: Deny citizenship to babies of people illegally in US";
$narray[]="Trump Says He Would Deport Illegal Immigrants";
$narray[]="From campaign to court: Trump reports for jury duty in NYC";
$narray[]="Donald Trump says he will ‘deport millions of illegal immigrants’";
$narray[]="Trump outlines immigration specifics";
$narray[]="Donald Trump to Iowa boy: ‘I am Batman’";
$narray[]="Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’";
$narray[]="Trump: end ‘birthright citizenship’";
$narray[]="Trump: Deport children of immigrants living illegally in US";
$narray[]="DNC blasts Donald Trump , Jeb Bush for comments about women";
$narray[]="Trump says would raise visa fees to pay for Mexican border wall";
$narray[]="What does Donald Trump think of immigrants, Saudi Arabia and the Iran nuclear deal?";
$narray[]="Donald Trump Releases Plan to Combat Illegal Immigration";
$narray[]="Donald Trump releases his immigration policy on his GOP presidential campaign website";
$narray[]="Donald Trump warns that Iran deal will lead to Nuclear Holocaust";
$narray[]="Trump details domestic, foreign policies, answers critics, matches fellow challengers";
$narray[]="Donald Trump’s legacy of luxury";
$narray[]="Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair";
$narray[]="Donald Trump says he would deport all illegal immigrants as president";
$narray[]="Donald Trump breaks the rules at the Iowa State Fair";
$narray[]="Thanks, Donald, but I don’t want to be ‘cherished’ | Barbara Ellen";
$narray[]="Front-runners skirt the soapbox";
$narray[]="Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair";
$narray[]="Op-Ed Columnist: Introducing Donald Trump, Diplomat";
$narray[]="Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006";
$narray[]="Donald Trump forced to take break from campaign trail for jury service";
$narray[]="Tables turned on Trump’s chief tormentor";
$narray[]="Donald Trump will serve jury duty in NYC next week";

$filtered = filter_my_array($narray); // keywords only array
$kwindex = index_keywords($filtered); // index of keywords
$keywords = array_keys($kwindex);
//
// find items with no keywords
//
$otheritems = [];
foreach ($filtered as $k=>$v) {
    if (count($v)==0) $otheritems[] = $k;
}
//
// combine indexes
//
uasort($filtered, function($a,$b) {return count($b) - count($a);});
$k = count($filtered);
for ($x=0; $x<2; $x++) {
    for ($i=0; $i<$k-1; $i++) {
        for ($j=$i+1; $j<$k; $j++) {
            $a = $filtered[$i];
            $b = $filtered[$j];
            if (array_intersect($a, $b)) {
                $filtered[$i] = array_unique(array_merge($a,$b));
                $filtered[$j]=[];
            }
        }
    }
}
foreach ($filtered as $k => $kwarr) {
    if (count($kwarr) == 0) {
        continue;
    }
    elseif (count($kwarr) > 1) {
        sort($kwarr);
        $newkw = join(' - ', $kwarr);
        $occurs = [];
        foreach ($kwarr as $kw) {
            if (isset($kwindex[$kw])) {
                $occurs = array_merge($occurs, $kwindex[$kw]); // combine individual lists
                unset($kwindex[$kw]); // then remove them
            }
        }
        sort($occurs);
        $kwindex[$newkw] = array_unique($occurs); // add the combined index
    }
}
//
// create highlighting replacement texts
//
$replace = [];
foreach ($keywords as $kw) {
    $replace[] = "<span class='hi'>$kw</span>";
}
//
// create output of the indexed list
//
ksort($kwindex);
$output = '';
foreach ($kwindex as $kw => $items) {
    if (count($items)==0) continue;
    $output .= "<h4>$kw</h4><ul>";
    foreach ($items as $i) {
        $output .= "<li>" . str_ireplace($keywords, $replace, $narray[$i]) . "</li>\n";
    }
    $output .= "</ul>\n";
}
if (count($otheritems) > 0) {
    $output .= "<h4>Non-keyword items</h4><ul>";
    foreach ($otheritems as $i) {
        $output .= "<li>{$narray[$i]}</li>\n";
    }
    $output .= "</ul>\n";
}

/*******************************************************************************
* helper functions
********************************************************************************/
function filter_my_array($array)
{
    // reduces the lines of text to arrays of the keywords in the line
    $results = [];
    foreach ($array as $k => $str) {
        $str = no_punc($str);
        $a = array_filter(explode(' ', $str), 'remove_noise');
        $results[$k] = $a;
    }
    return $results;
}

function remove_noise($x)
{
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from',
        'how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where',
        'who','will','with','und','the','www','donald','trump');
    return strlen($x) > 4 && !in_array(strtolower($x), $stopWords);
}

function index_keywords($array)
{
    // gets the line numbers containing each keyword
    $results = [];
    foreach ($array as $k => $kwarr) {
        foreach ($kwarr as $kw) {
            $results[$kw][] = $k;
        }
    }
    return $results;
}

function no_punc($str)
{
    // lowercases the text and replaces everything except a-z, 0-9 and spaces with a space
    $allow = array_merge([32], range(ord('a'), ord('z')), range(ord('0'), ord('9')));
    $k = strlen($str);
    $res = '';
    $str = strtolower($str);
    for ($i=0; $i<$k; $i++) {
        if (in_array(ord($str[$i]), $allow) ) {
            $res .= $str[$i];
        }
        else $res .= ' ';
    }
    return $res;
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Keyword Index</title>
<style type='text/css'>
.hi { font-weight: 700; color: red; }
</style>
</head>
<body>
<?=$output?>
</body>
</html>

Output
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Keyword Index</title>
<style type="text/css">
.hi { font-weight: 700; color: red; }
</style>
</head>
<body>
<h4>after - anchor - arabia - attacks - babies - birthright - blunt - border - break
- breaks - british - calls - campaign - celebrities - children - citizenship - clinton - combat - coming - could - court - debate - defends - denounces - deport - descend - doyle - draft - enter - explain - explains - facebook - fight - first - forced - forefront - foreigners - genius - hillary - holocaust - ignore - illegal - illegally - immigrants - immigration - living - longer - maddow - mexican - millions - nuclear - outlines - people - perform - policy - political - politics - president - presidential - profile - pushes - rachel - raise - releases - reports - rivals - rules - saturday - saudi - service - shadow - since - skipped - slams - specifics - state - summonses - supporters - takes - teach - think - today - tonight - trail - trumpcopter - trumps - using - vague - violence - wants - warns - watch - website - would - zuckerberg</h4><ul><li>Trump <span class="hi">denounces</span> <span class="hi">violence</span> <span class="hi">after</span> <span class="hi">supporters</span> beat <span class="hi">mexican</span> man</li> <li><span class="hi">doyle</span>: What my dad <span class="hi">could</span> <span class="hi">teach</span> Donald Trump</li> <li>Bush <span class="hi">slams</span> Trump, <span class="hi">defends</span> <span class="hi">using</span> <span class="hi">anchor</span> <span class="hi">babies</span></li> <li><span class="hi">coming</span> up <span class="hi">trumps</span>: <span class="hi">could</span> a <span class="hi">british</span> TV star do a Donald and <span class="hi">enter</span> <span class="hi">politics</span>?</li> <li><span class="hi">watch</span> <span class="hi">rachel</span> <span class="hi">maddow</span> <span class="hi">explain</span> Donald Trump’s ‘<span class="hi">genius</span>’ <span class="hi">campaign</span> on <span class="hi">tonight</span> Show</li> <li><span class="hi">first</span> <span class="hi">draft</span>: <span class="hi">today</span> in <span class="hi">politics</span>: <span class="hi">rivals</span> Can No 
<span class="hi">longer</span> <span class="hi">ignore</span> Donald Trump’s Long <span class="hi">shadow</span></li> <li>Donald Trump <span class="hi">pushes</span> <span class="hi">birthright</span> <span class="hi">citizenship</span> to <span class="hi">fore<span class="hi">front</span></span> of <span class="hi">political</span> <span class="hi">debate</span></li> <li>Jeb Bush <span class="hi">takes</span> <span class="hi">fight</span> to Donald Trump in N.H.</li> <li>Rand Paul <span class="hi">explain</span>s why he <span class="hi">wants</span> to stop ‘<span class="hi">birthright</span> <span class="hi">citizenship</span>’</li> <li>Trump <span class="hi">attack</span>s <span class="hi">facebook</span> over <span class="hi"><span class="hi">foreign</span>ers</span></li> <li>Trump’s <span class="hi">immigration</span> plan has GOP <span class="hi">rivals</span> on edge</li> <li>Donald Trump <span class="hi">calls</span> out Mark <span class="hi">zuckerberg</span> on <span class="hi">immigration</span></li> <li>Deny <span class="hi">citizenship</span> to <span class="hi">babies</span> <span class="hi">illegal</span> <span class="hi">immigrants</span> in US: Donald Trump</li> <li>Donald Trump <span class="hi">takes</span> a <span class="hi">break</span> from the <span class="hi">campaign</span> <span class="hi">trail</span> to join a long list of <span class="hi">celebrities</span> to <span class="hi">perform</span> jury duty</li> <li>Trump: Deny <span class="hi">citizenship</span> to <span class="hi">babies</span> of <span class="hi">people</span> <span class="hi">illegal</span>ly in US</li> <li>Trump Says He <span class="hi">would</span> <span class="hi">deport</span> <span class="hi">illegal</span> <span class="hi">immigrants</span></li> <li>From <span class="hi">campaign</span> to <span class="hi">court</span>: Trump <span class="hi">reports</span> for jury duty in NYC</li> <li>Donald Trump says he will ‘<span class="hi">deport</span> <span 
class="hi">millions</span> of <span class="hi">illegal</span> <span class="hi">immigrants</span>’</li> <li>Trump <span class="hi">outlines</span> <span class="hi">immigration</span> <span class="hi">specifics</span></li> <li>Trump <span class="hi">blunt</span> but <span class="hi">vague</span>: No <span class="hi">birthright</span> <span class="hi">citizenship</span>, <span class="hi">millions</span> of <span class="hi">illegal</span> <span class="hi">immigrants</span> ‘have to go’</li> <li>Trump: end ‘<span class="hi">birthright</span> <span class="hi">citizenship</span>’</li> <li>Trump: <span class="hi">deport</span> <span class="hi">children</span> of <span class="hi">immigrants</span> <span class="hi">living</span> <span class="hi">illegal</span>ly in US</li> <li>Trump says <span class="hi">would</span> <span class="hi">raise</span> visa fees to pay for <span class="hi">mexican</span> <span class="hi">border</span> wall</li> <li>What does Donald Trump <span class="hi">think</span> of <span class="hi">immigrants</span>, <span class="hi">saudi</span> <span class="hi">arabia</span> and the Iran <span class="hi">nuclear</span> deal?</li> <li>Donald Trump <span class="hi">releases</span> Plan to <span class="hi">combat</span> <span class="hi">illegal</span> <span class="hi">immigration</span></li> <li>Donald Trump <span class="hi">releases</span> his <span class="hi">immigration</span> <span class="hi">policy</span> on his GOP <span class="hi"><span class="hi">president</span>ial</span> <span class="hi">campaign</span> <span class="hi">website</span></li> <li>Donald Trump <span class="hi">warns</span> that Iran deal will lead to <span class="hi">nuclear</span> <span class="hi">holocaust</span></li> <li><span class="hi">clinton</span> <span class="hi">defends</span>, Trump <span class="hi">attack</span>s <span class="hi">saturday</span> at the high-<span class="hi">profile</span> Iowa <span class="hi">state</span> Fair</li> <li>Donald Trump says he <span 
class="hi">would</span> <span class="hi">deport</span> all <span class="hi">illegal</span> <span class="hi">immigrants</span> as <span class="hi">president</span></li> <li>Donald Trump <span class="hi">break</span>s the <span class="hi">rules</span> at the Iowa <span class="hi">state</span> Fair</li> <li><span class="hi">hillary</span> <span class="hi">clinton</span>, Donald Trump and the <span class="hi">trumpcopter</span> <span class="hi">descend</span> on the Iowa <span class="hi">state</span> Fair</li> <li>Trump <span class="hi">forced</span> to <span class="hi">break</span> from <span class="hi">campaign</span> <span class="hi">trail</span> for jury duty, <span class="hi">skipped</span> five <span class="hi">summonses</span> <span class="hi">since</span> 2006</li> <li>Donald Trump <span class="hi">forced</span> to take <span class="hi">break</span> from <span class="hi">campaign</span> <span class="hi">trail</span> for jury <span class="hi">service</span></li> </ul> <h4>answers - challengers - critics - details - domestic - fellow - foreign - matches - policies</h4><ul><li>Trump <span class="hi">details</span> <span class="hi">domestic</span>, <span class="hi">foreign</span> <span class="hi">policies</span>, <span class="hi">answers</span> <span class="hi">critics</span>, <span class="hi">matches</span> <span class="hi">fellow</span> <span class="hi">challengers</span></li> </ul> <h4>appearance - attack - cover - front - hotel - makes - making - malley - runners - skirt - soapbox - taking - touts - vegas - while</h4><ul><li>Trump <span class="hi">touts</span> <span class="hi">making</span> Time <span class="hi">cover</span> <span class="hi">while</span> <span class="hi">taking</span> heat over <span class="hi">attack</span></li> <li><span class="hi">while</span> in <span class="hi">vegas</span>, O’<span class="hi">malley</span> <span class="hi">makes</span> an <span class="hi">appearance</span> in <span class="hi">front</span> of Trump’s <span 
class="hi">hotel</span></li> <li><span class="hi">front</span>-<span class="hi">runners</span> <span class="hi">skirt</span> the <span class="hi">soapbox</span></li> </ul> <h4>barbara - cherished - ellen - thanks</h4><ul><li><span class="hi">thanks</span>, Donald, but I don’t want to be ‘<span class="hi">cherished</span>’ | <span class="hi">barbara</span> <span class="hi">ellen</span></li> </ul> <h4>batman</h4><ul><li>Donald Trump to Iowa boy: ‘I am <span class="hi">batman</span>’</li> </ul> <h4>blasts - comments - women</h4><ul><li>DNC <span class="hi">blasts</span> Donald Trump , Jeb Bush for <span class="hi">comments</span> about <span class="hi">women</span></li> </ul> <h4>candidates - dueling - halls</h4><ul><li>GOP <span class="hi">candidates</span> hold <span class="hi">dueling</span> town <span class="hi">halls</span></li> </ul> <h4>chief - tables - tormentor - turned</h4><ul><li><span class="hi">tables</span> <span class="hi">turned</span> on Trump’s <span class="hi">chief</span> <span class="hi">tormentor</span></li> </ul> <h4>columnist - diplomat - introducing</h4><ul><li>Op-Ed <span class="hi">columnist</span>: <span class="hi">introducing</span> Donald Trump, <span class="hi">diplomat</span></li> </ul> <h4>conservative - insists</h4><ul><li>Donald Trump <span class="hi">insists</span> he’s <span class="hi">conservative</span></li> </ul> <h4>crowd - draws - hampshire</h4><ul><li>Donald Trump <span class="hi">draws</span> New <span class="hi">hampshire</span> town hall <span class="hi">crowd</span> wild; jabs Jeb Bush</li> </ul> <h4>field - florida - pennsylvania - second</h4><ul><li>Donald Trump tops GOP <span class="hi">field</span> in <span class="hi">florida</span>, <span class="hi">pennsylvania</span>, <span class="hi">second</span> in Ohio</li> </ul> <h4>legacy - luxury</h4><ul><li>Donald Trump’s <span class="hi">legacy</span> of <span class="hi">luxury</span></li> </ul> <h4>serve</h4><ul><li>Donald Trump will <span class="hi">serve</span> jury 
duty in NYC next week</li> </ul> <h4>Non-keyword items</h4><ul><li>New York City has no way to fire Donald Trump</li> </ul> </body></html>

Edited August 23, 2015 by Barand
Pangu Posted August 24, 2015 (edited)

Thanks again! This looks quite good, but unfortunately it is not working 100% correctly. For example, the headline

Donald Trump to Iowa boy: ‘I am batman’

Why is it in its own topic? It should be merged with the other headlines containing "Iowa".

Edited August 24, 2015 by Pangu
Barand Posted August 24, 2015

I increased the "noise" threshold to ignore words of 4 or fewer characters:

function remove_noise($x)
{
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from',
        'how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where',
        'who','will','with','und','the','www','donald','trump');
    return strlen($x) > 4 && !in_array(strtolower($x), $stopWords);
}

If you change the 4 to 3, it will pick up "Iowa".
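The length cutoff in remove_noise() decides whether a short word such as "Iowa" counts as a keyword at all. Below is a minimal sketch of the same filter with the cutoff passed in as a parameter, so both settings can be compared side by side. The function name is_keyword() and the $minLength parameter are illustrative assumptions, not part of the posted script, and the stop-word list is shortened.

```php
<?php
// Hypothetical variant of remove_noise() with a configurable length cutoff.
function is_keyword(string $word, int $minLength = 4): bool
{
    // Shortened stop-word list, for illustration only.
    $stopWords = ['about', 'from', 'that', 'this', 'what', 'when', 'where',
                  'will', 'with', 'donald', 'trump'];
    $word = strtolower($word);
    return strlen($word) >= $minLength && !in_array($word, $stopWords, true);
}

$words = explode(' ', 'donald trump to iowa boy i am batman');

// A cutoff of 5 matches "strlen($x) > 4": the 4-letter word "iowa" is dropped.
$strict = array_values(array_filter($words, function ($w) {
    return is_keyword($w, 5);
}));

// A cutoff of 4 matches "strlen($x) > 3": "iowa" is kept.
$loose = array_values(array_filter($words, function ($w) {
    return is_keyword($w, 4);
}));

print_r($strict); // contains only "batman"
print_r($loose);  // contains "iowa" and "batman"
```

With the lower cutoff, the "Iowa boy" headline shares the keyword "iowa" with the three "Iowa State Fair" headlines and can be merged into their group.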
Pangu Posted August 24, 2015 (edited)

OK, thanks! I think it needs just one more step to get good results: now that your script has found the relevant keywords (red), it should sort the headlines by the number of times each keyword appears. In this example:

Trump denounces violence after supporters beat mexican man
doyle: What my dad could teach Donald Trump
bush slams Trump, defends using anchor babies
coming up trumps: could a british TV star do a Donald and enter politics?
watch rachel maddow explain Donald Trump’s ‘genius’ campaign on tonight show
Trump touts making time cover while taking heat over attack
first draft: today in politics: rivals Can No longer ignore Donald Trump’s long shadow
GOP candidates hold dueling town halls
Donald Trump pushes birthright citizenship to forefront of political debate
Jeb bush takes fight to Donald Trump in N.H.
rand paul explains why he wants to stop ‘birthright citizenship’
Trump attacks facebook over foreigners
Donald Trump draws New hampshire town hall crowd wild; jabs Jeb bush
while in vegas, O’Malley makes an appearance in front of Trump’s hotel
Trump’s immigration plan has GOP rivals on edge
deny citizenship to babies illegal immigrants in US: Donald Trump
Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty
Trump: deny citizenship to babies of people illegally in US
Trump says He would deport illegal immigrants
From campaign to court: Trump reports for jury duty in NYC
Donald Trump says he will ‘deport millions of illegal immigrants’
Donald Trump to iowa boy: ‘I am batman’
Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
Trump: end ‘birthright citizenship’
Trump: deport children of immigrants living illegally in US
DNC blasts Donald Trump , Jeb bush for comments about women
Trump says would raise visa fees to pay for mexican border wall
What does Donald Trump think of immigrants, saudi arabia and the iran nuclear deal?
Donald Trump releases plan to combat illegal immigration
Donald Trump releases his immigration policy on his GOP presidential campaign website
Donald Trump warns that iran deal will lead to nuclear holocaust
clinton defends, Trump attacks saturday at the high-profile iowa state fair
Donald Trump says he would deport all illegal immigrants as president
Donald Trump breaks the rules at the iowa state fair
hillary clinton, Donald Trump and the trumpcopter descend on the iowa state fair
Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006
Donald Trump forced to take break from campaign trail for jury service
Donald Trump will serve jury duty in NYC next week

should be sorted to:

immigra-nts (10x):
Trump’s immigration plan has GOP rivals on edge
Donald Trump releases plan to combat illegal immigration
Donald Trump releases his immigration policy on his GOP presidential campaign website
Donald Trump says he will ‘deport millions of illegal immigrants’
Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
deny citizenship to babies illegal immigrants in US: Donald Trump
Trump: deport children of immigrants living illegally in US
Donald Trump says he would deport all illegal immigrants as president
Trump says He would deport illegal immigrants
What does Donald Trump think of immigrants, saudi arabia and the iran nuclear deal?

jury (5x):
From campaign to court: Trump reports for jury duty in NYC
Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006
Donald Trump forced to take break from campaign trail for jury service
Donald Trump will serve jury duty in NYC next week
Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty

citizenship (4x):
Trump: deny citizenship to babies of people illegally in US
Trump: end ‘birthright citizenship’
Donald Trump pushes birthright citizenship to forefront of political debate
rand paul explains why he wants to stop ‘birthright citizenship’

iowa (4x):
clinton defends, Trump attacks saturday at the high-profile iowa state fair
Donald Trump breaks the rules at the iowa state fair
hillary clinton, Donald Trump and the trumpcopter descend on the iowa state fair
Donald Trump to iowa boy: ‘I am batman’

bush (3x):
Jeb bush takes fight to Donald Trump in N.H.
bush slams Trump, defends using anchor babies
DNC blasts Donald Trump , Jeb bush for comments about women

town (2x):
GOP candidates hold dueling town halls
Donald Trump draws New hampshire town hall crowd wild; jabs Jeb bush

other:
Trump denounces violence after supporters beat mexican man
doyle: What my dad could teach Donald Trump
coming up trumps: could a british TV star do a Donald and enter politics?
watch rachel maddow explain Donald Trump’s ‘genius’ campaign on tonight show
first draft: today in politics: rivals Can No longer ignore Donald Trump’s long shadow
Trump attacks facebook over foreigners
Trump touts making time cover while taking heat over attack
while in vegas, O’Malley makes an appearance in front of Trump’s hotel
Donald Trump warns that iran deal will lead to nuclear holocaust
Trump says would raise visa fees to pay for mexican border wall

Edited August 24, 2015 by Pangu
Barand Posted August 24, 2015

(Final) Plan D

<?php
// Note: db_inc.php and the $mysqli connection are not used anywhere below.
include('db_inc.php');
error_reporting(-1);
$mysqli = new mysqli(HOST,USERNAME,PASSWORD,'test');
?>
<?php
$narray[]="Trump denounces violence after supporters beat Mexican man";
$narray[]="Doyle: What my dad could teach Donald Trump";
$narray[]="Bush slams Trump, defends using anchor babies";
$narray[]="Coming up Trumps: could a British TV star do a Donald and enter politics?";
$narray[]="Watch Rachel Maddow Explain Donald Trump’s ‘Genius’ Campaign on Tonight Show";
$narray[]="Trump touts making Time cover while taking heat over attack";
$narray[]="First Draft: Today in Politics: Rivals Can No Longer Ignore Donald Trump’s Long Shadow";
$narray[]="Donald Trump insists he’s conservative";
$narray[]="GOP candidates hold dueling town halls";
$narray[]="New York City has no way to fire Donald Trump";
$narray[]="Donald Trump pushes birthright citizenship to forefront of political debate";
$narray[]="Jeb Bush takes fight to Donald Trump in N.H.";
$narray[]="Rand Paul explains why he wants to stop ‘birthright citizenship’";
$narray[]="Trump attacks Facebook over foreigners";
$narray[]="Donald Trump tops GOP field in Florida, Pennsylvania, second in Ohio";
$narray[]="Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush";
$narray[]="While in Vegas, O’Malley makes an appearance in front of Trump’s hotel";
$narray[]="Trump’s immigration plan has GOP rivals on edge";
$narray[]="Donald Trump calls out Mark Zuckerberg on immigration";
$narray[]="Deny citizenship to babies illegal immigrants in US: Donald Trump";
$narray[]="Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty";
$narray[]="Trump: Deny citizenship to babies of people illegally in US";
$narray[]="Trump Says He Would Deport Illegal Immigrants";
$narray[]="From campaign to court: Trump reports for jury duty in NYC";
$narray[]="Donald Trump says he will ‘deport millions of illegal immigrants’";
$narray[]="Trump outlines immigration specifics";
$narray[]="Donald Trump to Iowa boy: ‘I am Batman’";
$narray[]="Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’";
$narray[]="Trump: end ‘birthright citizenship’";
$narray[]="Trump: Deport children of immigrants living illegally in US";
$narray[]="DNC blasts Donald Trump , Jeb Bush for comments about women";
$narray[]="Trump says would raise visa fees to pay for Mexican border wall";
$narray[]="What does Donald Trump think of immigrants, Saudi Arabia and the Iran nuclear deal?";
$narray[]="Donald Trump Releases Plan to Combat Illegal Immigration";
$narray[]="Donald Trump releases his immigration policy on his GOP presidential campaign website";
$narray[]="Donald Trump warns that Iran deal will lead to Nuclear Holocaust";
$narray[]="Trump details domestic, foreign policies, answers critics, matches fellow challengers";
$narray[]="Donald Trump’s legacy of luxury";
$narray[]="Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair";
$narray[]="Donald Trump says he would deport all illegal immigrants as president";
$narray[]="Donald Trump breaks the rules at the Iowa State Fair";
$narray[]="Thanks, Donald, but I don’t want to be ‘cherished’ | Barbara Ellen";
$narray[]="Front-runners skirt the soapbox";
$narray[]="Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair";
$narray[]="Op-Ed Columnist: Introducing Donald Trump, Diplomat";
$narray[]="Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006";
$narray[]="Donald Trump forced to take break from campaign trail for jury service";
$narray[]="Tables turned on Trump’s chief tormentor";
$narray[]="Donald Trump will serve jury duty in NYC next week";

$filtered = filter_my_array($narray);               // keywords only array
$keywords = [];
$kwindex = index_keywords($filtered, $keywords);    // index of keywords
uksort($keywords, function($a,$b){return strlen($b) - strlen($a);});

//
// find items with no keywords
//
$otheritems = [];
foreach ($filtered as $k=>$v) {
    if (count($v)==0) $otheritems[] = $k;
}

//
// combine indexes
//
uasort($filtered, function($a,$b) {return count($b) - count($a);});
$k = count($filtered);
for ($x=0; $x<2; $x++) {
    for ($i=0; $i<$k-1; $i++) {
        for ($j=$i+1; $j<$k; $j++) {
            $a = $filtered[$i];
            $b = $filtered[$j];
            if (array_intersect($a, $b)) {
                $filtered[$i] = array_unique(array_merge($a,$b));
                $filtered[$j]=[];
            }
        }
    }
}

foreach ($filtered as $k => $kwarr) {
    if (count($kwarr) == 0) {
        continue;
    }
    elseif (count($kwarr) > 1) {
        sort($kwarr);
        $kwarrcounted = append_counts($kwarr, $keywords);
        $newkw = join(' - ', $kwarrcounted);
        $occurs = [];
        foreach ($kwarr as $kw) {
            if (isset($kwindex[$kw])) {
                $occurs = array_merge($occurs, $kwindex[$kw]);  // combine individual lists
                unset($kwindex[$kw]);                           // then remove them
            }
        }
        sort($occurs);
        $kwindex[$newkw] = array_unique($occurs);               // add the combined index
    }
}

//
// create highlighting replacement texts
//
$replace = $srch = [];
foreach ($keywords as $kw=>$count) {
    $srch[] = $kw;
    $replace[] = "<span class='hi'>$kw</span>";
}

//
// create output of the indexed list
//
ksort($kwindex);
$output = '';
foreach ($kwindex as $kw => $items) {
    if (count($items)==0) continue;
    $output .= "<h4>$kw</h4><ul>";
    foreach ($items as $i) {
        $output .= "<li>" . str_ireplace($srch, $replace, $narray[$i]) . "</li>\n";
    }
    $output .= "</ul>\n";
}
if (count($otheritems) > 0) {
    $output .= "<h4>Non-keyword items</h4><ul>";
    foreach ($otheritems as $i) {
        $output .= "<li>{$narray[$i]}</li>\n";
    }
    $output .= "</ul>\n";
}

/*******************************************************************************
* helper functions
********************************************************************************/
function filter_my_array($array)
{
    // reduces the lines of text to arrays of the keywords in the line
    $results = [];
    foreach ($array as $k => $str) {
        $str = no_punc($str);
        $a = array_filter(explode(' ', $str), 'remove_noise');
        $results[$k] = $a;
    }
    return $results;
}

function remove_noise($x)
{
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from',
        'how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where',
        'who','will','with','und','the','www','donald','trump');
    return strlen($x) > 3 && !in_array(strtolower($x), $stopWords);
}

function index_keywords($array, &$kwords)
{
    // gets the line numbers containing each keyword
    $results = [];
    foreach ($array as $k => $kwarr) {
        foreach ($kwarr as $kw) {
            $results[$kw][] = $k;
            if (isset($kwords[$kw])) {
                ++$kwords[$kw];     // count keyword usage
            }
            else {
                $kwords[$kw]=1;
            }
        }
    }
    return $results;
}

function no_punc($str)
{
    // keeps spaces, apostrophes, a-z and 0-9; everything else becomes a space
    $allow = array_merge([32,39], range(ord('a'), ord('z')), range(ord('0'), ord('9')));
    $k = strlen($str);
    $res = '';
    $str = strtolower($str);
    for ($i=0; $i<$k; $i++) {
        if (in_array(ord($str[$i]), $allow) ) {
            $res .= $str[$i];
        }
        else $res .= ' ';
    }
    return $res;
}

function append_counts($karr, $keywords)
{
    $res = [];
    foreach ($karr as $k=>$word) {
        $n = $keywords[$word];
        $res[$k] = "$word<span class='count'>({$n}x)</span>";
    }
    return $res;
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Keyword Index</title>
<style type='text/css'>
.hi {
    font-weight: 700;
    color: red;
}
.count {
    font-weight: 100;
    color: #f88;
}
</style>
</head>
<body>
<?=$output?>
</body>
</html>
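The "combine indexes" step in Plan D merges any two keyword sets that share a word and repeats the pairwise pass a fixed number of times. As an alternative way to look at the same idea, here is a hedged sketch that uses a union-find (disjoint-set) structure instead, so that chains such as "obama" linked to "clinton" linked to some third word end up in one group however long the chain is. This is not the code from the post; the function names find_root() and group_by_shared_keywords() are made up for illustration, and $sets stands in for the $filtered array (headline index => list of keywords).

```php
<?php
// Follow parent links to the representative keyword, compressing the path.
function find_root(array &$parent, string $x): string
{
    while ($parent[$x] !== $x) {
        $parent[$x] = $parent[$parent[$x]];
        $x = $parent[$x];
    }
    return $x;
}

// Returns a list of groups; each group is a list of headline indexes
// whose keyword sets are connected through shared keywords.
function group_by_shared_keywords(array $sets): array
{
    $parent = [];
    foreach ($sets as $kws) {
        $kws = array_values($kws);
        foreach ($kws as $kw) {
            if (!isset($parent[$kw])) {
                $parent[$kw] = (string) $kw;   // each keyword starts as its own group
            }
        }
        // All keywords of one headline belong to the same group.
        for ($i = 1; $i < count($kws); $i++) {
            $a = find_root($parent, $kws[0]);
            $b = find_root($parent, $kws[$i]);
            if ($a !== $b) {
                $parent[$b] = $a;
            }
        }
    }

    $groups = [];
    foreach ($sets as $idx => $kws) {
        $kws = array_values($kws);
        if (count($kws) === 0) {
            continue;                          // no keywords: leave for the "other" list
        }
        $groups[find_root($parent, $kws[0])][] = $idx;
    }
    return array_values($groups);
}

// Toy data modelled on the original question: one text mentions both names.
$sets = [
    0 => ['obama'],
    1 => ['clinton'],
    2 => ['obama', 'clinton'],
    3 => ['twitter'],
    4 => [],
];
$groups = group_by_shared_keywords($sets);
print_r($groups); // headlines 0, 1 and 2 form one group, headline 3 another
```

The trade-off is the same as in Plan D: one very common keyword can pull many unrelated headlines into a single large group, which is why the stop-word list and the length cutoff matter so much.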
Pangu Posted August 25, 2015

Thanks, but for the final results, how can I sort this:

2006(1x) - after(1x) - anchor(1x) - appearance(1x) - arabia(1x) - attack(1x) - attacks(2x) - babies(3x) - batman(1x) - beat(1x) - birthright(4x) - blasts(1x) - blunt(1x) - border(1x) - break(3x) - breaks(1x) - british(1x) - bush(4x) - calls(1x) - campaign(6x) - candidates(1x) - celebrities(1x) - children(1x) - citizenship(6x) - clinton(2x) - combat(1x) - coming(1x) - comments(1x) - could(2x) - court(1x) - cover(1x) - crowd(1x) - deal(2x) - debate(1x) - defends(2x) - denounces(1x) - deny(2x) - deport(4x) - descend(1x) - does(1x) - doyle(1x) - draft(1x) - draws(1x) - dueling(1x) - duty(4x) - edge(1x) - enter(1x) - explain(1x) - explains(1x) - facebook(1x) - fair(3x) - fees(1x) - fight(1x) - first(1x) - five(1x) - forced(2x) - forefront(1x) - foreigners(1x) - front(2x) - genius(1x) - hall(1x) - halls(1x) - hampshire(1x) - have(1x) - heat(1x) - high(1x) - hillary(1x) - hold(1x) - holocaust(1x) - hotel(1x) - ignore(1x) - illegal(6x) - illegally(2x) - immigrants(7x) - immigration(5x) - iowa(4x) - iran(2x) - jabs(1x) - join(1x) - jury(5x) - lead(1x) - list(1x) - living(1x) - long(2x) - longer(1x) - maddow(1x) - makes(1x) - making(1x) - malley(1x) - mark(1x) - mexican(2x) - millions(2x) - next(1x) - nuclear(2x) - outlines(1x) - over(2x) - paul(1x) - people(1x) - perform(1x) - plan(2x) - policy(1x) - political(1x) - politics(2x) - president(1x) - presidential(1x) - profile(1x) - pushes(1x) - rachel(1x) - raise(1x) - rand(1x) - releases(2x) - reports(1x) - rivals(2x) - rules(1x) - runners(1x) - saturday(1x) - saudi(1x) - says(4x) - serve(1x) - service(1x) - shadow(1x) - show(1x) - since(1x) - skipped(1x) - skirt(1x) - slams(1x) - soapbox(1x) - specifics(1x) - star(1x) - state(3x) - stop(1x) - summonses(1x) - supporters(1x) - take(1x) - takes(2x) - taking(1x) - teach(1x) - think(1x) - time(1x) - today(1x) - tonight(1x) - touts(1x) - town(2x) - trail(3x) - trumpcopter(1x) - trumps(1x) - using(1x) - vague(1x) - vegas(1x) - violence(1x) - visa(1x) - wall(1x) - wants(1x) - warns(1x) - watch(1x) - website(1x) - week(1x) - while(2x) - wild(1x) - women(1x) - would(3x) - zuckerberg(1x)

Trump denounces violence after supporters beat mexican man
doyle: What my dad could teach Donald Trump
bush slams Trump, defends using anchor babies
coming up trumps: could a british TV star do a Donald and enter politics?
watch rachel maddow explain Donald Trump’s ‘genius’ campaign on tonight show
Trump touts making time cover while taking heat over attack
first draft: today in politics: rivals Can No longer ignore Donald Trump’s long shadow
GOP candidates hold dueling town halls
Donald Trump pushes birthright citizenship to forefront of political debate
Jeb bush takes fight to Donald Trump in N.H.
rand paul explains why he wants to stop ‘birthright citizenship’
Trump attacks facebook over foreigners
Donald Trump draws New hampshire town hall crowd wild; jabs Jeb bush
while in vegas, O’malley makes an appearance in front of Trump’s hotel
Trump’s immigration plan has GOP rivals on edge
Donald Trump calls out mark zuckerberg on immigration
deny citizenship to babies illegal immigrants in US: Donald Trump
Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty
Trump: deny citizenship to babies of people illegally in US
Trump says He would deport illegal immigrants
From campaign to court: Trump reports for jury duty in NYC
Donald Trump says he will ‘deport millions of illegal immigrants’
Trump outlines immigration specifics
Donald Trump to iowa boy: ‘I am batman’
Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
Trump: end ‘birthright citizenship’
Trump: deport children of immigrants living illegally in US
DNC blasts Donald Trump , Jeb bush for comments about women
Trump says would raise visa fees to pay for mexican border wall
What does Donald Trump think of immigrants, saudi arabia and the iran nuclear deal?
Donald Trump releases plan to combat illegal immigration
Donald Trump releases his immigration policy on his GOP presidential campaign website
Donald Trump warns that iran deal will lead to nuclear holocaust
clinton defends, Trump attacks saturday at the high-profile iowa state fair
Donald Trump says he would deport all illegal immigrants as president
Donald Trump breaks the rules at the iowa state fair
front-runners skirt the soapbox
hillary clinton, Donald Trump and the trumpcopter descend on the iowa state fair
Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006
Donald Trump forced to take break from campaign trail for jury service
Donald Trump will serve jury duty in NYC next week

into this:

immigra-nts (10x):
Trump’s immigration plan has GOP rivals on edge
Donald Trump releases plan to combat illegal immigration
Donald Trump releases his immigration policy on his GOP presidential campaign website
Donald Trump says he will ‘deport millions of illegal immigrants’
Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’
deny citizenship to babies illegal immigrants in US: Donald Trump
Trump: deport children of immigrants living illegally in US
Donald Trump says he would deport all illegal immigrants as president
Trump says He would deport illegal immigrants
What does Donald Trump think of immigrants, saudi arabia and the iran nuclear deal?

jury (5x):
From campaign to court: Trump reports for jury duty in NYC
Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006
Donald Trump forced to take break from campaign trail for jury service
Donald Trump will serve jury duty in NYC next week
Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty

citizenship (4x):
Trump: deny citizenship to babies of people illegally in US
Trump: end ‘birthright citizenship’
Donald Trump pushes birthright citizenship to forefront of political debate
rand paul explains why he wants to stop ‘birthright citizenship’

iowa (4x):
clinton defends, Trump attacks saturday at the high-profile iowa state fair
Donald Trump breaks the rules at the iowa state fair
hillary clinton, Donald Trump and the trumpcopter descend on the iowa state fair
Donald Trump to iowa boy: ‘I am batman’

bush (3x):
Jeb bush takes fight to Donald Trump in N.H.
bush slams Trump, defends using anchor babies
DNC blasts Donald Trump , Jeb bush for comments about women

town (2x):
GOP candidates hold dueling town halls
Donald Trump draws New hampshire town hall crowd wild; jabs Jeb bush

other:
Trump denounces violence after supporters beat mexican man
doyle: What my dad could teach Donald Trump
coming up trumps: could a british TV star do a Donald and enter politics?
watch rachel maddow explain Donald Trump’s ‘genius’ campaign on tonight show
first draft: today in politics: rivals Can No longer ignore Donald Trump’s long shadow
Trump attacks facebook over foreigners
Trump touts making time cover while taking heat over attack
while in vegas, O’Malley makes an appearance in front of Trump’s hotel
Donald Trump warns that iran deal will lead to nuclear holocaust
Trump says would raise visa fees to pay for mexican border wall

In short: "sort by appearance of keywords the script found".
Barand Posted August 25, 2015

That is a completely different output from the problem you initially posed. Are you deliberately wasting my time?
Pangu Posted August 25, 2015

It is still the same task as in my first post:

If one text block "$narray[x]" has more than one keyword, it should be combined with the other keywords, because I assume it has the same topic. How can I combine/group text blocks with the same topic in my script?
Barand Posted August 25, 2015

The complex part was combining the keywords, and your latest requirement needs no combining. Anyway, here's the code with the keywords sorted by count, as the required info was already in the arrays.

<?php
$narray[]="Trump denounces violence after supporters beat Mexican man";
$narray[]="Doyle: What my dad could teach Donald Trump";
$narray[]="Bush slams Trump, defends using anchor babies";
$narray[]="Coming up Trumps: could a British TV star do a Donald and enter politics?";
$narray[]="Watch Rachel Maddow Explain Donald Trump’s ‘Genius’ Campaign on Tonight Show";
$narray[]="Trump touts making Time cover while taking heat over attack";
$narray[]="First Draft: Today in Politics: Rivals Can No Longer Ignore Donald Trump’s Long Shadow";
$narray[]="Donald Trump insists he’s conservative";
$narray[]="GOP candidates hold dueling town halls";
$narray[]="New York City has no way to fire Donald Trump";
$narray[]="Donald Trump pushes birthright citizenship to forefront of political debate";
$narray[]="Jeb Bush takes fight to Donald Trump in N.H.";
$narray[]="Rand Paul explains why he wants to stop ‘birthright citizenship’";
$narray[]="Trump attacks Facebook over foreigners";
$narray[]="Donald Trump tops GOP field in Florida, Pennsylvania, second in Ohio";
$narray[]="Donald Trump draws New Hampshire town hall crowd wild; jabs Jeb Bush";
$narray[]="While in Vegas, O’Malley makes an appearance in front of Trump’s hotel";
$narray[]="Trump’s immigration plan has GOP rivals on edge";
$narray[]="Donald Trump calls out Mark Zuckerberg on immigration";
$narray[]="Deny citizenship to babies illegal immigrants in US: Donald Trump";
$narray[]="Donald Trump takes a break from the campaign trail to join a long list of celebrities to perform jury duty";
$narray[]="Trump: Deny citizenship to babies of people illegally in US";
$narray[]="Trump Says He Would Deport Illegal Immigrants";
$narray[]="From campaign to court: Trump reports for jury duty in NYC";
$narray[]="Donald Trump says he will ‘deport millions of illegal immigrants’";
$narray[]="Trump outlines immigration specifics";
$narray[]="Donald Trump to Iowa boy: ‘I am Batman’";
$narray[]="Trump blunt but vague: No birthright citizenship, millions of illegal immigrants ‘have to go’";
$narray[]="Trump: end ‘birthright citizenship’";
$narray[]="Trump: Deport children of immigrants living illegally in US";
$narray[]="DNC blasts Donald Trump , Jeb Bush for comments about women";
$narray[]="Trump says would raise visa fees to pay for Mexican border wall";
$narray[]="What does Donald Trump think of immigrants, Saudi Arabia and the Iran nuclear deal?";
$narray[]="Donald Trump Releases Plan to Combat Illegal Immigration";
$narray[]="Donald Trump releases his immigration policy on his GOP presidential campaign website";
$narray[]="Donald Trump warns that Iran deal will lead to Nuclear Holocaust";
$narray[]="Trump details domestic, foreign policies, answers critics, matches fellow challengers";
$narray[]="Donald Trump’s legacy of luxury";
$narray[]="Clinton defends, Trump attacks Saturday at the high-profile Iowa State Fair";
$narray[]="Donald Trump says he would deport all illegal immigrants as president";
$narray[]="Donald Trump breaks the rules at the Iowa State Fair";
$narray[]="Thanks, Donald, but I don’t want to be ‘cherished’ | Barbara Ellen";
$narray[]="Front-runners skirt the soapbox";
$narray[]="Hillary Clinton, Donald Trump and the Trumpcopter descend on the Iowa State Fair";
$narray[]="Op-Ed Columnist: Introducing Donald Trump, Diplomat";
$narray[]="Trump forced to break from campaign trail for jury duty, skipped five summonses since 2006";
$narray[]="Donald Trump forced to take break from campaign trail for jury service";
$narray[]="Tables turned on Trump’s chief tormentor";
$narray[]="Donald Trump will serve jury duty in NYC next week";

$filtered = filter_my_array($narray);              // keywords only array
$keywords = [];
$kwindex = index_keywords($filtered, $keywords);   // index of keywords

//
// find items with no keywords
//
$otheritems = [];
foreach ($filtered as $k=>$v) {
    if (count($v)==0) $otheritems[] = $k;
}

//
// create output of the indexed lists
//
// rearrange key words by desc no of occurrences / alpha sequence
$countedKeywords = [];
foreach ($keywords as $kw => $n) {
    $countedKeywords[$n][] = $kw;
}
$output = '';
krsort($countedKeywords);
foreach ($countedKeywords as $n => $kws) {
    sort($kws);
    foreach ($kws as $kw) {
        $output .= "<h4>$kw<span class='count'>({$n}x)</span></h4><ul>";
        foreach($kwindex[$kw] as $i) {
            $output .= "<li>" . str_ireplace($kw, "<span class='hi'>$kw</span>", $narray[$i]) . "</li>\n";
        }
        $output .= "</ul>\n";
    }
}
if (count($otheritems) > 0) {
    $output .= "<h4>Non-keyword items</h4><ul>";
    foreach ($otheritems as $i) {
        $output .= "<li>{$narray[$i]}</li>\n";
    }
    $output .= "</ul>\n";
}

/*******************************************************************************
* helper functions
********************************************************************************/
function filter_my_array($array)
{
    // reduces the lines of text to arrays of the keywords in the line
    $results = [];
    foreach ($array as $k => $str) {
        $str = no_punc($str);
        $a = array_filter(explode(' ', $str), 'remove_noise');
        $results[$k] = $a;
    }
    return $results;
}

function remove_noise($x)
{
    $stopWords = array('about','an','and','are','as','at','be','by','com','de','en','for','from',
        'how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where',
        'who','will','with','und','the','www','donald','trump');
    return strlen($x) > 3 && !in_array(strtolower($x), $stopWords);
}

function index_keywords($array, &$kwords)
{
    // gets the line numbers containing each keyword
    $results = [];
    foreach ($array as $k => $kwarr) {
        foreach ($kwarr as $kw) {
            $results[$kw][] = $k;
            if (isset($kwords[$kw])) {
                ++$kwords[$kw];     // count keyword usage
            }
            else {
                $kwords[$kw]=1;
            }
        }
    }
    return $results;
}

function no_punc($str)
{
    $allow = array_merge([32,39], range(ord('a'), ord('z')), range(ord('0'), ord('9')));
    $k = strlen($str);
    $res = '';
    $str = strtolower($str);
    for ($i=0; $i<$k; $i++) {
        if (in_array(ord($str[$i]), $allow) ) {
            $res .= $str[$i];
        }
        else $res .= ' ';
    }
    return $res;
}
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Keyword Index</title>
<style type='text/css'>
.hi {
    font-weight: 700;
    color: red;
}
.count {
    font-weight: 100;
    color: #f44;
}
</style>
</head>
<body>
<?=$output?>
</body>
</html>

Bye.
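For readers following along, the ranking step in the script (keywords ordered by descending count, then alphabetically within the same count) can be tried in isolation. What follows is a minimal sketch and not part of Barand's post: the function name `rank_keywords` and the sample counts are made up for illustration, and only the bucket-then-`krsort` idea is taken from the code in the post.

```php
<?php
// Hypothetical helper (not from the original post): returns [keyword, count]
// pairs ordered by descending count, ties broken alphabetically.
function rank_keywords(array $counts)
{
    // Bucket the keywords by how often they occur: [count => [keyword, ...]]
    $buckets = [];
    foreach ($counts as $kw => $n) {
        $buckets[$n][] = $kw;
    }

    krsort($buckets);           // highest count first

    $ranked = [];
    foreach ($buckets as $n => $kws) {
        sort($kws);             // alphabetical order within one count
        foreach ($kws as $kw) {
            $ranked[] = [$kw, $n];
        }
    }
    return $ranked;
}

// Made-up counts, for illustration only
$counts = ['immigration' => 4, 'bush' => 3, 'citizenship' => 4, 'jury' => 3, 'iowa' => 2];

foreach (rank_keywords($counts) as $pair) {
    list($kw, $n) = $pair;
    echo "$kw ({$n}x)\n";
}
```

Bucketing by count first keeps the tie-breaking simple: `krsort` only has to order the distinct counts, and a plain `sort` handles each group of equally frequent keywords.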
Pangu Posted August 26, 2015 (Author)

Thank you very much! That already helped me quite a lot!