andrewgarn Posted October 15, 2008
Is there any way to remove words under a certain length and over a certain length from a file? e.g.
scan string
remove words under 5 letters
remove words over 15 letters
Thanks
https://forums.phpfreaks.com/topic/128580-solved-remove-words-from-string-underover-a-certain-length/
Maq Posted October 15, 2008
Do you have any code so far? Grab the file, scan the words and if (strlen($string) < 5 || strlen($string) > 15) { remove }
andrewgarn Posted October 15, 2008
It's an attempt to remove junk code and be left with plain words after opening a ppt file in PHP.
4kHgWSvfbs5lGv4skT4QPlxEiW1Z1cQK4vxuPKZfContentTypes.xmlPK1arels.relsPKGgLdrsshapexml.xmlPKvdrsdownrev.xmlPKLg DEnrico Andy EEE WNff3PPT10eD @B uphP xv0e0eRectangle 4PKZfContentTypes.xmlMO 2WRcJF0iKvLw 9uSq:wGi KIc oVjTMRc042CMPka8DkHbL8e iKXN6rco4y@oPK1arels.relsj0qCNoK ILcXm0XFo0xMeXIN4aG2RKIZ 4M9ctBm:f@3nOrjxR0T0@WBL5vPKOdrsshapexml.xmlTn1G4nPUQSKiw@63bLI7nc.d0Z5d E5peE8j4.3Ulec5 DJ Wbhocrv7u EAeVr::aam@9@FaVGpA61j3x: V3K8str.xN SICv02 ZIFP 6zCmJyI0TZRUwiI52p0fD.ZS EJCRURyn3y4ljM qDNq9FgKIm h2QRlnNLTxUCgZQqjv75olRgzyf0l4KlmzTGJG:5zP JPjBjh nA7 PKdrsdownrev.xmlDK0Cx1tFmloBz DC6JQVpRiAJBs 7 qo32n:kocSywYKglO tPKZfContentTypes.xmlPK1arels.relsPKOdrsshapexml.xmlPKtdrsdownrev.xmlPKd0 g EBusiness ECommerce R x0e0eRectangle 5 @Yg A narrow view of ebusiness is selling over the web is etailing Most web surfers have bought online UK: 76 Online retail sales continue to grow rapidly annual growth 2003: 51 2004: 24 2005: 22 2006: 33.4 2007: 54 2008: 28 expected and faster than traditional retail 45 growth Offline sales are influenced by the web and vice versa An alternative allembracing vision is: The use of Internet technologies eg the web to support the core activities of businesses and organisationsVcmcmA WNff3PPT10u. D @B 0 D0e0eRectangle 2d0 g GTypes of eBusiness x@H0e0eRectangle 3 @Yg B2C retailers content providers portals social networks etc. B2B eprocurement exchanges etc C2C auctions classifieds e.g. eBay P2P Kazaa BitTorrent MCommerce 3G WiFi iPhone Blackberry WNff380PPT10.j @ xJ0e0eRectangle 2PKZfContentTypes.xmlMO 2WRcJF0iKvLw 9uSq:wGi KIc oVjTMRc042CMPka8DkHbL8e iKXN6rco4y@oPK1arels.relsj0qCNoK ILcXm0XFo0xMeXIN4aG2RKIZ 4M9ctBm:f@3nOrjxR0T0@WBL5vPK2drsshapexml.xmlTn1GmnPUQG4MxmcM
If you know of any other way of doing it...
The code before it is:
<?php
$myFile = "test.ppt";
$fh = fopen($myFile, 'r');
$data = fread($fh, filesize($myFile));
fclose($fh);
//echo $theData;
$newdata = preg_replace("/[^a-zA-Z0-9\s.@:]/", "", $data);
echo $newdata;
?>
revraz Posted October 15, 2008
You would lose real words too. I see 1-3 letter sequences in the junk code as well as normal 1-3 character words.
Maq Posted October 15, 2008
You need to open the .ppt file and grab the lines, explode with a " " (space) delimiter and check the two lengths. May I ask how it got this way in the first place?
andrewgarn Posted October 15, 2008
I just posted the code showing how far I have got. I don't mind losing small words, as I'm trying to create a searchable index of the words in the ppt file, so words like 'and', 'the', 'a' are not important.
andrewgarn Posted October 15, 2008
Quote from Maq: Do you have any code so far? Grab the file, scan the words and if (strlen($string) < 5 || strlen($string) > 15) { remove }
To do that, would I need to explode the string and then implode it again afterwards?
prexep Posted October 15, 2008
If you want to remove words of fewer than 5 letters or more than 15 letters from a file that contains many words, like a document, I'd split the file contents or variable with preg_split, which turns it into an array (like $split[0][0]). Then use a foreach loop to run through the array and take out the words with fewer than 5 or more than 15 characters, using strlen.
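The approach discussed so far (split the text into words, keep only those whose length is in range, join them back together) could be sketched roughly as follows. The function name filterWordsByLength and the default limits of 5 and 15 are illustrative assumptions, not code from the thread; the sketch keeps words of 5 to 15 characters inclusive.

```php
<?php
// Hypothetical helper (not from the thread): keeps words whose length is
// between $min and $max characters, inclusive, and drops everything else.
function filterWordsByLength(string $text, int $min = 5, int $max = 15): string
{
    $kept = [];
    // explode() splits on single space characters only.
    foreach (explode(' ', $text) as $word) {
        $length = strlen($word);
        if ($length >= $min && $length <= $max) {
            $kept[] = $word;
        }
    }
    // implode() joins the surviving words back into one string.
    return implode(' ', $kept);
}

echo filterWordsByLength('a narrow view of ebusiness is selling over the web');
// prints: narrow ebusiness selling
```

Collecting the kept words in an array and imploding once avoids the leading space that repeated string concatenation tends to leave behind.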
andrewgarn Posted October 15, 2008
Wouldn't it just be a 1D array, as you are only separating by spaces?
prexep Posted October 15, 2008
Possibly, I may have included the offset. =p Yeah, it's only one dimension.
andrewgarn Posted October 15, 2008
Right, I still have data like this:
sRij@W@HFUn IDATxyxUwMro@I 5oB @hEqw0L 4M3Ne2 O0CCm3v o:y:XqrcisX Xi TlH DOTm qbN:lun I@J xeYzAbhb TnLXTtE jPSVUnrpL5Ntx 21Aest 2XpgKSKi9.pSG quoqfq D ye2oQ4 CP0 F yw iLG@J. PH5Ns 7z4s@5ZBP Yf fNG 3AHiMx.I GB:yM6lZIGREN G0SE B4vDmzI J lGvm xNSAH 8DBZHX8 upXpH4@ 4J04I:gKFL @q bdl9M t1sLlBcOo 3 0
Any suggestions on removing it? Should I have decoded the file somehow first on opening?
andrewgarn Posted October 15, 2008
Why is $pieces = explode(" ", $newdata); still giving me $pieces[] elements with spaces in them?
prexep Posted October 15, 2008
Try: $pieces = preg_split("[/s]", $newdata); Or something like that. It should take out all the whitespace if I wrote it right. Preg's not my best.
andrewgarn Posted October 15, 2008
Quote from prexep: Try: $pieces = preg_split("[/s]", $newdata); Or something like that. It should take out all the whitespace if I wrote it right. Preg's not my best.
That puts the whole string in $pieces[0].
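A likely explanation for the whole string landing in $pieces[0]: PHP's PCRE functions accept bracket pairs as pattern delimiters, so in "[/s]" the square brackets are consumed as delimiters and the actual pattern is the two literal characters /s, which presumably never occur in the data. What was probably intended is the whitespace class \s. A minimal sketch, assuming the goal is to split on any run of whitespace; splitOnWhitespace is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: split on any run of whitespace
// (spaces, tabs, carriage returns, newlines).
function splitOnWhitespace(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or
    // trailing whitespace would otherwise produce.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitOnWhitespace("Master styles\nSecond level\r\nThird  level"));
```

Unlike explode(" ", ...), this treats newlines and tabs as separators too, so words separated only by a line break end up in separate array elements.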
andrewgarn Posted October 15, 2008
I have this now. Not very efficient, I know, but here it is:
<?php
$myFile = "test.ppt";
$fh = fopen($myFile, 'r');
$data = fread($fh, filesize($myFile));
fclose($fh);

//echo $theData;
$data = utf8_decode($data);

//$newdata = preg_replace("/[^a-zA-Z0-9\s.@:]/", "", $data);
//echo $newdata;

//$new = preg_split('/ /', $newdata, -1, PREG_SPLIT_OFFSET_CAPTURE);
$pieces = explode(" ", $data);
//$pieces = preg_split("[/s]", $newdata);
//echo '1test1'.$pieces[0].'2test2';
$count = count($pieces);
//echo 'Result: '.$count;

$output = '';
$i = 0;
while($i < $count)
{
    if(strlen($pieces[$i]) > 4 && strlen($pieces[$i]) < 15 && strpos("$pieces[$i]","@") == FALSE && strpos("$pieces[$i]",".") == FALSE && strpos("$pieces[$i]",":") == FALSE)
    {
        $output = $output.' '.$pieces[$i];
        //echo $i.'<br>'.$pieces[$i];
    }
    $i++;
}

//echo $output;
$newoutput = preg_replace("/[^a-zA-Z0-9\s.@:]/", "", $output);
//echo $newoutput;

$newpieces = explode(" ", $newoutput);
$count = count($newpieces);
$output2 = 'test';
$a = 0;
while($a < $count)
{
    if(strlen($newpieces[$a]) > 4 && strlen($newpieces[$a]) < 15)
    {
        $output2 = $output2.' 
'.$newpieces[$a];
        //echo $i.'<br>'.$pieces[$i];
    }
    $a++;
}
echo $output2;
Output:
test bdbb2 IGsbE U4y 19 HP4S7 fFNYF QqOK1 Ssk6 IDATw KII1 8Ys6 H QEYV3 K0u G tdeSD i2WZK 8 IDATRPt xoKd6 b2IaM L TRhT Yne2F Click Master title Master styles Second level Third level Fourth level Fifth EBusiness bg1lt1 tx1dk1 bg2lt2 tx2dk2 hlinkhlink CwfP Techniques Click Master title Master styles Second level Third level Fourth level Fifth bg1lt1 tx1dk1 bg2lt2 tx2dk2 hlinkhlink Master styles Second level Third level Fourth level Fifth 18AaR bg1lt1 tx1dk1 bg2lt2 tx2dk2 hlinkhlink 18AaR bg1lt1 tx1dk1 bg2lt2 tx2dk2 hlinkhlink EBusiness Gerding Gravell narrow bought retail sales faster retail sales vision retailers content providers portals social networks exchanges auctions classifieds eBay P2P Kazaa iPhone Objectives Learning understanding EBusiness theoretical issues practical use Covers Models EBusiness Development using Business Interchange Services Mobile Commerce Digital Signatures Electronic Payment Protocols Recommender Systems Smart Cards Software Agents Software Negotiation Computational markets auctions important techniques issues designing building modeling Ebusiness relevant technologies smart cards electronic payment coursework assignment instructions November 2008 85 hours answer three questions selection cover technical design implementation assuming already about relational databases networks distributed systems programming Outline Course Introduction Enrico Gerding Scriptin
prexep Posted October 15, 2008
As long as it works for you. I told you I wasn't the best at preg. =p And I think my expression was wrong.
andrewgarn Posted October 15, 2008
The data is mostly clean now, with just a bit of junk at the beginning. A few words are also missing from the text; any idea why?
test bdbb2 IGsbE U4y 19 HP4S7 fFNYF QqOK1 Ssk6 IDATw KII1 8Ys6 H QEYV3 K0u G tdeSD i2WZK 8 IDATRPt xoKd6 b2IaM L TRhT Yne2F Click Master title styles Second level Third level Fourth level Fifth EBusiness bg1lt1 tx1dk1 bg2lt2 tx2dk2 hlinkhlink CwfP Techniques 18AaR Gerding Gravell narrow bought retail sales faster vision retailers content providers portals social networks exchanges auctions classifieds eBay P2P Kazaa iPhone Objectives Learning understanding theoretical issues practical use Covers Models Development using Business Interchange Services Mobile Commerce Digital Signatures Electronic Payment Protocols Recommender Systems Smart Cards Software Agents Negotiation Computational markets important techniques designing building modeling
andrewgarn Posted October 15, 2008
Quote from prexep: As long as it works for you. I told you I wasn't the best at preg. =p And I think my expression was wrong.
I stuck to my exploding, though I had to do it twice, and I still have array elements with spaces in them. Why? I don't understand.
prexep Posted October 15, 2008
I may be wrong, but lines 26 and 44 ($output = $output.' '.$pieces[$i]) contain a space. Try $output .= $pieces[$i];
andrewgarn Posted October 16, 2008
That's intentional; I want that space, or all the words are joined together in the output. What I mean is, on the second explode:
<?php
$newpieces = explode(" ", $newoutput);
$count = count($newpieces);
$output2 = '';
$a = 0;
while($a < $count)
{
    if(strlen($newpieces[$a]) > 4 && strlen($newpieces[$a]) < 15 && strpos("$output2","$newpieces[$a]") == FALSE)
    {
        $newpieces[$a] = strtolower($newpieces[$a]);
        $output2 = $output2.' '.$newpieces[$a];
        //echo '<br>'.$a.$output2;
        echo '<br>$newpieces['.$a.'] = '.$newpieces[$a].'<br>';
    }
    $a++;
}
Look at these:
$newpieces[257] = styles second
$newpieces[258] = level third
$newpieces[259] = level fourth
$newpieces[260] = level fifth
Why have those not been split?
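One thing worth watching in the loop above: strpos() returns the integer 0 when the match is at the very start of the string, and 0 == FALSE is true in PHP, so a loose comparison cannot tell "found at position 0" from "not found". strpos() also matches substrings, so a word can be treated as a duplicate merely because it occurs inside a longer word already in the output. The sketch below assumes that exact, case-insensitive de-duplication is what is wanted; uniqueWords is a hypothetical name, not code from the thread.

```php
<?php
// Hypothetical helper: lower-cases each word and keeps only the first
// occurrence, comparing whole words rather than substrings.
function uniqueWords(array $words): array
{
    $seen = [];
    foreach ($words as $word) {
        $word = strtolower($word);
        // Array keys give an exact-match lookup, unlike a substring search.
        if (!isset($seen[$word])) {
            $seen[$word] = $word;
        }
    }
    // Return the stored values so numeric-looking words stay strings.
    return array_values($seen);
}

// Strict comparison distinguishes "found at position 0" from "not found".
var_dump(strpos('retail sales', 'retail') === false); // bool(false): found at 0
var_dump(strpos('retail sales', 'retail') == false);  // bool(true): 0 == false

print_r(uniqueWords(['Retailers', 'retail', 'sales', 'Retail']));
```

With a substring search, "retail" would be dropped here because it occurs inside "retailers"; the exact lookup keeps both.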
prexep Posted October 16, 2008
Are all the characters between the words spaces? It could be a different whitespace character. I'd suggest the same thing here as I did above.
andrewgarn Posted October 16, 2008
Changing the line to this: $output2 = $output2.''.$newpieces[$a]; gives me this as output:
Output is: avxdsfdbb2vqme0igsbeu4y 19rgqslza9f6 qqok1ri 4 ccci2idat1idatwkii1fdyllqeyv3htdesdri2zk 898s9er10a x1fxhirvuk1fwva8jb2iaml trhtp9uk0jaktenynfe2fclickmastertitlemasterstyles secondlevel thirdlevel fourthlevel fifthebusinessbg1lt1tx1dk1bg2lt2tx2dk2hlinkhlinktechniquesclickmastermasterstyles secondlevel thirdlevel fourthlevel fifthmasterstyles secondlevel thirdlevel fourthlevel fifthebusinessgerdinggravellnarrowboughtretailsalesfastervisionretailerscontentprovidersportalssocialnetworksexchanges auctionsclassifiedsebay p2p kazaaiphoneobjectiveslearningunderstandingebusinesstheoreticalissuespracticaluse
prexep Posted October 16, 2008
Right Click -> View Source and see how they are outputted.
andrewgarn Posted October 16, 2008
<br><br>$newpieces[278] = master<br><br>$newpieces[279] = styles second<br><br>$newpieces[280] = level third<br><br>$newpieces[281] = level fourth<br><br>$newpieces[282] = level fifth<br><br>$newpieces[288] = master<br><br>$newpieces[289] = styles second
That means the problem is a hidden \n, right?
prexep Posted October 16, 2008
Yep. =) That's why I was trying to use \s in my split. You didn't explode on all whitespace.
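To confirm this kind of problem without relying on the browser's view-source, one option is to print each piece with its control characters made visible. This is only a sketch: showControlChars is a hypothetical name, and it relies on addcslashes() to escape bytes 0 to 31 in C style.

```php
<?php
// Hypothetical helper: makes control characters (bytes 0-31) visible,
// e.g. a newline is printed as the two characters \n.
function showControlChars(string $text): string
{
    return addcslashes($text, "\0..\37");
}

// Splitting on a single space leaves the newlines inside the pieces.
$pieces = explode(' ', "master styles\nsecond level\nthird");
foreach ($pieces as $i => $piece) {
    echo $i, ' => ', showControlChars($piece), PHP_EOL;
}
```

Pieces that still contain an escaped \n or \r after the split are the ones a single-space explode could not separate.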