andrewgarn Posted October 13, 2008
I am looking into a project where one of the features would be searching a set of PowerPoint slides for keywords. Is there any way PHP can read the text of a PowerPoint file, or any way to extract it from a presentation? Another feature would be searching "notes webpages", which are HTML pages and wikis. That is possible in PHP, right? Or would all the data have to be put into MySQL?
andrewgarn Posted October 13, 2008
Ideally I want to make a system where a user can upload a .ppt file and it then becomes searchable.
andrewgarn Posted October 14, 2008
Not possible then?
Barand Posted October 14, 2008
http://web.informbank.com/articles/technology/php-office-documents.htm
GKWelding Posted October 14, 2008
You can actually find the text of a PowerPoint file just by opening it in a text editor. PHP has functions for reading files; you just need to look for them. You may be able to get your search script to open each document and search it for the desired string.
What I would recommend, though, is creating a MySQL table with three columns: ID, File and Text. ID should be auto-increment, File should hold the filename of the PowerPoint file, and Text should hold a full text dump of the .ppt file, made by PHP on upload. This should allow quicker searching, and you can also use the File column to construct a link to the .ppt document. You can also use sanitization to cut most of the unusable data out of the file after converting it to a string.
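As a rough illustration of the three-column table suggested above, here is a minimal sketch using PDO. The table name `slides`, the column name `slide_text`, and the helpers `storeSlideText` and `searchSlides` are assumptions made for this example, not anything from the thread. The schema string is MySQL syntax and would need adjusting for other databases.

```php
<?php
// Hypothetical schema for the ID / File / Text columns described above (MySQL syntax).
const SLIDES_SCHEMA = <<<SQL
CREATE TABLE IF NOT EXISTS slides (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    file       VARCHAR(255) NOT NULL,
    slide_text MEDIUMTEXT   NOT NULL
)
SQL;

/**
 * Store the text dump of one uploaded file and return the new row's ID.
 */
function storeSlideText(PDO $db, string $file, string $text): int
{
    $stmt = $db->prepare('INSERT INTO slides (file, slide_text) VALUES (?, ?)');
    $stmt->execute([$file, $text]);
    return (int) $db->lastInsertId();
}

/**
 * Return the filenames whose stored text contains the keyword.
 */
function searchSlides(PDO $db, string $keyword): array
{
    // Escape LIKE wildcards with '!' so the keyword is matched literally.
    $escaped = str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $keyword);
    $stmt = $db->prepare("SELECT file FROM slides WHERE slide_text LIKE ? ESCAPE '!'");
    $stmt->execute(['%' . $escaped . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

On upload, the script would call `storeSlideText()` with the filename and whatever text it managed to extract; the search page would call `searchSlides()` and build links from the returned filenames. For a large collection, a MySQL FULLTEXT index may be worth considering instead of `LIKE`.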
andrewgarn Posted October 14, 2008
Quote (Barand): http://web.informbank.com/articles/technology/php-office-documents.htm
That's just for making Office documents, as far as I can see. I didn't know you could open a PowerPoint file in a text editor and not have it completely garbled; I'll have a look at that.
Barand Posted October 14, 2008
Quote (andrewgarn): That's just for making Office documents, as far as I can see.
If you apply a little thought, you can use it to read them too.
GKWelding Posted October 14, 2008
Quote (andrewgarn): I didn't know you could open a PowerPoint file in a text editor and not have it completely garbled; I'll have a look at that.
The majority of it is complete nonsense. However, the text that appears on the slides is stored as plain text in the file; you just have to find it.
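One way to "find" the plain text inside the binary file is to pull out runs of printable characters, much like the Unix `strings` tool does. The sketch below is only a heuristic, not a parser for the .ppt format: the function name `extractStrings` and the minimum run length of 4 are assumptions for illustration. It also looks for runs where every character is followed by a zero byte (two-byte text), which is what spaced-out output such as "R e c t a n g l e" suggests.

```php
<?php
/**
 * Extract runs of printable ASCII from a binary string.
 * Handles both single-byte runs and two-byte (UTF-16LE style) runs,
 * where each printable character is followed by a NUL byte.
 */
function extractStrings(string $bytes, int $minLen = 4): array
{
    // Two-byte runs are tried first so they are not split into single characters.
    $pattern = '/(?:[\x20-\x7E]\x00){' . $minLen . ',}'
             . '|[\x20-\x7E]{' . $minLen . ',}/';

    preg_match_all($pattern, $bytes, $matches);

    // Drop the NUL bytes so two-byte runs read as ordinary text.
    return array_map(
        static function (string $run): string {
            return str_replace("\0", '', $run);
        },
        $matches[0]
    );
}

// Usage (assumes test.ppt exists in the current directory):
// $runs = extractStrings(file_get_contents('test.ppt'));
```

This will still return shape names and stray fragments alongside the slide text, and it will miss any text that is stored compressed or in another encoding, so the result needs further filtering.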
andrewgarn Posted October 15, 2008
And is this possible to do automatically? Does the nonsense follow a pattern that would make it removable via PHP?
¿ ÿ ? €Ã ¿ R e c t a n g l e 2 "ñˆ ©Ã‚ PK ! Zãfþ â [Content_Types].xml [...] _rels/.rels [...] drs/shapexml.xml [...] drs/downrev.xml [...] Retrospective [...] R e c t a n g l e 3 [...] 1995-2000 idealistic aspirations Economists thought it would bring markets close to perfect competition and frictionfree commerce (dynamic pricing, perfect price and quality comparison, no brands, no geographical boundaries), leading to disintermediation Entrepreneurs believed in first mover advantage and network effects.
andrewgarn Posted October 15, 2008
Any ideas? You can see there is plain text in that, which is what I want, but how do I remove the junk?
andrewgarn Posted October 15, 2008
It also doesn't contain all the text from the PowerPoint; some of it is missing. I'd edit my posts, but this forum wouldn't let me.
EDIT: This code:
$myFile = "test.ppt";
$fh = fopen($myFile, 'rb'); // open in binary mode, since .ppt is not a text file
$theData = fread($fh, filesize($myFile)); // read the whole file into a string
fclose($fh);
echo $theData;
gives this file: http://www.sendspace.com/file/v1rn81
andrewgarn Posted October 15, 2008
With preg_replace I have got the data down to this:
$newdata = preg_replace("/[^a-zA-Z0-9\s.@]/", "", $data);
WNff3PPT10eD @B uphP xv0e0eRectangle 4PKZfContentTypes.xml [...] PKtdrsdownrev.xmlPKd0 g EBusiness ECommerce R x0e0eRectangle 5 @Yg A narrow view of ebusiness is selling over the web is etailing Most web surfers have bought online UK 76 Online retail sales continue to grow rapidly annual growth 2003 51 2004 24 2005 22 2006 33.4 2007 54 2008 28 expected and faster than traditional retail 45 growth Offline sales are influenced by the web and vice versa An alternative allembracing vision is The use of Internet technologies eg the web to support the core activities of businesses and organisationsVcmcmA WNff3PPT10u. D @B 0 D0e0eRectangle 2d0 g GTypes of eBusiness x@H0e0eRectangle 3 @Yg B2C retailers content providers portals social networks etc. B2B eprocurement exchanges etc C2C auctions classifieds e.g. eBay P2P Kazaa BitTorrent MCommerce 3G WiFi iPhone Blackberry WNff380PPT10.j @ xJ0e0eRectangle 2PKZfContentTypes.xml [...]
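Stripping characters with a single preg_replace glues the junk onto the real text, so another option is to split the data into candidate runs first and then keep only the runs that look like natural language. The sketch below is one hypothetical heuristic: the function names `looksLikeText` and `filterReadable` and every threshold in it (minimum length 4, 80% ordinary characters, 25% to 60% vowels among the letters) are assumptions chosen for illustration and would need tuning against real files.

```php
<?php
/**
 * Guess whether a run of characters is readable text rather than binary junk.
 * Thresholds are illustrative assumptions, not derived from the .ppt format.
 */
function looksLikeText(string $run): bool
{
    $run = trim($run);
    $length = strlen($run);
    if ($length < 4) {
        return false;
    }

    // Share of characters that are letters, digits, spaces or basic punctuation.
    $ordinary = preg_match_all('/[A-Za-z0-9 .,;:()\-]/', $run);
    if ($ordinary / $length < 0.8) {
        return false;
    }

    // English-like text has a fairly stable proportion of vowels among its letters.
    $letters = preg_match_all('/[A-Za-z]/', $run);
    if ($letters === 0) {
        return false;
    }
    $vowelShare = preg_match_all('/[aeiouAEIOU]/', $run) / $letters;

    return $vowelShare >= 0.25 && $vowelShare <= 0.6;
}

/**
 * Keep only the runs that pass the readability guess, re-indexed from 0.
 */
function filterReadable(array $runs): array
{
    return array_values(array_filter($runs, 'looksLikeText'));
}
```

A heuristic like this will still let some junk through and will keep shape names such as "Rectangle 4", which are readable but are not slide text, so a list of known labels to discard may also be needed.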