davil Posted November 5, 2009

Hi all,

I was just wondering if anybody has any experience of this. Basically, I'm building a site for a guy and he keeps throwing tonnes of content my way, all Word docs with tables, images, etc. The site is PHP / MySQL based, and when it's only handling text it's very easy to use. I want it to preserve formatting if possible, and maybe where the images are (aligned left or right at the very least).

At the moment I have integrated a JavaScript RTE (Rich Text Editor) called FCKeditor, an earlier version of CKEditor [ http://ckeditor.com/ ]. I think FCKeditor is free for commercial projects and CKEditor isn't (could be wrong, but it doesn't matter too much to my main query anyhow).

So basically the user will have to copy in each block of text and then upload each image separately and align it left or right or whatever, which is fine by me. But for the moment I need to put some of these docs in myself, and it can be very, very time-consuming. I'm looking into ways to get his Word documents up in the most automatic way possible. Here are a few of my options:

1. wvWare - wvware.sourceforge.net/ I'm not exactly sure how to install it, but I do have a VPS so it should at least be possible. Does anybody have any experience using this? I suppose the question I'm asking is: does it work?
2. There are a few RTF to HTML converters here etc.: http://www.w3.org/Tools/Word_proc_filters.html but nothing that seems to do the job for me so far. I'm afraid to start downloading them all and testing because I'm on a tight schedule here and I just need to know what will actually work for me.
3. I could ask the user to save the file as XML (he's bright enough) and then use SimpleXML to parse it, but I'm having trouble finding any info on Google about using SimpleXML to extract the image bits to files or whatever. Perhaps storing them in the MySQL database as binary is the best way, but I'd prefer not to have the overhead if possible and just save JPG or PNG files out and have the HTML link to them. Is there anybody out there who has done this before?
4. Maybe there's a client-side app for this? Written in Java or Flash or something?
5. Perhaps MHTML - I could get him to save as an .MHT archive and then do something with PHP on the server side. If I can't find any other solution I will probably try this one next.

So does anybody know the best option for this, or should I stick with the Rich Text Editor road? Any alternative (that works and is relatively easy for the client to do) is fine.

Thanks in advance

Link to comment: https://forums.phpfreaks.com/topic/180422-whats-best-way-to-get-a-users-word-doc-converted-to-simple-html-and-images/
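On option 5: an .MHT file is a MIME multipart message, so a mail parser can split it into its HTML part and its image parts. The following is only a minimal sketch of that idea, written in Python rather than PHP to keep it short; `split_mht` and its parameters are made-up names, not taken from any library, and it assumes each image part carries a `Content-Location` header.

```python
import email
from email import policy
from pathlib import Path, PurePosixPath


def split_mht(mht_bytes, out_dir):
    """Split an MHT archive into its HTML text and its image files.

    Returns (html, images); images maps each part's Content-Location
    to the path of the file written under out_dir.
    (Hypothetical helper for illustration only.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)

    html = None
    images = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        ctype = part.get_content_type()
        if ctype == "text/html" and html is None:
            # get_content() undoes quoted-printable/base64 and the charset
            html = part.get_content()
        elif ctype.startswith("image/"):
            location = str(part.get("Content-Location", ""))
            # Keep only the last path segment so nothing is written outside out_dir
            name = PurePosixPath(location.replace("\\", "/")).name
            if name in ("", ".", ".."):
                name = f"image{len(images)}"
            target = out / name
            target.write_bytes(part.get_payload(decode=True))
            images[location] = target
    return html, images
```

The returned HTML still refers to the images by their original paths, so the `src` attributes have to be pointed at wherever the files end up on the server.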
simshaun Posted November 5, 2009

I usually try to avoid responding unless I have an answer or direct criticism, but you will probably spend a lot more time trying to get an automated solution working than it would take to actually do it manually. Word to HTML is never pretty, thanks to Word's crappy markup. If you want your pages to be clean at all, I suggest pasting as plain text and inserting the images manually, even though it will take a lot of time.

Also: if your company *requires* a license, you do have the option of purchasing one. Otherwise, CKEditor is free for commercial projects just as FCKeditor is. I currently use CKEditor, but I've had to spend a while reducing the bugs to the point where our clients can use it. If you don't have the time to fix bugs, I suggest waiting for a future release of CKEditor before upgrading.
davil (Author) Posted November 5, 2009

Well, at least you're honest. Yeah, I thought as much myself, but there's just *SO* much crap to do. I myself have no trouble getting images out of documents, saving them as JPG and uploading them, but the client isn't that tech savvy. If it's something he can do in Word, though, like save as MHT and then upload, he'd get it.

I understand your point and agree whole-heartedly. However, if I can get the stuff into the database and the JPGs up to the server any way at all I'd be happy, and then he can edit the kinks out of each article later on himself.

I just found this MHTML class and will test it: http://www.phpclasses.org/browse/file/23132.html The stuff it outputs seems OK. I can strip a lot of the nasty Word HTML code and tags myself with PHP if necessary, but I do have a feeling that, yeah, this is going to be a lost cause. Thanks anyway for posting your opinion.
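Stripping the Word-specific markup mostly comes down to a handful of pattern replacements. The sketch below shows the kind of thing meant, in Python for brevity; `clean_word_html` is a made-up name, the patterns cover only the commonest clutter (conditional comments, `<o:p>`-style tags, `Mso*` classes, `mso-*` style rules), and regular expressions are a blunt tool for HTML, so treat it as a rough first pass and not a complete cleaner.

```python
import re


def clean_word_html(html):
    """Strip the most common Word-specific clutter from exported HTML.

    Hypothetical helper; a rough regex pass, not a full HTML sanitiser.
    """
    # Hidden conditional blocks: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.S | re.I)
    # Bare markers such as <![if !vml]> and <![endif]> (their content is kept)
    html = re.sub(r"<!\[(?:if|endif)[^>]*\]>", "", html, flags=re.I)
    # Any remaining HTML comments
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Office-namespaced tags such as <o:p>, </o:p>, <st1:place>
    html = re.sub(r"</?[a-z0-9]+:[^>]*>", "", html, flags=re.I)
    # class=MsoNormal and similar class attributes
    html = re.sub(r"""\sclass=("Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "", html, flags=re.I)
    # mso-* declarations inside style attributes
    html = re.sub(r"""mso-[a-z\-]+\s*:[^;"']*;?\s*""", "", html, flags=re.I)
    # style attributes left empty by the previous step
    html = re.sub(r"""\sstyle=(""|'')""", "", html, flags=re.I)
    return html
```

Ordinary inline styles such as margins and fonts are left alone here; whether to strip those too depends on how much of the original formatting the client wants to keep.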
davil (Author) Posted November 5, 2009

OK, so I tested the MHTML class, and it's not too bad actually. Sure, the code needs to be cleaned up a little bit, but here's an example:

http://www.thedavil.com/testing/Augustinian Saints and Blessed.doc

got converted to

http://www.thedavil.com/testing/index.htm

which isn't terrible, but not great either. I'm going to do a bit more work on this MHT way of doing things, but I'm not gonna work too hard. If it doesn't work out then I'll go back to manual.

My next step is to see if I can get that HTML into the Rich Text Editor and whether the little kinks can be worked out there. If they can, it's just a case of moving the files around and putting the HTML into the database, and I'm onto a winner... Still dubious though.
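"Moving the files around" also means pointing each `<img>` in the converted HTML at wherever the images now live before the HTML goes into the database. A minimal sketch of that rewrite, in Python, assuming quoted `src` attributes and that every image is stored under one upload folder by its file name alone; `rewrite_img_src` and `base_url` are illustrative names only.

```python
import re
from posixpath import basename


def rewrite_img_src(html, base_url):
    """Point every quoted <img src="..."> at base_url + the image's file name.

    Hypothetical helper: unquoted src values are left untouched.
    """
    prefix = base_url.rstrip("/")

    def repl(match):
        start, quote, src = match.group(1), match.group(2), match.group(3)
        # Keep only the file name, whatever folder Word exported it into
        name = basename(src.replace("\\", "/"))
        return f"{start}{quote}{prefix}/{name}{quote}"

    pattern = r'''(<img\b[^>]*?\ssrc=)(["'])(.*?)\2'''
    return re.sub(pattern, repl, html, flags=re.I | re.S)
```

Because only the file name survives, two documents that both contain an `image001.jpg` would collide, so in practice each article would likely need its own folder or a renamed file.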
simshaun Posted November 5, 2009

Take a look at the source code of that page.
davil (Author) Posted November 5, 2009

Yeah I know, an absolute nightmare. But I'm being optimistic here and thinking I might be able to clean it up a bit. :-D
davil (Author) Posted November 5, 2009

Do you know if Word-saved XML is as bad? Or worse?
simshaun Posted November 5, 2009

Looking at an XML version of the Word doc you posted, it doesn't look any more promising. You could try saving the Word doc as "Web page, filtered". It's not perfect, but it's probably the best option available next to manually copying, pasting, and fixing.
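On the narrower question from the first post about getting the image bits out of a Word-saved XML file: assuming the file is in the Word 2003 XML format, where each picture is stored as base64 text in a `w:binData` element whose `w:name` attribute looks like `wordml://01000001.png`, the images can be written out as ordinary files. A sketch under that assumption, in Python; `extract_images` is a made-up name, and other Word versions lay the XML out differently.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of Word 2003 "Save as XML" documents (assumed input format)
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def extract_images(xml_bytes, out_dir):
    """Write every embedded w:binData picture to out_dir; return the paths.

    Hypothetical helper for illustration only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = ET.fromstring(xml_bytes)
    written = []
    for node in root.iter(f"{{{W_NS}}}binData"):
        name = node.get(f"{{{W_NS}}}name", "")
        # "wordml://01000001.png" -> "01000001.png"
        filename = name.rsplit("/", 1)[-1]
        if filename in ("", ".", ".."):
            filename = f"image{len(written)}.bin"
        target = out / filename
        # b64decode ignores the line breaks Word puts inside the data
        target.write_bytes(base64.b64decode(node.text or ""))
        written.append(target)
    return written
```

This only solves the image half; the text and layout in the same file are still the verbose markup described above.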
davil (Author) Posted November 5, 2009

Yeah, maybe... I know this sounds stupid but I have to ask: if I save as filtered HTML in Word, then re-open it and save as MHT or XML, will it gain all the same crap code again?

I suppose if I really want to go down this road (crazy as it is) I could write a small EXE for the client that saves the doc as filtered HTML and then zips up all the stuff for upload. It is a bit crazy though, alright. :-D

btw, is CKEditor any better than FCKeditor? (I don't mind fixing a few bugs if they're small and fixable by PHP.)
simshaun Posted November 5, 2009

I personally like CKEditor better than FCKeditor, but it's still a bit buggy even after I spent hours fixing some bugs. Fixing the bugs is probably more involved than you want to get with it. If I were you, I would probably stick with FCKeditor until CKEditor has gone through a couple more releases.

Edit: Saving a Word doc as filtered HTML and then resaving the filtered HTML as MHT seems to bring back a lot of the Word formatting junk.
davil (Author) Posted November 5, 2009

Thanks sim. Great feedback from ya