jconey, posted June 13, 2010
New to PHP and MySQL. Not sure I'm even posting this in the right area, but here goes.

I have HTML and text files with data in them, and the data isn't delimited in any normal way. Most of the text is in paragraph form, but all the files contain the same fields. For example, a page might look like this:

Item1 text
Item2 Text Text Text…
Item3 text
Item 4 Text Text Text Text Text

Item# stands for a field name such as Year, item number, description, etc. There are about 25 items (fields), and each varies in length and paragraph style. For instance, Item 4 in the example might be a single word or it might be 7 paragraphs.

This would be easy if I only had two dozen files... but I have upwards of 100,000 files, most of them .html on a CD. :-\

Oh, one more thing: many of the item titles are followed by a colon (description:), but not all of them are.

I'm not very DB literate, but I am IT/PC literate. I really need a quick and hopefully semi-automated way to import/convert this information in batches. Even if I could get it into Excel or Access, I could get it into PHP/MySQL from there myself. Don't know if it matters, but one of the fields has a photo, and I only need the file name/link from it, not the photo itself.

Please let me know if you have any ideas or need more information. Thank you!

JConey
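Because every file carries the same fields in the same order, the field titles themselves can act as the delimiter. A minimal sketch of that idea, assuming the labels are known in advance and each one starts its own line (the four labels below are hypothetical placeholders for the ~25 real ones, and `split_fields` is an illustrative name). A label with or without a trailing colon is accepted, and everything up to the next label, one word or seven paragraphs, becomes that field's value:

```python
import re

# Hypothetical field labels: replace with the ~25 real item titles.
LABELS = ["Year", "Item Number", "Description", "Photo"]


def split_fields(text, labels=LABELS):
    """Split one page's plain text into {label: value}.

    Assumes every label starts its own line. A label may or may not be
    followed by a colon; everything up to the next label becomes that
    label's value. Text before the first label is ignored.
    """
    # Longest labels first so a long label is tried before a shorter one
    # that is its prefix; (?!\w) stops "Year" from matching "Yearly".
    alternation = "|".join(
        re.escape(lbl) for lbl in sorted(labels, key=len, reverse=True)
    )
    pattern = re.compile(r"^[ \t]*(%s)(?!\w)[ \t]*:?" % alternation, re.MULTILINE)

    fields = {lbl: "" for lbl in labels}  # labels missing from a page stay ""
    matches = list(pattern.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[m.group(1)] = text[m.end():end].strip()
    return fields
```

The line-start assumption is what keeps a label word appearing mid-paragraph from being mistaken for a new field; if a body paragraph can itself begin with a label word, the pattern would need something more specific to the real pages.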
jconey (author), posted June 13, 2010
BTW: not sure if it matters for this particular question, but my host is running:

MySQL version 5.0.90-community-log
Apache version 2.0.63
PHP version 5.2.9

I also use MS Excel/Access 2007 (or earlier), PHPMagic Pro & Plus, and Adobe CS3 Master Suite...

Thanks again!
JConey
James25, posted June 14, 2010
I didn't quite catch your question; could you state it once more?
jconey (author), posted June 14, 2010
Basically, I have a lot of .html and .txt files with data in them, and I need to extract that data. If I can get it into an Excel spreadsheet or MS Access, I can get it into MySQL from there. A closer estimate is about 152,400 files. I'd love to find a batch method, or some automated or semi-automated way, to extract this data into a usable format (spreadsheet, MS Access table, MySQL table...). The first post explains the file contents.

I'm pretty good at importing data if there is a consistent delimiter. As I see it, I have two issues:

1. How do I handle so many files without repeating a set of procedures 152K times?
2. There is no consistent delimiter.

The files all have the same type of information in the same order, though. I'm sure someone in cyberspace has run into this before; I just hope the solution isn't in the realm of theoretical physics.

Thanks,
Jeff
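Since most of the files are .html, the markup has to be stripped before the field titles can be found, and the photo field only needs its link. A minimal sketch using Python's standard `html.parser` module; the class and function names are hypothetical, and the set of tags treated as line breaks is an assumption about how the pages are laid out:

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect the visible text of an HTML page and every <img src>."""

    SKIP = {"script", "style"}  # their contents are not visible text
    # Assumed block-level tags: each one starts a new line so field
    # labels end up at the start of a line.
    BREAKS = {"p", "div", "br", "tr", "td", "li", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into &
        self.parts = []
        self.images = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)  # keep only the link, not the picture
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BREAKS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Return (plain_text, image_srcs) for one HTML document."""
    parser = PageText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts), parser.images
```

The plain text it returns can then go through whatever label-splitting step is used for the .txt files, so both file types end up in the same shape.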
fenway, posted June 15, 2010
If you get it into CSV, you're done.
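The catch with CSV here is that the fields contain commas, quotes and whole paragraphs. A minimal sketch with Python's standard `csv` module, assuming each page has already been turned into a `{field: value}` dict (`write_csv` is a hypothetical name). Quoting every value keeps multi-paragraph fields inside a single record:

```python
import csv


def write_csv(rows, fh, fieldnames):
    """Write {field: value} dicts as CSV to an open text file.

    Open the file with newline="" (as the csv docs require) so line breaks
    inside multi-paragraph fields are written unchanged. QUOTE_ALL wraps
    every value in double quotes; embedded quotes are doubled.
    """
    writer = csv.DictWriter(
        fh,
        fieldnames=fieldnames,
        quoting=csv.QUOTE_ALL,
        restval="",             # a field missing from a page becomes ""
        extrasaction="ignore",  # keys not in fieldnames are dropped
    )
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Call it with a file opened as `open("items.csv", "w", newline="", encoding="utf-8")`. The writer's default line ending is `\r\n`, so on the MySQL side something along the lines of `LOAD DATA LOCAL INFILE 'items.csv' INTO TABLE items CHARACTER SET utf8 FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES` should read it back; the table name `items` is a placeholder.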
jconey (author), posted June 15, 2010
Yes... I agree. Now I need to find a way to do that. Got any suggestions?

Jeff
fenway, posted June 16, 2010
Sorry, I didn't realize at which stage you were stuck. The easiest answer is multi-pass. Write a short script that iterates through all the files in a directory (for example) and just does a basic, dumb import based on file type (or whatever else you know for certain). Then, once you have it in MySQL, you can work your PHP magic to fix it until there's nothing wrong.
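The multi-pass approach described above can be sketched as follows. This is a minimal illustration, not the poster's script: it uses Python's built-in `sqlite3` purely as a stand-in for MySQL so it runs without a server, and the table `raw_pages` and the functions `first_pass`/`later_pass` are hypothetical names. Pass 1 stores each file's raw text untouched; each later pass applies one fix-up function and reports how many rows it changed, so passes can be repeated until nothing changes:

```python
import sqlite3
from pathlib import Path

TEXT_TYPES = (".html", ".htm", ".txt")


def first_pass(db, root):
    """Pass 1: dumb import, one row per file with its raw, untouched text."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS raw_pages ("
        "path TEXT PRIMARY KEY, ext TEXT, body TEXT)"
    )
    count = 0
    for path in sorted(Path(root).rglob("*")):  # walks subfolders too
        if not path.is_file() or path.suffix.lower() not in TEXT_TYPES:
            continue
        # errors="replace" keeps one oddly encoded file from stopping the run.
        body = path.read_text(encoding="utf-8", errors="replace")
        db.execute(
            "INSERT OR REPLACE INTO raw_pages (path, ext, body) VALUES (?, ?, ?)",
            (str(path), path.suffix.lower(), body),
        )
        count += 1
    db.commit()
    return count


def later_pass(db, fix):
    """Pass 2+: apply fix(body) to every row; return how many rows changed."""
    changed = 0
    # Fetch paths first, then one body at a time, to keep memory use low.
    paths = [p for (p,) in db.execute("SELECT path FROM raw_pages")]
    for p in paths:
        (body,) = db.execute(
            "SELECT body FROM raw_pages WHERE path = ?", (p,)
        ).fetchone()
        new = fix(body)
        if new != body:
            db.execute("UPDATE raw_pages SET body = ? WHERE path = ?", (new, p))
            changed += 1
    db.commit()
    return changed
```

Keeping the raw text from pass 1 means a bad fix-up can be corrected and rerun without going back to the CD.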
jconey (author), posted June 17, 2010
I think I found my solution, but it wasn't what I set out looking for; I was looking in the wrong direction. As I searched for a solution, I stumbled on data mining, page scraping and data harvesting. Most of the files I have to work with are .html, so I dug into how to use these methods and struck gold.

First I created an .html file with a link to every one of the files... that was simpler than I thought it would be. Once all the files were "linked" by the new file, I could run Web-Harvest (SourceForge) or any number of other tools available on the web. As soon as the files were all linked, the program treated them as a site, crawled the entire thing and extracted the data I wanted. Web-Harvest took some playing around with to configure, but it worked in the end.

That got me thinking: if you ever run into a website that has the information you need spread all through it, this same tactic would work perfectly. As a matter of fact, that is what these tools were really created for. I'd recommend HTTrack or Web2Disk by Inspyder to capture the website's content, then Web-Harvest to extract the data to a CSV, spreadsheet or whatever you need. All the tools mentioned are available on the web; some are free, and the rest are cheap.

Keep this in mind, it might come in handy some day!

Thank you to all who gave thought to my problem!

Jeff
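The thread doesn't say how the linking page was generated, so here is a minimal sketch of that first step, assuming Python is available; the function name `write_index` and the file name `index_all.html` are hypothetical. It writes one page into the top folder with a relative link to every .html/.htm/.txt file beneath it, which a crawler can then start from:

```python
import html
from pathlib import Path
from urllib.parse import quote


def write_index(root, index_name="index_all.html"):
    """Write one HTML page under root linking to every .html/.htm/.txt file.

    Pointing a crawler at this page lets it reach every file, as if the
    folder were a website. Returns the number of links written.
    """
    root = Path(root)
    index_path = root / index_name
    items = []
    for path in sorted(root.rglob("*")):
        if path == index_path or not path.is_file():
            continue  # never link the index page to itself
        if path.suffix.lower() not in (".html", ".htm", ".txt"):
            continue
        rel = path.relative_to(root).as_posix()
        # quote() percent-encodes spaces etc. for the href; html.escape()
        # makes the visible link text safe.
        items.append('<li><a href="%s">%s</a></li>' % (quote(rel), html.escape(rel)))
    page = (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>All files</title></head>\n<body><ul>\n"
        + "\n".join(items)
        + "\n</ul></body></html>\n"
    )
    index_path.write_text(page, encoding="utf-8")
    return len(items)
```

The index is written into the top folder so the relative links resolve; because a CD is read-only, that presumably means copying the files to a hard drive first.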