kateland Posted January 4, 2007

Hi all,

Let me get right down to it without a lot of fuss :)

OBJECTIVE: Scrape data from a web page, organize it, store in database.
PROGRESS: Stuck at "organizing" --> Multidimensional arrays that need to be sliced, keys reset, etc.
TARGET: Not to kill self trying

So here we go. I have a scrape script that pulls the table I want from a URL and can spit it back at me. Fantastic. I then turn it into an array to store in a MySQL database. Here is the array:

[code]
Array(
 [0] => Array ( [0] => Overview [1] => Games )
 [1] => Array ( )
 [2] => Array ( [0] => Playlist [1] => Level [2] => Games Played [3] => Wins )
 [4] => Array ( [0] => Rumble Pit [1] => 15 )
 [5] => Array ( [1] => 76 [2] => 7 )
 [6] => Array ( )
 [8] => Array ( [0] => Double Team [1] => 20 )
 [9] => Array ( [1] => 188 [2] => 80 )
 [10] => Array ( )
 [12] => Array ( [0] => Team Slayer [1] => 25 )
 [13] => Array ( [1] => 407 [2] => 177 )
 [14] => Array ( )
 [16] => Array ( [0] => Team Skirmish [1] => 19 )
 [17] => Array ( [1] => 533 [2] => 183 )
 [18] => Array ( )
 [20] => Array ( [0] => Team Snipers [1] => 20 )
 [21] => Array ( [1] => 69 [2] => 41 )
 [22] => Array ( )
 [24] => Array ( [0] => Team Hardcore [1] => 14 )
 [25] => Array ( [1] => 71 [2] => 29 )
 [26] => Array ( )
 [28] => Array ( [0] => BTB Skirmish [1] => 27 )
 [29] => Array ( [1] => 356 [2] => 135 )
 [30] => Array ( )
 [32] => Array ( )
 [33] => Array ( )
 [34] => Array ( )
 [35] => Array ( )
 [36] => Array ( [0] => Questions about Stats? Stats Help: Halo 2 and Bungie.net Gamertag Linking: Get additional features! Halo 2 Matchmaking: Matchmaking Unveiled Halo 2 Stats: Ranking Overview Halo 2 Medals: Medal Info )
 [37] => Array ( )
 [39] => Array ( [0] => Games | Stats | Community | Inside Bungie | Bungie Store | Home | [1] => contact us | [2] => help )
 [40] => Array ( [0] => privacy statement | [1] => terms of use | [2] => code of conduct | [3] => jobs )
 [41] => Array ( [0] => © 2006 Microsoft Corporation All rights reserved. [1] => Halo 3 [2] => Halo 2 Xbox [3] => Halo 2 Vista [4] => Last Updated: Halo 2 Vista Home )
 [42] => Array ( [1] => Halo 2 Stats [2] => Playlists [3] => Find Player [4] => Rank System [5] => My Stats )
 [43] => Array ( [1] => Forums [2] => Find Group [3] => Events [4] => Fanclub [5] => Links )
 [44] => Array ( [1] => The Team [2] => Webcams [3] => Bungie History [4] => Last Updated: Inside Bungie Section )
 [45] => Array ( [1] => T-Shirts [2] => Multi-Media [3] => Accessories [4] => Newest Item: The entire store! )
)
[/code]

PROBLEMS:
1. I can't get rid of the empty arrays
2. The data I need is between "rumble pit" and "BTB Skirmish"
3. I need to combine those arrays, e.g.

[code]
[4] => Array ( [0] => Rumble Pit [1] => 15 )
[5] => Array ( [1] => 76 [2] => 7 )
[/code]

needs to be

[code]
[4] => Array ( [0] => Rumble Pit [1] => 15 [2] => 76 [3] => 7 )
[/code]

and so on for the 7 game types.

So my question is... do I try to tweak the scrape script (it's breaking up "Rumble Pit" etc. in the game types due to a nested table) or should I just manipulate this array to heck?

I've spent two days looking up slices, unset functions, combining, user-defined functions... but I'm stuck.

Any guidance would be greatly appreciated! (I did learn that you can't unset an array... how disappointing).

Thanks!
Kate
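For what it's worth, the three problems listed above can be handled on the array side with PHP's built-in array functions. What follows is only a minimal sketch, not a tested fix for the real scrape script. It assumes PHP 7.1 or later, uses a trimmed copy of the array from the post (three of the seven playlists), and assumes every playlist row is immediately followed by its stats row once the empty arrays are gone. The variable names ($scraped, $rows, $wanted, $playlists) are made up for illustration.

```php
<?php
// Trimmed copy of the scraped array from the post (three of the seven playlists).
$scraped = [
    0  => ['Overview', 'Games'],
    1  => [],
    2  => ['Playlist', 'Level', 'Games Played', 'Wins'],
    4  => ['Rumble Pit', '15'],
    5  => [1 => '76', 2 => '7'],
    6  => [],
    8  => ['Double Team', '20'],
    9  => [1 => '188', 2 => '80'],
    10 => [],
    28 => ['BTB Skirmish', '27'],
    29 => [1 => '356', 2 => '135'],
    30 => [],
    39 => ['Games | Stats | Community', 'contact us |', 'help'],
];

// Problem 1: array_filter() without a callback drops "falsy" entries,
// which includes empty arrays; array_values() then resets the keys to 0..n.
// (unset($scraped[1]) also removes a single element; this just does them all.)
$rows = array_values(array_filter($scraped));

// Problem 2: find where the wanted block starts and ends by looking at
// the first cell of each row.
$firstCells = array_map(function ($row) {
    return $row[0] ?? null; // stats rows have no [0] cell
}, $rows);
$start = array_search('Rumble Pit', $firstCells, true);
$end   = array_search('BTB Skirmish', $firstCells, true);

// +2 so the slice also includes the stats row that follows "BTB Skirmish".
$wanted = array_slice($rows, $start, $end - $start + 2);

// Problem 3: walk the rows two at a time (name row, stats row) and merge
// each pair. array_merge() renumbers integer keys, so the result is 0..3.
$playlists = [];
foreach (array_chunk($wanted, 2) as [$nameRow, $statsRow]) {
    $playlists[] = array_merge($nameRow, $statsRow);
}

print_r($playlists);
```

With this sample data the first entry comes out as Rumble Pit, 15, 76, 7, which is the combined shape asked for in problem 3. If the page ever returns a playlist without its stats row, the pairing assumption breaks, so a check on the row count before merging would be a sensible addition.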
hvle Posted January 4, 2007

I think the problem is in how the data is pulled and stored in this array. The array is meaningless because it does not contain consistent data. Think about how you build this array, not how to manipulate it.
kateland (Author) Posted January 5, 2007

Very good point. I'm looking into improving the array in the first place. Happen to know any good web resources for reading and parsing remote files? I've done Google searches this morning and have a couple of starts, but nothing amazing.
hvle Posted January 6, 2007

There is no universal way to parse remote files, or any file. It all depends on the file's flow, consistency, and structure, and some files are not parseable at all. To make a file parseable, people like to use an RSS-style format.
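To illustrate why a structured format is easier to work with, here is a rough sketch using PHP's SimpleXML extension. The XML below is entirely hypothetical: nothing in this thread says the site publishes such a feed, and the element and attribute names (playlists, playlist, name, level, games, wins) are invented for the example. It is also plain XML rather than real RSS.

```php
<?php
// Hypothetical feed contents; the real site is not known to offer this.
$xml = <<<XML
<playlists>
  <playlist name="Rumble Pit" level="15" games="76" wins="7"/>
  <playlist name="Double Team" level="20" games="188" wins="80"/>
</playlists>
XML;

// simplexml_load_string() returns a SimpleXMLElement, or false on bad XML.
$feed = simplexml_load_string($xml);
if ($feed === false) {
    die("Could not parse the feed\n");
}

// Each <playlist> element becomes one consistent row, ready for a database insert.
$stats = [];
foreach ($feed->playlist as $playlist) {
    $stats[] = [
        'name'  => (string) $playlist['name'],
        'level' => (int) $playlist['level'],
        'games' => (int) $playlist['games'],
        'wins'  => (int) $playlist['wins'],
    ];
}

print_r($stats);
```

Because every row has the same named fields, there are no empty arrays to strip and no rows to stitch back together afterwards.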
.josh Posted January 6, 2007

Yeah... an array of random stuff like that is really just as useless as the source page itself. Scraping a remote webpage is more of an art form than a rigid 1-2-3 procedure. You just need to get really good at regex and at finding a consistent pattern in the target source. And even then, you can spend a lot of time perfecting the regex for the scrape and boom! They change their layout the next day. Very frustrating. My first advice is to contact the site and see if they can offer some kind of XML version of their data for you to grab easily. Doesn't hurt to ask.
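As a rough idea of what "finding a consistent pattern" looks like in practice, here is a small preg_match_all() sketch. The HTML is hypothetical, since the real page's markup is not shown in this thread, and it is deliberately flat (no nested table), so the pattern would need adjusting for the actual source. The names $html, $pattern, and $playlists are placeholders.

```php
<?php
// Hypothetical markup standing in for the scraped page.
$html = <<<HTML
<table>
<tr><td>Rumble Pit</td><td>15</td><td>76</td><td>7</td></tr>
<tr><td>Double Team</td><td>20</td><td>188</td><td>80</td></tr>
</table>
HTML;

// One table row = one playlist: a text cell followed by three numeric cells.
// "~" is used as the delimiter so the "/" in closing tags needs no escaping.
$pattern = '~<tr>\s*<td>([^<]+)</td>\s*<td>(\d+)</td>\s*<td>(\d+)</td>\s*<td>(\d+)</td>\s*</tr>~i';

// PREG_SET_ORDER groups the captures per match: $m[1]..$m[4] for each row.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$playlists = [];
foreach ($matches as $m) {
    $playlists[] = [
        'playlist'     => trim($m[1]),
        'level'        => (int) $m[2],
        'games_played' => (int) $m[3],
        'wins'         => (int) $m[4],
    ];
}

print_r($playlists);
```

Header rows and footer links simply fail to match, so there is nothing to clean up afterwards. The downside is the one mentioned above: if the site changes its layout, the pattern stops matching and has to be rewritten.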