fibblax Posted September 22, 2011

Hello,

I am doing (well, trying to do, anyway) a script where I need to follow a link and, through its source (file_get_contents), follow each "category" into its "subcategory" (and sometimes even a sub-subcategory).

Let's say the first item in the menu is called "Catfood" and the second "Dogfood". You click on "Catfood" and get a submenu with, for example, "Whiskas" and "Purina Pro"; you click "Whiskas" and see a list of food called, for example, "Whiskas Junior Chicken" and "Whiskas Junior Fish". After I have followed "Whiskas", I need to go back and follow "Purina Pro". After "Purina Pro", I need to go back to "Dogfood" and do the same for its submenu, sub-submenu and food menu. So yeah, that's pretty much it.

I have already used wget for Windows to download the entire website so I don't put load on it all the time while trying stuff out. I use regex to find the categories, their products and prices, and I've got all that covered. It's just that the website isn't built in a way that makes it easy to use regex to tie, let's say, "Purina Pro" to the "Catfood" category. So I have to go through all the categories and subcategories, save the categories in maybe an array, and bind the subcategories to their main category ("Purina Pro" to "Catfood").

I hope this all doesn't sound too, err, weird lol. Any help is very much appreciated, even just enough to get me started on my own! =)

EDIT: The menu looks a bit like this, though it's originally not about cat or dog food; those are just examples:

Catfood
- Whiskas
- Purina Pro
Dogfood
- Royal Canin
  - Puppy food
  - Grown
  - Senior
- Bozita Robur

Link: https://forums.phpfreaks.com/topic/247669-get-categories-and-subcategories-from-website/

requinix Posted September 22, 2011

And the owners of that website know you're indexing all their products? And gave you explicit permission to do so?

fibblax Posted September 22, 2011

Ah, knew I should have mentioned that. It is my friend's website, so yes, he did give me permission to do it. Also, it's only for educational purposes; the info won't be used in any way. After I get through it, the files may either just lie in a folder somewhere on my computer and rot until I need parts of the script for another project, or be deleted. *shrugs*

Thanks for pointing it out though!

requinix Posted September 22, 2011

Well, the "proper" way would be for his site to expose an API that gives a list of categories, products, and prices. It could be simple XML output like ... Stuff like that is very easy to generate.

If you're thinking of screen scraping specifically, load the HTML into something like a DOMDocument, and traverse the DOM as if it was regular HTML on a webpage. That includes finding stuff by ID or tag name, child nodes, and even XPath expressions for more complicated stuff.
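
A minimal sketch of the DOMDocument/XPath approach described above. The HTML fragment, the page names and the helper name `extractMenuLinks` are made up for illustration; the real page's markup will differ, so the XPath expression would likely need adjusting.

```php
<?php
// Hypothetical menu fragment; the real page's markup will differ.
$html = '<table><tr><td>'
      . '<a href="catfood.asp">Catfood</a><br>'
      . ' - <a href="whiskas.asp">Whiskas</a><br>'
      . ' - <a href="purina.asp">Purina Pro</a>'
      . '</td></tr></table>';

/**
 * Return every link found inside a table cell
 * as ['href' => ..., 'text' => ...].
 */
function extractMenuLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//td//a[@href]') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

print_r(extractMenuLinks($html));
```

Compared with regex, this keeps working when attribute order or whitespace changes, though it is still tied to the page's overall structure.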

Psycho Posted September 22, 2011

If this is your friend's site, it would be a pretty trivial task for your friend to create a web service to allow you to get the full list of categories/subcategories. But let's say you are only doing this as an educational exercise in how to screen-scrape the data from a web page. As stated above, you may need to obtain permission first.

Based on your explanation, it is not clear "how" the subcategories are getting displayed. Is the menu system a JavaScript-controlled thing, with all the data in the current page? If so, it may be easy or difficult (even impossible) to differentiate the categories/subcategories. However, if the subcategories are displayed on a page refresh after selecting a category, then you could do this using cURL.

In either case, you need to analyze the layout of how categories/subcategories are constructed and build the logic to decipher it. That means your code will be very "fragile" and can break any time the site owner changes the content or structure.

fibblax Posted September 22, 2011

No, the menu is simply written with <br>s inside a table:

<TD>
<BR><a href="products.asp" title="" class="">PRODUCTS</a> (top menu)
<BR><BR> - <a href="kits.asp" title="kits" class="">KITS</a> (submenu)
<BR> - <a href="CLASHES.asp" title="clashes" class="">CLASHES</a> (submenu)
<BR> - <a href="WALLS.asp" title="walls" class="">WALLS</a> (submenu)
<BR><BR> - <a href="JOLT.asp" title="" class="">JOLT</a> (submenu of WALLS)
... ... ... ...
</TR>

And thank you both for answering.

fibblax Posted September 22, 2011

Oh, and yes, sorry: the data is in the current page of the selected category/subcategory.
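
One way to bind subcategories to their parent, sketched under a big assumption: each category page shows the menu entries that were already visible plus its own submenu, so any link a page reveals for the first time is treated as a child of that page. The page names, the `$pages` stand-in data and the function name `buildCategoryTree` are all hypothetical, and the example assumes that "Puppy food", "Grown" and "Senior" sit under "Royal Canin".

```php
<?php
// Stand-in for "read the downloaded page and extract its menu links".
// Each hypothetical page maps to the links visible on it (href => text).
$top   = ['catfood.asp' => 'Catfood', 'dogfood.asp' => 'Dogfood'];
$cat   = $top + ['whiskas.asp' => 'Whiskas', 'purina.asp' => 'Purina Pro'];
$dog   = $top + ['royal.asp' => 'Royal Canin', 'bozita.asp' => 'Bozita Robur'];
$royal = $dog + ['puppy.asp' => 'Puppy food', 'grown.asp' => 'Grown', 'senior.asp' => 'Senior'];

$pages = [
    'index.asp'   => $top,
    'catfood.asp' => $cat,   'whiskas.asp' => $cat,   'purina.asp' => $cat,
    'dogfood.asp' => $dog,   'bozita.asp'  => $dog,
    'royal.asp'   => $royal, 'puppy.asp'   => $royal,
    'grown.asp'   => $royal, 'senior.asp'  => $royal,
];

/**
 * Build a nested array of category names, depth first.
 * $fetchLinks takes a page name and returns its links as href => text.
 * $seen holds every href already assigned to a parent.
 */
function buildCategoryTree(string $page, callable $fetchLinks, array &$seen): array
{
    // Links on this page that no earlier page showed are its children.
    $children = array_diff_key($fetchLinks($page), $seen);

    // Mark them all as seen before recursing, so that siblings
    // are not mistaken for children of one another.
    $seen += $children;

    $tree = [];
    foreach ($children as $href => $text) {
        $tree[$text] = buildCategoryTree($href, $fetchLinks, $seen);
    }
    return $tree;
}

$fetchLinks = function (string $page) use ($pages): array {
    return $pages[$page] ?? [];
};

$seen = ['index.asp' => true];
$tree = buildCategoryTree('index.asp', $fetchLinks, $seen);
print_r($tree);
```

With the files already downloaded by wget, `$fetchLinks` could instead read the local copy of each page and return the menu links found in it. If the site shows the whole menu on every page, this heuristic no longer applies and the nesting would have to be read from the markup itself.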