L1GH7 Posted February 6, 2018
Hello all, I'm not new to coding, but I'm pretty new to PHP. What I need to do is scrape A.php for a link to B.php, and then scrape B.php for a link to C.php so I can then scrape data from C.php and push said data out to X.php, because A.php is dynamic. Here is an example of my current code, with which I am obtaining the desired link to B.php from A.php:

    <?php
    require('simple_html_dom.php');

    $html = file_get_html('http://test.com/');
    $i = 1;
    foreach ($html->find('tr') as $desiredItem) {
        if ($i > 2) {
            break;
        }
        // Find link element
        $desiredItemDetails = $desiredItem->find('a.tag', 0);
        // Get href attribute
        $desiredItemUrl = 'test.com/' . $desiredItemDetails->href;
        $i++;
    }
    echo ($desiredItemUrl);
    ?>

I've tried re-initializing $html by passing $desiredItemUrl through file_get_html. This doesn't seem to work, even if I call it as a string. Is this not possible? Is there simply an easier/more efficient way of doing this? Any help is greatly appreciated. Thanks!
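One likely reason the second file_get_html call fails is that $desiredItemUrl is built as 'test.com/...' with no scheme, so PHP treats it as a local file path rather than a URL. The following is a minimal sketch of the link-extraction step with the scheme kept, not a definitive fix. It assumes the href is relative to the site root (as the concatenation in the post implies) and uses PHP's built-in DOMDocument instead of simple_html_dom so that it is self-contained. The function name firstTaggedLink and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper (not from the thread): returns the absolute URL of the
// first <a class="tag"> found inside a table row, or null if there is none.
// ASSUMPTION: relative hrefs are relative to the site root given in $baseUrl.
function firstTaggedLink(string $html, string $baseUrl): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match <a> elements whose class list contains the word "tag".
    $xpath = new DOMXPath($doc);
    $query = '//tr//a[contains(concat(" ", normalize-space(@class), " "), " tag ")]';
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $href = $nodes->item(0)->getAttribute('href');
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }
    // Keep the scheme from $baseUrl so the result can be fetched as a URL.
    return rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
}

// Sample markup standing in for A.php (made up for illustration).
$sample = '<table><tr><td><a class="tag" href="b.php?id=7">Client</a></td></tr></table>';
echo firstTaggedLink($sample, 'http://test.com'), "\n";
```

To follow the chain, the returned URL could be passed to file_get_contents (or file_get_html) and the same kind of lookup repeated on the next page, checking for null at each step before fetching.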
requinix Posted February 6, 2018
Are all these pages on your site? Why are there so many pages involved that are scraping each other? Why can't X.php do everything?
L1GH7 Posted February 6, 2018
Thanks for the quick reply. A.php and B.php are on my site, but the link I need to traverse to in B.php is actually on an external domain I have no control over. And X.php is actually X.html right now, as I'm unsure how to go about this. Because A.php is dynamic, I was under the impression I would need to begin from there.
requinix Posted February 6, 2018
So I'm about 95% sure that this process you're describing is either way convoluted or flat out the wrong approach to this. Can you be more precise than these A/B/C/X.php files and scraping links and pushing data? Basically, if you control A.php and B.php then there's no reason why B should have to scrape anything from A - you could just copy the code or logic or whatever that A is using into B. B gets what it needs naturally. Not sure what C or X are supposed to be.
L1GH7 Posted February 6, 2018
Ok, my apologies. The main page (A.php) is essentially a dynamic table which lists ranked clients 1 through n. I'm simply grabbing the first client's link in the table (which may be different on any given day). This link leads to another page which houses bio/information on the client (B.php). On this page, there is an external link (C.php) from which I need to retrieve a name and profile image so that I may display them on a greeting/information page (X). I need data from the external link, but only if the external link corresponds to the rank 1 client found on A.php. I hope this is more precise, thanks for your time.
requinix Posted February 6, 2018
A.php has the logic to list clients. You control this page so you can find that logic and replicate it in X.php in order to get the first client in that list. B.php has the logic to display information. You control this page so you can find that logic and replicate it in X.php in order to get that external link. I don't know if you control C.php. If so then I'm sure you can guess what I would say. X.php doesn't have to do any scraping from any pages that you control because you can just copy the logic driving each page. So as far as I'm concerned the only unanswered issue is with C.php...
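For the one page that is outside the poster's control (C.php), a rough sketch of pulling a name and profile image out of fetched HTML might look like the following. The thread never shows the external page's markup, so this assumes, purely for illustration, that the page exposes Open Graph meta tags (og:title and og:image). The function name extractProfile and the sample HTML are hypothetical, and the XPath would need adjusting to the real page.

```php
<?php
// Hypothetical helper (not from the thread): extracts a display name and an
// avatar URL from an HTML string.
// ASSUMPTION: the page carries Open Graph <meta> tags; the real external
// page may use entirely different markup.
function extractProfile(string $html): array
{
    $profile = ['name' => null, 'image' => null];
    if ($html === '') {
        return $profile; // nothing to parse
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $map = ['name' => 'og:title', 'image' => 'og:image'];
    foreach ($map as $key => $property) {
        // Read the content attribute of the matching <meta> element, if any.
        $nodes = $xpath->query('//meta[@property="' . $property . '"]/@content');
        if ($nodes !== false && $nodes->length > 0) {
            $profile[$key] = $nodes->item(0)->nodeValue;
        }
    }
    return $profile;
}

// Sample markup standing in for the external page (made up for illustration).
$sample = '<html><head>'
    . '<meta property="og:title" content="Jane Doe">'
    . '<meta property="og:image" content="http://example.com/avatar.png">'
    . '</head><body></body></html>';
print_r(extractProfile($sample));
```

In X.php the HTML string would come from fetching the external link, with a check that the fetch succeeded before parsing, since a remote site can be slow, unavailable, or change its markup at any time.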
L1GH7 Posted February 7, 2018
Ok, I see what you're saying, but after looking into the main HTML and PHP files, not only are they extremely confusing, but there is a database that's being read from, a common.php, and a bunch of other includes which all seem to require each other. So I've essentially been trying to replicate the entire site in a subdirectory for testing, fixing error after error, and it seems heavily cumbersome when all that's needed is a name and an avatar. I guess I still don't know if it's just a matter of me being inexperienced with PHP, or if it really would be better to have some kind of method of traversing through a few pages and grabbing what I need.
requinix Posted February 7, 2018
Scraping your own site is definitely not the right answer. I can tell you that right now. As much of a burden as it may be, learning how your site works is the best thing you can do. You should do it regardless of this project.
L1GH7 Posted February 7, 2018
Gotcha. I'll keep you posted, and return if I run into any issues. Thanks again for your help thus far!