carstorm Posted December 6, 2014
This is an example of the kind of page I'm referring to: http://www.curse.com/mc-mods/minecraft/railcraft#t1:other-downloads
I want to get the file name and compare it to another name to see whether they are the same. I have no idea how to access that page from my website. I'm guessing PHP would be best, but if you disagree, tell me what you think would be better (I was thinking JavaScript, but I don't think that works cross-site). Is there a PHP function to get the text of the link (the href), which in this case (as of this posting) is "Railcraft 9.4.0.0"? I used Inspect Element to look at the code of the page, but I have no idea whether this is possible or whether I'm even going about this the right way. The code would have to work on the entire website (all pages have the same layout as this one, though). Any help is greatly appreciated!
carstorm Posted December 6, 2014
After some more digging I still haven't found an answer. However, based on the structure of the markup, I think it might be easier to get the value from http://www.curse.com/mc-mods/minecraft/railcraft where it says "Newest File: Railcraft 9.4.0.0", though I could be wrong. What do you think?
Alex_ Posted December 6, 2014
You could send a GET request from your PHP script to fetch the entire HTML of that page, and then use a DOM parser to extract the value for you. See, for example, http://simplehtmldom.sourceforge.net/
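The fetch-then-parse approach described above can be sketched with PHP's bundled DOM extension rather than the library linked in the post. This is only a sketch under assumptions: the sample markup stands in for the downloaded page and is modelled on the `<li class="newest-file">` element quoted later in this thread, `newestFileText()` is a hypothetical helper name, and the type hints assume PHP 7.1 or later.

```php
<?php
// Sample markup standing in for the downloaded page (assumed structure).
$html = '<html><body><ul>'
      . '<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>'
      . '</ul></body></html>';

/**
 * Return the text of the first <li> carrying the class "newest-file",
 * or null when no such element exists. Hypothetical helper.
 */
function newestFileText(string $html): ?string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "newest-file" as a whole class token, even if other classes are present.
    $nodes = $xpath->query(
        '//li[contains(concat(" ", normalize-space(@class), " "), " newest-file ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo newestFileText($html), "\n"; // Newest File: Railcraft 9.4.0.0
```

With a real page, `$html` would come from the GET request; a parser like this tends to survive small markup changes better than fixed character offsets.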
QuickOldCar Posted December 7, 2014
It's entirely possible to scrape sites, but you should get their permission. They have the CurseForge API; that's the better way to do this. http://www.curseforge.com/projects/
carstorm Posted December 7, 2014
> It's entirely possible to scrape sites, but you should get their permission. They have the curseforge api , That's the better way to do this. http://www.curseforge.com/projects/
I never knew Curse had an API, so I will look into that. However, I also need to get data from other sites, so I need a cross-site method; I'm starting with CurseForge since that is where most of the info is. Maybe it would help to know my end goal. When everything is set up, it will compare the versions in a list that you give it against the most recent released versions and tell you which mods need updating. My goal is to help people who have many mods (like me, at 200+) easily know when to update which mods. Right now, if a mod doesn't inform you when you first launch the game, you have no way to know without checking manually, which is time-consuming and tedious if you have many mods.
WinstonLA Posted December 7, 2014
You can also use file_get_contents() (if the allow_url_fopen setting is enabled in the server configuration) plus a RegExp.
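A minimal sketch of the file_get_contents() half of that suggestion, with the allow_url_fopen check made explicit. `fetchPage()` is a hypothetical helper name, not a PHP built-in, and the demonstration reads a temporary local file so that it runs without network access; the type hints assume PHP 7.1 or later.

```php
<?php
/**
 * Fetch a URL or local path and return its contents, or null on failure.
 * fetchPage() is a hypothetical helper, not part of PHP.
 */
function fetchPage(string $location): ?string
{
    $isRemote = (bool) preg_match('#^https?://#i', $location);
    // file_get_contents() can only open http(s) URLs when allow_url_fopen is enabled.
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }
    // Suppress the warning; failure is reported through the return value instead.
    $contents = @file_get_contents($location);
    return $contents === false ? null : $contents;
}

// Demonstration with a temporary local file standing in for a page.
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>');
echo fetchPage($tmp) !== null ? "fetched\n" : "failed\n";
echo fetchPage($tmp . '.missing') === null ? "missing file handled\n" : "unexpected\n";
unlink($tmp);
```

For a real page the call would be `fetchPage('http://www.curse.com/mc-mods/minecraft/railcraft')`; checking for null before parsing avoids running string functions on a failed download.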
carstorm Posted December 7, 2014
> You could just send a GET request from your php script which gets the entire dom object from that page, and then use a dom-parser that can extract it for you. See example http://simplehtmldom.sourceforge.net/
Though this looked good at first glance, the project no longer seems to be downloadable from SourceForge.
> Also you can use file_get_contents() (if server configuration enabled allow_url_fopen setting) + RegExp
This is what I'm probably going to use. Right now it simply returns the entire page, as if I were looking at it on the original site. I'm guessing there is a way to search for specific content on the page using RegExp. How would I do this? I think finding the line
    <li class="newest-file">Newest File: Railcraft 9.4.0.0</li>
would be the best way?
carstorm Posted December 7, 2014
OK, this is what I've got so far. It's not erroring, but it's also not displaying anything. For now I'm just echoing to make sure it's working; I'm only really interested in the last line, which should be "9.4.0.0" as of right now!
    <?php
    $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
    $posStart = strpos($page, "Newest File: ");
    echo "Start " $posStart;
    $posStop = strpos($page, "<",$startPos);
    echo "Stop " $posStop;
    $version = substr($posStart,$posStop);
    echo $version;
    ?>
WinstonLA Posted December 7, 2014 (edited)
    $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
    preg_match('#<li class="newest-file">(.*?)</li>#is', $page, $version);
    echo '<pre>' . htmlspecialchars(print_r($version, 1)) . '</pre>';
The result is:
    Array
    (
        [0] => <li class="newest-file">Newest File: Railcraft 9.4.0.0</li>
        [1] => Newest File: Railcraft 9.4.0.0
    )
Or like this, if you need only the version number:
    preg_match('#<li class="newest-file">.*([0-9\.]++).*</li>#Uis', $page, $versionNum);
    echo $versionNum[1]; // 9.4.0.0
Edited December 7, 2014 by WinstonLA
carstorm Posted December 7, 2014
Sorry for the triple post, but I can't edit posts. I had errors in the concatenation and one variable name was wrong. Now it shows everything but the version. The output is "Start 22228Stop 22258version". This is the current code:
    <?php
    $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
    $posStart = strpos($page, "Newest File: ");
    echo "Start " . $posStart;
    $posStop = strpos($page, "<",$posStart);
    echo "Stop " . $posStop;
    $version = substr($posStart,$posStop);
    echo "version " . $version;
    ?>
carstorm Posted December 7, 2014
>     $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
>     preg_match('#<li class="newest-file">(.*?)</li>#i', $page, $version);
>     echo '<pre>' . htmlspecialchars(print_r($version, 1)) . '</pre>';
> The result is
>     Array
>     (
>         [0] => <li class="newest-file">Newest File: Railcraft 9.4.0.0</li>
>         [1] => Newest File: Railcraft 9.4.0.0
>     )
Sorry, I was writing my last post and didn't see this until after I posted mine. Just wondering, though: what makes your way different from mine (other than, obviously, the code)?
WinstonLA Posted December 7, 2014
The string functions you're using will perform better than the RegExp I used xD
carstorm Posted December 7, 2014 (edited)
I just looked at http://php.net/manual/en/function.preg-match.php but didn't really understand it. In the end I need a string (it can be one of the array elements) that has a value of exactly "9.4.0.0". How would I go about this? Could you either explain the code or link me to a good source of info on preg_match? Thank you!
Edited December 7, 2014 by carstorm
WinstonLA Posted December 7, 2014 (edited)
> I just looked at http://php.net/manua....preg-match.php but didn't really understand it
Regular expressions are magic; not everyone can understand them xD
> In the end I need a string (it can be one of the array elements)
You can access a string in the array by specifying its key (index). No problem.
> link me to a good source of info on preg_match
What you need to learn first is not preg_match() but regular expressions. A good book about them: http://shop.oreilly.com/product/9780596528126.do
Edited December 7, 2014 by WinstonLA
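Since an explanation of the pattern was requested, here is one possible annotated version that captures only the version number. It is a sketch, not the pattern posted earlier in the thread: it assumes the version is a run of digits separated by at least one dot, and the sample string stands in for the downloaded page.

```php
<?php
// Sample markup standing in for the downloaded page.
$page = '<ul><li class="newest-file">Newest File: Railcraft 9.4.0.0</li></ul>';

// The pattern, piece by piece:
//   <li class="newest-file">   the literal opening tag
//   [^<]*?                     any text without "<", as little as possible (label and mod name)
//   (\d+(?:\.\d+)+)            capture group 1: digits separated by dots, e.g. 9.4.0.0
//   [^<]*</li>                 any remaining text up to the closing tag
$pattern = '#<li class="newest-file">[^<]*?(\d+(?:\.\d+)+)[^<]*</li>#i';

// preg_match() returns 1 on a match and fills $matches:
// index 0 is the whole match, index 1 is the first capture group.
if (preg_match($pattern, $page, $matches) === 1) {
    $version = $matches[1];
    echo $version, "\n"; // 9.4.0.0
} else {
    echo "No version found\n";
}
```

File names that contain more than one dotted number would return the first one, so a pattern like this may need adjusting per mod.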
Barand Posted December 7, 2014
You should read the manual for substr():
    $version = substr($page, $posStart, $posStop-$posStart);
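To make the substr() argument order concrete: the first argument is the string itself, the second is the start offset, and the third is a length, not an end position. A minimal sketch, assuming the marker text quoted in this thread; `textAfterMarker()` is a hypothetical helper name and the sample string stands in for the downloaded page.

```php
<?php
// Sample markup standing in for the downloaded page.
$page = '<ul><li class="newest-file">Newest File: Railcraft 9.4.0.0</li></ul>';

/**
 * Return the text between $marker and the next "<", or null if either is missing.
 * Hypothetical helper for illustration.
 */
function textAfterMarker(string $page, string $marker): ?string
{
    $posStart = strpos($page, $marker);
    if ($posStart === false) {       // strict check: position 0 is a valid match
        return null;
    }
    $posStart += strlen($marker);    // skip past the marker itself
    $posStop = strpos($page, '<', $posStart);
    if ($posStop === false) {
        return null;
    }
    // substr(string, start, length): the length is stop minus start.
    return substr($page, $posStart, $posStop - $posStart);
}

echo textAfterMarker($page, 'Newest File: '), "\n"; // Railcraft 9.4.0.0
```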
Barand Posted December 7, 2014
> String functions that you're using will be more performance than RegExp used me xD
Running both methods 100 times gave me:
    STRPOS: 0.002000 sec
    REGEX : 0.004000 sec
WinstonLA Posted December 7, 2014
@Barand Thanks for the benchmark.
carstorm Posted December 7, 2014
> Running both methods 100 times gave me STRPOS: 0.002000 sec REGEX : 0.004000 sec
So does this mean I should use strpos for the best performance, since in the end this will be running on hundreds, if not thousands, of pages at once?
carstorm Posted December 7, 2014
> You should read the manual for substr()
>     $version = substr($page, $posStart, $posStop-$posStart);
I read that but misunderstood it the first time. Thanks for clearing that up.
Barand Posted December 7, 2014
> So does this mean I should use strpos since this will be running on 100s if not 1000s of pages at once in the end when it's done for the best performance.
That is how I would interpret the results that I got. Regex is powerful but slow compared to native string functions.
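The benchmark code itself was not posted, so the following is only a sketch of how such a comparison might be run, not the script that produced the numbers above. The function names, the filler markup, and the run count are assumptions, and timings will vary by machine and PHP version.

```php
<?php
// Synthetic page: filler markup around the element of interest (assumed structure).
$page = str_repeat('<div>filler</div>', 2000)
      . '<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>'
      . str_repeat('<div>filler</div>', 2000);

// Extraction with native string functions.
function viaStrpos(string $page): ?string
{
    $marker = 'Newest File: ';
    $start = strpos($page, $marker);
    if ($start === false) {
        return null;
    }
    $start += strlen($marker);
    $stop = strpos($page, '<', $start);
    return $stop === false ? null : substr($page, $start, $stop - $start);
}

// Extraction with a regular expression.
function viaRegex(string $page): ?string
{
    $found = preg_match('#<li class="newest-file">Newest File: (.*?)</li>#is', $page, $m);
    return $found === 1 ? $m[1] : null;
}

// Run $fn on $page $runs times and return the elapsed wall-clock seconds.
function timeIt(callable $fn, string $page, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($page);
    }
    return microtime(true) - $start;
}

printf("STRPOS: %.6f sec\n", timeIt('viaStrpos', $page, 100));
printf("REGEX : %.6f sec\n", timeIt('viaRegex', $page, 100));
```

Going by the figures quoted in the thread, the gap works out to about 0.00002 sec per page, which is likely small next to the time spent downloading each page.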
carstorm Posted December 7, 2014 (edited)
OK, I've got working code with a final value, "version", that I can use:
    <?php
    echo nl2br("\nRailcraft",false);
    $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
    $posStart = strpos($page, "Newest File: ")+23;
    echo nl2br("\nStart " . $posStart,false);
    $posStop = strpos($page, "<",$posStart);
    echo nl2br("\nStop " . $posStop,false);
    $version = substr($page, $posStart, $posStop-$posStart);
    echo nl2br("\nversion " . $version,false);

    echo nl2br("\nTinker's Construct",false);
    $page = file_get_contents("http://www.curse.com/mc-mods/minecraft/tinkers-construct");
    $posStart = strpos($page, "Newest File: ")+31;
    echo nl2br("\nStart " . $posStart,false);
    $posStop = strpos($page, "<",$posStart)-4;
    echo nl2br("\nStop " . $posStop,false);
    $version = substr($page, $posStart, $posStop-$posStart);
    echo nl2br("\nversion " . $version,false);
    ?>
However, for every entry there are three things I will have to know that will be different: page, posStart, and posStop. I was wondering what would be the best way to organise these. I was thinking of some kind of array, but I have no idea whether that's possible or whether it's even the best solution. I would need something such as this (using the two examples above):
    Railcraft
        page = http://www.curse.com/mc-mods/minecraft/railcraft
        posStartModifier = 23
        posStopModifier = -0
    Tinker's Construct
        page = http://www.curse.com/mc-mods/minecraft/tinkers-construct
        posStartModifier = 31
        posStopModifier = -4
If you think I should start a new thread, let me know, since my original question is solved.
Edited December 7, 2014 by carstorm
carstorm Posted December 7, 2014
I think this achieves what I'm trying to do, unless anyone else has a different or better idea:
    $mods = array (
        array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
        array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
    );
Barand Posted December 7, 2014
As well as the array, use a function instead of repeating the code every time:
    function getMod($mod)
    {
        $page = file_get_contents($mod[1]);
        $posStart = strpos($page, "Newest File: ") + $mod[2];
        $posStop = strpos($page, "<",$posStart) + $mod[3];
        $version = substr($page, $posStart, $posStop-$posStart);
        $output = "<br>{$mod[0]}<br>version $version<br>";
        return $output;
    }

    $mods = array (
        array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
        array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
    );

    foreach ($mods as $mod) {
        echo getMod($mod);
    }
carstorm Posted December 7, 2014
Yeah, I had plans to turn it into a function eventually. What if I only wanted to get info on one mod, though? I tried passing in "RailCraft" but nothing showed up.
    <?php
    function getMod($mod)
    {
        $page = file_get_contents($mod[1]);
        $posStart = strpos($page, "Newest File: ") + $mod[2];
        $posStop = strpos($page, "<",$posStart) + $mod[3];
        $version = substr($page, $posStart, $posStop-$posStart);
        $output = "<br>{$mod[0]}<br>version $version<br>";
        return $output;
    }

    $mods = array (
        array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
        array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
    );

    echo getMod("RailCraft")
    ?>
Barand Posted December 7, 2014
The way it is at the moment, you need to pass the array:
    echo getMod( array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0) );
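One possible way to request a single mod by name is to key the configuration array by mod name and look the entry up before calling getMod(). This is a sketch, not part of the code posted in the thread: `findMod()`, `toPositional()`, and the key names `url`, `startOffset`, and `stopOffset` are hypothetical, and the type hints assume PHP 7.1 or later.

```php
<?php
// Configuration keyed by mod name; key names are assumptions for illustration.
$mods = array(
    'RailCraft' => array(
        'url' => 'http://www.curse.com/mc-mods/minecraft/railcraft',
        'startOffset' => 23,
        'stopOffset' => 0,
    ),
    "Tinker's Construct" => array(
        'url' => 'http://www.curse.com/mc-mods/minecraft/tinkers-construct',
        'startOffset' => 31,
        'stopOffset' => -4,
    ),
);

/**
 * Find a mod's settings by name, ignoring case; returns null when unknown.
 * Hypothetical helper.
 */
function findMod(array $mods, string $name): ?array
{
    foreach ($mods as $key => $settings) {
        if (strcasecmp($key, $name) === 0) {
            return array('name' => $key) + $settings;
        }
    }
    return null;
}

/**
 * Convert the keyed form into the positional array used by the
 * getMod() function shown earlier in the thread. Hypothetical helper.
 */
function toPositional(array $mod): array
{
    return array($mod['name'], $mod['url'], $mod['startOffset'], $mod['stopOffset']);
}

$mod = findMod($mods, 'railcraft');
if ($mod !== null) {
    // With getMod() from the thread defined: echo getMod(toPositional($mod));
    echo $mod['name'], ' -> ', $mod['url'], "\n";
} else {
    echo "Unknown mod\n";
}
```

Keying by name makes the "one mod" case a lookup, while `foreach ($mods as $name => $settings)` still covers the full list.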