Here is an example of the kind of page I'm referring to: http://www.curse.com/mc-mods/minecraft/railcraft#t1:other-downloads

 

I want to get the file name and compare it to another name (to see whether they are the same). I have no idea how to access that page from my website. I'm guessing PHP would be best, but if you disagree, tell me what you think would be better (I was thinking of JavaScript, but I don't think that works cross-site). Is there a PHP function to get the value of the link, which in this case (as of this posting) is "Railcraft 9.4.0.0"?

 

I used Inspect Element to see the page's code, but I have no idea whether this is possible or whether I'm even going about it the right way. The code would have to work on this entire website (all pages have the same layout as this one, though). Any help is greatly appreciated!

 

 


After doing some more digging I still haven't found an answer. However, based on the structure of the markup, I thought it might be easier to get the value from http://www.curse.com/mc-mods/minecraft/railcraft, where it says "Newest File: Railcraft 9.4.0.0", though I could be wrong. What do you think?

It's entirely possible to scrape sites, but you should get their permission.

 

They have the CurseForge API. That's the better way to do this.

http://www.curseforge.com/projects/

I never knew Curse had an API, so I will look into that. However, I also need to get data from other sites, so I need a cross-site method; I'm starting with CurseForge since that is where most of the info is.

 

Maybe it would help to know my end goal. When everything is set up, it will compare the versions in a list that you give it against the most recent released versions and tell you which mods need updating. My goal is to help people who have many mods (like me, at 200+) easily know when to update which ones. Right now, if a mod doesn't inform you when you first launch the game, you have no way of knowing without checking manually, which is time-consuming and tedious if you have many mods.
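The comparison step described here could be sketched with PHP's built-in version_compare(). This is only a rough illustration: the function name findOutdatedMods, the "Example Mod" entry, and the installed version numbers are made up for the example, and it assumes every version is a plain dotted string like "9.4.0.0".

```php
<?php
// Hypothetical helper: returns the names of mods whose installed version
// is older than the latest known version.
function findOutdatedMods(array $installed, array $latest): array
{
    $outdated = [];
    foreach ($installed as $name => $installedVersion) {
        // Skip mods for which no latest version could be determined.
        if (!isset($latest[$name])) {
            continue;
        }
        // version_compare() compares dotted version strings segment by segment.
        if (version_compare($installedVersion, $latest[$name], '<')) {
            $outdated[] = $name;
        }
    }
    return $outdated;
}

// Example data; the installed versions are invented for illustration.
$installed = ['Railcraft' => '9.3.1.0', 'Example Mod' => '1.2.0'];
$latest    = ['Railcraft' => '9.4.0.0', 'Example Mod' => '1.2.0'];

print_r(findOutdatedMods($installed, $latest));
```

Keeping both lists keyed by mod name means the scraping code only has to fill in $latest; the comparison does not care where the numbers came from.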

You could send a GET request from your PHP script to fetch the entire HTML of that page, and then use a DOM parser to extract the value for you. See, for example, http://simplehtmldom.sourceforge.net/

 

Though this looked good at first glance, the project no longer seems to be downloadable on SourceForge.
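If a third-party parser isn't available, the DOM extension that ships with most PHP builds can probably do the same job. The sketch below is not the simplehtmldom API; it uses DOMDocument and DOMXPath instead, and the function name findNewestFileText is made up. It assumes the page marks the value with class="newest-file", as quoted later in this thread.

```php
<?php
// Returns the text of the first <li> whose class list contains "newest-file",
// or null when no such element exists.
function findNewestFileText(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//li[contains(concat(" ", normalize-space(@class), " "), " newest-file ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup copied from the line quoted in this thread.
$sample = '<ul><li class="newest-file">Newest File: Railcraft 9.4.0.0</li></ul>';
echo findNewestFileText($sample), "\n";
```

A parser-based lookup like this keeps working if the site changes whitespace or attribute order, but it will still break if the class name changes.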

You can also use file_get_contents() (if the allow_url_fopen setting is enabled in the server configuration) plus a regular expression.
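A minimal sketch of the fetching half, assuming you want a null result rather than a warning when the page can't be read. The function name fetchPage, the 10-second timeout, and the user agent string are arbitrary choices for the example, not anything the site requires.

```php
<?php
// Hypothetical wrapper around file_get_contents() that returns null on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    // Remote URLs only work when allow_url_fopen is enabled in php.ini.
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }
    // The "http" options apply to http:// and https:// URLs only.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ModVersionChecker/0.1', // made-up identifier
        ],
    ]);
    // @ suppresses the warning; failure is reported through the return value.
    $page = @file_get_contents($url, false, $context);
    return $page === false ? null : $page;
}

// Usage (performs a real network request, so it is left commented out):
// $page = fetchPage('http://www.curse.com/mc-mods/minecraft/railcraft');
```

Checking for null before parsing avoids passing false into strpos() or preg_match() further down.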

This is probably what I'm going to use. Right now it simply returns the entire page, as if I were looking at it on the original site. I'm guessing there is a way to search for specific content on the page using RegExp. How would I do this? I think finding the line "<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>" would be the best way?

OK, this is what I've got so far. It's not erroring, but it's also not displaying anything. For now I'm just echoing to make sure it's working; I'm only interested in the last line, which should be "9.4.0.0" as of right now!

 

<?php
$posStart = strpos($page, "Newest File: ");
echo  "Start " $posStart;
$posStop = strpos($page, "<",$startPos);
echo  "Stop " $posStop;
$version = substr($posStart,$posStop);
echo $version;
?>
$page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
preg_match('#<li class="newest-file">(.*?)</li>#is', $page, $version);
echo '<pre>' . htmlspecialchars(print_r($version, 1)) . '</pre>';

The result is

Array
(
    [0] => <li class="newest-file">Newest File: Railcraft 9.4.0.0</li>
    [1] => Newest File: Railcraft 9.4.0.0
)

Or like this

preg_match('#<li class="newest-file">.*([0-9\.]++).*</li>#Uis', $page, $versionNum);
echo $versionNum[1]; //9.4.0.0
If you need only version number
Edited by WinstonLA

Sorry for the triple post, but I can't edit posts :( I had errors in concatenation and one variable was wrong. Now it shows everything but the version. The output is "Start 22228Stop 22258version". This is the current code:

<?php
$page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
$posStart = strpos($page, "Newest File: ");
echo  "Start " . $posStart;
$posStop = strpos($page, "<",$posStart);
echo  "Stop " . $posStop;
$version = substr($posStart,$posStop);
echo "version " . $version;
?>
$page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
preg_match('#<li class="newest-file">(.*?)</li>#i', $page, $version);
echo '<pre>' . htmlspecialchars(print_r($version, 1)) . '</pre>';

The result is

Array
(
    [0] => <li class="newest-file">Newest File: Railcraft 9.4.0.0</li>
    [1] => Newest File: Railcraft 9.4.0.0
)

 

Sorry, I was writing my last post and didn't see yours until after I had posted mine. Just wondering, though: what makes your way different from mine (other than, obviously, the code :) )?

I just looked at http://php.net/manual/en/function.preg-match.php but didn't really understand it. In the end I need a string (it can be one of the array elements) with a value of exactly "9.4.0.0". How would I go about this? Could you either explain the code or link me to a good source of info on preg_match? Thank you!

Edited by carstorm

I just looked at http://php.net/manua....preg-match.php but didn't really understand it

Regular expressions are magic; not everyone can understand them xD

 

In the end I need a string (it can be one of the array elements)

You can access the string in the array by specifying its key (index). No problem.
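For instance, here is a rough annotated sketch of pulling just the version out of the match array. The pattern is a variation on the ones posted above, not the exact one from this thread, and the function name newestVersion is made up; it assumes the version is the last dotted number inside the newest-file list item.

```php
<?php
// Returns only the version number (e.g. "9.4.0.0") or null if nothing matches.
function newestVersion(string $page): ?string
{
    // #...#i         delimiters plus the case-insensitive flag
    // [^<]*?         lazily skips the label and the mod name
    // \b             requires the number to start at a word boundary
    // (\d+(?:\.\d+)*) capture group 1: digits separated by dots
    // \s*</li>       the number must be the last thing in the list item
    $pattern = '#<li class="newest-file">[^<]*?\b(\d+(?:\.\d+)*)\s*</li>#i';

    if (preg_match($pattern, $page, $matches) === 1) {
        // $matches[0] is the whole match; $matches[1] is capture group 1.
        return $matches[1];
    }
    return null;
}

$page = '<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>';
var_dump(newestVersion($page));
```

preg_match() returns 1 when the pattern matched, 0 when it did not, and false on error, which is why the result is compared against 1 before $matches is read.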

 

link me to a good source of info on preg_match

What you need to learn first is not preg_match() but regular expressions themselves. A good book about them: http://shop.oreilly.com/product/9780596528126.do

Edited by WinstonLA

So does this mean I should use strpos for the best performance, since in the end this will be running on hundreds if not thousands of pages at once?

 

That is how I would interpret the results that I got.

 

Regex is powerful but slow compared to native string functions.
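If you want to measure this on your own server rather than take it on faith, a rough timing sketch along these lines may help. The filler markup, the iteration count, and all function names are invented for the example, and the numbers will vary by machine and PHP version. Fetching each page over HTTP will very likely take far longer than either extraction method.

```php
<?php
// Build a fake page with the interesting line buried in filler markup.
$page = str_repeat('<div>filler</div>', 2000)
      . '<li class="newest-file">Newest File: Railcraft 9.4.0.0</li>'
      . str_repeat('<div>filler</div>', 2000);

// Extraction with plain string functions.
function versionViaStrpos(string $page): ?string
{
    $label = 'Newest File: Railcraft ';
    $start = strpos($page, $label);
    if ($start === false) {
        return null;
    }
    $start += strlen($label);
    $stop = strpos($page, '<', $start);
    if ($stop === false) {
        return null;
    }
    return substr($page, $start, $stop - $start);
}

// Extraction with a regular expression.
function versionViaRegex(string $page): ?string
{
    return preg_match('#Newest File: Railcraft ([0-9.]+)<#', $page, $m) === 1 ? $m[1] : null;
}

// Runs $fn repeatedly and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, string $page, int $iterations): float
{
    $begin = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($page);
    }
    return microtime(true) - $begin;
}

printf("strpos: %.4f s\n", timeIt('versionViaStrpos', $page, 5000));
printf("regex:  %.4f s\n", timeIt('versionViaRegex', $page, 5000));
```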

OK, I now have working code that produces a final value, "version", that I can use:

 


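<?php
// Fetch the Railcraft page before searching it.
$page = file_get_contents("http://www.curse.com/mc-mods/minecraft/railcraft");
echo nl2br("\nRailcraft",false);
// 23 is the length of "Newest File: Railcraft ", so $posStart lands on the version.
$posStart = strpos($page, "Newest File: ")+23;
echo  nl2br("\nStart " . $posStart,false);
// The version ends at the next "<" (the closing </li> tag).
$posStop = strpos($page, "<",$posStart);
echo  nl2br("\nStop " . $posStop,false);
$version = substr($page, $posStart, $posStop-$posStart);
echo nl2br("\nversion " . $version,false);

// Fetch the Tinker's Construct page; its offsets differ from Railcraft's.
$page = file_get_contents("http://www.curse.com/mc-mods/minecraft/tinkers-construct");
echo nl2br("\nTinker's Construct",false);
$posStart = strpos($page, "Newest File: ")+31;
echo  nl2br("\nStart " . $posStart,false);
$posStop = strpos($page, "<",$posStart)-4;
echo  nl2br("\nStop " . $posStop,false);
$version = substr($page, $posStart, $posStop-$posStart);
echo nl2br("\nversion " . $version,false);
?>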

However, for every entry there are three things I will need to know that differ: page, posStart, and posStop. I was wondering what the best way to organise these would be. I was thinking of some kind of array, but I have no idea whether that's possible or even the best solution. I would need something like this (using the two examples above):

 

Railcraft

page = http://www.curse.com/mc-mods/minecraft/railcraft

posStartModifier = 23

posStopModifier = 0

 

Tinker's Construct

page = http://www.curse.com/mc-mods/minecraft/tinkers-construct

posStartModifier = 31

posStopModifier = -4

 

If you think I should start a new thread, let me know, since my original question is solved.

Edited by carstorm

I think this achieves what I'm trying to do, unless anyone else has a different or better idea:

$mods = array
   (
   array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
   array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
   );

As well as the array, use a function instead of repeating the code every time

function getMod($mod)
{
    $page = file_get_contents($mod[1]);
    $posStart = strpos($page, "Newest File: ") + $mod[2];
    $posStop = strpos($page, "<",$posStart) + $mod[3];
    $version = substr($page, $posStart, $posStop-$posStart);
    $output = "<br>{$mod[0]}<br>version $version<br>";
    return $output;
}

$mods = array
   (
   array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
   array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
   );

foreach ($mods as $mod) {
    echo getMod($mod);
}

Yeah, I had plans to turn it into a function eventually. What if I only wanted to get info on one mod, though? I tried passing in "RailCraft" but nothing showed up.

<?php
function getMod($mod)
{
    $page = file_get_contents($mod[1]);
    $posStart = strpos($page, "Newest File: ") + $mod[2];
    $posStop = strpos($page, "<",$posStart) + $mod[3];
    $version = substr($page, $posStart, $posStop-$posStart);
    $output = "<br>{$mod[0]}<br>version $version<br>";
    return $output;
}

$mods = array
   (
   array("RailCraft","http://www.curse.com/mc-mods/minecraft/railcraft",23,0),
   array("Tinker's Construct","http://www.curse.com/mc-mods/minecraft/tinkers-construct",31,-4),
   );

   echo getMod("RailCraft")
?>
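The call above most likely shows nothing because getMod() expects one of the inner arrays, while "RailCraft" is a plain string; inside the function, $mod[1] is then the single character "a" rather than a URL. One possible fix, sketched below, is to key the array by mod name and look the entry up inside the function. The key names url, start, and stop and the helper extractVersion are made up for this example; the offsets are the ones from this thread.

```php
<?php
// Pure parsing step, split out so it can be tried without a network request.
function extractVersion(string $page, int $startOffset, int $stopOffset): ?string
{
    $labelPos = strpos($page, 'Newest File: ');
    if ($labelPos === false) {
        return null;
    }
    $start = $labelPos + $startOffset;
    if ($start > strlen($page)) {
        return null;
    }
    $tagPos = strpos($page, '<', $start);
    if ($tagPos === false) {
        return null;
    }
    $stop = $tagPos + $stopOffset;
    return substr($page, $start, $stop - $start);
}

// Looks up one mod by name and returns the formatted output, or null on failure.
function getMod(array $mods, string $name): ?string
{
    if (!isset($mods[$name])) {
        return null; // unknown mod name
    }
    $mod = $mods[$name];
    $page = @file_get_contents($mod['url']);
    if ($page === false) {
        return null; // page could not be fetched
    }
    $version = extractVersion($page, $mod['start'], $mod['stop']);
    return $version === null ? null : "<br>{$name}<br>version {$version}<br>";
}

// Same data as before, keyed by mod name (keys are case-sensitive).
$mods = [
    'RailCraft' => [
        'url' => 'http://www.curse.com/mc-mods/minecraft/railcraft',
        'start' => 23, 'stop' => 0,
    ],
    "Tinker's Construct" => [
        'url' => 'http://www.curse.com/mc-mods/minecraft/tinkers-construct',
        'start' => 31, 'stop' => -4,
    ],
];

// Usage (performs a real network request, so it is left commented out):
// echo getMod($mods, 'RailCraft');
```

Looping over everything still works with foreach (array_keys($mods) as $name), so one function covers both the single-mod and the all-mods case.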