
Extract data from site?


Graxeon


I'm trying to build a calculator for a game...but I'm stuck on the extracting part. I don't really know what it's called (so if you know of a tutorial or handbook, please link it), but how would I extract this data:

 

//data to extract
$name = "Robin hood hat";
$currentprice = "3.3m";
$change = "+11.5k";
//display
echo $name;
echo $currentprice;
echo $change;

 

from: http://itemdb-rs.runescape.com/results.ws?query=robin hood hat

 

And a small question: how would I convert "+11.5k" into "11500" (1k = 1,000)? Multiplying with * is easy, but how do I check for and remove the "+" and the "k"?

https://forums.phpfreaks.com/topic/213411-extract-data-from-site/

$source=preg_replace('/\.$/', '', $source); //Drop a period at the end of the string

 

param1 = the regex to search for, enclosed in delimiters (usually a slash at each end).

param2 = the string to replace each match with.

param3 = the string to search in.

 

if (preg_match('/\+([\d.]+)k/', $change, $matches))
    {
    //Full match of whole regex is in $matches[0].
    //First parenthesis match (the number, e.g. "11.5") is in $matches[1].
    $amt=$matches[1];
    $fullamt=$amt*1000;
    }
//If k is 1000, then m must mean a million.
if (preg_match('/\+([\d.]+)m/', $change, $matches))
    {
    $amt=$matches[1];
    $fullamt=$amt*1000000;
    }
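A single regular expression can also handle the sign, the number and the optional k/m suffix in one pass, which makes it easy to tell suffixed values from plain ones. This is a minimal sketch: `parse_amount` is a hypothetical helper name, and it assumes values look like "+11.5k", "3.3m" or "-250" (stripping commas such as "1,234" is an extra assumption about how plain numbers might be written).

```php
<?php
// parse_amount() is a hypothetical helper name, not a PHP built-in.
// Assumed input format: optional +/- sign, a number, optional k or m suffix.
function parse_amount(string $text): ?int
{
    // Assumption: commas are thousands separators ("1,234"), so drop them.
    $text = str_replace(',', '', $text);

    // Group 1 = sign, group 2 = number, group 3 = suffix (may be empty).
    if (!preg_match('/^\s*([+-]?)(\d+(?:\.\d+)?)\s*([km]?)\s*$/i', $text, $m)) {
        return null; // not in the expected format
    }

    $multipliers = ['' => 1, 'k' => 1000, 'm' => 1000000];
    $value = (float) $m[2] * $multipliers[strtolower($m[3])];

    // round() guards against float noise (e.g. 3.3 * 1000000) before the int cast.
    $result = (int) round($value);
    return $m[1] === '-' ? -$result : $result;
}

var_dump(parse_amount('+11.5k')); // int(11500)
var_dump(parse_amount('3.3m'));   // int(3300000)
var_dump(parse_amount('-250'));   // int(-250)
```

Returning null for anything unexpected makes it easy to spot rows the regex did not understand.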

In your case:

$change=preg_replace('/^\+/', '', $change); //Drop + at beginning of string, + must be escaped with backslash.
$change=preg_replace('/[km]$/', '', $change); //Drop k or m at end of string

 

^ represents beginning of string

$ represents end of string
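Here are the two anchored replacements applied to the sample value, plus a quick comparison showing what the `^` anchor changes. Keep in mind that once the suffix is stripped you can no longer tell a k value from an m value, so check the suffix before removing it.

```php
<?php
$change = '+11.5k';
$change = preg_replace('/^\+/', '', $change);   // "^": only a "+" at the start -> "11.5k"
$change = preg_replace('/[km]$/', '', $change); // "$": only a k/m at the end   -> "11.5"
echo $change, "\n";                              // 11.5

// The anchor matters: without "^" every "+" is removed.
echo preg_replace('/\+/', '', '+1+2'), "\n";     // 12
echo preg_replace('/^\+/', '', '+1+2'), "\n";    // 1+2
```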

 

That will get you started.

Are you trying to grab data from a generated web page, like the one you linked? I would call that "screen scraping". Basically, you load the whole web page into a string and parse it one line at a time. Or load it into an array where each entry is one line of the HTML file.
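A rough sketch of the line-by-line approach. The HTML string is an invented sample, not the real item database markup, and `lines_containing` is a hypothetical helper. In practice the page would be fetched with `file_get_contents($url)` (or `file($url)` to get an array of lines directly), which requires `allow_url_fopen` to be enabled for http:// URLs.

```php
<?php
// Returns the lines of $html that contain $needle (case-insensitive),
// keyed by their 1-based line number.
function lines_containing(string $html, string $needle): array
{
    $found = [];
    // \R matches any line ending (\n, \r\n or \r).
    foreach (preg_split('/\R/', $html) as $lineNo => $line) {
        if (stripos($line, $needle) !== false) {
            $found[$lineNo + 1] = trim($line);
        }
    }
    return $found;
}

// Invented sample markup for illustration only.
$html = "<html>\n<td>Robin hood hat</td>\n<td>3.3m</td>\n</html>";
print_r(lines_containing($html, 'robin hood'));
```

Line-based matching is fragile if the site reformats its HTML; parsing the table structure (for example with PHP's DOM extension) holds up better.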

 

HTML is just text, after all. I did this once using MS Access and the Yahoo stock quote page.

 

 

Well...that's good, except that this would have to be done for over 100 different pages. How would I make it more dynamic? I know Google Docs does it by table, like this:

 

=Index(ImportHtml("http://itemdb-rs.runescape.com/results.ws?query=Robin Hood Hat", "table", 2),2,4)

 

(that's for the "Change")

 

Is there something similar to this in PHP?
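PHP has no built-in ImportHtml, but the DOM extension (`DOMDocument` plus `DOMXPath`) can do something similar: load the page, pick the n-th table, then a row and a cell. This is a sketch under assumptions: `table_cell` and `item_url` are hypothetical names, the sample table is made up, and the real page's table and column numbers would have to be checked against its current markup. Building each URL from the item name is what lets one script loop over 100+ items.

```php
<?php
// Hypothetical helper; loosely mimics INDEX(IMPORTHTML(url, "table", n), row, col).
// All indexes are 1-based, as in the spreadsheet functions.
function table_cell(string $html, int $table, int $row, int $col): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; hide parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // n-th table in the page, then the row-th <tr> inside it, then the col-th cell.
    $query = sprintf('((//table)[%d]//tr)[%d]/*[self::td or self::th][%d]', $table, $row, $col);
    $nodes = $xpath->query($query);

    return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : null;
}

// Hypothetical helper that builds the search URL for one item name.
function item_url(string $name): string
{
    // http_build_query() encodes the spaces in "Robin hood hat" as "+".
    return 'http://itemdb-rs.runescape.com/results.ws?' . http_build_query(['query' => $name]);
}

// Invented sample markup for illustration; the real page layout will differ.
$sample = '<table><tr><th>Name</th><th>Price</th><th>Change</th></tr>'
        . '<tr><td>Robin hood hat</td><td>3.3m</td><td>+11.5k</td></tr></table>';

echo table_cell($sample, 1, 2, 3), "\n"; // +11.5k
echo item_url('Robin hood hat'), "\n";

// Real use (needs network access); the 2, 2, 4 mirror the Google formula and are unverified:
// foreach ($items as $name) {
//     $html = file_get_contents(item_url($name));
//     $change = table_cell($html, 2, 2, 4);
// }
```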

I just found a problem with the preg_replace.

 

I need to convert all values that are higher than 999 (for example, +11.5k needs to become 11500). I would do this by just multiplying the source (or $change in your code) by 1000. But how would I distinguish between a value that has a "k" or "m" at the end (higher than 999) and one that doesn't (lower than 1000)?

Never mind about the converting question. I'm still clueless on the extracting question xD

 

Btw...here's how I did the converting (yes, it's lengthy and newbie...but it works for what I need, since all values go to the tenths at most, i.e. one decimal place):

 

if (strpos($source, 'k') !== false) {
    // Thousands: "11.5k" -> "115" * 100, "11k" -> "11" * 1000.
    // Note: the preg_replace also strips a leading "+" or "-", so the sign is lost.
    if (strpos($source, '.') !== false) {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source * 100;
    } else {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source * 1000;
    }
} elseif (strpos($source, 'm') !== false) {
    // Millions: "3.3m" -> "33" * 100000, "3m" -> "3" * 1000000.
    if (strpos($source, '.') !== false) {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source * 100000;
    } else {
        $source = preg_replace("/[^0-9]/", "", $source);
        echo $source * 1000000;
    }
} else {
    // Values without a k or m suffix.
    $source = preg_replace("/[^0-9]/", "", $source);
    echo $source;
}

 

I think you could do the preg_replace just once at the top, storing the result in a separate variable (the strpos checks still need the original "k", "m" and "."), so it won't be repetitive. I kept it this way because I might need the original values later on.
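A sketch of the "do the preg_replace once" idea: the digits are extracted a single time into a separate variable while the k/m/. checks still look at the original string, so the results match the code above. Like the original, it assumes at most one decimal place and drops any +/- sign; `convert_tenths` is a hypothetical name.

```php
<?php
// Hypothetical refactor of the converter above; assumes at most one decimal place.
function convert_tenths(string $source): int
{
    // Extract the digits once; $source itself stays untouched.
    $digits = (int) preg_replace('/[^0-9]/', '', $source); // "+11.5k" -> 115
    $hasDot = strpos($source, '.') !== false;

    if (strpos($source, 'k') !== false) {
        $multiplier = $hasDot ? 100 : 1000;
    } elseif (strpos($source, 'm') !== false) {
        $multiplier = $hasDot ? 100000 : 1000000;
    } else {
        $multiplier = 1;
    }
    return $digits * $multiplier;
}

echo convert_tenths('+11.5k'), "\n"; // 11500
echo convert_tenths('3.3m'), "\n";   // 3300000
echo convert_tenths('250'), "\n";    // 250
```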

Your options are:

 

1) Get read-only access directly to the site's database. Talk to the site owners and explain that you only want READ-ONLY access, which will prevent you (or a hacker who compromises your program) from messing with their database.

 

2) Ask the site owners to provide the data you need in a tab-delimited text file, updated each day. The site owners can automate this with a unix cron job that runs an SQL query and dumps the results into a text file. (Is once a day often enough for your purposes?)

 

3) Write your own screen scraper.
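For option 2, reading such a file back is straightforward. This sketch assumes a hypothetical layout of one item per line with name, price and change separated by tabs; the first sample row reuses the values from the question and the second is invented. It uses a plain `explode()`, so quoted fields are not handled.

```php
<?php
// Hypothetical layout: name <TAB> price <TAB> change, one item per line.
function read_tab_delimited(string $contents): array
{
    $rows = [];
    foreach (preg_split('/\R/', trim($contents, "\r\n")) as $line) {
        $fields = explode("\t", $line);
        if (count($fields) < 3) {
            continue; // skip blank or malformed lines
        }
        [$name, $price, $change] = $fields;
        $rows[$name] = ['price' => $price, 'change' => $change];
    }
    return $rows;
}

$export = "Robin hood hat\t3.3m\t+11.5k\nExample item\t250\t-5\n";
print_r(read_tab_delimited($export));
```

With a real file you would pass `file_get_contents('path/to/export.txt')` instead of the sample string.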

 

