Graxeon Posted September 14, 2010

I'm trying to build a calculator for a game, but I'm stuck on the extracting part. I don't really know what it's called (so if you know of a tutorial/handbook, please link it), but how would I extract this data:

    //data to extract
    $name = "Robin hood hat";
    $currentprice = "3.3m";
    $change = "+11.5k";
    //display
    echo $name;
    echo $currentprice;
    echo $change;

from: http://itemdb-rs.runescape.com/results.ws?query=robin hood hat

And a small question: how would I convert "+11.5k" into "11500" (1k = 1,000)? The * multiplies it, but how do I check for and remove the "+" and "k"?

Link to thread: https://forums.phpfreaks.com/topic/213411-extract-data-from-site/
bulrush Posted September 14, 2010

    $source = preg_replace('/\.$/', '', $source);

param1 = regex to search for; it must be enclosed in delimiters, here one slash at each end
param2 = string to replace with
param3 = string to search and replace on

    if (preg_match('/\+([\d.]+)k/', $change, $matches)) {
        //Full match of whole regex is in $matches[0].
        //First parenthesis match is in $matches[1].
        //The + quantifier sits inside the parentheses so the whole number is captured.
        $amt = $matches[1];
        $fullamt = $amt * 1000;
    }
    //If k is 1000, then m must mean a million.
    if (preg_match('/\+([\d.]+)m/', $change, $matches)) {
        $amt = $matches[1];
        $fullamt = $amt * 1000000;
    }

In your case:

    $change = preg_replace('/^\+/', '', $change);   //Drop + at beginning of string; + must be escaped with a backslash.
    $change = preg_replace('/[km]$/', '', $change); //Drop k or m at end of string

^ represents beginning of string
$ represents end of string

That will get you started.
Graxeon (Author) Posted September 14, 2010

hehe...yeah, I just found something similar to that. Thank you though, yours is cleaner. I'm still wondering about the extraction part, though.
twittoris Posted September 14, 2010

I think you can use XPath with DOMDocument, or file_get_contents.
Graxeon (Author) Posted September 14, 2010

I know about file_get_contents, but I don't know how to pick out that exact data.
bulrush Posted September 14, 2010

Are you trying to grab data from a generated web page, like the one in the link you gave? I would call that "screen scraping". Basically, you put the whole web page into a string and parse the string one line at a time. Or make it an array, where each entry is a line in the HTML file. HTML is just text, after all. I did this once using MS Access and the Yahoo stock quote page.
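The line-at-a-time approach described above could look roughly like the sketch below. The sample markup, the function name findCellsOnLine, and the assumption that one table row sits on one line are all hypothetical; the real page's HTML will differ, and a live page would first be fetched into the string with file_get_contents.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
// For a live page: $html = file_get_contents($url);
$html = <<<HTML
<table>
<tr><th>Item</th><th>Price</th><th>Change</th></tr>
<tr><td>Robin hood hat</td><td>3.3m</td><td>+11.5k</td></tr>
</table>
HTML;

// Scan the page line by line and return the text of every <td> cell
// on the first line that mentions $needle, or null if no line does.
function findCellsOnLine(string $html, string $needle): ?array
{
    foreach (preg_split('/\R/', $html) as $line) {
        if (stripos($line, $needle) === false) {
            continue; // not the line we are looking for
        }
        preg_match_all('/<td[^>]*>(.*?)<\/td>/i', $line, $m);
        // Remove any nested tags and surrounding whitespace from each cell.
        return array_map('trim', array_map('strip_tags', $m[1]));
    }
    return null;
}

$cells = findCellsOnLine($html, 'Robin hood hat');
```

This only works while a row stays on a single line, so it tends to break when the site changes its markup.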
Graxeon (Author) Posted September 14, 2010

Well...that's good, except that this would have to be done for over 100 different pages. How would I make it more dynamic? I know Google Docs does it by table, like this:

    =Index(ImportHtml("http://itemdb-rs.runescape.com/results.ws?query=Robin Hood Hat", "table", 2),2,4)

(that's for the "Change"). Is there something similar to this in PHP?
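One possible PHP counterpart to the Index(ImportHtml(...)) formula is a small helper built on DOMDocument and DOMXPath. This is a minimal sketch: tableCell is a made-up name, the sample HTML is hypothetical, and the table/row/column positions on the real page are assumptions taken from the formula above.

```php
<?php
// Return the text of the cell at ($row, $col) in the $table-th <table>
// of the document (all 1-based), or null if there is no such cell.
function tableCell(string $html, int $table, int $row, int $col): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = sprintf(
        '((//table)[%d]//tr)[%d]/*[self::td or self::th][%d]',
        $table, $row, $col
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical sample page: the second table holds the results.
$html = <<<HTML
<html><body>
<table><tr><td>Search</td></tr></table>
<table>
  <tr><th></th><th>Item</th><th>Price</th><th>Change</th></tr>
  <tr><td></td><td>Robin hood hat</td><td>3.3m</td><td>+11.5k</td></tr>
</table>
</body></html>
HTML;

$change = tableCell($html, 2, 2, 4);

// For many items, the same helper could be called in a loop, e.g.:
// $url = 'http://itemdb-rs.runescape.com/results.ws?'
//      . http_build_query(['query' => $name]);
// $change = tableCell(file_get_contents($url), 2, 2, 4);
```

Because the lookup is by position rather than by line content, the same call can be reused for every item page as long as the pages share one layout.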
Graxeon (Author) Posted September 14, 2010

I just found a problem with the preg_replace. I need to convert all values that are higher than 999 (for example, +11.5k needs to convert to 11500). I would do this by just multiplying the source (or $change in your code) by 1000. But how would I distinguish between a value that has a "k" or "m" at the end (higher than 999) and one that doesn't (lower than 1000)?
Graxeon (Author) Posted September 14, 2010

Never mind about the converting question. I'm still clueless on the extracting question xD

Btw...here's how I did the converting (yes, it's lengthy and newbie...but it works for what I need, since all values go to the tenths):

    if (strpos($source, 'k') !== false) {
        if (strpos($source, '.') !== false) {
            //One decimal digit: "11.5k" becomes "115", so multiply by 100.
            $source = preg_replace("/[^0-9]/", "", $source);
            echo $source*100;
        } else {
            $source = preg_replace("/[^0-9]/", "", $source);
            echo $source*1000;
        }
    } else {
        if (strpos($source, 'm') !== false) {
            if (strpos($source, '.') !== false) {
                $source = preg_replace("/[^0-9]/", "", $source);
                echo $source*100000;
            } else {
                $source = preg_replace("/[^0-9]/", "", $source);
                echo $source*1000000;
            }
        } else {
            $source = preg_replace("/[^0-9]/", "", $source);
            echo $source;
        }
    }

I think you could run the preg_replace once up front instead of repeating it, as long as the "k", "m" and "." checks are done before the string is stripped. I just kept it this way because I might need the original values later on.
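The nested checks above can probably be collapsed into one regex plus a lookup table. The following is a sketch under a few assumptions: parseAmount is a hypothetical name, values look like an optional sign, a number, and an optional "k" or "m", and, unlike the code above, the sign is kept and any number of decimal digits is accepted.

```php
<?php
// Convert strings such as "+11.5k", "3.3m" or "250" to whole numbers.
// Returns null when the text does not look like an amount.
function parseAmount(string $text): ?int
{
    $pattern = '/^([+-]?)(\d+(?:\.\d+)?)([km]?)$/i';
    if (!preg_match($pattern, trim($text), $m)) {
        return null;
    }
    // No suffix means the value is already in plain units.
    $multipliers = ['' => 1, 'k' => 1000, 'm' => 1000000];
    $suffix = strtolower($m[3] ?? '');
    $value = (float) $m[2] * $multipliers[$suffix];
    if ($m[1] === '-') {
        $value = -$value;
    }
    // round() guards against float artifacts such as 3299999.9999999995.
    return (int) round($value);
}

echo parseAmount('+11.5k'), "\n";
```

Since the original string is passed in and never modified, it stays available if it is needed later.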
Graxeon (Author) Posted September 15, 2010

So does anyone know of a method to extract the information similar to the way I described above (through tables, and not specific lines from file_get_contents)?
bulrush Posted September 17, 2010

Your options are:

1) Get read-only access directly into the site database. Talk to the site owners and state you just want READ ONLY access, which will prevent you (or a hacker who hacks your program) from messing with their db.
2) Ask the site owners to provide the data you need in a tab-delimited text file, updated each day. The site owners can automate this process using unix cron and use SQL to dump the data you need from a query into a text file. (Is once a day enough for your purposes?)
3) Write your own screen scraper.
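If option 2 were available, reading the tab-delimited file would be straightforward. The sketch below assumes a header row and unquoted fields; the column names, the sample contents, and the parseTsv name are all hypothetical, since no such file is known to exist.

```php
<?php
// Hypothetical file contents; with a real file this string would come
// from file_get_contents() on wherever the site owners publish it.
$tsv = "name\tprice\tchange\nRobin hood hat\t3300000\t11500\n";

// Turn tab-delimited text with a header row into a list of
// associative arrays keyed by column name.
function parseTsv(string $text): array
{
    $lines = preg_split('/\R/', trim($text));
    $header = explode("\t", array_shift($lines));
    $rows = [];
    foreach ($lines as $line) {
        $fields = explode("\t", $line);
        if (count($fields) !== count($header)) {
            continue; // skip blank or malformed lines
        }
        $rows[] = array_combine($header, $fields);
    }
    return $rows;
}

$items = parseTsv($tsv);
echo $items[0]['name'], ': ', $items[0]['change'], "\n";
```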