dingles

array help with page scrape

I am able to scrape a single page for HTML data. I'd like to scrape multiple pages in the same script and store the results in variables.

For example, the link to page one would be http://somepage.jsp?pid=10 and the link to the second page would be http://somepage.jsp?pid=11

Here is the code that works with scraping data from the first page and saving information to variables:

[code]<?php

// the url I need to fetch for the user information
$url = "http://somepage.jsp?pid=";

//Append the user's pid to the URL and read the page into $page as an array of lines
$page = file($url . $_POST['10']);
if (!is_array($page))
{
    echo "No Array found";
    exit;
}

//Join the lines back into a single string in $page
$page = trim(implode("\n", $page));

//Some variables to store things
$variable1 = "";
$variable2 = "";
$variable3 = "";
$variable4 = "";


//The pattern to scrape
$pattern = "/<td align=\"right\">(.*)<\/td>\n/i";
if (preg_match_all($pattern, $page, $out, PREG_PATTERN_ORDER))
{
    $variable1 = $out[1][0];
    $variable2 = $out[1][3];
    $variable3 = $out[1][4];
    $variable4 = $out[1][5];
}

?> [/code]


Is it simple to add something so I can also store the info for pid=11, or should I run 2 separate scripts?

I'm not quite sure how to do it myself, but I think a foreach statement could help you out. Someone else can probably elaborate on that a little more.

Well, the question would be how are you passing the value to the page to decide which file you want to read?

Gaia is mostly right, but we need to know the above before we proceed. If the values are hardcoded, you can put them in an array and loop over it:
[code]
<?php
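// base URL from the original script
$url = "http://somepage.jsp?pid=";

// the page ids to fetch
$myarray = array('10', '11', '12');

foreach ($myarray as $val)
{
    // $val is the pid itself, so append it directly to the URL
    $page = file($url . $val); // you can do this with the $_POST/$_REQUEST array too

    // process as normal
}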
?>
[/code]

The only other problem you'll have to deal with is what you do with the variables at the end. You'll have to make your $out array a little more dynamic or you'll have to store everything in records in a database or some other storage medium.
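
If it helps, here is a rough sketch of what "a little more dynamic" could look like: each page's values go into one nested array keyed by pid, rather than into $variable1 to $variable4. The function names scrape_cells and scrape_pages and the $fetch callback are made up for illustration and are not from the original script. The pattern is also switched to a non-greedy match with the trailing newline dropped, which is my assumption about what the page markup allows.

```php
<?php
// Hypothetical helper: pulls the right-aligned cells out of one page's HTML.
// Same idea as the pattern in the original post, but non-greedy and without
// the trailing "\n" (an assumption about the markup).
function scrape_cells(string $html): array
{
    $pattern = '/<td align="right">(.*?)<\/td>/i';
    if (preg_match_all($pattern, $html, $out, PREG_PATTERN_ORDER)) {
        return $out[1];
    }
    return [];
}

// Hypothetical helper: fetches every pid and stores its values under that pid.
// $fetch is any callable that takes a pid and returns the page HTML as a
// string, or false if the page could not be read.
function scrape_pages(array $pids, callable $fetch): array
{
    $results = [];
    foreach ($pids as $pid) {
        $html = $fetch($pid);
        if ($html === false) {
            continue; // skip pages that could not be read
        }
        $cells = scrape_cells($html);
        // Same cell positions as the original script; missing cells become ''.
        $results[$pid] = [
            'variable1' => $cells[0] ?? '',
            'variable2' => $cells[3] ?? '',
            'variable3' => $cells[4] ?? '',
            'variable4' => $cells[5] ?? '',
        ];
    }
    return $results;
}
```

To use it against the real site you would pass a callback that does the fetch, for example one that returns file_get_contents($url . $pid), and then read values back as $results['11']['variable2']. Passing the fetch in as a callback also lets you try the loop against canned HTML before hitting the live pages, and the nested array is easy to walk later when you insert rows into a database.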

My plan was to store the variables in a database. But I wanted to see if I could do what I wanted to do here first.

I think you answered my question. foreach looks like it might be what I was looking for. I'll give it a try, thanks!
