
array help with page scrape


dingles


I am able to scrape a single page for HTML data. I'd like to scrape multiple pages in the same script and store the results in variables.

For example, the link to page one would be http://somepage.jsp?pid=10 and the link to the second page would be http://somepage.jsp?pid=11.
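Since the two URLs differ only in the trailing pid, one way to generate them is to append each pid to the shared base URL. This is a minimal sketch: the helper name `build_page_urls()` is hypothetical, and it assumes every page uses the same `http://somepage.jsp?pid=` base from the post. `rawurlencode()` keeps an unexpected pid value from breaking the query string.

```php
<?php
// Hypothetical helper: map each pid to its full page URL.
// Assumes all pages share one base URL and differ only in the pid.
function build_page_urls(string $base, array $pids): array
{
    $urls = array();
    foreach ($pids as $pid) {
        // rawurlencode() escapes any characters that would corrupt the query string
        $urls[$pid] = $base . rawurlencode((string) $pid);
    }
    return $urls;
}

// Usage: one URL per pid, keyed by the pid itself
$urls = build_page_urls("http://somepage.jsp?pid=", array('10', '11'));
```

Keying the result by pid makes it easy to tell later which page each URL (and its scraped data) belongs to.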

Here is the code that works with scraping data from the first page and saving information to variables:

[code]<?php

// the url I need to fetch for the user information
$url = "http://somepage.jsp?pid=";

// pid of the page to read (could also come from a form field such as $_POST['pid'])
$pid = '10';

//Append the pid onto the URL and read the page into $page as an array of lines
$page = file($url . $pid);
if ($page === false)
{
    echo "Could not read the page";
    exit;
}

//file() keeps each line's newline, so join the lines back together with ''
$page = trim(implode('', $page));

//Some variables to store things
$variable1 = "";
$variable2 = "";
$variable3 = "";
$variable4 = "";


//The pattern to scrape; (.*?) is non-greedy so it stops at the first </td>
$pattern = "/<td align=\"right\">(.*?)<\/td>\n/i";
//Need at least 6 matches, because index 5 is read below
if (preg_match_all($pattern, $page, $out, PREG_PATTERN_ORDER) >= 6)
{
    $variable1 = $out[1][0];
    $variable2 = $out[1][3];
    $variable3 = $out[1][4];
    $variable4 = $out[1][5];
}

?> [/code]
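To reuse the scraping step for several pages, it can help to pull the parsing out into a function that takes the page's HTML as a string. This is a sketch, not the poster's code: the name `scrape_values()` is hypothetical, and it assumes the pages use the same `<td align="right">` markup and the same cell positions (0, 3, 4, 5) as the script above. `\R` matches either `\n` or `\r\n` line endings.

```php
<?php
// Hypothetical helper: extract the four wanted cells from one page's HTML.
// Assumes the markup matches the original pattern: <td align="right">...</td>
function scrape_values(string $html): array
{
    // (.*?) stops at the first </td>; \R accepts \n or \r\n after it
    $pattern = '/<td align="right">(.*?)<\/td>\R/i';
    if (!preg_match_all($pattern, $html, $out, PREG_PATTERN_ORDER)) {
        return array();
    }
    $cells = $out[1];

    // Same positions the original script reads; missing cells become ""
    $values = array();
    foreach (array(0, 3, 4, 5) as $i) {
        $values[] = isset($cells[$i]) ? trim($cells[$i]) : "";
    }
    return $values;
}
```

Because it works on a string, the function can be tried on sample HTML without fetching anything, and in the real script it would be called once per page, e.g. on the result of `file_get_contents($url . $pid)`.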


Is it simple to add something so I can also store the info for pid=11, or should I run two separate scripts?

Well, the question is: how are you passing the value to the page to decide which page you want to read?

Gaia is mostly right, but we need to know the above before we proceed. If the pids are hardcoded, you can put them in an array and loop over it:
[code]
<?php
$url = "http://somepage.jsp?pid=";
$myarray = array('10', '11', '12');

foreach ($myarray as $pid)
{
    // $pid already holds the value ('10', '11', ...), so append it directly
    $page = file($url . $pid); // you can do this with the $_POST/$_REQUEST array too

    // process as normal
}
?>
[/code]
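Building on that loop, here is a minimal sketch of keeping each page's results separate instead of overwriting the same variables on every pass. The function `scrape_all()` and the `$parse` closure are hypothetical; the fetch step is passed in as a callable (here assumed to be `file_get_contents`, which needs `allow_url_fopen`) so the loop can be tried without a network connection.

```php
<?php
// Hypothetical helper: fetch each pid's page and store its parsed values
// under that pid, so page 11 no longer overwrites page 10's variables.
// $fetch: callable(string $url) returning the HTML, or false on failure.
// $parse: callable(string $html) returning whatever you want to keep.
function scrape_all(string $base, array $pids, callable $fetch, callable $parse): array
{
    $results = array();
    foreach ($pids as $pid) {
        $html = $fetch($base . rawurlencode((string) $pid));
        // A failed fetch is recorded as null instead of stopping the loop
        $results[$pid] = ($html === false) ? null : $parse($html);
    }
    return $results;
}

// Example parser, assuming the same <td align="right"> markup as the first post
$parse = function (string $html): array {
    preg_match_all('/<td align="right">(.*?)<\/td>/i', $html, $out);
    return $out[1];
};

// Real use (requires allow_url_fopen to be enabled):
// $data = scrape_all("http://somepage.jsp?pid=", array('10', '11', '12'),
//                    'file_get_contents', $parse);
// $data[10] and $data[11] then hold each page's cells separately.
```

Passing the fetcher in is only a design choice for testability; calling `file()` or `file_get_contents()` directly inside the loop works just as well.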

The only other problem you'll have to deal with is what you do with the variables at the end: each pass through the loop overwrites them with the next page's values. You'll have to make your $out handling a little more dynamic (for example, an array keyed by pid), or store everything as records in a database or some other storage medium.
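For the database route, a hedged sketch using PDO prepared statements is below. The table name `scraped_pages`, its columns, and the helper `save_results()` are all hypothetical; the demo uses an in-memory SQLite database (needs the `pdo_sqlite` extension), and for MySQL you would swap in your own DSN and credentials. Binding the scraped text as parameters keeps it from being spliced into the SQL.

```php
<?php
// Hypothetical storage step: one row per pid in a table named
// scraped_pages (pid, variable1..variable4). Adjust names to your schema.
function save_results(PDO $db, array $results): int
{
    $stmt = $db->prepare(
        'INSERT INTO scraped_pages (pid, variable1, variable2, variable3, variable4)
         VALUES (?, ?, ?, ?, ?)'
    );
    $saved = 0;
    foreach ($results as $pid => $values) {
        if (!is_array($values) || count($values) < 4) {
            continue; // skip pages that failed to load or had too few cells
        }
        $values = array_values($values); // make sure indices run 0..n-1
        // Prepared statement: scraped text is bound, never concatenated into SQL
        $stmt->execute(array($pid, $values[0], $values[1], $values[2], $values[3]));
        $saved++;
    }
    return $saved;
}

// Demo against an in-memory SQLite database
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE scraped_pages (pid TEXT, variable1 TEXT, variable2 TEXT,
           variable3 TEXT, variable4 TEXT)');
$saved = save_results($db, array(
    '10' => array('a', 'b', 'c', 'd'),
    '11' => null, // e.g. a page that could not be fetched
));
```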

My plan was to store the variables in a database, but I wanted to see if I could get the scraping working here first.

I think you answered my question. foreach looks like it might be what I was looking for. I'll give it a try, thanks!