
Web scraping works, but when I try to use it with a form, it displays a blank page?


Modernvox


Hi guys.

I have a working web scraping script that returns musicians' ads from CL (Craigslist). The problem is that I have added an HTML form listing the US states, and depending on the state the user chooses, the script is supposed to return the corresponding ads.

Instead, it keeps displaying a blank white page.

 

Here is the HTML form:

<form action="test.php" method="POST">
<select name="state">
<option value="AL">Alabama</option>
<option value="AK">Alaska</option>
<option value="AZ">Arizona</option>
<option value="AR">Arkansas</option>
<option value="CA">California</option>
<option value="CO">Colorado</option>
<option value="CT">Connecticut</option>
<option value="DE">Delaware</option>
<option value="DC">District of Columbia</option>
<option value="FL">Florida</option>
<option value="GA">Georgia</option>
<option value="HI">Hawaii</option>
<option value="ID">Idaho</option>
<option value="IL">Illinois</option>
<option value="IN">Indiana</option>
<option value="IA">Iowa</option>
<option value="KS">Kansas</option>
<option value="KY">Kentucky</option>
<option value="LA">Louisiana</option>
<option value="ME">Maine</option>
<option value="MD">Maryland</option>
<option value="MA">Massachusetts</option>
<option value="MI">Michigan</option>
<option value="MN">Minnesota</option>
<option value="MS">Mississippi</option>
<option value="MO">Missouri</option>
<option value="MT">Montana</option>
<option value="NE">Nebraska</option>
<option value="NV">Nevada</option>
<option value="NH">New Hampshire</option>
<option value="NJ">New Jersey</option>
<option value="NM">New Mexico</option>
<option value="NY">New York</option>
<option value="NC">North Carolina</option>
<option value="ND">North Dakota</option>
<option value="OH">Ohio</option>
<option value="OK">Oklahoma</option>
<option value="OR">Oregon</option>
<option value="PA">Pennsylvania</option>
<option value="RI">Rhode Island</option>
<option value="SC">South Carolina</option>
<option value="SD">South Dakota</option>
<option value="TN">Tennessee</option>
<option value="TX">Texas</option>
<option value="UT">Utah</option>
<option value="VT">Vermont</option>
<option value="VA">Virginia</option>
<option value="WA">Washington</option>
<option value="WV">West Virginia</option>
<option value="WI">Wisconsin</option>
<option value="WY">Wyoming</option>
</select>
<input type="submit" value="submit" name="submit"><br />
</form><br />

 

Here is the PHP code:

 <?php
if(isset($_POST['submit'])) 
$st = $_post['state'];

if ($st == "AL")
{
$url = "http://southcoast.craigslist.org";
$html = file_get_contents("$url/muc/");

preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html,$posts,PREG_SET_ORDER);
//echo "<pre>";print_r($posts);

}
foreach ($posts as $post)
{
    //print $post[0]; //HTML
    $post[2] = str_ireplace($url,"",$post[2]); //remove domain
    echo "<a href=\"$url{$post[1]}\">{$post[2]}<\/a><font size=\"-1\">{$post[3]}<\/font>";
    print "<BR />\n";

}

?> 

 

As always, thanks for taking the time to answer questions.

Do a little error checking.

preg_match_all() can return false when an error occurs. You should check for that, as well as the size of $posts.

Something like this:

<?php
// preg_match_all() returns the number of full matches, or false on error
$preg = preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html, $posts, PREG_SET_ORDER);

if ($preg !== false && count($posts) != 0) {
    // do the loop
} else {
    echo 'Preg match problem';
    var_dump($preg);
}
?>
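To make the check above a bit more concrete, here is a minimal sketch that also guards against the download itself failing (file_get_contents() returns false in that case). The helper name extract_posts() and the sample HTML are made up for illustration; only the regular expression comes from the thread.

```php
<?php
// Hypothetical helper: returns the matched posts, or an empty array on any failure.
function extract_posts($html)
{
    // file_get_contents() returns false on failure, so reject anything that is not a non-empty string
    if (!is_string($html) || $html === '') {
        return [];
    }

    $pattern = '/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s';
    $count = preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

    if ($count === false) {
        // preg_last_error() reports the PCRE error code of the last call
        error_log('preg_match_all failed, code ' . preg_last_error());
        return [];
    }

    return $posts; // an empty array when nothing matched
}

// Made-up sample markup in the shape the pattern expects
$sample = '<a href="/muc/123.html">Drummer wanted</a><font size="-1"> (Boston)</font>';

foreach (extract_posts($sample) as $post) {
    echo $post[2], $post[3], "\n"; // prints "Drummer wanted (Boston)"
}
```

Because the helper always returns an array, the foreach loop is safe even when the page could not be fetched or nothing matched.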

Try changing:

if(isset($_POST['submit'])) 
$st = $_post['state'];

to:

if(isset($_POST['submit'])) 
$st = $_POST['state'];

and if that does not work, try:

if(isset($_GET['submit'])) 
$st = $_GET['state'];

The second suggestion comes from experience with forms that submit to another page.

The script works on its own. However, when I try to use it inside an if statement, it keeps displaying a blank page.

If I change anything, it then gives an invalid foreach argument warning.

Could it be that the if statement is the wrong way to do this?
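One likely reason for the foreach complaint, judging from the posted code: $posts is only created inside the if ($st == "AL") block, so whenever that block is skipped the loop receives a variable that was never set. A minimal sketch of one way around it, assuming the same test.php and form field names, is to give $posts a default value and wrap the whole submission handling in braces. The $_POST assignment at the top only simulates a submission so the snippet runs on its own.

```php
<?php
// Simulated form submission for illustration; in test.php the browser supplies this.
$_POST = ['submit' => 'submit', 'state' => 'AK'];

$posts = []; // always defined, so foreach never receives an unset variable
$url = '';

if (isset($_POST['submit'], $_POST['state'])) {
    $st = $_POST['state'];

    if ($st == "AL") {
        $url = "http://southcoast.craigslist.org";
        // the file_get_contents() and preg_match_all() calls would go here and fill $posts
    }
}

foreach ($posts as $post) {
    echo $post[2], "<br />\n";
}

if (count($posts) === 0) {
    echo "No ads found for this state.";
}
```

With the default in place, a state that has no branch yet produces a short message instead of a warning or an empty page.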

A blank page could mean a PHP syntax error. Have you enabled error reporting?

error_reporting(E_ALL);
ini_set('display_errors',1);
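A short sketch of where those two lines usually go: at the very top of the script, before anything else runs. One caveat worth keeping in mind is that a parse error in the same file stops the script before these lines execute, so they cannot reveal it; in that case, setting display_errors in php.ini or linting the file from the command line with php -l test.php tends to be more useful.

```php
<?php
// Development only: report every error level and print errors into the page.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// From here on, runtime notices and warnings are shown in the output,
// for example reading a variable that was never set.
```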

 

And RaythMistwalker:

Whether you read from $_POST or $_GET has nothing to do with where the form is posted to. It depends only on the form's method attribute; the action attribute is completely irrelevant when it comes to accessing the values.

If method="POST", PHP makes the form values available in $_POST.

If method="GET", PHP makes the form values available in $_GET.
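Since the form in this thread uses method="POST", the value has to be read from $_POST, and the capitalisation matters: PHP variable names are case-sensitive, so $_post is a different, undefined variable. The snippet below is a small illustration; the $_POST assignment only simulates a submission so it can run on its own.

```php
<?php
// Simulated submission for illustration; normally the browser fills $_POST.
$_POST = ['submit' => 'submit', 'state' => 'AL'];

var_dump(isset($_POST['state'])); // bool(true)
var_dump(isset($_post['state'])); // bool(false): $_post is not the superglobal

// Read the value defensively so a missing field gives null instead of a notice
$st = isset($_POST['state']) ? $_POST['state'] : null;
var_dump($st === 'AL'); // bool(true)
```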

OK... What is in the $html variable?

Did you try my preg_match_all checking method?

 

$html just holds the contents of the muc page, which lists the musicians' ads.

Example: the Craigslist SouthCoast homepage is http://southcoast.craigslist.org

The musicians' ads are located at http://southcoast.craigslist.org/muc/

The pattern is the same for the musicians' ads at every location.
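Because the /muc/ path is the same for every location, the per-state logic could be reduced to a lookup table instead of one if block per state. The sketch below is only an illustration: the $sites array and the muc_url() helper are hypothetical, and the single AL entry is the one pairing that appears in the posted code. The other states would need their own site URLs filled in.

```php
<?php
// Hypothetical lookup table: state code => site base URL.
// Only the AL entry comes from the code posted in this thread.
$sites = [
    'AL' => 'http://southcoast.craigslist.org',
];

// Hypothetical helper: appends the musicians' ads path to a site base URL.
function muc_url($base)
{
    return rtrim($base, '/') . '/muc/';
}

$state = 'AL'; // would come from $_POST['state'] in test.php

if (isset($sites[$state])) {
    echo muc_url($sites[$state]), "\n"; // prints "http://southcoast.craigslist.org/muc/"
} else {
    echo "No site configured for this state.\n";
}
```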

Can you show us the whole PHP script?

 

 <?php
$url = "http://southcoast.craigslist.org";
$html = file_get_contents("$url/muc/");

preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html,$posts,PREG_SET_ORDER);
//echo "<pre>";print_r($posts);
foreach ($posts as $post)
{
    //print $post[0]; //HTML
    $post[2] = str_ireplace($url,"",$post[2]); //remove domain
    echo "<a href=\"$url{$post[1]}\">{$post[2]}<\/a><font size=\"-1\">{$post[3]}<\/font>";
    print "<BR />\n";
}
?> 

That's it for this step

