
Web scraping works, but when I try to use it with a form, it displays a blank page?


Modernvox


Hi Guyz.

 

I have a working web scraping script that returns musicians ads from CL (Craigslist). The problem is that I have added an HTML form listing the US states, and depending on the user's choice of state it is supposed to return the corresponding ads.

Instead it keeps displaying a blank white page.

 

Here is the html form:

<form action="test.php" method="POST">
<select name="state">
<option value="AL">Alabama</option>
<option value="AK">Alaska</option>
<option value="AZ">Arizona</option>
<option value="AR">Arkansas</option>
<option value="CA">California</option>
<option value="CO">Colorado</option>
<option value="CT">Connecticut</option>
<option value="DE">Delaware</option>
<option value="DC">District of Columbia</option>
<option value="FL">Florida</option>
<option value="GA">Georgia</option>
<option value="HI">Hawaii</option>
<option value="ID">Idaho</option>
<option value="IL">Illinois</option>
<option value="IN">Indiana</option>
<option value="IA">Iowa</option>
<option value="KS">Kansas</option>
<option value="KY">Kentucky</option>
<option value="LA">Louisiana</option>
<option value="ME">Maine</option>
<option value="MD">Maryland</option>
<option value="MA">Massachusetts</option>
<option value="MI">Michigan</option>
<option value="MN">Minnesota</option>
<option value="MS">Mississippi</option>
<option value="MO">Missouri</option>
<option value="MT">Montana</option>
<option value="NE">Nebraska</option>
<option value="NV">Nevada</option>
<option value="NH">New Hampshire</option>
<option value="NJ">New Jersey</option>
<option value="NM">New Mexico</option>
<option value="NY">New York</option>
<option value="NC">North Carolina</option>
<option value="ND">North Dakota</option>
<option value="OH">Ohio</option>
<option value="OK">Oklahoma</option>
<option value="OR">Oregon</option>
<option value="PA">Pennsylvania</option>
<option value="RI">Rhode Island</option>
<option value="SC">South Carolina</option>
<option value="SD">South Dakota</option>
<option value="TN">Tennessee</option>
<option value="TX">Texas</option>
<option value="UT">Utah</option>
<option value="VT">Vermont</option>
<option value="VA">Virginia</option>
<option value="WA">Washington</option>
<option value="WV">West Virginia</option>
<option value="WI">Wisconsin</option>
<option value="WY">Wyoming</option>
</select>
<input type="submit" value="submit" name="submit"><br />
</form><br />

 

Here is the php code:

 <?php
if(isset($_POST['submit'])) 
$st = $_post['state'];

if ($st == "AL")
{
$url = "http://southcoast.craigslist.org";
$html = file_get_contents("$url/muc/");

preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html,$posts,PREG_SET_ORDER);
//echo "<pre>";print_r($posts);

}
foreach ($posts as $post)
{
    //print $post[0]; //HTML
    $post[2] = str_ireplace($url,"",$post[2]); //remove domain
    echo "<a href=\"$url{$post[1]}\">{$post[2]}<\/a><font size=\"-1\">{$post[3]}<\/font>";
    print "<BR />\n";

}

?> 

 

As always thanks for your time answering questions.


Do a little error checking.

preg_match_all can return an error. You should check for that as well as the size of $posts.

Something like this:

<?php
$preg = preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html,$posts,PREG_SET_ORDER);

if ($preg !== false && count($posts) != 0) {
//do the loop//
} else {
echo 'Preg match problem';
var_dump($preg);
}
?>
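To try that check without hitting the site, here is a minimal sketch that runs the same pattern over a hard-coded sample string. The function name extract_posts and the sample markup are made up for illustration, and the real listing page may well look different. preg_last_error() is the built-in way to find out why the regex engine failed.

```php
<?php
// Sketch only: extract_posts() and the sample markup are illustrative assumptions.
function extract_posts(string $html): array
{
    $pattern = '/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s';
    $count = preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

    if ($count === false) {
        // The regex engine itself failed (for example, a backtrack limit was hit).
        throw new RuntimeException('preg_match_all failed, error code ' . preg_last_error());
    }

    // When $count is 0 the pattern simply matched nothing and $posts is empty.
    return $posts;
}

$sample = '<a href="/muc/123.html">Drummer wanted</a><font size="-1"> (New Bedford)</font>';
$posts = extract_posts($sample);
echo count($posts), " post(s) found\n";
```

An empty result is worth reporting separately from a failure, because it usually means the page markup no longer matches the pattern.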


A blank page could mean a PHP error that is not being displayed. Have you enabled error reporting?

error_reporting(E_ALL);
ini_set('display_errors',1);

 

And RaythMistwalker:

Which superglobal holds the values depends only on the form's method, and it doesn't matter where you're posting to. The action attribute is completely irrelevant when it comes to accessing the values.

If method="POST", PHP makes the form values available in $_POST.

If method="GET", PHP makes the form values available in $_GET.
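One thing to watch when reading the value: PHP variable names are case-sensitive, so $_post['state'] is not the same as $_POST['state']. The lowercase version is an undefined variable, so a comparison against "AL" can never succeed. Below is a minimal sketch of reading the field defensively; the helper name selected_state is made up for illustration.

```php
<?php
// Sketch only: selected_state() is a hypothetical helper, not part of the original script.
function selected_state(array $post): ?string
{
    // The form field is named "state"; it is absent if the form was not submitted.
    if (!isset($post['state']) || !is_string($post['state'])) {
        return null;
    }
    // Normalise to the two-letter upper-case codes used as option values.
    $state = strtoupper(trim($post['state']));
    return preg_match('/^[A-Z]{2}$/', $state) === 1 ? $state : null;
}

// In the real script you would pass the superglobal: selected_state($_POST)
$state = selected_state(['state' => 'AL', 'submit' => 'submit']);
echo $state === null ? "No state selected\n" : "Selected state: $state\n";
```

Returning null for a missing or malformed value lets the calling code print a message instead of falling through to a loop over data that was never fetched.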


OK... what is in the $html variable?

Did you try my preg_match_all checking method?

$html just holds the contents of the /muc/ page, which lists the musicians ads.

Example: the Craigslist SouthCoast homepage is http://southcoast.craigslist.org

Musicians ads are located at http://southcoast.craigslist.org/muc/

It is the same for the musicians ads in every location.
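Since every location follows that pattern, the listing URL can be built from the subdomain alone. Here is a rough sketch, assuming a hypothetical lookup table from state code to subdomain. Only the AL entry comes from the script posted above; any other entries would have to be filled in by hand.

```php
<?php
// Sketch only: $subdomains and musicians_url() are hypothetical names for illustration.
$subdomains = [
    'AL' => 'southcoast', // the pairing used in the posted script
];

function musicians_url(array $subdomains, string $state): ?string
{
    if (!isset($subdomains[$state])) {
        return null; // no subdomain known for this state
    }
    // Musicians ads live under /muc/ on each location's subdomain.
    return 'http://' . $subdomains[$state] . '.craigslist.org/muc/';
}

$url = musicians_url($subdomains, 'AL');
echo $url ?? 'No listing page configured for that state', "\n";
```

A table like this replaces a long chain of if statements, one per state, with a single lookup.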


Can you show us the whole PHP script?

 

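<?php
// Base URL of the Craigslist location being scraped.
$url = "http://southcoast.craigslist.org";
// Fetch the musicians listing page.
$html = file_get_contents("$url/muc/");

// Capture each ad's href, link text and the trailing <font> text.
preg_match_all('/<a href="([^"]+)">([^<]+)<\/a><font size="-1">([^"]+)<\/font>/s', $html, $posts, PREG_SET_ORDER);
//echo "<pre>";print_r($posts);
foreach ($posts as $post)
{
    //print $post[0]; //HTML
    $post[2] = str_ireplace($url, "", $post[2]); //remove domain
    // No backslash before "/" here: inside a double-quoted string "\/" is printed literally.
    echo "<a href=\"$url{$post[1]}\">{$post[2]}</a><font size=\"-1\">{$post[3]}</font>";
    print "<BR />\n";
}
?>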

That's it for this step
