
[SOLVED] Help using preg_match for a string starting with s and ending with g



Hi again. I don't like to ask for help unless I really need it, and after 16 hours of non-stop searching and reading I think it's time to ask the pros.

 

I would like to grab a text string that begins with the letter s and ends with a g. I am able to fetch the site and its contents, but I'm having a tough time using the preg_match syntax to grab just that specific text.

 

The s is always the first character and the g is always the last, so do I need to do anything special, or can I just use ^s and $g?
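For reference on the anchors: in PCRE, ^ matches at the start of the subject string and $ at the end, so the $ goes after the g rather than before it. A minimal sketch that tests single words (the sample words are made up for illustration):

```php
<?php
// ^ anchors at the start of the subject, $ at the end (note: g comes before $)
$pattern = '~^s.*g$~';

$samples = ['something', 'strings', 'bring'];
foreach ($samples as $s) {
    // preg_match returns 1 for a match, 0 for no match
    printf("%-10s %s\n", $s, preg_match($pattern, $s) ? 'match' : 'no match');
}
```

Applied to a whole downloaded page, these anchors refer to the start and end of the entire HTML, not of the address inside it, which is why the replies below anchor on the surrounding mailto: text instead.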

 

Thanks in advance for your help with this.

I think you might need to be a little more specific with regards to what you wish to match, with sample input and expected output.

 

Sure.

I want to search Craigslist for a specific item. Once it's found, I want to grab the anonymous email address and send the seller an automated email, e.g. "Hi, I am interested in purchasing your vintage record player," etc.

The email starts with an s and ends with a g every time.

That doesn't exactly provide any more information, but since I like trying to master regex, I went to Craigslist to see if I could find what you were talking about. I assume you want to capture the Reply To: email address? After checking the source, I think something like this should work...

 

preg_match_all('~mailto:([^?]+)~', $input, $out);
echo $out[1][0];

I tried it with this:

  <?php  
    function curlURL($url) {  
        $curl = curl_init();  
        curl_setopt($curl, CURLOPT_URL, $url);  
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');  
        $output = curl_exec($curl);  
            return $output;  
    }  
     
   $curlResults = curlURL("http://southcoast.craigslist.org/tls/1432616932.html");  
   preg_match_all('~mailto:([^?]+)~', $input, $out);
echo $out[1][0];
     
   // Display what we've found  
   var_dump($curlResults);   

But I still get all the content.

preg_match is not magic; it doesn't see the variable $input and think, "Oh, he wants to use the source code we're in the process of fetching with cURL." You need to replace $input with the variable that holds the site's HTML source.

 

I don't know a great deal about curl as I've never really used it, but something along the lines of...

 

<?php  
    function curlURL($url) {  
        $curl = curl_init();  
        curl_setopt($curl, CURLOPT_URL, $url);  
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');  
        $output = curl_exec($curl);  
            return $output;  
    }  
     
   $curlResults = curlURL("http://southcoast.craigslist.org/tls/1432616932.html");  
   preg_match_all('~mailto:([^?]+)~', $curlResults, $out);
   echo $out[1][0];
?>

 

... would make a great deal more sense.
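As a slightly more defensive variant of the code above, here is a sketch (not tested against the live site) that checks for a failed request and for a page with no address. The helper names fetchPage and firstMailto are hypothetical, and since only the first address is needed, plain preg_match is used instead of preg_match_all:

```php
<?php
// Hypothetical helper: fetch a page, returning the body or false on failure.
// The user agent option from the thread is omitted here for brevity.
function fetchPage(string $url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($curl);
    if ($body === false) {
        trigger_error('cURL error: ' . curl_error($curl), E_USER_WARNING);
    }
    // The handle is released automatically when $curl goes out of scope
    return $body;
}

// Hypothetical helper: return the first address after "mailto:", or null if none
function firstMailto(string $html): ?string
{
    return preg_match('~mailto:([^?]+)~', $html, $m) ? $m[1] : null;
}

// Usage (needs network access; the listing URL from the thread may no longer exist):
// $html = fetchPage('http://southcoast.craigslist.org/tls/1432616932.html');
// echo firstMailto($html ?: '') ?? 'no address found';
```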


Works just as I had hoped. I hope to get to your level someday soon.

Thanks a bunch. If you have some time today, tomorrow or whenever, could you go over the expression characters you used so I can better understand what, where and why?

Much love! Thanks man!

Sure...

 

mailto:

1. This is a literal string; it looks for that exact text in the HTML source.

 

2. The opening and closing parentheses tell preg_match_all to capture the characters matching the expression between them into a separate subset. If you use print_r to print out the contents of $out, you will see it contains 2 arrays. The first holds the strings that match the whole pattern (i.e. with the mailto: part included); the second holds all matches of that sub-pattern.

 

[^?]+

3. Matches one or more characters that are not a question mark, so in effect it grabs characters until it runs into a question mark (or the end of the string).

 

4. salathe will no doubt turn up, ripping this pattern to shreds and give you a better solution :)

 

Note: It just occurred to me that you could probably also use '~mailto:\K[^?]+~' as the pattern and read $out[0][0]; I'm not sure if there's an advantage or disadvantage to either method.
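The two approaches in the explanation above can be compared side by side. This sketch runs both patterns against a hypothetical snippet of markup shaped like the Reply To link (the address format is made up to fit the pattern salathe gives later in the thread):

```php
<?php
// Hypothetical markup shaped like the thread's Craigslist reply link
$html = '<a href="mailto:sale-abc12-1432616932@craigslist.org?subject=record%20player">reply</a>';

// Version 1: a capture group. $out[0] holds whole matches, $out[1] the group.
preg_match_all('~mailto:([^?]+)~', $html, $out);
echo $out[0][0], "\n"; // mailto:sale-abc12-1432616932@craigslist.org
echo $out[1][0], "\n"; // sale-abc12-1432616932@craigslist.org

// Version 2: \K drops "mailto:" from the reported match, so no group is needed
preg_match_all('~mailto:\K[^?]+~', $html, $out2);
echo $out2[0][0], "\n"; // sale-abc12-1432616932@craigslist.org
echo count($out2), "\n"; // 1: only the whole-match array exists
```

In this sketch the practical difference is mostly where you read the result from: $out[1][0] with the capture group, $out2[0][0] with \K.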

 

 


No shreds will be ripped in this post. There's no real need to, but it is generally good practice to be as explicit as possible about what you want to match (and you can learn what the new bits mean!). To make sure you only grab the anonymous seller email, perhaps extend the expression to look for the pattern the emails share: /mailto:(sale-(?:[a-z0-9]{5}-)?\d{9,10}@craigslist.org)/ That'll give you something to sink your teeth into. For help, check out Regular Expression Quick Start and go from there. If that's too far over your head, how about /mailto:(sale-[^@]+@craigslist.org)/ as a nice intermediate? :-*
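To see how strict each suggestion is, this sketch runs both patterns over a few hypothetical addresses. The dot in .org is escaped here (the point raised later in the thread), and the sample addresses are invented for illustration:

```php
<?php
// Hypothetical inputs: two follow the strict shape, one only the loose one, one neither
$samples = [
    'mailto:sale-abc12-1432616932@craigslist.org?subject=x',
    'mailto:sale-143261693@craigslist.org?subject=x',
    'mailto:sale-not-a-number@craigslist.org?subject=x',
    'mailto:someone@example.com?subject=x',
];

$strict       = '/mailto:(sale-(?:[a-z0-9]{5}-)?\d{9,10}@craigslist\.org)/';
$intermediate = '/mailto:(sale-[^@]+@craigslist\.org)/';

foreach ($samples as $s) {
    $a = preg_match($strict, $s, $m1) ? $m1[1] : 'no match';
    $b = preg_match($intermediate, $s, $m2) ? $m2[1] : 'no match';
    echo "strict: $a | intermediate: $b\n";
}
```

The intermediate pattern accepts anything between sale- and the @, while the strict one also checks the optional five-character segment and the 9 or 10 digits.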

I actually just grabbed O'Reilly's Regular Expressions vol. 2. I'm thinking I may not be ready for this stuff yet, but I always felt that just jumping right in makes or breaks you.

 

What do you mean by shred, ripped and the such? lol?

 

Anyhow, thanks for the help and insight. I think wrapping my head around arrays tends to throw me for a loop at times. I will keep at it, as I love everything about programming and the excitement of being able to do just about anything you want with it!

And there ~\b(s?he)\b~ is... :)

 

I was originally going to include the 'sale' and 'craigslist.org' parts, but having never used Craigslist before, I wasn't certain they were constant. The same applies, to a degree, to restricting characters and testing lengths. The one part I didn't know was the ?:, so I tried looking it up. Am I correct in saying that ?: makes the parentheses a non-capturing group?
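The ?: question can be checked directly: a non-capturing group still groups (so the ? after it applies to the whole segment) but gets no slot in the match array. A small sketch on a hypothetical address:

```php
<?php
$subject = 'sale-abc12-1432616932@craigslist.org'; // hypothetical address

// Capturing group: the optional prefix gets its own slot in $m
preg_match('/sale-([a-z0-9]{5}-)?(\d{9,10})/', $subject, $m);
print_r($m); // [0] => full match, [1] => 'abc12-', [2] => '1432616932'

// Non-capturing group (?:...): same grouping, but no slot is created
preg_match('/sale-(?:[a-z0-9]{5}-)?(\d{9,10})/', $subject, $n);
print_r($n); // [0] => full match, [1] => '1432616932'
```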

 

Oh, also I just realised, shouldn't the fullstop in .org be escaped? :)
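On the escaping question: an unescaped . matches any single character, while \. matches only a literal dot. A quick sketch with a made-up look-alike domain shows the difference:

```php
<?php
// Hypothetical look-alike: "X" where the dot should be
$lookalike = 'sale-1432616932@craigslistXorg';

var_dump(preg_match('/sale-\d+@craigslist.org/', $lookalike));  // int(1): "." matched "X"
var_dump(preg_match('/sale-\d+@craigslist\.org/', $lookalike)); // int(0): "\." needs a real dot
```

In practice the unescaped version rarely causes trouble on real Craigslist pages, but escaping states the intent exactly.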

 

Edit: Well, I was simply pointing out that salathe has a habit of replying to threads after me, pointing out flaws and suggesting better alternatives. If only I had somebody to do that for all aspects of my life, I think my life would be a lot easier/better ("No Pete, you don't need another beer, you're already wasted. And don't even think about flirting with that girl; it's the beer goggles.", lol).


 

LMAO! You're a funny dude.

 

As far as I understand, the ? means the preceding character is optional, but you're asking about ?:, and that sequence I'm unsure about.

 

This regex stuff is pretty complex given all the combinations of syntax that can be used.

However, I will now try to grab these emails, add them to a database and then finally auto-send them (10 at a time, per CL's TOS).

 

Thanks again,

Mike

 

The question mark has multiple meanings depending on where in a pattern it is used. Directly after a literal character (or a group), it makes that item optional. After a repetition quantifier such as + or *, it makes the quantifier lazy, meaning it matches as few characters as possible rather than as many. In the case salathe used here, directly after an opening parenthesis, it has a whole other meaning: (?: starts a non-capturing group, which groups part of the pattern without capturing it.
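The three meanings of ? described above, each in a tiny sketch (the sample strings are made up for illustration):

```php
<?php
// 1. After a character: makes it optional
var_dump(preg_match('/colou?r/', 'color'));  // int(1)
var_dump(preg_match('/colou?r/', 'colour')); // int(1)

// 2. After a quantifier such as + or *: makes it lazy
$html = '<b>one</b><b>two</b>';
preg_match('~<b>.+</b>~', $html, $greedy);
preg_match('~<b>.+?</b>~', $html, $lazy);
echo $greedy[0], "\n"; // <b>one</b><b>two</b>  (greedy: runs to the last </b>)
echo $lazy[0], "\n";   // <b>one</b>            (lazy: stops at the first </b>)

// 3. Right after "(": "(?:" opens a non-capturing group
preg_match('/(?:ab)+(c)/', 'ababc', $m);
echo count($m), "\n"; // 2: the full match plus the single capturing group
```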


 

Let me ask you this:

Do you find it easy to get work programming knowing what you know?

Why do I get this error: Parse error: syntax error, unexpected $end in C:\xampp\xampp\htdocs\test3.php on line 30, when all I'm trying to do is insert the variable into a table I created in the database?

 

 <?php  
    function curlURL($url) {  
        $curl = curl_init();  
        curl_setopt($curl, CURLOPT_URL, $url);  
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');  
        $output = curl_exec($curl);  
            return $output;  
    }  
     
   $curlResults = curlURL("http://southcoast.craigslist.org/tls/1432616932.html");  
   preg_match_all('~mailto:([^?]+)~', $curlResults, $out);
   echo $out[1][0];

   $out = $v;
   
   $dbx= mysql_connect("localhost", "root", "");   //include before any database implematation
         if (!$dbx)
         {
         die('Could not connect: ' . mysql_error());
         }

         mysql_SELECT_db("Email", $dbx);
         mysql_Query("INSERT INTO holder ($v));
        
         mysql_close($dbx);
         


         ?>  

mysql_Query("INSERT INTO holder ($v));

 

It is missing both the closing quote for the string and the SQL keyword VALUES. More correct syntax would be:

 

mysql_Query("INSERT INTO holder (column_name) VALUES ($v)");

 

To answer your question about whether it's easy to find programming work knowing what I know: I'm unemployed.

Yes, me too.

 

OK, so I have this close, but I can't seem to insert this address into my database.

What am I doing wrong here?

 

 <?php  
    function curlURL($url) {  
        $curl = curl_init();  
        curl_setopt($curl, CURLOPT_URL, $url);  
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');  
        $output = curl_exec($curl);  
            return $output;  
    }  
     
   $curlResults = curlURL("http://southcoast.craigslist.org/tls/1432616932.html");  
   preg_match_all('~mailto:([^?]+)~', $curlResults, $out);
   echo $out[1][0];

   
   $out = $v;
   $dbx= mysql_connect("localhost", "root", "");   //include before any database implematation
         if (!$dbx)
         {
         die('Could not connect: ' . mysql_error());
         }

         mysql_SELECT_db("Email", $dbx);
         mysql_Query("INSERT INTO address (holder) VALUES ($v)");
        
         mysql_close($dbx);
         


         ?>  

You are setting $out equal to $v, which is NULL because you haven't declared $v at any point before that (and doing so also throws away your matches). If you wish to insert the address that you are echoing out, you need $v = $out[1][0] instead.
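Assignment copies from right to left, so the captured address has to be on the right-hand side. This sketch also guards against a page with no address, using the match count that preg_match_all returns; the page fragment is hypothetical:

```php
<?php
// Hypothetical page fragment standing in for $curlResults
$curlResults = '<a href="mailto:sale-1432616932@craigslist.org?subject=x">reply</a>';

// preg_match_all returns the number of matches; 0 means $out[1] is empty
if (preg_match_all('~mailto:([^?]+)~', $curlResults, $out) > 0) {
    $v = $out[1][0]; // right to left: the first captured address goes into $v
} else {
    $v = null;       // no address found; skip the INSERT in this case
}
var_dump($v);
```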


So $out is an array, and the address is at [1][0]?

I thought [1] held the results of the preg_match_all and [0] was the finished part, "the address".

preg_match_all fills $out (its third argument) with a two-dimensional array; the function's return value is the number of matches. The first element, $out[0], is an array of complete matches. Each element after that ($out[1], $out[2], and so on) is an array of what the corresponding capture group, i.e. each set of parentheses, matched.
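To make that layout concrete, this sketch uses a hypothetical fragment containing two addresses and a pattern with two capture groups, so $out ends up with three inner arrays:

```php
<?php
// Hypothetical fragment with two reply links
$html = 'mailto:sale-111111111@craigslist.org?a mailto:sale-222222222@craigslist.org?b';

// Two capture groups: the part before the @ and the part after it
preg_match_all('~mailto:([^@?]+)@([^?]+)~', $html, $out);

print_r($out);
// $out[0]: whole matches, one per address found
// $out[1]: first group  -> 'sale-111111111', 'sale-222222222'
// $out[2]: second group -> 'craigslist.org', 'craigslist.org'
```

So $out[1][0] reads as "first capture group, first match". Passing PREG_SET_ORDER as the fourth argument flips the layout to one array per match instead.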


 

I still can't insert the address into the holder column of the address table.

 

<?php  
    function curlURL($url) {  
        $curl = curl_init();  
        curl_setopt($curl, CURLOPT_URL, $url);  
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');  
        $output = curl_exec($curl);  
            return $output;  
    }  
     
   $curlResults = curlURL("http://southcoast.craigslist.org/tls/1432616932.html");  
   preg_match_all('~mailto:([^?]+)~', $curlResults, $out);
   echo $out[1][0];

   
         $v= $out[1][0];
   $dbx= mysql_connect("localhost", "root", "");   //include before any database implematation
         if (!$dbx)
         {
         die('Could not connect: ' . mysql_error());
         }

         mysql_SELECT_db("Email", $dbx);
         mysql_Query("INSERT INTO address (holder) VALUES ($v)");
        
         mysql_close($dbx);
         


         ?>  

Oops, $v is a string, so it must be treated as one in the SQL: it needs quotes around it. By the way, if you use a method similar to this, you should be able to debug your own problems to a greater degree.

 

$sql = "INSERT INTO address (holder) VALUES ('$v')";
mysql_query($sql) or trigger_error("SQL: $sql, ERROR: " . mysql_error(), E_USER_ERROR);
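Worth knowing for later: the mysql_* functions used throughout this thread were deprecated in PHP 5.5 and removed in PHP 7.0. A minimal sketch of the same insert with PDO and a prepared statement, which lets the driver handle quoting instead of the hand-written '$v'. Assumption: an in-memory SQLite database stands in for the MySQL Email database so the sketch runs anywhere; against MySQL the DSN would look like 'mysql:host=localhost;dbname=Email'.

```php
<?php
// Assumption: SQLite in memory replaces the thread's MySQL "Email" database;
// table and column names (address, holder) mirror the thread.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION); // errors become exceptions
$pdo->exec('CREATE TABLE address (holder TEXT)');

$v = 'sale-1432616932@craigslist.org'; // hypothetical scraped address

// The ? placeholder is bound to $v by the driver, so no manual quoting is needed
$stmt = $pdo->prepare('INSERT INTO address (holder) VALUES (?)');
$stmt->execute([$v]);

$stored = $pdo->query('SELECT holder FROM address')->fetchColumn();
echo $stored, "\n";
```

With exceptions enabled, a bad query stops the script with the database's own error message, which serves the same debugging purpose as the trigger_error line above.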


 

Thanks, I'm on that now. I know I asked you a similar question last night, but surely you offer your coding experience as a service, no?


Yep, that did it.

Great, now I have working code to learn from. Excellent :-* lol

 
