

This topic is now archived and is closed to further replies.


Need string to read msn


I'm trying to write a script that reads in a search results page from MSN and finds the index position of a given site. The code works except for one condition of the search.

MSN lists the results in two ways:
- If the searched-for phrase is in the domain name, then the part of the URL containing the phrase is enclosed in <strong> tags. So if I searched for the word cars and my URL was www.bestcars.com, the code on MSN would appear as
<li class="first">www.best<strong>cars</strong>.com</li>

- If the searched-for phrase is not in the URL, say ties, then the result appears as <li class="first">www.bestcars.com</li>

The problem is with the second type. Sometimes the found URL will contain something after the .com, like:
<li class="first">www.bestcars.com/index.php?cPath=35</li>

I have tried an expression like this to find that, but I can't get it to work:
<li class="first">www.bestcars.com(.*)</li>
I also tried
<li class="first">www.bestcars.com^(.*)$</li>

Does anyone have any idea what is needed to get this to work? I would appreciate any suggestions.
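
For reference, here is a minimal sketch of how the two attempted patterns behave on the sample line above. It assumes PHP's PCRE function preg_match, with '#' as the pattern delimiter and the dots escaped; the variable names are only illustrative and not from the original script.

```php
<?php
// Sample line copied from the post above.
$line = '<li class="first">www.bestcars.com/index.php?cPath=35</li>';

// First attempt: dots escaped, '#' as delimiter so '/' needs no escaping.
$first = '#<li class="first">www\.bestcars\.com(.*)</li>#i';

// Second attempt: '^' means "start of the string" and '$' means "end of the
// string", so neither can hold in the middle of the line.
$second = '#<li class="first">www\.bestcars\.com^(.*)$</li>#i';

$firstMatched  = preg_match($first, $line, $out);   // 1, the line matches
$secondMatched = preg_match($second, $line);        // 0, the anchors never match

// $out[1] holds the part after ".com": "/index.php?cPath=35"
```

On this one sample the first expression does match, so the expression alone may not be the cause; the second can never match because of the anchors.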

Do you have any more code to show, like the code you're using to run this regex? That would help a lot more.

The following is the relevant code, I think.

$conditions = sprintf("<li class=\"first\">%s(.*)</li>", $tmpurl);

$file = fopen($filename, "r");
if ($file) {
  while (!feof($file)) {
    $var = fgets($file, 1024);
    if (eregi($conditions, $var, $out))

After the above, $out should contain the found entries but doesn't.
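
One possible approach, offered as a sketch rather than a confirmed fix: eregi was deprecated in PHP 5.3 and removed in PHP 7, so the example below uses preg_match instead. It escapes the URL with preg_quote so that characters such as "." and "?" are matched literally, and it strips the <strong> highlighting before matching so both result styles are handled by one pattern. The helper name findUrlSuffix and the idea of passing the lines in as an array are assumptions made for illustration; the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns whatever follows $url inside the first matching
// <li class="first"> entry, or null when no line matches.
function findUrlSuffix(string $url, array $lines): ?string
{
    // preg_quote() escapes regex metacharacters in the URL and the '#' delimiter.
    $pattern = '#<li class="first">' . preg_quote($url, '#') . '(.*?)</li>#i';

    foreach ($lines as $line) {
        // Remove the highlighting so www.best<strong>cars</strong>.com matches too.
        $plain = str_ireplace(['<strong>', '</strong>'], '', $line);
        if (preg_match($pattern, $plain, $out)) {
            return $out[1];
        }
    }
    return null;
}

// Sample lines taken from the first post.
$lines = [
    '<li class="first">www.best<strong>cars</strong>.com</li>',
    '<li class="first">www.bestcars.com/index.php?cPath=35</li>',
];

$suffix = findUrlSuffix('www.bestcars.com', $lines);   // "" (first line matches)
```

With a saved results page, the lines could come from file($filename), which returns the file as an array of lines (or false on failure), so no fixed 1024-byte read can cut an entry in half. It is also worth checking that $tmpurl holds exactly the text shown in the page, with no "http://" prefix or trailing whitespace.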
