
Regular Expression Question


kevinkhan


Hi everyone,

I want to extract the title and link from each listing in the following HTML file, but only if the location is London.

 

 

<li >
  <ul class="search-result" id="AdvertRow12">
    <li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>                                 
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>                  
    <li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>                  
    <li class="vehicle-year">2009</li>               
    <li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>                              
    <li class="vehicle-location"><span title="Kearys Lexus">London</span></li>         
    <li class="vehicle-mileage">2</li>
    <li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
    <li class="vehicle-engine">1.9</li> 
    <li class="vehicle-price">€20,900</li>
  </ul>
</li>

<li >
  <ul class="search-result" id="AdvertRow12">
    <li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>                                 
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>                  
    <li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>                  
    <li class="vehicle-year">2009</li>               
    <li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>                              
    <li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>         
    <li class="vehicle-mileage">2</li>
    <li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
    <li class="vehicle-engine">1.9</li> 
    <li class="vehicle-price">€20,900</li>
  </ul>
</li>

<li >
  <ul class="search-result" id="AdvertRow12">
    <li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>                                 
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>                  
    <li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>                  
    <li class="vehicle-year">2009</li>               
    <li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>                              
    <li class="vehicle-location"><span title="Kearys Lexus">London</span></li>         
    <li class="vehicle-mileage">2</li>
    <li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
    <li class="vehicle-engine">1.9</li> 
    <li class="vehicle-price">€20,900</li>
  </ul>
</li>


<li >
  <ul class="search-result" id="AdvertRow12">
    <li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>                                 
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>                  
    <li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>                  
    <li class="vehicle-year">2009</li>               
    <li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>                              
    <li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>         
    <li class="vehicle-mileage">2</li>
    <li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
    <li class="vehicle-engine">1.9</li> 
    <li class="vehicle-price">€20,900</li>
  </ul>
</li>

 

I set up this PHP code, but I'm not sure how to construct the regular expression. Can anyone help me out, please?

 


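$strURL = $_POST["crawlUrl"];

// Returns the preg_match_all() result array, or an empty array when nothing matches.
function getMatches($strMatch, $strContent)
{
    if (preg_match_all($strMatch, $strContent, $objMatches)) {
        return $objMatches;
    }
    return array();
}

// Fetch the page to crawl; file_get_contents() returns false on failure.
$strContent = @file_get_contents($strURL);

// Placeholder pattern: the sample markup has no <div class="title">, so it still needs adapting.
$strListMatches = '!<div class="title">(.*)</div>!isU';
$objListMatches = getMatches($strListMatches, $strContent);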


 


An example using DOM/Xpath:

 

$html = <<<EOF
<li >
  <ul class="search-result" id="AdvertRow12">
    <li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
    <li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
    <li class="vehicle-year">2009</li>
    <li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
    <li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
    <li class="vehicle-mileage">2</li>
    <li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
    <li class="vehicle-engine">1.9</li>
    <li class="vehicle-price">€20,900</li>
  </ul>
</li>


EOF;

$dom = new DOMDocument;
@$dom->loadHTML($html); // change loadHTML to loadHTMLFile and replace $html with complete url in quotes for a live site
$xpath = new DOMXPath($dom);
$tag = $xpath->query('//ul/li/span[@title and text()="London"]');
foreach ($tag as $val) {
    echo $val->getAttribute('title') . ' - ' . $val->nodeValue . "<br />\n";
}
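
For what it's worth, here is a rough sketch of how the example above could be extended to return what the original question asks for: the title and link of each listing whose location is London. It assumes every listing is a ul with class "search-result" laid out as in the sample. The function name getAdvertsByLocation and the trimmed two-listing sample are made up for illustration, and the comparison is case-insensitive only because the question wrote "london" in lowercase.

```php
<?php
// Hypothetical sample: two listings trimmed down from the markup in the question.
$html = <<<'EOF'
<li>
  <ul class="search-result">
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM</a></li>
    <li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
  </ul>
</li>
<li>
  <ul class="search-result">
    <li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM</a></li>
    <li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>
  </ul>
</li>
EOF;

// Hypothetical helper: returns title/link pairs for listings in the given location.
function getAdvertsByLocation(string $html, string $location = 'London'): array
{
    $dom = new DOMDocument;

    // Collect parser warnings quietly instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $adverts = [];

    // Visit each listing and read its location relative to that listing.
    foreach ($xpath->query('//ul[@class="search-result"]') as $row) {
        $place = $xpath->evaluate('normalize-space(li[@class="vehicle-location"]/span)', $row);
        if (strcasecmp($place, $location) !== 0) {
            continue; // a different location, skip this listing
        }

        $anchor = $xpath->query('li[@class="vehicle-make-model"]/a', $row)->item(0);
        if ($anchor instanceof DOMElement) {
            $adverts[] = [
                'title' => $anchor->getAttribute('title'),
                'link'  => $anchor->getAttribute('href'),
            ];
        }
    }

    return $adverts;
}

foreach (getAdvertsByLocation($html, 'London') as $advert) {
    echo $advert['title'], ' - ', $advert['link'], "\n";
}
```

Comparing the location in PHP rather than inside the XPath string avoids having to escape quotes in the location name; whether that trade-off suits a live site is a judgment call.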

 

I know you are just starting out with PHP (we all start somewhere, don't we?), but I would definitely advise taking some time to research the specifics of what you are looking for. I'm finding that many newcomers seem more prone to asking for complete solutions instead (nothing wrong with asking for help, mind you). Posting what you have attempted based on your research is not only a learning experience for you, but also gives people in the forums insight into your thought process (and dispels any impression of laziness).

 

In any case, aside from the link Maq provided, you can also read up on XPath here. And of course, searching for DOM / XPath on Google will turn up additional links. Like everything else, it simply takes some time and effort to grasp. The solution provided should be enough of a springboard to work from :)


Side note: In the event there are spaces surrounding (or on either side of) "London", like so:

 

<li class="vehicle-location"><span title="Kearys Lexus"> London  </span></li>

 

You can change the xpath query line to:

$tag = $xpath->query('//ul/li/span[@title and normalize-space(text())="London"]');

 

In simple layman's terms, normalize-space acts like trim: it removes surrounding spaces that might otherwise throw off a search, which tends to match text verbatim.
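
To see the difference for yourself, a small comparison along these lines may help. It is only a sketch: it runs both versions of the query against the padded span from this side note (wrapped in a ul so the //ul/li/span path still applies) and prints how many nodes each one finds. The variable names are made up.

```php
<?php
// The padded span from the side note, wrapped in a <ul> for the query path.
$html = '<ul><li class="vehicle-location"><span title="Kearys Lexus"> London  </span></li></ul>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep any parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Verbatim comparison: " London  " is not equal to "London".
$exact = $xpath->query('//ul/li/span[@title and text()="London"]');

// Whitespace-tolerant comparison.
$normalized = $xpath->query('//ul/li/span[@title and normalize-space(text())="London"]');

echo 'exact matches: ', $exact->length, "\n";
echo 'normalized matches: ', $normalized->length, "\n";
```

With the padded text, the verbatim query should find nothing while the normalize-space version finds the span.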


I believe the normalize-space() function will also collapse two or more consecutive spaces (even inside the string) into a single space. At least this is true in XSLT.


I think a better solution would be to use the DOMDocument class.  With this approach you can extract what you need and use XPath to specify the exact nodes you want to filter out.

 

Does this class require any installation or reconfiguration? Any importing or linking? Or is it freely available in the environment?

 

If I'm not mistaken, DOM/XPath is part of the PHP core (looking up DOMXPath in the PHP manual leads to the DOM section in general, which lists no installation or configuration requirements).


It's "freely in the environment".
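
If you would rather confirm that on your own server than take our word for it, a quick check along these lines should do. This is only a sketch, and domIsAvailable is a made-up helper name. As far as I know the DOM extension is enabled by default, but a PHP build can be compiled without it and some distributions package it separately, so the result depends on your installation.

```php
<?php
// Hypothetical helper: reports whether the DOM classes used in this thread are usable.
function domIsAvailable(): bool
{
    return extension_loaded('dom')
        && class_exists('DOMDocument')
        && class_exists('DOMXPath');
}

echo domIsAvailable()
    ? "DOM extension is loaded\n"
    : "DOM extension is missing; check how this PHP build was installed\n";
```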


You're right... so perhaps trim() wasn't the best example. So let me rephrase: normalize-space removes spaces (edit, sorry about that).


i know you are but what am i,

 

You're a manchild.

 

you could try splitting the document into an array the good old-fashioned way

 

1) Why would he try that when we already have a proper working solution?

2) I think you meant to say, "the good old inefficient way".

3) Why in the world would you use an array when they made the DOM class and XPath for this specific reason?  Come on Nadeem, that's just silly.


Don't be so narrow-minded; he could probably find it more interesting, manchild lol


Coming from someone of your caliber, lack of skill and grammar, I couldn't care less what you think. freal


your an A.P


You're right.. So let me rephrase.. normalize-space removes spaces

Well, it does, but it also handles some other whitespace characters, and only in certain circumstances. Just for reference, for the OP or whoever might want to know, it:

 

1. Replaces each carriage return (\r), line feed (\n), and tab (\t) character with a single space (\x20)

2. Collapses all consecutive spaces into a single space

3. Removes all leading and trailing spaces
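
If it helps, the three steps above can be checked with a short sketch like the following. The sample string and variable names are made up; the text is placed in a DOM node and passed through normalize-space() via DOMXPath::evaluate, and a plain PHP expression that mimics the same three steps is shown next to it for comparison.

```php
<?php
// Made-up sample with leading/trailing spaces, a CR/LF/tab run, and repeated spaces.
$raw = "  Greater\r\n\tLondon   Area ";

// Put the text into a node so XPath can work on it.
$dom = new DOMDocument;
$node = $dom->createElement('span');
$node->appendChild($dom->createTextNode($raw));
$dom->appendChild($node);

$xpath = new DOMXPath($dom);

// normalize-space(.) is evaluated with the span as the context node.
$normalized = $xpath->evaluate('normalize-space(.)', $node);

// Rough PHP equivalent: collapse runs of space, tab, CR and LF, then trim the ends.
$manual = trim(preg_replace('/[ \t\r\n]+/', ' ', $raw), ' ');

echo json_encode($normalized), "\n";
echo json_encode($manual), "\n";
```

Both lines should print "Greater London Area".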

 

 

P.S. Wow, troll much?  :-\

 


P.S. Wow, troll much?  :-\

 

I already fed him, and he's still hungry.  Must be a fat troll.

 

Hey, listen, you started it with your insult. I don't come here to be insulted; you can clearly see my physique in the picture.

 

Relax, it's a metaphor.

 

And no it did not start with my insult, because I didn't insult you.  Everything I said was and is true.


your an A.P

 

I agree, Maq does appear to be an Actual Programmer, unlike some others.

 

Why is it that when someone says something, you just repeat it in the same thread? You've done that on a few posts now. Knock it off; it's not going to get you a job. You even did it on my answers the last time I helped someone. It was getting annoying, but I let it go; now I'm confronting you. Stop your foolishness, it's like having a mirror.


This topic is now closed to further replies.