kevinkhan Posted October 28, 2009
Hi everyone. I want to extract the title and link from the following HTML file, but only if the location is London.

<li >
<ul class="search-result" id="AdvertRow12">
<li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
<li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
<li class="vehicle-year">2009</li>
<li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
<li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
<li class="vehicle-mileage">2</li>
<li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
<li class="vehicle-engine">1.9</li>
<li class="vehicle-price">€20,900</li>
</ul>
</li>
<li >
<ul class="search-result" id="AdvertRow12">
<li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
<li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
<li class="vehicle-year">2009</li>
<li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
<li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>
<li class="vehicle-mileage">2</li>
<li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
<li class="vehicle-engine">1.9</li>
<li class="vehicle-price">€20,900</li>
</ul>
</li>
<li >
<ul class="search-result" id="AdvertRow12">
<li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
<li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
<li class="vehicle-year">2009</li>
<li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
<li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
<li class="vehicle-mileage">2</li>
<li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
<li class="vehicle-engine">1.9</li>
<li class="vehicle-price">€20,900</li>
</ul>
</li>
<li >
<ul class="search-result" id="AdvertRow12">
<li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
<li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
<li class="vehicle-year">2009</li>
<li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
<li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>
<li class="vehicle-mileage">2</li>
<li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
<li class="vehicle-engine">1.9</li>
<li class="vehicle-price">€20,900</li>
</ul>
</li>

I set up this PHP code, but I'm not sure how to construct the regular expression. Can anyone help me out, please?

// URL of the listing page to crawl, taken from the submitted form
$strListingUrl = $_POST["crawlUrl"];

// Returns the preg_match_all() result array, or an empty string when nothing matches
function getMatches($strMatch, $strContent)
{
    if (preg_match_all($strMatch, $strContent, $objMatches)) {
        return $objMatches;
    }
    return "";
}

$strContent = @file_get_contents($strListingUrl);
$strListMatches = '!<div class="title">(.*)</div>!isU';
$objListMatches = getMatches($strListMatches, $strContent);

Maq Posted October 28, 2009
I think a better solution would be to use the DOMDocument class. With this approach you can extract what you need and use XPath to specify the exact nodes you want to filter out.

kevinkhan Posted October 28, 2009
Would you be able to give a small, simple example? I'm only at the beginning stages of learning PHP.

nrg_alpha Posted October 29, 2009
An example using DOM/XPath:

$html = <<<EOF
<li >
<ul class="search-result" id="AdvertRow12">
<li class="vehicle-images"><a href="http:\\www.mydomain.com" title="9 photos of Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !"><span>9</span></a></li>
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel JTDM 120bhp No Mileage !" href="http://www.mydomain.com/search/Alfa-Romeo/147/Diesel-J/200938195255441/advert?channel=CARS">Alfa Romeo 147 Diesel JTDM …</a></li>
<li class="vehicle-approved"><img src="http://images.mydomain.com/dealer-resource/programme/20x20/keary.gif" /></li>
<li class="vehicle-year">2009</li>
<li class="vehicle-seller"><span class="dealer-simi">Dealer</span></li>
<li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
<li class="vehicle-mileage">2</li>
<li class="vehicle-colour"><span title="Metallic Grey" class="grey"><em>Grey</em></span></li>
<li class="vehicle-engine">1.9</li>
<li class="vehicle-price">€20,900</li>
</ul>
</li>
EOF;

$dom = new DOMDocument;
@$dom->loadHTML($html); // change loadHTML to loadHTMLFile and replace $html with complete url in quotes for a live site
$xpath = new DOMXPath($dom);
// Select every <span> that has a title attribute and whose text is exactly "London"
$tag = $xpath->query('//ul/li/span[@title and text()="London"]');
foreach ($tag as $val) {
    echo $val->getAttribute('title') . ' - ' . $val->nodeValue . "<br />\n";
}

I know you are just starting out with PHP (we all start somewhere, don't we?), but I would definitely advise taking some time to research the specifics of what you are looking for (I'm finding that many newcomers in general seem more prone to asking for complete solutions instead; nothing wrong with asking for help, mind you). Posting what you have attempted from what you have researched is not only a learning experience for you, but also gives people in the forums insight into your thought process (and dispels any impressions of laziness).
In any case, aside from the link Maq provided, you can also read up on XPath here. And of course, searching for DOM / XPath on Google will turn up additional links. Like everything else, it simply takes some time and effort to grasp. The solution provided should be enough of a springboard to work from.

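Since the original question asked for the title and link of each London listing, the same DOM/XPath approach might be extended along the following lines. This is only a sketch: the two trimmed-down listings and the `advert-1` / `advert-2` URLs are made up for illustration, and it assumes every result is a `ul` with class `search-result` containing a `vehicle-make-model` anchor and a `vehicle-location` span, as in the sample markup.

```php
<?php
// Hypothetical sample data: two shortened listings modelled on the HTML in the first post.
$html = <<<EOF
<ul class="search-result">
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel" href="http://www.mydomain.com/advert-1">Alfa Romeo 147</a></li>
<li class="vehicle-location"><span title="Kearys Lexus">London</span></li>
</ul>
<ul class="search-result">
<li class="vehicle-make-model"><a title="Alfa Romeo 147 Diesel" href="http://www.mydomain.com/advert-2">Alfa Romeo 147</a></li>
<li class="vehicle-location"><span title="Kearys Lexus">Leeds</span></li>
</ul>
EOF;

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select each result block whose location span reads "London" (ignoring surrounding whitespace).
$query = '//ul[@class="search-result"][li[@class="vehicle-location"]/span[normalize-space(.)="London"]]';

$results = array();
foreach ($xpath->query($query) as $ul) {
    // Relative to that block, take the first make/model anchor.
    $a = $xpath->query('li[@class="vehicle-make-model"]/a', $ul)->item(0);
    if ($a !== null) {
        $results[] = array(
            'title' => $a->getAttribute('title'),
            'link'  => $a->getAttribute('href'),
        );
    }
}

foreach ($results as $row) {
    echo $row['title'] . ' - ' . $row['link'] . "\n";
}
```

With the sample data above, only the first listing is kept, because the second one is located in Leeds. Filtering on the enclosing `ul` first and then querying relative to it keeps the title, link and location tied to the same advert.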
nrg_alpha Posted October 29, 2009
Side note: in the event there are spaces on either side of "London", like so:

<li class="vehicle-location"><span title="Kearys Lexus"> London </span></li>

you can change the XPath query line to:

$tag = $xpath->query('//ul/li/span[@title and normalize-space(text())="London"]');

normalize-space acts like trim, in layman's terms, and removes any spaces that might otherwise throw off a search, which tends to match text verbatim.

nadeemshafi9 Posted October 29, 2009
Quote (Maq): I think a better solution would be to use the DOMDocument class. With this approach you can extract what you need and use XPath to specify the exact nodes you want to filter out.

Does this class require any installation, reconfiguration, importing or linking? Or is it freely available in the environment?

Maq Posted October 29, 2009
Quote (nrg_alpha): The normalize-space acts like trim() in simple layman terms and removes any spaces that might otherwise throw off a search, which tends to look for stuff verbatim.

I believe the normalize-space() function also replaces runs of two or more spaces (even inside the string) with a single space. At least this is true in XSLT.

nrg_alpha Posted October 29, 2009
Quote (nadeemshafi9): does this class require any installastion reconfiguration ? importation linking ? or is it freely available in the environment

If I'm not mistaken, DOM/XPath is part of the PHP core (looking up DOMXPath in the PHP manual leads to the DOM section in general, which lists no installation or configuration requirements).

Maq Posted October 29, 2009
Quote (nadeemshafi9): does this class require any installastion reconfiguration ? importation linking ? or is it freely available in the environment

It's "freely in the environment".

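For anyone who would rather verify this on their own setup than take it on trust: the DOM extension is bundled with PHP and enabled by default, but a build can be compiled without it, and some distribution packages ship it separately. A minimal check, assuming nothing beyond a working PHP install, could look like this:

```php
<?php
// True only when the DOM extension is loaded and both classes used in this thread exist.
$hasDom = extension_loaded('dom')
    && class_exists('DOMDocument')
    && class_exists('DOMXPath');

if ($hasDom) {
    echo "DOM extension is available\n";
} else {
    echo "DOM extension is missing; install or enable it for this PHP build\n";
}
```

If the check fails, the fix depends on how PHP was installed, so it is left out of the sketch.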
nrg_alpha Posted October 29, 2009
Quote (Maq): I believe the normalize-space() method will also replace 2 or more spaces (even inside the string) and replace them with a single space. At least this is true in XSLT.

You're right, so perhaps trim() wasn't the best comparison. Let me rephrase: normalize-space removes spaces (edit, sorry about that).

nadeemshafi9 Posted October 29, 2009
Quote (Maq): It's "freely in the environment".

freal ?

Maq Posted October 29, 2009
Quote (nadeemshafi9): freal ?

Sorry, I don't understand what you're saying, I only speak English.

nadeemshafi9 Posted October 29, 2009
Quote (Maq): Sorry, I don't understand what you're saying, I only speak English.

i know you are but what am i. To the original poster: you could try splitting the document into an array the good old-fashioned way.

Maq Posted October 29, 2009
Quote (nadeemshafi9): i know you are but what am i,

You're a manchild.

Quote (nadeemshafi9): you could try spliting the document into an array the good old fashioned way

1) Why would he try that when we already have a proper working solution?
2) I think you meant to say, "the good old inefficient way".
3) Why in the world would you use an array when they made the DOM class and XPath for this specific reason? Come on Nadeem, that's just silly.

nadeemshafi9 Posted October 29, 2009
dont be so narrow minded he could probably find it more interesting, manchild lol

Maq Posted October 29, 2009
Quote (nadeemshafi9): dont be so narrow minded he could probably find it more interesting, manchild lol

Coming from someone of your caliber, lack of skill and grammar, I could care less what you think. freal

nadeemshafi9 Posted October 29, 2009
Quote (Maq): Coming from someone of your caliber, lack of skill and grammar, I could care less what you think. freal

your an A.P

cags Posted October 29, 2009
Quote (nadeemshafi9): your an A.P

I agree, Maq does appear to be an Actual Programmer, unlike some others.

nadeemshafi9 Posted October 29, 2009
Quote (cags): I agree, Maq does appear to be an Actual Programmer, unlike some others.

you can plug into him then

Maq Posted October 29, 2009
Quote (cags): I agree, Maq does appear to be an Actual Programmer, unlike some others.

kevinkhan, have you come up with a solution?

salathe Posted October 29, 2009
Quote (nrg_alpha): You're right.. So let me rephrase.. normalize-space removes spaces

Well, it does, but it also affects some other whitespace characters, and only in certain circumstances. Just for reference, for the OP or whoever might want to know, it:

1. Replaces each carriage return (\r), line feed (\n), and tab (\t) character with a single space (\x20)
2. Collapses all consecutive spaces into a single space
3. Removes all leading and trailing spaces

P.S. Wow, troll much? :-\

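The three steps listed above can be seen in a small sketch that evaluates normalize-space() directly through DOMXPath and compares it with PHP's trim(). The `<root/>` document and the sample string are placeholders chosen for illustration; the string is embedded in the expression by plain concatenation, which is only safe here because it contains no quote characters.

```php
<?php
$dom = new DOMDocument;
$dom->loadXML('<root/>'); // any well-formed document will do; its content is not used
$xpath = new DOMXPath($dom);

// Leading and trailing spaces, a tab followed by a newline, and a run of three spaces.
$raw = "  Greater\t\nLondon   Area  ";

// evaluate() returns a typed result, here the string produced by normalize-space().
$normalized = $xpath->evaluate('normalize-space("' . $raw . '")');

// trim() only strips the ends and leaves the inner whitespace untouched.
$trimmed = trim($raw);

var_dump($normalized); // string(19) "Greater London Area"
var_dump($normalized === $trimmed); // bool(false)
```

The tab and newline are each turned into a space, the resulting runs are collapsed to one space, and the ends are stripped, which is why the result differs from what trim() alone produces.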
Maq Posted October 29, 2009
Quote (salathe): P.S. Wow, troll much? :-\

I already fed him, and he's still hungry. Must be a fat troll.

nadeemshafi9 Posted October 29, 2009
Quote (Maq): I already fed him, and he's still hungry. Must be a fat troll.

hey listen you started it with your insult, i dont come here to be insulted, you can clearly see my physique in the picture

Maq Posted October 29, 2009
Quote (nadeemshafi9): hey listen you started it with your insult, i dont come here to be insulted , you can clearly see my physique in the picture

Relax, it's a metaphor. And no, it did not start with my insult, because I didn't insult you. Everything I said was and is true.

nadeemshafi9 Posted October 29, 2009
Quote (cags): I agree, Maq does appear to be an Actual Programmer, unlike some others.

why is it that when someone says something, you just repeat it in the same thread? you've done that on a few posts now. knock it off, it's not going to get you a job. you even did it on my answers the last time i helped someone. it was getting annoying but i let it go; now i'm confronting you. stop your foolishness, it's like having a mirror
