amyhughes Posted March 12, 2009

I'm just learning to use XPath to scrape links from a site. Here's some code that works:

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate('/html/body//ul//li[@class="interwiki-de"]//a');

What I'd like is something like the following, though I don't know if the details for the union are correct, or could be simplified:

$hrefs = $xpath->evaluate('/html/body//ul//li[@class=("interwiki-de"|"interwiki-jp")]//a');

This gives me "Warning: DOMXPath::evaluate() [domxpath.evaluate]: Invalid expression in ..." It was just a guess, so I tried something that was more likely to work (actually a couple of things before I got to this point):

$hrefs = $xpath->evaluate('/html/body//ul//(li | stuff)//a');

This also gives me an invalid expression. In fact, any expression that contains parentheses gives an invalid expression. The example unions I see lead me to believe it's possible to do what I want. What am I missing, please?

Link to comment: https://forums.phpfreaks.com/topic/149120-xpath-doesnt-like-parenthesis/

amyhughes Posted March 13, 2009 (Author)

Does anyone use XPath?

JonnoTheDev Posted March 13, 2009

Doubt it. If all you are doing is scraping links (like a spider) from the href attributes of HTML tags, then a simple bit of regex will suffice.
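
For reference, the regex approach suggested above could look roughly like the sketch below. The sample markup and the `$links` variable are assumptions made for illustration, not something from the thread, and a pattern like this only handles quoted `href` values on `<a>` tags. It pulls every link on the page and does not filter by the `class` of the enclosing `li`.

```php
<?php
// Hypothetical sample markup; in practice $html would hold the fetched page.
$html = '<ul>'
      . '<li class="interwiki-de"><a href="http://de.example.org/wiki/Foo">Deutsch</a></li>'
      . '<li class="interwiki-jp"><a title="ja" href="http://ja.example.org/wiki/Foo">Japanese</a></li>'
      . '</ul>';

// Capture the value of each quoted href attribute on an <a> tag.
// Group 1 is the opening quote character, group 2 is the URL itself.
preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

$links = $matches[2];
print_r($links);
```

This is fragile against unusual markup (unquoted attributes, links inside comments), which is one reason to prefer the DOM-based approach when the page structure matters.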

amyhughes Posted March 13, 2009 (Author)

Well, I'm not just scraping links. I'm following a few specific links on the page and summarizing both the main page and those linked pages. Basically, I want to drop a link in a box and have it spit out information that would otherwise take thirty clicks and a bunch of scrolling, copy-pasting, and window switching to get by hand. It's something I have to do hundreds of times for pages in a particular format.

But yeah, getting the links is not terribly difficult, and I'm moving on to other methods.

Maq Posted March 13, 2009

I use XPath with XSLT, and if it's the same then you don't need the parentheses:

$hrefs = $xpath->evaluate('/html/body//ul//li[@class="interwiki-de"|"interwiki-jp"]//a');

amyhughes Posted March 14, 2009 (Author)

Nope, tried that. It also gives errors. It's not just parentheses it doesn't like. None of the non-trivial examples I've found work through the PHP interface. Maybe the strings have to be encoded somehow. I've tried escaping the characters it seems to have difficulty with, and that doesn't fix it, and the discussions I've found say it won't take escaped strings.
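
A possible explanation for the errors in this thread: PHP's DOMXPath is built on libxml2, which implements XPath 1.0. In XPath 1.0 the `|` operator only combines node-sets, so a union of two string literals is rejected, and a parenthesized group cannot appear as a step in the middle of a location path (that form belongs to XPath 2.0). The sketch below shows two ways the intended query might be written in XPath 1.0. The sample markup and the variable names `$viaOr`, `$viaUnion`, and `$hrefs` are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = '<html><body><ul>'
      . '<li class="interwiki-de"><a href="http://de.example.org/">German</a></li>'
      . '<li class="interwiki-fr"><a href="http://fr.example.org/">French</a></li>'
      . '<li class="interwiki-jp"><a href="http://ja.example.org/">Japanese</a></li>'
      . '</ul></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from imperfect real-world HTML
$xpath = new DOMXPath($dom);

// Option 1: combine the two attribute tests with "or" inside one predicate.
$viaOr = $xpath->query(
    '/html/body//ul//li[@class="interwiki-de" or @class="interwiki-jp"]//a'
);

// Option 2: union two complete location paths with "|".
$viaUnion = $xpath->query(
    '/html/body//ul//li[@class="interwiki-de"]//a'
    . ' | /html/body//ul//li[@class="interwiki-jp"]//a'
);

// Collect the href values from the first result set.
$hrefs = [];
foreach ($viaOr as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```

For the second attempt in the original post, which selects among element names rather than attribute values, a predicate on the `self` axis such as `//ul//*[self::li or self::stuff]//a` is one XPath 1.0 way to express the same idea.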