
I'm just learning to use xpath to scrape links from a site. Here's some code that works:

 

$dom = new DOMDocument();
@ $dom->loadHTML( $html);
$xpath = new DOMXPath( $dom);
$hrefs = $xpath->evaluate( '/html/body//ul//li[@class="interwiki-de"]//a');

 

What I'd like is something like the following, though I don't know if the details for the union are correct, or could be simplified:

 

$hrefs = $xpath->evaluate( '/html/body//ul//li[@class=("interwiki-de"|"interwiki-jp")]//a');

 

This gives me "Warning: DOMXPath::evaluate() [domxpath.evaluate]: Invalid expression in ..."

 

It was just a guess, so I tried something that was more likely to work (actually a couple things before I got to this point):

 

$hrefs = $xpath->evaluate( '/html/body//ul//(li | stuff)//a');

 

This also gives me an invalid expression. In fact, any expression that contains parentheses gives an invalid expression. The example unions I see lead me to believe it's possible to do what I want. What am I missing, please?
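
For reference, here is a minimal sketch of two forms that XPath 1.0 does accept. PHP's DOMXPath is built on libxml2, which implements XPath 1.0. There, `|` only joins node-sets (not strings), and a parenthesized group cannot appear as a step in the middle of a path. The sample markup, the example.org URLs, and the hrefs() helper are made up for illustration. Both expressions match the class attribute exactly, as the working query above does.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$html = '<html><body><ul>'
      . '<li class="interwiki-de"><a href="http://de.example.org/">Deutsch</a></li>'
      . '<li class="interwiki-jp"><a href="http://jp.example.org/">Japanese</a></li>'
      . '<li class="interwiki-fr"><a href="http://fr.example.org/">Francais</a></li>'
      . '</ul></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Form 1: combine the two attribute tests with "or" inside one predicate.
$byOr = $xpath->evaluate(
    '/html/body//ul//li[@class="interwiki-de" or @class="interwiki-jp"]//a'
);

// Form 2: a union of two complete location paths.
// In XPath 1.0 the "|" operator sits between whole node-set expressions.
$byUnion = $xpath->evaluate(
    '/html/body//ul//li[@class="interwiki-de"]//a'
    . ' | '
    . '/html/body//ul//li[@class="interwiki-jp"]//a'
);

// Illustrative helper (not part of DOMXPath): collect href values from a node list.
function hrefs(DOMNodeList $nodes): array
{
    $out = [];
    foreach ($nodes as $a) {
        $out[] = $a->getAttribute('href');
    }
    return $out;
}

echo implode("\n", hrefs($byOr)), "\n";
```

The `or` form is shorter when only the attribute value differs. The union form is the one to reach for when the two branches are different paths altogether.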

 

https://forums.phpfreaks.com/topic/149120-xpath-doesnt-like-parenthesis/

Well, I'm not just scraping links. I'm following a few specific links on the page and summarizing both the main page and those linked pages. Basically, I want to drop a link in a box and have it spit out information that would require thirty clicks and a bunch of scrolling and copy-paste and window switching to get by hand. It's something I have to do hundreds of times for pages in a particular format.

 

But yeah, getting the links is not terribly difficult and I'm moving on to other methods.

Nope. Tried that. It also gives errors. It's not just parentheses it doesn't like. None of the non-trivial examples I've found work through the PHP interface. Maybe the strings have to be encoded somehow. I've tried escaping the characters it seems to have difficulty with and that doesn't fix it, and discussions I've found say it won't take escaped strings.

 
