Problem with parsing curl result


Levian

Hi,

 

Well, I've been wondering if I can get an answer to my problem here...

I used curl to fetch information from another site, and I need to parse the data so it can be displayed on our own. However, I found something odd: I can't parse the result to get what I want. But if I save the result to an HTML file and parse the file itself, it's fine.

Basically I just need to find whether a word/phrase is there or not, and I'm using preg_match to do it so far. Since the problem first came up I've been googling and saw some posts saying something like regex does poorly at this kind of thing...

However, it does work when I make a separate PHP file just to run that regex against the saved curl-result HTML file. It just doesn't work in the actual PHP file that handles the process, even though it reads the same curl-result HTML file. FYI, I'm using CodeIgniter.

I wonder if it has something to do with CodeIgniter, or whether there is a better way to parse an HTML file from a curl result. In my opinion it should be the same, be it a curl result or just a normal HTML file, so I really can't figure out where it went wrong.
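One way to check the assumption that the live curl result and the saved file really are the same is to compare them byte for byte. This is only a rough sketch: `fetch_page()` and `first_difference()` are hypothetical helper names (not part of curl or CodeIgniter), and the two strings at the bottom are made-up stand-ins for the real data.

```php
<?php
// Hypothetical helper: fetch a page body with curl, or return null on failure.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $body = curl_exec($ch);                         // false on failure
    return $body === false ? null : $body;
}

// Hypothetical helper: byte offset of the first difference, or -1 if identical.
function first_difference(string $a, string $b): int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Same prefix: either identical, or one string is longer than the other.
    return strlen($a) === strlen($b) ? -1 : $limit;
}

// Stand-in strings; in practice $live would come from fetch_page() and
// $saved from file_get_contents() on the saved HTML file.
$live  = "<p>Order  shipped</p>"; // two spaces
$saved = "<p>Order shipped</p>";  // one space
var_dump(first_difference($live, $saved));
```

If the helper reports a difference, dumping a few bytes around that offset from both strings usually shows whether it is whitespace, encoding, or something the framework changed along the way.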

 

I'd appreciate it if anyone could shed some light on this problem; even a little would be a great help to me.

Thanks in advance,


That was really fast, thanks a lot bsmither.

Yes, returntransfer (CURLOPT_RETURNTRANSFER) is set to true.

If you don't mind, could you please explain the reason behind that "be careful"? What can go wrong with an XML object?

Anyway, it should be a page capture, as I need to detect the presence or absence of some word, so it should be an HTML object instead of XML.

What I meant is that I'm using preg_match, and as far as I know the pattern matching is done with regex; please correct me if I'm mistaken in this.

I'll try strpos as per your advice.
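For reference, a minimal sketch of the strpos approach next to the preg_match equivalent. The sample HTML and the phrase are made up for illustration.

```php
<?php
// Made-up page content and search phrase, for illustration only.
$html   = '<html><body><p>Status: In Stock</p></body></html>';
$phrase = 'In Stock';

// strpos() returns the offset of the match, which can be 0, so compare
// against false with !== rather than relying on truthiness.
$foundWithStrpos = strpos($html, $phrase) !== false;

// preg_match() does the same job with a regex; preg_quote() escapes any
// characters in the phrase that would otherwise be treated as regex syntax.
$pattern = '/' . preg_quote($phrase, '/') . '/i';
$foundWithRegex = preg_match($pattern, $html) === 1;

var_dump($foundWithStrpos, $foundWithRegex);
```

For a plain "is this phrase present" check, strpos (or stripos for a case-insensitive search) avoids the regex escaping question entirely.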


An XML object requires(?) object notation and (maybe?) everything about it, all its nodes, etc, is an object.

 

I get caught frequently when I need the value of a node and forget to cast it as a useable variable type:

$wanted = (string)$obj_a->obj_b->obj_c->obj_d;

 

So, I need to keep my wits about me.
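A small sketch of that cast, assuming the XML object comes from PHP's SimpleXML extension (the thread does not say which XML API is in use, and the element names here are made up):

```php
<?php
// Hypothetical XML; <a> is the root, so $xml itself stands for <a>.
$xml = simplexml_load_string('<a><b><c><d>hello</d></c></b></a>');

// Without a cast the value is still a SimpleXMLElement object.
$node = $xml->b->c->d;

// With the cast it becomes an ordinary PHP string.
$wanted = (string) $xml->b->c->d;

var_dump($node instanceof SimpleXMLElement); // object, not string
var_dump($node === 'hello');                 // strict comparison against the object
var_dump($wanted === 'hello');               // strict comparison against the string
```

The uncast node can look like a string when echoed, which is why strict comparisons, array keys, and serialization are the usual places it bites.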


Thanks bsmither and hansford for the replies.

Well, strpos gives me the result, and that HTML parsing is a good addition to my knowledge.

Anyway, I found a problem ahead that needs to be solved first before going on with more "parsing". I did find another problem with using preg_match and regex to parse further, but I want to double-check things first before asking any more questions.

Last for now, I just wonder what the downside of using regex to parse HTML documents is (regardless, I'll try to use a DOM parser more; the question is just curiosity on my part).


 

 


 

Because HTML is not a regular language, so a regular expression cannot describe nested tags in general. On top of that, HTML is currently forgiving, allowing you to omit closing tags, use wrongly nested tags, etc.

To successfully parse something you need something that is strictly defined. I've used regex to extract data from HTML and it can be a real pain, especially if the exact layout of the page is not known: your regex will fail and you'll have to add another expression for each different way the page might be laid out.
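As a rough illustration of the alternative, here is a sketch using PHP's built-in DOMDocument and DOMXPath on deliberately sloppy markup. The HTML, the class name, and the phrase are made up; a real page would need its own XPath query.

```php
<?php
// Made-up, deliberately sloppy HTML: the <li> and <p> tags are never closed.
$html = '<html><body><ul><li class="item">Apple<li class="item">Pear</ul><p>Total: 2</body></html>';

$dom = new DOMDocument();
// Real-world HTML often triggers parser warnings; collect them quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Query the repaired tree instead of pattern-matching the raw markup.
$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//li[@class="item"]') as $li) {
    $items[] = trim($li->textContent);
}

// Presence check on the visible text rather than on the tags.
$hasTotal = strpos($dom->documentElement->textContent, 'Total: 2') !== false;

var_dump($items, $hasTotal);
```

The parser fills in the missing closing tags, so the query keeps working when the layout shifts in ways that would break a regex written against one exact arrangement of tags.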


I see, thanks for the explanation, hansford.

I've stumbled on yet another issue around curl...

Well, the problem is that there is one login id and it is used by many people. I'm using the username and session id to determine the cookie file. When a second browser opens, it should have a different session id, so it may create new cookies instead of using the active, previously made cookie file. Please correct me if I'm wrong.

Any idea on how to detect whether the user is logged in or not?
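The cookie-file setup described above could look roughly like the sketch below, assuming the file name is derived from both the username and the session id. `cookie_file_for()` and `use_cookie_file()` are hypothetical helper names and the ids are made up; only `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` are real curl options.

```php
<?php
// Hypothetical helper: one cookie file per username + session id pair.
function cookie_file_for(string $username, string $sessionId): string
{
    // Hash the pair so the file name contains no unsafe characters.
    return sys_get_temp_dir() . '/cookie_' . sha1($username . '|' . $sessionId) . '.txt';
}

// Hypothetical helper: point a curl handle at that cookie file.
function use_cookie_file($ch, string $file): void
{
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file);  // cookies are written here when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file); // cookies are read from here for requests
}

// Same shared login, two browser sessions (made-up ids).
$first  = cookie_file_for('shared_login', 'session-aaa');
$second = cookie_file_for('shared_login', 'session-bbb');

var_dump($first === $second);
```

Under that naming scheme a second browser session does end up with its own cookie file, so it would not reuse the remote login stored by the first one.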


 


 

Form this into another question and post another topic so you can benefit from other members' input on the forum.

Many are way more knowledgeable than myself.

Edited by hansford

I've just made it a new post.

Anyway, as a conclusion to my question on parsing: regex isn't really a safe way to parse an HTML document, so using preg_match directly to parse HTML is risky on its own, whether or not HTML stays as forgiving as time goes on.

Is there any way to mark this post as "SOLVED"?

