Problem with parsing curl result

Levian · December 30, 2014

Hi,

I'm hoping I can get an answer to my problem here.

I used curl to fetch information from another site, and now I need to parse the data so it can be displayed on our own. However, I found something odd: I can't parse the curl result directly to get what I want, yet if I save the result to an HTML file and parse that file, it works fine.

Basically I just need to find out whether a word or phrase is present, and so far I'm using preg_match to do it. Since the problem first came up I've been googling and saw some posts saying that regex does poorly at this kind of thing.

However, it does work when I make a standalone PHP file that runs the regex against the saved curl-result HTML file. It just doesn't work in the actual PHP file that handles the process, even when it reads the same curl-result HTML file. FYI, I'm using CodeIgniter.

I wonder if it has something to do with CodeIgniter, or whether there is a better way to parse an HTML file from a curl result. In my opinion it should be the same whether it's a curl result or a normal HTML file, so I really can't figure out where it went wrong.

I'd appreciate it if anyone could shed some light on this problem; even a little would be a great help to me.

Thanks in advance,

bsmither · December 30, 2014

Are you setting CURLOPT_RETURNTRANSFER? If so, what are you capturing the results into? An XML object? Be careful!

Will this word/phrase sometimes be this and sometimes that, requiring a regex? I would like to see your results using strpos().
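A minimal sketch of what that might look like, assuming the page is captured as a plain string. The helper names `fetch_page` and `contains_phrase` are made up for illustration and are not from the original poster's code:

```php
<?php
// Hypothetical helper: fetch a page and return its body as a string.
function fetch_page($url)
{
    $ch = curl_init($url);
    // Without this option curl_exec() prints the body and returns true.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    return $body; // a plain string, not an object
}

// Hypothetical helper: case-insensitive search for a fixed phrase.
function contains_phrase($html, $phrase)
{
    // stripos() returns an offset or false. Offset 0 is a valid hit,
    // so compare against false with !== instead of testing truthiness.
    return stripos($html, $phrase) !== false;
}

// Usage (the URL is a placeholder):
// $html = fetch_page('http://example.com/');
// var_dump(contains_phrase($html, 'welcome'));
```

For a fixed word or phrase this avoids regex delimiters and escaping entirely, which is one less thing to rule out when a match mysteriously fails.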

Levian · December 30, 2014

That was really fast, thanks a lot bsmither.

Yes, CURLOPT_RETURNTRANSFER is set to true.

If you don't mind, could you please explain the reason behind that "be careful"? What may go wrong with an XML object?

Anyway, it should be a page capture, as I need to detect the presence or absence of some word, so it should be HTML rather than XML.

What I meant is that I'm using preg_match, and as far as I know the matching pattern is a regex. Please correct me if I'm mistaken about this.

I'll try strpos as per your advice.

bsmither · December 30, 2014

An XML object requires(?) object notation and (maybe?) everything about it, all its nodes, etc., is an object.

I frequently get caught when I need the value of a node and forget to cast it to a usable variable type:

$wanted = (string)$obj_a->obj_b->obj_c->obj_d;

So, I need to keep my wits about me.
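To illustrate the pitfall, here is a small sketch assuming the XML is loaded with SimpleXML (the document and element names are invented for the example):

```php
<?php
// Made-up XML document, for illustration only.
$xml = simplexml_load_string('<a><b><c><d>hello</d></c></b></a>');

// Walking the tree yields another SimpleXMLElement, not a string.
$node = $xml->b->c->d;
$is_object = $node instanceof SimpleXMLElement;

// A strict comparison against a string fails until the node is cast.
$uncast_matches = ($node === 'hello');
$wanted = (string) $node;
$cast_matches = ($wanted === 'hello');

var_dump($is_object, $uncast_matches, $cast_matches);
```

The uncast node often appears to work because `echo` and string functions convert it implicitly, which is why strict comparisons and array keys are where the mistake tends to surface.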

hansford · December 30, 2014

You should be looking into HTML parsing. Here's a link to get you started: http://htmlparsing.com/php.html
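As a rough sketch of that approach, assuming PHP's bundled DOM extension is available, the following searches only the text of the page and ignores tag names and attributes. The function name `text_contains` is hypothetical:

```php
<?php
// Hypothetical helper: does the phrase occur in the page's text nodes?
function text_contains($html, $phrase)
{
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $doc->documentElement === null) {
        return false;
    }
    // textContent concatenates all descendant text nodes (this also
    // includes the contents of <script> and <style> elements).
    return stripos($doc->documentElement->textContent, $phrase) !== false;
}

var_dump(text_contains('<p>Wel<b>come</b> back</p>', 'welcome back'));
var_dump(text_contains('<div class="welcome">Hello</div>', 'welcome'));
```

Unlike a search on the raw string, this finds a phrase that is split across inline tags and does not report a hit when the word only appears in an attribute value.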

Levian · December 30, 2014

Thanks bsmither and hansford for the replies.

Well, strpos gives me a result, and that htmlparsing link is a good addition to my knowledge.

Anyway, I found another problem that needs to be solved before I go on with more "parsing". I also ran into a further issue when using preg_match and regex to parse more of the page, but I want to double-check things before asking any more questions.

One last thing for now: I wonder what the downside of using regex to parse HTML documents is. (Regardless, I'll try to use a DOM parser more; the question is just curiosity on my part.)

hansford · December 31, 2014

"Last for now, I just wonder what's the bad side of using regex to parse html documents"

Because HTML is not a regular language: tags can nest to arbitrary depth, which regular expressions cannot track in general. On top of that, HTML as handled by browsers is forgiving, allowing you to omit closing tags, nest tags incorrectly, etc.

For a regex to extract data reliably, the input needs a strictly defined layout. I've used regex to extract data from HTML and it can be a real pain, especially if the exact layout of the page is not known. Your regex will fail, and you'll have to add another expression to cover the case where the page is laid out this way, or another way, or yet another way.
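A small sketch of that fragility, using a made-up snippet in which the closing `</li>` tags are omitted (which HTML permits). The pattern and variable names are illustrative only:

```php
<?php
// Made-up markup: list items without closing tags.
$html = '<ul><li>one<li>two</ul>';

// A pattern written for the "tidy" layout <li>...</li> finds nothing.
$regex_hits = preg_match_all('/<li>(.*?)<\/li>/s', $html, $matches);

// The DOM parser repairs the markup and exposes both items.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$items = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    $items[] = trim($li->textContent);
}

var_dump($regex_hits, $items);
```

The regex could be patched to cope with this one variation, but each new variation in the page needs another patch, whereas the parser applies the same repair rules to every page.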

Levian · December 31, 2014

I see, thanks for the explanation hansford.

I've stumbled on yet another issue around curl.

The problem is that a single login ID is shared by many users. I'm using the username and session ID to name the cookie file, so when a second browser opens it should have a different session ID, and hence it may create new cookies instead of using the previously created, still active cookie file. Please correct me if I'm wrong.

Any idea how to detect whether the user is logged in or not?

hansford · December 31, 2014

Form this into another question and post it as a new topic so you can benefit from other members' input on the forum.

Many are way more knowledgeable than myself.

Levian · December 31, 2014

I've just made it a new post.

Anyway, as a conclusion to my question on parsing: regex isn't really a safe way to parse an HTML document, so using preg_match directly on HTML is risky in itself. That will probably remain true even if HTML becomes less forgiving over time.

Is there any way to mark this post as "SOLVED"?

Levian · December 31, 2014

I forgot to say: thanks a lot to both of you for the help. It really helped a lot, especially time-wise.
