Levian Posted December 30, 2014

Hi,

I've been wondering if I can get an answer to my problem here. I used curl to fetch information from another site, and I need to parse that data so it can be displayed on our own site. However, I found something odd: I can't parse the result directly to get what I want, but if I save the result to an HTML file and parse that file, it works fine.

Basically I just need to find whether a word or phrase is present, and I'm using preg_match to do it so far. Since the problem first came up I've been googling and saw some posts about regex doing poorly at this kind of thing. However, it does work when I write a standalone PHP file that runs the regex against the saved curl-result HTML file. It just doesn't work in the actual PHP file that handles the process, even when reading from the same curl-result HTML file.

FYI, I'm using CodeIgniter. I wonder if it has something to do with CodeIgniter, or whether there is a better way to parse an HTML file from a curl result. In my opinion it should be the same whether it's a curl result or a normal HTML file, so I really can't figure out where it went wrong.

I'd appreciate it if anyone could shed some light on this problem; even a little would be a great help to me.

Thanks in advance,

bsmither Posted December 30, 2014

Are you setting CURLOPT_RETURNTRANSFER? If so, what are you capturing the results into? An XML object? Be careful!

Will this word or phrase sometimes be this and sometimes that, requiring a regex? I would like to see your results using strpos().

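As a minimal sketch of the check bsmither is asking about, the following assumes the page is fetched into a plain string with CURLOPT_RETURNTRANSFER and then searched with stripos() (the case-insensitive sibling of strpos()). The function names fetch_html and contains_phrase and the example URL are hypothetical and not taken from the original posts.

```php
<?php
// Hypothetical helper: fetch a page body as a plain string.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    // Without this option curl_exec() prints the body and returns true
    // instead of returning the body as a string.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: is the phrase present anywhere in the string?
function contains_phrase(string $html, string $phrase): bool
{
    // stripos() returns 0 for a match at the very start of the string,
    // so the result must be compared with !== false, not used as a boolean.
    return stripos($html, $phrase) !== false;
}

// Usage (the URL is a placeholder):
// $html = fetch_html('http://example.com/');
// var_dump(contains_phrase($html, 'Example Domain'));
```

Dumping the fetched string with var_dump() before searching it is a cheap way to confirm that it really is a string holding the expected markup and not a boolean.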
Levian (Author) Posted December 30, 2014

That was really fast, thanks a lot bsmither.

Yes, CURLOPT_RETURNTRANSFER is set to true. If you don't mind, could you explain the reason behind that "be careful"? What can go wrong with an XML object? Anyway, it should be a page capture, since I need to detect the presence or absence of some word, so it should be an HTML object rather than XML.

What I meant is that I'm using preg_match, and as far as I know the pattern matching is done with regex. Please correct me if I'm mistaken.

I'll try strpos as you advised.

bsmither Posted December 30, 2014

An XML object requires(?) object notation, and (maybe?) everything about it, all its nodes, etc., is an object. I frequently get caught when I need the value of a node and forget to cast it to a usable variable type:

$wanted = (string)$obj_a->obj_b->obj_c->obj_d;

So, I need to keep my wits about me.

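The cast can be illustrated with a small sketch, assuming the XML object is a SimpleXML one (the post does not say which XML API is meant). The XML string and its element names are made up for illustration.

```php
<?php
// Made-up XML; the element names are placeholders for illustration.
$xml = '<a><b><c><d>hello</d></c></b></a>';
$obj = simplexml_load_string($xml);

$node   = $obj->b->c->d;          // still a SimpleXMLElement object
$wanted = (string) $obj->b->c->d; // a plain PHP string

var_dump($node instanceof SimpleXMLElement); // bool(true)
var_dump($wanted === 'hello');               // bool(true)
var_dump($node === 'hello');                 // bool(false): an object is never identical to a string
```

Without the cast, strict comparisons and array keys built from the node do not behave like they would with a string, which is the trap described above.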
hansford Posted December 30, 2014

You should be looking into HTML parsing. Here's a link to get you started: http://htmlparsing.com/php.html

Levian (Author) Posted December 30, 2014

Thanks bsmither and hansford for the replies.

Well, strpos gives me a result, and that htmlparsing link is a good addition to my knowledge. Anyway, I found a problem ahead that needs to be solved before I go on with more "parsing". I did find another problem with using preg_match and regex to parse further, but I want to double-check things first before asking any more questions.

One last thing for now: I just wonder what the downside is of using regex to parse HTML documents. (Regardless, I'll try to use a DOM parser more. The question is just curiosity on my part.)

hansford Posted December 31, 2014

Levian wrote: "I just wonder what the downside is of using regex to parse HTML documents."

Because HTML is not a regular language, which is the kind of input regular expressions are designed to match, and in practice it is forgiving, allowing you to omit closing tags, nest tags incorrectly, etc. To parse something reliably with a regex you need input that is strictly defined. I've used regex to extract data from HTML and it can be a real pain, especially if the exact layout of the page is not known. Your regex will fail, and you'll have to add another expression to cover the case where the page is laid out this way, or another way, or yet another way.

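A small sketch of the difference, assuming PHP's bundled DOM extension (DOMDocument) is used as the parser; the helper name html_text_contains and the sample markup are made up for illustration. The markup omits the closing li tags and splits the phrase with a b tag, which is enough to defeat a plain pattern match on the raw source.

```php
<?php
// Hypothetical helper: does the visible text of the page contain the phrase?
function html_text_contains(string $html, string $phrase): bool
{
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $doc->documentElement === null) {
        return false;
    }
    // textContent is the concatenated text of all nodes, with tags removed.
    return stripos($doc->documentElement->textContent, $phrase) !== false;
}

// Made-up markup: unclosed <li> tags, and a <b> tag in the middle of the phrase.
$html = '<html><body><ul><li>First item<li>Second <b>it</b>em</ul></body></html>';

var_dump(html_text_contains($html, 'second item')); // bool(true)
var_dump(preg_match('/second item/i', $html));      // int(0)
```

The regex could be extended to tolerate the b tag, but each new layout variation needs another pattern, which is the maintenance problem described above.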
Levian (Author) Posted December 31, 2014

I see, thanks for the explanation hansford.

I've stumbled on yet another issue around curl. The problem is that there is a single login ID used by many users. I'm using the username and session ID to determine the cookie file. When a second browser opens, it should have a different session ID, so it may create a new cookie file instead of using the active, previously created one. Please correct me if I'm wrong. Any idea how to detect whether the user is logged in or not?

hansford Posted December 31, 2014 (edited)

Levian wrote: "I've stumbled on yet another issue around curl. The problem is that there is a single login ID used by many users. I'm using the username and session ID to determine the cookie file. When a second browser opens, it should have a different session ID, so it may create a new cookie file instead of using the active, previously created one. Please correct me if I'm wrong. Any idea how to detect whether the user is logged in or not?"

Form this into a separate question and post it as a new topic so you can benefit from other members' input on the forum. Many are far more knowledgeable than I am.

Edited December 31, 2014 by hansford

Levian (Author) Posted December 31, 2014

I've just made it a new post.

Anyway, as a conclusion to my question on parsing: regex isn't really a safe way to parse an HTML document, so using preg_match directly to parse HTML is risky on its own, perhaps less so as time goes on and HTML becomes less forgiving.

Is there any way to mark this post as "SOLVED"?

Levian (Author) Posted December 31, 2014

I forgot to say: thanks a lot to both of you for the help. It really helped a lot, especially time-wise.