Grodo Posted March 11, 2008

I am looking for the easiest way to parse a webpage and extract the text from it. I realize that I could use file_get_contents() and then run preg_replace() on all the HTML tags. But is there an easier function that just returns the plain text, or is preg_replace() the only way to do it?

What I am trying to do is open the FedEx website, search for a tracking number, and then parse the text of the page into a string. After the text is parsed I will search through it and extract the status, tracking number, and date delivered.

The problem with this code is that it returns the HTML markup:

<?php
if ($getcon = file_get_contents("http://www.fedex.com/Tracking?ascend_header=1&clienttype=dotcom&cntry_code=us&language=english&tracknumbers=222222222222222")) {
    echo $getcon;
} else {
    echo "Error: Could not connect to page...";
}
?>
trq Posted March 11, 2008

If you just want to strip the HTML, use strip_tags(). Is that your question?
Grodo Posted March 11, 2008

Thanks for the quick reply, but that is not the solution... You were on the right track for what I wanted to do. strip_tags() had no effect on the variable. I have also tried assigning the result of strip_tags() to a variable via $page = strip_tags($getcon); and echoing $page, but it did the same thing as strip_tags($getcon).

My current code is:

<?php
if ($getcon = file_get_contents("http://www.fedex.com/Tracking?ascend_header=1&clienttype=dotcom&cntry_code=us&language=english&tracknumbers=222222222222222")) {
    strip_tags($getcon);
} else {
    echo "Error: Could not connect to page...";
}
?>
trq Posted March 11, 2008

Quote: "strip_tags() had no effect on the variable. My current code is"

How do you know it had no effect? You never echo it.
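For anyone following along: strip_tags() returns the stripped string and leaves its argument untouched, so the result has to be captured or echoed. A minimal sketch, assuming a made-up $html sample in place of the live FedEx page:

```php
<?php
// Hypothetical sample markup; stands in for the page fetched with file_get_contents().
$html = '<p>Status: <b>Delivered</b></p>';

// Calling strip_tags() on its own discards the result; $html is not modified.
strip_tags($html);
$unchanged = $html;          // still contains the tags

// Capture the return value instead.
$text = strip_tags($html);

echo $text, "\n";
```

Here $unchanged still holds the original markup, while $text holds only the visible text.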
Grodo Posted March 11, 2008

You caught me as I was editing my post... WOW, you guys are extremely fast!
alecks Posted March 11, 2008

If you do

echo '<textarea style="width: 500px; height: 300px;">'.strip_tags($getcon).'</textarea>';

you can better see what effect strip_tags() has had...
alecks Posted March 11, 2008

OK, I just had a quick look at the source of the page you are trying to extract data from. I assume you are trying to get the table that lists the status? In the source it is nicely surrounded by '<!-- BEGIN Scan Activity -->' and '<!-- END Scan Activity -->'. You can use strpos() to find the string index of each marker, and then take just the data in between the two.
alecks Posted March 11, 2008

And here we go...

<?php
// http://www.phpfreaks.com/forums/index.php/topic,186985.0.html
if ($getcon = file_get_contents("http://www.fedex.com/Tracking?ascend_header=1&clienttype=dotcom&cntry_code=us&language=english&tracknumbers=222222222222222")) {
    // Index where the opening marker starts
    $one = strpos($getcon, "<!-- BEGIN Scan Activity -->");
    // Drop everything before the opening marker
    $final = substr($getcon, $one);
    // Index of the closing marker within the remaining text
    $two = strpos($final, "<!-- END Scan Activity -->");
    // Keep only the part before the closing marker
    $final = substr($final, 0, $two);
    echo $final;
} else {
    echo "Error: Could not connect to page...";
}
?>
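One thing to watch with this approach: strpos() returns false when a marker is missing, and a valid match at position 0 is also falsy, so the result is best compared with === false. A hedged sketch of the same idea wrapped in a helper follows. The function name extract_between() and the sample $page string are made up for illustration, and unlike the snippet above it skips the opening marker itself. The type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text between two markers, or null if either is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;                       // opening marker not found
    }
    $from += strlen($start);               // skip past the opening marker
    $to = strpos($haystack, $end, $from);  // look for the closing marker after it
    if ($to === false) {
        return null;                       // closing marker not found
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up sample; the real page layout may differ.
$page = 'header<!-- BEGIN Scan Activity --><tr><td>Delivered</td></tr><!-- END Scan Activity -->footer';
$block = extract_between($page, '<!-- BEGIN Scan Activity -->', '<!-- END Scan Activity -->');

echo $block === null ? "Markers not found\n" : trim(strip_tags($block)) . "\n";
```

Returning null rather than an empty string lets the caller tell "markers missing" apart from "markers present but nothing between them".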
Grodo Posted March 12, 2008

THANKS for the awesome reply, Alecks! I didn't even know that those functions existed. Now correct me if I am wrong... strpos() sets the position in the string to read from, and substr() removes everything up to the BEGIN Scan Activity line. The example you coded for me works perfectly! Now it's time for phase two, which I don't think will be too difficult: searching through the string... I'll attempt this in a few hours and post my results.
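To pin down what the two functions do: strpos() does not set anything, it only returns the index where the search string first occurs (or false if it is absent), and substr() returns a piece of the string without modifying the original. A small sketch on a made-up string with shortened markers, mirroring the variable names used earlier in the thread:

```php
<?php
// Made-up string for illustration; the markers are shortened stand-ins.
$s = 'aaa<!-- BEGIN -->bbb<!-- END -->ccc';

$one   = strpos($s, '<!-- BEGIN -->');    // index where the opening marker starts
$rest  = substr($s, $one);                // copy of $s from that index to the end
$two   = strpos($rest, '<!-- END -->');   // index of the closing marker within $rest
$final = substr($rest, 0, $two);          // the first $two characters of $rest

var_dump($one);    // int(3)
var_dump($final);  // string(17) "<!-- BEGIN -->bbb"
var_dump($s);      // the original string is unchanged
```

Note that $final still begins with the opening marker, because the first substr() call starts at the marker rather than after it.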