PersianMan
Posted February 28, 2011

Dear friends,

I wrote some code to extract the text from the pages of a site, like this:

************
$handle = @fopen($url, 'r');
$contents = '';
if ($handle) {
    // Read the response in 8 KB chunks until the end of the stream
    while (!feof($handle)) {
        $contents .= fread($handle, 8192);
    }
    fclose($handle);
}
************

This code works properly with many pages, except for pages that begin with the following tags:

************
...
<body>
<form name="aspnetForm" method="post" action="ViewContents.aspx?Contract=cms_Contents_I_News&%3br=721192" id="aspnetForm">
<input type="hidden" name="__VIEWSTATE" id=" __VIEWSTATE" value="" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAwL+raDpAgL3qPzdCwLyp86ZD5mqDm6ZnRL/pRerpqyobvzmy5LB" />
************

The result of fread() in the variable $contents is incomplete. It contains only the page's template, without the main content. I mean that the story related to the id of the page (e.g. 721192) is not in $contents. Why? Does the <form> above affect the result? What can I do? Please help me.
trq
Posted February 28, 2011

Maybe you haven't set the length argument to be long enough.
PersianMan (Author)
Posted February 28, 2011

"Maybe you haven't set the length argument to be long enough."

Dear friend,

No. $contents includes the header and footer of the page, but not the main content related to the id of the page. Also, the code works correctly with pages of other websites. I think that for the pages of this website I need another solution for reading the page content, like what crawlers do.

Thanks
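One thing that could be tried along the lines of "what crawlers do" is to send the request with browser-like headers instead of PHP's defaults. The following is only a minimal sketch under that assumption: the helper names `build_browser_context` and `fetch_page`, the User-Agent string, and the header values are all hypothetical and do not come from the thread, and nothing here confirms that headers are the cause of the missing content.

```php
<?php
// Hypothetical helpers for illustration; not from the original posts.

// Build a stream context that sends browser-like HTTP headers.
function build_browser_context(string $userAgent, float $timeoutSeconds = 15.0)
{
    return stream_context_create([
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'header'          => "Accept: text/html\r\nAccept-Language: en\r\n",
            'follow_location' => 1,               // follow HTTP redirects
            'timeout'         => $timeoutSeconds, // read timeout in seconds
        ],
    ]);
}

// Fetch the whole body in one call; returns null if the read fails.
function fetch_page(string $url, string $userAgent = 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)'): ?string
{
    $context = build_browser_context($userAgent);
    // file_get_contents reads until end of stream, so no manual fread loop is needed.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

It would be called as `$contents = fetch_page($url);`. If the story is still missing from the result, the page may be filling in its content through a form postback or a script, in which case changing the request headers alone would not help.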