cooldude832 Posted November 13, 2007 I am trying to return only the content section of a page that is a "text" page, so a forum would not apply; an about.com or wikipedia article is what I am trying to get. On wiki I know I can easily do it by getting the div labeled "content", but what about pages that aren't labeled? Any ideas out there?
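For the labeled-div case mentioned above, a minimal sketch using PHP's DOM extension might look like the following; the URL is a placeholder, and the "content" id is taken from the post, so it may not match Wikipedia's actual markup:

<?php
// Fetch a page and pull out just the element with id="content",
// as in the Wikipedia case described above.
$html = file_get_contents('http://en.wikipedia.org/wiki/PHP'); // placeholder URL

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world pages are rarely valid markup
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="content"]');
if ($nodes->length > 0) {
    // saveHTML() with a node argument returns just that subtree (PHP 5.3.6+)
    echo $doc->saveHTML($nodes->item(0));
}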
phpQuestioner Posted November 13, 2007 Contact about.com and ask them if they have any type of API or XML feed for including the contents of their pages in your pages.
cooldude832 (author) Posted November 13, 2007 it won't just be about or wiki; if that were the case I could easily build an inclusive list of cases, but here it could be any article-type page, even a phpfreaks article. I think maybe an RSS feed, but that isn't going to work
cooldude832 (author) Posted November 13, 2007 my new idea is this, and I think this might work: use strip_tags on the file_get_contents result, but somehow preserve all the divs, tables, trs, and tds (the container elements), then count the number of words in each element using its opening/closing tags, so to speak. The container with the greatest number of words is the "content", and then I simply find that text and I've got it. Make sense? My current issue then is how in the hell do I strip all tags but <div>, <table>, <tr>, and <td> (and their closing tags)?
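A rough sketch of that word-count heuristic, assuming PHP's DOM extension; $url is a placeholder, and only the container elements named in the post are scored:

<?php
// Parse the page, score each container element by the words sitting
// directly inside it, and treat the highest-scoring one as the "content".
$html = file_get_contents($url);   // $url is a placeholder

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$best = null;
$bestWords = 0;
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//div | //table | //tr | //td') as $node) {
    $words = 0;
    // Count direct text children only; counting all descendants would
    // always hand the win to the outermost container.
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $words += str_word_count($child->nodeValue);
        }
    }
    if ($words > $bestWords) {
        $bestWords = $words;
        $best = $node;
    }
}

if ($best !== null) {
    echo trim($best->textContent);
}

In practice a text-density measure (text length versus markup length) tends to be more robust than a raw word count, but the idea is the same.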
gin Posted November 13, 2007 look up strip_tags()
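strip_tags() does take a second argument listing allowable tags, which answers the "strip everything but the containers" part directly; a minimal example with made-up input:

<?php
// Tags listed in the second argument survive; everything else is
// stripped while its text content is kept.
$html = '<html><body><p>intro</p><div id="content"><b>Hello</b> world</div></body></html>';
echo strip_tags($html, '<div><table><tr><td>');
// prints: intro<div id="content">Hello world</div>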