
Parsing only the content area of a page


cooldude832


I am trying to return only the content section of a "text" page, such as an about.com or Wikipedia article (so a forum would not apply). On Wikipedia I know I can do this easily by grabbing the div labeled "content", but what about pages whose content isn't labeled? Are there any ideas out there?


Here is my new idea, which I think might work:

Run strip_tags() on the output of file_get_contents(), but somehow preserve all the container elements (div, table, tr, td).

 

Then count the number of words in each container, using its opening and closing tags as the boundaries.
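A rough sketch of the counting step, assuming the markup has already been reduced to container tags only. The helper name countWordsPerContainer() is made up for illustration, and using DOMDocument instead of matching the opener/closer tags by hand is a swapped-in technique, not something from the post. It counts only the text sitting directly inside each div or td, so an outer wrapper does not inherit the words of everything nested in it.

```php
<?php
// Hypothetical helper: word count of the text directly inside each <div>/<td>.
// Assumes inline tags were already stripped from $markup.
function countWordsPerContainer(string $markup): array
{
    if (trim($markup) === '') {
        return [];
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $results = [];
    foreach (['div', 'td'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $text = '';
            foreach ($node->childNodes as $child) {
                // Only direct text nodes count toward this container.
                if ($child->nodeType === XML_TEXT_NODE) {
                    $text .= ' ' . $child->nodeValue;
                }
            }
            $results[] = [
                'words' => str_word_count($text),
                'text'  => trim($text),
            ];
        }
    }
    return $results;
}

// Sample markup, made up for this sketch.
$markup = '<div>Home About</div>'
        . '<div>This is the article text with many more words</div>';
$counts = countWordsPerContainer($markup);
```

Each entry in $counts pairs a word count with the text it came from, so the largest entry can be picked out afterwards. Note that str_word_count() is tuned for English-like text and may miscount other languages.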

 

The container with the greatest number of words is the "content", so I then simply extract that text and I have it. Make sense? My current issue is how in the hell do I strip all tags except <div>, <table>, <tr>, and <td> (and their closing tags)?
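One possible way to do the stripping, as a minimal sketch: strip_tags() takes an optional second argument listing the tags to keep. Listing the opening tag is enough, since the matching closing tag is kept too. The sample markup below is made up so the sketch runs offline. In practice $html would come from file_get_contents().

```php
<?php
// Sample markup, made up for this sketch; normally this would be
// the string returned by file_get_contents($url).
$html = '<div class="post"><p>Hello <b>world</b></p>'
      . '<table><tr><td><a href="#">link</a></td></tr></table></div>';

// Keep only the container tags; every other tag is removed,
// but the text inside the removed tags stays.
$containersOnly = strip_tags($html, '<div><table><tr><td>');

echo $containersOnly;
```

Two caveats: strip_tags() leaves attributes on the kept tags untouched, and it removes only the tags themselves, so the contents of script and style elements remain as plain text unless they are removed beforehand.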

 

