Parsing only the content area of a page

cooldude832 · November 13, 2007

I am trying to return only the content section of a page that is a "text" page, so a forum would not apply; something like an about.com or Wikipedia article is what I am trying to get. On Wikipedia I know I can easily do it by grabbing the div labeled "content", but what about pages that aren't labeled? Are there any ideas out there?

phpQuestioner · November 13, 2007

Contact about.com and ask whether they have any kind of API or XML feed that lets others include the contents of their pages in their own pages.

cooldude832 · November 13, 2007

It won't just be about.com or Wikipedia. If that were the case I could easily build a case for each, but here it is any article-type page; it could even be a phpfreaks article. I thought maybe an RSS feed, but that isn't going to work.

cooldude832 · November 13, 2007

My new idea is this, and I think it might work:

Use strip_tags on the result of file_get_contents, but somehow preserve all the divs, tables, trs, and tds (the container elements).

Then count the number of words in each element, using the opening and closing tags as boundaries.

The container with the greatest number of words is the "content", and then I simply grab that text and I've got it. Make sense? My current issue is how the hell I strip all tags except <div>, <table>, <tr>, <td> and their closing tags </div>, </table>, </tr>, </td>.
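
For anyone who wants to try the word-count idea, here is a rough sketch of one way to read it. It swaps in PHP's DOM extension (DOMDocument and DOMXPath) instead of strip_tags plus manual tag matching, and it assumes each word should be credited to its nearest enclosing div, table, tr, or td so that an outer wrapper does not win automatically. The function name guessContentText and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): guess the main content of a page
// by finding the container element that directly holds the most words.
function guessContentText(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $containers = ['div', 'table', 'tr', 'td'];

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $counts = []; // node path => number of words credited to that container
    $nodes  = []; // node path => the container element itself

    // Visit every text node that is not inside a script or style element.
    $textNodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
    foreach ($textNodes as $textNode) {
        $words = count(preg_split('/\s+/', $textNode->nodeValue, -1, PREG_SPLIT_NO_EMPTY));
        if ($words === 0) {
            continue;
        }

        // Climb to the nearest ancestor that is one of the container elements.
        $node = $textNode->parentNode;
        while ($node !== null && !in_array(strtolower($node->nodeName), $containers, true)) {
            $node = $node->parentNode;
        }
        if ($node === null) {
            continue; // text that sits outside any container
        }

        $path = $node->getNodePath();
        $counts[$path] = ($counts[$path] ?? 0) + $words;
        $nodes[$path]  = $node;
    }

    if ($counts === []) {
        return '';
    }

    // Pick the container with the highest word count and return its text.
    arsort($counts);
    $bestPath = array_key_first($counts);
    return trim(preg_replace('/\s+/', ' ', $nodes[$bestPath]->textContent));
}

// Made-up sample page: a short menu, a longer article body, and a footer.
$sample = '<html><body>'
    . '<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>'
    . '<div id="main"><p>This is the long article body with many more words than the menu.</p></div>'
    . '<div id="footer">Copyright notice</div>'
    . '</body></html>';

echo guessContentText($sample), "\n";
```

This is only a heuristic: pages with long comment threads or sidebars can still fool it, and the returned text includes anything nested inside the winning container.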

gin · November 13, 2007

Look up strip_tags(). Its optional second parameter takes the tags you want to keep.
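
A minimal sketch of that suggestion, assuming the goal is to keep only the container elements: strip_tags accepts an optional second argument listing the tags to allow. Only the opening tags need to be listed; the matching closing tags are kept as well. The sample markup and variable names are made up for illustration.

```php
<?php
// Made-up sample markup with a mix of container and inline tags.
$html = '<div id="main"><p>Some <b>bold</b> text</p>'
      . '<table><tr><td>cell</td></tr></table></div>';

// Keep div, table, tr, and td; every other tag is removed, its text is kept.
$containersOnly = strip_tags($html, '<div><table><tr><td>');

echo $containersOnly, "\n";
```

Attributes on the allowed tags are left in place, so the id on the div survives in this example.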
