DIM3NSION
Posted May 3, 2011

Hi guys. I have been using the Wikipedia API to retrieve information about a topic. I've managed to get a response and retrieve the first section of the topic (in this case, football) using this request:

http://en.wikipedia.org/w/api.php?action=parse&page='.$search.'&redirects=1&format=json&prop=text&section=0');

However, the first section that is retrieved includes the pictures, and I just want the main text from the introduction. The data sent back from Wikipedia is this:

Array ( [parse] => Array ( [text] => Array ( [*] =>
<div class="dablink">This article is about sports known as football. For the ball used in these sports, see <a href="/wiki/Football_(ball)">Football (ball)</a>.</div>
<div class="thumb tright">
<div class="thumbinner" style="width:227px;"><a href="/wiki/File:Football4.png" class="image"><img alt="" src="http://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Football4.png/225px-Football4.png" width="225" height="274" class="thumbimage" /></a>
<div class="thumbcaption">
<div class="magnify"><a href="/wiki/File:Football4.png" class="internal" title="Enlarge"><img src="http://bits.wikimedia.org/skins-1.17/common/images/magnify-clip.png" width="15" height="11" alt="" /></a></div>
Some of the many different games known as football. From top left to bottom right: <a href="/wiki/Association_football">Association football</a> or soccer, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/International_rules_football">International rules football</a>, <a href="/wiki/Rugby_Union" class="mw-redirect" title="Rugby Union">Rugby Union</a>, <a href="/wiki/Rugby_League" class="mw-redirect" title="Rugby League">Rugby League</a>, and <a href="/wiki/American_Football" class="mw-redirect" title="American Football">American Football</a>.</div>
</div>
</div>
<p>The game of <b>football</b> is any of several similar <a href="/wiki/Team_sport" title="Team sport">team sports</a>, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve <a href="/wiki/Kick_(football)" title="Kick (football)">kicking</a> a ball with the foot to score a <a href="/wiki/Goal_(sport)" title="Goal (sport)">goal</a>, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is <a href="/wiki/Association_football">association football</a>, more commonly known as just "football" or "soccer". Unqualified, the word <i><a href="/wiki/Football_(word)" title="Football (word)">football</a></i> applies to whichever form of football is the most popular in the regional context in which the word appears, including <a href="/wiki/American_football">American football</a>, <a href="/wiki/Australian_rules_football">Australian rules football</a>, <a href="/wiki/Canadian_football">Canadian football</a>, <a href="/wiki/Gaelic_football">Gaelic football</a>, <a href="/wiki/Rugby_league">rugby league</a>, <a href="/wiki/Rugby_union">rugby union</a> and other related games. These variations are known as "codes".</p>

I want the code that resides in the <p> tags. How would I go about parsing this and removing the rest? I've tried to get Simple HTML DOM Parser to work, but with no luck.
Any help would be greatly appreciated.

Thanks,
DIM3NSION
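
One possible way to do this, sketched with PHP's built-in DOM extension instead of Simple HTML DOM Parser. The function name `extract_paragraphs` and the wrapper id `wiki-root` are made up for illustration. The sketch also assumes that the introduction paragraphs are direct children of the returned fragment, as they are in the dump above, so paragraphs nested inside thumbnails or tables are skipped.

```php
<?php
// Hypothetical helper (not part of the original post or of any library):
// returns the content of each top-level <p> element in an HTML fragment.
function extract_paragraphs(string $html, bool $keepMarkup = false): array
{
    $doc = new DOMDocument();

    // The API returns a fragment, not a full page. Wrap it in a known <div>
    // and prepend an XML encoding hint so the parser treats it as UTF-8.
    // Parser warnings about loose markup are collected and then discarded.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wiki-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only <p> elements that sit directly under the wrapper.
    $xpath = new DOMXPath($doc);
    $paragraphs = [];
    foreach ($xpath->query('//div[@id="wiki-root"]/p') as $p) {
        $paragraphs[] = $keepMarkup
            ? $doc->saveHTML($p)        // keep links, <b>, <i>, ...
            : trim($p->textContent);    // plain text only
    }
    return $paragraphs;
}

// Small stand-in for the HTML found under [parse][text][*] in the response.
$sample = '<div class="dablink">This article is about sports known as football.</div>'
        . '<div class="thumb tright"><img src="Football4.png" alt="" /> Caption text</div>'
        . '<p>The game of <b>football</b> is any of several similar team sports.</p>';

print_r(extract_paragraphs($sample));
```

With the JSON response decoded into an array (for example via `json_decode($json, true)`), the HTML to pass in would be `$data['parse']['text']['*']`, matching the array structure shown in the dump. Passing `true` as the second argument keeps the inner markup of each paragraph instead of flattening it to text.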