ngng Posted April 17, 2008

I'm having trouble screen scraping with preg_match. For example, I'm trying to pull everything from Wikipedia between these tags: <h5 style="white-space: nowrap;"> and </h5>

Ideally, it should return:

<h5 style="white-space: nowrap;"><label for="searchInput">
<span lang="en" xml:lang="en">Search</span> <b>·</b>
<span lang="de" xml:lang="de">Suche</span> <b>·</b>
<span lang="fr" xml:lang="fr">Rechercher</span> <b>·</b>
<span lang="pl" xml:lang="pl">Szukaj</span> <b>·</b>
<span lang="ja" xml:lang="ja" title="Kensaku">検索</span> <b>·</b>
<span lang="it" xml:lang="it">Ricerca</span> <b>·</b>
<span lang="nl" xml:lang="nl">Zoeken</span> <b>·</b>
<span lang="pt" xml:lang="pt">Busca</span> <b>·</b>
<span lang="es" xml:lang="es">Buscar</span><br />
<span lang="sv" xml:lang="sv">Sök</span> <b>·</b>
<span lang="ru" xml:lang="ru" title="Poisk">Поиск</span> <b>·</b>
<span lang="zh" xml:lang="zh" title="Sōusuǒ">搜索</span> <b>·</b>
<span lang="nb" xml:lang="nb">Søk</span> <b>·</b>
<span lang="fi" xml:lang="fi">Haku</span> <b>·</b>
<span lang="vo" xml:lang="vo">Suk</span> <b>·</b>
<span lang="ca" xml:lang="ca">Cerca</span> <b>·</b>
<span lang="ro" xml:lang="ro">Căutare</span> <b>·</b>
<span lang="tr" xml:lang="tr">Ara</span> <b>·</b>
<span lang="uk" xml:lang="uk" title="Pošuk">Пошук</span>
</label></h5>

I can't get it to work. Yes, I know the content changes and screen scraping is not the best way to do this, but for the sake of learning I want to try it. Here is what I have:

<?php
// Download the page; $html holds the raw HTML source
$html = file_get_contents('http://wikipedia.org/');
// Pattern meant to capture the <h5 ...> ... </h5> element
$regex = '/\<h5(.*)\>\<\/h5\>/m';
// Run the match and dump whatever was captured
preg_match($regex, $html, $output);
var_dump($output);
?>
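One likely culprit: in PCRE the `m` modifier only changes how `^` and `$` behave; it does not let `.` match newlines. If the `<h5>` element spans several lines in the page source, as the sample above suggests, `(.*)` stops at the first line break and never reaches `</h5>`. Adding only the `s` modifier is not enough either, because a greedy `(.*)` would then run on to the last `</h5>` in the page. Below is a minimal sketch with a lazy group and the `s` modifier. It assumes the HTML is already in a string (the download is left out so it runs offline), and `first_h5()` is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: return the first <h5 ...>...</h5> element in $html,
// or null if there is none.
function first_h5(string $html): ?string
{
    // <h5[^>]*>   the opening tag plus any attributes, up to its first '>'
    // (.*)?       not used: a greedy group would run to the LAST </h5>
    // (.*?)       the inner HTML; lazy, so it stops at the FIRST </h5>
    // <\/h5>      the closing tag ('/' escaped because it is the delimiter)
    // s modifier  '.' also matches newlines (m only affects ^ and $)
    // i modifier  also accepts <H5> written in upper case
    $pattern = '/<h5[^>]*>(.*?)<\/h5>/si';

    // preg_match() returns 1 on a match, 0 on no match, false on error
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // $m[0] is the whole element (tags included), $m[1] only the inner HTML
    return $m[0];
}

// Shortened stand-in for the page markup quoted above (sample data, assumed)
$html = <<<HTML
<h5 style="white-space: nowrap;"><label for="searchInput">
<span lang="en" xml:lang="en">Search</span> <b>·</b>
<span lang="de" xml:lang="de">Suche</span>
</label></h5>
<p>unrelated</p>
<h5>Second heading</h5>
HTML;

$element = first_h5($html);
var_dump($element);
```

Returning `$m[1]` instead of `$m[0]` gives only what sits between the tags. Like any regex approach to HTML, this sketch assumes the `<h5>` contains no nested `<h5>` and that no attribute value contains a `>` character.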