cordoprod

Parsing HTML, starts at right place, but stops at wrong place

cordoprod posted a topic in PHP Coding Help

I need to parse this HTML out of a webpage: <div id="maincontent" class="left"> <h1 class="node">Akademiet VGS gir Mac til alle elever</h1> <div class="node"> <div class="submitted">Skrevet av <a href="/user/24" title="Vis brukerprofil.">kolby</a>, fre, 29/01/2010 - 09:26</div> <div class="content"><p><img src="http://mac1.no/files/mac1/akademiet-logo.jpg" align="right" />De seks videregående skolene under Akademiet-kjeden skal nå bytte ut alle PCene med Macintosh. Det vil si at rundt 2 000 elever vil få hver sin Mac, og dette blir den største enkeltordren Apple noen gang har fått i Norge.</p> <p><a href="http://www.akademiet.no/vgs" rel="nofollow">Akademiet</a> var i 2004 noen av skolene i landet som var først ute med å tilby alle elevene egne PCer. Nå som denne praksisen har spredt seg til flere videregående- og ungdomsskoler, velger Akademiet å fornye seg ved å tilby egne Macer istedenfor.</p> <blockquote><p>Da vi tilbød pc-er til elevene var vi som privatskole med på å heve standarden for den offentlige skolen også. Det har vært en stor suksess, men vi ønsker å ta skrittet videre, spesielt fordi en så stor andel av elevmassen arbeider med medie- og kommunikasjonsfag. Mange av dem jobber mye med bilder, video og lyd og da er Mac-en en meget god plattform. Dessuten har Mac-er lengre levetid enn pc-er og det påvirker økonomien,</p></blockquote> <p> sier daglig leder og markedsansvarlig ved Akademiet Privatist Drammen, Hilde Dramdal til <a href="http://www.tu.no/it/article233273.ece" rel="nofollow">Teknisk Ukeblad</a>.</p> <p>Akademiet nøyer seg ikke bare med dette. To av skolene i kjeden har også begynt å teste ut bruk av Mac OS X Server for å forenkle kommunikasjonen med alle de hundre Macene som vil dukke opp rundt på skolene.</p> <p>Er dette begynnelsen på en endring i norske skoler? De siste årene har vi Macere sett en utrolig utvikling der flere og flere skaffer seg en Macintosh. 
Suksessen til Apple har gitt selskapet større muligheter til å lage fantastiske produkter til oss, og store endringer som dette i norske skoler kan bety bedre muligheter for oss Macere på Universiteter og Høgskoler.</p> <h2>Eplehuset leverer ordren</h2> <p>Det er Eplehuset som leverer de 2500 Macene til Akademiet i Nordens største Apple-investering noen gang.</p> <blockquote><p>Eplehuset er valgt som leverandør på grunn av sin produktkompense, geografiske nærhet, godt utbygde serviceapparat og, ikke minst, sitt klare fokus på utdanningssektoren. I leveransen ligger både leveranse av selve maskinene, men også opplæring av alle brukere, utvikling av driftsopplegg og integrasjon mot servere, kursing av IT-personell m.m. </p></blockquote> <p>sier Erlend Larsen, salgssjef i Eplehuset til Mac1.no</p> <p>Eplehuset er også valgt som leverandør av elev-Macer i Rogaland, hvor Eplehuset leverte over 800 enheter i fjor høst. Apple opplever tydeligvis en kraftig vekst i interessen for Mac i skolesektoren.</p> <p>Tilbyr skolen du går på PCer til alle elever? Da foreslår vi at dere starter en Mac User Group på skolen som kan bidra til å opplyse både elever, lærere og administrasjonen om hvor effektivt det kan være å benytte seg av Mac OS X til skolen.</p> </div>

I use this code:

<?php
$url = $_GET['url'];
$htmlCode = file_get_contents(trim($url));
$div_pattern = '#(<div id="maincontent" class="left">)(.*)(<\/p>\n<\/div>)#s';
preg_match($div_pattern, $htmlCode, $matches);
var_dump($matches);
foreach ($matches as $key => $value) { }
?>

The match should stop at the closing </div> at the end of the HTML above. Instead, it continues to the last </div> on the page. How can I fix that?
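A likely cause of the behaviour described above is that (.*) is greedy: with the s modifier it consumes as much as it can and only gives characters back until the last </p>\n</div> on the page matches. Below is a minimal sketch of the difference when a lazy (.*?) is used instead. The $htmlCode string is a made-up stand-in for the downloaded page, not the real markup, and the variable names $greedy and $lazy are only for illustration.

```php
<?php
// Hypothetical stand-in for the downloaded page: the wanted block,
// followed by another block that also ends in "</p>\n</div>".
$htmlCode = "<div id=\"maincontent\" class=\"left\">\n<p>Article text</p>\n</div>\n"
          . "<div id=\"sidebar\">\n<p>Other text</p>\n</div>\n";

// Greedy (.*) runs to the LAST "</p>\n</div>" in the input.
$greedy = '#(<div id="maincontent" class="left">)(.*)(</p>\n</div>)#s';

// Lazy (.*?) stops at the FIRST "</p>\n</div>" after the opening tag.
$lazy = '#(<div id="maincontent" class="left">)(.*?)(</p>\n</div>)#s';

preg_match($greedy, $htmlCode, $greedyMatches);
preg_match($lazy, $htmlCode, $lazyMatches);
```

Here $greedyMatches[2] still contains the sidebar markup, while $lazyMatches[2] ends inside the first block. If the wanted block itself contains an earlier </p> followed by </div>, the lazy version would stop too early, so a DOM parser may be the more robust choice for real pages.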

January 30, 2010

Insert to MySQL inside foreach

cordoprod posted a topic in PHP Coding Help

Hey. I am parsing some HTML and putting the results into a MySQL database. I need an integer to use as a unique identifier, and I chose to just use time(). I loop over the results with foreach and do the insert inside the loop. But when I call time() inside the foreach, it returns the same value for every entry parsed.

foreach code:

foreach($linjerMatches[1] as $k=>$v) {
    echo $fylke." <b>".$linjerMatches[1][$k]."</b> ".time()."<br />";
    //$sql = "INSERT INTO ruteinfo_linjer(fylke,identifier,linje) VALUES('".$fylke."', '".time()."', '".$linjerMatches[1][$k]."')";
    //$result = mysql_query($sql, $linkID) or die("Error");
}

time() outputs the same value for every row. How can I fix that?
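time() has a resolution of one second, and a loop like the one above normally finishes well within a second, so every iteration sees the same value. The usual fix is to let MySQL assign the key with an AUTO_INCREMENT column. If the identifier has to be built in PHP, one option is to combine the timestamp with the loop position. The sketch below assumes fewer than 1000 rows per run and at most one run per second; makeIdentifiers() and the sample $linjer array are hypothetical and not part of the original script.

```php
<?php
// Hypothetical sample of parsed line names, standing in for $linjerMatches[1].
$linjer = ['Halden', 'Sarpsborg', 'Fredrikstad'];

/**
 * Build one integer identifier per item from a base timestamp and the
 * item's position. Assumes fewer than 1000 items per call.
 */
function makeIdentifiers(array $items, int $base): array
{
    $ids = [];
    for ($index = 0; $index < count($items); $index++) {
        // Same base for the whole run, different last three digits per row.
        $ids[] = $base * 1000 + $index;
    }
    return $ids;
}

// Call time() once, outside the loop, and derive every identifier from it.
$identifiers = makeIdentifiers($linjer, time());
```

Each entry of $identifiers then lines up with the entry of $linjer at the same index and can be used in the INSERT for that row.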

preg_match_all help

cordoprod replied to cordoprod's topic in Regex Help

Excellent! Finally got it working. Thanks so much.

preg_match_all help

cordoprod replied to cordoprod's topic in Regex Help

I want output like this: http://www.cordoproduction.com/x.png As you can see, Halden is one of the tabs on that page. The tabs are JavaScript-driven, so the content of every tab is in one HTML source. I want to separate the content per tab, because when I parse from the beginning of the page to the end, I get the content from all tabs at once. That's why I need to start at <div id="Tab0x"> and end at its </div>.

preg_match_all help

cordoprod replied to cordoprod's topic in Regex Help

I tried it, but unfortunately I still get empty arrays. I tried setting the first part to <div.*> and also tried <div\sid="Tab01">, but still no luck.

preg_match_all help

cordoprod replied to cordoprod's topic in Regex Help

Can you please show me how to do this in my code so I can understand it correctly?

preg_match_all help

cordoprod posted a topic in Regex Help

Hi, I'm trying to parse some HTML code. It's a whole webpage, and I need to start parsing at an opening tag and stop at the matching closing tag. This is an example:

<div id="1"> // start parsing here
blah blah blah blah
</div> // end parsing here

Here is my regex:

$tabeller = preg_match_all('/^<div id="Tab01">(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)<\/div>$/mu', $htmlCode, $matches);
die(var_dump($matches));

The output is just empty arrays when I try that code. Here is the site I'm trying to do it with: http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_01.htm
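A probable reason for the empty arrays: with the m modifier, ^ and $ anchor to single lines, and without the s modifier the dot does not match newlines, so the whole <div> would have to sit on one line for the pattern to match. An alternative that avoids these pitfalls is to let PHP's DOM extension find the element by its id. This is a minimal sketch; innerHtmlById() is a hypothetical helper, the sample markup and route names are invented, and the id is assumed not to contain quote characters.

```php
<?php
// Hypothetical miniature of the page: two tab containers in one HTML source.
$htmlCode = <<<'HTML'
<html><body>
<div id="Tab01"><table><tr><td><a href="01-101.htm">01-101</a></td><td>Route A</td></tr></table></div>
<div id="Tab02"><table><tr><td><a href="01-201.htm">01-201</a></td><td>Route B</td></tr></table></div>
</body></html>
HTML;

/**
 * Return the inner HTML of the first <div> with the given id,
 * or null when no such element exists.
 */
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//div[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return null;
    }

    // Serialise each child node to get the content between the tags.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$tab01 = innerHtmlById($htmlCode, 'Tab01');
```

The returned string holds only the markup inside the requested tab, so any further pattern can be run against $tab01 instead of the whole page. For pages with non-ASCII text, the character encoding passed to loadHTML() may need extra care.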

Parse bus plan, please help.

cordoprod replied to cordoprod's topic in PHP Coding Help

bump

Parse bus plan, please help.

cordoprod replied to cordoprod's topic in PHP Coding Help

bump

Parse bus plan, please help.

cordoprod posted a topic in PHP Coding Help

Hey, I'm trying to parse a web page containing bus timetables. It is this site: http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_01.htm

What I want to extract is every route number (rutenr) and route name (rutenavn). The timetables are divided into sections; as you can see, there are tabs at the top of the page. This does not make it easy for me, as I have to separate them. Here is what I managed to pull out: http://ruteinfo.cordoproduction.com/hent_ruter.php?avdeling=01&fylke=Ostfold

Here is my code:

$htmlCode = file_get_contents("http://www.rutebok.no/NRIIISStaticTables/Tables/ruter/index/Avd_".$avdeling.".htm");
$linjer = preg_match_all('/\["([\pL ]+)",/', $htmlCode, $linjerMatches);
//$type = preg_match_all('/(.*\/images\/)(.*)(-s.gif)/', $htmlCode, $typeMatches);
$tabeller = preg_match_all('/(.*\d\d\-\d\d\d\.htm">)(\d\d\-\d\d\d)(.*px">)(.*)(<\/td>.*)/', $htmlCode, $matches);
foreach($linjerMatches[1] as $k=>$v) {
    echo '<h2>' . $linjerMatches[1][$k] . '</h2>';
    foreach($matches[2] as $key=>$value) {
        if(mysql_num_rows($checkResult) == 0) {
            echo '<b>' . $value . '</b> ' . $matches[4][$key] . ' <b>' . $linjerMatches[1][$k] . '</b><br/>';
            //$sql = "INSERT INTO ruteinfo_ruter(fylke,linje,bussnummer,bussnavn) VALUES('".$fylke."', '".$linjerMatches[1][$k]."','".$value."', '".$matches[4][$key]."')";
            //$result = mysql_query($sql, $linkID) or die("Error");
        }
    }
}

My problem is that every section (tab) shows all the results, from every tab. For example, the tab called Halden also shows the results from Sarpsborg.
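In the code above, $matches is filled from the whole page, and the inner foreach then repeats that full list under every tab name, which would explain why Halden also shows the Sarpsborg routes. One way around it is to cut the page into one chunk per tab and search each chunk on its own. The sketch below assumes that each tab's content starts at a <div id="TabNN"> tag, that the tabs follow each other in page order, and that a route number follows .htm"> while its name follows px">, as the original pattern suggests. routesPerTab() and the sample markup are hypothetical.

```php
<?php
// Hypothetical, heavily shortened stand-in for the real page.
$htmlCode = <<<'HTML'
var bmenuItems = [ ["Halden", "Tab01", "x"], ["Sarpsborg", "Tab02", "x"], ];
<div id="Tab01"><td><a href="01-101.htm">01-101</a></td><td style="width:300px">Route A</td></div>
<div id="Tab02"><td><a href="01-201.htm">01-201</a></td><td style="width:300px">Route B</td></div>
HTML;

/**
 * Return [tab name => [route number => route name]], searching each
 * tab's chunk of the page separately.
 */
function routesPerTab(string $html): array
{
    // Read "name" and "TabNN" pairs from the JavaScript menu array.
    preg_match_all('/\["([^"]+)",\s*"(Tab\d+)"/', $html, $menu, PREG_SET_ORDER);

    // Split at every <div id="TabNN">; the captured id is kept in the result,
    // so the array alternates: text, id, chunk, id, chunk, ...
    $chunks = preg_split('/<div id="(Tab\d+)">/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $sections = [];
    for ($i = 1; $i + 1 < count($chunks); $i += 2) {
        $sections[$chunks[$i]] = $chunks[$i + 1];
    }

    $result = [];
    foreach ($menu as [, $name, $tabId]) {
        $result[$name] = [];
        // Only this tab's chunk is searched for number/name pairs.
        preg_match_all(
            '/\.htm">(\d\d-\d\d\d)<.*?px">(.*?)<\/td>/s',
            $sections[$tabId] ?? '',
            $routes,
            PREG_SET_ORDER
        );
        foreach ($routes as $route) {
            $result[$name][$route[1]] = $route[2];
        }
    }
    return $result;
}

$routesByTab = routesPerTab($htmlCode);
```

With the sample above, $routesByTab['Halden'] holds only route 01-101 and $routesByTab['Sarpsborg'] only route 01-201, so the insert loop can run once per tab without mixing sections.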

Very simple Regex help

cordoprod replied to cordoprod's topic in Regex Help

OK, I figured it out by changing a few small things:

$linjer = preg_match_all('/\["([\pL ]+)",/', $htmlCode, $linjerMatches);

Thanks a lot.

Very simple Regex help

cordoprod replied to cordoprod's topic in Regex Help

Thanks for the reply, but unfortunately it didn't work out. It outputs an array like this: array(2) { [0]=> array(0) { } [1]=> array(0) { } } Be aware that the HTML I posted was not the whole page being parsed. It is a big website, so the first line of the HTML I posted is not the first line of the page source.

Very simple Regex help

cordoprod posted a topic in Regex Help

Hey, I'm actually starting to understand some simple regex, but there is one thing I'm struggling with. I know how to parse simple tags and so on, but this is the code I need to parse:

var bmenuItems = [
["Halden", "Tab01", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_n_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "Oversikt for region 01 Halden", "0"],
["Sarpsborg", "Tab02", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_n_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "Oversikt for region 02 Sarpsborg", "0"],
["Fredrikstad", "Tab03", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_n_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "Oversikt for region 03 Fredrikstad", "0"],
["Moss", "Tab04", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_n_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "Oversikt for region 04 Moss", "0"],
["Indre Østfold", "Tab05", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_n_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "http://hafas.websrv05.reiseinfo.no/hafas-res/dev/nri/img/vs_rutebok/style02_2_s_icon.gif", "Oversikt for region 05 Indre Østfold", "0"],
];

I need to extract the first string of each entry, the one in the ["xxxxx", position. For example: Halden, Sarpsborg.

I've tried this:

$linjer = preg_match_all('/([")(\w.*)(",.*)/', $htmlCode, $linjerMatches);

But when I do var_dump, the output is NULL.
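The NULL most likely comes from the unescaped [ in the pattern: it opens a character class that is never closed, so the pattern fails to compile and nothing is written to $linjerMatches. Escaping the bracket as \[ and capturing everything up to the next double quote gives the names. A minimal sketch, with a shortened made-up $htmlCode and a hypothetical tabNames() helper:

```php
<?php
// Shortened, hypothetical version of the bmenuItems source shown above.
$htmlCode = 'var bmenuItems = [ ["Halden", "Tab01", "a.gif", "0"], '
          . '["Indre Østfold", "Tab05", "a.gif", "0"], ];';

/**
 * Return the first string of every ["name", ...] entry.
 */
function tabNames(string $source): array
{
    // \[ is a literal "[", and [^"]+ captures up to the closing quote.
    preg_match_all('/\["([^"]+)",/', $source, $m);
    return $m[1];
}

$names = tabNames($htmlCode);
```

Using [^"]+ rather than a letter class means names with spaces or non-ASCII letters, such as Indre Østfold, are captured regardless of the page's encoding.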

Please help, trying to parse TV-listing site

cordoprod posted a topic in PHP Coding Help

Hello, I am building an iPhone app and therefore I need to parse the data from a webpage. The site I need to parse is this: http://www.nrk.no/tv/ Some of the HTML is like this: <div class="program-content"> <div class="wraparound"> <div class="col"> <h2>NRK1</h2> <ul class="channel_nrk1"> <li> <em>06:30</em> <strong>Morgennytt</strong> <div> <p>Nyheter og aktualiteter fra NRKs nyhetsredaksjon.<span><a href="http://www.nrk.no/programmer/sider/morgennytt/" title="Gå til hjemmesiden til Morgennytt">Hjemmeside</a><a class="ical" href="http://nrk.no/tvepg/iCal.aspx?from=20100106T053000Z&to=20100106T090000Z&channel=NRK1&program=Morgennytt&desc=Nyheter og aktualiteter fra NRKs nyhetsredaksjon." title="Få påmindelse i kalender."><img src="http://fil.nrk.no/gull/programoversikt/img/ics.GIF" alt="Få påmindelse i kalender." /></a></span></p> </div> </li> <li> <em>10:00</em> <strong>NRK nyheter</strong> <div> <p>Siste nytt fra nyhetsredaksjonen.<span><a href="http://www.nrk.no/programmer/sider/nrk_nyheter/" title="Gå til hjemmesiden til NRK Nyheter">Hjemmeside</a><a class="ical" href="http://nrk.no/tvepg/iCal.aspx?from=20100106T090000Z&to=20100106T090400Z&channel=NRK1&program=NRK nyheter&desc=Siste nytt fra nyhetsredaksjonen." title="Få påmindelse i kalender."><img src="http://fil.nrk.no/gull/programoversikt/img/ics.GIF" alt="Få påmindelse i kalender." /></a></span></p> </div> </li> <li> <em>10:05</em> <strong>Aktuelt</strong> <div> <p>Direkte fra studio om politikk, kultur og samfunnsliv.<span><a href="mailto://aktuelt@nrk.no" title="Send e-post til Aktuelt">E-post</a><a href="http://www.nrk.no/programmer/sider/aktuelt/" title="Gå til hjemmesiden til Aktuelt">Hjemmeside</a><a class="ical" href="http://nrk.no/tvepg/iCal.aspx?from=20100106T090500Z&to=20100106T094900Z&channel=NRK1&program=Aktuelt&desc=AKTUELT. 
Nytt magasinprogram med Anne Lindmo og Erik Wold.
Kultur og politikk. 
Samtaler og deba..." title="Få påmindelse i kalender."><img src="http://fil.nrk.no/gull/programoversikt/img/ics.GIF" alt="Få påmindelse i kalender." /></a></span></p> </div> </li> <li> <em>10:50</em> <strong>Ut i naturen: Naturfilosofen</strong> <div> <p>Kan den vesle bekken bakom huset heime vere vegen til det store i naturen? Per Ingvar Haukeland stiller seg mange uvanlege spørsmål om natur. Vi følgjer tankane til naturfilosofen langs bekken under Bryggefjell.<span><a href="mailto://utinaturen@nrk.no" title="Send e-post til Ut i naturen">E-post</a><a href="http://www.nrk.no/programmer/sider/ut_i_naturen/" title="Gå til hjemmesiden til Ut i naturen">Hjemmeside</a><a href="http://podkast.nrk.no/program/ut_i_naturen.rss" title="Poddkastadresse">Podkast</a><a class="ical" href="http://nrk.no/tvepg/iCal.aspx?from=20100106T095000Z&to=20100106T101917Z&channel=NRK1&program=Ut i naturen: Naturfilosofen&desc=Opplev norsk natur med Ut i naturen. Se og les mer på nrknatur.no!" title="Få påmindelse i kalender."><img src="http://fil.nrk.no/gull/programoversikt/img/ics.GIF" alt="Få påmindelse i kalender." /></a></span></p> </div> </li>

What I want to extract is all the times in the <em> tags and all the titles, such as Morgennytt, in the <strong> tags. But there is a big problem: the HTML contains 4 channels, laid out in 4 columns, each starting like this:

<div class="col"> <h2>NRK1</h2> <ul class="channel_nrk1">

The <h2> tag holds the name of the channel. How can I insert the contents of all the em and strong tags and still know which channel each one belongs to? Thanks
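One way to keep each programme tied to its channel is to walk the document column by column with DOMXPath: read the <h2> of a <div class="col">, then read the <em> and <strong> of every <li> inside that same column. This is a sketch under the assumption that every channel column follows the structure quoted above; programmesByChannel() is a hypothetical helper, and the second channel in the sample is invented for illustration.

```php
<?php
// Hypothetical, shortened sample following the structure quoted above.
// The NRK2 column and its programme are invented for this example.
$htmlCode = <<<'HTML'
<div class="program-content"><div class="wraparound">
<div class="col"><h2>NRK1</h2><ul class="channel_nrk1">
<li><em>06:30</em> <strong>Morgennytt</strong></li>
<li><em>10:00</em> <strong>NRK nyheter</strong></li>
</ul></div>
<div class="col"><h2>NRK2</h2><ul class="channel_nrk2">
<li><em>12:00</em> <strong>Example programme</strong></li>
</ul></div>
</div></div>
HTML;

/**
 * Return [channel name => list of ['time' => ..., 'title' => ...]].
 */
function programmesByChannel(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $listing = [];
    foreach ($xpath->query('//div[@class="col"]') as $col) {
        // Paths without a leading slash are relative to the current column.
        $channel = trim($xpath->evaluate('string(h2)', $col));
        foreach ($xpath->query('ul/li', $col) as $li) {
            $listing[$channel][] = [
                'time'  => trim($xpath->evaluate('string(em)', $li)),
                'title' => trim($xpath->evaluate('string(strong)', $li)),
            ];
        }
    }
    return $listing;
}

$listing = programmesByChannel($htmlCode);
```

Because the inner query is evaluated relative to each column node, a programme can only end up under the channel whose column contains it. On the real page, titles with Norwegian characters may additionally require telling the parser the page's character encoding.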

January 6, 2010

Parse bus plan

cordoprod replied to cordoprod's topic in PHP Coding Help

cags: I remember you did. I tried outputting it the way you explained to me, but still no luck. Do you think the regex is correct?
