aviju
Posted April 7, 2014

Hello! I'm new to this board and I need your help.

I'd like to collect the content of around 1000 URLs in a text file (I use wget in a bash script), and then parse that text file to extract one type of content into a CSV file.

1) My bash script is this one:

    file=/home/julien/tests/file.txt
    for i in $(cat $file)
    do
        wget $i -O - >> songs_t.txt
    done

It works perfectly and the text file songs_t.txt is created correctly. It contains the content of the 1000 URLs.

2) Then I wrote a PHP script to parse songs_t.txt. I only want to get the concert setlists (the setlist is only a part of the content of each URL). So my approach is to remove tags such as 'a', 'h4', 'Title' and so on, and save the rest in a CSV file called 'SONGS.csv'. An example URL can be seen here: http://members.tripod.com/~fun_fun_fun/8-17-63.html

The part of my PHP script that does the parsing is this one:

    $html = file_get_html('songs_t.txt');
    foreach ($html->find('title, script, div, center, style, img, noscript, h4, a') as $es)
        $es->outertext = 'title, script, div, center, style, img, noscript, h4, a';
    $f = fopen('SONGS.csv', "w");
    fwrite($f, $html);
    fclose($f);

The script works for the first 35 URLs (I get almost only the setlists), but as soon as it has to deal with more than 35 URLs, I get the following error message:

    Call to a member function find() on a non-object in /home/julien/tests/boys2.php on line 23

Line 23 corresponds to:

    foreach ($html->find('title, script, div, center, style, img, noscript, h4, a') as $es)

3) To test whether my $html object is valid, I use this code:

    $html = file_get_html('songs_t.txt');
    if (!is_object($html)) {
        echo "invalid object";
    }

And the result is "invalid object". This test was run on a text file composed of the content of 50 URLs. But when I run the same test on a text file composed of 30 URLs, I get no error!
So how can I parse my HTML even if it's not an entirely valid object? Could you help me, please? Thanks!
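
One possible explanation, assuming the script uses the Simple HTML DOM library: its file_get_html() returns false instead of an object when the input is larger than the library's MAX_FILE_SIZE constant (600000 bytes in the versions I have seen), which would match a failure that only appears once enough pages have been appended. A minimal sketch of a workaround is to save each URL to its own file rather than to one large songs_t.txt, so every page can be parsed separately. The function names fetch_page and fetch_all and the output directory "pages" are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: save each URL to its own numbered file instead of appending
# everything to a single large songs_t.txt.

# fetch_page URL OUTFILE: download one page (hypothetical helper name).
fetch_page() {
    wget -q -O "$2" "$1"
}

# fetch_all LISTFILE OUTDIR: read one URL per line and write
# OUTDIR/page_0001.html, OUTDIR/page_0002.html, and so on.
fetch_all() {
    local list="$1" outdir="$2" n=0 url
    mkdir -p "$outdir"
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue            # skip blank lines
        n=$((n + 1))
        fetch_page "$url" "$(printf '%s/page_%04d.html' "$outdir" "$n")" \
            || echo "failed: $url" >&2       # report the failure and keep going
    done < "$list"
}

# Example call (list path taken from the post; "pages" is an assumed name):
# fetch_all /home/julien/tests/file.txt pages
```

The PHP script could then loop over the files in that directory and call file_get_html() on each one, checking the return value before calling find().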