
Read Array From Text File Screen Scrape


twittoris


I have created a small scraper that saves the page content to an HTML file and writes the links it finds, as an array, to a text file. I want the script to run again for each link in that text file.

 

The text file looks like this:

 

Array
(
    [0] =>  id="cnn_switchEdition_intl" href="http://edition.cnn.com/?cnn_shwEDDH=1" title="CNN INTERNATIONAL"
    [1] =>  href="javascript:void(0)" onclick="showOverlay('profile_signup_overlay');return false;" title=""
    [2] =>  href="javascript:void(0)" onclick="showOverlay('profile_signin_overlay');return false;" title=""
    [3] =>  id="nav-home" class="nav-media no-border nav-on" href="/" title="Breaking News, U.S., World Weather Entertainment and Video News from CNN.com"
    [4] =>  id="nav-video" class="nav-media no-border" href="/video/" title="Video Breaking News Videos from CNN.com"
    [5] =>  id="nav-newspulse" class="nav-media" href="http://newspulse.cnn.com/" title="NewsPulse from CNN.com"
    [6] =>  id="nav-us" href="/US/" title="U.S. News Headlines Stories and Video from CNN.com"
    [7] =>  id="nav-world" href="/WORLD/" title="World News International Headlines Stories and Video from CNN.com"


If you plan to do a lot of link scraping, I'd suggest storing the links in a database rather than a file. Either way, you still need to strip the excess markup (such as the href="javascript..." entries and the id, class, and title attributes) and save only the URLs themselves. After that, a loop over the saved URLs can go back in and crawl each of those pages.
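
As a rough illustration of that clean-up-then-loop idea, here is a minimal sketch. It is written in Python purely for illustration (the dump above looks like PHP print_r output, and the original scraper is not shown), so treat it as one possible approach rather than a drop-in fix. The names extract_urls and crawl_all, the base URL passed in, and the scrape callback are all assumptions made for this example; relative links such as "/video/" are resolved against whatever base URL you supply.

```python
import re
from urllib.parse import urljoin

# Matches one entry of a print_r-style dump, e.g.:  [3] => id="..." href="..."
ENTRY_RE = re.compile(r'^\s*\[\d+\]\s*=>\s*(.*)$')
# Pulls the href value out of the attribute string of one entry.
HREF_RE = re.compile(r'href="([^"]*)"')


def extract_urls(dump_text, base_url):
    """Return absolute http(s) URLs from the dump, in order, without duplicates.

    base_url is an assumption: it should be the page the links were scraped from.
    """
    urls = []
    seen = set()
    for line in dump_text.splitlines():
        entry = ENTRY_RE.match(line)
        if not entry:
            continue  # skips "Array", "(", ")" and blank lines
        href = HREF_RE.search(entry.group(1))
        if not href:
            continue  # entry has no href attribute at all
        url = urljoin(base_url, href.group(1))  # "/video/" becomes an absolute URL
        if not url.startswith(("http://", "https://")):
            continue  # drops javascript:void(0) and other non-web schemes
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def crawl_all(urls, scrape):
    """Call scrape(url) once per URL and collect the results by URL.

    scrape is a hypothetical stand-in for the existing scraper function.
    """
    return {url: scrape(url) for url in urls}
```

To use it, read the text file into a string, call extract_urls with that string and the address of the page that was scraped, then pass the resulting list to crawl_all together with your own scraping function. If you go the database route instead, the same list of cleaned URLs is what you would insert as rows.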
