
Scraping the fully rendered page, not the raw HTML?


jjk2


As you know, you can use a regex to pull specific data out of a page.

However, some websites use JavaScript to hide their data, so the page you see in your browser is different from the HTML source you download.

What is a possible solution? Is there any way to turn the fully rendered page back into HTML and then scrape that?

Another difficulty is scraping Flash. Is it even possible to scrape text in Flash? I don't see how it's possible, unless the .swf file is downloaded, decompiled, and then matched against a regex.
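As a rough illustration of the "download the .swf and match it against a regex" idea: a SWF file starts with a 3-byte signature, `FWS` for uncompressed or `CWS` for zlib-compressed, followed by a version byte and a 4-byte length, so the body can be decompressed with the standard library and searched for printable strings without a full decompiler. This is only a sketch. The helper names (`swf_body`, `printable_strings`, `make_fake_cws`) and the sample bytes are made up for the example, and text that Flash stores as glyph indices rather than as plain strings will not show up this way.

```python
import re
import struct
import zlib


def swf_body(data):
    """Return the uncompressed body of a SWF file (everything after the 8-byte header)."""
    signature = data[:3]
    if signature == b"FWS":   # uncompressed SWF
        return data[8:]
    if signature == b"CWS":   # zlib-compressed SWF
        return zlib.decompress(data[8:])
    raise ValueError("unsupported SWF signature: %r" % signature)


def printable_strings(body, min_len=4):
    """Find runs of printable ASCII, similar to the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(body)]


def make_fake_cws(body):
    """Build a tiny stand-in for a compressed SWF so the example runs offline (hypothetical data)."""
    header = b"CWS" + bytes([9]) + struct.pack("<I", len(body) + 8)
    return header + zlib.compress(body)


if __name__ == "__main__":
    fake = make_fake_cws(b"\x00\x01Hello Flash\x00\x02price=42\x00\x03ab\x00")
    for text in printable_strings(swf_body(fake)):
        print(text)
```

With a real file you would read the bytes from disk instead of calling `make_fake_cws`, and then run your usual regex over the strings that come back.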


Can you not just adapt your regex to match the correct terms?

"is it even possible to scrape texts on flash ?"

I wouldn't have thought so. And good luck writing a script to download and decompile a Flash file, lol. It's probably illegal anyway.


Well, I need a way to read the JavaScript together with the HTML and push the output into an array.

I don't know where I can find a tool or code that will read the JavaScript + HTML and push the complete output into an array, which I can then scrape.
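One case where this is doable with a plain regex: sometimes the "hidden" data is not computed at all but sits in the page source as a JSON literal inside a `<script>` tag, and the JavaScript only renders it. A minimal sketch of pulling such a literal into a Python list follows. The page snippet, the variable name `priceData` and the helper `extract_js_array` are all invented for the example. It assumes the literal is valid JSON and does not contain `];` inside a string. If the data is computed or fetched after the page loads, this will not work and the script has to be executed by a real browser engine instead.

```python
import json
import re

# Hypothetical page source: the visible table is built by JavaScript
# from a JSON literal embedded in a <script> tag.
PAGE = """
<html><body>
<div id="prices"></div>
<script>
var priceData = [{"item": "widget", "price": 9.5}, {"item": "gadget", "price": 12}];
render(priceData);
</script>
</body></html>
"""


def extract_js_array(html, var_name):
    """Return the JSON array assigned to `var_name` in the page source as a Python list."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]);",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise LookupError("no array assigned to %r found" % var_name)
    return json.loads(match.group(1))


if __name__ == "__main__":
    rows = extract_js_array(PAGE, "priceData")
    for row in rows:
        print(row["item"], row["price"])
```

The result is an ordinary list of dicts, so no further regex work is needed on that part of the page.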

 

As for Flash decompiling, what makes you think it's illegal? It's a simple way to extract data from otherwise difficult Flash. Take your armchair law enforcement elsewhere, kiddie.


You're facing the same problems as the big search engines like Yahoo and Google.

Google whipped something up to read a little bit of text in Flash, but it's near useless at the moment. I don't think there's much you can do about JavaScript either; that's why pages that make heavy use of JavaScript and Flash often don't rank very well in the search results.

It's an accessibility issue that the maker of the page should avoid by never relying on JavaScript being present. As for Flash, it's not really machine-accessible either. If you figure out a good way, I'm sure the writers of search engines, screen readers and so on would like to have a talk with you.

So yes, that was basically a "just give up, it's too hard".

