
I'm using cURL to crawl and scrape data from a website that contains tables with rows of data. When I send a cURL POST for the underlying data of a specific row (A), it returns the expected data. But when I move to the second row (B), the response comes back blank, or more specifically, as a ton of spaces (or nbsp's). When I access the same POST location in a browser, I can see (B)'s data. The only difference between the two POSTs is the location ID for the data. I don't think it's a JavaScript problem, since I can successfully retrieve the data for row (A), as I mentioned.

 

Website I'm trying to crawl: https://mycpa.cpa.state.tx.us/up/Search.jsp


Interestingly, you can combine the data location IDs to show more than one set of data per page. When I try this, the first set of data (A) is displayed and the second (B) is shown as spaces (or nbsp's).
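One way to pin down which rows come back empty is to test each response body programmatically rather than by eye. Below is a minimal sketch in Python (not the cURL/PHP code from the post); the function name `is_effectively_blank` and the sample markup are made up for illustration, and the tag stripping is a rough regex, not a real HTML parser.

```python
import html
import re


def is_effectively_blank(fragment: str) -> bool:
    """Return True when an HTML fragment contains no visible text."""
    # Remove tags with a rough regex (illustrative only, not a full parser)
    text = re.sub(r"<[^>]*>", "", fragment)
    # Decode entities, so "&nbsp;" becomes the U+00A0 no-break space
    text = html.unescape(text)
    # Treat no-break spaces like ordinary spaces, then strip all whitespace
    return text.replace("\u00a0", " ").strip() == ""


# Hypothetical response snippets standing in for rows (A) and (B)
row_a = "<td>Some value</td>"
row_b = "<td>&nbsp;&nbsp;   &nbsp;</td>"
print(is_effectively_blank(row_a))
print(is_effectively_blank(row_b))
```

Running a check like this on each response would show whether (B) is truly empty or just padded with nbsp's around real data.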


https://forums.phpfreaks.com/topic/290772-curl-crawling-data/

It's most likely a JavaScript issue, and the hashtags/fragments in the URL are being ignored.
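To illustrate what "fragments are being ignored" means: the part of a URL after `#` is handled on the client side (often by JavaScript) and is not sent to the server in the HTTP request, so a plain cURL request cannot depend on it. Whether this particular site uses fragments is only a guess; the URL and the helper name `request_target` below are hypothetical, and the sketch is in Python.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a client would send on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it never leaves the client
    return target


url = "https://example.com/Search.jsp?id=1#row-B"  # hypothetical URL
print(request_target(url))     # what the server receives
print(urlsplit(url).fragment)  # what stays behind in the browser
```

If the browser's JavaScript uses that fragment to load row (B), the scraper has to reproduce the resulting request itself.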

 

One should have approval from the website owner to scrape their data.

 

I read their link policy, which is pretty strict; I doubt they want people scraping information.

http://www.window.state.tx.us/linkpolicy.html
