
cURL Crawling Data


Sucrose


I'm using cURL to crawl and scrape data from a website that contains tables with rows of data. When I send a cURL POST for the underlying data of a specific row (A), it returns the expected data. But when I move to the second row (B), the response comes back blank, or more precisely, as a ton of spaces (or &nbsp; entities). When I open the same POST location in a browser, I can see (B)'s data. The only difference between the two POSTs is the location ID for the data. I don't think it's a JavaScript problem, since I can successfully retrieve the data for row (A), as mentioned.

 

Website I'm trying to crawl: https://mycpa.cpa.state.tx.us/up/Search.jsp

 



 

Interestingly, you can combine the data location IDs to show more than one set of data per page. When I try this, the first set of data (A) is displayed and the second (B) shows up as spaces (or &nbsp; entities).
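
One way to confirm that a response really holds nothing but spaces or &nbsp; entities is to strip the markup and check what is left. The following is only a minimal sketch in Python, not code from the original post; the function name is_effectively_blank is hypothetical, and the sample table cells are made up for illustration.

```python
import html
import re


def is_effectively_blank(fragment: str) -> bool:
    """Return True if an HTML fragment contains no visible text."""
    # Remove anything that looks like a tag (a rough heuristic, not a full parser).
    without_tags = re.sub(r"<[^>]*>", "", fragment)
    # Decode entities such as &nbsp; into the characters they stand for.
    decoded = html.unescape(without_tags)
    # Treat non-breaking spaces like ordinary spaces, then trim whitespace.
    return decoded.replace("\xa0", " ").strip() == ""


# Made-up sample cells: one with real text, one with only filler.
print(is_effectively_blank("<td>Some value</td>"))      # False
print(is_effectively_blank("<td>&nbsp;&nbsp; </td>"))   # True
```

Running a check like this on the responses for row (A) and row (B) would show whether the second response is truly empty or just looks empty when printed.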



It's most likely a JS issue, and URL hash fragments are being ignored.
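
For context on the fragment point: the part of a URL after # is handled on the client side (often by JavaScript) and is not included in the HTTP request, so a script fetching the page never passes it to the server. A small Python sketch shows what is left once the fragment is dropped. The example.com URL and the helper name url_as_requested are made up for illustration and do not come from the site in question.

```python
from urllib.parse import urldefrag


def url_as_requested(url: str) -> str:
    """Return the URL without its #fragment, i.e. the part a server can see."""
    # urldefrag splits the URL into (url_without_fragment, fragment).
    return urldefrag(url).url


# Hypothetical URL; everything after '#' stays on the client.
print(url_as_requested("https://example.com/search?id=2#row-b"))
# prints: https://example.com/search?id=2
```

If a page decides which row to display based on the fragment, two requests that differ only after the # would look identical to the server.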

 

One should have approval from the website owner to scrape their data.

 

I read their link policy, which is pretty strict; I doubt they want people scraping information.

http://www.window.state.tx.us/linkpolicy.html

