Sucrose Posted September 1, 2014 Share Posted September 1, 2014 I'm using cURL to crawl and scrape data from a website. This website contains tables with rows of data. When I send a cURL POST for the underlying data at a specific row(A), it will return the expected data. But when I move to the second row(B), the data returns blank or specifically, a tons of spaces (or nbsp's.) When I access the cURL's POST location by browser, I can see (B)'s data. The only difference in the 2 POST's are location ID's for the data. I don't think it's a problem with JavaScript as I can successfully return data from row (A) as I mentioned. Website I'm trying to crawl: https://mycpa.cpa.state.tx.us/up/Search.jsp Working POST URL(A): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&which=View+Details Non-working POST URL(B): https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74600015611995R1AC081084&which=View+Details Interestingly, you can combine the data location ID's to show more than 1 set of data per page. When trying this method, the first set of data(A) is displayed and the second(B) is shown as spaces (or nbsp.) Combined POST URL: https://mycpa.cpa.state.tx.us/up/searchresults.do?d-49216-p=&d-49216-s=&how=&last=bales&other=&d-49216-o=&zip=&_chk=74170700611986R2ZZZZ26&_chk=74600015611995R1AC081084&which=View+Details Quote Link to comment Share on other sites More sharing options...
QuickOldCar Posted September 2, 2014 Share Posted September 2, 2014 It's most likely a js issue and hashtag/fragments are being ignored. One should have approval from the website owner to scrape their data. I read their link policy which is pretty strict, i doubt they want people scraping information. http://www.window.state.tx.us/linkpolicy.html Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.