implitech Posted February 11, 2012

I have a PHP-based web crawler that I want to use to index .onion websites through the Tor network. It is accessible here: http://rz7ocnxxu7ka6ncv.onion/

My problem is that the spider that actually crawls pages needs to connect through the SOCKS proxy on port 9050. I have to tunnel its connections through Tor so that it can resolve .onion domains, which are the only ones I'm indexing. I call the script from the command line using php crawl.php, adding the appropriate parameters to crawl the page.

Here is what I'm wondering:

Is there any way to force the crawler to use Tor?
Or can I force my entire machine to tunnel everything through Tor (for example, sending all traffic through 127.0.0.1:9050), and how?
Perhaps if I set up global proxy settings, PHP would respect them?

If any of these would work, how would I do it? (Step-by-step instructions please, I am a noob.) I just want to create my own Tor search engine. (Please don't recommend P2P search engines to me; that's not what I want for this. I know they exist, I did my homework.)

Here is the crawler source if you'd like to take a look. Perhaps someone with a kind heart can modify it to use 127.0.0.1:9050 for all crawling requests?

spider.php: http://pastebin.com/kscGJCc5
spiderfuncs.php: http://pastebin.com/m5y54RUh

PLEASE someone help me! I am desperate.
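
For the first option (forcing just the crawler through Tor), here is a minimal sketch of one way it might be done. It assumes the crawler can make its requests with PHP's cURL extension and that Tor is listening on 127.0.0.1:9050. The helper names tor_curl_options() and fetch_via_tor() are hypothetical and do not come from spider.php or spiderfuncs.php. The important part is the SOCKS5 "hostname" proxy type, which asks the proxy to resolve the name, since .onion addresses cannot be resolved by normal DNS.

```php
<?php
// Sketch only: fetch a page through Tor's local SOCKS proxy using PHP's cURL extension.
// tor_curl_options() and fetch_via_tor() are hypothetical helpers, not part of the crawler.

function tor_curl_options(string $url, string $proxy = '127.0.0.1:9050'): array
{
    // SOCKS5 with hostname resolution done by the proxy (needed for .onion names).
    // Older PHP builds lack the constant; 7 is libcurl's CURLPROXY_SOCKS5_HOSTNAME value.
    $proxyType = defined('CURLPROXY_SOCKS5_HOSTNAME') ? CURLPROXY_SOCKS5_HOSTNAME : 7;

    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,      // assumed Tor SOCKS address
        CURLOPT_PROXYTYPE      => $proxyType,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,          // Tor circuits can be slow to build
        CURLOPT_TIMEOUT        => 120,
    ];
}

function fetch_via_tor(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, tor_curl_options($url));
    $body = curl_exec($ch);

    // curl_exec() returns false on failure; report that as null.
    return $body === false ? null : $body;
}
```

Calling fetch_via_tor('http://rz7ocnxxu7ka6ncv.onion/') from the crawler in place of its current page-download call would be the rough idea; where exactly that call lives in the pastebin sources is not something this sketch assumes. For the whole-process route, wrapping the command with a tool such as torsocks (for example, torsocks php crawl.php) may also work, but that depends on how the script opens its connections.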