
Crawl internet through SOCKS proxy


implitech


So, I have this web crawler that I want to use to index .onion websites through the TOR network.

OK, here's what I need. I have a PHP-based web crawler. It is accessible here: http://rz7ocnxxu7ka6ncv.onion/ My problem is that the spider that actually crawls pages needs to make its requests through the SOCKS proxy on port 9050. I have to tunnel its connections through TOR so that it can resolve .onion domains, which are what I'm indexing (only domains ending in .onion). I call the script from the command line using php crawl.php, and I add the appropriate parameters to crawl the page.

Here is what I'm wondering: is there any way to force the crawler to use TOR? Or can I force my ENTIRE MACHINE to tunnel everything through TOR (like sending all traffic through 127.0.0.1:9050), and how? Perhaps if I set up global proxy settings, PHP would respect them?

 

If any of my solutions work, how would I do it? (Step by step instructions please, I am a noob.)

 

I just want to create my own TOR search engine. (Don't recommend P2P search engines to me. They're not what I want for this. I know they exist; I did my homework.) Here is the crawler source if you are interested in taking a look. Perhaps someone with a kind heart can modify it to use 127.0.0.1:9050 for all crawling requests?

spider.php: http://pastebin.com/kscGJCc5

spiderfuncs.php: http://pastebin.com/m5y54RUh
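
To make the request concrete, here is a rough sketch of the kind of change being asked for. It assumes the crawler could fetch pages with PHP's cURL extension, which may not match how spider.php actually opens its connections, and it assumes TOR is listening on its default SOCKS address 127.0.0.1:9050. The function names tor_curl_options and fetch_via_tor are made up for illustration and are not part of the crawler. CURLPROXY_SOCKS5_HOSTNAME is only defined in newer PHP builds (5.5.23 / 5.6.7 and later, as far as I know).

```php
<?php
// Hypothetical helper: builds the cURL options for one request that is
// sent through a local TOR SOCKS proxy. The default proxy address is an
// assumption (TOR's usual SOCKS listener).
function tor_curl_options($url, $proxy = '127.0.0.1:9050')
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,
        // SOCKS5 with hostname resolution done by the proxy, so that TOR
        // resolves .onion names rather than the local DNS resolver.
        CURLOPT_PROXYTYPE      => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,    // hidden services can be slow to connect
        CURLOPT_TIMEOUT        => 120,
    );
}

// Hypothetical helper: fetches one page through TOR and returns its body.
function fetch_via_tor($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, tor_curl_options($url));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}
```

With something like this in place, the spider would call fetch_via_tor('http://rz7ocnxxu7ka6ncv.onion/') wherever it currently downloads a page. The important detail is the proxy type: a plain SOCKS5 setting would try to resolve the hostname locally, which cannot work for .onion addresses.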

 

PLEASE someone help me! I am desperate. :(

 


