dil_bert
Posted December 8, 2017

I am trying to port a Python script from Python 2 to Python 3. Well, the Python 2 print statement can no longer be used:

    import urllib
    from bs4 import BeautifulSoup
    import urlparse
    import mechanize

    # Set the starting point for the spider and initialize
    # a mechanize browser object
    url = "http://sparkbrowser.com"
    br = mechanize.Browser()

    # create lists for the urls in queue and visited urls
    urls = [url]
    visited = [url]

    # Since the amount of urls in the list is dynamic
    # we just let the spider go until some last url didn't
    # have new ones on the webpage
    while len(urls)>0:
        try:
            br.open(urls[0])
            urls.pop(0)
            for link in br.links():
                newurl = urlparse.urljoin(link.base_url,link.url)
                #print newurl
                if newurl not in visited and url in newurl:
                    visited.append(newurl)
                    urls.append(newurl)
                    print newurl
        except:
            print "error"
            urls.pop(0)

    print visited

dil_bert (Author)
Posted December 9, 2017

print "error" becomes print("error"). See also the 2to3 tool: https://docs.python.org/2/library/2to3.html
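
For anyone landing here later, this is a rough sketch of how the whole spider could look in Python 3, not a definitive port. Besides print becoming a function, urlparse.urljoin now lives in urllib.parse. To keep the example runnable without third-party packages, it swaps mechanize for the standard library (urllib.request plus html.parser), so this is a substitution and not the original mechanize approach. The names LinkParser, fetch_links, crawl and get_links are made up for this illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin  # Python 3 home of Python 2's urlparse.urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_links(page_url):
    """Download page_url and return the absolute URLs it links to.

    Stands in for mechanize's br.open() / br.links() (an assumption made
    so the sketch needs only the standard library).
    """
    with urlopen(page_url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def crawl(start_url, get_links=fetch_links):
    """Visit pages breadth-first, following only URLs that contain start_url."""
    queue = deque([start_url])
    visited = [start_url]
    while queue:
        current = queue.popleft()
        try:
            links = get_links(current)
        except Exception as exc:
            # Same idea as the original bare "except", but reports the cause.
            print("error", current, exc)
            continue
        for new_url in links:
            if new_url not in visited and start_url in new_url:
                visited.append(new_url)
                queue.append(new_url)
                print(new_url)
    return visited
```

Calling crawl("http://sparkbrowser.com") would start the spider from the URL in the original script. The get_links parameter is only there so the crawl loop can be tried out with a fake link source instead of real network requests.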