freeloader, posted February 26, 2012

Hi guys,

I'm parsing a bunch of HTML sources. A minority of them contain base64-encoded URLs, at least 20-30 URLs per page, and some of the information I need is taken from those URLs. I'm looking for a way to parse these URLs without losing too much time (20-30 URLs per page, and I need to parse about 10k pages daily).

Should I regex each source individually for base64-encoded URLs and then decode them each time? Isn't that going to take a lot of time and resources, especially considering that only a minority of pages have base64 URLs in them? Or is there a better way to do it?

Code snippets are absolutely welcome! Thank you in advance.

Thread: https://forums.phpfreaks.com/topic/257801-base64-encoded-urls/
silkfire, posted February 26, 2012

How do you know it's a base64 URL in the first place?
freeloader (author), posted February 26, 2012

Because these are pre-parsed pages, and in some of them the URLs have been run through a base64 encoder, giving output like this:

<link href="Oigregregregrer/gergeggege==" media="screen" rel="stylesheet" type="text/css" />

(The above is not an actual link.)
silkfire, posted February 26, 2012

I was wondering more whether you could quickly filter out the base64 URL strings to separate them from the rest, then process the non-base64 URLs first and the base64 ones afterwards.
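A filter like that might look something like the sketch below. It is only a heuristic and not code from this thread: the function name looksLikeBase64Url() is hypothetical, and it assumes standard padded base64 and that a decoded value starts with http://, https:// or /. Plain URLs usually fail the cheap checks immediately because ':' and '.' are not in the base64 alphabet.

```php
<?php
// Hypothetical helper (not from the thread): guesses whether an attribute
// value is a base64-encoded URL rather than a plain one.
function looksLikeBase64Url(string $value): bool
{
    // Cheap rejections first: padded base64 always has a length divisible by 4.
    if ($value === '' || strlen($value) % 4 !== 0) {
        return false;
    }
    // Only characters from the base64 alphabet, with at most two '=' at the end.
    if (!preg_match('~^[A-Za-z0-9+/]+={0,2}$~', $value)) {
        return false;
    }
    // Strict mode makes base64_decode() return false on invalid input.
    $decoded = base64_decode($value, true);
    if ($decoded === false) {
        return false;
    }
    // Assumption: a real URL decodes to something starting with a scheme or a slash.
    return (bool) preg_match('~^(https?://|/)~i', $decoded);
}

// Split a list of URLs into encoded and plain ones (example values are made up).
$urls = [
    'http://example.com/style.css',
    base64_encode('http://example.com/a.css'),
    'images/logo.png',
];
$encoded = array_filter($urls, 'looksLikeBase64Url');
$plain   = array_diff($urls, $encoded);
```

The plain URLs can then be handled right away, and only the entries in $encoded need a base64_decode() call.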
freeloader (author), posted February 26, 2012

Fixed it by using preg_replace_callback() on the URLs: a regex matches them and the callback base64-decodes them.
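The actual code was not posted, so the following is only a minimal sketch of how a preg_replace_callback() approach could look. The function name decodeBase64Attributes(), the choice of href and src attributes, the minimum length of 8 characters, and the rule that a decoded URL must start with http://, https:// or / are all assumptions made for illustration. The regex scans each page once, and the callback only runs for attribute values that consist purely of base64 characters, so pages without encoded URLs cost a single pass.

```php
<?php
// Hypothetical sketch: decode base64-encoded href/src values in an HTML string.
function decodeBase64Attributes(string $html): string
{
    // Group 1: attribute name, group 2: quote character, group 3: candidate value.
    // Plain URLs contain ':' or '.', so they never match the character class.
    $pattern = '~\b(href|src)=(["\'])([A-Za-z0-9+/]{8,}={0,2})\2~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Strict mode returns false for input that is not valid base64.
        $decoded = base64_decode($m[3], true);

        // Assumption: keep the original text unless the result looks like a URL.
        if ($decoded === false || !preg_match('~^(https?://|/)~i', $decoded)) {
            return $m[0];
        }

        // Re-escape the decoded URL so it is safe inside the attribute again.
        $safe = htmlspecialchars($decoded, ENT_QUOTES | ENT_SUBSTITUTE);
        return $m[1] . '=' . $m[2] . $safe . $m[2];
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Example input built with a made-up URL.
$page = '<link href="' . base64_encode('http://example.com/style.css')
      . '" media="screen" rel="stylesheet" type="text/css" />';
$clean = decodeBase64Attributes($page);
```

Checking the decoded value before replacing it matters because short plain values such as href="contacts" also consist only of base64 characters; they decode to garbage and are left untouched.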