wilbur_wc

Members

Posts
11
Joined
November 17, 2009
Last visited
Never

Everything posted by wilbur_wc

regex, strip spaces between numbers...

wilbur_wc replied to wilbur_wc's topic in Regex Help

awesome... thanks
- April 17, 2011
- 2 replies
regex, strip spaces between numbers...

wilbur_wc posted a topic in Regex Help

two part question...

PART 1... part one is really simple, but for some reason it's just not happening for me... i need to find and replace all numbers which have a space between them... ex: 9 0 8 would become 908 & 6 9 would become 69. what're the proper preg_replace arguments i'd need to achieve this?

PART 2... this one is a little tricky... i also have some numbers like 1,000 that are showing up as 1 ,000 or 1, 000... so i need to get rid of the unwanted space, but there's a catch... if the line which contains the number we're potentially replacing also contains the string respective, then i need to leave the space intact. the reason being, some number/comma combinations represent two separate values as opposed to a thousands delimiter (this case can be identified by the line containing somewhere the string respective). thanks so much
- April 16, 2011
- 2 replies
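
For the two-part question above, one possible approach is sketched here. It is not the answer given in the thread (the replies are not shown on this page), and the function names `strip_digit_spaces()` and `fix_thousands()` are hypothetical. Part 1 uses lookarounds so that only spaces with a digit on both sides are removed. Part 2 works line by line, skips any line containing `respective`, and assumes a thousands group is exactly three digits.

```php
<?php
// PART 1: remove runs of spaces that sit directly between two digits.
// The lookbehind/lookahead check the digits without consuming them,
// so "9 0 8" collapses fully in a single pass.
function strip_digit_spaces(string $text): string
{
    return preg_replace('/(?<=\d) +(?=\d)/', '', $text);
}

// PART 2: tighten "1 ,000" and "1, 000" to "1,000", one line at a time.
// Lines containing the string "respective" are left untouched.
// Assumption: a thousands group is exactly three digits.
function fix_thousands(string $text): string
{
    $lines = explode("\n", $text);
    foreach ($lines as $i => $line) {
        if (strpos($line, 'respective') !== false) {
            continue; // the comma separates two values here, keep the space
        }
        $lines[$i] = preg_replace('/(?<=\d) *, *(?=\d{3}(?!\d))/', ',', $line);
    }
    return implode("\n", $lines);
}

echo strip_digit_spaces("9 0 8 and 6 9"), "\n";
echo fix_thousands("total 1 ,000 or 1, 000\n5, 300 and 2, 100 respective"), "\n";
```

The order of the two calls should not matter for the cases in the post, since the first pattern never touches a comma and the second never touches a bare space between digits.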
pdflib alternative...

wilbur_wc posted a topic in PHP Coding Help

i need to read a pdf and convert it to raw text (with line breaks - but that's as fancy as i need it)... pdflib does way more than i need, and it's super expensive, and i don't really see the need to install an app on my server just to read a simple pdf... there must be an alternative out there, but i can't seem to find it... and when all of php.net seems to reference pdflib, i start to get a little discouraged... it seems like a simple pdf reader class/package would be open source somewhere... any suggestions? thanks
- September 7, 2010
loading a secured link...

wilbur_wc replied to wilbur_wc's topic in PHP Coding Help

got it working... needed to make one call, set the cookie and then attempt the download... i'm sure there's some redundancy in there, but it works...

more info: http://www.php.net/manual/en/function.curl-setopt.php

    $agent = $_SERVER[ 'HTTP_USER_AGENT' ];
    $ref_url = "http://somesite.com"; // in case they don't allow automated logins
    $data = "handle=username&password=pass"; // syntax pulled from firebug's post

    // start with an empty cookie file
    $fp = fopen( "cookie.txt", "w" );
    fclose( $fp );

    // first call: log in so the session cookie gets written to cookie.txt
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL, "http://somesite.com/login.php" );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
    curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_SSLVERSION, 3 );
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 );
    curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
    curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
    curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
    ob_start(); // note: anything echoed before ob_end_clean() is discarded
    $result = curl_exec( $curl );
    if( $error = curl_error( $curl ) )
        echo( "<br><--- cURL ERROR:" . $error . " --->" );
    ob_end_clean();
    curl_close( $curl );
    //echo( "<br><--- curl:" . $result . " --->" );

    // second call: request the pdf, reusing the cookie from the login
    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_URL, $this->pdfURL );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "username:pass" );
    curl_setopt( $curl, CURLOPT_USERAGENT, $agent );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $curl, CURLOPT_COOKIEFILE, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_COOKIEJAR, "cookie.txt" );
    curl_setopt( $curl, CURLOPT_SSLVERSION, 3 );
    curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0 );
    curl_setopt( $curl, CURLOPT_SSL_VERIFYHOST, 0 );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_TIMEOUT, 40 );
    curl_setopt( $curl, CURLOPT_REFERER, $ref_url );
    curl_setopt( $curl, CURLOPT_POSTFIELDS, $data );
    ob_start();
    $result = curl_exec( $curl );
    if( $error = curl_error( $curl ) )
        echo( "<br><--- cURL ERROR:" . $error . " --->" );
    ob_end_clean();
    curl_close( $curl );
    echo( "<br><--- curl:" . $result . " --->" );
- August 31, 2010
- 7 replies
loading a secured link...

wilbur_wc replied to wilbur_wc's topic in PHP Coding Help

oh, damn... now that i look at it, i see that it's redirecting in the same manner it does when doing it all manually through the browser... EX: if you had tried the download before logging in, you'd be prompted for your login, but once login is accepted it sends you to a different entry page (the uri listed in the 302 status) from which you have to re-navigate to the pdf download. is there a way to establish a connection (the way a browser does once you've logged in) and then attempt the download? or simply re-attempt the download without losing your logged in status? thanks
- August 31, 2010
- 7 replies
loading a secured link...

wilbur_wc replied to wilbur_wc's topic in PHP Coding Help

i added that as well as a couple other options...

    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_POST, true );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );

and now curl_exec returns:

    HTTP/1.1 302 Found
    Date: Tue, 31 Aug 2010 05:39:29 GMT
    Server: Apache
    X-Powered-By: PHP/5.2.6
    Location: /archive/2010/08/page/0001
    Cache-Control: max-age=14400
    Expires: Tue, 31 Aug 2010 09:39:29 GMT
    Content-Length: 0
    Content-Type: text/html

does this indicate that it found the pdf? the content length 0 concerns me.

how do i write the pdf file contents to a local variable? the pdf could be up to 500kb... won't i need to wait until it's been loaded with some sort of oncomplete callback? thanks
- August 31, 2010
- 7 replies
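
On the 302 question above: a 302 status is a redirect, not the file itself, and the `Location` header names where the server wants the client to go next, which is why `Content-Length` is 0. A small sketch of how the status code and redirect target could be read out of the header text that `curl_exec` returns when `CURLOPT_HEADER` is on. The helper name `parse_redirect()` is hypothetical, and the sample string is the response quoted in the post.

```php
<?php
// Hypothetical helper: extract the status code and Location header
// from raw HTTP response headers. Returns null for anything not found.
function parse_redirect(string $rawHeaders): array
{
    $status = null;
    $location = null;
    // status line, e.g. "HTTP/1.1 302 Found"
    if (preg_match('#^HTTP/\S+\s+(\d{3})#m', $rawHeaders, $m)) {
        $status = (int) $m[1];
    }
    // header names are case-insensitive, hence the "i" flag
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        $location = $m[1];
    }
    return ['status' => $status, 'location' => $location];
}

$raw = "HTTP/1.1 302 Found\r\n"
     . "Date: Tue, 31 Aug 2010 05:39:29 GMT\r\n"
     . "Server: Apache\r\n"
     . "Location: /archive/2010/08/page/0001\r\n"
     . "Content-Length: 0\r\n"
     . "Content-Type: text/html\r\n\r\n";

$info = parse_redirect($raw);
echo $info['status'], ' -> ', $info['location'], "\n";
```

If `CURLOPT_FOLLOWLOCATION` is also enabled, the returned text can contain more than one header block; this sketch only reads the first.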
loading a secured link...

wilbur_wc replied to wilbur_wc's topic in PHP Coding Help

hmn... i've been playing around with curl stuff, and it looks like my login is working fine... but i'm still at a loss as to how i can then load a pdf file (large file) into a variable and know when the pdf is ready to be read... this is what i'm doing, and evidently the curl_exec returns true...

    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC );
    curl_setopt( $curl, CURLOPT_USERPWD, "user:pass" );
    curl_setopt( $curl, CURLOPT_URL, $this->pdfURL );
    curl_exec( $curl );

how do i write the pdf file to a variable (fopen/fread aren't working)? how do i track the progress of the pdf download/write? thanks
- August 31, 2010
- 7 replies
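
On getting the pdf into a variable: `curl_exec()` is a blocking call, so no completion callback is needed, and with `CURLOPT_RETURNTRANSFER` set it returns the response body as a string (or `false` on failure) rather than `true`. A minimal sketch, assuming the curl extension is loaded. The helper name `fetch_body()` and its parameters are hypothetical, and the login and auth options from the thread are left out to keep it short.

```php
<?php
// Hypothetical helper: download a URL and return the body as a string.
// $cookieFile is read and written so a session from an earlier login
// request can be reused.
function fetch_body(string $url, string $cookieFile): string
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 302 redirects
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_TIMEOUT        => 40,
    ]);

    // Blocks until the transfer has finished or failed.
    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }
    return $body;
}
```

Usage would look like `$pdf = fetch_body($pdfURL, "cookie.txt");`, after which `strlen($pdf)` gives the number of bytes received.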
loading a secured link...

wilbur_wc posted a topic in PHP Coding Help

i'm new to php and don't really know where to start here... i'm automating a system that scrapes a site for a particular pdf download link (got this far), downloads it, parses the pdf, etc... problem is that you must be logged in (while viewing in the browser) in order to access the pdf... if you're not logged in, you are redirected and the download fails... i do have a proper login... how would i go about utilizing my login in order to automate the pdf download? is there a way to send the login with the url request? or open a stream, login, and retry the download? thanks
- August 31, 2010
- 7 replies
looking for a div query regex...

wilbur_wc replied to wilbur_wc's topic in Regex Help

thanks PFMaBiSmAd... that does the job, and then some... great class and i'm already up and running with it. however, i'm still curious if there's a regex solution. thanks
- August 31, 2010
- 3 replies
looking for a div query regex...

wilbur_wc replied to wilbur_wc's topic in Regex Help

and one note... the number of divs/tags within the main div is not constant.
- August 30, 2010
- 3 replies
looking for a div query regex...

wilbur_wc posted a topic in Regex Help

i'm trying to parse an html page and retrieve one div (or better yet one specific item from within the div). here's the div that i'm looking for...

    <div class="content-item ">
      <div class="type">XXX</div>
      <div class="title"><a href="xxxxxxx">THIS IS MY SEARCH FLAG</a></div>
      <i></i>
      <div class="tags"></div>
      <a title="View the PDF version of this article" href="this-is-the-url-i-want-to-pull.pdf" class="pdf-link"><img alt="PDF" src="xxxxx" class="xxxx">PDF</a>
      <a title="xxxx" href="xxx" class="xx"><img alt="xxxx" src="xxxxx" class="xxxxx">XXXXX</a>
    </div>

you can see that my query constant (the only thing i can constantly depend on existing in the same format) is a string represented in the html as 'THIS IS MY SEARCH FLAG' and the item i ultimately want to return is a url represented by 'this-is-the-url-i-want-to-pull.pdf'

i'm new to php and regex is always something of trial and error for me anyhow... any help would be greatly appreciated. thanks
- August 30, 2010
- 3 replies
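
As for whether there is a regex solution: one possible pattern is sketched below. It is not from the thread, the helper name `find_pdf_link()` is hypothetical, and it assumes the markup looks like the sample in the question, in particular that `href` comes before `class="pdf-link"` inside the tag. A lazy `.*?` skips the variable number of tags between the search flag and the link.

```php
<?php
// Hypothetical helper: return the href of the first pdf-link anchor
// that follows the given search flag, or null if none is found.
// Assumes href appears before class="pdf-link" within the <a> tag.
function find_pdf_link(string $html, string $flag): ?string
{
    $pattern = '~' . preg_quote($flag, '~') . '\s*</a>'  // the flag text closing its link
             . '.*?'                                      // any tags in between (lazy)
             . '<a\b[^>]*\bhref="([^"]+)"[^>]*\bclass="pdf-link"~s';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

$html = <<<'HTML'
<div class="content-item ">
<div class="type">XXX</div>
<div class="title"><a href="xxxxxxx">THIS IS MY SEARCH FLAG</a></div>
<i></i>
<div class="tags"></div>
<a title="View the PDF version of this article" href="this-is-the-url-i-want-to-pull.pdf" class="pdf-link"><img alt="PDF" src="xxxxx" class="xxxx">PDF</a>
<a title="xxxx" href="xxx" class="xx"><img alt="xxxx" src="xxxxx" class="xxxxx">XXXXX</a>
</div>
HTML;

$url = find_pdf_link($html, 'THIS IS MY SEARCH FLAG');
echo $url, "\n";
```

One caveat: if the flagged item has no pdf link, the lazy match can run on into a later item on the page, so an html parser is generally the sturdier choice for anything beyond a quick scrape.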
