
Returning files via an API


NotionCommotion


I am building an API which typically accepts and returns JSON.  I am also building the webserver which consumes the API's JSON.  Both the webserver and the API use Slim.  The browser does not have the API's password; it authenticates with the webserver, and the webserver relays the password to the API.  All works as expected.

Browser -----> WebServer   ----cURL---> API Server gets results and returns to WebServer which returns to browser.

I now have a need for the browser to be able to download a non-text file.

 

The browser http request to the webserver is trivial.

 

I haven't totally vetted how the API will respond to the request, but expect it will be pretty close to the following.  It doesn't look 100% the "Slim" way since it overwrites the response, but hopefully it is good enough.

//Will not be accessed unless the header authentication key was verified.

$app->get('/downloads/{id}', function(Request $request, Response $response, $args) {
    $file = $this->get('FileManager')->getFileName($args['id']);
    $response = $response->withHeader('Content-Description', 'File Transfer')
    ->withHeader('Content-Type', 'application/octet-stream')
    ->withHeader('Content-Disposition', 'attachment;filename="'.basename($file).'"')
    ->withHeader('Expires', '0')
    ->withHeader('Cache-Control', 'must-revalidate')
    ->withHeader('Pragma', 'public')
    ->withHeader('Content-Length', filesize($file));
    readfile($file);
    return $response;
});

The part that has got me stumped is how the webserver will need to forward the request to the api and then return the results to the browser client.

 

What is the best option to forward the request?  cURL (I am leaning against this one), file_get_contents() with a stream context so I can send the security-token header, exec() with wget, etc.?  This part actually seems pretty straightforward.
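
For instance, the stream-context option might look something like this (a rough sketch; $apiBase, $apiKey, and the header name are just placeholders for my settings):

//Rough sketch of the stream-context option; $apiBase and $apiKey are placeholders.
$apiBase = 'http://10.0.0.1';
$apiKey  = 'secret';

$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => 'X-Secret-Key: '.$apiKey,
    ],
]);

$body = file_get_contents($apiBase.'/downloads/123', false, $context);
//The security token goes along in the context; capturing the response headers is the part I'm unsure about.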

 

After getting the response, I will somehow need to return the actual content plus the headers to the browser.  How do you see this looking?

 

If I want to keep it Slim-looking on the webserver, I am thinking something like the following, but it doesn't necessarily have to be.

$app->get(_VER_.'/guids/{guid}/logs/{type}/{name}', function (Request $request, Response $response) {
    $rsp=someHowGetContentWithHeaders();
    $headers = $response->getHeaders();
    foreach ($headers as $name => $values) {
        echo $name . ": " . implode(", ", $values);
    }
    return $response->withHeaders($rsp); //Unfortunately, I don't think such a "withHeaders" method exists 
});
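
Maybe copying the headers one at a time is closer to workable; a rough sketch (someHowGetContentWithHeaders() is still just a made-up placeholder that would return the headers and body):

$app->get(_VER_.'/guids/{guid}/logs/{type}/{name}', function (Request $request, Response $response) {
    $rsp = someHowGetContentWithHeaders();   //hypothetical helper returning ['headers'=>[], 'body'=>'']
    //PSR-7 has no withHeaders(), so copy them one at a time.
    foreach ($rsp['headers'] as $name => $value) {
        $response = $response->withHeader($name, $value);
    }
    $response->getBody()->write($rsp['body']);
    return $response;
});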
 

Thank you

 

 

 


Think of it like you're building a second API.

 

The API password is a part of the authentication process. You could build the API such that password authentication still exists, but also attempts a more automatic method - like an IP address check that allows all LAN clients.

 

If you did that then you could have the webserver's API act mostly like a proxy, forwarding requests and responses.

 

If you don't proxy everything then I would not proxy anything. So make the webserver's download API do the cURL to get the response headers and body, forward the relevant headers, and output the response.


Yes, I agree there are two APIs.  I am going to call them the "website" API and the "main" API.

 

The website API will be responsible for authenticating users' passwords, which in turn dictates whether the request is passed to the main API, using the following:

 

$app->add(function(Request $request, Response $response, $next) {
    if(isset($_SESSION['auth'])) {
        return $next($request, $response);
    }
    else {
        //user is not logged on.  Take steps to display logon form, etc...
    }
});
$app->run();

The website will sometimes respond with its own content, other times make one or more requests to the main API and package the content (often using a Twig view), and other times directly proxy the request to the main API (which is very common for many of the AJAX requests).  A couple of typical such website AJAX endpoints might look like the following.

$app->post(_VER_.'/accounts', function (Request $request, Response $response) {
    $rsp=$this->get('base')->proxy($request->getParsedBody());
    return $response->withJson($rsp[0],$rsp[1]);
});
$app->delete(_VER_.'/accounts/{id:[0-9]+}', function (Request $request, Response $response) {
    $rsp=$this->get('base')->proxy([]);
    return $response->withJson($rsp[0],$rsp[1]);
});

The website API then handles it as follows:

    public function proxy(array $raw,array $args=[])
    {
        if($args) {
            $data=[];
            $missing=[];
            foreach($args as $index) {
                if(isset($raw[$index])) {
                    $data[$index]=$raw[$index];
                }
                else {
                    $missing[]=$index;
                }
            }
            if($missing) {
                return \CmsDB\ErrorResponse::missingValue($missing);
            }
        }
        else {
            $data=$raw;
        }
        $url=parse_url($_SERVER['REQUEST_URI']);
        $request=explode('/',$url['path']);
        $request[1]=$this->apiVersion;
        // Don't duplicate GET data in both URL and data
        $method=strtolower($_SERVER['REQUEST_METHOD']);
        $request=(!isset($url['query']) || $method=='get')?implode('/',$request):implode('/',$request).'?'.$url['query'];
        return $this->makeMainApiRequest($method,$request,$data);
    }

    public function makeMainApiRequest($method, $command, $data=[])
    {
        $url=$this->settings['ip'].'/'.ltrim($command, '/');
        $rsp=$this->CallAPI($method,$url,$data,[CURLOPT_HTTPHEADER=>['X-Secret-Key: '.$this->settings['key']]]);
        if($rsp['errno']) {
            $rsp['code']=400;
            //Cast the whole ternary result; casting only the comparison would make the condition always truthy.
            $obj=(object)(($rsp['errno']==6)
                ?['message'=>'Invalid Server IP','code'=>1]
                :['message'=>"cURL Error: $rsp[error] ($rsp[errno])"]);
        }
        elseif(!isset($rsp['rsp']) || !$rsp['rsp']) {
            $obj=null;
        }
        else {
            $obj=json_decode($rsp['rsp']);
            if(json_last_error() != JSON_ERROR_NONE) {
                $rsp['code']=400;
                $obj=(object)['message'=>'Invalid JSON response','code'=>1];
            }
        }
        return [$obj,$rsp['code']];
    }

    private function CallAPI($method, $url, array $data, array $options=[])
    {
        $options=$options+[    //Don't use array_merge since it reorders!
            CURLOPT_RETURNTRANSFER => true,     // return web page
            CURLOPT_HEADER         => false,    // don't return headers
            CURLOPT_FOLLOWLOCATION => true,     // follow redirects
            CURLOPT_ENCODING       => "",       // handle all encodings
            CURLOPT_USERAGENT      => "unknown",// who am i
            CURLOPT_AUTOREFERER    => true,     // set referrer on redirect
            CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
            CURLOPT_TIMEOUT        => 120,      // timeout on response
            CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        ];
        //Optional authentication
        if (isset($options[CURLOPT_USERPWD])) {$options[CURLOPT_HTTPAUTH]=CURLAUTH_BASIC;}
        switch (strtolower($method)) {
            case "get":
                if ($data) {$url = sprintf("%s?%s", $url, http_build_query($data));}
                break;
            case "post":
                $options[CURLOPT_POST]=1;
                // CURLOPT_POST requires CURLOPT_POSTFIELDS to be set!!!  PUT and DELETE don't seem to require.
                $options[CURLOPT_POSTFIELDS]=$data?http_build_query($data):'';
                break;
            case "put":
                //$options[CURLOPT_PUT]=1;
                $options[CURLOPT_CUSTOMREQUEST]="PUT";
                if ($data) {$options[CURLOPT_POSTFIELDS]=http_build_query($data);}
                break;
            case "delete":
                //$options[CURLOPT_DELETE]=1;
                $options[CURLOPT_CUSTOMREQUEST]="DELETE";
                if ($data) {$options[CURLOPT_POSTFIELDS]=http_build_query($data);}
                break;
            default:trigger_error("Invalid HTTP method.", E_USER_ERROR);
        }
        $options[CURLOPT_URL]=$url;
        $ch      = curl_init();
        curl_setopt_array( $ch, $options );
        $rsp=['rsp'=>curl_exec( $ch ),'errno'=>curl_errno($ch),'code'=>curl_getinfo($ch, CURLINFO_HTTP_CODE),'error'=>false];
        if($rsp['errno']) {
            $rsp['error']=curl_error($ch);
        }
        curl_close( $ch );
        return $rsp;
    }

The main API then does its authentication using:

$app->add(function(Request $request, Response $response, $next) {
    $key = $request->getHeaderLine('X-Secret-Key');
    //validate $key and throw exception if missing or invalid.
    return $next($request, $response);
});
$app->run();
I am not sure I understand your recommendation for "a more automatic method - like an IP address check that allows all LAN clients."  Can you please elaborate?  Also, your remark "If you don't proxy everything then I would not proxy anything." seems a little extreme.  Why all or nothing?
 

Back to the subject at hand, the new file download endpoint described in my first post (it can likely be improved but is close) will be added to the main API:

$app->get('/downloads/{id}', function(Request $request, Response $response, $args) {
    $file = $this->get('FileManager')->getFileName($args['id']);
    $response = $response->withHeader('Content-Description', 'File Transfer')
    ->withHeader('Content-Type', 'application/octet-stream')
    ->withHeader('Content-Disposition', 'attachment;filename="'.basename($file).'"')
    ->withHeader('Expires', '0')
    ->withHeader('Cache-Control', 'must-revalidate')
    ->withHeader('Pragma', 'public')
    ->withHeader('Content-Length', filesize($file));
    readfile($file);
    return $response;
});

How would you go about creating the website API endpoint which will download the file?

 

I've recently learned much, and upon looking over what I posted above, I would do some things differently; however, I am still good with most of it.  After coming up with a good approach to download a file, I will likely make changes to proxy(), makeMainApiRequest(), and CallAPI(), and make them more generalized.

 

Thank you

 

 


I am not sure I understand your recommendation for "a more automatic method - like an IP address check that allows all LAN clients."  Can you please elaborate?

You said the API had a password and the browser wouldn't know it, thus the website API has to pass it along. You can remove the need for the API password if you allow authentication by IP address - being on the same network grants authentication just like how providing the right password does.
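
A rough sketch of that on the main API (the LAN prefix is just an example, and you'd still validate the key when one is sent):

$app->add(function(Request $request, Response $response, $next) {
    //Sketch only: accept either the secret key or a request from the LAN.
    $ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
    if ($request->getHeaderLine('X-Secret-Key') !== '' || strpos($ip, '192.168.') === 0) {
        //validate the key here if one was sent
        return $next($request, $response);
    }
    return $response->withStatus(403);
});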

 

Also, your remark "If you don't proxy everything then I would not proxy anything." seems a little extreme.  Why all or nothing?

Consistency. A couple of proxied endpoints would be fine if there was a good reason (like it's complicated work to duplicate), but personally I wouldn't want half the website API working manually and the other half acting as a proxy.

 

How would you go about creating the website API endpoint which will download the file?

However you want? If you're asking for the technical method to pass along everything, you could use cURL to do a request to /downloads/whatever, making sure it captures the response headers (CURLOPT_HEADER). Grab the headers listed in your code from the response (not all of the headers, just the important ones for the download) and send them to the browser. Then output the response body.
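
Roughly (a sketch; $apiUrl, $id, and $apiKey are placeholders for your settings):

//Sketch: fetch the download from the main API and relay it to the browser.
$ch = curl_init($apiUrl.'/downloads/'.$id);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER         => true,    //include response headers in the returned string
    CURLOPT_HTTPHEADER     => ['X-Secret-Key: '.$apiKey],
]);
$raw = curl_exec($ch);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);

$rawHeaders = substr($raw, 0, $headerSize);
$body       = substr($raw, $headerSize);

//Forward only the headers that matter for the download.
foreach (explode("\r\n", $rawHeaders) as $line) {
    if (stripos($line, 'Content-Type:') === 0
        || stripos($line, 'Content-Disposition:') === 0
        || stripos($line, 'Content-Length:') === 0) {
        header($line);
    }
}
echo $body;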

 

The download is basically just a GET request, right? Then easier would be to have the web server forward/proxy the request to the API server (i.e., through Apache/nginx configuration), but if the API requires the password then that's not as easy... Thus the IP address authentication idea.


Each website has a unique key which constitutes a username and password.  That being said, I understand your reply.

 

I agree with your proxy recommendations.

 

For downloading files, I've used mod_xsendfile before and will consider doing so here.  Just don't want to make things too magical.

 

If not implemented directly via the webserver, it seems like you recommend cURL over file_get_contents() or exec() with wget, true?  Good point about only responding with the important headers.  Regardless of the means PHP uses to obtain the file content (which happens to be a tshark dump file), upon receiving it, do I just echo the applicable headers and the file content, whatever it may be?


For downloading files, I've used mod_xsendfile before and will consider doing so here.  Just don't want to make things too magical.

Remember it's for actual files, so you could use that for the API server but not the web server.
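
On the API side that could be as small as something like this (a sketch, assuming mod_xsendfile is installed and configured to allow that directory):

//Sketch: let Apache send the file itself via mod_xsendfile (API server only).
$app->get('/downloads/{id}', function(Request $request, Response $response, $args) {
    $file = $this->get('FileManager')->getFileName($args['id']);
    return $response
        ->withHeader('X-Sendfile', $file)
        ->withHeader('Content-Type', 'application/octet-stream')
        ->withHeader('Content-Disposition', 'attachment; filename="'.basename($file).'"');
});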

 

If not implemented directly via the webserver, it seems like you recommend cURL over file_get_contents() or exec() with wget, true?

You need the headers - the Content-Type and Content-Disposition especially. file_get_contents() and stream contexts won't get you headers, and command-line wget won't make it easy to get them either.

 

I've looked a bit more into it and now I think you can use streams more easily. Mind you, cURL is more flexible so you should still consider it, but with streams you can get the headers in a partially-parsed format.

Look at stream_get_meta_data to get the response headers from a stream. Process them for the headers you care about.
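
Something along these lines (a sketch; $apiUrl, $id, and $key are placeholders):

//Sketch: open the download through the http wrapper and inspect its headers.
$context = stream_context_create(['http' => ['header' => 'X-Secret-Key: '.$key]]);
$fh = fopen($apiUrl.'/downloads/'.$id, 'rb', false, $context);

$meta = stream_get_meta_data($fh);
//For the http wrapper, wrapper_data holds the raw response header lines.
foreach ($meta['wrapper_data'] as $line) {
    //pick out Content-Type, Content-Disposition, Content-Length, etc.
}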

 

Good point about only responding with the important headers.  Regardless of the means PHP uses to obtain the file content (which happens to be a tshark dump file), upon receiving it, do I just echo the applicable headers and the file content, whatever it may be?

Not echo the headers, but yes. The point is that there may be others like Date that would be fine to accidentally pass through, but then there are others like Connection or Transfer-Encoding that shouldn't.

I wrote this code last week in one of my scraping scripts that needed updating (damn site updates breaking things :facewall: ).

    private function initCurl(){
        $this->ch = curl_init();
        $this->createCookieFile();

        $headerCallback = function(
            /** @noinspection PhpUnusedParameterInspection */
            $ch
            , $headerData
        ){
            foreach (explode("\r\n", $headerData) as $line){
                if ($line == ''){
                    continue;
                }

                if (substr($line, 0, 4) == 'HTTP'){
                    $header = 'HTTP';
                    $value = substr($line, strpos($line, ' ') + 1);
                } else {
                    list($header, $value) = explode(':', $line, 2);
                    $header = strtoupper($header);
                    $value = trim($value);
                }

                if ($header === 'SET-COOKIE'){
                    $this->lastResponseHeaders[$header][] = $value;
                } else {
                    $this->lastResponseHeaders[$header] = $value;
                }
            }

            return strlen($headerData);
        };

        $bodyCallback = function(
            /** @noinspection PhpUnusedParameterInspection */
            $ch
            , $bodyData
        ){
            $this->lastResponse .= $bodyData;

            return strlen($bodyData);
        };

        curl_setopt_array($this->ch, [
            CURLOPT_FOLLOWLOCATION => true
            , CURLOPT_SSL_VERIFYPEER => true
            , CURLOPT_SSL_VERIFYHOST => 2
            , CURLOPT_COOKIEFILE => $this->cookieJar
            , CURLOPT_COOKIEJAR => $this->cookieJar
            , CURLOPT_HEADERFUNCTION => $headerCallback
            , CURLOPT_WRITEFUNCTION => $bodyCallback
        ]);
    }
It could be easily adapted to act as a proxy between your main API and the browser for downloading files.

 

Have the header callback watch for the headers you need to forward (Content-Type, Content-Length, etc.) and then pass them along with header(). Have the body callback just echo out the received data as it comes in.
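
Adapted, it might look roughly like this ($apiUrl and $apiKey are placeholders):

//Sketch: proxy the download, forwarding selected headers and echoing the body as it arrives.
$forward = ['CONTENT-TYPE', 'CONTENT-LENGTH', 'CONTENT-DISPOSITION'];

$ch = curl_init($apiUrl);
curl_setopt_array($ch, [
    CURLOPT_HTTPHEADER     => ['X-Secret-Key: '.$apiKey],
    CURLOPT_HEADERFUNCTION => function($ch, $headerData) use ($forward){
        $parts = explode(':', $headerData, 2);
        if (count($parts) == 2 && in_array(strtoupper(trim($parts[0])), $forward)){
            header(trim($headerData));
        }
        return strlen($headerData);
    },
    CURLOPT_WRITEFUNCTION  => function($ch, $bodyData){
        echo $bodyData;          //stream the body straight to the browser
        return strlen($bodyData);
    },
]);
curl_exec($ch);
curl_close($ch);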

 


If you control both APIs, another thing you could do (if security requirements permit it) is just let the file be accessed directly. One API I've been working on needs to return image files to be displayed. Rather than have some endpoint serve up the image data, it instead just returns a URL to the image that can be stuck in an image tag or whatever. When images are added they are given long random names so the URLs can't just be easily guessed, and they aren't something that really needs to be secured anyway.

 

So in this case your main API would just serve up a URL to the file, and you would either put that URL into the webpage link initially or have your webserver just redirect the browser to that URL to access the file.
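
The redirect version could be as simple as (a sketch; getDownloadUrl() is a made-up helper that would ask the main API for the file's URL):

//Sketch: website endpoint that just redirects the browser to the file's direct URL.
$app->get('/downloads/{id}', function(Request $request, Response $response, $args) {
    $url = $this->get('base')->getDownloadUrl($args['id']);   //hypothetical helper
    return $response->withRedirect($url);   //Slim shortcut for a Location header + 302
});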

Edited by kicken

Thanks requinix and kicken.  I ended up doing as shown below.  If you have any recommended changes, I would appreciate hearing them.

 

Web Server:

$app->get(_VER_.'/{basetype:guids|tester}/{guid}/logs/network/{name}', function(Request $request, Response $response, $args) {
    $config=$this->get('settings')['config'];
    $context = stream_context_create(['http'=>['header'=>'X-Secret-Key: '.$config['key']]]);
    $fh = fopen($config['ip'].$_SERVER['REQUEST_URI'], 'rb', false, $context);  //r or rb?
    $stream = new \Slim\Http\Stream($fh);
    $headers = $stream->getMetadata()['wrapper_data'];
    $forwardHeader=[
        'Content-Description'=>'File Transfer',
        'Content-Type'=>'application/octet-stream',
        'Content-Transfer-Encoding'=>'binary',
        'Content-Disposition'=>'attachment; filename="' . $args['name'] . '"',
        'Expires'=>0,
        'Cache-Control'=>'must-revalidate, post-check=0, pre-check=0',
        'Pragma'=>'public',
        'Content-Length'=>false,   //Possible to get stream length?
    ];
    foreach ($headers as $header) {
        $header=explode(':',$header,2);   //Limit to 2 so header values containing ":" aren't split apart
        if($header && count($header)==2 && isset($forwardHeader[$header[0]])){
            //Should I really be modifying the $response?
            $response=$response->withHeader($header[0],trim($header[1]));
            unset($forwardHeader[$header[0]]);
        }
    }
    //Is this necessary?
    foreach ($forwardHeader as $key=>$value) {
        if($value!==false){
            $response=$response->withHeader($key,$value);
        }
    }
    return $response->withBody($stream);
});

API Server

$app->get(_VER_.'/guids/{guid}/logs/network/{name}', function(Request $request, Response $response, $args) {
    $path=$this->get('Guids')->getLogFilePath('network',$args['guid'],$args['name']);   //Will throw an exception if path doesn't exist
    $fh = fopen($path, 'rb'); //r for readonly, and b for binary?
    $stream = new \Slim\Http\Stream($fh);
    return $response->withBody($stream)
    //->setOutputBuffering(false)
    ->withHeader('Content-Description', 'File Transfer')
    ->withHeader('Content-Transfer-Encoding', 'binary')
    ->withHeader('Content-Type', 'application/octet-stream')
    //->withHeader('Content-Type', 'application/force-download')    //Don't use?
    //->withHeader('Content-Type', 'application/download')          //Don't use?
    ->withHeader('Content-Disposition', 'attachment; filename="' . $args['name'] . '"')
    ->withHeader('Expires', '0')
    ->withHeader('Cache-Control', 'must-revalidate, post-check=0, pre-check=0')
    ->withHeader('Pragma', 'public')
    ->withHeader('Content-Length', filesize($path));
});

 

