Remove bytes off of a string


NotionCommotion

I have the following script. When I echo $data, it includes some strange characters at the beginning of the message.

<?php
require 'vendor/autoload.php';
$loop = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$socket->on('connection', function (\React\Socket\ConnectionInterface $client) use ($loop){
    $client->on('data', function($data){
        echo($data."\r\n");
    });
});
$socket->listen(1337,'0.0.0.0');
$loop->run();

It turns out that $data has 4 bytes prepended to it that represent the length of the message. On the receiving connection, how can I read those 4 bytes and also remove them from $data?

 

Link to comment
Share on other sites

I'm still not sure if you understand the concept of streaming.

The data you receive is not the message. It's a portion of a byte stream, which means it could contain anything: a message fragment, a single complete message, or multiple messages. You never know.

That's why the “strange length bytes” exist: they tell you where the messages end. You must process this information and write assembly logic for the messages.
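
To make the point about fragments concrete, here is a minimal sketch of such assembly logic. The function name `extractFrames` is hypothetical, and the code assumes a 4-byte big-endian length prefix (`N`); if the sender writes little-endian, `V` would be needed instead.

```php
<?php
// Hypothetical helper: pulls every complete length-prefixed message out of
// $buffer and leaves any trailing partial message in place.
// Assumes a 4-byte big-endian length prefix ('N'); use 'V' for little-endian.
function extractFrames(string &$buffer): array
{
    $messages = [];
    while (strlen($buffer) >= 4) {
        $length = unpack('Nlen', substr($buffer, 0, 4))['len'];
        if (strlen($buffer) < 4 + $length) {
            break; // the rest of this message has not arrived yet
        }
        $messages[] = substr($buffer, 4, $length);
        // The cast guards against older PHP versions returning false here.
        $buffer = (string) substr($buffer, 4 + $length);
    }
    return $messages;
}

// Two messages arriving in three arbitrary chunks.
$stream = pack('N', 5) . 'hello' . pack('N', 3) . 'abc';
$chunks = [substr($stream, 0, 6), substr($stream, 6, 5), substr($stream, 11)];

$buffer = '';
$received = [];
foreach ($chunks as $chunk) {
    $buffer .= $chunk;
    foreach (extractFrames($buffer) as $message) {
        $received[] = $message;
    }
}
print_r($received);
```

Note that none of the three chunks lines up with a message boundary, yet both messages come out whole because the buffer carries the leftover bytes from one chunk to the next.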

Link to comment
Share on other sites

Thanks ginerjm,

 

Would it be substr with 2 or 4? 2 bytes per character, right? Funny, but I seem to get the same results using either 2 or 4.

 

Don't think I need http://php.net/manual/en/function.mb-substr.php, do you?

 

If I do the following and echo $firstFourBytes, it displays some unprintable text.

$firstFourBytes=substr($data,0,4);
$remainingContent=substr($data,4);

The four bytes are added with the following C++ code.

uint32_t size = message.size();
unsigned char mSize[sizeof(size)];
memcpy(&mSize, static_cast<void*>(&size), sizeof(size));
data.insert(data.begin(), mSize, mSize + sizeof(size));

How could I get the numerical value of those four bytes using PHP?

Link to comment
Share on other sites

I'm still not sure if you understand the concept of streaming.

The data you receive is not the message. It's a portion of a byte stream, which means it could contain anything: a message fragment, a single complete message, or multiple messages. You never know.

That's why the “strange length bytes” exist: they tell you where the messages end. You must process this information and write assembly logic for the messages.

 

You're right, I still don't completely understand the concept of streaming, but I understand far more than I did a while ago.

 

What I have been doing is using the JSONStream class kicken posted under https://forums.phpfreaks.com/topic/302840-http-server-with-two-hosts-and-same-port/?p=1540924.  It is my understanding that I can either use the length to extract the message, as you just indicated, or do it the kicken way, which looks for an end of line.  If I have the length (which I do), is that the better way?  How do I actually get these four bytes as a number?  Can you point me in the right direction to implement this?

 

Thanks

Link to comment
Share on other sites

The two hosts first need to agree on a byte order for the length. How integers are stored is machine-dependent, so there must be a common format on the wire (a good candidate is network byte order, i.e. big endian). C has the htonl() function for the conversion.

 

Then the PHP script can unpack the bytes:

<?php

// test: 16 in big endian
$lengthField = "\x00\x00\x00\x10";
$length = unpack('Nlen', $lengthField)['len'];

var_dump($length);
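
The C++ snippet earlier in the thread copies the integer with memcpy, so the bytes go out in the sender's host order. As a rough illustration, assuming the sender is a little-endian machine (typical for x86, but an assumption here), the same value would need the `V` format code rather than `N`:

```php
<?php
// The same value, 16, laid out in both byte orders.
$bigEndian    = pack('N', 16); // "\x00\x00\x00\x10", network byte order
$littleEndian = pack('V', 16); // "\x10\x00\x00\x00", e.g. memcpy on x86

echo bin2hex($bigEndian), PHP_EOL;
echo bin2hex($littleEndian), PHP_EOL;

// Decoding with the matching format code recovers 16 either way.
$fromBig    = unpack('Nlen', $bigEndian)['len'];
$fromLittle = unpack('Vlen', $littleEndian)['len'];

// Decoding with the wrong code silently yields a different number.
$mismatch = unpack('Nlen', $littleEndian)['len'];

var_dump($fromBig, $fromLittle, $mismatch);
```

A mismatch raises no error, it just produces a wrong length, which is why both sides should settle on one order explicitly.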

Whether you use length prefixes or delimiters is a design choice. Apparently you (or whoever wrote the C++ code) decided against delimiters.

Link to comment
Share on other sites

Whether you use length prefixes or delimiters is a design choice. Apparently you (or whoever wrote the C++ code) decided against delimiters.

 

I had just naively assumed delimiters, as the C++ author and I never discussed it.  If necessary, the C++ code can be changed to use delimiters.

 

Do you feel one design choice is better than the other, or is the better approach based on the situation?  If the situation, what aspects influence the decision?

 

Thanks

Link to comment
Share on other sites

Well, then delimiters it is!  I've gone through too much just to settle for "slightly simpler".

As far as implementation goes, I am thinking of something like the following.  Before I go down the path of inventing something new: has this requirement been implemented many times before, resulting in a better solution?

<?php
require 'vendor/autoload.php';
$loop = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$socket->on('connection', function (\React\Socket\ConnectionInterface $socket) use ($loop){
    $superSocket  = new DealWithSocket($socket);
    $superSocket->on('data', function($message) use ($superSocket){
        echo($message.PHP_EOL);
        $superSocket->send('thank you!');
    });
});
$socket->listen(1337,'0.0.0.0');
$loop->run();

<?php
class DealWithSocket implements Evenement\EventEmitterInterface{

    // Should Evenement be used?
    use Evenement\EventEmitterTrait;

    private $socket,
    $buffer='',
    $messageLength,
    $messageLengthPointer=0;

    public function __construct(React\Stream\DuplexStreamInterface $socket){
        $this->socket = $socket;

        $this->socket->on('data', function($data){
            $this->buffer .= $data;
            $this->parseBuffer();
        });
    }

    public function send($string){
        $this->socket->write(strlen($string).$string);  // I don't think this is right
    }

    private function getLength($string, $start=0){
        //My understanding is that "N" represents an unsigned long (always 32 bit, big endian byte order) and "len" is just what ever you want the array index name to be
        return unpack('Nlen', substr($string,$start,4))['len'];
    }

    private function parseBuffer(){
        // And this needs help...
        if(is_null($this->messageLength)) {
            //Save the first time data is received?
            $this->messageLength=$this->getLength($this->buffer);
        }
        while (strlen($this->buffer)>=($this->messageLength+4)){
            $message = substr($this->buffer, 4, $this->messageLength);
            $this->buffer = substr($this->buffer, $this->messageLength+4);
            $this->emit('data', $message);
        }
    }
}
Link to comment
Share on other sites

A little better with my DealWithSocket  class...

<?php
class DealWithSocket implements Evenement\EventEmitterInterface{

    // Should Evenement be used?
    use Evenement\EventEmitterTrait;

    private $socket,
    $buffer='',
    $messageLength;

    public function __construct(React\Stream\DuplexStreamInterface $socket){
        $this->socket = $socket;

        $this->socket->on('data', function($data){
            $this->buffer .= $data;
            $this->parseBuffer();
        });
    }

    public function send($string){
        $this->socket->write(strlen($string).$string);  // I don't think this is right
    }

    private function getLength(){
        //My understanding is that "N" represents an unsigned long (always 32 bit, big endian byte order) and "len" is just what ever you want the array index name to be
        return strlen($this->buffer)>=4?unpack('Nlen', substr($this->buffer,0,4))['len']:0;
    }

    private function parseBuffer(){
        // Is using string functions like strlen() appropriate?
        if(!$this->messageLength && strlen($this->buffer)>=4) {
            //Save the first time data is received or it happened to end perfectly at the end?
            $this->messageLength=$this->getLength();
        }
        while ($this->messageLength && strlen($this->buffer)>=($this->messageLength+4)){
            $message = substr($this->buffer, 4, $this->messageLength);
            $this->emit('data', $message);
            $this->buffer = substr($this->buffer, $this->messageLength+4);
            $this->messageLength=$this->getLength();
        }
    }
}
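
On the doubt flagged in send(): a minimal sketch of the sending side, assuming the peer expects the same 4-byte big-endian prefix that the parser reads. The helper name `frameMessage` is hypothetical, and the wire format is an assumption that both ends would have to agree on.

```php
<?php
// Hypothetical helper: prefix a payload with its byte length as a
// 4-byte big-endian integer (assumed wire format; 'V' for little-endian).
function frameMessage(string $payload): string
{
    return pack('N', strlen($payload)) . $payload;
}

$framed = frameMessage('thank you!');
echo bin2hex(substr($framed, 0, 4)), PHP_EOL; // the 4 length bytes, hex-encoded
echo substr($framed, 4), PHP_EOL;             // the payload itself

// For contrast: a decimal prefix is ambiguous when the payload starts with a digit.
$ambiguous = strlen('1 apple') . '1 apple'; // "71 apple": is the length 7 or 71?
```

strlen() counts bytes rather than characters, which is what a length prefix on a byte stream needs, so the plain string functions are appropriate here.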
Link to comment
Share on other sites

So are you going to switch to delimiters or keep the length prefix? Your post suggests delimiters but your code is still using lengths.

 

If you want to use delimiters, refer back to the JSONStream class and see how it handles parsing out individual items. Use strpos to search for the delimiter and then extract the message.
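
A rough sketch of the strpos approach, for comparison. This is not the JSONStream class itself; the function name `extractLines` and the newline delimiter are assumptions made for illustration.

```php
<?php
// Hypothetical helper: splits complete delimiter-terminated messages off the
// front of $buffer, leaving any unterminated remainder for the next chunk.
function extractLines(string &$buffer, string $delimiter = "\n"): array
{
    $messages = [];
    while (($pos = strpos($buffer, $delimiter)) !== false) {
        $messages[] = substr($buffer, 0, $pos);
        // The cast guards against older PHP versions returning false here.
        $buffer = (string) substr($buffer, $pos + strlen($delimiter));
    }
    return $messages;
}

// Three JSON documents spread over two chunks; the third is still incomplete.
$buffer = '';
$received = [];
foreach (["{\"a\":1}\n{\"b\"", ":2}\n{\"c\""] as $chunk) {
    $buffer .= $chunk;
    foreach (extractLines($buffer) as $line) {
        $received[] = json_decode($line, true);
    }
}
var_dump($received, $buffer);
```

This only works if the delimiter can never occur inside a message. Compact JSON satisfies that for a newline delimiter, since raw newlines are not allowed inside JSON strings.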

 

If you want to continue with length processing then you could try looking at the Gearman code I wrote as an example. It doesn't use react but it will demonstrate parsing out a packet. Areas of interest may be Connection::readPacket and Packet::fromString

 

    public function readPacket(){
        if (!$this->stream){
            $this->connect();
        }
        $header = $this->read(12);
        $size = substr($header, 8, 4);
        $size = Packet::fromBigEndian($size);
        $arguments = $size > 0?$this->read($size):'';
        return Packet::fromString($header . $arguments);
    }
The Gearman protocol specifies that each packet begins with a 4-byte magic code, a 4-byte packet type, and a 4-byte packet size (in network byte order). So the first 12 bytes (4*3) are read, then the last 4-byte group is extracted and turned into an integer to determine how much additional data to read.

 

    public static function fromString($data){
        $magic = substr($data, 0, 4);
        $type = substr($data, 4, 4);
        $type = static::fromBigEndian($type);
        $size = substr($data, 8, 4);
        $size = static::fromBigEndian($size);
        $arguments = substr($data, 12, $size);
        $validSize = strlen($arguments) === $size;
        if (!$validSize){
            throw new UnexpectedPacketException;
        }
        $arguments = explode(chr(0), $arguments);
        $packet = new static($magic, $type, $arguments);
        return $packet;
    }
Once the packet is read in whole it's sent here and broken down into individual fields for easy consumption.
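
Packet::fromBigEndian is not shown in the thread, so the following is only a guess at what such a helper might look like, built on unpack('N'). The sample packet bytes are made up for illustration.

```php
<?php
// Hypothetical stand-in for the Packet::fromBigEndian helper referenced above;
// the real implementation is not shown in the thread.
function fromBigEndian(string $bytes): int
{
    return unpack('Nvalue', $bytes)['value'];
}

// A made-up 12-byte header plus 7 bytes of arguments, for illustration only.
$packet = "\0REQ" . pack('N', 1) . pack('N', 7) . "foo\0bar";

$magic = substr($packet, 0, 4);
$type  = fromBigEndian(substr($packet, 4, 4));
$size  = fromBigEndian(substr($packet, 8, 4));
$arguments = explode("\0", substr($packet, 12, $size));

// The three fixed-width fields can also be unpacked in one call.
$header = unpack('a4magic/Ntype/Nsize', substr($packet, 0, 12));

var_dump($type, $size, $arguments, $header);
```

The single unpack call with named fields does the same job as the three substr calls, which may read more clearly when a header has several fixed-width fields.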