Data Transformation


roopurt18

I spent a bit of time trying to come up with a flexible method of dealing with data transformation in my applications and thought others might be interested.  Essentially we're always dealing with data transformation.  We have to convert incoming data so that it's safe to use in the database.  We have to convert data before displaying it to the user to remove markup or malicious scripts.  Or we might have to apply a bad language filter, a language translator, etc.

 

None of this is hard to do in and of itself.  Where it gets tricky is when you have a lot of existing code that depends on a particular implementation and then a client or your boss comes along and says, "Hey, can you convert all data to upper case before inserting it into the database?  But I don't want to do this for everybody, I want it to be an option they can turn on in their preferences."

 

Go ahead and think about how hard that might be to accomplish in your current projects.
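A minimal sketch of the scenario above, written in Python for brevity rather than the PHP this thread is about. It is not the design from the blog post; it only assumes that transformations are kept as a list of functions and that the list is built from a user's preferences. The preference key `uppercase_before_insert` and all function names are hypothetical.

```python
from typing import Callable, Dict, List

# A transformation is just a function from one string to another.
Transform = Callable[[str], str]


def trim(value: str) -> str:
    """Always-on transformation: drop surrounding whitespace."""
    return value.strip()


def to_upper(value: str) -> str:
    """Optional transformation: convert to upper case."""
    return value.upper()


# Hypothetical mapping from a preference key to the transform it enables.
OPTIONAL_TRANSFORMS: Dict[str, Transform] = {
    "uppercase_before_insert": to_upper,
}


def build_pipeline(preferences: Dict[str, bool]) -> List[Transform]:
    """Assemble the list of transforms for one user's preferences."""
    pipeline: List[Transform] = [trim]  # applied for everybody
    for key, transform in OPTIONAL_TRANSFORMS.items():
        if preferences.get(key, False):
            pipeline.append(transform)
    return pipeline


def apply_pipeline(value: str, pipeline: List[Transform]) -> str:
    """Run the value through each transform in order."""
    for transform in pipeline:
        value = transform(value)
    return value
```

With this shape, the calling code never changes when a new optional transformation appears; only the mapping grows.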

 

So without further ado I link you to my blog, where I've already typed this whole thing out.

http://rbredlau.com/drupal/node/11

 

If you don't want to do a lot of reading, here is a (probably incorrect) UML diagram that might explain things:

DataTransformer.png


Well, I'll go ahead and reply just so you have one :)

 

My initial thoughts were that you definitely thought this through and have a nice little design, but I couldn't see the need for it in any of my own projects so I just kind of passed it by. I was left thinking that aside from cleansing data before insertion into the DB, I want to store the raw data, not some transformation thereof. If I want it to be presented as all caps I'd do a magical <span style="text-transform: uppercase;">$data</span> but would rather not modify the data itself. Then I got to thinking, roopurt is a smart guy (probably smarterer than me) and seems pretty stoked about this, so maybe I suxors and I'm missing something... I'd better wait and see if anyone else replies before I open my mouth.

 

Glad I could share the wondrous thoughts of Derek. I hope you enjoyed.


Ok.  That's a start.

 

Basically the way I see this is as a replacement for that list of utility functions we almost all currently have that accomplish the same thing.

 

I agree with you on storing the user's data intact into the database.  But you still have to pass it through mysql_real_escape_string() and enclose it in single quotes, all of which I consider a transformation.  Likewise if you received it from $_GET or $_POST you need to undo the effects of Magic Quotes.

 

As for something like wrapping a piece of data within certain tags, such as a span for changing to uppercase, that too is a transformation.

 

I sat down to do some actual coding with this idea over the weekend, and the problem I struggled with is that it's too open-ended.  The original concept is to take data in one representation and transform it to another.  This could be something very simple (wrap a string in single quotes) or something very complicated (serialize any object to XML).
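To make the two ends of that range concrete, here is a rough Python sketch. The `DataTransformer` name comes from this thread, but the `transform()` method, the subclasses, and the `Chain` helper are assumptions for illustration, not the interface from the blog post or the UML diagram. The XML example only handles a flat dictionary, which is far short of "any object".

```python
from abc import ABC, abstractmethod
from typing import Any, Dict
import xml.etree.ElementTree as ET


class DataTransformer(ABC):
    """Assumed interface: one method that maps input data to output data."""

    @abstractmethod
    def transform(self, data: Any) -> Any:
        ...


class SingleQuoteWrapper(DataTransformer):
    """The very simple case: wrap a value in single quotes."""

    def transform(self, data: Any) -> str:
        return "'" + str(data) + "'"


class DictToXml(DataTransformer):
    """A more involved case: serialize a flat dict to an XML string."""

    def __init__(self, root_tag: str = "item") -> None:
        self.root_tag = root_tag

    def transform(self, data: Dict[str, Any]) -> str:
        root = ET.Element(self.root_tag)
        for key, value in data.items():
            child = ET.SubElement(root, str(key))
            child.text = str(value)
        return ET.tostring(root, encoding="unicode")


class Chain(DataTransformer):
    """Compose transformers: the output of one feeds the next."""

    def __init__(self, *steps: DataTransformer) -> None:
        self.steps = steps

    def transform(self, data: Any) -> Any:
        for step in self.steps:
            data = step.transform(data)
        return data
```

Because `Chain` is itself a `DataTransformer`, simple and complicated transformations can be mixed freely, which is also where the open-endedness comes from.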

 

And on a much larger scale data transformation is what every program is all about.  We take one set of data (user input) and convert it to program actions.  Thinking about it like that, an entire program can be composed of objects that inherit from the DataTransformer class.  That's taking it a bit too far IMO though.


I don't have time to read the whole thing right now, but thanks for that thorpe.  I did glance at it and noticed this in their consequences section:

 

Information Sharing is Inefficient

Sharing information between filters can be inefficient, since by definition each filter is loosely coupled. If large amounts of information must be shared between filters, then this approach may prove to be costly.

 

That seems to be a big PITA that I deal with constantly in the web world and probably my main struggle with the MVC pattern.


I can get on board with input cleansing being a transformation, however this just seems far too complicated for the majority of the problems I face... maybe I'm just not there yet, I dunno.

 

My approach for utility functions has been sort of the 80/20 rule. I look at where 80% of my time is spent... which is often in 20% of tasks. Since I can see the biggest improvements in these areas, that is where I focus my efforts.

 

For example, at work I had to write an application that had 60+ input fields per page and I think 8 pages. The processing of this code was insane. After seeing the headache that was created and the intense amount of overhead I was creating I wrote a little utility that generates input validation rules from an XML file. Not only did it simplify the code, but it further separated business logic from my application. I could now pound through forms w/o worrying about the validation too much up front, b/c I knew I could just drop in the rules later.
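A rough sketch of what such a utility might look like, in Python. The original utility is not shown in this thread, so the XML layout (`<form>`, `<field>`, and the `required` and `pattern` attributes) and both function names are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Hypothetical rules file content; a real one would live on disk.
RULES_XML = r"""
<form name="signup">
  <field name="email" required="true" pattern="[^@\s]+@[^@\s]+" />
  <field name="age" required="false" pattern="[0-9]{1,3}" />
</form>
"""


def load_rules(xml_text: str) -> List[Dict[str, Any]]:
    """Turn each <field> element into a plain rule dictionary."""
    rules = []
    for field in ET.fromstring(xml_text).findall("field"):
        rules.append({
            "name": field.get("name"),
            "required": field.get("required", "false") == "true",
            "pattern": field.get("pattern"),
        })
    return rules


def validate(form: Dict[str, str], rules: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return a dict of field name -> error message (empty if all valid)."""
    errors: Dict[str, str] = {}
    for rule in rules:
        value = form.get(rule["name"], "").strip()
        if not value:
            if rule["required"]:
                errors[rule["name"]] = "required"
            continue  # optional and empty: nothing more to check
        if rule["pattern"] and not re.fullmatch(rule["pattern"], value):
            errors[rule["name"]] = "invalid format"
    return errors
```

The form-handling code stays the same for every page; adding a field or tightening a rule is an edit to the XML only.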

 

My guess is you're experiencing a similar thing with these data transformations. Code aside... perhaps it would help if you could illustrate the need/time savings/benefit that your approach provides. Right now it's cool, just too much work for a relatively simple task, IMO.


The thing is, in general you're going to be applying these filters to either request or response data. I would normally have this (the request and response) in a registry, somewhere I can easily get my hands on it.

 

There's a good section in the Symfony manual that covers how their filters work; it may also be a good read.
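As a rough illustration of that idea, here is a Python sketch of a registry holding request data, with a list of filters run over it. The `Registry` class, the `"request"` key, and the filter names are all hypothetical, and this is not how Symfony implements its filters.

```python
import html
from typing import Any, Callable, Dict, List


class Registry:
    """A very small global registry: named slots for shared objects."""

    _items: Dict[str, Any] = {}

    @classmethod
    def set(cls, key: str, value: Any) -> None:
        cls._items[key] = value

    @classmethod
    def get(cls, key: str) -> Any:
        return cls._items[key]


# A filter takes the request (or response) parameters and returns new ones.
Filter = Callable[[Dict[str, str]], Dict[str, str]]


def strip_whitespace(params: Dict[str, str]) -> Dict[str, str]:
    return {key: value.strip() for key, value in params.items()}


def escape_html(params: Dict[str, str]) -> Dict[str, str]:
    return {key: html.escape(value) for key, value in params.items()}


def run_filters(key: str, filters: List[Filter]) -> Dict[str, str]:
    """Pull data from the registry, filter it in order, and store it back."""
    data = Registry.get(key)
    for apply_filter in filters:
        data = apply_filter(data)
    Registry.set(key, data)
    return data
```

Usage would be along the lines of `Registry.set("request", {...})` early in the request, followed by `run_filters("request", [strip_whitespace, escape_html])` before any other code reads the data.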
