
XML Feed Query


Omzy


I've just been set a university assignment, which goes something like this:

 

A company provides a product feed in XML which contains thousands of items. The feed is generated from a back end database which stores details of all their products. I need to generate a web interface which will process this feed and return a new feed based on the parameters the user specifies in the form. So for example the user might only be interested in 'keyboards'.

 

I don't need to write any code for this, just need to provide a solution to the problem.

 

Now I would say that it's easier to just create an interface which connects directly to the back end database, queries it and generates a custom feed from there, rather than processing the main feed and generating a new feed from it. But I don't know if this is the answer they are after.

If I were to take the raw "unfiltered" feed, how would I go about querying it so that it can be filtered and a new feed generated from it? Can it be done without using a database?


First of all, have a look at SimpleXML, for example (http://in2.php.net/SimpleXML), to get an idea of what you can extract from an XML feed using PHP.

 

If you can't use a database, which already has search algorithms implemented to perform queries, you need to come up with your own, so to speak. One way would be to simply iterate over all the items in the feed and compare each one against the list of items you want.

 

So each time you encounter an item from the category 'keyboards', you add that item to your custom feed. This takes O(n) time, since it is a linear search in which you must examine all n items to decide whether you want them or not.

<inventory>
    <item category="keyboard">
        <name>Logitech Extreme</name>
    </item>
    <item category="food">
        <name>Apple</name>
    </item>
    <item category="toys">
        <name>Killer Panda</name>
    </item>
</inventory>
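
As a rough illustration of that linear scan, here is a minimal sketch in Python (standing in for PHP's SimpleXML) using the standard library's ElementTree. The function name `filter_feed` is made up for this example, and the feed layout is assumed to match the sample above.

```python
import xml.etree.ElementTree as ET

# Sample feed in the flat layout shown above (assumed structure).
FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def filter_feed(feed_xml, category):
    """Return a new feed (as a string) holding only items of one category."""
    source = ET.fromstring(feed_xml)
    result = ET.Element(source.tag)
    # Linear scan: every item is examined exactly once, hence O(n).
    for item in source.iter("item"):
        if item.get("category") == category:
            result.append(item)
    return ET.tostring(result, encoding="unicode")


filtered = filter_feed(FEED, "keyboard")
print(filtered)
```

The whole feed is held in memory here, which should be fine for thousands of items; a much larger feed would probably call for a streaming parser instead.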

 

Alternatively, you could hope that the company has its feed set up so that all the keyboard elements (items) are children of a category element, in which case you only need to examine the categories and decide whether you want all (or just some) of the items in each category.

 

<category name="keyboards">
    <item>
        <name>Logitech Extreme</name>
    </item>
    <item>
        <name>Logitech Extreme 2</name>
    </item>
</category>
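
A sketch of how the grouped layout could be filtered, again in Python with ElementTree. It assumes the category label lives in a `name` attribute and that the categories sit inside an `<inventory>` root; both are guesses, since the real feed isn't shown. `items_in_category` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <inventory> root, categories labelled by a "name" attribute.
FEED = """<inventory>
    <category name="keyboards">
        <item><name>Logitech Extreme</name></item>
        <item><name>Logitech Extreme 2</name></item>
    </category>
    <category name="food">
        <item><name>Apple</name></item>
    </category>
</inventory>"""


def items_in_category(feed_xml, wanted):
    """Collect item names from the categories whose name is in `wanted`."""
    root = ET.fromstring(feed_xml)
    names = []
    # Only the category elements are compared; unwanted ones are skipped whole.
    for category in root.findall("category"):
        if category.get("name") in wanted:
            names.extend(item.findtext("name") for item in category.findall("item"))
    return names


keyboard_names = items_in_category(FEED, {"keyboards"})
print(keyboard_names)
```

Note that the parser still reads the whole document; what shrinks is the number of comparisons, from one per item to one per category.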

 

It's really going to depend on the feed. You could also decide to parse the feed once a day and maintain your own database with the information (which would then be at most one day old, or whatever interval you specify). That would easily allow you to create more advanced queries and maybe even do some caching of the most popular requests, etc.
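
To make the "own database" idea concrete, here is a small sketch in Python using SQLite. The table and column names are invented for the example, the flat `<item category="...">` layout is assumed, and an in-memory database stands in for a real file that a daily job would refresh.

```python
import sqlite3
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def load_feed(conn, feed_xml):
    """Replace the local copy of the feed; meant to run on a schedule."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, category TEXT)")
    conn.execute("DELETE FROM items")
    rows = [
        (item.findtext("name"), item.get("category"))
        for item in ET.fromstring(feed_xml).iter("item")
    ]
    conn.executemany("INSERT INTO items (name, category) VALUES (?, ?)", rows)
    # An index lets category lookups avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_items_category ON items (category)")
    conn.commit()


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
load_feed(conn, FEED)
query = "SELECT name FROM items WHERE category = ? ORDER BY name"
keyboards = [row[0] for row in conn.execute(query, ("keyboard",))]
print(keyboards)
```

The user's form parameters would then map onto the `WHERE` clause, and the result rows would be written back out as the custom feed.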


The question states:

 

"Think about what technologies you will use to query the feed, whether you will leave the feed in its original format or need to convert it to another database format."

 

Considering the data is coming from a database originally, it seems a bit odd to then re-insert it into another database so that we can query it. Which is why I said it would have been easier if we could interface directly with the original database. But perhaps that's just how it is.

 

I've done some Googling on querying XML data, and the results mention something called XQuery. I'm wondering if this would be the best solution? However, I've not managed to get it to run on my server. Does XQuery require its own interpreter?


http://www.w3schools.com/XSL/xsl_intro.asp

 

 

But if I were your professor or teacher on this computer science topic I would expect something other than just "what methods from known 3rd party packages / applications will you use". I would expect some theory on how to actually parse the document.

 

It is basically a text file you get, and you have to extract information from it. Which algorithms would you use to search through it? What data structures would you use to store the information? What will be the resource cost and running time of your algorithms, etc.? Or alternatively - how does XSLT parse the XML? What search algorithms does it use?

 

Suppose you were the one inventing XSLT or something similar, or maybe making a competing package for doing the same job, only better. How would you perform the task of turning an XML file into something queryable like a database?
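
One possible answer to the data-structure question, sketched in Python: read the feed once and build a hash-map index from category to item names. Building it costs O(n) time and memory; each later lookup is O(1) on average. This is only an illustration of the idea, not how XSLT processors work internally, and `build_index` is a made-up name.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def build_index(feed_xml):
    """Map each category to the names of its items in a single O(n) pass."""
    index = defaultdict(list)
    for item in ET.fromstring(feed_xml).iter("item"):
        index[item.get("category")].append(item.findtext("name"))
    return index


index = build_index(FEED)
print(index["keyboard"])        # average O(1) lookup, no rescan of the feed
print(index.get("cables", []))  # unknown category gives an empty list
```

The trade-off is the extra memory for the index, which pays off as soon as the same feed is queried more than a handful of times.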


You can use XSLT (XSL) to do your job instead of XPath.

You can use XPath expressions in XSLT.
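
To give a feel for what such an XPath expression looks like, here is a small Python sketch. It uses ElementTree, which only supports a limited subset of XPath and is not an XSLT processor, so treat it as an illustration of the selection step alone. The flat `<item category="...">` layout is assumed.

```python
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""

root = ET.fromstring(FEED)
# The predicate [@category='keyboard'] keeps only the matching items,
# and the trailing /name step selects their <name> children.
matches = root.findall("./item[@category='keyboard']/name")
names = [match.text for match in matches]
print(names)
```

In an XSLT stylesheet, the same kind of predicate would typically appear in a `select` or `match` attribute.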

 

"Considering the data is coming from a database originally, it seems a bit odd to then re-insert it back in to another database so that we can query it."

Not at all. In a lot of cases, giving outsiders access to the database is not desirable, and you'll have to filter the provided XML to get the data you need. In some situations, the number of times a given XML file can be downloaded is also limited for bandwidth reasons. In that case, parsing the XML and inserting it into your own database is a good solution.

