XML Feed Query

Omzy · June 1, 2009

I've just been set a university assignment, which goes something like this:

A company provides a product feed in XML which contains thousands of items. The feed is generated from a back end database which stores details of all their products. I need to generate a web interface which will process this feed and return a new feed based on the parameters the user specifies in the form. So for example the user might only be interested in 'keyboards'.

I don't need to write any code for this, just need to provide a solution to the problem.

Now I would say that it's easier to just create an interface which connects directly to the back end database, queries it and generates a custom feed from there, rather than processing the main feed and generating a new feed from it. But I don't know if this is the answer they are after.

If I was to take the raw "unfiltered" feed how would I go about running a query on this feed so that it can be filtered and a new feed be generated from it? Can it be done without using a database?

Wuhtzu · June 1, 2009

First of all, have a look at SimpleXML, for example (http://in2.php.net/SimpleXML), to get an idea of what you can extract from the XML feed using PHP.

If you can't use a database, which already has search algorithms implemented to perform queries, you need to come up with your own, so to speak. One way would be to simply iterate over all the items in the feed and compare each one to a list of items you want.

So each time you encounter an item from the category 'keyboards', you add that item to your custom feed. This will take O(n) time since it's a linear search where you need to examine all n items to decide whether you want them or not.

<inventory>
    <item category="keyboard">
        <name>Logitech Extreme</name>
    </item>
    <item category="food">
        <name>Apple</name>
    </item>
    <item category="toys">
        <name>Killer Panda</name>
    </item>
</inventory>
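
The linear scan described above could be sketched roughly as follows. This is only an illustration in Python using the standard library's ElementTree (the thread itself is about PHP, where SimpleXML would play the same role); the function name `filter_feed` and the `FEED` constant are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample feed copied from the post above.
FEED = """<inventory>
    <item category="keyboard">
        <name>Logitech Extreme</name>
    </item>
    <item category="food">
        <name>Apple</name>
    </item>
    <item category="toys">
        <name>Killer Panda</name>
    </item>
</inventory>"""


def filter_feed(feed_xml, wanted_category):
    """Return a new <inventory> element holding only the wanted items.

    Hypothetical helper: the name and signature are assumptions.
    """
    source = ET.fromstring(feed_xml)
    result = ET.Element(source.tag)
    # Linear scan: every item is examined exactly once, hence O(n).
    for item in source.iter("item"):
        if item.get("category") == wanted_category:
            result.append(item)
    return result


filtered = filter_feed(FEED, "keyboard")
filtered_names = [item.findtext("name") for item in filtered]
print(ET.tostring(filtered, encoding="unicode"))
```

The printed result is itself a small feed with the same root element, which is the "custom feed" the assignment asks for.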

Alternatively, you could hope that the company has its feed set up such that all the keyboard elements (items) are children of a category element, in which case you need only examine the categories and decide if you want all (or just some) items of that category.

<category name="keyboards">
    <item>
        <name>Logitech Extreme</name>
    </item>
    <item>
        <name>Logitech Extreme 2</name>
    </item>
</category>
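
If the feed really were grouped like that, the lookup might look something like the sketch below. It assumes the category is identified by a `name` attribute and that the categories sit under an `<inventory>` root; neither is confirmed by the real feed, and `items_in_category` is a hypothetical helper. Python's ElementTree is used here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed layout: categories wrap their items (the "name" attribute is a guess).
GROUPED_FEED = """<inventory>
    <category name="keyboards">
        <item><name>Logitech Extreme</name></item>
        <item><name>Logitech Extreme 2</name></item>
    </category>
    <category name="food">
        <item><name>Apple</name></item>
    </category>
</inventory>"""


def items_in_category(feed_xml, wanted):
    """Return the item names of the first category called `wanted`."""
    root = ET.fromstring(feed_xml)
    # Only the category elements are compared, not every single item.
    for category in root.findall("category"):
        if category.get("name") == wanted:
            return [item.findtext("name") for item in category.findall("item")]
    return []


keyboard_names = items_in_category(GROUPED_FEED, "keyboards")
print(keyboard_names)
```

The comparison work then scales with the number of categories rather than the number of items, although the parser still has to read the whole file first.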

It's really going to depend on the feed. You could also decide to parse the feed once a day and maintain your own database with the information (which would then be at most one day old, or whatever interval you specify). That would easily allow you to create more advanced queries and maybe even do some advanced caching on the most popular requests, etc.
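
As a rough sketch of the "maintain your own database" idea, the feed could be loaded into SQLite on a schedule and queried with ordinary SQL. Everything here is an assumption for illustration: the table layout, the helper names `load_feed` and `names_in_category`, and the use of an in-memory database instead of a file refreshed once a day.

```python
import sqlite3
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def load_feed(conn, feed_xml):
    """Rebuild the items table from the feed (run this once per refresh)."""
    conn.execute("DROP TABLE IF EXISTS items")
    conn.execute("CREATE TABLE items (name TEXT, category TEXT)")
    rows = [
        (item.findtext("name"), item.get("category"))
        for item in ET.fromstring(feed_xml).iter("item")
    ]
    conn.executemany("INSERT INTO items (name, category) VALUES (?, ?)", rows)
    # An index on category lets the database avoid a full scan per query.
    conn.execute("CREATE INDEX idx_items_category ON items (category)")
    conn.commit()


def names_in_category(conn, category):
    """Return item names for one category, using a parameterised query."""
    cursor = conn.execute(
        "SELECT name FROM items WHERE category = ? ORDER BY name", (category,)
    )
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
load_feed(conn, FEED)
toy_names = names_in_category(conn, "toys")
print(toy_names)
```

The parsing cost is paid once per refresh, and each user request afterwards becomes a normal indexed query.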

Omzy · June 1, 2009

The question states:

"Think about what technologies you will use to query the feed, whether you will leave the feed in its original format or need to convert it to another database format."

Considering the data is coming from a database originally, it seems a bit odd to then re-insert it back in to another database so that we can query it. Which is why I said it would have been easier if we could interface directly to the original database. But perhaps that's just how it is.

I've done some Googling on querying XML Data and it mentions something called XQuery. I'm wondering if this would be the best solution? However I've not managed to get it to run on my server - does XQuery require its own interpreter?

Omzy · June 1, 2009

What about XPath?

PHP includes a DOMXPath class.

Omzy · June 2, 2009

Anyone?

Omzy · June 2, 2009

How can I use XPath to read in an XML document, run a query to filter out some of the data and then save the filtered data as a new XML file?
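
One way this could look, sketched in Python rather than PHP: the standard library's ElementTree supports a limited subset of XPath, which is enough for an attribute filter. The helper `write_filtered_feed`, the sample feed and the output file name are all assumptions for the example; in PHP the equivalent steps would go through DOMDocument and DOMXPath.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def write_filtered_feed(feed_xml, xpath, out_path):
    """Copy the elements matched by `xpath` into a new file at `out_path`."""
    source = ET.fromstring(feed_xml)
    result = ET.Element(source.tag)  # new root with the same tag name
    result.extend(source.findall(xpath))
    ET.ElementTree(result).write(out_path, encoding="utf-8", xml_declaration=True)


# Write to a temporary directory so the sketch leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), "keyboards.xml")
write_filtered_feed(FEED, "./item[@category='keyboard']", out_path)

# Read the new file back to confirm it is a valid, filtered feed.
saved_names = [item.findtext("name") for item in ET.parse(out_path).getroot()]
print(saved_names)
```

The XPath expression is the only part that changes from one user request to the next, so a form field could be mapped onto it (with care taken to escape user input).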

Illusion · June 2, 2009

You can use XSLT (XSL) to do your job instead of XPath.

Omzy · June 2, 2009

Thanks. Can XSLT be used to save the transformed data as a new XML file?

Omzy · June 2, 2009

Any idea anyone?

Wuhtzu · June 4, 2009

http://www.w3schools.com/XSL/xsl_intro.asp

But if I were your professor or teacher on this computer science topic I would expect something other than just "what methods from known 3rd party packages / applications will you use". I would expect some theory on how to actually parse the document.

It is basically a text file you get and you have to extract information from it. Which algorithms would you use to search through it, and what data structures would you use to store the information? What will be the resource cost and running time of your algorithms, etc.? Or alternatively: how does XSLT parse the XML? What search algorithms does it use?

Suppose you were the one inventing XSLT or something similar, or maybe making a competing package for doing the same job, only better. How would you perform the task of turning an XML file into something queryable like a database?
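
One possible answer to the data-structure question, offered only as a sketch: build a hash-based index from category to items while reading the feed. Building it costs O(n) once; after that each category lookup is O(1) on average, at the price of holding the index in memory. The function `build_index` and the sample feed are assumptions for the example, and Python is used purely for illustration.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

FEED = """<inventory>
    <item category="keyboard"><name>Logitech Extreme</name></item>
    <item category="food"><name>Apple</name></item>
    <item category="toys"><name>Killer Panda</name></item>
</inventory>"""


def build_index(feed_xml):
    """Map each category to the list of item names it contains."""
    index = defaultdict(list)
    # One pass over all n items builds the whole index.
    for item in ET.fromstring(feed_xml).iter("item"):
        index[item.get("category")].append(item.findtext("name"))
    return dict(index)


index = build_index(FEED)
# Later queries are dictionary lookups instead of fresh scans of the feed.
print(index.get("food", []))
print(index.get("monitors", []))
```

A database index on the category column does essentially the same job, usually with a B-tree instead of a hash table.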

RichardRotterdam · June 4, 2009

Illusion wrote: "You can use XSLT (XSL) to do your job instead of XPath."

You can use XPath expressions in XSLT.

Omzy wrote: "Considering the data is coming from a database originally, it seems a bit odd to then re-insert it back in to another database so that we can query it."

Not at all. In a lot of cases, outsiders are not given access to a database, and you'll have to filter the provided XML to get the data you need. In some situations the number of times a certain XML file can be downloaded is limited for bandwidth reasons. In that case, parsing the XML and inserting it into your own database is a good solution.
