How can I sort Problematic Characters in PHP?

loo9162 · April 7, 2013

Basically, I am writing a script in PHP that takes YouTube videos from playlists, items from RSS feeds and podcasts, and individual YouTube videos and files, and places them into an XML document so they can be browsed and kept in one place. I also have a script that removes these items if the user wants.

The problem I'm facing is with characters. I can't control what users name their videos/files, or how they're named in the feed, so the titles can contain quotes, brackets, ampersands, hashes, etc. This causes problems when the items are being removed: because I'm using XPath (which can be temperamental at the best of times) in the remove script, any item whose title contains these characters won't get removed.

Here's my remove code:

<?php
// $userid is assumed to be set earlier in the script (not shown in this snippet).

// Titles arrive in the URL as "title1|^title2|^title3|^".
$q = $_GET["q"];
$q = stripslashes($q);
$q = explode('|^', $q);

// The trailing "|^" leaves an empty last element, so drop it.
$counts = count($q);
unset($q[$counts-1]);

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load("../$userid.xml");
$xpath = new DOMXPath($dom);

foreach ($q as $r) {
    // Manual attempt at escaping XML special characters.
    $r = preg_replace("|&|", '&amp;', $r);
    $r = preg_replace('|"|', '&quot;', $r);

    // Look up the title, the media URL and the item itself by title.
    $query1 = 'channel/item[title='.$r.']/title';
    $query2 = 'channel/item[title='.$r.']/media:content';
    $query3 = 'channel/item[title='.$r.']';
    $entries = $xpath->query($query1);
    $entries2 = $xpath->query($query2);
    $entries3 = $xpath->query($query3);

    foreach ($entries as $entry) {
        foreach ($entries2 as $entry2) {
            foreach ($entries3 as $entry3) {
                $oldchapter = $entry->parentNode->removeChild($entry);
                $oldchapter2 = $entry2->parentNode->removeChild($entry2);
                $oldchapter3 = $entry3->parentNode->removeChild($entry3);
            }
        }
    }
}

// Write the modified document back to disk.
$dom->formatOutput = true;
$dom->save("../$userid.xml");
?>

How it works: when the user selects the items they want to remove using a select box, the selections are put into the URL. My code extracts the titles from the URL, where they are separated by "|^" (for example, title1|^title2|^title3|^). Because "|^" is appended to the end of every title, the last array element is empty, so I remove it. Then I create a new DOMDocument, load my existing XML document, and look up the titles from the URL in it. I want the code to remove the whole item (the title, the URL and the item element itself) for every title that matches one from the URL, and then save the document. But because some of the titles can contain &, ", * or #, those items don't get removed.
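The "|^" round trip described above can be sketched as follows. This is only an illustration: `parseTitles()` is a hypothetical helper, and the use of `http_build_query()` assumes the URL is built in PHP (if it is built in JavaScript, `encodeURIComponent()` plays the same role). It is worth checking because & and # have special meaning inside a URL, so unencoded titles may be cut short before PHP ever sees them; the thread does not confirm whether that happens here.

```php
<?php
// Hypothetical helper: split "a|^b|^c|^" into ['a', 'b', 'c'],
// dropping the empty element left by the trailing separator.
function parseTitles(string $q): array
{
    return array_values(array_filter(explode('|^', $q), 'strlen'));
}

// Example titles containing the characters mentioned in the post.
$titles = ['Tom & "Jerry" #1', 'Plain title'];

// Build the query string with proper URL encoding so & and # survive.
$queryString = http_build_query(['q' => implode('|^', $titles) . '|^']);

// Simulate what PHP does when it fills $_GET from the query string.
parse_str($queryString, $params);

$decodedTitles = parseTitles($params['q']);
```

With encoding in place, `$decodedTitles` holds the same strings that went in, so any remaining mismatch has to come from the XPath step rather than from the URL.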

Is there a way that I can maybe screen, and change the characters to get it to work (I tried this with "preg_replace", but it didn't work), or even change them before they're saved to the XML in the first place?
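One way to approach this, as a sketch rather than a definitive fix: XPath 1.0 string literals have no escape character, so a title has to be wrapped in whichever quote type it does not contain, or assembled with `concat()` when it contains both. XML entities such as `&amp;` are not needed inside the query, because the DOM already stores the decoded text. `xpathLiteral()` is a hypothetical helper, and the sample XML layout (`rss/channel/item/title`) is an assumption about the playlist file.

```php
<?php
// Hypothetical helper: turn any PHP string into a valid XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";          // no single quote: wrap in single quotes
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';          // no double quote: wrap in double quotes
    }
    // Both quote types present: split on ' and glue the pieces back with concat().
    $quoted = array_map(function ($part) {
        return "'" . $part . "'";
    }, explode("'", $value));
    return 'concat(' . implode(", \"'\", ", $quoted) . ')';
}

// Assumed sample document; the real file layout may differ.
$xml = <<<XML
<rss><channel>
  <item><title>Tom &amp; "Jerry" #1's *best*</title></item>
  <item><title>Plain title</title></item>
</channel></rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// The raw title exactly as the user selected it, with no manual entity escaping.
$title = 'Tom & "Jerry" #1\'s *best*';
$query = '//item[title=' . xpathLiteral($title) . ']';

// Removing the item also removes its title and any other children.
foreach (iterator_to_array($xpath->query($query)) as $item) {
    $item->parentNode->removeChild($item);
}

$remaining = $dom->getElementsByTagName('item')->length;
```

Removing the `item` element alone is enough here, since its child elements go with it, which also avoids querying for the title and the media URL separately.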

Any advice?

Christian F. · April 7, 2013

I think the best advice would be to build this application around a proper database, which would allow you to handle stuff like this with ease. That would also allow you to drop the triple-nested loops, which are really bad for performance, especially if you get a lot of data.

Then, if you need to export to XML, write a script that does just that: export.

Also, is there any reason why you're escaping XML characters manually, as opposed to using the proper functions for it?

loo9162 · April 7, 2013

I can't use a database, because the video player I'm using only supports XML and RSS playlists. I don't understand what you mean by export to XML (sorry). Also, what are the proper functions I should be using? (Sorry, complete noob.)
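The thread does not name the functions, so the following is a guess at what is meant, shown as a minimal sketch: `htmlspecialchars()` with the `ENT_XML1` flag escapes a string for hand-written XML, while building the document with DOM methods such as `createTextNode()` leaves the escaping to the library when the file is saved. The sample title is made up for illustration.

```php
<?php
// Made-up title containing XML special characters.
$title = 'Tom & "Jerry" <#1>';

// Option 1: escape a string before pasting it into hand-written XML.
$escaped = htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// Option 2: pass the raw string to DOM and let it escape on output.
$dom = new DOMDocument('1.0', 'UTF-8');
$item = $dom->appendChild($dom->createElement('item'));
$titleNode = $item->appendChild($dom->createElement('title'));
$titleNode->appendChild($dom->createTextNode($title));
$xmlOut = $dom->saveXML($item);

// Reading the XML back yields the original, unescaped title.
$check = new DOMDocument;
$check->loadXML($xmlOut);
$roundTripped = $check->getElementsByTagName('title')->item(0)->textContent;
```

With the second option the script never handles entities itself: titles go in raw and come out raw, and only the saved file contains the escaped form.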

Andy-H · April 7, 2013

He means store it in a database, and create a script that queries the database, formats the data as XML and outputs it.
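A rough sketch of that idea, under stated assumptions: the `items` table, its columns, the in-memory SQLite database and the `exportPlaylist()` function are all hypothetical stand-ins, and the output layout (`rss/channel/item` with `title` and `link`) is a guess at what the player expects. The point is that items are added and removed with ordinary SQL by id, and the XML is regenerated from the table whenever the player needs it.

```php
<?php
// Assumed storage: an in-memory SQLite database with a hypothetical "items" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');

// Prepared statements mean titles need no manual escaping.
$insert = $pdo->prepare('INSERT INTO items (title, url) VALUES (?, ?)');
$insert->execute(['Tom & "Jerry" #1', 'http://example.com/a.mp4']);
$insert->execute(['Plain title', 'http://example.com/b.mp4']);

// Removal works on the numeric id, so odd characters in titles do not matter.
$pdo->prepare('DELETE FROM items WHERE id = ?')->execute([2]);

// Hypothetical export function: rebuild the playlist XML from the table.
function exportPlaylist(PDO $pdo): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;
    $rss = $dom->appendChild($dom->createElement('rss'));
    $rss->setAttribute('version', '2.0');
    $channel = $rss->appendChild($dom->createElement('channel'));

    foreach ($pdo->query('SELECT title, url FROM items ORDER BY id') as $row) {
        $item = $channel->appendChild($dom->createElement('item'));
        // createTextNode() lets DOM escape &, < and > when saving.
        $item->appendChild($dom->createElement('title'))
             ->appendChild($dom->createTextNode($row['title']));
        $item->appendChild($dom->createElement('link'))
             ->appendChild($dom->createTextNode($row['url']));
    }
    return $dom->saveXML();
}

$playlistXml = exportPlaylist($pdo);
```

The player would still read an XML file or URL as before; only the place where the data is edited changes.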
