PHP Parsing of XML (RSS) in different Languages (Arabic, Russian, etc.)


Hi all, I'm new here and having trouble with some code.

 

I'm trying to parse RSS feeds in multiple languages and character sets using the same code.

 

Here it is, warts and all:

 

<?php ob_start( );
/// three-feed parser

$insideitem = false;
$tag = "";
$title = "";
$description = "";
$link = "";

$result3 = mysql_query("SELECT * FROM comingsoon WHERE punycode='$domainname'")
    or die(mysql_error());
while ($row = mysql_fetch_array(INSERTVARIABLESHERE)) {
    $feed1 = $row['rss1'];
}
$locations = array($feed1);

srand((float) microtime() * 10000000); // seed the random gen
$random_key = array_rand($locations);

function startElement($parser, $name, $attrs) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = true;
    }
}

function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link;
    if ($name == "ITEM") {
        echo "<dt><b><a href='" . trim($link) . "'>" . trim($title) . "</a></b></dt>\n";
        echo "<dt>" . trim($description) . "</dt><br><br>\n";
        //printf("<dt><b><a href='%s' target=new>%s</a></b></dt>",trim($link),htmlspecialchars(trim($title)));
        //printf("<dt>%s</dt><br><br>",htmlspecialchars(trim($description)));
        $title = "";
        $description = "";
        $link = "";
        $insideitem = false;
    }
}

function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $locations[$random_key]);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);

xml_parse($xml_parser, $data) or die(sprintf("XML error: %s at line %d", xml_error_string(xml_get_error_code($xml_parser)), xml_get_current_line_number($xml_parser)));
xml_parser_free($xml_parser);

?>

 

Some of that I added myself, some of it I stripped out myself, and some of it I don't understand at all.

 

Suffice it to say, INSERTVARIABLESHERE works for certain variables, provided they are valid RSS feeds in certain languages.

 

Notably, English, Chinese (simplified, never tried traditional), and Japanese.

 

You can see it at work at:

 

xn--1tq374c79r.biz (Simplified Chinese)

xn--3ckxd6aza9447ccy3c.com (Japanese)

 

And you can see it not work at:

 

xn--mgbgvp7e0ao.net (Arabic)

xn--e1aapgcbl6d.com (Russian)

 

Note that each page is UTF-8 encoded and has the appropriate language and charset tags.
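
One thing that may be worth ruling out, stated purely as an assumption because the encodings of the failing feeds are not shown here: the page being UTF-8 does not guarantee that the fetched feed is. Below is a minimal sketch of normalizing a feed to UTF-8 before it reaches xml_parse(). feedToUtf8() is a hypothetical helper, not part of the script above, and it assumes the iconv extension is available.

```php
<?php
// Hypothetical helper (not in the original script): convert raw feed bytes
// to UTF-8 based on the encoding declared in the XML prolog.
function feedToUtf8($xml)
{
    // Read encoding="..." from the XML declaration; assume UTF-8 if absent.
    if (preg_match('/^\s*<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']/i', $xml, $m)) {
        $declared = strtoupper($m[1]);
    } else {
        $declared = 'UTF-8';
    }

    if ($declared === 'UTF-8') {
        return $xml; // nothing to convert
    }

    // iconv(from, to, string) returns false if the conversion fails.
    $converted = iconv($declared, 'UTF-8', $xml);
    if ($converted === false) {
        return $xml; // leave the input untouched rather than lose data
    }

    // Rewrite the declaration so it matches the converted bytes.
    return preg_replace(
        '/(<\?xml[^>]*encoding=["\'])[^"\']+(["\'])/i',
        '${1}UTF-8${2}',
        $converted,
        1
    );
}
```

In the script above this would slot in as $data = feedToUtf8(curl_exec($ch));, optionally together with xml_parser_create('UTF-8'), so that the handlers always receive UTF-8 text. If the failing feeds already declare UTF-8, the helper changes nothing and the cause lies elsewhere.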

 

NOTE: this script is intended to take multiple RSS feeds and pick one at random from the array. I am only using one URL in each case, so the random selection has nothing to rotate through yet.
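
For reference, the intended rotation could look roughly like the sketch below once more than one feed is configured. The example.com URLs are placeholders made up for illustration; in the script above the values would come from the database row instead.

```php
<?php
// Placeholder feed URLs, for illustration only.
$locations = array(
    'http://example.com/feed1.xml',
    'http://example.com/feed2.xml',
    'http://example.com/feed3.xml',
);

// array_rand() returns one random key from the array. The random number
// generator is seeded automatically since PHP 4.2.0, so no srand() call
// is needed.
$random_key = array_rand($locations);
$feed_url   = $locations[$random_key];
```

With a single-element array, as in the current setup, array_rand() always returns key 0, so the same feed is used on every request.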

 

Any help in getting these other character sets to work would be greatly appreciated.

 

TIA.
