Jump to content

OsvaldoM

Members
  • Posts

    16
  • Joined

  • Last visited

    Never

Everything posted by OsvaldoM

  1. I'm a very basic user of regex and I bumped into a problem in which im a bit lost... I have a piece of text that comes with html tags and then through Solr it receives 2 custom tags to highlight words (<em> & <b>). My main problem is that i want to keep Solr's html tags but skip the rest and cleaning up the text before supplying it to Solr is not posible due to search configurations... So I want to keep something like "this is normal text <em><strong>myText</strong></em> and in here it continues" but skip all other html tags (including <strong> and <em> but that are not next to each other...) So this text: <strong>To be</strong> or <em><strong>not to</strong</em> <em>be</em> Should be stripped into this: To be or <em><strong>not to</strong</em> be Help would be highly appreciated!.
  2. Don't know what happened... but i tried this again today, and it worked perfectly. Probably one of those weird cookie/cache errors was making my script display the local date as i originally had it. $date = Zend_Date::now()->setTimezone('US/Eastern')->get('YYYY-MM-dd HH:mm:ss'); That works perfectly as stated by thorpe, hope this helps someone out... Thanks 4 the reply. And yeah, i format the date via the get function (You ought to never really trust Zend's docs, they are in lack of a ton of info)
  3. Hey everyone, if there is one thing i don't like about Zend Framework is that the date functions are a bit confusing and that it takes a few lines to tackle down what i really need... so i was making this line of code: $date = Zend_Date::now()->get('YYYY-MM-dd HH:mm:ss', 'Europe/Vienna'); but i realized it never gets me the timezone i wanted... This doesn't work either: $date = Zend_Date::now()->setTimezone('Asia/Kabul')->get('YYYY-MM-dd HH:mm:ss'); Do note this never throws an exception, it outputs the correct time (local) but it's unable to change the timezone of the now() constant. I've been able to change the timezone with multiple lines of code for now(), but i can't seem to get it on just one... Is it possible to do it on just one line? or is it just me being lazy? Thanks in advance for any replies...
  4. Just so you know, i was able to get what i want: preg_match_all('~<article .*(.*)</article>~isU', $articlesData, $pieces); notice the space after the word article... that was the trick. I am know building a regexp which will erase espaces between tags "> <" should be "><". Also do notice that preg_match_all might not be the best idea if you are looking for good performance of your query, 500 articles almost crash my firefox, good thing this is for a cron job... Hope the above code help someone out!
  5. The main two issues are: unclosed tags or feed isn't complete and illegal characters make the feed XML un-parsable. The feed comes in several languages and the guys which send us this feed apparently don't know what C-DATA and validation is...
  6. I know, we used to do this before, though the feeds we get are quite large and if we escaped one of the feeds the data loss was considerable... anyways, thanks for the link, reading at the moment...
  7. Hi all, hope you can help me out in this one, have been struggling all day with this issue: We get an XML feed from a third party and sometimes the feed is corrupt, though we still need to extract as much info as we can from it, obviously being XML a very strict language, trying to parse it with any PHP-XML libraries will not work, and cleanup libraries such as Tidy HTML haven't done the job, so I am trying to manually break the feed in parts, then process it. We usually get stuff like this: <?xml version="1.0" encoding="utf-8"?> <articles extbatchid="877" nextextbatchid="903" profileid="2012234"> <article articleid="1141402135"> <url>http://www.courant.com/features/bal-friends-sweeps0429,0,7443638.story</url> <headline_text>TV lines up big doings, grand finales for the sweeps</headline_text> <outlet>Hartford Courant</outlet> <influential>By Hal Boedeker</influential> <language>English</language> <country>United States</country> <publish_date>2004-04-29 12:45:29 UTC</publish_date> <extract>Last \'Friends\' episode, \'Idol\' conclusion will drive May...</extract> </article> <article articleid="114140sdfsdf2135"> <url>http://www.mysite.com/y</url> <headline_text>Osvaldo makes the headlines</headline_text> <outlet>Dont Know</outlet> <influential>By Hsfsdfsdfsdf</influential> <language>English</language> <country>MEXICO</country> <publish_date>2004-04-29 12:45:29 UTC</publish_date> <extract>HELLLLLLLLLLLOOOOOOOOOOOOOOO!!</extract> </article> <article Bad, broken down>This is a bad article<arcle> What i am trying to accomplish is to make a regex for preg_match_all that will break down the article info into an array, and each key of it will hold all the article info, e.g: array [0] => <article articleid="1141402135"> EVERYTHING IN BETWEEN </article> [1] => <article articleid="112345677"> EVERYTHING IN BETWEEN </article> [2] => <article articleid="123353457"> EVERYTHING IN BETWEEN </article> I have already accomplished to get everything between two tags with preg_match_all('/<tag(.*)(.*)?<\/tag>/', $articlesData, $pieces); which works fine with most of the tags, except the one I really need: preg_match_all('/<article(.*)(.*)?<\/article>/', $articlesData, $pieces); the problem is that if I ran the above code i will get everything from the parent node <articles>, instead of the child <article>, i haven't been able to apply the proper "/b" nor to actually get closer to what i need. Any help is highly appreciated, thanks!
  8. After all, it seems preg_match is quite simple: i found this in the manual: if (preg_match("/\b$wordLW\b/i", $v)) { return $v; } $wordLW is the word i am looking for $v is the text or sentence to search the word in It runs a insensitive search for the word and only returns exact matches...
  9. Hello, i am using strpos for quick searches in large texts, everything was perfect till i found out strpos return the ocurrences for the string in-spite of the fact that is the whole word or just a part of it, and i need only to return exact matches... To exemplify what i am trying to say: say, i use strpos to look for "wonder" in a string, i would like it to return: I wonder if... I like Wonder bread... but i need strpos to ignore matches of words such as: Alice in Wonderland... She uses wonder-bra... I am thinking I could build a function to read the # of characters in the word, then run strpos() and check if there is whitespace around the word when found, if there is, return it, if not, ignore it. Though it seems to me there should be already a way to do this within strpos(), flags? or extra-values? Nothing in the manual talked about exact matches, and preg_match_all is not a very frendly function for newbies... so basically i am asking if i should continue with strpos() or better start looking for something else?
  10. you guys rock, thanks for this... somehow i skipped the array_merge_recursive function in the manual. As i've been dealing with a lot of arrays lately, i've noticed that foreach is not the best way to deal with large arrays, built-in php functions for arrays should be the way to go, sadly, im still not that familiar with all of them...
  11. Hello, i've been reading the php manual and googling around with no luck so far. Basically what i want to do is to merge two arrays the following way: $array1 = array("dog" => "brown", "cat" => "white"); $array2 = array("dog" => "black", "duck" => "white") ; and have this as a result: $array3 = array("dog" => array("brown", "black"), "cat" => "white", "duck" => "white") ; I know it's possible and probably quite simple, but my tests so far have failed miserable, basically i get stuck trying to push "black" into the array, whenever "dog" is repeated. Any pointers or suggestions would be quite appreciated!
  12. Thanks guys, as it turned out my client just give me the ok to install as many modules as i want so i will be using the zip library. Though, i found out there is a way with Zlib, check this additional library, couldnt get it to work, but i imagine it should... http://www.winimage.com/zLibDll/minizip.html
  13. I've looking for a way to unzip zip files without actually installing the zip module and using my defaults bzip2 or zlib. The documentation in zlib claims it cannot due it by itself... And bzip2, well, it has no documentation pretty much. could someone point me a tutorial or resource that shows how to do it? (in case it's possible). Thanks in advance
  14. Ok, so after reading and reading, finding unanswered posts all over the web about the same topic and trying different approaches I believe there is no way to hold the results, then unzip, then read the xml without actually downloading the file. I'd tried cookies, system stores and the results where the same, the most i got was to store the entire results in system cache, but couldn't advance from there. Hope this conclusion of mine helps and saves time to other people looking to do the same approach.
  15. Hello everyone, i've been stuck for 2 days with a little piece of code that keeps giving me problems. Here is the deal: I'm connecting to the MSN Adcenter API, which for some reports it provides an xml withina zip file for download. Everything used to be fine running under localhost until i uploaded it to my server and i found out i do not have write permissions in the folder. So basically the question is, how to get the zip, unzip it and then get the contents of the xml without actually downloading any of the files?. I was able to read the zip contents without actually having to unzip the xml, though as for how to do it with the zip file im stuck. Here is the code i have so far. $ch = curl_init($downloadURL); if (! $ch) { die( "Cannot allocate a new PHP-CURL handle" ); } //$fp = fopen (dirname(__FILE__) . '/' . $fileName, 'w+');//This is the file where we save the information curl_setopt($ch, CURLOPT_HEADER, false); // Don’t return the header, just the html curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return contents as a string curl_setopt($ch, CURLOPT_BINARYTRANSFER,true); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); //curl_setopt($ch, CURLOPT_FILE, $fp); curl_setopt($ch, CURLOPT_TIMEOUT, 50); // CHECK THIS ONE curl_setopt($ch, CURLOPT_URL, $downloadURL); //curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // Avoids error curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); //Shouldn't be used curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true); curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); curl_setopt($ch, CURLOPT_CAINFO, getcwd() . "/GTECyberTrustGlobalRoot.crt"); curl_setopt($ch, CURLOPT_PROXY, "XXXXXX.proxy.XXXXXX.XXXXX.com"); curl_setopt($ch, CURLOPT_PROXYPORT, XXXXXX); // curl_setopt ($ch, CURLOPT_PROXYUSERPWD, "username:password"); $data = curl_exec ($ch); //print $data; echo gettype($data); //var_dump($data); // var_dump($fp); if (!$data) { echo "<br />cURL error number:" .curl_errno($ch); echo "<br />cURL error:" . curl_error($ch); exit; } else { echo "<br>curl succeeded with" . $fileName; } // echo var_dump($data); curl_close($ch); //============================================= //================================= $zip = zip_open($fileName); //$zip = getcwd() . '/$fileName'; if (is_resource($zip)) { echo "reading the zip"; while ($zip_entry = zip_read($zip)) { echo "Name: " . zip_entry_name($zip_entry) . "\n"; echo "Actual Filesize: " . zip_entry_filesize($zip_entry) . "\n"; echo "Compressed Size: " . zip_entry_compressedsize($zip_entry) . "\n"; echo "Compression Method: " . zip_entry_compressionmethod($zip_entry) . "\n"; if (zip_entry_open($zip, $zip_entry, "r")) { echo "File Contents:\n"; $contentsXML = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry)); $data = simplexml_load_string($contentsXML); zip_entry_close($zip_entry); } echo "\n"; } zip_close($zip); } else { echo "It is not a resource"; } So basically, i believe what i am looking for is to "unzip" the $data returned from the cURL? Any suggestions? Thanks Im running: PHP Version 5.3.1 zlib, compress.bzip2, phar, zip Enabled
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.