pyrodude Posted July 16, 2007

So, I wrote a PHP script that sends text to babel.altavista.com to be translated, and then uses substr() to pull out the translated text. I'm just starting out in the wide world of PHP and discovered the preg_match function, but I can't seem to find any documentation written for the layperson. I think it's possible to optimize and shorten my code using preg_match, and was hoping someone here could help me out. Again, my code works, but I'd rather not use all the strpos() and substr() calls if I can avoid it. The code in question is the strip_translation() function.

<html>
<head>
<title>Untitled</title>
</head>
<body>
<?php
// Function to reduce the Babelfish page to just the translated text
function strip_translation($htmlfile) {
    // String at the beginning of the translated text
    $str1 = "<td bgcolor=white class=s><div style=padding:10px;>";
    // String just after the translated text
    $str2 = "</div>";
    $file = fopen($htmlfile, "r");
    $readfile = fread($file, filesize($htmlfile));
    fclose($file);
    // Location of first string in $readfile
    $startpos = strpos($readfile, $str1);
    // Location of second string in $readfile, searching from the first string onward
    $endpos = strpos($readfile, $str2, $startpos);
    // Length of translated text (plus a little extra junk)
    $len = ($endpos - $startpos);
    // Strip off most of the excess garbage
    $stripped = substr($readfile, $startpos, $len);
    $value = $stripped;
    // 23 is the garbage around it
    //$lenS = (strlen($stripped) - 23);
    // 0 is the start position of the text
    //$value = substr($stripped,0,$lenS);
    // Clean up & delete the $output file
    // unlink($htmlfile);
    return $value;
}

$filename = "test.txt";
if (isset($_POST['transtext'])) {
    $filewrite = @fopen($filename, "w");
    fwrite($filewrite, $_POST['transtext']);
    fclose($filewrite);
}
if (filesize($filename) > 0) {
    $cont = @fopen($filename, "r");
    $contread = fread($cont, filesize($filename));
    fclose($cont);
}
if (isset($_POST['transtext']) && (strlen($_POST['lang']) > 1)) {
    // Store search values in $file and the output file is $outputfile
    $filename = "test.txt";
    $filewrite = fopen($filename, "w");
    fwrite($filewrite, $_POST['transtext']);
    fclose($filewrite);
    $outputfile = "test.html";
    $readfile = fopen($filename, "r");
    // Set $site = Altavista's translation page
    $site = "http://babel.altavista.com/tr?trtext=".str_replace(" ", "+", fgets($readfile))."&lp={$_POST['lang']}";
    // Read the translation page
    $remote = fopen($site, "r");
    // Open the local copy to save
    $local = fopen($outputfile, "w");
    // Read $remote and write to $local
    while (!feof($remote)) {
        $line = fgets($remote, 1024);
        fputs($local, $line, strlen($line));
    }
    // Close files
    fclose($remote);
    fclose($local);
    fclose($readfile);
} elseif (isset($_POST['trans'])) {
    echo "<h1><b>Please select text and language for translation.</b></h1>";
}
?>
Text to translate:<br>
<form method=post><div style="float:right;">Select Language:<br><select name="lang" style="font-size: 0.8em;">
<option value="">Select from and to languages</option>
<option value="zh_en">Chinese-simp to English</option>
<option value="zt_en">Chinese-trad to English</option>
<option value="en_zh">English to Chinese-simp</option>
<option value="en_zt">English to Chinese-trad</option>
<option value="en_nl">English to Dutch</option>
<option value="en_fr">English to French</option>
<option value="en_de">English to German</option>
<option value="en_el">English to Greek</option>
<option value="en_it">English to Italian</option>
<option value="en_ja">English to Japanese</option>
<option value="en_ko">English to Korean</option>
<option value="en_pt">English to Portuguese</option>
<option value="en_ru">English to Russian</option>
<option value="en_es">English to Spanish</option>
<option value="nl_en">Dutch to English</option>
<option value="nl_fr">Dutch to French</option>
<option value="fr_en">French to English</option>
<option value="fr_de">French to German</option>
<option value="fr_el">French to Greek</option>
<option value="fr_it">French to Italian</option>
<option value="fr_pt">French to Portuguese</option>
<option value="fr_nl">French to Dutch</option>
<option value="fr_es">French to Spanish</option>
<option value="de_en">German to English</option>
<option value="de_fr">German to French</option>
<option value="el_en">Greek to English</option>
<option value="el_fr">Greek to French</option>
<option value="it_en">Italian to English</option>
<option value="it_fr">Italian to French</option>
<option value="ja_en">Japanese to English</option>
<option value="ko_en">Korean to English</option>
<option value="pt_en">Portuguese to English</option>
<option value="pt_fr">Portuguese to French</option>
<option value="ru_en">Russian to English</option>
<option value="es_en">Spanish to English</option>
<option value="es_fr">Spanish to French</option>
</select></div><textarea cols=40 rows=6 name="transtext"><?=$contread;?></textarea><br><p>
<input type="submit" name="trans" value="Translate<?php if (isset($_POST['trans'])){echo " Again";}?>"></form>
<?php
if (isset($_POST['trans']) && (strlen($_POST['lang']) > 1)) {
    echo "<br><br><div style=\"border:solid 1px #000000;float:left;padding:5px;\">";
    echo strip_translation($outputfile);
    echo "</div>";
}
?>
</body>
</html>
effigy Posted July 16, 2007

<pre>
<?php
// Example snip.
$html = <<<HTML
<form action="http://www.altavista.com/web/results" method=get>
<td valign=top><b class=m><font color=#0000000> Auf deutsch:</font></b></td>
</tr>
<tr>
<td bgcolor=white class=s><div style=padding:10px;>Wie heißen Sie?</div></td>
</tr>
<tr>
HTML;

preg_match('#(?<=<td bgcolor=white class=s><div style=padding:10px;>)(.*?)(?=</div>)#', $html, $matches);
print_r($matches);
?>
</pre>
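For anyone wanting to drop this into the original script, here is a minimal sketch of the same idea wrapped in a function. It is an illustration, not code from the thread: extract_translation() is a hypothetical name, it uses a plain capturing group instead of lookarounds, and it assumes the two marker strings from the first post still surround the translated text. The s modifier is added so that . also matches newlines, in case the translation spans several lines.

```php
<?php
// Hypothetical helper (not from the thread): pull the translated text out of
// the page HTML with one preg_match() call instead of strpos()/substr().
function extract_translation($html) {
    // Marker strings copied from the original post; assumed unchanged on the page.
    $pattern = '#<td bgcolor=white class=s><div style=padding:10px;>(.*?)</div>#s';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // group 1 holds the text between the markers
    }
    return null; // markers not found, or the regex failed
}

$sample = '<tr><td bgcolor=white class=s><div style=padding:10px;>Wie heißen Sie?</div></td></tr>';
echo extract_translation($sample), "\n";
```

In the original script, the call strip_translation($outputfile) could then become something like extract_translation(file_get_contents($outputfile)), with a check for a null result before echoing.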
pyrodude Posted July 17, 2007

I appreciate the response! Is there any way you can break down what it means? I grasp that the #'s on either end mark where the expression begins and ends, but why does the first group start with (?<= while the second one starts with just (?= ? Also, what does the center term that yields the results mean? Does (.*?) return all matches? I guess I don't understand the roles of the operators in this function. Thanks again!
effigy Posted July 17, 2007

They are called lookarounds because they only look; they don't match characters, but positions between characters. (?<=...) is a lookbehind, which checks what comes just before the current position, and (?=...) is a lookahead, which checks what comes just after it. (.*?) will match (and capture) 0 or more characters in an ungreedy fashion, meaning it stops at the first place the rest of the pattern can match. There's more information here.
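A small sketch of those two points, using a made-up two-div string rather than the real Babelfish page (the variable names are only for illustration). It compares a greedy .* with an ungreedy .*?, and a capturing group with lookarounds:

```php
<?php
$html = '<div>first</div><div>second</div>';

// Greedy: .* takes as much as it can, so it runs up to the LAST </div>.
preg_match('#<div>(.*)</div>#', $html, $greedy);

// Ungreedy: .*? takes as little as it can, so it stops at the FIRST </div>.
preg_match('#<div>(.*?)</div>#', $html, $lazy);

// Lookarounds only assert what sits before and after the match, so the tags
// are checked but never become part of the matched text.
preg_match('#(?<=<div>).*?(?=</div>)#', $html, $look);

echo $greedy[1], "\n"; // first</div><div>second
echo $lazy[0], "\n";   // <div>first</div>
echo $lazy[1], "\n";   // first
echo $look[0], "\n";   // first
```

With the capturing-group version the wanted text is in index 1 of the matches array, while index 0 still includes the surrounding tags. With lookarounds the whole match at index 0 is already just the text.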
pyrodude Posted July 18, 2007

Thanks for the extra reading material! Much appreciated!