[SOLVED] preg_match()


pyrodude

So, I wrote a PHP script that sends text to babel.altavista.com to be translated, and then uses substr() to pull out the translated text.  I'm just starting out in the wide world of PHP and discovered the preg_match() function, but I can't seem to find any documentation written for the layperson.  I think it's possible to shorten and simplify my code using preg_match(), and was hoping someone here could help me out.

Again, my code works, but I'd rather not use all the strpos() and substr() calls if I can avoid it.

The code in question is the strip_translation() function (lines 11-36)

<html>
<head>
        
<title>Untitled</title>

</head>
<body>

<?php
// Function to reduce the Babelfish page to just the translated text
function strip_translation($htmlfile) {
    // String at the beginning of the translated text
    $str1 = "<td bgcolor=white class=s><div style=padding:10px;>";
    // String just after the translated text
    $str2 = "</div>";
    $file = fopen($htmlfile,"r");
    $readfile = fread($file,filesize($htmlfile));
    fclose($file);
    // Location of first string in $readfile
    $startpos = strpos($readfile,$str1);
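    // Location of second string in $readfile, searching only from $startpos onward
    // (without the offset, an earlier </div> on the page would be found instead)
    $endpos = strpos($readfile,$str2,$startpos);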
    // Length of translated text (plus a little extra junk)
    $len = ($endpos - $startpos);
    // Strip off most of the excess garbage
    $stripped = substr($readfile,$startpos,$len);
    $value = $stripped;
    // 23 is the garbage around it
    //$lenS = (strlen($stripped) - 23);
    // 0 is the start position of the text
    //$value = substr($stripped,0,$lenS);
    // Clean up & delete the output file
    // unlink($htmlfile);

    return $value;
}
$filename = "test.txt";
if (isset($_POST['transtext'])) {
    $filewrite = @fopen($filename,"w");
    fwrite($filewrite,$_POST['transtext']);
    fclose($filewrite);
}
if (filesize($filename) > 0) {
    $cont = @fopen($filename,"r");
    $contread = fread($cont,filesize($filename));
    fclose($cont);
}
if (isset($_POST['transtext']) && (strlen($_POST['lang'])>1)){
    // Store search values in $file and the output file is $outputfile
    $filename = "test.txt";
    $filewrite = fopen($filename,"w");
    fwrite($filewrite,$_POST['transtext']);
    fclose($filewrite);
    $outputfile = "test.html";
    $readfile = fopen($filename,"r");
    // Set $site = Altavista's translation page (urlencode() escapes spaces, &, ? and so on)
    $site = "http://babel.altavista.com/tr?trtext=".urlencode(trim(fgets($readfile)))."&lp=".urlencode($_POST['lang']);
    // Read the translation page
    $remote = fopen($site,"r");
    // Open the local copy to save
    $local = fopen($outputfile,"w");
    // Read $remote and write to $local
    while(!feof($remote)) {
        $line = fgets($remote, 1024);
        fputs($local,$line,strlen($line));
    }
    // Close files
    fclose($remote);
    fclose($local);
    fclose($readfile);
}
elseif (isset($_POST['trans'])) {
    echo "<h1><b>Please select text and language for translation.</b></h1>";
}
?>
Text to translate:<br>
<form method=post><div style="float:right;">Select Language:<br><select name="lang" style="font-size: 0.8em;">
<option value="">Select from and to languages</option>
<option value="zh_en">Chinese-simp to English</option>
<option value="zt_en">Chinese-trad to English</option>
<option value="en_zh">English to Chinese-simp</option>
<option value="en_zt">English to Chinese-trad</option>
<option value="en_nl">English to Dutch</option>
<option value="en_fr">English to French</option>
<option value="en_de">English to German</option>
<option value="en_el">English to Greek</option>
<option value="en_it">English to Italian</option>
<option value="en_ja">English to Japanese</option>
<option value="en_ko">English to Korean</option>
<option value="en_pt">English to Portuguese</option>
<option value="en_ru">English to Russian</option>
<option value="en_es">English to Spanish</option>
<option value="nl_en">Dutch to English</option>
<option value="nl_fr">Dutch to French</option>
<option value="fr_en">French to English</option>
<option value="fr_de">French to German</option>
<option value="fr_el">French to Greek</option>
<option value="fr_it">French to Italian</option>
<option value="fr_pt">French to Portuguese</option>
<option value="fr_nl">French to Dutch</option>
<option value="fr_es">French to Spanish</option>
<option value="de_en">German to English</option>
<option value="de_fr">German to French</option>
<option value="el_en">Greek to English</option>
<option value="el_fr">Greek to French</option>
<option value="it_en">Italian to English</option>
<option value="it_fr">Italian to French</option>
<option value="ja_en">Japanese to English</option>
<option value="ko_en">Korean to English</option>
<option value="pt_en">Portuguese to English</option>
<option value="pt_fr">Portuguese to French</option>
<option value="ru_en">Russian to English</option>
<option value="es_en">Spanish to English</option>
<option value="es_fr">Spanish to French</option>
</select></div><textarea cols=40 rows=6 name="transtext"><?=$contread;?></textarea><br><p>
<input type="submit" name="trans" value="Translate<?php if (isset($_POST['trans'])){echo " Again";}?>"></form>
<?php
if (isset($_POST['trans']) && (strlen($_POST['lang'])>1)){
    echo "<br><br><div style=\"border:solid 1px #000000;float:left;padding:5px;\">";
    echo strip_translation($outputfile);
    echo "</div>";
}
?>
</body>
</html>

<pre>
<?php
// Example snip.
$html = <<<HTML
<form action="http://www.altavista.com/web/results" method=get>
	    <td valign=top><b class=m><font color=#0000000>

Auf deutsch:</font></b></td>
	   </tr>	   <tr>
	    <td bgcolor=white class=s><div style=padding:10px;>Wie heißen Sie?</div></td>
	   </tr>
	   <tr>
HTML;
preg_match('#(?<=<td bgcolor=white class=s><div style=padding:10px;>)(.*?)(?=</div>)#', $html, $matches);
print_r($matches);
?>
</pre>
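
As a rough sketch of how the pattern above might slot into something like strip_translation(), the snippet below works on a string that already holds the page. The function name extract_translation() and the sample markup are made up for illustration and are not from the original post. It also swaps the lookarounds for a single capture group, which should yield the same text in $matches[1], and assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper (not part of the original script): pull the translated
// text out of a result page that has already been read into a string.
function extract_translation(string $html): ?string
{
    // Literal markup that sits right before the translation on the page.
    $before = '<td bgcolor=white class=s><div style=padding:10px;>';

    // preg_quote() escapes regex metacharacters in the literal; '#' is the delimiter.
    // The trailing "s" flag lets "." match newlines in multi-line translations.
    $pattern = '#' . preg_quote($before, '#') . '(.*?)</div>#s';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // group 1 = the text between the two markers
    }
    return null;
}

// Sample markup modelled on the snippet above (an assumption, not a live page).
$sample = '<tr><td bgcolor=white class=s><div style=padding:10px;>Wie heißen Sie?</div></td></tr>';
echo extract_translation($sample) ?? 'no translation found';
```

Returning null on a miss lets the caller tell "no match" apart from an empty translation.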

I appreciate the response!  Is there any way you can break it down as to what it means?  I grasp that the #'s on each end mark where the expression starts and stops, but why does the first group start with (?<= while the last one starts with just (?= ?  Also, what does the center term that yields the results mean?  Does (.*?) return all matches?  I guess I don't understand the roles of the operators in this function.

Thanks again!
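
A short breakdown may help here. The # characters are only delimiters. (?<=...) is a lookbehind: it requires the given text to sit immediately before the match. (?=...) is a lookahead: it requires the given text to sit immediately after. Neither one becomes part of the match. In the middle, . matches any character except a newline, * means "zero or more", and the trailing ? makes it lazy, so it stops at the first closing marker instead of the last. It does not mean "all matches": preg_match() only ever returns the first one. The sketch below tries each piece on a made-up sample string (the string and variable names are assumptions for illustration):

```php
<?php
$text = '<b>one</b> and <b>two</b>';

// Lookarounds: "<b>" must come right before the match and "</b>" right after.
// Neither is consumed, so $lazy[0] (whole match) and $lazy[1] (group 1) are the same.
preg_match('#(?<=<b>)(.*?)(?=</b>)#', $text, $lazy);
print_r($lazy);

// Without the "?", .* is greedy and runs on to the LAST "</b>" in the string.
preg_match('#(?<=<b>)(.*)(?=</b>)#', $text, $greedy);
print_r($greedy);

// preg_match() stops after the first match; preg_match_all() collects every match.
preg_match_all('#(?<=<b>)(.*?)(?=</b>)#', $text, $all);
print_r($all[1]);
```

The lazy version captures only "one", the greedy version captures everything up to the final </b>, and preg_match_all() is the function to reach for when every occurrence is wanted.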
