
Counting the number of unique letters in a text file


3 replies to this topic

#1 vidyashankara

  • Members
  • Advanced Member
  • 75 posts

Posted 07 June 2006 - 10:29 PM

I am stumped with this one. I have the following text file:

Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9

The script must check whether A exists and add 1, then check whether B exists and add 1 again. In the above case the total is 2.

If the file has 100 rows with A, 100 more with B, 100 more with C, and 100 more with D, it must say 4, no matter how many times each letter repeats.

Is there a function that does that?



#2 poirot

  • Members
  • Advanced Member
  • 646 posts
  • LocationAustin, TX

Posted 07 June 2006 - 10:51 PM

<?php

// Sets the string to search
$str =
'Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9';

// If you want to retrieve the values from a file, use this:
// $str = file_get_contents('file.txt');

// Capture the letters that follow "Atom" at the start of each line (m = multiline)
preg_match_all('/^Atom\s+([A-Za-z]+)/m', $str, $m);

$matches = array_unique($m[1]);
sort($matches);

echo 'There are ' . count($matches) . ' unique atoms, and they are: ' . implode(", ", $matches);
?>

Which will produce:
There are 2 unique atoms, and they are: A, B

Atom labels may contain alphabetic characters only. Everything after the label is ignored.

OFF-TOPIC: Chemistry reminds me of one of my biggest failures in high school. I wanted to get to the International Chemistry Olympiad, but failed even to win a medal in my country's national olympiad.
~ D Kuang
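
A small sketch of what "alpha characters only" means in practice, added for illustration. The function name `unique_atom_labels` and the sample lines are made up here and are not part of the post above; the pattern assumes each record starts its line with `Atom`.

```php
<?php

// Hypothetical helper (not from the thread): returns the sorted,
// de-duplicated labels that follow "Atom" at the start of each line.
function unique_atom_labels($text)
{
    // With the m flag, ^ anchors at every line start. The capture group
    // accepts letters only, so matching stops at the first non-letter.
    preg_match_all('/^Atom\s+([A-Za-z]+)/m', $text, $m);

    $labels = array_unique($m[1]);
    sort($labels);

    return $labels;
}

// Made-up sample: "Atom 7" has no letters and is skipped,
// and "Atom B2" is captured as "B" because the digit ends the label.
$sample = "Atom A 65.1\nAtom Ca 12.0\nAtom A 65.0\nAtom 7 1.0\nAtom B2 3.3";

$labels = unique_atom_labels($sample);
echo count($labels) . ': ' . implode(', ', $labels) . "\n"; // prints "3: A, B, Ca"
```

If labels can contain digits, the character class would have to be widened, for example to `[A-Za-z0-9]+`.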

#3 vidyashankara


Posted 07 June 2006 - 11:10 PM


That's exactly the output I am looking for, but what if the file has this format?
ATOM   1518  CA  SER  A 228         8.050  59.549  36.083  1.00133.17           C  
ATOM   1509  NZ  LYS   A 226         8.799  52.838  30.855  1.00185.44           N  
ATOM   1510  N    VAL   A  227        6.540  60.967  32.439  1.00185.44           N  
ATOM   1511  CA  VAL   A  227        6.596  61.986  33.485  1.00185.44           C  
ATOM   1512  C    VAL   A 227         7.460  61.548  34.678  1.00185.44           C  
ATOM   2678  CA  TYR B 166      84.798   4.913  43.573  1.00110.37           C  
ATOM   2679  C   TYR B 166      83.466   4.230  43.723  1.00110.37           C  
ATOM   2680  O   TYR B 166      82.569   4.716  44.405  1.00110.37           O  
ATOM   2681  CB  TYR B 166      85.058   5.370  42.152  1.00110.37           C  
ATOM   2682  CG  TYR B 166      86.540   5.592  41.920  1.00110.37           C  
ATOM   2683  CD1 TYR B 166      87.211   6.635  42.558  1.00110.37           C  
ATOM   2684  CD2 TYR B 166      87.278   4.742  41.096  1.00110.37           C  
ATOM   2685  CE1 TYR B 166      88.578   6.831  42.384  1.00110.37           C  

The 2nd, 3rd and 4th columns can be anything. The script should read based on the 1st and 5th columns.


#4 poirot


Posted 08 June 2006 - 12:46 AM

The following will do:

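<?php

// Placeholder input: paste your data here. The columns must be
// whitespace-separated and in the same order on every line.
$str = 'BE SURE THE ORDER IS CORRECT';

// If you want to retrieve the values from a file, use this:
// $str = file_get_contents('file.txt');

$matches = array();
$fifth = array();

// Split on any line ending (\r\n, \n or \r)
foreach (preg_split('/\R/', $str) as $line) {
   // Split the line into whitespace-separated columns
   $cols = preg_split('/\s+/', trim($line));

   // Skip lines that have fewer than five columns or do not start with ATOM
   if (count($cols) < 5 || strcasecmp($cols[0], 'ATOM') !== 0) {
      continue;
   }

   $matches[] = $cols;
   $fifth[] = $cols[4];
}

$fifth = array_unique($fifth);
sort($fifth);

echo '<pre>';
echo 'All Matches: ' . "\n";
print_r($matches);
echo '</pre>';

echo 'There are ' . count($fifth) . ' unique atoms, and they are: ' . implode(", ", $fifth);

?>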

$fifth holds the fifth column of each parsed line.
$matches is a two-dimensional array holding every column of every parsed line, which makes the data easier to work with.
~ D Kuang
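
For large files, the same idea can be sketched without loading everything into one string. This is only an illustration: the function name `unique_column_values` and its parameters are hypothetical, and it assumes the columns are separated by whitespace, as in the sample above. Real PDB files are fixed-width, so neighbouring columns can run together (as `1.00133.17` does in the sample); that does not affect the fifth column here, but it is worth checking against your own data.

```php
<?php

// Hypothetical helper (not from the thread): collects the distinct values
// found in one whitespace-separated column of a text file.
// $column is zero-based, so 4 means the fifth column.
function unique_column_values($path, $column = 4, $record = 'ATOM')
{
    if (!is_readable($path)) {
        return array();
    }

    $seen = array();
    $handle = fopen($path, 'r');

    // Read one line at a time so the whole file is never held in memory.
    while (($line = fgets($handle)) !== false) {
        $cols = preg_split('/\s+/', trim($line));

        // Skip blank lines, short lines and other record types.
        if (count($cols) <= $column || strcasecmp($cols[0], $record) !== 0) {
            continue;
        }

        // Array keys are unique, so repeated values collapse automatically.
        $seen[$cols[$column]] = true;
    }
    fclose($handle);

    // PHP turns numeric-looking keys into integers; cast them back to strings.
    $values = array_map('strval', array_keys($seen));
    sort($values, SORT_STRING);

    return $values;
}

// Usage sketch ('file.txt' is the placeholder file name used earlier in the thread):
// $chains = unique_column_values('file.txt');
// echo 'There are ' . count($chains) . ' unique atoms, and they are: ' . implode(', ', $chains);
```

Using array keys as a set avoids building a full list of every row before calling array_unique(), which matters only when the file is large.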



