vidyashankara

Counting the number of alphabets in a text file..

I am stumped with this one.... I have the following text file.

[code]
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
[/code]

The script must check whether A exists and add 1, then check whether B exists and add 1 again. In the above case the total is 2.

If there are 100 rows with A, 100 more with B, 100 more with C, and 100 more with D, it must say 4, no matter how many times each one repeats.

Any function to do that?

[code]<?php

// Sets the string to search
$str =
'Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom A 65.1
Atom A 65.0
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9
Atom B 65.9
Atom B 87.9';

// If you want to retrieve the values from a file, use this:
// $str = file_get_contents('file.txt');

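// Capture the letters that follow "Atom" at the start of each line
// (the /m modifier makes ^ match at every line start)
preg_match_all('/^Atom\s+([A-Za-z]+)/m', $str, $m);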

$matches = array_unique($m[1]);
sort($matches);

echo 'There are ' . count($matches) . ' unique atoms, and they are: ' . implode(", ", $matches);
?>[/code]

Which will produce:
There are 2 unique atoms, and they are: A, B

Atom names may contain letters only. Everything after the name on each line is ignored.
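As a rough sketch of the same idea wrapped in a reusable function: the name `uniqueAtomNames` is made up for illustration and is not part of the post above, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns the sorted, distinct
// names that follow the word "Atom" at the start of each line.
function uniqueAtomNames(string $text): array
{
    // /m makes ^ match at the start of every line; \s+ tolerates tabs or
    // several spaces between "Atom" and the name.
    preg_match_all('/^Atom\s+([A-Za-z]+)/m', $text, $m);

    // $m[0] holds the full matches ("Atom A"), $m[1] only the names ("A").
    $names = array_unique($m[1]);
    sort($names); // sort() also reindexes the array from 0
    return $names;
}

$sample = "Atom A 65.1\nAtom A 65.0\nAtom B 65.9\nAtom B 87.9\n";
$names = uniqueAtomNames($sample);
echo count($names) . ' unique atoms: ' . implode(', ', $names) . "\n";
```

Because only the captured group in `$m[1]` is deduplicated, the numbers at the end of each line never affect the count.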

OFF-TOPIC: Chemistry reminds me of one of my biggest failures in high school. I wanted to get to the International Chemistry Olympiad, but failed even to get a medal in my country's national olympiad.

[quote]QUOTE(poirot @ Jun 7 2006, 06:51 PM)
Which will produce:
There are 2 unique atoms, and they are: A, B
[/quote]

That's the exact output I am looking for... but what if the file has this format?
[code]
ATOM   1518  CA  SER  A 228         8.050  59.549  36.083  1.00133.17           C  
ATOM   1509  NZ  LYS   A 226         8.799  52.838  30.855  1.00185.44           N  
ATOM   1510  N    VAL   A  227        6.540  60.967  32.439  1.00185.44           N  
ATOM   1511  CA  VAL   A  227        6.596  61.986  33.485  1.00185.44           C  
ATOM   1512  C    VAL   A 227         7.460  61.548  34.678  1.00185.44           C  
ATOM   2678  CA  TYR B 166      84.798   4.913  43.573  1.00110.37           C  
ATOM   2679  C   TYR B 166      83.466   4.230  43.723  1.00110.37           C  
ATOM   2680  O   TYR B 166      82.569   4.716  44.405  1.00110.37           O  
ATOM   2681  CB  TYR B 166      85.058   5.370  42.152  1.00110.37           C  
ATOM   2682  CG  TYR B 166      86.540   5.592  41.920  1.00110.37           C  
ATOM   2683  CD1 TYR B 166      87.211   6.635  42.558  1.00110.37           C  
ATOM   2684  CD2 TYR B 166      87.278   4.742  41.096  1.00110.37           C  
ATOM   2685  CE1 TYR B 166      88.578   6.831  42.384  1.00110.37           C  
[/code]

The 2nd, 3rd and 4th columns can be anything. The script should read based on the 1st and 5th columns.

The following will do:

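[code]<?php

// Placeholder: put the file contents here and be sure the column order is correct
$str = 'BE SURE THE ORDER IS CORRECT';

// If you want to retrieve the values from a file, use this:
// $str = file_get_contents('file.txt');

// Split on any line ending (\r\n, \r or \n)
$array = preg_split('/\r\n|\r|\n/', $str);

$matches = array();
$fifth = array();

foreach ($array as $line) {
   // Split the line into whitespace-separated columns
   $columns = preg_split('/\s+/', trim($line));

   // Skip lines that are not ATOM records or have fewer than five columns
   if (count($columns) < 5 || $columns[0] !== 'ATOM') {
      continue;
   }

   $matches[] = $columns;
   $fifth[] = $columns[4];
}

$fifth = array_unique($fifth);
sort($fifth);

echo '<pre>';
echo 'All Matches: ' . "\n";
print_r($matches);
echo '</pre>';

echo 'There are ' . count($fifth) . ' unique atoms, and they are: ' . implode(", ", $fifth);

?>[/code]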

$fifth will contain the fifth item of each line.
$matches is a multidimensional array with all the matches (easier to handle).
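
For larger files, the same first-and-fifth-column rule could be applied one line at a time instead of collecting every match. This is only a sketch under assumptions: the function name `uniqueFifthColumn` is hypothetical, the type declarations assume PHP 7.1 or later, and columns are assumed to be separated by whitespace as in the sample above.

```php
<?php
// Hypothetical helper (not from the thread): collects the distinct values of
// the fifth whitespace-separated column from lines whose first column is "ATOM".
function uniqueFifthColumn(iterable $lines): array
{
    $seen = [];
    foreach ($lines as $line) {
        $columns = preg_split('/\s+/', trim($line));
        if (count($columns) >= 5 && $columns[0] === 'ATOM') {
            $seen[$columns[4]] = true; // array keys act as a set
        }
    }
    // Numeric-looking keys are stored as integers, so cast back to strings.
    $values = array_map('strval', array_keys($seen));
    sort($values);
    return $values;
}

// Two lines taken from the sample data in the question.
$lines = [
    'ATOM   1518  CA  SER  A 228         8.050  59.549  36.083  1.00133.17           C  ',
    'ATOM   2678  CA  TYR B 166      84.798   4.913  43.573  1.00110.37           C  ',
];
$values = uniqueFifthColumn($lines);
echo count($values) . ' unique values: ' . implode(', ', $values) . "\n";
```

Since the function accepts any iterable, passing `new SplFileObject('file.txt')` (with `file.txt` as a placeholder path) would read the file line by line rather than loading it all into memory.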
