
[SOLVED] Help with pulling out info from a txt file



Hi guys,

 

I hope someone will be able to help me with this one. I'm a complete newbie when it comes to regular expressions. After searching Google up and down for tutorials, guides and manuals for several hours, I am still somewhat blank on what to do (I have tried numerous times and failed :\).

 

Okay, so here goes. I have an exported .txt file with a bunch of info in it (it can't be exported in any other format than the one shown below, so this will have to do). As you can see, the lines are formatted fairly consistently.

 

Please see the sc_export.txt example for reference.

1. I will do a search for "ABC05234 "

2. This should return the stuff from that line in two arrays

- array1 = Text text

- array2 = 47 37 83 93

 

And that is basically it. As you can see, the first array will always have the text part in it and the second the numbers (phone numbers). Please bear in mind that a search for, say, "ABC10239" would return a more complex line to pull apart.

 

 

 

Example of the text file to search in (File: sc_export.txt):

-------------------------------------------------------

ABC02938 Text 3939 4940

ABC05234 Text text 47 37 83 93

ABC10239 Text-text 39483948

ABC10239 Text.text +44 3943 283948

 

....data continues with more similar lines. However, the number of spaces between the fields can vary.

-------------------------------------------------------

 

I thank all of you for your help in advance.

 

Christoffer

Probably not the best regex, but my take:

 

/(\w*\d*) ([a-zA-Z]*.[a-zA-Z]*) (.*)/

 

you can change the first part to the actual ABC123 if you are searching for 1 particular line like so:

 

/ABC02938 ([a-zA-Z]*.[a-zA-Z]*) (.*)/

 

<?php
$example[] = "ABC02938 Text 3939 4940";
$example[] = "ABC05234 Text text 47 37 83 93";
$example[] = "ABC10239 Text-text 39483948";
$example[] = "ABC10239 Text.text +44 3943 283948";

foreach($example as $string) {
   echo "$string<br/>";
   preg_match("/(\w*\d*) ([a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);
   echo "<pre>";
   print_r($result);
   echo "</pre><br />";
}
?>

 

output:

 

ABC02938 Text 3939 4940

Array
(
    [0] => ABC02938 Text 3939 4940
    [1] => ABC02938
    [2] => Text 
    [3] => 3939 4940
)


ABC05234 Text text 47 37 83 93

Array
(
    [0] => ABC05234 Text text 47 37 83 93
    [1] => ABC05234
    [2] => Text text
    [3] =>  47 37 83 93
)


ABC10239 Text-text 39483948

Array
(
    [0] => ABC10239 Text-text 39483948
    [1] => ABC10239
    [2] => Text-text
    [3] =>  39483948
)


ABC10239 Text.text +44 3943 283948

Array
(
    [0] => ABC10239 Text.text +44 3943 283948
    [1] => ABC10239
    [2] => Text.text
    [3] =>  +44 3943 283948
)


Do you know what.. That looks amazing..  ;D

Almost what I was looking for. I think you misunderstood me slightly there.. I only need to pull out one line: the line that matches the search.

 

The example could be that i search for the line "ABC05234" which could be stated in for example $search_query. It finds this line:

ABC05234 Text text 47 37 83 93

 

And then returns:

Array
(
    [0] => ABC05234 Text text 47 37 83 93
    [1] => ABC05234
    [2] => Text text
    [3] =>  47 37 83 93
)

 

... Also the info would have to be pulled out from the txt file.. So the function should import the data in the txt file and search in that..

I edited my post because I thought maybe you were looking for 1 line returned, but you were already in thread 

 

$search = "ABC02938";

preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);

 

 

... Also the info would have to be pulled out from the txt file.. So the function should import the data in the txt file and search in that..

 

for that you would loop through each line of the text file.  You can for instance use fgets to read the file line by line or file to put all the lines into an array and loop through each array element. 
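As a rough sketch of the file() variant: the snippet below assumes the export looks like the sample lines in the first post and writes two of them to a temporary file so it can run on its own. find_line() is a made-up helper name for illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper: return the first line of $path that starts
// with $search, or null if there is none.
function find_line(string $path, string $search): ?string
{
    // file() reads the whole file into an array, one element per line.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return null; // the file could not be read
    }
    foreach ($lines as $line) {
        if (strpos($line, $search) === 0) {
            return $line;
        }
    }
    return null;
}

// Sample data copied from the first post, written to a temp file for the demo.
$path = tempnam(sys_get_temp_dir(), 'sc_');
file_put_contents($path, "ABC02938 Text 3939 4940\nABC05234 Text text 47 37 83 93\n");

$found = find_line($path, "ABC05234");
echo $found, "\n";
unlink($path);
```

file() is the simpler of the two when the export is small enough to hold in memory; fgets() reads one line at a time and suits larger files.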

I am not quite sure how to do this.

 

I really appreciate your help. Would you be kind enough to provide an example of how this might be put into work with the rest of the code? Bearing in mind that i only want the array containing the result from the search.

 

Thank you so much.

Yeah, sorry. I did however try to implement file input, which only returns an error:

Warning: Invalid argument supplied for foreach()

 

<?php
$search = "ABC10239";

$handle = @fopen("sc_export.txt", "r");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);

foreach($buffer as $string) {
   echo "$string<br/>";
   preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);
   echo "<pre>";
   print_r($result);
   echo "</pre><br />";
}
    }
    
    fclose($handle);
}
?>

 

Also, I noticed that with the test file, which has only those 4 lines in it, the script takes quite some time to load.. So I'm afraid that it will completely stall when I use the real file with several hundred lines in it...

$buffer is not an array; it's a string containing the info of the current line in your file from the while loop.  You don't need to use a foreach loop with fgets.  You already have a loop there: the while loop.  Get rid of the foreach loop and preg_match $buffer.
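A rough sketch of that fix follows. The temporary file only stands in for sc_export.txt so the snippet runs by itself, and preg_quote() is an extra that is not in the code above; it is there in case the search string ever contains regex characters.

```php
<?php
// Stand-in for sc_export.txt so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'sc_');
file_put_contents($path, "ABC02938 Text 3939 4940\nABC10239 Text-text 39483948\n");

$search = "ABC10239";
$result = [];

$handle = fopen($path, "r");
if ($handle) {
    // fgets() returns one line per call and false at end of file,
    // so the while loop is the only loop needed.
    while (($buffer = fgets($handle, 4096)) !== false) {
        $pattern = "/(" . preg_quote($search, "/") . ") ([a-zA-Z]*.[a-zA-Z]*) (.*)/";
        if (preg_match($pattern, $buffer, $result)) {
            break; // stop at the first matching line
        }
    }
    fclose($handle);
}
unlink($path);

print_r($result);
```

Testing the return value of fgets() directly, rather than feof(), also avoids running preg_match on the false that fgets() returns once the file is exhausted.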

Ah yes of course.. :)

 

Okay so this is what i've got so far which works. Two things though.

 

Firstly: I want the script to recognize which line returns a real match and then put each part of that result into its own variable. I don't get how to do that.

 

Secondly: I noticed that if I put a third word of text into the line, it ends up in the 3rd array instead of the 2nd. Is there no way to make it ignore spaces and just capture everything that starts with letters in the 2nd array?

 

The code so far:

<?php
$search = "ABC11923";
$handle = @fopen("sc_export.txt", "r");
if ($handle) {
    while (!feof($handle)) {
        $string = fgets($handle, 4096);
   preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);
   echo "<pre>";
   print_r($result);
   echo "</pre><br />";
   
    }
    fclose($handle);
}
?>

 

Returns:

Array
(
)


Array
(
)


Array
(
)


Array
(
    [0] => ABC11923 Fiiiz KEla dag +43 323332 3234 33
    [1] => ABC11923
    [2] => Fiiiz KEla
    [3] => dag +43 323332 3234 33
)

Firstly: I want the script to recognize which line returns a real match and then put each part of that result into its own variable. I don't get how to do that.

Nevermind this.. It is of course obvious how to do it.. Sorry for my stupidity :)

 

Okay so this is how the code now looks, which works smoothly:

<?php
$search = "ABC11923";
$handle = @fopen("sc_export.txt", "r");
if ($handle) {
    while (!feof($handle)) {
        $string = fgets($handle, 4096);
            }
    fclose($handle);
}
   preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);
echo $result[0] . "<br>";
   echo $result[1] . "<br>";
   echo $result[2] . "<br>";
   echo $result[3];
?>

 

Output:

ABC11923 Fiiiz KEla dag +43 323332 3234 33
ABC11923
Fiiiz KEla
dag +43 323332 3234 33

 

So back to the important second question i was asking about:

I noticed that if I put a third word of text into the line, it ends up in the 3rd array instead of the 2nd. Is there no way to make it ignore spaces and just capture everything that starts with letters in the 2nd array?

Okay well I know if you do this:

 

preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*.[a-zA-Z]*) (.*)/",$string,$result);

 

It will match up to 3 words instead of 2.  I'm not that great at regex either though, so if you want it to handle any number of words without having to keep adding extra .[a-zA-Z]* in there... I'm not sure how to do that.  I tried different variations of

 

preg_match("/(" . $search . ") ([a-zA-Z]*.)* (.*)/",$string,$result);

 

but so far no dice.  Maybe someone more versed in regex will come along in the mean time...

Alright yeah you can surely do it like that.. And then just add the maximum number i would guess it could contain.

 

I did however make a couple of changes to the code i posted just before as it was not giving the correct output (just posting for historic reference).

 

Code (including the new changes):

$search = "ABC11923";
$handle = @fopen("sc_export.txt", "r");
if ($handle) {
    while (!feof($handle)) {
        $string = fgets($handle, 4096);
        // Only print when the current line matches the search.
        if (preg_match("/(" . $search . ") ([a-zA-Z]*.[a-zA-Z]*.[a-zA-Z]*) (.*)/", $string, $result)) {
            $check_return = "true";
            echo $result[0] . "<br>";
            echo $result[1] . "<br>";
            echo $result[2] . "<br>";
            echo $result[3];
        }
    }
    fclose($handle);
}

Output being:

ABC11923 Fiiiz KEla dag +43 323332 3234 33
ABC11923
Fiiiz KEla dag
+43 323332 3234 33

 

This does exactly what i was looking for.

I will leave this topic open for some time in case anyone comes along with a more sophisticated solution to the regex pattern  :)

 

But you have been a great help. Thank you so much !  :D

okay I got it!

 

preg_match("/(" . $search . ") ([a-zA-Z].*[a-zA-Z]) (.*)/",$string,$result);

 

I decided who cares about trying to match variable amounts of words and - or . or spaces.  Overall, it starts with a letter and ends with a letter, right?  so just grab everything from first letter of first word to last letter of last word and move on.
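A quick sketch of why this works, using the sample line from earlier in the thread. The `.*` in the middle is greedy, so the second group runs up to the last letter that is followed by a space. The `\s+` separators, the `^`/`$` anchors and the preg_quote() call are additions that are not in the pattern above; they are there to tolerate runs of spaces and odd characters in the search string.

```php
<?php
$search = "ABC11923";
// Same idea as the pattern above, with \s+ between the fields.
$pattern = '/^(' . preg_quote($search, '/') . ')\s+([a-zA-Z].*[a-zA-Z])\s+(.*)$/';

// Sample line from the thread, plus a variant with extra spaces.
$single = "ABC11923 Fiiiz KEla dag +43 323332 3234 33";
$padded = "ABC11923   Fiiiz KEla   +43 323332";

preg_match($pattern, $single, $a);
preg_match($pattern, $padded, $b);

echo $a[2], " | ", $a[3], "\n";
echo $b[2], " | ", $b[3], "\n";
```

Two caveats with this approach: the text part needs at least two letters to match, and if the number part itself ever contained a letter followed by a space, the greedy `.*` would pull it into the text group.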

Hehe it was nagging you wasn't it :P

 

Well.. Yeah I think so - almost sure. Honestly I haven't got the exported file with me at the moment. I will however have access to it on Monday. So I will be looking into it there.

 

It is working like a charm now so I won't be messing more with it until Monday when I will be able to use it in production with the real txt file.

Hey.. So i'm checking it out with the real data file just now. And it works like a charm. It's pretty fast too (i feared it being slow with so much data to search through)...

The only problem is that some of the text contains foreign letters that are not in the English alphabet, so it won't collect that text.

 

Example:

ABC09988 Køtæv Blah 5544 5410

 

... These letters Ø and Æ aren't collected as letters. Hence it will stop already at the K and figure the rest is not letters.

 

 

How the **** do i get around this??  ???

Note: I tried discarding the whole file import function and just doing $test_string = "ABC09988 Køtæv Blah 5544 5410"; and running the preg_match on that... Still the same problem. It doesn't recognize the letters as letters, I guess......  :'(

 

I tried the pattern below in the hope it would then include those letters. But I guess PHP doesn't recognize them, because that doesn't work either:

([a-zA-Zæøå].*[a-zA-Zæøå])

 

Do you know what.. I'm a fool.. The test actually did return those letters.. Don't know what i was doing to think i didn't....

 

So it is actually my ajax script, which transports the text back to another page, that messes up.. I'll have a look at this.. Don't mind any of the above :P
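For anyone landing here with the same worry: the pattern above only needs the text part to start and end with a-z, which is why "Køtæv Blah" matched after all. A name that starts with Ø would not. One way around that, assuming the text is UTF-8 (an assumption about the export file; other encodings would need converting first), is the Unicode property `\p{L}` together with the `u` modifier. A small sketch with a made-up sample line:

```php
<?php
// Assumes this script and the data are both UTF-8 encoded.
// The sample line is invented for illustration.
$line   = "ABC09988 Østæv Blah 5544 5410";
$search = "ABC09988";

// ASCII-only version: fails because the text part starts with Ø.
$asciiOnly = preg_match('/^(ABC09988)\s+([a-zA-Z].*[a-zA-Z])\s+(.*)$/', $line);

// \p{L} matches any Unicode letter; the u modifier switches PCRE to UTF-8 mode.
$pattern = '/^(' . preg_quote($search, '/') . ')\s+(\p{L}.*\p{L})\s+(.*)$/u';
if (preg_match($pattern, $line, $result)) {
    echo $result[2], "\n";
    echo $result[3], "\n";
}
```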
