Jump to content

[SOLVED] Import data from text file


lindm

Recommended Posts

Hi all.

 

Got some help here some months ago with a script to import data from a text file according to following:

 

Part of text file:

#UB	0	1930	100000
#UB	0	1940	200000
#RES	0	6620	300000
#RES	-1	6620	400000

The code extracts the number to the right depending on the three first identifiers: #XX, 0 or -1, four digit number. The final code allows for the sum of a range of numbers to be returned:


function calculate_range($cat, $bool, $range_min, $range_max)
{
    
$contents = file($_FILES['userfile']['tmp_name']);
    foreach($contents as $line_value)
    {
        list($_cat, $_bool, $_id, $num) = explode("\t", $line_value);

        $_cat = str_replace('#', '', $_cat);

        $number[$_cat][$_bool][$_id] = trim($num);
    }

    $total = 0;
    for($i = $range_min; $i <= $range_max; $i++)
    {
        if(isset($number[$cat][$bool][$i]))
        {
            $total += $number[$cat][$bool][$i];
        }
    }

    return $total;

}

 

Example: calculate_range('UB', 0, 1930, 1940) would return 300000 from the data above.

 

Now my problem. I have now found that the files I try to import come with various delimiters. The code is written for a tab based delimiter I believe but I want it also to work for a single blank space delimiter. Been fiddeling around with the code but a no go so far. Help appreciated.

 

Text file with single space delimiter:

#RES 0 3016 -10711941.97
#RES 0 3031 -23125.00
#RES 0 3080 -1685803.07
#RES 0 3740 3.46
#RES 0 4027 1280448.49
#RES 0 4610 2833530.00
#RES 0 4620 197317.72
#RES 0 5010 28944.00

 

 

 

Link to comment
Share on other sites

Try:

 

function calculate_range($cat, $bool, $range_min, $range_max, $tab = TRUE)

 

Then do inside the function:

 

$split_by = $tab ? "\t" : " ";

 

Then change:

 

list($_cat, $_bool, $_id, $num) = explode("\t", $line_value);

 

To:

 

list($_cat, $_bool, $_id, $num) = explode($split_by, $line_value);

 

And if you're splitting by just a single space, when calling your function:

 

calculate_range($cat, $bool, $range_min, $range_max, FALSE);

 

Add false as your last parameter.

 

If you're doing by the tab, just keep it:

 

calculate_range($cat, $bool, $range_min, $range_max);

Link to comment
Share on other sites

Alternatively, replace this line:

list($_cat, $_bool, $_id, $num) = explode("\t", $line_value);

 

With:

list($_cat, $_bool, $_id, $num) = preg_split("/(\t| )/",$line_value);

 

That way the function wont require an extra parameter, and wont care if the file is space or tab delimited.

Link to comment
Share on other sites

preg_split seems to be working but now to another problem. Sometimes large text files are used and only a part of this file contains the needed data.

The text files normally contains loads of this (no interest for the function):

{
#TRANS	2890	{	}	-1080.00	20070408	""	
#TRANS	7330	{	}	1080.00	20070408	""	
}
#VER	A	"88"	20070410	"ver 87"	20070410 
{
#TRANS	2890	{	}	1080.00	20070410	""	
#TRANS	1930	{	}	-1080.00	20070410	""	
}
#VER	A	"89"	20070410	"Kort Mars"	20070410 
{
#TRANS	1930	{	}	-8025.00	20070410	""	

and then at the end or beginning follows the interesting part.

 

Furthermore the calculate_range($cat, $bool, $range_min, $range_max) function is called I guess 200 times when executing my whole php page. This seems to be causing an timeout (30 sec) on the server. Is the above code efficient?

 

 

 

Link to comment
Share on other sites

Two things occur to me. Firstly, does the string '#RES' ever occur outside of the part of the text file you are interested in? If not, i suggest you read the file as a string, and find out where the first and last occurances of this are, swe we're not dealing with lots of redundant data.

 

Second. You say your function may get called around 200 times. Why is that? Is that because you are are working out totals for 200 different categories? If so, then your code is very inefficient The first part of the function is being repeated unnecessarily. The part that generates the array of values only needs to happen once, it is just the totalling which needs to be repeated.

Link to comment
Share on other sites

Think I got it faster now. I separated the variable from the function it it is sooo fast now.

$contents = file($_FILES['userfile']['tmp_name']); 

foreach($contents as $line_value)
{

list($cat, $int, $id, $num) = preg_split("/(\t| )/",$line_value);

    $cat = str_replace('#', '', $cat);

    $number[$cat][$int][$id] = trim($num);
}
}
function calculate_range($cat, $bool, $range_min, $range_max)
{
    global $number;

    $total = 0;

    for($i = $range_min; $i <= $range_max; $i++)
    {
        if(isset($number[$cat][$bool][$i]))
        {
            $total += $number[$cat][$bool][$i];
        }
    }

    return $total;
}

 

Still curious about the filter though...

 

Link to comment
Share on other sites

This thread is more than a year old. Please don't revive it unless you have something important to add.

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.