chadrt Posted July 8, 2007 Share Posted July 8, 2007 I am hoping I have the forum correct it was a toss up. I am looking to parse out what I believe to be a flat file. I want to load this file into a MySQL database it seems to be very dificult to read a lot of lines multiple rows and very time consuming. I want to put it in a MySQL so I can create indexs that will make the querys much faster. Thanks for any help you may be able to provide me. Here is the location of the file http://status.irlp.net/nohtmlstatus.txt Chad Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/ Share on other sites More sharing options...
Yesideez Posted July 8, 2007 Share Posted July 8, 2007 Wow, lots of data! I've copied and pasted the lot into my text editor and can see that you've got all the columns separated by tabs. What I'd do is replace all these tab characters with a character that isn't used, like the bar (|) symbol. Next create a table in your database that will hold the data. I'd write a PHP script to read the file in then loop through each line exploding it. That would separate each piece of data into a long array which you can then easily add into the database. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-292424 Share on other sites More sharing options...
Yesideez Posted July 8, 2007 Share Posted July 8, 2007 I've replaced all the tabs for you with a | symbol and uploaded it here for you: http://www.pictureinthesky.net/external/nohtmlstatus.txt Let me know when you've saved it and I'll delete it. Using PHP, in a loop use on the file to explode the contents into an array: $array=explode('|',$row); $array is the array containing the line $row. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-292432 Share on other sites More sharing options...
chadrt Posted July 9, 2007 Author Share Posted July 9, 2007 Changing the tabs to pipes seams logical. Using that I could easily load the file into mysql using BASH shell scripts that I created for something else I have already done. But the problem is that I have to laod the data from that file every couple minutes. That file is used to determine connection status of amateur radios. we dont have access to the actual database that does that but that file is produced every three minutes so it is semi accurate. My thoguht was to possibly load that into a mysql database. I need figure out what each of the colums are VAR, INT ect. and then I will create it but I tried to load it into a database once already but I had a heck of time. I thought it was TAB delimited but not being able to change it with a script is very dificult. If there was a way to do a replace on the fly or a way to use a mysql statement that would load it as is into a database that would be great. Like the one below. delete from AM; load data local infile './AM.dat' into table AM fields terminated by '|'; That would be an ideal situation to work with. I could run that in a shell script every couple minutes and it would only around 10 seconds to run the database would be fairly easy to keep updated. Chad Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-292951 Share on other sites More sharing options...
Yesideez Posted July 9, 2007 Share Posted July 9, 2007 You can change it with a PHP script. Inside a loop, run the following: $withbars=strreplace("\r","",$withtabs); Run that on each line and you'll do the same as I have done - I converted them to bars using my text editor for speed. Now you've said you're getting this list so often it may also be a good idea to look at how you may want to enter this data. One option would be to use FTP and upload it each time but the easiest way would be to write a script to upload it instead, then parse the data and populate a database with the results. Two questions: 1. Do you need to keep the old data you uploaded previously? 2. If you need to keep it is the data to be kept separate or appended onto the old? Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-293217 Share on other sites More sharing options...
chadrt Posted July 9, 2007 Author Share Posted July 9, 2007 I have written some basic shell scripts in the past to handle some of the tasks you speak of. The idea I came up with was a small shell script that looks like the following. #/bin/bash cd ~/irlp wget http://status.irlp.net/nohtmlstatus.txt time /usr/bin/mysql -u DBUSER -pDBPASS --local-infile irlp < load.sql rm nohtmlstatus.txt That little snippet run by cron would do exactly what I need it to do. But I am not sure how to word load.sql to handle the tab delimited file or if it is even capable of handling it. Will it freeze up because of the trailing data on the first line of the flat file or will it ignore that because it there is not field defined for it. I cant get it to recognize tab delimiters. The snippet in my previous posting was what I tried before only I tried putting things like "T" or "t" even "tab" and "TAB" for the field termination. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-293522 Share on other sites More sharing options...
Wildbug Posted July 9, 2007 Share Posted July 9, 2007 Here's a little command line awk+sed routine to convert that file into a typical MySQL "dump" file, suitable for importing like "mysql dbname < dumpfile.sql". I only captured a few lines in the nohtmlstatus.txt file for this example, but you get the idea. This can be used in a cron/sh script. $ awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%d,%d,%d,%d,%d,%d,\"%s\",\"%s\",%d),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' nohtmlstatus.txt | sed '$s/,$/;/' > output.sql $ cat output.sql INSERT INTO tbl VALUES (1000,"VE7RHS","Vancouver","BC","Canada","IDLE",1000,"1998-11-12",49,-123,1184001135,145,-600,100,"ve7ltd","http://www.ars.ams.ubc.ca",1184000779), (1010,"VE7RHS","Vancouver","BC","Canada","IDLE",1010,"2005-12-30",49,-123,1184000812,441,5000,0,"VE7LTD","http://www.ars.ams.ubc.ca",1183947442), (1015,"VE7RNV","North Vancouver","BC","Canada","IDLE",1015,"2003-07-08",49,-123,1184000963,444,5000,0,"VE7TLH","http://www.nsarc.ca",1183835505), (1020,"VE7RNA","Chemainus","BC","Canada","OFFLINE",1020,"2001-01-20",48,-123,1184001029,146,-600,0,"VE7FRG","http://www.cvars.com",1183305560), (1030,"VE7VIC","Victoria","BC","Canada","IDLE",1030,"2001-05-15",48,-123,1184001051,146,-600,100,"VE7BEU","http://www.ve7vic.ca",1183686731); Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-293605 Share on other sites More sharing options...
chadrt Posted July 10, 2007 Author Share Posted July 10, 2007 Could that be used to in some way to convert a FFD with specified column lengths. To a dump like file? Here is a link to the file: http://205.236.99.41/~indicatif/download/amateur.zip within that zip file is a file called amateur.rpt Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-294841 Share on other sites More sharing options...
chadrt Posted July 11, 2007 Author Share Posted July 11, 2007 That works so fast I thought something was wrong. Thank you. It will work very nicely. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-294850 Share on other sites More sharing options...
chadrt Posted July 11, 2007 Author Share Posted July 11, 2007 It is a real pitty we cant just edit our previous posts... Wildbug, I have a couple of questions regarding this method that you posted. Once I try to load it into a database there are problems. Here is the error itself. ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '_blank"",1184113474), (2018,"VE2REH","Gatineau","QC","Canada","IDLE",2018,"2002-' at line 93 s15242893:/irlp # ls . .. irlp.sh nohtmlstatus.txt output.sql s15242893:/irlp # I have tried everything I can think of to make it work. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-294984 Share on other sites More sharing options...
Wildbug Posted July 11, 2007 Share Posted July 11, 2007 2009 VE9LC Saint John -EchoIRLP #285237 NB Canada IDLE 2009 2005-07-23 45.30333 -66.07316 1184001810 147.2700 600.0000 0.00 VE9NPS http://www.lcarc.ca/ TARGET="_blank" 1183941823 O That's the line generating the error. It's the double quotes ("_blank") that are causing the problem; they need to be escaped. Here's a solution: awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{gsub(/"/,"\\\"");printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%d,%d,%d,%d,%d,%d,\"%s\",\"%s\",%d),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' nohtmlstatus.txt | sed '$s/,$/;/' > output.sql The gsub() function will replace any " with \". You may be able to write the whole command in one line without using intermediary files. mysql db -u user -ppass < \ wget -qO - http://status.irlp.net/nohtmlstatus.txt | \ awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{gsub(/"/,"\\\"");printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%d,%d,%d,%d,%d,%d,\"%s\",\"%s\",%d),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' nohtmlstatus.txt | \ sed '$s/,$/;/' Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-295424 Share on other sites More sharing options...
Wildbug Posted July 11, 2007 Share Posted July 11, 2007 Correction to last code example in previous post: mysql db -u user -ppass < \ wget -qO - http://status.irlp.net/nohtmlstatus.txt | \ awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{gsub(/"/,"\\\"");printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%d,%d,%d,%d,%d,%d,\"%s\",\"%s\",%d),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' | \ sed '$s/,$/;/' Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-295600 Share on other sites More sharing options...
chadrt Posted July 15, 2007 Author Share Posted July 15, 2007 Here is what I have so far I am using it in a short script so a one liner is not necessary but what I keep getting is this Your script line: awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{gsub(/"/,"\\\"");printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%d,%d,%d,%d,%d,%d,\"%s\",\"%s\",%d),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' nohtmlstatus.txt | sed '$s/,$/;/' > output.sql My Output: ERROR 1136 (21S01) at line 1: Column count doesn't match value count at row 1 I am going to check my table and be sure that I havent messed up something. Just out of curiosity would using the wrong column "type" cause that problem. I am not that proficieint so I used a bunch of VARCHAR fields to create the table. Table SQL: CREATE TABLE `IRLP` ( `node` tinyint(5) NOT NULL default '0', `callsign` varchar( NOT NULL default '', `city` varchar(50) NOT NULL default '', `prov_state` varchar(25) NOT NULL default '', `country` varchar(25) NOT NULL default '', `status` varchar(20) NOT NULL default '', `record` tinyint(6) NOT NULL default '0', `install_date` varchar(10) NOT NULL default '', `lat` varchar(15) NOT NULL default '', `long` varchar(15) NOT NULL default '', `lastupdate` varchar(10) NOT NULL default '', `freq` varchar(25) NOT NULL default '', `offset` varchar(10) NOT NULL default '', `pl` varchar(20) NOT NULL default '', `owner` varchar(75) NOT NULL default '', `url` varchar(100) NOT NULL default '', `last_status` varchar(25) NOT NULL default '', `avrs_status` varchar(5) NOT NULL default '', PRIMARY KEY (`node`), KEY `node` (`node`), KEY `callsign` (`callsign`), KEY `prov_state` (`prov_state`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-298972 Share on other sites More sharing options...
chadrt Posted July 16, 2007 Author Share Posted July 16, 2007 Actually I think I found the problem but I am not sure how exactly how to fix it. There are 18 fields but only 17 in the conversion script. I could fix it but I am not sure what each field should be, like wether to put quotes or not. The dates and such are what I need to know about I was not sure. Does having a date hyphenated mean that it should be quoted or is it still a number filed that doesnt need to be quoted. node callsign city prov_state country status record install_date lat long lastupdate freq offset pl owner url last_status avrs_status I am trying to understand how that line works so that I am learning it rather than having someone else do it for me all the time. Would not do me or any one here any good if I come here to simply have others write the code for me. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-299156 Share on other sites More sharing options...
Wildbug Posted July 16, 2007 Share Posted July 16, 2007 You're right; I missed the last column. Updated awk line only: awk 'BEGIN{FS="\t+";getline; print "INSERT INTO tbl VALUES"}{gsub(/"/,"\\\"");printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%.6f,%.6f,%d,%.6f,%.6f,%.2f,\"%s\",\"%s\",%d,\"%s\"),\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18}' nohtmlstatus.txt I looked at your table definition, and here is a suggested replacement: CREATE TABLE `IRLP` ( `node` SMALLINT UNSIGNED NOT NULL DEFAULT 0, `callsign` VARCHAR( NOT NULL DEFAULT '', `city` VARCHAR(64) NOT NULL DEFAULT '', `prov_state` VARCHAR(64) NOT NULL DEFAULT '', `country` VARCHAR(64) NOT NULL DEFAULT '', `status` VARCHAR(20) NOT NULL DEFAULT '', `record` SMALLINT UNSIGNED NOT NULL DEFAULT 0, `install_date` DATE NOT NULL DEFAULT '0000-00-00', `lat` FLOAT(9) NOT NULL DEFAULT '', `long` FLOAT(9) NOT NULL DEFAULT '', `lastupdate` INT UNSIGNED NOT NULL DEFAULT 0, `freq` FLOAT(9) NOT NULL DEFAULT 0, `offset` FLOAT(9) NOT NULL DEFAULT 0, `pl` DECIMAL(5,2) NOT NULL DEFAULT 0, `owner` VARCHAR(75) NOT NULL DEFAULT '', `url` VARCHAR(255) NOT NULL DEFAULT '', `last_status` INT UNSIGNED NOT NULL DEFAULT 0, `avrs_status` VARCHAR(5) NOT NULL DEFAULT '' PRIMARY KEY (`node`), KEY `node` (`node`), KEY `callsign` (`callsign`), KEY `prov_state` (`prov_state`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; I mostly left VARCHAR lengths alone, but changed the column types to best suit the data. In some cases your integer columns were not large enough to hold the data (as in TINYINT on the first column can only hold integers up to 128 in SIGNED mode; with SMALLINT UNSIGNED you'll be good up to 65535). With the proper column types you'll be able to better query your data. I didn't test the CREATE TABLE statement, so let me know if there are any errors I missed. As far as the awk code, I just learned it myself. Awk is a UNIX program that reads a file line-by-line, assumes the data is in columns, and can perform functions on this data before outputting it. Statements consist of a regular expression and an action. Lines matching the regex get the action. Omitting either regex or action means "all lines" or "print the line," respectively. Two special regexes are BEGIN and END, and they mean to perform the action before or after reading the file, respectively. BEGIN { # Before reading anything, do this FS="\t+"; # Change the Field Separator to one or more tabs getline; # Read in the first line (header) and discard it print "INSERT INTO tbl VALUES" # Print the beginning of our SQL statement } { # No regex; do this for every line gsub(/"/,"\\\"") # Perform a global substitution, replacing " with \" # $0 represents the entire line; $1 represents the first field, $2 second, etc. # printf is just like printf in C or PHP printf "(%d,\"%s\",\"%s\",\"%s\",\"%s\",\"%s\",%d,\"%s\",%.6f,%.6f,%d,%.6f,%.6f,%.2f,\"%s\",\"%s\",%d,\"%s\"),\n", $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18 } sed is a UNIX command line stream editor (sed); I used it to replace the last comma from the last line with a semi-colon. sed '$s/,$/;/' The first $ means "the last line only." ,$ is a regular expression matching a terminating comma. ; is the replacement. Quote Link to comment https://forums.phpfreaks.com/topic/58919-parsing-a-flat-file-to-load-into-mysql/#findComment-299522 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.