knar Posted July 7, 2011

I made a PHP/MySQL database site for the sandbox MMO Earthrise (www.play-earthrise.com), but manually entering hundreds of items, quests, and NPCs would take an insane amount of time. What I would like to do is write a script that extracts the data from the game files and puts it all into the database automatically. This project was OK'd by the makers of Earthrise, so it's not an attempt to get help with an exploit or anything else illegal. I know it would involve regular expressions, but I have no experience using them, or with automatically entering large amounts of data into a DB.

Here is one example of the Lua file, for the Parts. Parts can be recycled into raw resources, and each part turns into the set amount of each resource listed. The columns in the table are partname, description, r1, r2, r3, r4, r5, r6, c1, c2, c3, c4, c5, c6. r1-r6 are the resource names (example: zirconium) and c1-c6 are the counts, i.e. how many of that resource you get (example: 39). Some parts give only 1 type of resource, while others give 6, so the script needs to work out how many resources a part gives instead of looking for 6 every time. I would also need a way to replace underscores "_" with a space " " and remove the escape slashes "\" from the descriptions.

This is one entry from the Parts file. r1 would be zirconium, r2=nanoplastic, and r3=regenerative plasteel. c1=39, c2=14, c3=5.

    item_part 'Servo-Control' ||
    pretty_name 'Servo-Control'
    cost 81
    weight 297
    max_stack_size 5
    resources ||
    object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
    count 39
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
    count 14
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
    count 5
    end
    icon 'Icons\\Items\\Parts\\Part_Servo-Control'
    description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
    end

Any help would be greatly appreciated. I would prefer to be shown how to do it myself so I can set this up for each type of entry, but I wouldn't refuse if someone wrote some code.

Link to comment https://forums.phpfreaks.com/topic/241291-text-mining-data-from-lua-files-into-mysql-database/
btherl Posted July 7, 2011

A reliable way to parse something like that would be a loop with a few flags to remember where it is. Something like this:

    $in_item = false;
    $in_resources = false;
    # Compare against false explicitly so a line containing only "0" doesn't end the loop
    while (($line = fgets($input_fp)) !== false) {
        $line = trim($line);
        if ($line == '') continue;
        if ($in_item) {
            if (strpos($line, 'pretty_name') === 0) {
                $pretty_name = str_replace("pretty_name ", "", $line); # Note - doesn't remove quotes
            }
            if ( ... )
        }
        if ($in_resources) {
            if (strpos($line, "object ") === 0) {
                $resources[] = ...
            }
        }
        if (we're at the end of the resources list) {
            # Store the resources
        }
        if (we're at the end of an item) {
            # Store item
        }
    }

I've been a bit general there as I'm not sure of the exact file structure. But the basic idea is to remember what you're doing, and then handle each line as it comes according to what you are currently doing. The other key part is to trigger storing of each item/resource/whatever at the appropriate times.

Googling for "php lua parser" also showed up some results, but I don't know what they are capable of parsing or whether they apply to your situation.
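For anyone following along, here is one way the outline above could be filled in for the sample entry. It is only a sketch under some assumptions: one statement per line, "||" opens a block, "end ||" separates resources, and a bare "end" closes the resource list or the item. The names parseParts and unquote and the shape of the returned array are made up for illustration.

```php
<?php
// Sample entry, one statement per line (assumed layout).
// A nowdoc keeps the backslashes exactly as they would appear in the file.
$sample = <<<'LUA'
item_part 'Servo-Control' ||
pretty_name 'Servo-Control'
cost 81
weight 297
max_stack_size 5
resources ||
object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
count 39
end ||
object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
count 14
end ||
object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
count 5
end
icon 'Icons\\Items\\Parts\\Part_Servo-Control'
description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
end
LUA;

// Hypothetical helper: drop the surrounding quotes, then unescape \' and \\.
function unquote($s) {
    return stripslashes(trim($s, "'"));
}

// Hypothetical function: returns one array per item_part entry.
function parseParts(array $lines) {
    $items = [];
    $item = null;           // entry being read, or null when outside an item
    $in_resources = false;  // true between "resources ||" and its bare "end"

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') continue;
        $pieces = explode(' ', $line, 2);
        $word = $pieces[0];
        $rest = $pieces[1] ?? '';

        if ($item === null) {
            if ($word === 'item_part') {          // an item starts here
                $item = ['partname' => unquote(rtrim($rest, ' |')),
                         'description' => '', 'resources' => []];
            }
        } elseif ($in_resources) {
            if ($word === 'object') {
                // Keep only the part after the "?" and swap underscores for spaces
                $name = substr(strrchr(unquote($rest), '?'), 1);
                $item['resources'][] = ['name' => str_replace('_', ' ', $name), 'count' => 0];
            } elseif ($word === 'count' && $item['resources']) {
                $item['resources'][count($item['resources']) - 1]['count'] = (int) $rest;
            } elseif ($line === 'end') {          // "end ||" only separates resources
                $in_resources = false;
            }
        } elseif ($line === 'resources ||') {
            $in_resources = true;
        } elseif ($word === 'description') {
            $item['description'] = unquote($rest);
        } elseif ($line === 'end') {              // end of the item: store it
            $items[] = $item;
            $item = null;
        }
    }
    return $items;
}

$items = parseParts(explode("\n", $sample));
print_r($items);
```

The flags start out false and are switched on only when the opening line of an item or resource list is seen, then switched off again at the matching "end". Because resources are collected in a list, a part with one resource and a part with six go through the same code.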
knar (Author) Posted July 7, 2011

    if (strpos($line, 'pretty_name') === 0) {
        $pretty_name = str_replace("pretty_name ", "", $line); # Note - doesn't remove quotes
    }

So the code above will grab just the text following "pretty_name" and put it in the variable $line? Would I then assign $line as array[0] for partname, array[1] for description and so on, then put that all into the database at the end of the loop?

I'm also confused because you define $in_resources and $in_item as false, then do "if ($in_item)". Wouldn't it always be false and never do the strpos($line, 'pretty_name') stuff?
xyph Posted July 7, 2011

I think this is complex enough to use regex. I use free-spacing mode, so keep that in mind in my examples.

First I'd match item_part 'Whatever' using

    item_part\ '[^']+'\s+

It matches item_part literally, then a space (I have to use '\ ' due to free-spacing), then a quote, then as many characters as it can that aren't a quote, followed by another quote. \s represents a space, line break, tab, etc., so \s+ will take care of any whitespace between the important stuff.

Will keep editing this as I go.
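A quick way to try that pattern out, as a minimal sketch. The capture group around the name and the two-line test string are additions for illustration; the x modifier is what turns on free-spacing.

```php
<?php
// x = free-spacing: unescaped whitespace inside the pattern is ignored,
// so the literal space after item_part has to be written as "\ ".
$expr = '/item_part\ \'([^\']+)\'\s+/x';

// Two lines in the style of the sample entry
$text = "item_part 'Servo-Control' ||\npretty_name 'Servo-Control'\n";

if (preg_match($expr, $text, $m)) {
    echo $m[1], "\n";   // the text between the quotes
}
```

Without the x modifier the same pattern still works, since an escaped space matches a space either way; the modifier only matters once the pattern is spread out with extra whitespace for readability.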
xyph Posted July 7, 2011

Whoops, gonna go about this a different way now. Btherl's solution is good.

I'm going to use the following regex to break your Lua up.

    ^\t*([a-z_]+)
    (?:
        (?:\ (?:(?:'([^']+)')|([0-9]+))){0,1}
        (?:\s*(\|\|)){0,1}
    )

So using the code

    <?php
    $expr = '/^\t*([a-z_]+) (?: (?:\ (?:(?:\'([^\']+)\')|([0-9]+))){0,1} (?:\s*(\|\|)){0,1} )/mx';
    preg_match_all( $expr, getFile(), $matches, PREG_SET_ORDER );
    print_r( $matches );

    function getFile() {
    return <<<HEREDOC
    item_part 'Servo-Control' ||
    pretty_name 'Servo-Control'
    cost 81
    weight 297
    max_stack_size 5
    resources ||
    object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
    count 39
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
    count 14
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
    count 5
    end
    icon 'Icons\\Items\\Parts\\Part_Servo-Control'
    description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
    end
    HEREDOC;
    }
    ?>

I get the following

    Array (
        [0] => Array ( [0] => item_part 'Servo-Control' || [1] => item_part [2] => Servo-Control [3] => [4] => || )
        [1] => Array ( [0] => pretty_name 'Servo-Control' [1] => pretty_name [2] => Servo-Control )
        [2] => Array ( [0] => cost 81 [1] => cost [2] => [3] => 81 )
        [3] => Array ( [0] => weight 297 [1] => weight [2] => [3] => 297 )
        [4] => Array ( [0] => max_stack_size 5 [1] => max_stack_size [2] => [3] => 5 )
        [5] => Array ( [0] => resources || [1] => resources [2] => [3] => [4] => || )
        [6] => Array ( [0] => object 'Items\Ingredients\Ingredients.lua?Zirconium' [1] => object [2] => Items\Ingredients\Ingredients.lua?Zirconium )
        [7] => Array ( [0] => count 39 [1] => count [2] => [3] => 39 )
        [8] => Array ( [0] => end || [1] => end [2] => [3] => [4] => || )
        [9] => Array ( [0] => object 'Items\Ingredients\Ingredients.lua?Nanoplastic' [1] => object [2] => Items\Ingredients\Ingredients.lua?Nanoplastic )
        [10] => Array ( [0] => count 14 [1] => count [2] => [3] => 14 )
        [11] => Array ( [0] => end || [1] => end [2] => [3] => [4] => || )
        [12] => Array ( [0] => object 'Items\Ingredients\Ingredients.lua?Regenerative_Plasteel' [1] => object [2] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel )
        [13] => Array ( [0] => count 5 [1] => count [2] => [3] => 5 )
        [14] => Array ( [0] => end [1] => end )
        [15] => Array ( [0] => icon 'Icons\Items\Parts\Part_Servo-Control' [1] => icon [2] => Icons\Items\Parts\Part_Servo-Control )
        [16] => Array ( [0] => description 'Complex control mechanism used to maintain the battlesuit\' [1] => description [2] => Complex control mechanism used to maintain the battlesuit\ )
        [17] => Array ( [0] => end [1] => end )
    )

My next code will be how to parse this array!
xyph Posted July 7, 2011

The above regex should be

    $expr = '/^\t*([a-z_]+) (?: (?:\ (?:(?:\'((?:(?!\\\\\')[^\']|\\\\\')+)\')|([0-9.]+))){0,1} (?:\s*(\|\|)){0,1} )/mx';

This treats an escaped quote (\') as part of the string instead of as its end, and allows for decimals in numeric fields.

Working on your function now. Sorry about the delays, had a visitor to the office.
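To see what the escaped-quote handling changes, here is a small comparison on the description line from the sample entry. It is a sketch that isolates only the quoted-string part of the pattern; the variable names are made up, and the patterns are written in double-quoted PHP strings, so the escaping looks different from the single-quoted version above.

```php
<?php
// One line from the sample entry; inside double quotes, \\ is a single backslash.
$line = "description 'Complex control mechanism used to maintain the battlesuit\\'s auxiliary movements.'";

// Simple version: stops at the first quote it sees, even an escaped one.
$simple = "/'([^']+)'/";

// Escape-aware version: a backslash-quote pair is consumed as one unit,
// and any other character is accepted only if it is not a quote.
$aware = "/'((?:(?!\\\\')[^']|\\\\')+)'/";

preg_match($simple, $line, $a);
preg_match($aware, $line, $b);

echo $a[1], "\n";                 // cut off at the backslash
echo stripslashes($b[1]), "\n";   // full text, with the escape slash removed
```

stripslashes() is the standard PHP function for undoing this kind of escaping, which covers the "remove the escape slashes from the descriptions" part of the original question.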
knar (Author) Posted July 7, 2011

Wow, thanks a lot... regular expressions are beyond me, it seems. I'm not sure if this makes a difference in how the code would be written, but the table columns are in this order: "partname, description, r1, c1, r2, c2, r3, c3, r4, c4, r5, c5, r6, c6"
xyph Posted July 7, 2011

That's going to be for you to do. My script will simply take a Lua file and give you an array. Parsing that array and putting it into your query will be the part I'll help you with.
xyph Posted July 8, 2011

Well, I've parsed it, and it ends up with a huge array. It's manageable though. It assumes properly formatted Lua.

    <?php
    $expr = '/^\t*([a-z_]+) (?: (?:\ (?:(?:\'((?:(?!\\\\\')[^\']|\\\\\')+)\')|([0-9.]+))){0,1} (?:\s*(\|\|)){0,1} )/mx';
    preg_match_all( $expr, getFile(), $matches, PREG_SET_ORDER );
    // print_r( $matches );
    print_r( matchesToTree($matches) );

    // $matches format
    // 1 - element name, 2 - string value, 3 - int value, 4 - subgroup identifier
    function matchesToTree( $matches, $current=0 ) {
        $r = array();
        $total = count($matches);
        $key = 0;
        for( $i = $current; $i < $total; $i++ ) {
            // A bare "end" inside a nested call closes the current subgroup
            if( $matches[$i][1] == 'end' && !isset($matches[$i][4]) && $current != 0 ) {
                return array($r, $i);
            }
            // Compare against '' rather than using empty(), so a value of 0 is kept
            if( isset($matches[$i][3]) && $matches[$i][3] !== '' )
                $value = $matches[$i][3];
            elseif( isset($matches[$i][2]) && $matches[$i][2] !== '' )
                $value = $matches[$i][2];
            else
                $value = '';
            if( $matches[$i][1] != 'end' ) {
                $r[$key] = array($matches[$i][1] => $value);
                // "||" after the element means a subgroup follows
                if( isset($matches[$i][4]) ) {
                    $children = matchesToTree( $matches, $i+1 );
                    $r[$key]['_children'] = $children[0];
                    $i = $children[1];
                }
                $key++;
            }
        }
        return $r;
    }

    function getFile() {
    return <<<HEREDOC
    item_part 'Servo-Control' ||
    pretty_name 'Servo-Control'
    cost 81
    weight 297
    max_stack_size 5
    resources ||
    object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
    count 39
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
    count 14
    end ||
    object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
    count 5
    end
    icon 'Icons\\Items\\Parts\\Part_Servo-Control'
    description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
    end
    HEREDOC;
    }
    ?>

Creates

    Array (
        [0] => Array (
            [item_part] => Servo-Control
            [_children] => Array (
                [0] => Array ( [pretty_name] => Servo-Control )
                [1] => Array ( [cost] => 81 )
                [2] => Array ( [weight] => 297 )
                [3] => Array ( [max_stack_size] => 5 )
                [4] => Array (
                    [resources] =>
                    [_children] => Array (
                        [0] => Array ( [object] => Items\Ingredients\Ingredients.lua?Zirconium )
                        [1] => Array ( [count] => 39 )
                        [2] => Array ( [object] => Items\Ingredients\Ingredients.lua?Nanoplastic )
                        [3] => Array ( [count] => 14 )
                        [4] => Array ( [object] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel )
                        [5] => Array ( [count] => 5 )
                    )
                )
                [5] => Array ( [icon] => Icons\Items\Parts\Part_Servo-Control )
                [6] => Array ( [description] => Complex control mechanism used to maintain the battlesuit\'s auxiliary movements. )
            )
        )
    )

I'd like it to produce

    Array (
        [0] => Array (
            [item_part] => Servo-Control
            [_children] => Array (
                [0] => Array ( [pretty_name] => Servo-Control )
                [1] => Array ( [cost] => 81 )
                [2] => Array ( [weight] => 297 )
                [3] => Array ( [max_stack_size] => 5 )
                [4] => Array (
                    [resources] =>
                    [_children] => Array (
                        [0] => Array ( [object] => Items\Ingredients\Ingredients.lua?Zirconium [count] => 39 )
                        [1] => Array ( [object] => Items\Ingredients\Ingredients.lua?Nanoplastic [count] => 14 )
                        [2] => Array ( [object] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel [count] => 5 )
                    )
                )
                [5] => Array ( [icon] => Icons\Items\Parts\Part_Servo-Control )
                [6] => Array ( [description] => Complex control mechanism used to maintain the battlesuit\'s auxiliary movements. )
            )
        )
    )

But that requires some look-ahead that the function doesn't account for. This should be enough to get it working though.
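One way to get from the first shape to the second without changing the parser is a small post-processing pass over the resources' _children list. This is only a sketch: pairResources is a hypothetical name, and it assumes every count follows the object it belongs to, as in the sample entry.

```php
<?php
// Hypothetical helper: merges alternating [object] / [count] entries
// into one array per resource.
function pairResources(array $children) {
    $paired = [];
    foreach ($children as $child) {
        if (isset($child['object'])) {
            // An object entry starts a new resource
            $paired[] = ['object' => $child['object']];
        } elseif (isset($child['count']) && $paired) {
            // A count entry is attached to the most recent resource
            $paired[count($paired) - 1]['count'] = $child['count'];
        }
    }
    return $paired;
}

// The resources' _children list in the shape the parser produces
$flat = [
    ['object' => 'Items\Ingredients\Ingredients.lua?Zirconium'],
    ['count'  => '39'],
    ['object' => 'Items\Ingredients\Ingredients.lua?Nanoplastic'],
    ['count'  => '14'],
    ['object' => 'Items\Ingredients\Ingredients.lua?Regenerative_Plasteel'],
    ['count'  => '5'],
];

print_r(pairResources($flat));
```

Doing the merge afterwards instead of inside the parser keeps the recursive function generic, so other entry types whose blocks are not object/count pairs are left untouched.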
knar (Author) Posted July 10, 2011

Thanks again xyph, now I just have to figure out how to use those arrays.
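As a rough idea of how those arrays could be used, the sketch below turns one item from the tree into a flat row in the column order given earlier (partname, description, r1, c1, ... r6, c6) and builds a parameterised INSERT for it. The function name itemToRow and the table name parts are assumptions, and the database call itself is left as a comment because it needs a live connection.

```php
<?php
// Hypothetical helper: flatten one item of the tree into a table row.
function itemToRow(array $item) {
    $row = ['partname' => $item['item_part'], 'description' => ''];
    for ($n = 1; $n <= 6; $n++) {      // unused resource slots stay NULL
        $row["r$n"] = null;
        $row["c$n"] = null;
    }
    $n = 0;
    foreach ($item['_children'] as $child) {
        if (isset($child['description'])) {
            $row['description'] = stripslashes($child['description']);
        } elseif (array_key_exists('resources', $child)) {
            foreach ($child['_children'] as $res) {
                if (isset($res['object'])) {
                    $n++;
                    // Keep the text after "?" and replace underscores with spaces
                    $name = substr(strrchr($res['object'], '?'), 1);
                    $row["r$n"] = str_replace('_', ' ', $name);
                } elseif (isset($res['count']) && $n > 0) {
                    $row["c$n"] = (int) $res['count'];
                }
            }
        }
    }
    return $row;
}

// One item in the shape produced by matchesToTree() (trimmed to the fields used)
$item = [
    'item_part' => 'Servo-Control',
    '_children' => [
        ['pretty_name' => 'Servo-Control'],
        ['resources' => '', '_children' => [
            ['object' => 'Items\Ingredients\Ingredients.lua?Zirconium'],
            ['count' => '39'],
            ['object' => 'Items\Ingredients\Ingredients.lua?Nanoplastic'],
            ['count' => '14'],
            ['object' => 'Items\Ingredients\Ingredients.lua?Regenerative_Plasteel'],
            ['count' => '5'],
        ]],
        ['description' => "Complex control mechanism used to maintain the battlesuit\\'s auxiliary movements."],
    ],
];

$row = itemToRow($item);
$cols = implode(', ', array_keys($row));
$marks = implode(', ', array_fill(0, count($row), '?'));
$sql = "INSERT INTO parts ($cols) VALUES ($marks)";
echo $sql, "\n";

// With a PDO connection in $pdo (not created here), the row could be stored with:
// $pdo->prepare($sql)->execute(array_values($row));
```

Using placeholders rather than pasting values into the SQL string means the apostrophe in the description needs no manual escaping on the way into MySQL.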
Archived
This topic is now archived and is closed to further replies.