
"text mining" data from LUA files into MySQL database


knar


I made a PHP/MySQL database site for the sandbox MMO Earthrise (www.play-earthrise.com), but manually entering hundreds of items/quests/NPCs would take an insane amount of time. What I would like to do is write a script, or otherwise extract the data, and then put it all into a database automatically. This project was OK'd by the makers of Earthrise, so it's not an attempt to get help with an exploit or anything else illegal. I know it would involve regular expressions, but I have no experience using them or automatically entering large amounts of data into a DB.

 

Here is one example of the LUA file, for the Parts. Parts can be recycled into raw resources, and each part turns into the set amount of each resource listed. The columns in the table are partname, description, r1, r2, r3, r4, r5, r6, c1, c2, c3, c4, c5, c6. r1-r6 are resource name (example: zirconium) and c1-c6 are the count or number of that resource you get (example: 39). Some parts only give 1 type of resource, while others give 6, so it needs to identify how many resources it gives instead of looking for 6 every time. I would also need a way to replace underscores "_" with a space " " and remove the escape slashes "\" from the descriptions.

 

This is one entry from the Parts file. r1 would be zirconium, r2=nanoplastic, and r3=regenerative plasteel. c1=39, c2=14, c3=5.

 

item_part 'Servo-Control'
||
pretty_name 'Servo-Control'
cost 81
weight 297
max_stack_size 5

resources
||
	object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
	count 39
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
	count 14
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
	count 5
end

icon 'Icons\\Items\\Parts\\Part_Servo-Control'
description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
end

 

Any help would be greatly appreciated. I would prefer to be shown how to do it myself so I can set this up for each type of entry, but I wouldn't refuse if someone wrote some code  ;).


A reliable way to parse something like that would be a loop with a few flags to remember where it is.  Something like this:

 

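$in_item = false;       # True once an "item_part" line has been seen
$in_resources = false;  # True while inside the "resources" list
$items = array();
$input_fp = fopen('Parts.lua', 'r'); # File name is a guess, use your own

while (($line = fgets($input_fp)) !== false) {
  $line = trim($line);
  if ($line == '') continue;

  if (!$in_item) {
    if (strpos($line, 'item_part ') === 0) {
      $in_item = true; # This is where the flag gets switched on
      $item = array('resources' => array());
    }
    continue;
  }

  if ($in_resources) {
    if ($line == '||') {
      $resource = array(); # "||" opens the next resource
      continue;
    }
    if ($resource !== null) {
      if (strpos($line, 'object ') === 0) {
        $resource['object'] = str_replace('object ', '', $line);
      } elseif (strpos($line, 'count ') === 0) {
        $resource['count'] = (int) str_replace('count ', '', $line);
      } elseif ($line == 'end') {
        $item['resources'][] = $resource; # Store the resource
        $resource = null;
      }
      continue;
    }
    $in_resources = false; # Anything else means the resources list is over
  }

  if ($line == 'resources') {
    $in_resources = true;
    $resource = null;
  } elseif (strpos($line, 'pretty_name ') === 0) {
    $item['pretty_name'] = str_replace('pretty_name ', '', $line); # Note - doesn't remove quotes
  } elseif ($line == 'end') {
    $items[] = $item; # Store item
    $in_item = false;
  }
}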

 

I've been a bit general there as I'm not sure of the exact file structure.  But the basic idea is to remember what you're doing, and then handle each line as it comes according to what you are currently doing.  The other key part is to trigger storing of each item/resource/whatever at the appropriate times.

 

Googling for "php lua parser" also showed up some results, but I don't know what they are capable of parsing and if they apply to your situation.
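
For the clean-up steps asked about in the first post (surrounding quotes, underscores, escape slashes), a small helper along these lines might do. This is only a sketch: the function names cleanValue and resourceName are made up for illustration, and it assumes a resource path always ends in ?Name as in the sample entry.

```php
<?php
// Hypothetical helper: strip the surrounding single quotes from a raw value
// and undo backslash escapes such as \' and \\.
function cleanValue($raw) {
    $raw = trim($raw);
    if (strlen($raw) >= 2 && $raw[0] === "'" && substr($raw, -1) === "'") {
        $raw = substr($raw, 1, -1);
    }
    return stripslashes($raw);
}

// Hypothetical helper: turn an object path into a readable resource name.
// Assumes the name is whatever follows the last "?" in the path.
function resourceName($path) {
    $pos = strrpos($path, '?');
    $name = ($pos === false) ? $path : substr($path, $pos + 1);
    return str_replace('_', ' ', $name);
}

// Values as they appear in the file (backslashes doubled for PHP's benefit)
echo cleanValue("'Complex control mechanism used to maintain the battlesuit\\'s auxiliary movements.'"), "\n";
echo resourceName(cleanValue("'Items\\\\Ingredients\\\\Ingredients.lua?Regenerative_Plasteel'")), "\n";
```

Both helpers work on one value at a time, so they can be called at the point where a line's value is stored, whichever parsing approach is used.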


 if (strpos($line, 'pretty_name') === 0) {
      $pretty_name = str_replace("pretty_name ", "", $line); # Note - doesn't remove quotes
    }

So the code above will grab just the text following "pretty_name" and put it in the variable $line? Would I then assign $line as array[0] for partname, array[1] for description and so on, then put that all into the database at the end of the loop?

 

I'm also confused because you define $in_resources and $in_item as false, then do "if ($in_item)". Wouldn't it always be false and never do the strpos($line, 'pretty_name') stuff?


I think this is complex enough to use Regex

 

I use free-spacing mode, so keep that in mind in my examples.

 

First I'd match item_part 'Whatever' using

item_part\ '[^']+'\s+

It matches item_part literally, then a space (I have to write '\ ' because of free-spacing), then a quote, then as many characters as possible that aren't a quote, followed by another quote. \s matches any whitespace character (space, line break, tab, etc.), so \s+ takes care of any whitespace between the important stuff.
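
A quick way to try that pattern is to hand it to preg_match with the x (free-spacing) modifier. This is just a sketch: the capture group around the name is an addition so the name can be pulled out, and the sample string is copied from the first post.

```php
<?php
// Free-spacing (x) mode ignores unescaped whitespace in the pattern,
// which is why the literal space is written as "\ ".
// The parentheses around [^']+ are an addition to capture the part name.
$pattern = "/item_part\\ '([^']+)'\\s+/x";

$sample = "item_part 'Servo-Control'\n||\npretty_name 'Servo-Control'\n";

if (preg_match($pattern, $sample, $m)) {
    echo $m[1], "\n"; // Servo-Control
}
```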

 

 

Will keep editing this as I go

 

 


Whoops, gonna go about this a different way now.

 

Btherl's solution is good. I'm going to use the following RegEx to break your LUA up.

 

^\t*([a-z_]+)
(?:
(?:\ (?:(?:'([^']+)')|([0-9]+))){0,1}
(?:\s*(\|\|)){0,1}
)

 

So using the code

<?php

$expr = '/^\t*([a-z_]+)
(?:
	(?:\ (?:(?:\'([^\']+)\')|([0-9]+))){0,1}
	(?:\s*(\|\|)){0,1}
)/mx';

preg_match_all( $expr, getFile(), $matches, PREG_SET_ORDER );

print_r( $matches );

function getFile() {
return <<<HEREDOC
item_part 'Servo-Control'
||
pretty_name 'Servo-Control'
cost 81
weight 297
max_stack_size 5

resources
||
	object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
	count 39
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
	count 14
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
	count 5
end

icon 'Icons\\Items\\Parts\\Part_Servo-Control'
description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
end
HEREDOC;
}

?>

 

I get the following

 

Array
(
    [0] => Array
        (
            [0] => item_part 'Servo-Control'

||
            [1] => item_part
            [2] => Servo-Control
            [3] => 
            [4] => ||
        )

    [1] => Array
        (
            [0] => 	pretty_name 'Servo-Control'
            [1] => pretty_name
            [2] => Servo-Control
        )

    [2] => Array
        (
            [0] => 	cost 81
            [1] => cost
            [2] => 
            [3] => 81
        )

    [3] => Array
        (
            [0] => 	weight 297
            [1] => weight
            [2] => 
            [3] => 297
        )

    [4] => Array
        (
            [0] => 	max_stack_size 5
            [1] => max_stack_size
            [2] => 
            [3] => 5
        )

    [5] => Array
        (
            [0] => 	resources

||
            [1] => resources
            [2] => 
            [3] => 
            [4] => ||
        )

    [6] => Array
        (
            [0] => 		object 'Items\Ingredients\Ingredients.lua?Zirconium'
            [1] => object
            [2] => Items\Ingredients\Ingredients.lua?Zirconium
        )

    [7] => Array
        (
            [0] => 		count 39
            [1] => count
            [2] => 
            [3] => 39
        )

    [8] => Array
        (
            [0] => 	end

||
            [1] => end
            [2] => 
            [3] => 
            [4] => ||
        )

    [9] => Array
        (
            [0] => 		object 'Items\Ingredients\Ingredients.lua?Nanoplastic'
            [1] => object
            [2] => Items\Ingredients\Ingredients.lua?Nanoplastic
        )

    [10] => Array
        (
            [0] => 		count 14
            [1] => count
            [2] => 
            [3] => 14
        )

    [11] => Array
        (
            [0] => 	end

||
            [1] => end
            [2] => 
            [3] => 
            [4] => ||
        )

    [12] => Array
        (
            [0] => 		object 'Items\Ingredients\Ingredients.lua?Regenerative_Plasteel'
            [1] => object
            [2] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel
        )

    [13] => Array
        (
            [0] => 		count 5
            [1] => count
            [2] => 
            [3] => 5
        )

    [14] => Array
        (
            [0] => 	end
            [1] => end
        )

    [15] => Array
        (
            [0] => 	icon 'Icons\Items\Parts\Part_Servo-Control'
            [1] => icon
            [2] => Icons\Items\Parts\Part_Servo-Control
        )

    [16] => Array
        (
            [0] => 	description 'Complex control mechanism used to maintain the battlesuit\'
            [1] => description
            [2] => Complex control mechanism used to maintain the battlesuit\
        )

    [17] => Array
        (
            [0] => end
            [1] => end
        )

)

 

My next code will be how to parse this array!


The above regex should be

$expr = '/^\t*([a-z_]+)
(?:
	(?:\ (?:(?:\'((?:(?!\\\\\')[^\']|\\\\\')+)\')|([0-9.]+))){0,1}
	(?:\s*(\|\|)){0,1}
)/mx';

 

This will ignore escaped quotes, and allow for decimals in numeric fields.
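
To see the escaped-quote handling on its own, the quoted-string part of the pattern can be tried against the description line. A sketch only: $valuePattern is just that one fragment pulled out of the full expression, written as a nowdoc (PHP 5.3+) so the backslashes don't have to be doubled for PHP.

```php
<?php
// Only the quoted-string fragment of the full expression, as a nowdoc so the
// regex appears exactly as PCRE sees it: either a non-quote character that
// does not start a \' pair, or a \' pair itself, one or more times.
$valuePattern = <<<'REGEX'
/'((?:(?!\\')[^']|\\')+)'/
REGEX;

// The description line as it appears in the file
$line = "description 'Complex control mechanism used to maintain the battlesuit\\'s auxiliary movements.'";

if (preg_match($valuePattern, $line, $m)) {
    echo $m[1], "\n";               // captured text, backslash still in place
    echo stripslashes($m[1]), "\n"; // same text with the escape removed
}
```

The capture now runs to the real closing quote instead of stopping at the escaped one, and stripslashes() can tidy up the value afterwards.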

 

Working on your function now. Sorry about the delays, had a visitor to the office.


Wow thanks a lot...regular expressions are beyond me it seems.

 

I'm not sure if this makes a difference in how the code would be written, but the table columns are in this order "partname, description, r1, c1, r2, c2, r3, c3, r4, c4, r5, c5, r6, c6"


That's going to be for you to do. My script will simply take a LUA file and give you an array.

 

Parsing that array, and putting it into your query, will be the part I'll help you with.


Well, I've parsed it, and it ends up with a huge array. It's manageable though. It assumes properly formatted LUA.

 

<?php

$expr = '/^\t*([a-z_]+)
(?:
	(?:\ (?:(?:\'((?:(?!\\\\\')[^\']|\\\\\')+)\')|([0-9.]+))){0,1}
	(?:\s*(\|\|)){0,1}
)/mx';

preg_match_all( $expr, getFile(), $matches, PREG_SET_ORDER );

// print_r( $matches );
print_r( matchesToTree($matches) );



// $matches format
// 1 - element name, 2 - string value, 3 - int value, 4 - subgroup identifier
function matchesToTree( $matches, $current=0 ) {
$r = array(); $total = count($matches); $key = 0;
for( $i = $current; $i < $total; $i++ ) {

	if( $matches[$i][1] == 'end' && !isset($matches[$i][4]) && $current != 0 ) {
		return array($r, $i);
	}

	// Compare against '' instead of using empty(), which would also throw away a value of "0"
	if( isset($matches[$i][3]) && $matches[$i][3] !== '' ) $value = $matches[$i][3];
	elseif( isset($matches[$i][2]) && $matches[$i][2] !== '' ) $value = $matches[$i][2];
	else $value = '';

	if( $matches[$i][1] != 'end' ) {
		$r[$key] = array($matches[$i][1] => $value);

		if( isset($matches[$i][4]) && $matches[$i][1] != 'end' ) {
			$children = matchesToTree( $matches, $i+1 );
			$r[$key]['_children'] = $children[0];
			$i = $children[1];
		}
		$key++;
	}

}
return $r;
}

function getFile() {
return <<<HEREDOC
item_part 'Servo-Control'
||
pretty_name 'Servo-Control'
cost 81
weight 297
max_stack_size 5

resources
||
	object 'Items\\Ingredients\\Ingredients.lua?Zirconium'
	count 39
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Nanoplastic'
	count 14
end
||
	object 'Items\\Ingredients\\Ingredients.lua?Regenerative_Plasteel'
	count 5
end

icon 'Icons\\Items\\Parts\\Part_Servo-Control'
description 'Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.'
end

HEREDOC;
}

?>

 

Creates

 

Array
(
    [0] => Array
        (
            [item_part] => Servo-Control
            [_children] => Array
                (
                    [0] => Array
                        (
                            [pretty_name] => Servo-Control
                        )

                    [1] => Array
                        (
                            [cost] => 81
                        )

                    [2] => Array
                        (
                            [weight] => 297
                        )

                    [3] => Array
                        (
                            [max_stack_size] => 5
                        )

                    [4] => Array
                        (
                            [resources] => 
                            [_children] => Array
                                (
                                    [0] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Zirconium
                                        )

                                    [1] => Array
                                        (
                                            [count] => 39
                                        )

                                    [2] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Nanoplastic
                                        )

                                    [3] => Array
                                        (
                                            [count] => 14
                                        )

                                    [4] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel
                                        )

                                    [5] => Array
                                        (
                                            [count] => 5
                                        )

                                )

                        )

                    [5] => Array
                        (
                            [icon] => Icons\Items\Parts\Part_Servo-Control
                        )

                    [6] => Array
                        (
                            [description] => Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.
                        )

                )

        )

)

 

I'd like it to produce

 

Array
(
    [0] => Array
        (
            [item_part] => Servo-Control
            [_children] => Array
                (
                    [0] => Array
                        (
                            [pretty_name] => Servo-Control
                        )

                    [1] => Array
                        (
                            [cost] => 81
                        )

                    [2] => Array
                        (
                            [weight] => 297
                        )

                    [3] => Array
                        (
                            [max_stack_size] => 5
                        )

                    [4] => Array
                        (
                            [resources] => 
                            [_children] => Array
                                (
                                    [0] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Zirconium
                                            [count] => 39
                                        )

                                    [1] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Nanoplastic
                                            [count] => 14
                                        )

                                    [2] => Array
                                        (
                                            [object] => Items\Ingredients\Ingredients.lua?Regenerative_Plasteel
                                            [count] => 5
                                        )

                                )

                        )

                    [5] => Array
                        (
                            [icon] => Icons\Items\Parts\Part_Servo-Control
                        )

                    [6] => Array
                        (
                            [description] => Complex control mechanism used to maintain the battlesuit\'s auxiliary movements.
                        )

                )

        )

)

 

But that requires some look-ahead that the function doesn't account for. This should be enough to get it working though.
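
One way around the missing look-ahead is to post-process the tree instead: walk the children of resources and start a new entry every time an object key turns up. Below is a rough sketch of that, plus flattening one item into the column order mentioned earlier in the thread (partname, description, r1, c1, ... r6, c6). The function names pairResources and itemToRow and the table name parts are placeholders made up for this example, and it assumes every object is followed by its count.

```php
<?php
// Hypothetical helper: turn the flat object/count children into pairs.
// Assumes each "object" entry is followed by its matching "count".
function pairResources(array $children) {
    $pairs = array();
    foreach ($children as $child) {
        if (isset($child['object'])) {
            $pairs[] = array('object' => $child['object'], 'count' => null);
        } elseif (isset($child['count']) && count($pairs) > 0) {
            $pairs[count($pairs) - 1]['count'] = (int) $child['count'];
        }
    }
    return $pairs;
}

// Hypothetical helper: flatten one item from matchesToTree() into a row in
// the order partname, description, r1, c1, ... r6, c6 (unused slots are null).
function itemToRow(array $item) {
    $name = ''; $description = ''; $resources = array();
    foreach ($item['_children'] as $child) {
        if (isset($child['pretty_name'])) $name = $child['pretty_name'];
        if (isset($child['description'])) $description = stripslashes($child['description']);
        if (array_key_exists('resources', $child) && isset($child['_children'])) {
            $resources = pairResources($child['_children']);
        }
    }
    $row = array($name, $description);
    for ($i = 0; $i < 6; $i++) {
        if (isset($resources[$i])) {
            // Keep only the name after the "?" and swap underscores for spaces
            $object = $resources[$i]['object'];
            $pos = strrpos($object, '?');
            $short = ($pos === false) ? $object : substr($object, $pos + 1);
            $row[] = str_replace('_', ' ', $short);
            $row[] = $resources[$i]['count'];
        } else {
            $row[] = null;
            $row[] = null;
        }
    }
    return $row;
}

// Cut-down example item in the shape matchesToTree() returns
$item = array(
    'item_part' => 'Servo-Control',
    '_children' => array(
        array('pretty_name' => 'Servo-Control'),
        array('resources' => '', '_children' => array(
            array('object' => 'Items\Ingredients\Ingredients.lua?Zirconium'),
            array('count' => '39'),
            array('object' => 'Items\Ingredients\Ingredients.lua?Regenerative_Plasteel'),
            array('count' => '5'),
        )),
        array('description' => 'Used to maintain the battlesuit\\\'s movements.'),
    ),
);

$row = itemToRow($item);
print_r($row);

// The row lines up with a 14-placeholder prepared statement, e.g. with PDO
// (assumes a table called "parts" with exactly these 14 columns, in order):
// $stmt = $pdo->prepare('INSERT INTO parts VALUES (' . implode(',', array_fill(0, 14, '?')) . ')');
// $stmt->execute($row);
```

Parts with fewer than six resources simply leave the remaining r/c slots as null, so the same statement works for every part.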

