Dummy looking for a little help


KingOfHeart

preg_match/preg_match_all... something about these functions always trips me up when I try to understand them; there are way too many symbols. So when helping me, please try to simplify what I need to do so I can follow it.

 

 

http://tf2spreadsheet.blogspot.com/

 

I want to extract the quality, class, item, refined, alt version(notes not needed)

 

How do I echo all this data? Using file_get_contents I can get the main part of the page, but how do I split it up?

 

Please don't link me to the "http://php.net/manual/en/function.preg-match-all.php"

Too many patterns at once. Is there any way we can break it up and go one step at a time through all the symbols I need to use?


Okay, I would accept "because I'd like to try using regular expressions" as a reason, though only after I made sure you understood that a regex is not the solution to scanning an HTML page.

 

But as it so happens, regex is not a solution at all for this. The table is built entirely using JavaScript, with JSON returned from Google Docs. The closest you could get would be running the regex against the JSON, but it would be entirely ridiculous to try to construct something that would work as you'd want.

Sorry but this isn't a problem you can use to learn regular expressions.
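
If you do get at the JSON, the usual route in PHP is json_decode() and a loop, not a regex. A minimal sketch, assuming a made-up JSON shape: the field names and values below are hypothetical and are not what the Google Docs feed actually returns.

```php
<?php
// Hypothetical sample data; the real feed's structure will differ.
$json = '[{"quality":"Unique","class":"Scout","item":"Example Item","refined":"1.33"},
          {"quality":"Strange","class":"Soldier","item":"Another Item","refined":"2.66"}]';

// Turn the JSON text into one printable line per item.
function describe_items(string $json): array
{
    $rows = json_decode($json, true);   // true = decode objects as associative arrays
    if (!is_array($rows)) {
        return [];                       // not valid JSON
    }
    $lines = [];
    foreach ($rows as $row) {
        $lines[] = "{$row['quality']} {$row['item']} ({$row['class']}): {$row['refined']} refined";
    }
    return $lines;
}

echo implode("\n", describe_items($json)), "\n";
```

Once the data is an array, splitting it up is just array access, which is why a regex is the wrong tool here.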


I found a script I had used before. Now all I need is for someone to teach me exactly how it works.

 

 

/*
function bbcode2html($message) {

 $preg = array(
	  '/(?<!\\\\)\[color(?::\w+)?=(.*?)\](.*?)\[\/color(?::\w+)?\]/si'   => "<span style=\"color:\\1\">\\2</span>",
	  '/(?<!\\\\)\[size(?::\w+)?=(.*?)\](.*?)\[\/size(?::\w+)?\]/si'	 => "<span style=\"font-size:\\1pt\">\\2</span>",
	  '/(?<!\\\\)\[font(?::\w+)?=(.*?)\](.*?)\[\/font(?::\w+)?\]/si'	 => "<span style=\"font-family:\\1\">\\2</span>",
	  '/(?<!\\\\)\[align(?::\w+)?=(.*?)\](.*?)\[\/align(?::\w+)?\]/si'   => "<div style=\"text-align:\\1\">\\2</div>",
	  '/(?<!\\\\)\[b(?::\w+)?\](.*?)\[\/b(?::\w+)?\]/si'				 => "<span style=\"font-weight:bold\">\\1</span>",
	  '/(?<!\\\\)\[i(?::\w+)?\](.*?)\[\/i(?::\w+)?\]/si'				 => "<span style=\"font-style:italic\">\\1</span>",
	  '/(?<!\\\\)\[u(?::\w+)?\](.*?)\[\/u(?::\w+)?\]/si'				 => "<span style=\"text-decoration:underline\">\\1</span>",
	  '/(?<!\\\\)\[center(?::\w+)?\](.*?)\[\/center(?::\w+)?\]/si'	   => "<div style=\"text-align:center\">\\1</div>",

	  // [email]
	  '/(?<!\\\\)\[email(?::\w+)?\](.*?)\[\/email(?::\w+)?\]/si'		 => "<a href=\"mailto:\\1\" class=\"bb-email\">\\1</a>",
	  '/(?<!\\\\)\[email(?::\w+)?=(.*?)\](.*?)\[\/email(?::\w+)?\]/si'   => "<a href=\"mailto:\\1\" class=\"bb-email\">\\2</a>",
	  // [url]
	  '/(?<!\\\\)\[url(?::\w+)?\]www\.(.*?)\[\/url(?::\w+)?\]/si'	    => "<a href=\"http://www.\\1\" target=\"_blank\" class=\"bb-url\">\\1</a>",
	  '/(?<!\\\\)\[url(?::\w+)?\](.*?)\[\/url(?::\w+)?\]/si'			 => "<a href=\"\\1\" target=\"_blank\" class=\"bb-url\">\\1</a>",
	  '/(?<!\\\\)\[url(?::\w+)?=(.*?)?\](.*?)\[\/url(?::\w+)?\]/si'	  => "<a href=\"\\1\" target=\"_blank\" class=\"bb-url\">\\2</a>",


// [img]
	  '/(?<!\\\\)\[img(?::\w+)?\](.*?)\[\/img(?::\w+)?\]/si'			 => "<img width = 100 height = 100 src=\"\\1\" alt=\"\\1\" class=\"bb-image\" />",

'/(?<!\\\\)\[img(?::\w+)?=(.*?)x(.*?)\](.*?)\[\/img(?::\w+)?\]/si' => "<img width=\"\\1\" height=\"\\2\" src=\"\\3\" alt=\"\\3\" class=\"bb-image\" />",

	  // [list]
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[\*(?::\w+)?\](.*?)(?=(?:\s*<br\s*\/?>\s*)?\[\*|(?:\s*<br\s*\/?>\s*)?\[\/?list)/si' => "\n<li class=\"bb-listitem\">\\1</li>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[\/list(:(?!u|o)\w+)?\](?:<br\s*\/?>)?/si'    => "\n</ul>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[\/list:u(:\w+)?\](?:<br\s*\/?>)?/si'		 => "\n</ul>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[\/list:o(:\w+)?\](?:<br\s*\/?>)?/si'		 => "\n</ol>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(:(?!u|o)\w+)?\]\s*(?:<br\s*\/?>)?/si'   => "\n<ul>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list:u(:\w+)?\]\s*(?:<br\s*\/?>)?/si'	    => "\n<ul class=\"bb-list-unordered\">",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list:o(:\w+)?\]\s*(?:<br\s*\/?>)?/si'	    => "\n<ol>",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(?:)?(:\w+)?=1\]\s*(?:<br\s*\/?>)?/si' => "\n<ol class=\"bb-list-ordered,bb-list-ordered-d\">",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(?:)?(:\w+)?=i\]\s*(?:<br\s*\/?>)?/s'  => "\n<ol class=\"bb-list-ordered,bb-list-ordered-lr\">",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(?:)?(:\w+)?=I\]\s*(?:<br\s*\/?>)?/s'  => "\n<ol class=\"bb-list-ordered,bb-list-ordered-ur\">",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(?:)?(:\w+)?=a\]\s*(?:<br\s*\/?>)?/s'  => "\n<ol class=\"bb-list-ordered,bb-list-ordered-la\">",
	  '/(?<!\\\\)(?:\s*<br\s*\/?>\s*)?\[list(?:)?(:\w+)?=A\]\s*(?:<br\s*\/?>)?/s'  => "\n<ol class=\"bb-list-ordered,bb-list-ordered-ua\">",

	  //line breaks
	  '/\n/'															   => "<br>",
	  // escaped tags like \[b], \[color], \[url], ...
	  '/\\\\(\[\/?\w+(?::\w+)*\])/'									  => "\\1"

 );

  $message = preg_replace(array_keys($preg), array_values($preg), $message);
  return $message;
}
*/

function bbcode2html($message)
{
   $bbcode = array(
   "'\[center\](.*?)\[/center\]'is" => "<center>\\1</center>",
   "'\[left\](.*?)\[/left\]'is" => "<div style='text-align: left;'>\\1</div>",
   "'\[right\](.*?)\[/right\]'is" => "<div style='text-align: right;'>\\1</div>",
   "'\[pre\](.*?)\[/pre\]'is" => "<pre>\\1</pre>",
   "'\[b\](.*?)\[/b\]'is" => "<b>\\1</b>",
   "'\[quote\](.*?)\[/quote\]'is" => "<div class='top'><b>Quote:</b><hr>\\1</div>",
   "'\[i\](.*?)\[/i\]'is" => "<i>\\1</i>",
   "'\[u\](.*?)\[/u\]'is" => "<u>\\1</u>",
   "'\[s\](.*?)\[/s\]'is" => "<del>\\1</del>",
   "'\[url\](.*?)\[/url\]'is" => "<a href='\\1' target='_BLANK'>\\1</a>",
   "'\[url=(.*?)\](.*?)\[/url\]'is" => "<a href=\"\\1\" target=\"_BLANK\">\\2</a>",
   "'\[page=(.*?)\](.*?)\[/page\]'is" => "<a href=\"http://openzelda.thegaminguniverse.org/\\1\" target=\"_BLANK\">\\2</a>",
   "'\[img\](.*?)\[/img\]'is" => "<img border=\"0\" src=\"\\1\">",
   "'\[img=(.*?)\]'" => "<img border=\"0\" src=\"\\1\">",
   "'\[email\](.*?)\[/email\]'is" => "<a href='mailto: \\1'>\\1</a>",
   "'\[size=(.*?)\](.*?)\[/size\]'is" => "<span style='font-size: \\1;'>\\2</span>",
   "'\[font=(.*?)\](.*?)\[/font\]'is" => "<span style='font-family: \\1;'>\\2</span>",
   "'\[color=(.*?)\](.*?)\[/color\]'is" => "<span style='color: \\1;'>\\2</span>",
   "'\n'is" => "<br>",
   "'    'is" => "    ",
   "'    'is" => "    ",
   "'\[list=o\](.*?)\[/list\]'is" => "<ol>\\1</ol>",
   "'\[list=u\](.*?)\[/list\]'is" => "<ul>\\1</ul>",
   "'\[list\](.*?)\[/list\]'is" => "<ol>\\1</ol>",
   "'\[li\](.*?)\[/li\]'is" => "<li>\\1</li>",
   "'\[code\](.*?)\[/code\]'is" => "<div class='code'>\\1</div>",
   "'\[spoiler=(.*?)\](.*?)\[/spoiler\]'is" => "<a href=\"javascript:unhide('\\1');\">\\2</a>",
   "'\[hide=(.*?)\](.*?)\[/hide\]'is" => "<div id='\\1' class='hidden'>\\2</div>"
   );
   $message = preg_replace(array_keys($bbcode), array_values($bbcode), $message);
   return $message;
}

 

What do the \\1 and \\2 mean? And (.*?): where can I find out about this?

What other symbols are there?


The world of regular expressions is very complicated. Not something people can cover in an online forum. You should pick up a book (an actual book) on the subject; I believe the owl book is still the reigning champion. Meanwhile look for tutorials on Perl or PCRE syntax (and not POSIX).

 

Eh, I'll type out the basics. It's like 99% of what you may ever need.

Miscellaneous

.  Dot-all. Any character except newlines
|  Alternation. Either everything to the left or everything to the right. Can be limited to a group; can be chained


Metacharacters

\\  Backslash
\d  Digit
\D  Non-digit (opposite of \d)
\n  Unix newline (LF); \r\n is the Windows newline (CRLF)
\r  Mac newline (CR)
\s  Whitespace (spaces, tabs, and newlines)
\S  Non-whitespace (opposite of \s)
\t  Tab
\w  "Word" character (lower- and uppercase letters, numbers, and underscores)
\W  Non-word character (opposite of \w)


Character sets (most special symbols, like . and | and ?, lose their meanings; metacharacters still apply)

[abc]    Either 'a' or 'b' or 'c'
[^abc]   Neither 'a' nor 'b' nor 'c'
[a-c]    Either 'a' or 'c' or anything between them
[-ac]    Either '-' or 'a' or 'c' (hyphen becomes normal at the beginning)
[ac^-]   Either 'a', 'c', '^', or '-' (^ is only special at the beginning, hyphen becomes normal at the end)
[a-c-f]  Either 'a', 'c', something between them, '-', or 'f' (cannot chain ranges, hyphen becomes normal)
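
To see the character-set rows in action, here is a rough sketch. The helper name chars_matching() and the sample string are made up for illustration.

```php
<?php
// Return every single-character match of $pattern found in $subject.
function chars_matching(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

$subject = 'abcdef-^.|?';

print_r(chars_matching('/[a-c]/', $subject));   // a, b, c
print_r(chars_matching('/[^a-f]/', $subject));  // everything that is not a..f
print_r(chars_matching('/[ac^-]/', $subject));  // a, c, -, ^ (^ and - are literal here)
print_r(chars_matching('/[.|?]/', $subject));   // ., |, ? lose their special meaning in a set
```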


Quantifiers (work on the preceding single unit)

X       Exactly one of X
X?      Zero or one of X (ie, X is optional)
X*      Zero or more of X
X+      One or more of X
X{a}    Exactly a-many of X (eg, {2} is exactly two)
X{a,b}  Between a- and b-many of X; either a or b can be optional (eg, {2,} is at least two)
X*+     Possessive. Zero or more of X, as many as possible and no backtracking
X+?     Lazy. One or more of X, as few as possible
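
The lazy form is the one your bbcode script leans on: (.*?) is "any characters, as few as possible". A small sketch of the difference between greedy and lazy, using a made-up sample string:

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* grabs as much as it can, so it runs through to the LAST [/b].
preg_match('/\[b\](.*)\[\/b\]/', $text, $greedy);

// Lazy: .*? grabs as little as it can, so it stops at the FIRST [/b].
preg_match('/\[b\](.*?)\[\/b\]/', $text, $lazy);

// {2,} means "at least two", so the lone 1 is skipped.
preg_match('/\d{2,}/', 'a1b22c333', $digits);

echo $greedy[1], "\n"; // one[/b] and [b]two
echo $lazy[1], "\n";   // one
echo $digits[0], "\n"; // 22
```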


Grouping and capturing

(abc)    Group "abc" together as one unit and capture it for later
\N       The N-th captured group, counting the opening parentheses of capturing groups from left to right
$N       Same as \N, but only in the replacement string
(?:abc)  Group "abc" together as one unit but do not capture it
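
That is what the \\1 and \\2 in your script are: inside a PHP string "\\1" is the two characters \1, which preg_replace() swaps for whatever the first (...) captured. A minimal sketch with a made-up [url] tag:

```php
<?php
$text    = '[url=http://example.com]my site[/url]';
$pattern = '/\[url=(.*?)\](.*?)\[\/url\]/i';

// $m[0] is the whole match, $m[1] the first (...), $m[2] the second.
preg_match($pattern, $text, $m);

// $1 and \1 mean the same thing in the replacement string.
$html = preg_replace($pattern, '<a href="$1">$2</a>', $text);
$same = preg_replace($pattern, '<a href="\\1">\\2</a>', $text);

// Inside the pattern itself, \1 means "the same text group 1 just matched".
preg_match('/(\w)\1/', 'hello', $doubled);

echo $html, "\n";       // <a href="http://example.com">my site</a>
echo $doubled[0], "\n"; // ll
```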


Assertions/anchors (none of these capture or consume characters)

^       Beginning of the string
$       End of the string
\b      Word boundary (between a \w and a \W)
\B      Non-word boundary (opposite of \b)
(?=X)   Positive lookahead. Ensure that X follows immediately
(?!X)   Negative lookahead. Ensure that X does not follow immediately
(?<=X)  Positive lookbehind. Ensure that X precedes immediately
(?<!X)  Negative lookbehind. Ensure that X does not precede immediately
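
The (?<!\\\\) at the start of every pattern in your first script is a negative lookbehind: "only match if there is no backslash right before this". A sketch of how that lets people escape a tag, with a made-up sample string (the second pattern is the "escaped tags" line from your script):

```php
<?php
$text = 'A [b]bold[/b] word and an escaped \[b]tag[/b].';

// In a single-quoted PHP string, \\\\ becomes \\ for the regex engine,
// which in turn means one literal backslash.
$out = preg_replace('/(?<!\\\\)\[b\](.*?)\[\/b\]/si', '<b>$1</b>', $text);

// Afterwards, strip the backslash from escaped tags.
$clean = preg_replace('/\\\\(\[\/?\w+(?::\w+)*\])/', '$1', $out);

// Lookahead works the same way but looks forward: digits only if "px" follows,
// and the "px" itself is not part of the match.
preg_match('/\d+(?=px)/', 'width=30px', $px);

echo $out, "\n";
echo $clean, "\n";
echo $px[0], "\n"; // 30
```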


Flags

/e  Eval, preg_replace() only. Replacement string is evaluated as PHP code first. Deprecated in favor of using preg_replace_callback(), and removed entirely in PHP 7
/i  Letters are matched case-insensitively (metacharacters like \w and \W keep their meanings)
/m  ^ and $ can also match the beginning and end of a line
/s  Dot-all will match newlines
/x  Whitespace is ignored, line comments allowed
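
The /si at the end of your bbcode patterns is two of these flags together. A quick sketch of what each one changes, using a made-up string with an uppercase tag and a newline inside it:

```php
<?php
$text = "[B]line one\nline two[/B]";

$plain = preg_match('/\[b\](.*?)\[\/b\]/', $text);   // fails: wrong case
$i     = preg_match('/\[b\](.*?)\[\/b\]/i', $text);  // fails: . stops at the newline
$si    = preg_match('/\[b\](.*?)\[\/b\]/si', $text); // matches

// /m lets ^ match at the start of every line, not only the start of the string.
$without_m = preg_match_all('/^\w+/', "one\ntwo", $a);
$with_m    = preg_match_all('/^\w+/m', "one\ntwo", $b);

var_dump($plain, $i, $si, $without_m, $with_m);
```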

See also the manual.

Edited by requinix

BTW, what does the :: mean??

 

I managed to create some new bbcodes, but I'm not 100% sure how to require the one space.

 

'/(?<!\\\\)\[img(?::\s+)width(?::\w+)?=(.*?)(?::\s+)height(?::\w+)?=(.*?)(?::\s)src(?::\w+)?=(.*?)\]\]/si' => "<img width=\"\\1\" height=\"\2\" src=\"\\3\"/>",

 

108.gif < this should output an image with a width of 24 and a height of 40.

I managed to create a bbcode without the width and height so far.


Only that first colon matters: it's a (?:...) where the first character inside is a colon.

 

And if you didn't guess, a backslash can escape an otherwise important character. "[abc]" would be a character set while "\[abc\]" is literally a left bracket, "abc", and a right bracket.
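
So (?::\w+)? reads as "optionally, a colon followed by word characters, and don't capture it". A small sketch of what that accepts; the tag suffix 3f2a is just a made-up example of the kind of id some boards append to tags.

```php
<?php
$pattern = '/\[b(?::\w+)?\](.*?)\[\/b(?::\w+)?\]/si';

$results = [];
foreach (['[b]hi[/b]', '[b:3f2a]hi[/b:3f2a]', '[bold]hi[/bold]'] as $sample) {
    // Group 1 is still the text between the tags, because (?:...) does not capture.
    $results[$sample] = preg_match($pattern, $sample, $m) ? $m[1] : null;
}

var_dump($results);
// '[b]hi[/b]' and '[b:3f2a]hi[/b:3f2a]' both give "hi"; '[bold]hi[/bold]' gives NULL
```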


Figured the backslash part since I used it lots of times in other scripts.

I understand now for the colon.

 

Still can't get the required space so far..trying different combinations.

 

'/(?<!\\\\)\[img(?::\s+)? width(?::\w+)?=(.*?)(?::\s+)? height(?::\w+)?=(.*?)(?::\s+)? src(?::\w+)?=(.*?)\]\]/si' => "<img width=\"\\1\" height=\"\2\" src=\"\\3\"/>",


The w+ worked in all the other scripts so far but none of the scripts required spacing.

 

The w+ allowed the numbers after the equal sign, if I understood correctly.

 

I know \s+ or maybe it's just \s that's used to check for spacing but not sure yet which symbols to use next to it.

 

I guess I could/should replace the ones that require numbers with a d+ if I only allowed numbers, but I might also want to allow things like this

"width=123px height=123px src=location" or "width=123% height=123% src=location"

 

as an option.

Edited by KingOfHeart

Not those \w+s. I mean the ones with the colons. That you were asking about before.

 

I'd still love to hear exactly what strings you're trying to match but how about

/\[img(?:\s+width=(\d+(?:px|%)?)|\s+height=(\d+(?:px|%)?)|\s+src=([^\s\]]+))+\]/i

Take a minute to look over that. $1 will be the width, $2 the height, and $3 the src. Any of them may be empty (or missing from the end of the matches array) if that attribute wasn't given.

 

Then you can feed that to preg_replace_callback() like

$new = preg_replace_callback('...', function($matches) {
	// need the src, width, and height to work; a group that did not take part
	// in the match is either an empty string or missing from the end of $matches
	// (note that empty() also treats a literal "0" as missing)
	if (empty($matches[1]) || empty($matches[2]) || empty($matches[3])) {
		return $matches[0]; // unchanged
	}
	list(, $width, $height, $src) = $matches;

	// you may want to check for a valid width and height here
	// eg, is there a risk that someone will use width=9999999px?

	// bare numbers get a "px" unit; values like "50%" are left alone
	if (ctype_digit($width)) {
		$width .= "px";
	}
	if (ctype_digit($height)) {
		$height .= "px";
	}

	return "<img src=\"" . htmlentities($src) . "\" style=\"width: {$width}; height: {$height}\" />";
}, $old);
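
Put together as something you can run, it might look like the sketch below. The function name img_bbcode_to_html() is made up, and the tag format [img width=... height=... src=...] is an assumption based on the pattern above.

```php
<?php
function img_bbcode_to_html(string $old): string
{
    $pattern = '/\[img(?:\s+width=(\d+(?:px|%)?)|\s+height=(\d+(?:px|%)?)|\s+src=([^\s\]]+))+\]/i';

    return preg_replace_callback($pattern, function ($matches) {
        // Pad the array so groups missing from the end still exist as ''.
        list(, $width, $height, $src) = array_pad($matches, 4, '');
        if ($width === '' || $height === '' || $src === '') {
            return $matches[0]; // an attribute is missing: leave the tag as text
        }
        if (ctype_digit($width)) {
            $width .= 'px';     // bare number, so add a unit
        }
        if (ctype_digit($height)) {
            $height .= 'px';
        }
        return '<img src="' . htmlentities($src) . "\" style=\"width: {$width}; height: {$height}\" />";
    }, $old);
}

echo img_bbcode_to_html('[img width=24 height=40 src=108.gif]'), "\n";
echo img_bbcode_to_html('[img src=108.gif]'), "\n"; // no sizes given, so it stays as text
```

The callback is what a plain replacement string can't do: it gets to look at the captured values and decide what to return.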


You're actually getting one step ahead, so let me state the problem.

 

 

As of right now when I use 108.gif

it returns it as

108.gif

meaning plain text...it did not find this exact match.

 

 

--------------

I was wondering how I should go about setting limits.

I haven't used preg much yet. Would I just use it like this...

 

function bbcode2html($message) {

 $preg = array(
  '/(?<!\\\\)\[img(?::\s+)? width(?::\w+)?=(.*?)(?::\s+)? height(?::\w+)?=(.*?)(?::\s+)? src(?::\w+)?=(.*?)\]\]/si' => "<img width=\"\\1\" height=\"\2\" src=\"\\3\"/>"              

 );

  preg_replace_callback(array_keys($preg), array_values($preg), $message);// < do I call it like this???
  $message = preg_replace(array_keys($preg), array_values($preg), $message);
  return $message;
}

$new = preg_replace_callback('...', function($matches) {
       // need the src, width, and height to work
       if (!isset($matches[1], $matches[2], $matches[3])) {
               return $matches[0]; // unchanged
       }
       list(, $width, $height, $src) = $matches;

       // you may want to check for a valid width and height here
       // eg, is there a risk that someone will use width=9999999px?

       if (ctype_digit($width)) {
               $width .= "px";
       }
       if (ctype_digit($height)) {
               $height .= "px";
       }

       return "<img src=\"" . htmlentities($src) . "\" style=\"width: {$width}; height: {$height}\" />";
}, $old);

 

-------------------------

I think I'll create a shorter function like [test ]

and make it only work if I have a space after the word test and work my way on from there
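
For a small test like that, a rough sketch of the difference between a required space and a required colon-plus-space might look like this. The [test ...] tag and the sample strings are made up.

```php
<?php
$samples = ['[test abc]', '[testabc]', '[test: abc]'];

$space = [];
$colon = [];
foreach ($samples as $s) {
    // \s+ = at least one whitespace character is required after "test".
    $space[$s] = preg_match('/\[test\s+(\w+)\]/', $s);

    // (?::\s+) = a literal colon is required first, THEN the whitespace.
    $colon[$s] = preg_match('/\[test(?::\s+)(\w+)\]/', $s);
}

var_dump($space); // only '[test abc]' matches
var_dump($colon); // only '[test: abc]' matches
```

Swapping \s+ for \s* would make the space optional instead of required.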


Sweet, I got the image preg to work finally....Not sure if this is how you would do it, but at least now I can use sizes if needed...I'll try to see if I can use your function

 

'/(?<!\\\\)\[img(?::\w+)?=(.*?)\]/si' => "<img src=\"\\1\" alt=\"\\1\"/>",

'/(?<!\\\\)\[img(?:\s+)width(?::\w+)?=(.*?)(?:\s+)height(?::\w+)?=(.*?)(?:\s+)src(?::\w+)?=(.*?)\]/si' => "<img width=\"\\1\" height=\"\\2\" src=\"\\3\"/>",


I don't consider myself ready to use something like this.

If it becomes an issue I'll use non-preg or remove the rights for the user to use this function.

Unless if you think you can work with me, I'm going to just pass.

 

Also besides px, they're allowed to use % as well

Edited by KingOfHeart

I found something shorter and simpler.

 

preg_match_all("/(?<!\\\\)\[img(?:\s+)width(?::\d+)?=(.*?)(?:\s+)height(?::\d+)?=(.*?)(?:\s+)src(?::\w+)?=(.*?)\]/si", $message, $matches);


   foreach ($matches[1] as $num) {
       $wid = "width=" . $num;
       $message = ($num > 700)? str_replace("$wid", "width=700", $message) : $message;
   }
   foreach ($matches[2] as $num2) {
       $hei = "height=" . $num2;
       $message = ($num2 > 700)? str_replace("$hei", "height=700", $message) : $message;
   }

 

 

I decided to remove px and % but don't know how to make the bbcode work only if you use numbers. I started to use d+ but it still converted the bbcode into html.

I got to go to bed now but any idea how?

 

'/(?<!\\\\)\[img(?:\s+)width(?::\d+)?=(.*?)(?:\s+)height(?::\d+)?=(.*?)(?:\s+)src(?::\w+)?=(.*?)\]/si' => "<img width=\"\\1\" height=\"\\2\" src=\"\\3\"/>",

 

I want

width=30 height=30 src=image to convert it to an image with the width and height < this works fine

and I want

width=30px height=30px src=image to convert to plain text as width=30px height=30px src=image < no idea how

 

Hope all I have to do is add another symbol to the search.
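
One possible way to get that behaviour, as a rough sketch: capture the sizes with (\d+) instead of (.*?), so that "30px" simply fails to match and the tag stays as plain text, and clamp the numbers in a callback. The function name img_numeric_only() is made up, the tag format [img width=... height=... src=...] is assumed, and the 700 limit is taken from the loop above.

```php
<?php
function img_numeric_only(string $message, int $max = 700): string
{
    // (\d+) must be followed directly by whitespace, so "30px" cannot match.
    $pattern = '/(?<!\\\\)\[img\s+width=(\d+)\s+height=(\d+)\s+src=([^\s\]]+)\]/i';

    return preg_replace_callback($pattern, function ($m) use ($max) {
        $width  = min((int) $m[1], $max); // cap oversized values at $max
        $height = min((int) $m[2], $max);
        return '<img width="' . $width . '" height="' . $height
             . '" src="' . htmlspecialchars($m[3]) . '"/>';
    }, $message);
}

echo img_numeric_only('[img width=30 height=30 src=image]'), "\n";     // becomes an <img>
echo img_numeric_only('[img width=30px height=30px src=image]'), "\n"; // stays as text
echo img_numeric_only('[img width=900 height=30 src=a.gif]'), "\n";    // width capped at 700
```

This replaces both the separate preg_match_all() loop and the replacement string, since the limit is applied at the moment the tag is converted.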
