random_

  1. Very much appreciate the explanation. Thanks, bro. Just to add something I stumbled upon: a class defined like this, [^g], will match any character but "g", much like [a-fh-z] does for lowercase letters. I don't quite get it, because I know ^ marks the beginning of a line. Or maybe I'm wrong, and ^ acts differently when it is inside a class...
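The two roles of ^ mentioned in this post can be checked side by side. This is only a small sketch; the sample strings are made up for illustration.

```php
<?php
// Outside a class, ^ anchors the match to the start of the subject
// (or of each line when the m flag is set).
$anchored   = preg_match('/^g/', 'go');   // "go" starts with g
$notAtStart = preg_match('/^g/', 'ago');  // g is present, but not at the start

// As the first character inside [...], ^ negates the class instead.
preg_match_all('/[^g]/', 'gag 9G', $m);
$kept = implode('', $m[0]);               // everything except lowercase g

// [a-fh-z] is narrower: lowercase letters only, minus g.
preg_match_all('/[a-fh-z]/', 'gag 9G', $n);
$letters = implode('', $n[0]);

var_dump($anchored, $notAtStart, $kept, $letters);
```

So [^g] also lets through spaces, digits, and uppercase letters, which [a-fh-z] does not.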
  2. Thanks, .josh. What you are suggesting matches any character, but what if I want to match only words that don't contain "g"? E.g. for the words house, laguna, window, I want to match only house and window, and not the "la" and "una" left over from laguna. Why doesn't (?!g)[\w] work here?
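One likely reason (?!g)[\w] disappoints here is that it tests a single character at a time, so it happily matches every non-g letter inside laguna. A sketch of a word-level alternative, assuming words are delimited by \b boundaries:

```php
<?php
$text = 'house laguna window';

// Character-level: matches each word character that is not g,
// including the ones inside "laguna".
preg_match_all('/(?!g)\w/', 'laguna', $chars);
$perChar = implode('', $chars[0]);

// Word-level: the whole run between two word boundaries must be free of g.
// [^\Wg] reads as "a word character that is not g".
preg_match_all('/\b[^\Wg]+\b/', $text, $words);

// The same idea written with the lookahead from the post.
preg_match_all('/\b(?:(?!g)\w)+\b/', $text, $words2);

var_dump($perChar, $words[0], $words2[0]);
```

The \b on both sides is what stops "la" and "una" from matching on their own: there is no word boundary next to the g.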
  3. Hi guys. Is it possible to exclude a character, word, number, pattern, or anything else from a regex? E.g. I have [a-z] but want to exclude, let's say, the character "g". My guess is that it can be done like this: [a-f][h-z], but it doesn't seem to work... Any opinion is very much appreciated. Thanks.
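The guess in this post fails because [a-f][h-z] is two classes in a row, so it asks for two characters. A short sketch of the difference, using made-up test strings:

```php
<?php
// Two classes in sequence match two characters: one from a-f, then one from h-z.
$twoChars = preg_match('/^[a-f][h-z]$/', 'ax');
$oneChar  = preg_match('/^[a-f][h-z]$/', 'a');

// Putting both ranges inside one class matches a single character that is not g.
$single  = preg_match('/^[a-fh-z]$/', 'a');
$singleG = preg_match('/^[a-fh-z]$/', 'g');

// A negative lookahead in front of the full range gives the same result.
$lookH = preg_match('/^(?!g)[a-z]$/', 'h');
$lookG = preg_match('/^(?!g)[a-z]$/', 'g');

var_dump($twoChars, $oneChar, $single, $singleG, $lookH, $lookG);
```

The lookahead form scales better when the thing to exclude is a whole word or pattern rather than one character.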
  4. trq meant that you will get banned from the remote server because the script will send too many requests to it. In any case, you are limited by PHP's maximum execution time. Look for max_execution_time in php.ini or, if you are on a shared host, create a phpinfo() script and look there; it should be 30 or 60 seconds. On shared hosting, you can try running the script with the loop limited to, let's say, 10 iterations (visits), with sleep pauses, keeping the maximum execution time in mind. Then, from that script, navigate to the same script, passing the last value of the dynamic URL parameter you are using via GET or POST. That way you get a sort of infinite execution time. If you have a dedicated server, or you have *AMP on your PC, you can just edit php.ini and put in some crazy value.
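The batching idea in this post could look roughly like the sketch below. The helper name planBatch, the offset parameter, and the numbers 10 and 25 are all assumptions made for illustration, and the actual fetching is left as a comment.

```php
<?php
// Hypothetical helper: returns the IDs to visit in this request and the
// offset the next request should start from (null when nothing is left).
function planBatch(int $offset, int $batchSize, int $total): array
{
    $end = min($offset + $batchSize, $total);
    return [
        'ids'  => $offset < $end ? range($offset, $end - 1) : [],
        'next' => $end < $total ? $end : null,
    ];
}

$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$plan = planBatch($offset, 10, 25); // 10 visits per request, 25 in total (made-up numbers)

foreach ($plan['ids'] as $id) {
    // Fetch the remote page for $id here, e.g. with file_get_contents().
    // sleep(2); // pause between requests so the remote server is not flooded
}

if ($plan['next'] !== null && PHP_SAPI !== 'cli') {
    // Hand over to a fresh request, which starts with a fresh execution-time budget.
    header('Location: ' . $_SERVER['PHP_SELF'] . '?offset=' . $plan['next']);
    exit;
}
```

Browsers stop following long chains of redirects, so for a large number of batches a cron job or a page that reloads itself may be a safer way to trigger the next request.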
  5. Thank you, .josh, you were more than helpful. I was building my own pattern, but now I see it's too literal and not flexible like yours in the example above: $tmp = preg_match_all('/<(\w*\s\w*="\w*"\s\w*="\w*")>/', $file, $matches); I knew about escaping certain characters like / or ), but I didn't know that I could use a pattern delimiter other than / /. I don't know whether ' " = < > can be escaped or whether they are members of any special character group... I bumped into greedy matching today when I used /<.*>/ as a pattern (it just returns everything), and I noticed that if I use it like this, /<(.*)>/, it spits out another array without the wrapping < >. I also bumped into lookaheads and lookbehinds today, when I was unable to match the rest of that code by adding \r\n and multiple \s to match the next tag using the code I posted, but I didn't quite understand them. Excellent tip about pattern flexibility; I would never have thought about that and would rather have used a literal definition with some conditions... I know, wrong. Now I know how much I don't know, so I will need to learn. Once more, thanks.
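The greedy behaviour, the extra capture array, and the delimiter choice described in this post can be reproduced with a tiny sample. The $html string is invented for the example.

```php
<?php
$html = '<b>bold</b> and <i>italic</i>';

// Greedy: .* runs to the last > in the subject, so there is one big match.
preg_match_all('/<.*>/', $html, $greedy);

// Lazy: .*? stops at the first > it can, giving one match per tag.
preg_match_all('/<.*?>/', $html, $lazy);

// Parentheses add a second array ($m[1]) holding only the captured part.
preg_match_all('/<(.*?)>/', $html, $m);

// With ~ as the delimiter, / needs no escaping. Outside group syntax such as
// (?=...) the characters ' " = < > are ordinary literals in a pattern; only
// the quoting of the PHP string itself has to be respected.
preg_match_all('~</(\w+)>~', $html, $closing);

var_dump($greedy[0], $lazy[0], $m[1], $closing[1]);
```

Escaping a literal such as \< or \= is harmless but not needed.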
  6. I know about DOM; I used it to create custom URLs from an HTML page. But now I want to get in touch with regular expressions, so I don't need to parse the output. I just want it returned, to see what preg_match_all registered and returned (int() is not enough info). I thought HTML was a good example because of its complexity (<=./text etc.), so I tried one example and decided to copy/paste this HTML and use preg_match_all instead of preg_match to catch more than one occurrence in the output. I know I'm doing it wrong; learning from complex to simple is wrong, and it should be the other way around. So again I hit the wall. This code works: $tmp = preg_match_all('/<div class="(\w*)" id="(\w*)">/', $file, $matches); The problem is how to define all the text up to the end tag <\/div>. I tried ., but does . match all text, including special characters, or should I use square brackets to define what to expect there, e.g. [a-z*+,A-Z*+,\t,\n,\Q,\w,\W,\s,\d]? This definition in brackets doesn't work; think of it just as an illustration. One more time: it's just practice, and I don't need this to be parsed. Hmmmm, now I just realised something. This is the $pattern: /<div class="(\w*)" id="(\w*)">/ and if I type something between / / directly, it won't get returned unless it is a metacharacter, e.g. \w, so $pattern should look like this: /<\w\s\w="\w"\s\w\"\w">/ (it's just an illustration; I know this doesn't work, so tell me what I'm doing wrong here).
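Two points from this post can be tried out directly: literal text in a pattern matches itself without any metacharacters, and . skips newlines unless the s flag is set. The sketch below uses a made-up, non-nested $file string.

```php
<?php
$file = "<div class=\"box\" id=\"va_1\">\n  Hello, <b>world</b>!\n</div>";

// Literal text such as <div class=" matches itself; only the parenthesised
// parts are captured and returned in $matches[1], $matches[2], ...
// The s flag lets . match newlines too, and .*? stops at the first </div>.
$count = preg_match_all('~<div class="(\w*)" id="(\w*)">(.*?)</div>~s', $file, $matches);
$class = $matches[1][0];
$id    = $matches[2][0];
$inner = trim($matches[3][0]);

// Without the s flag the same pattern fails here, because the content
// between the tags contains line breaks.
$withoutS = preg_match('~<div class="(\w*)" id="(\w*)">(.*?)</div>~', $file);

var_dump($count, $class, $id, $inner, $withoutS);
```

Because .*? stops at the first closing tag, this only works when the div has no other div nested inside it.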
  7. So I have an HTML file with this code:

     <div class="className" id="va_56">
       <div class="newclName">
         <div class="another">
           <a href="/va56/md"><img src="http://imageshack.us/someimage.jpg" id="name_56" /></a>
         </div>
         <p><a href="/va56/md">md</a></p>
         <p class="ptext">
           <span class="de"> <span class="done">(5369 max)</span> Some text: 82% </span>
         </p>
       </div>
     </div>
     <div class="className" id="va_57">
       <div class="newclName">
         <div class="another">
           <a href="/va57/md"><img src="http://imageshack.us/someimage2.jpg" id="name_57" /></a>
         </div>
         <p><a href="/va57/md">md</a></p>
         <p class="ptext">
           <span class="de"> <span class="done">(469 max)</span> Some text: 50% </span>
         </p>
       </div>
     </div>

     I need to extract those nested HTML div tags, including the first ones, <div class="className" id="va_56"> and <div class="className" id="va_57">, and any others that might occur in the file. I only managed to get either the class attribute, or the id attribute, or only some div content, but without the class, the id, and the other nested content. This is the code I used:

     $url = file_get_contents("file.html");

     function srch($var) {
         /* $tmp = preg_match_all('/(<div)(<\/div>)/', $var, $matches); */
         $tmp = preg_match_all('/(id="(va_\w*)")/is', $var, $matches);
         /* $tmp = preg_match_all('/(<div \w*)(.*)(<\/div>)/', $var, $matches); */
         $result = array($tmp);
         array_push($result, $matches[2]);
         array_push($result, count($matches[2]));
         return $result;
     }

     $result = srch($url);
     echo '<pre>';
     var_dump($result);
     echo '</pre>';

     When I use something like /(<div).(<\/div>)/, why won't that return all the content between the first div tag and the closing div tag it finds? Or maybe I don't need preg_match_all at all and should use just preg_match instead. (I know it returns only one result, but if we consider just the first part of the posted HTML, the div with id va_56, would this get the content between the tags? I would rather use preg_match_all to avoid loops.)
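In /(<div).(<\/div>)/ the single . stands for exactly one character, so the pattern can only match "<div", one character, and "</div>" directly in a row. For regex practice on nested divs, one option is a recursive PCRE pattern with (?R). The sketch below is not the approach from the thread, it runs on a shortened copy of the posted HTML, and it assumes the markup is well formed; a DOM parser remains the more robust tool.

```php
<?php
$html = <<<HTML
<div class="className" id="va_56">
  <div class="newclName">
    <div class="another"><a href="/va56/md"><img src="a.jpg" id="name_56" /></a></div>
    <p class="ptext"><span class="done">(5369 max)</span> Some text: 82%</p>
  </div>
</div>
<div class="className" id="va_57">
  <div class="newclName"><p><a href="/va57/md">md</a></p></div>
</div>
HTML;

// An opening div, then any mix of: plain text, tags that are not div tags,
// or a complete nested div (the (?R) recursion), then the matching </div>.
$pattern = '~<div\b[^>]*>(?:(?>[^<]+)|<(?!/?div\b)[^>]*>|(?R))*</div>~i';
$count = preg_match_all($pattern, $html, $matches);

// Each entry of $matches[0] is one outer div with everything nested inside it.
$ids = [];
foreach ($matches[0] as $block) {
    if (preg_match('~^<div class="className" id="(va_\w*)"~', $block, $m)) {
        $ids[] = $m[1];
    }
}

var_dump($count, $ids);
```

preg_match_all resumes searching after the end of each match, which is why only the two outer divs are returned rather than every nested one.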
  8. The first part works for me, at least for now. I forgot to mention that I had already tried parse_url and, like Josh mentioned, it has some limitations, so I didn't find it useful. Thanks. @Josh Thanks, I will look into that example later; for now it's too advanced for me. To be honest, I have never had a situation that called for regular expressions, so I never learned them. Yeah, I'm sort of a novice.
  9. Hello guys, I am new to this forum. I thought it was the right place to ask this question. I was using the PHP function filter_var($url, FILTER_VALIDATE_URL), but it only accepts input with the protocol specified, e.g. http://. I need to be able to check whether a URL is in one of these formats:

     http://www.example.com
     http://example.com
     www.example.com
     example.com

     I googled around and couldn't find anything useful. I thought this script was not that bad, but it is still not useful because it accepts a domain like ".com", which is invalid. Here is the script:

     <?php
     $regex = "((https?|ftp)\:\/\/)?"; // SCHEME
     $regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
     $regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
     $regex .= "(\:[0-9]{2,5})?"; // Port
     $regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
     $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
     $regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
     $url = ".com";
     if(preg_match("/^$regex$/", $url)) { echo "Valid"; }
     ?>

     Maybe this script can be improved with some string validation, e.g. on the length of the URL: $domain.$tld cannot be shorter than 5 characters if $tld is 3 characters long, and cannot be shorter than 4 characters if $tld is 2 characters long. But what about domains where the TLD looks like .co.uk? So my question is: does anyone have such a script that they can share? Thanks.
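The posted script accepts ".com" because ([a-z0-9-.]*) allows an empty host part. One possible way around both problems is to add a default scheme before calling filter_var and then check the host separately. The function name looksLikeUrl and the host pattern are assumptions for this sketch, not a complete URL validator: it rejects IP addresses and non-ASCII domains.

```php
<?php
// Hypothetical helper: accepts URLs with or without a scheme.
function looksLikeUrl(string $input): bool
{
    // Add a scheme when none is given, so filter_var() and parse_url() can cope.
    $url = preg_match('~^[a-z][a-z0-9+.-]*://~i', $input) ? $input : 'http://' . $input;

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // One or more labels followed by an alphabetic TLD. Every label must start
    // and end with a letter or digit, so ".com" and "-x.com" are rejected,
    // while multi-part endings such as .co.uk pass as ordinary labels.
    $hostPattern = '/^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$/i';
    return preg_match($hostPattern, $host) === 1;
}

foreach (['http://www.example.com', 'example.com', 'www.example.co.uk', '.com'] as $candidate) {
    echo $candidate, ' => ', looksLikeUrl($candidate) ? 'Valid' : 'Invalid', "\n";
}
```

Requiring at least one dot-separated label in front of the TLD replaces the length rules suggested in the post, and it does not need to know how long the TLD is.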