effigy

Staff Alumni
  • Posts: 3,600
  • Last visited: Never

Everything posted by effigy

  1. A character class matches only one character; you must add a quantifier to match more. The replacement also needs to be evaluated (hence the e modifier) in order to act on the matched data: preg_replace('/("[^"]+")/e', 'str_replace(" ", "_", "$1")', $string);
  2. Store and work with the number in its digit form, using number_format for its display.
  3. I get three pages' worth; two if the phrase is quoted.
  4. Since the diamond precedes an "s," I assume you've got "smart quotes" on your hands. Search the forum for this.
  5. Actually, it needs another tweak in case the attributes are not quoted:

     %<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si

     The first non-literal part of the regex looks for a single or double quote, which may not exist at all. Afterwards, it captures one character at a time (any character that is not whitespace or ">"), but only if it does not encounter the (optional) quote that it began with. In other words, if a single quote was found, it matches everything up to the closing single quote; the same goes if a double quote was matched. If no quote was found, it stops at whitespace or at the end of the tag. It then backtracks to make sure the URL ends with ".pdf", and matches the closing quote (if one was found), the end of the tag, the link text up to "</a>", and finally "</a>" itself. Keep in mind that this regex only works if no other attributes are present and the formatting is exact.

     Here's a technical breakdown:

     NODE                     EXPLANATION
     ----------------------------------------------------------------------
     <a href=                 '<a href='
     (                        group and capture to \1 (optional
                              (matching the most amount possible)):
       ['"]                     any character of: ''', '"'
     )?                       end of \1 (NOTE: because you're using a
                              quantifier on this capture, only the LAST
                              repetition of the captured pattern will be
                              stored in \1)
     (                        group and capture to \2:
       (?:                      group, but do not capture (1 or more
                                times (matching the most amount
                                possible)):
         (?!                      look ahead to see if there is not:
           \1                       what was matched by capture \1
         )                        end of look-ahead
         [^>\s]                   any character except: '>', whitespace
                                  (\n, \r, \t, \f, and " ")
       )+                       end of grouping
       \.                       '.'
       pdf                      'pdf'
     )                        end of \2
     (?(1)                    if back-reference \1 matched, then:
       \1                       what was matched by capture \1
     |                        else:
                                succeed
     )                        end of conditional on \1
     >                        '>'
     (                        group and capture to \3:
       .*?                      any character (0 or more times
                                (matching the least amount possible))
     )                        end of \3
     </a>                     '</a>'
     ----------------------------------------------------------------------
  6. %<a href=([\'"])?((?:(?!\1).)+\.pdf)(?(1)\1)>(.*?)</a>%si
  7. <pre>
<?php
$html = 'this is some stuff <a href="http://www.mydomain.com/dir/thefile.pdf">Read More</a> for updating the <a href="http://youdomain.com/another.pdf">Other Stuff</a>html';

// $2 is the PDF URL and $3 is the link text captured by the regex below.
$replace = <<<REPLACE
<div class="pdf">
    <a href="$2" target="_blank" title="$3">
        <img class="left" src="images/pdf_download.png" alt="Download PDF" width="64" height="74" />
    </a>
    <span class="title">$3</span>
    <span class="info">download pdf</span>
    <a href="$2" target="_blank" title="$3" class="link">DOWNLOAD</a>
</div>
<div class="pdf-bot2"></div>
REPLACE;

$html = preg_replace(
    '%<a href=([\'"])?((?(1).+?|[^\s>]+)\.pdf)(?(1)\1)>(.*?)</a>%si',
    $replace,
    $html
);
echo htmlspecialchars($html);
?>
</pre>
  8. You need the s modifier so that . will also match new lines: preg_match_all('~<td class="trow1">(.*?)</td>~is', $searchresult, $topics);
  9. Actually, this is correct. I crossed my wires on the lazy/greedy portion; the real issue is using [^>]*? rather than .*? (with * or +, it doesn't matter). My apologies. As the book says, which of the lazy and greedy approaches is faster depends on the data.
  10. <pre>
<?php
$string = <<<STR
<a href='blah' id=1234.2>[FLAG] something</a>
<a href='blah' id=829.1>somethingelse</a>
<a href='blah' id=634.5>somerandomcharlength</a>
STR;

// Capture the id of each <a> tag whose content does not begin with "[FLAG] ".
preg_match_all('%<a[^>]+id=([\d.]+)[^>]*>(?!\[FLAG\]\s)%si', $string, $matches);
array_shift($matches); // drop the full-match array, keeping only the captured ids
print_r($matches);
?>
</pre>
  11. The concern isn't id= appearing outside of a tag, but a tag not having id= at all. In that case the regex would keep consuming data, running past the end of the tag and into another one (possibly not even an a), until it finds id=. Arguably, the data in question may always have id= in the a; however, (1) data may change, and (2) [^>]* works in both cases. Additionally, according to Mastering Regular Expressions:
  12. So what do you want? What you posted is the latest news entry.
  13. %</small>\s*</p>\s*<p>(.*?)</p>%si
  14. When you're working with a known format (e.g., HTML tags begin with "<" and end with ">"), conform to its rules in your pattern: use <a[^>]*...> rather than <a.*?...>. Not only is the greedy match optimal, it is also safer, ensuring that you stay within the tag's boundaries.
  15. Look into the /e modifier; this will let you pass $1 after it is defined by the regular expression, rather than having it evaluated beforehand.
  16. Have you tried html_entity_decode? I would compare the decoded strings, since then you won't end up comparing a named entity to a numeric one.
  17. I would use /(<form[^>]+action="process\.php"[^>]*>)/ since the greedy negated class is faster. It also keeps the pattern within the tag's boundaries, just in case you encounter some bad HTML.
  18. <pre>
<?php
$html = 'http://www.google.com<br>Go there for a cool search engine!';
### Similar to strip_tags, but replace with a space.
$html = preg_replace('/<[^>]*>/', ' ', $html);
// Match a URL, backing off any trailing punctuation.
preg_match('%https?://\S+(?<!\p{P})%i', $html, $matches);
print_r($matches);
?>
</pre>
  19. Well, what I was really after is: are you using any HTML or XML tools for this? Typically these will handle entities, isolation of content, etc.
  20. That data works in the example code:
<pre>
<?php
$html = <<<HTML
http://www.google.com Go there for a cool search engine!
HTML;
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>
What else is happening in your code?
  21. How about something like this?
<pre>
<?php
$html = <<<HTML
<a href="http://www.phpfreaks.com">PHP Freaks</a> <a href="http://www.google.com/index.html">Visit http://www.google.com!</a>
HTML;
// strip_tags() removes the href attributes, so only URLs in the link text remain.
preg_match('%https?://\S+(?<!\p{P})%i', strip_tags($html), $matches);
print_r($matches);
?>
</pre>
  22. Do you want to pull URLs from tags, content, or both?
  23. What is the format of these entries? HTML? Prose? Anything?
  24. %https?://[^\"\s>]+%i Will the URLs always be double quoted?
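For post 1 above (and the /e pointer in post 15): the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP the same replacement needs preg_replace_callback. A minimal sketch, assuming the goal is to turn spaces into underscores only inside double-quoted substrings; the function name underscoreQuoted is hypothetical.

```php
<?php
// underscoreQuoted() is a hypothetical helper name for this sketch.
// preg_replace_callback stands in for the /e modifier, which PHP 7.0 removed.
function underscoreQuoted(string $string): string
{
    return preg_replace_callback(
        '/"[^"]+"/', // a quote, one or more non-quote characters, a quote
        function (array $m): string {
            // $m[0] is the whole quoted substring, quotes included.
            return str_replace(' ', '_', $m[0]);
        },
        $string
    );
}

echo underscoreQuoted('say "hello big world" and "good bye" now'), "\n";
// say "hello_big_world" and "good_bye" now
```

The callback receives the matched text directly, so there is no escaping of $1 into evaluated code as there was with /e.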
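For post 2 above, a small sketch of keeping the value numeric and formatting it only at display time with number_format. The price value is made up for illustration.

```php
<?php
// Keep the value numeric for arithmetic; format it only when displaying it.
$price = 1234567.891; // stored as a float, not as the string "1,234,567.89"
$total = $price * 2;  // arithmetic works on the raw number

echo number_format($total, 2), "\n";           // 2,469,135.78 (default separators)
echo number_format($total, 2, ',', '.'), "\n"; // 2.469.135,78 (custom separators)
```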
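For post 4 above, one common cause of the diamond is Windows-1252 text (for example, pasted from a word processor) shown on a UTF-8 page, where the curly-quote bytes 0x91 to 0x94 are invalid. A sketch, assuming that encoding, that swaps those bytes for straight quotes with strtr; straightenQuotes is a hypothetical name. It should not be run on text that is already UTF-8, since those byte values also occur inside multibyte UTF-8 characters.

```php
<?php
// Assumes Windows-1252 input; straightenQuotes() is a hypothetical helper.
function straightenQuotes(string $text): string
{
    return strtr($text, [
        "\x91" => "'", // left single quote
        "\x92" => "'", // right single quote / apostrophe
        "\x93" => '"', // left double quote
        "\x94" => '"', // right double quote
    ]);
}

echo straightenQuotes("It\x92s \x93quoted\x94"), "\n"; // It's "quoted"
```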
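For post 5 above, a quick check of the corrected pattern against double-quoted, single-quoted, and unquoted href values, plus one non-PDF link that should be skipped. The sample HTML is invented for the test.

```php
<?php
// Pattern from post 5: optional quote in \1, URL in \2, link text in \3.
$pattern = '%<a href=([\'"])?((?:(?!\1)[^>\s])+\.pdf)(?(1)\1)>(.*?)</a>%si';

// Made-up sample: three PDF links in different quoting styles, one HTML link.
$html = '<a href="docs/one.pdf">One</a> '
      . "<a href='docs/two.pdf'>Two</a> "
      . '<a href=docs/three.pdf>Three</a> '
      . '<a href="page.html">Not a PDF</a>';

preg_match_all($pattern, $html, $m);
print_r($m[2]); // the three .pdf URLs
print_r($m[3]); // One, Two, Three
```

The unquoted case works because, in PHP's PCRE, a back-reference to a group that did not participate fails to match, so the (?!\1) look-ahead always succeeds there.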
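For post 8 above, a sketch showing what the s modifier changes: without it, .*? cannot cross the newline inside the first cell, so that cell is skipped. The sample markup is made up.

```php
<?php
// Made-up markup: the first cell spans two lines, the second does not.
$html = "<td class=\"trow1\">first\nsecond</td><td class=\"trow1\">third</td>";

preg_match_all('~<td class="trow1">(.*?)</td>~i', $html, $without); // no s
preg_match_all('~<td class="trow1">(.*?)</td>~is', $html, $with);   // with s

print_r($without[1]); // only "third"
print_r($with[1]);    // "first\nsecond" and "third"
```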
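For posts 11, 14, and 17 above, a sketch of why [^>]* is safer than .*? when a tag may lack the attribute: the lazy dot runs out of the <a> tag and picks up the id of a later <span>, while the negated class fails cleanly. The sample markup is hypothetical.

```php
<?php
// Hypothetical markup: the <a> has no id; only the later <span> has one.
$html = '<a href="x.html">link</a> <span id="42">text</span>';

// Lazy dot: keeps consuming past the end of the <a> tag until it finds id=.
preg_match('/<a.*?id="(\d+)"/', $html, $lazy);

// Negated class: [^>]* cannot cross ">", so an <a> without id= simply fails.
$found = preg_match('/<a[^>]*id="(\d+)"/', $html, $bounded);

var_dump($lazy[1] ?? null); // string(2) "42", taken from the <span>
var_dump($found);           // int(0)
```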
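For post 16 above, a sketch of comparing strings after html_entity_decode, so that a named entity and its numeric equivalent compare as equal. The flags (ENT_QUOTES | ENT_HTML5) and the UTF-8 charset are choices made for this example; sameText is a hypothetical helper name.

```php
<?php
// sameText() is a hypothetical helper: decode both sides, then compare.
function sameText(string $a, string $b): bool
{
    $flags = ENT_QUOTES | ENT_HTML5;
    return html_entity_decode($a, $flags, 'UTF-8') === html_entity_decode($b, $flags, 'UTF-8');
}

var_dump('caf&eacute;' === 'caf&#233;');        // bool(false): raw strings differ
var_dump(sameText('caf&eacute;', 'caf&#233;')); // bool(true): both decode to "café"
```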
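For post 24 above, a sketch of the pattern in use. The [^\"\s>] class stops each URL at a double quote, whitespace, or ">", which is why it matters whether the URLs are always double-quoted: a single-quoted href would let the match run on to include the closing quote. The sample string is invented.

```php
<?php
// Invented sample: one URL in a double-quoted href, one in plain text.
$html = '<a href="http://www.phpfreaks.com">PHP Freaks</a> see https://example.com/page for more';

// Each match ends at a double quote, whitespace, or ">".
preg_match_all('%https?://[^\"\s>]+%i', $html, $matches);
print_r($matches[0]);
```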