Modernvox Posted October 24, 2009 I'm trying to pull in this small string:
/mcy/1435258204.html
I tried:
('~^/[a-z0-9][^/]($html)~'
but can't get the right combo. Can you get it? Quote Link to comment https://forums.phpfreaks.com/topic/178815-solved-small-stringcant-get-the-right-combo-to-pull-it-in/ Share on other sites More sharing options...
Daniel0 Posted October 24, 2009 What are you trying to do? Extract the number?
Modernvox Posted October 24, 2009 (Author) No, it's a link. I want the link.
nrg_alpha Posted October 24, 2009 Admittedly, your initial post doesn't explain things fully. Are you specifically looking for ".html" at the end? Could it be any file extension? Is the number of directories always the same, or can it differ from case to case? Will the file name always contain only numbers, or a mix of letters and/or numbers? It's helpful to provide multiple samples of what you are sifting through, demonstrating a variety of strings and explaining what will be consistent, what might change, and what exactly you are trying to match / capture. Going by the info provided so far, this is what I assume you are looking for (using preg_match as an example):
$str = '/mcy/1435258204.html';
preg_match('#^/[a-z]+/[0-9]+\.html$#i', $str, $match);
echo $match[0]; // Output: /mcy/1435258204.html
But again, without more explanation, the conditions aren't clear, to be honest. You can read more about helpful suggestions here.
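A quick way to see what the ^ and $ anchors and the trailing i modifier do in nrg_alpha's pattern is to run it against a few strings. This is a minimal sketch; apart from the first string, the samples are made up for illustration:

```php
<?php
// Sketch: nrg_alpha's anchored pattern tried against a few sample strings.
// Only the first sample comes from the thread; the rest are hypothetical.
$pattern = '#^/[a-z]+/[0-9]+\.html$#i';

$samples = [
    '/mcy/1435258204.html',      // the string from the first post
    '/MCY/1435258204.HTML',      // matches: the i modifier ignores case
    '/mcy/1435258204.html?x=1',  // fails: $ requires the string to end right after "html"
    'see /mcy/1435258204.html',  // fails: ^ requires the string to start with "/"
];

$results = [];
foreach ($samples as $s) {
    $results[$s] = preg_match($pattern, $s) === 1;
    echo ($results[$s] ? 'match   ' : 'no match') . "  $s\n";
}
```

Only the first two samples match; the anchors make the pattern describe the whole string, not a piece of it.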
Modernvox Posted October 24, 2009 (Author) The / is always at the beginning and the html is always at the end. There is a vertical list with more links, which I will tackle after I can actually wrap my head around the regex. I just finished reading O'Reilly's Mastering Regular Expressions vol. 2, but the biggest problem I am having is knowing how to wrap the regex statement in general. I'm seeing folks use different characters and it is confusing the hell out of me. Is it (....)? Is it ~...~? Is it "...."? Is it '...'? There must be a standard character to close this function, just as the if statement is enclosed in { ... }. You know what I'm saying?
cags Posted October 24, 2009 In PCRE regular expressions the pattern must be enclosed between delimiters. A large selection of characters can serve as delimiters, alphanumeric characters being the biggest exception. Generally speaking, people just choose a character that is unlikely to appear in their pattern, as this reduces the amount of escaping required. In your original post you used the tilde character, whereas nrg_alpha used the hash. It really makes no great difference. Whilst the opening and closing delimiters are generally the same character, I started a recent thread that discussed the fact that you can also use bracket pairs as delimiters, such as {} and <>. The characters that come after the closing delimiter are what's called pattern modifiers (for example, the i at the end of nrg_alpha's pattern, which makes the match case-insensitive).
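To illustrate cags's point, here is a sketch (not from the thread) of the same pattern written with four different delimiters, including a {} bracket pair. All of them match the string from the first post, and the trailing i is a modifier in each case:

```php
<?php
// Sketch: one pattern, four delimiter choices. Only the delimiter changes;
// the i after the closing delimiter is a pattern modifier (case-insensitive).
$subject = '/mcy/1435258204.html';

$patterns = [
    '/^\/[a-z]+\/[0-9]+\.html$/i', // slash delimiters: the inner slashes must be escaped
    '#^/[a-z]+/[0-9]+\.html$#i',   // hash delimiters (nrg_alpha's choice)
    '~^/[a-z]+/[0-9]+\.html$~i',   // tilde delimiters
    '{^/[a-z]+/[0-9]+\.html$}i',   // bracket-style pair: opens with {, closes with }
];

$matched = [];
foreach ($patterns as $p) {
    $matched[$p] = preg_match($p, $subject);
    echo $matched[$p] . "  $p\n"; // 1 = match, 0 = no match
}
```

Picking a delimiter that doesn't occur in the pattern (here # or ~) is what saves the backslashes seen in the first version.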
Modernvox Posted October 24, 2009 (Author) Love you, Cags, but you just confused the shit out of me. I'm going to go read Mastering Regular Expressions again.
nrg_alpha Posted October 24, 2009 With regards to delimiters, the only thing to remember is that they can be any non-whitespace, non-alphanumeric ASCII character other than a backslash (or a null byte, apparently). You can read up on delimiters here. With regards to (...), "...", '...' etc., I'm not sure I follow. Perhaps posting a small portion of the regex code you are trying to use will help. NOTE: cags basically cut and pasted what I linked to in the PHP manual. D'oh!
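A small sketch of what happens when a delimiter breaks the rule nrg_alpha describes: PHP rejects the pattern, raises a warning, and preg_match() returns false. The test patterns here are arbitrary examples:

```php
<?php
// Sketch: delimiters PHP rejects. preg_match() returns false (and raises a
// warning, silenced here with @) when the delimiter is alphanumeric or a backslash.
$subject = 'foobar';

$alnum     = @preg_match('afooa', $subject);   // letter as delimiter: rejected
$backslash = @preg_match('\\foo\\', $subject); // the string is \foo\ : rejected
$hash      = @preg_match('#foo#', $subject);   // hash is fine: 1 (match)

var_dump($alnum, $backslash, $hash);
```

Checking for a false return (as opposed to 0, which means "valid pattern, no match") is how you can tell a broken pattern from a pattern that simply didn't match.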
Modernvox Posted October 24, 2009 (Author) I'd like to add that I don't want to be limited to grabbing one link, so is creating the $str variable necessary for this? There are about 50 links on each page; I just gave the first link as an example. Cags helped me out with a similar preg to match email addresses. This time I'm attempting to grab some links that I can open and grab that email address from (the one cags assisted me with).
nrg_alpha Posted October 24, 2009 No, $str is only an example I used. If you want multiple links, you could use preg_match_all. Granted, when parsing HTML it's typically wiser to use DOM for this kind of thing (but that's an entirely different ball of wax).
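For the DOM route nrg_alpha mentions, a minimal sketch using PHP's DOMDocument: parse the page, read each <a> element's href, and keep the ones shaped like /xxx/0123456789.html. The function name extractListingLinks() is hypothetical, and the sample HTML is trimmed from the listing Modernvox posts later in the thread:

```php
<?php
// Sketch of the DOM approach: parse the HTML, walk every <a> element, and
// keep hrefs that look like /abc/1234567890.html. Function name is hypothetical.
function extractListingLinks(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser errors quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);        // restore the previous setting

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // The regex now only has to check the href value, not the surrounding markup.
        if (preg_match('#^/[a-z]+/\d+\.html$#', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a>'
      . ' <i><a href="/pts/">auto parts</a></i></p>'
      . '<p><a href="/vgm/1436241144.html">Xbox 360 / Wii / PSP/iPhone Flashing - $30 -</a>'
      . ' <i><a href="/vgm/">video gaming</a></i></p>';

print_r(extractListingLinks($html));
```

The category links (/pts/, /vgm/) are skipped because they have no numeric file name. The trade-off versus a single regex is a bit more code in exchange for not depending on the exact attribute spacing and quoting in the markup.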
salathe Posted October 24, 2009 Quoting Modernvox: "There must be a standard character to close this function." Generally forward slashes (/), though not if they occur within the pattern (common when parsing URIs or HTML). In the latter case, common alternatives are tilde (~) or hash/pound (#). E.g.:
/foobar\.html/i
/\/foo\/bar\.html/   <-- ugly
~/foo/bar\.html~
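salathe's escaping point can be checked with preg_quote(), which escapes regex metacharacters in a literal string and, when given the delimiter as its second argument, escapes that character too. A sketch using his /foo/bar.html example:

```php
<?php
// Sketch: preg_quote() escapes regex metacharacters (like .) in a literal
// string; the optional second argument also escapes the chosen delimiter.
$path = '/foo/bar.html';

$slashPattern = '/' . preg_quote($path, '/') . '/'; // /\/foo\/bar\.html/  (the "ugly" form)
$tildePattern = '~' . preg_quote($path, '~') . '~'; // ~/foo/bar\.html~

echo $slashPattern, "\n", $tildePattern, "\n";

// Both patterns find the same literal path inside a longer string.
var_dump(preg_match($slashPattern, 'see /foo/bar.html'));
var_dump(preg_match($tildePattern, 'see /foo/bar.html'));
```

With / as the delimiter every slash in the path needs a backslash; with ~ only the full stop does.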
salathe Posted October 24, 2009 I can't edit my previous post to provide an example of grabbing the links that you want, so it'll have to be a double-post (sorry if you guys frown on that!).
$html = file_get_contents($url);
$pattern = '#<a href="(/mcy/\d{10}\.html)">#';
preg_match_all($pattern, $html, $matches);
echo "Links:\n";
foreach ($matches[1] as $link) {
    echo $link . "\n";
}
Will output something like (shortened to save scrolling):
Links:
/mcy/1435866184.html
/mcy/1435864882.html
/mcy/1435864500.html
...
/mcy/1435673391.html
/mcy/1435671439.html
Modernvox Posted October 24, 2009 (Author) Why thank you, Sir Salathe; your time is and always has been much appreciated, as well as your wisdom. By the way, how about sending some of that wisdom this way in the form of, let's say, a "brain swap"? Oh yeah, the mcy is not included in every string I want, so I need to 86 that part.
salathe Posted October 24, 2009 I'm not too sure I'd be up for a brain swap (though I'm sure yours is a lovely brain), but keep on posting questions and I'll keep posting replies (and maybe some answers).
Modernvox Posted October 24, 2009 (Author) You sure? I hear brain swapping is in! Using $html = file_get_contents($url); the output is just the text "Links:" without the actual links. Here's my code:
<?php
function curlURL($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2');
    $output = curl_exec($curl);
    return $output;
}

$url = "http://southcoast.craigslist.org/tls/1432616932.html";
$html = file_get_contents($url);
$pattern = '#<a href="(/mcy/\d{10}\.html)">#';
preg_match_all($pattern, $html, $matches);
echo "Links:\n";
foreach ($matches[1] as $link) {
    echo $link . "\n";
}
Modernvox Posted October 24, 2009 (Author) This is not working for me.
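One likely reason only the heading gets printed, sketched below: salathe's pattern hard-codes the mcy category, so on a page whose links use a different category preg_match_all() finds no matches, $matches[1] is an empty array, and the loop body never runs. The snippet uses one link from the listing posted further down the thread:

```php
<?php
// Sketch: the literal "mcy" in the pattern cannot match a /pts/ link, so
// preg_match_all() returns 0, $matches[1] stays empty, and the foreach
// loop never runs. Only the "Links:" heading is printed.
$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a></p>';

$pattern = '#<a href="(/mcy/\d{10}\.html)">#';
$count = preg_match_all($pattern, $html, $matches);

echo "Links:\n";
foreach ($matches[1] as $link) {
    echo $link . "\n";
}
echo "matches found: $count\n";
```

Printing the return value of preg_match_all() like this is a cheap way to tell "the pattern matched nothing" apart from "the page came back empty".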
cags Posted October 24, 2009 Show us the string you are trying to match (including the surrounding text).
Modernvox Posted October 24, 2009 (Author) Here's the HTML containing the links I am trying to grab:
<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a><font size="-1"> (bergen county)</font> <span class="p"> pic</span> <<<i><a href="/pts/">auto parts</a></i></p>
<p><a href="/vgm/1436241144.html">Xbox 360 / Wii / PSP/iPhone Flashing - $30 -</a><font size="-1"> (Roselle)</font> <<<i><a href="/vgm/">video gaming</a></i></p>
<p><a href="/emd/1436239956.html">OVER 200 CASSETTE'S ROCK COUNTRY - $50 -</a><font size="-1"> (RANDOLPH)</font> <<<i><a href="/emd/">cds / dvds / vhs</a></i></p>
<p><a href="/bfs/1436240970.html">Business for Sale - $195000 -</a><font size="-1"> (Newark)</font> <<<i><a href="/bfs/">business/commercial</a></i></p>
<p><a href="/ele/1436240954.html">SONY 27-inch TRINITRON Flat Screen (Great TV!) - $150 -</a><font size="-1"> (Belleville)</font> <span class="p"> pic</span> <<<i><a href="/ele/">electronics</a></i></p>
<p><a href="/hsh/1436240849.html">Wooden Doorway Gate - $9 -</a><font size="-1"> (Denville, NJ)</font> <span class="p"> pic</span> <<<i><a href="/hsh/">household items</a></i></p>
<p><a href="/pts/1436240687.html">Oldsmobile 1965-1966 gasket set -</a><font size="-1"> (Pequannock)</font> <span class="p"> pic</span> <<<i><a href="/pts/">auto parts</a></i></p>
<p><a href="/cto/1436240412.html">1994 Volvo 940 Sedan - $1200 -</a><font size="-1"> (Springfield, NJ)</font> <span class="p"> pic</span> <<<i><a href="/cto/">cars & trucks - by owner</a></i></p>
<p><a href="/hsh/1436239243.html">Pet Travel Kennel - $18 -</a><font size="-1"> (Parsippany, NJ)</font> <span class="p"> pic</span> <<<i><a href="/hsh/">household items</a></i></p>
Modernvox Posted October 24, 2009 (Author) About this line from salathe's example:
foreach ($matches[1] as $link) {
This part of the code is confusing me: why the [1] in there?
cags Posted October 24, 2009 The solution provided by salathe has a literal mcy in the pattern, which those links don't have. You'd need to use something more like:
'#<a href="(/[a-z]{3}/\d{10}\.html)">#'
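A sketch of cags's generalized pattern run against three of the listing lines Modernvox posted (trimmed to the <a> tags). The category links such as /pts/ are skipped because they have no 10-digit file name:

```php
<?php
// Sketch: [a-z]{3} replaces the literal "mcy", so any three-letter
// category matches. Sample lines are trimmed from the listing in the thread.
$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a>'
      . '<i><a href="/pts/">auto parts</a></i></p>' . "\n"
      . '<p><a href="/vgm/1436241144.html">Xbox 360 / Wii / PSP/iPhone Flashing - $30 -</a>'
      . '<i><a href="/vgm/">video gaming</a></i></p>' . "\n"
      . '<p><a href="/bfs/1436240970.html">Business for Sale - $195000 -</a>'
      . '<i><a href="/bfs/">business/commercial</a></i></p>';

$pattern = '#<a href="(/[a-z]{3}/\d{10}\.html)">#';
preg_match_all($pattern, $html, $matches);

echo "Links:\n";
foreach ($matches[1] as $link) {
    echo $link . "\n";
}
```

This prints the three listing paths (/pts/..., /vgm/..., /bfs/...) under the heading. If a category could ever be longer or shorter than three letters, [a-z]+ would be the looser choice.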
cags Posted October 24, 2009 preg_match_all fills $matches with a multidimensional array (the function's return value is the number of matches). $matches[0] contains all strings that match the entire pattern. So, for example, in the string you just provided, $matches[0][0] will contain:
<a href="/pts/1436241251.html">
$matches[1] contains an array of all strings matched by the first capture group (the content inside the first set of parentheses/brackets), so, using your example again, $matches[1][0] contains:
/pts/1436241251.html
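A short sketch of the layout cags describes, using the first link from the listing. For comparison it also shows the optional PREG_SET_ORDER flag, which groups the results per match instead of per capture group:

```php
<?php
// Sketch: default (PREG_PATTERN_ORDER) layout of $matches versus PREG_SET_ORDER.
$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a></p>';
$pattern = '#<a href="(/[a-z]{3}/\d{10}\.html)">#';

preg_match_all($pattern, $html, $matches);
echo $matches[0][0], "\n"; // whole match:     <a href="/pts/1436241251.html">
echo $matches[1][0], "\n"; // capture group 1: /pts/1436241251.html

// With PREG_SET_ORDER there is one entry per match, each holding the
// full match at [0] and the first capture group at [1].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
echo $sets[0][1], "\n";    // also /pts/1436241251.html
```

That is why salathe's loop iterates over $matches[1]: it wants only the captured paths, not the whole <a href="..."> text.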
Modernvox Posted October 24, 2009 (Author) Ok, so / is for the beginning (why didn't you use ^?). Then [a-z] is pretty self-explanatory. Then you have {3} (not sure I understand this one). Then you have /\ I imagine \d{10} represents the 10 digits? Finally you have \, but why?
Modernvox Posted October 24, 2009 (Author) Ok, I get the {3}, which is for 3 characters.
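On the question of why the pattern has no ^: without the m modifier, ^ only matches at the very start of the subject string, so anchoring the pattern would make it fail on a page where the link sits in the middle of the text. A sketch using the first listing line:

```php
<?php
// Sketch: ^ anchors the match to the start of the subject (no m modifier),
// so the anchored version fails because this string starts with <p>.
$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a></p>';

$unanchored = preg_match('#<a href="(/[a-z]{3}/\d{10}\.html)">#', $html);  // finds the link
$anchored   = preg_match('#^<a href="(/[a-z]{3}/\d{10}\.html)">#', $html); // no match

var_dump($unanchored, $anchored);
```

nrg_alpha's earlier pattern used ^ and $ because there the subject was the bare path itself; when searching a whole page, the literal <a href=" text does the job of locating the link instead.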
cags Posted October 24, 2009
#<a href="(/[a-z]{3}/\d{10}\.html)">#
# - opening delimiter
<a href=" - literal string, i.e. find this exact text
( - start a new capture group
/ - literal forward slash (as all the links start with a forward slash)
[a-z]{3} - 3 lowercase letters
/ - literal forward slash
\d{10} - a 10-digit number
\. - a full stop character (the backslash escapes it, as . is a special character)
html - another literal string
) - close capture group
"> - yet more literal characters
# - ending delimiter
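The breakdown can also live inside the pattern itself. This is a sketch (not something used in the thread) with the x (extended) modifier, which ignores unescaped whitespace and treats # as the start of a comment; the delimiter is switched to ~ for that reason, and the literal space in <a href is escaped:

```php
<?php
// Sketch: cags's pattern in extended (x) mode. Bare whitespace is ignored and
// # starts a comment, so ~ is used as the delimiter instead of #.
$pattern = '~
    <a\ href="       # literal text; the space is escaped because x ignores bare whitespace
    (                # start capture group 1
      /[a-z]{3}/     # slash, three lowercase letters, slash
      \d{10}         # a 10-digit number
      \.html         # escaped full stop, then the literal html
    )                # close capture group 1
    ">               # literal closing quote and bracket
~x';

$html = '<p><a href="/pts/1436241251.html">1989 Jeep Wrangler parts - $600 -</a></p>';
preg_match_all($pattern, $html, $matches);
print_r($matches[1]); // an array holding /pts/1436241251.html
```

It matches exactly what the one-line version matches; the only gain is readability, which can help while you're still learning the syntax.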
Modernvox Posted October 24, 2009 (Author) About file_get_contents($url): not sure I like this method. It seems a bit much, considering it shows an error stating it can't be empty. Isn't it easier just to say:
$curlResults = curlURL("http://newjersey.craigslist.org/sss/");
preg_match_all('#<a href="(/[a-z]{3}/\d{10}\.html)">#', $curlResults, $out);
echo $out[1][0];
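Either approach can fetch the page: file_get_contents() works on URLs when allow_url_fopen is enabled, and cURL is the other option. If you go the cURL route, here's a minimal sketch that checks for a failed request before running the regex. fetchPage() and extractLinks() are hypothetical names, and the fetch part needs PHP's curl extension:

```php
<?php
// Sketch: fetch with cURL, report failures instead of passing false/empty
// input to the regex, then extract the links. Function names are hypothetical.
function fetchPage(string $url): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($curl);                         // false on failure
    if ($body === false) {
        error_log('curl error: ' . curl_error($curl));
        return null;
    }
    return $body;
}

function extractLinks(string $html): array
{
    preg_match_all('#<a href="(/[a-z]{3}/\d{10}\.html)">#', $html, $matches);
    return $matches[1]; // capture group 1 only: the paths
}

// Usage (needs network access):
// $html = fetchPage('http://newjersey.craigslist.org/sss/');
// if ($html !== null) {
//     foreach (extractLinks($html) as $link) {
//         echo $link, "\n";
//     }
// }
```

Keeping the extraction in its own function also means it can be tried on a pasted chunk of HTML without touching the network.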