terrypin Posted January 22, 2012

I'm hoping one of the experts can help, please. I have a text file that looks like this:

--- Start paste ---
[blackfordLane.jpg]
File name = BlackfordLane.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 96 x 96 DPI
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 72 x 72 DPI
File date/time = 19/01/2012 / 14:03:55
- EXIF -
Make - FUJIFILM
Model - FinePix2600Zoom
Orientation - Top left
XResolution - 72
YResolution - 72
ResolutionUnit - Inch
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 75
Resolution = 0 x 0 DPI
File date/time = 18/01/2012 / 15:40:05
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

And this is what I want to get as a result:

BlackfordLane.jpg
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
Castle Eaton Church.jpg
Castle Eaton Church
CastleEaton-2.jpg
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

My first line of attack is to try for a regex that will Find everything between (for example) the ']' of '[blackfordLane.jpg]' and the '-' of '- COMMENT -'.
That would leave only a little tidying up, I think. But after a couple of hours it still eludes me. The best I could come up with was the following, which deletes all lines from File name... to File date/time (with the Replace box empty):

File name = .*\nDirectory = .*\nCompression = .*\nResolution = .*\nImage dimensions = .*\nPrint size = .*\nColor depth = .*\nNumber of unique colors = .*\nDisk size = .*\nCurrent memory size = .*\nFile date/time = .*\n

But that's only part of the task and seems very inelegant. Any suggestions, please?

--
Terry, East Grinstead, UK
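Terry's deletion pattern names every header key explicitly. A minimal PHP sketch of a more general alternative follows. It assumes (based only on the sample, not on any documented format) that every header line has the form "Key = value", that keys contain only letters, spaces and '/', and that no comment line contains " = ". The sample text is an abbreviated copy of Terry's file.

```php
<?php
// Sketch: strip every "Key = value" header line with one generic pattern
// instead of listing each key (File name, Directory, ...) by hand.
// Assumption: header keys contain only letters, spaces and '/', and
// comment lines never contain " = ".
$text = <<<'TXT'
[blackfordLane.jpg]
File name = BlackfordLane.jpg
Compression = JPEG, quality: 87, subsampling OFF
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Resolution = 0 x 0 DPI
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
TXT;

// The /m modifier makes ^ match at the start of every line; the line break
// after each header line (\r\n or \n) is removed together with the line.
$result = preg_replace('/^[A-Za-z\/ ]+ = .*(?:\r?\n|$)/m', '', $text);
echo $result, "\n";
```

As Terry notes, this only removes the "=" lines; the "- IPTC -" and "- EXIF -" sections still need handling.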
joe92 Posted January 22, 2012

Hi, if this were in a database it would be so much easier.

Anyway, this is untested, but I think your best line of approach is to preg_match_all both items and then pair up the matches. The first is a file name, which is enclosed in square brackets, so try this:

preg_match_all('/(?<=\[)[^\]]+/', $text, $file_matches);

Then the comment, which is a bit trickier. It always starts after '- COMMENT -' and ends where the next item begins with a square bracket. It can also cover multiple lines, so we can't stop at the end of the line. You could try the following, which is almost identical to the above, but you would have to make sure there are no opening square brackets within the comment:

preg_match_all('/(?<=- COMMENT -)[^\[]*/', $text, $comment_matches);

preg_match_all places the matches in a multidimensional array, and since there is a comment for every file name you should just be able to pair up the matches. If a file has no comment text (but the '- COMMENT -' line is still there), it will just match the blank space in between.

So to print the results:

<?php
// $text holds the contents of the text file.
// The file name in $file_matches[0][$i] pairs up with the comment in $comment_matches[0][$i].
$count = count($file_matches[0]);
for ($i = 0; $i < $count; ++$i) {
    echo $file_matches[0][$i] . '<br/>';
    echo trim($comment_matches[0][$i]) . '<br/>';
}

Hope this helps you,
Joe
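A self-contained sketch of Joe's two-array idea, run against an abbreviated copy of Terry's sample, may make the pairing easier to follow. Note that the '[' inside the lookbehind has to be escaped, and each character class needs a '+' or '*' quantifier to match more than one character. Like Joe's version, it assumes every record has a '- COMMENT -' line and that neither names nor comments contain square brackets.

```php
<?php
// Sketch of the two-array preg_match_all approach: collect all file names,
// collect all comments, then pair them up by index.
$text = <<<'TXT'
[blackfordLane.jpg]
File name = BlackfordLane.jpg
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
- EXIF -
Make - FUJIFILM
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
TXT;

// One or more non-']' characters immediately after a '['.
preg_match_all('/(?<=\[)[^\]]+/', $text, $file_matches);
// Everything after '- COMMENT -' up to (not including) the next '['.
preg_match_all('/(?<=- COMMENT -)[^\[]*/', $text, $comment_matches);

$lines = [];
$count = count($file_matches[0]);
for ($i = 0; $i < $count; ++$i) {
    $lines[] = $file_matches[0][$i];
    // Each comment match begins with the line break after '- COMMENT -'.
    $lines[] = trim($comment_matches[0][$i] ?? '');
}
echo implode("\n", $lines), "\n";
```

The weak point of this design is that the two arrays are only linked by position: a record with no '- COMMENT -' line at all would shift every later pairing.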
ragax Posted January 22, 2012

Hi Terrypin,

I didn't have time to look at Joe's solution as I'm rushing out; I just wanted to give you a preg_replace option. You can run this PHP code.

The regex:

,(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),

Code:

<?php
$regex = ',(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),';
$string = '[blackfordLane.jpg]
File name = BlackfordLane.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 96 x 96 DPI
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 87, subsampling OFF
Resolution = 72 x 72 DPI
File date/time = 19/01/2012 / 14:03:55
- EXIF -
Make - FUJIFILM
Model - FinePix2600Zoom
Orientation - Top left
XResolution - 72
YResolution - 72
ResolutionUnit - Inch
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
Directory = C:\Docs\My Videos\PROJECTS\Thames Path Walk Projects\TP03 Project\Geograph Photos\GeoDay2\
Compression = JPEG, quality: 75
Resolution = 0 x 0 DPI
File date/time = 18/01/2012 / 15:40:05
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter\'s day, with the River Thames running at the bottom of the garden.
';
$s = preg_replace($regex, '\1\2', $string);
echo '<pre>' . $s . '</pre>';
?>

Output:

BlackfordLane.jpg
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
Castle Eaton Church.jpg
Castle Eaton Church
CastleEaton-2.jpg
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.

I didn't have time to look at the fine details; let me know if that works for you.
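The pattern above only matches when the file uses Windows (\r\n) line endings, and its '.jpg' uses an unescaped dot, which matches any character. A minimal variant, sketched here under the assumption that the input may use either \r\n or \n, swaps \r\n for PCRE's \R (any line break) and escapes the dot. It runs on an abbreviated copy of the sample.

```php
<?php
// Variant of the preg_replace approach that accepts \r\n or \n line endings.
$text = <<<'TXT'
[blackfordLane.jpg]
File name = BlackfordLane.jpg
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
- EXIF -
Make - FUJIFILM
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
TXT;

// (?s): '.' also matches line breaks, so '.*?' can skip the whole header block.
// (?m) is dropped because the pattern uses no ^ or $.
// Group 1: the name in brackets. Group 2: the line break after
// '- COMMENT -' plus everything up to the next '['.
$regex = '/(?s)\[([^]]+\.jpg)\].*?- COMMENT -(\R[^[]*)/';
$result = preg_replace($regex, '$1$2', $text);
echo $result, "\n";
```

As with the original, a record that has no '- COMMENT -' line would be swallowed into the following match by the lazy '.*?'.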
terrypin Posted January 22, 2012 (Author)

Thanks Joe, I much appreciate that fast response. However, I realise now that I forgot an important point! My post here turns out to be somewhat OT, as I'm not a PHP user. I'm not even a programmer, just an end user using regex (apparently the POSIX version) in my text editor, TextPad. I had assumed that 'PHP regex' would be close enough for me to make any necessary syntax changes, but obviously I was mistaken, as TextPad regex looks quite different. There's no preg_match_all, for example!

But your post has inspired me to come at the problem from a totally different angle. Instead of trying to find the strings I described and replace them with blanks, I should Find the individual names inside the initial square brackets, giving me the file name, and then the line(s) directly after '- COMMENT -' and before the following opening square bracket. Both of those should be fairly easy, I think. The challenge then however is how to do this for all such pairs?

BTW, this is the first forum I've joined (and I use scores of them) that makes me complete image verification and answer test questions even though I've already registered! Bit OTT, isn't it?

--
Terry, East Grinstead, UK
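One way to handle "all such pairs" without relying on two separately built lists is to cut the text into records first, one per "[name]" line, and then pull the name and the comment out of each record. The sketch below does this in PHP; it assumes that only record headers start a line with '[' and that the comment runs to the end of its record. A record without a comment gets an empty string instead of throwing the pairing off.

```php
<?php
// Sketch: split into records at each line starting with '[', then extract
// the file name and comment from each record so every pair stays together.
$text = <<<'TXT'
[blackfordLane.jpg]
File name = BlackfordLane.jpg
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
- EXIF -
Make - FUJIFILM
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
TXT;

// Zero-width split just before every line that starts with '['.
$records = preg_split('/^(?=\[)/m', $text, -1, PREG_SPLIT_NO_EMPTY);

$pairs = [];
foreach ($records as $record) {
    // The record header: "[file name]" at the very start of the record.
    if (!preg_match('/^\[([^\]]+)\]/', $record, $name)) {
        continue; // text before the first '[' line, if any
    }
    // Everything after the "- COMMENT -" line up to the end of the record;
    // stays empty if the record has no comment section.
    $comment = '';
    if (preg_match('/^- COMMENT -\R(.*)\z/ms', $record, $c)) {
        $comment = trim($c[1]);
    }
    $pairs[] = [$name[1], $comment];
}

foreach ($pairs as [$file, $note]) {
    echo $file, "\n", $note, "\n";
}
```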
joe92 Posted January 22, 2012

Ah, well, I'm afraid I have very little knowledge of POSIX regex, as I've only learnt PCRE (do note that the pattern syntax differs between POSIX and PCRE). That's because the POSIX regex functions were deprecated in PHP as of version 5.3.0, and as a result this board is ultimately PCRE-only now too. I have never used TextPad, so I can't help you there either. Maybe you could try a TextPad forum, if there is one?

I think it's the spammers who have made you do image verification etc. When I changed my registered email, they deactivated my account until I verified my new one... what if I'd made a typo! Haha.

Good luck,
Joe
ragax Posted January 22, 2012

"The challenge then however is how to do this for all such pairs?"

Hi again Terry,

If you don't have PHP, then for the simple REPLACE approach I gave you above, I'd use a program that has regex search-and-replace capabilities. Two that I like are EditPadPro and Aba Search and Replace. There's also some regex replace functionality in some Adobe programs (Dreamweaver, InDesign); the regex flavor there is probably strong enough for the expression I gave you, which is fairly simple. Some IDEs also have regex functionality (Code::Blocks, NetBeans), but I haven't fully tested them.

Let me know if you need any help with the two tools I mentioned or with the Adobe tools.
terrypin Posted January 22, 2012 (Author)

Thanks Playful, I appreciate your help. The regex in TextPad seems pretty good, with the usual Find/Replace functionality:

http://dl.dropbox.com/u/4019461/TextPad-Regex-1.jpg

But, as I said, its repertoire and syntax seem radically different. In the code you suggested,

',(?sm)\[([^]]+.jpg)\].*?- COMMENT -(\r\n[^[]*),'

I don't recognise/understand:
- the commas
- (?sm)
- [^]
- \r (although I think it means CR? Why do I need that? Isn't Return ('\n') sufficient?)
- * on its own, instead of '.*'

As mentioned, I'm just a regex novice, so maybe much of this is obvious stuff to you!

--
Terry, East Grinstead, UK
ragax Posted January 22, 2012

Hi Terry,

"The Regex in TextPad seems pretty good,"

From what you sent, I'd say very basic, but maybe there's more. The expression I sent is meant to work with a full-blown regex flavor.

"I don't recognise/understand: the commas; (?sm); [^]; \r ...; * on its own, instead of '.*'"

- The commas are delimiters. They're part of the PHP code I sent you. If you're not using PHP (although this is the phpfreaks forum), omit the commas when you paste the expression into your tool. For instance, it works in RegexBuddy.
- (?sm) turns on "dot matches newline" and "multiline" modes. The first lets . match line breaks; the second makes ^ and $ match at the start and end of every line.
- [^[] means anything that is not an opening square bracket. (The caret here stands for NOT.) Likewise, [^]] means anything that is not a closing square bracket.
- \r is a carriage return. Whether you need \r\n or just \n depends on the file's line endings: Windows files normally use \r\n.
- * means zero or more. That's what it means in .* and in [^[]* alike.

Hope this helps; don't hesitate to ask more.
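To see the behaviour ragax describes, the short PHP (PCRE) sketch below runs each token against a tiny Windows-style string. The sample text is made up for illustration; it only mirrors the shape of Terry's comment section.

```php
<?php
// Demonstrations of (?s), [^[]* and \r\n in PCRE, as used by PHP.
$sample = "- COMMENT -\r\nLine one\r\nLine two\r\n[next.jpg]";

// Without (?s), '.' stops at a line break, so '.*' cannot reach "Line two".
$withoutS = preg_match('/- COMMENT -.*Line two/', $sample);   // 0
// With (?s), '.' also matches \r and \n.
$withS = preg_match('/(?s)- COMMENT -.*Line two/', $sample);  // 1

// [^[]* needs no flag: a negated class already matches line breaks,
// so it runs on until the next '['.
preg_match('/- COMMENT -([^[]*)/', $sample, $m);
// $m[1] is "\r\nLine one\r\nLine two\r\n"

// In a Windows file the character after "- COMMENT -" is \r, so \n alone fails.
$lfOnly = preg_match('/- COMMENT -\nLine/', $sample);         // 0
$crlf   = preg_match('/- COMMENT -\r\nLine/', $sample);       // 1

var_dump($withoutS, $withS, $lfOnly, $crlf);
```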
abareplace Posted January 23, 2012

In TextPad, .* and [^[]* cannot span multiple lines. I've tried \(.\|\n\)*, but that does not help either. Generally, there are some problems with multi-line searches in TextPad.
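When the regex engine cannot match across lines, the job can still be done by looking at one line at a time and remembering whether the current line is inside a comment. The PHP sketch below shows that idea; it is not TextPad-specific, and it assumes the '[name]' header, the '- COMMENT -' marker and the comment layout all look like Terry's sample. The same per-line logic could, in principle, be driven by a macro tool.

```php
<?php
// Line-by-line sketch: every regex looks at a single line only, and a flag
// records whether we are currently inside a comment section.
$text = <<<'TXT'
[blackfordLane.jpg]
File name = BlackfordLane.jpg
File date/time = 19/01/2012 / 15:01:23
- IPTC -
Object Name - s bridge over the River Thames is not a footbridge but carries pipes.
- COMMENT -
Thames Path on Blackford Lane heading towards Blackford Farm, east of Castle Eaton.
[Castle Eaton Church.jpg]
File name = Castle Eaton Church.jpg
- EXIF -
Make - FUJIFILM
- COMMENT -
Castle Eaton Church
[CastleEaton-2.jpg]
File name = CastleEaton-2.jpg
- COMMENT -
The Red Lion, Castle Eaton
A warm welcoming pub on a cold winter's day, with the River Thames running at the bottom of the garden.
TXT;

$output = [];
$inComment = false;
foreach (preg_split('/\R/', $text) as $line) {
    $line = rtrim($line); // ignore trailing spaces
    if (preg_match('/^\[(.+)\]$/', $line, $m)) {
        // A "[file name]" line starts a new record.
        $output[] = $m[1];
        $inComment = false;
    } elseif ($line === '- COMMENT -') {
        // The following lines belong to the comment.
        $inComment = true;
    } elseif ($inComment && $line !== '') {
        $output[] = $line;
    }
}
echo implode("\n", $output), "\n";
```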
terrypin Posted January 23, 2012 (Author)

Thanks for the follow-ups. I'm still experimenting and will report back when I have a clearer picture. I suspect TextPad (for the basic regex) plus my macro program (for iterating across multiple lines) might be my best approach. I'm also determined to master more of regex itself!

--
Terry, East Grinstead, UK