Sandeep590 Posted September 8, 2016

Hello everyone,

I have a problem with a regular expression in PHP.

Input given: 9780735585157 (pbk.;teacher's manual)
Expected output: 9780735585157

My PHP code splits each record using the delimiter ;. So the example above is split into two separate pieces of text: 9780735585157 (pbk. and teacher's manual).

My question is: how do I write PHP code that replaces (pbk.;teacher's manual) with an empty string, so that only 9780735585157 is displayed in the output? Note that several other records also have such delimiters inside the parentheses, which causes the original text to be displayed in two parts.

Kindly help me with this issue.

With regards,
Sandeep
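The problem can be reproduced with a minimal sketch. It assumes each record is a single line with the ISBN first and one non-nested note in parentheses. Splitting on `;` cuts through the parentheses, while removing the parenthesised part with `preg_replace()` first leaves only the number. The helper name `stripParenthetical` is hypothetical.

```php
<?php
// Hypothetical helper: remove every "(...)" group together with the whitespace
// in front of it, then trim what is left. Assumes parentheses are not nested.
function stripParenthetical(string $record): string
{
    return trim(preg_replace('/\s*\([^)]*\)/', '', $record));
}

$record = "9780735585157 (pbk.;teacher's manual)";

// Splitting on ";" first breaks the record in the middle of the note:
// the pieces are "9780735585157 (pbk." and "teacher's manual)".
print_r(explode(';', $record));

echo stripParenthetical($record), "\n"; // prints 9780735585157
```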
Jacques1 Posted September 8, 2016

Where does this weird input come from? Why is there no proper data structure which separates the ISBN(?) from other information?
Sandeep590 Posted September 9, 2016

To answer your first question: the input comes from a text file which is not properly formatted. There is no way for us to reformat it, as it contains a huge number of records.

To answer your second question: I have no control over why there is no proper data structure separating the 10- or 13-digit ISBN from the rest of the input text file.

Is there any solution you can provide? I would be thankful to you.
benanamen Posted September 9, 2016

You could just search and replace ( with ,( which will give you a comma-separated list, as long as that format is consistent in your data, meaning numbers followed by a left parenthesis. That would give you 9780735585157,(pbk.;teacher's manual). Then you could do whatever you want with the properly formatted data.
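A minimal sketch of that search-and-replace idea, assuming each record is "digits, optional space, exactly one (note)". The sample record has a space before the parenthesis, so the first field needs trimming, and `explode()` with a limit of 2 keeps any commas inside the note in the second field. `splitRecord` is a hypothetical name.

```php
<?php
// Hypothetical helper following the "replace ( with ,(" suggestion.
// Assumes exactly one parenthesised note per record.
function splitRecord(string $record): array
{
    // Put a comma in front of the "(" so the record becomes "number ,(note)".
    $csv = str_replace('(', ',(', $record);

    // Split only at the first comma; the note may contain commas of its own.
    $parts = explode(',', $csv, 2);

    // Remove the leftover space between the number and the comma.
    return array_map('trim', $parts);
}

[$isbn, $note] = splitRecord("9780735585157 (pbk.;teacher's manual)");
echo $isbn, "\n"; // prints 9780735585157
echo $note, "\n"; // prints (pbk.;teacher's manual)
```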
Jacques1 Posted September 9, 2016

Since you aren't dealing with a proper data structure, I would not make any assumptions about details like the presence of a semicolon. Each line apparently consists of an ISBN followed by additional information in parentheses:

<?php

const BOOK_REGEX = '~\\A(?<isbn>[\\d-]+)\\s*\([^)]+\)\s*\\z~';

$bookFile = fopen('/path/to/file', 'r');
if ($bookFile === false) {
    exit('Cannot open the book file.');
}

$isbnCollection = [];
$lineNumber = 1;
$matches = null;

// fgets() returns false at end of file; compare strictly so that a last line
// consisting only of "0" (no trailing newline) is not mistaken for EOF
while (($line = fgets($bookFile)) !== false) {
    // skip blank lines
    if (!ctype_space($line)) {
        if (preg_match(BOOK_REGEX, trim($line), $matches)) {
            $isbnCollection[] = $matches['isbn'];
        } else {
            echo "Malformed line {$lineNumber}: ".htmlspecialchars($line, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8').'<br>';
        }
    }
    $lineNumber++;
}

var_dump($isbnCollection);

fclose($bookFile);
requinix Posted September 9, 2016

A quick adjustment to Jacques's regex: ISBN-10s can end with an X (a check digit of 10) instead of a final digit, so (?<isbn>[\\d-]+[Xx]?).
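Here is a sketch of Jacques1's pattern with requinix's adjustment applied, wrapped in a hypothetical `extractIsbn()` helper so it can be tried on single lines. It still assumes each line is an ISBN followed by one parenthesised note. Lines that don't fit return `null` instead of being reported as malformed.

```php
<?php
// Jacques1's pattern with an optional trailing X/x allowed for ISBN-10s.
const BOOK_REGEX = '~\A(?<isbn>[\d-]+[Xx]?)\s*\([^)]+\)\s*\z~';

// Hypothetical helper: returns the ISBN part of a line, or null if the
// line does not look like "ISBN (note)".
function extractIsbn(string $line): ?string
{
    return preg_match(BOOK_REGEX, trim($line), $matches) === 1 ? $matches['isbn'] : null;
}

var_dump(extractIsbn("9780735585157 (pbk.;teacher's manual)")); // string(13) "9780735585157"
var_dump(extractIsbn("080442957X (hbk.)"));                      // string(10) "080442957X"
var_dump(extractIsbn("no isbn here"));                           // NULL
```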
Sandeep590 Posted September 9, 2016

Well, thank you all so much for your valuable input and suggestions. The regular expression I have used to extract the 10- or 13-digit ISBN is shown below:

preg_match_all('/\d+(?:\d|X)/', $str, $matches);

where $str is the string to be parsed and $matches receives the results.
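To see what `/\d+(?:\d|X)/` actually accepts, here is a small sketch on a hypothetical record that also contains a year and a page count. The pattern only requires a run of digits followed by one more digit or an X, so every run of two or more digits is returned, not just the ISBN.

```php
<?php
// The pattern from the post above: one or more digits, then one digit or "X".
// The sample record is hypothetical.
$str = "9780735585157 (pbk.;teacher's manual, 2016, 48 p.)";

preg_match_all('/\d+(?:\d|X)/', $str, $matches);

// $matches[0] holds every full match: 9780735585157, 2016 and 48.
print_r($matches[0]);
```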
Jacques1 Posted September 9, 2016

This makes exactly zero sense, because now you're matching any number (any run of two or more digits), not ISBNs. What a disappointment after this long discussion.
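One way to accept ISBNs rather than arbitrary numbers is to require exactly 10 or 13 characters and verify the check digit. This sketch is not from the thread. It assumes hyphens and spaces may appear and strips them first, and `isValidIsbn()` is a hypothetical name. ISBN-13 uses alternating weights 1 and 3 (the sum must be divisible by 10). ISBN-10 uses weights 10 down to 1, with X standing for 10 in the last position (the sum must be divisible by 11).

```php
<?php
// Hypothetical validator: true only for a well-formed ISBN-10 or ISBN-13
// whose check digit is correct. Hyphens and spaces are ignored.
function isValidIsbn(string $candidate): bool
{
    $isbn = strtoupper(str_replace(['-', ' '], '', $candidate));

    if (preg_match('/\A\d{13}\z/', $isbn)) {
        // ISBN-13: weights alternate 1, 3, 1, 3, ...
        $sum = 0;
        for ($i = 0; $i < 13; $i++) {
            $sum += (int) $isbn[$i] * ($i % 2 === 0 ? 1 : 3);
        }
        return $sum % 10 === 0;
    }

    if (preg_match('/\A\d{9}[\dX]\z/', $isbn)) {
        // ISBN-10: weights 10, 9, ..., 1; a final X counts as 10.
        $sum = 0;
        for ($i = 0; $i < 10; $i++) {
            $digit = $isbn[$i] === 'X' ? 10 : (int) $isbn[$i];
            $sum += $digit * (10 - $i);
        }
        return $sum % 11 === 0;
    }

    return false;
}

var_dump(isValidIsbn('9780735585157')); // bool(true)
var_dump(isValidIsbn('0-8044-2957-X')); // bool(true)
var_dump(isValidIsbn('2016'));          // bool(false)
```

Candidates found by a looser pattern can then be filtered with `array_filter($candidates, 'isValidIsbn')`.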