moxol Posted July 26, 2022
The function preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE) returns the wrong offset for a matched value if there are multibyte characters in the subject. It seems that PHP counts a multibyte character as two characters instead of one. Is there a solution to this problem?
requinix Posted July 26, 2022
Bytes and characters are not the same thing. You think it's characters, PHP is telling you bytes.
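To make the byte/character distinction concrete, here is a minimal sketch. It assumes the script file and the subject are UTF-8 encoded and that the mbstring extension is loaded; the sample string and variable names are made up for illustration.

```php
<?php
// Assumption: this file and $subject are UTF-8 encoded.
$subject = "héllo wörld";

// "é" and "ö" each occupy two bytes in UTF-8.
$byteLength = strlen($subject);              // counts bytes
$charLength = mb_strlen($subject, 'UTF-8');  // counts characters

// Even with the /u modifier, PREG_OFFSET_CAPTURE reports byte offsets.
preg_match_all('/wörld/u', $subject, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][0][1];

printf("bytes: %d, chars: %d, match offset: %d\n", $byteLength, $charLength, $byteOffset);
```

The match starts at the seventh character (character index 6), but the reported offset is 7 because the "é" before it takes two bytes.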
moxol Posted July 26, 2022 (edited)
If I use that offset in substr(), it gives me the wrong substring. How can I solve this issue?
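Worth checking which function is actually being called here. As far as I know, substr() itself works on bytes, so a byte offset from PREG_OFFSET_CAPTURE should line up with it; it is mb_substr() that expects a character offset. A small sketch, assuming UTF-8 input and an invented sample string:

```php
<?php
// Assumption: this file and $text are UTF-8 encoded.
$text = "naïve café au lait";

preg_match('/café/u', $text, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];          // byte offset of the match
$byteLen    = strlen($m[0][0]);  // byte length of the matched text

// Byte offset + byte length with the byte-oriented function: consistent.
$viaSubstr = substr($text, $byteOffset, $byteLen);

// Byte offset fed to the character-oriented function: starts one character late,
// because the two-byte "ï" earlier in the string inflates the offset.
$viaMbSubstr = mb_substr($text, $byteOffset, 4, 'UTF-8');

var_dump($viaSubstr, $viaMbSubstr);
```

So the mismatch only shows up when byte offsets are mixed with the mb_* functions (or with anything else that counts characters).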
moxol Posted July 26, 2022 (edited)
I think I found the solution: $offset = mb_strlen(substr($text, 0, $offset_capture)); Is this OK?
requinix Posted July 26, 2022
Is that what you want to do? Find out the "character" position for a match's byte position? You're only giving out small pieces of information at a time. It's hard to give advice on broader problems or situations when all we have to work with is your use of mb_strlen and substr...
moxol Posted July 26, 2022
36 minutes ago, requinix said: Is that what you want to do? Find out the "character" position for a match's byte position?
Yes, I am using preg_match_all to find the offsets of the matches so that I can use substr up to the position reported by PREG_OFFSET_CAPTURE. Then I would use mb_strlen(substr($text, 0, $offset_capture)) to find the "real" offset if there are multibyte characters in the text.
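The conversion described above could be wrapped up roughly like this. It is only a sketch: matchesWithCharOffsets() is a hypothetical helper name, the sample text is invented, and it assumes valid UTF-8 input with the mbstring extension available.

```php
<?php
/**
 * Hypothetical helper: run preg_match_all() and report each full match
 * with both its byte offset and its character offset.
 */
function matchesWithCharOffsets(string $pattern, string $text, string $encoding = 'UTF-8'): array
{
    $result = [];
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === false) {
        return $result; // the regex itself failed
    }
    foreach ($matches[0] as [$value, $byteOffset]) {
        $result[] = [
            'value'       => $value,
            'byte_offset' => $byteOffset,
            // Cut the prefix by bytes, then count its characters.
            'char_offset' => mb_strlen(substr($text, 0, $byteOffset), $encoding),
        ];
    }
    return $result;
}

// Assumption: this file is UTF-8 encoded.
$found = matchesWithCharOffsets('/\p{Lu}\w+/u', 'één Übung, zwei Äpfel');
print_r($found);
```

Keep the byte offset for substr() and use the character offset only with the mb_* functions or when showing a position to a user. Note that each call re-scans the prefix, which may matter for very long texts with many matches.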