Regular expression for matching spaced letters needed - shell
#1
Hi all,

please tell me if this isn't the right place to ask. I am looking for a regular expression that matches "spaced words" in a sentence, for example "M I L L E R" or "B u s h", and replaces them with "Miller" or "Bush". I found out that

sub!(/\s?([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s?/,' \1\2\3\4\5 ')

works for five letters (and analogously for any other fixed number of letters). But I can't figure out how to match an arbitrary number of letters, and how to avoid matching letters of the surrounding words. For example:

"A B C D E" -> "ABCDE"
"a b c d" -> "abcd"
"hello A B C test" -> "hello ABC test"

I think you get the idea. Which regexp will do the job?

Regards
Jan
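To make the limitation of the fixed-count pattern concrete, here is a small Ruby sketch that wraps the regex from the question in a helper (the name `fixed_five` is hypothetical). It joins exactly five uppercase letters, leaves shorter runs and lowercase letters alone, and can swallow the first letter of a neighbouring capitalised word.

```ruby
# The fixed-count pattern from the question, wrapped in a hypothetical helper.
# It has exactly five capture groups, so it can only ever join five letters,
# and [A-Z] never matches lowercase letters such as the "u s h" in "B u s h".
FIVE = /\s?([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s?/

def fixed_five(text)
  # Non-bang `sub` returns a (possibly unchanged) copy;
  # `sub!` would return nil when nothing matches.
  text.sub(FIVE, ' \1\2\3\4\5 ')
end

p fixed_five("hello A B C D E test")  # five spaced letters are joined
p fixed_five("hello A B C test")      # only three letters: no match, unchanged
p fixed_five("X Y Z W VOLUME")        # the fifth [A-Z] takes the V of VOLUME
```

The last call shows the "surrounding words" problem from the question: nothing in the pattern requires the fifth letter to stand alone.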
#2
Jan Fischer wrote:
>
> please tell me, if this isn't the right place to ask.
> I am looking for a regular expression which matches "spaced words" in a
> sentence like, for example, "M I L L E R" or "B u s h" and replaces them
> with "Miller" or "Bush".

I would guess that your file is a (Windows) UTF-16 encoded file. In UTF-16 every ASCII character takes two bytes, one of which is a zero byte, so a program that reads the file as single-byte text may show "MILLER" as "M I L L E R". If so, you can use iconv to convert it to UTF-8, which stores ASCII characters in a single byte and doesn't have that problem.

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
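A minimal Ruby sketch of the UTF-16 explanation, assuming little-endian UTF-16 without a byte-order mark (real Windows files usually start with a BOM, which this sketch omits). It builds the raw bytes in memory rather than reading a file, shows how they look if NUL bytes are displayed as blanks, and then decodes them the way `iconv -f UTF-16 -t UTF-8` would on the command line.

```ruby
# Simulate the bytes of a UTF-16LE file containing the word "MILLER".
# Assumption: little-endian, no byte-order mark.
word = "MILLER"
raw = word.encode("UTF-16LE").b  # binary copy: "M\0I\0L\0L\0E\0R\0"

# A program that treats the bytes as single-byte text and shows NUL as a
# blank would display something like this:
naive_view = raw.gsub("\0", " ")

# Decoding as UTF-16LE and re-encoding as UTF-8 recovers the original word.
decoded = raw.dup.force_encoding("UTF-16LE").encode("UTF-8")

p naive_view
p decoded
```

If this is the cause, converting the file fixes every "spaced word" at once and no regular expression is needed.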
#3
On 2008-11-13, Jan Fischer wrote:
> please tell me, if this isn't the right place to ask.
> I am looking for a regular expression which matches "spaced words" in a
> sentence like, for example, "M I L L E R" or "B u s h" and replaces them
> with "Miller" or "Bush".
> I found out that
> sub!(/\s?([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s?/,' \1\2\3\4\5 ')
> works for 4 Letters. (Analogue for any other fix number.) But I just
> can't figure out how to match any number and how to don't match any
> letters of surrounding words.
>
> "A B C D E" -> "ABCDE"
> "a b c d" -> "abcd"
> "hello A B C test" -> "hello ABC test"

This is not a terribly well-defined problem. Should "A b r i d g e" be interpreted as "Abridge" or "A bridge"?

Assuming the former, you might want to try something along these lines:

1 while s/(\s|@|^)(\w)(\s)(\w)(\s|@|$)/$1$2\@$4$5/g;
s/@//g;

where "@" is any string or character that does not appear in your data (and does not match \s). Dealing with punctuation is left as an exercise.

--
André Majorel
You measure democracy by the freedom it gives its dissidents, not the freedom it gives its assimilated conformists. -- Abbie Hoffman
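For readers working in Ruby, as in the original question, here is a sketch of the marker loop from this reply, plus a single-pass alternative. The second method uses lookarounds, a different technique that does not come from the thread. All method and constant names are hypothetical. Both versions resolve "A b r i d g e" as "Abridge", matching the assumption in the reply.

```ruby
# 1) Port of the marker loop from the reply. "@" is a placeholder that must
#    not occur in the input. Ruby's ^ and $ match at line boundaries, which
#    behaves like the Perl version for single-line strings.
PAIR = /(\s|@|^)(\w)(\s)(\w)(\s|@|$)/

def join_spaced_marker(text)
  s = text.dup
  # gsub! returns nil once a pass makes no replacement, which ends the loop.
  while s.gsub!(PAIR, '\1\2@\4\5')
  end
  s.delete("@")
end

# 2) Single-pass alternative using lookarounds (not from the thread): a run of
#    single letters separated by single spaces, with no non-space character
#    immediately before or after the run. Unlike \w, [[:alpha:]] skips digits.
SPACED_RUN = /(?<!\S)[[:alpha:]](?: [[:alpha:]])+(?!\S)/

def join_spaced_lookaround(text)
  text.gsub(SPACED_RUN) { |run| run.delete(" ") }
end

["A B C D E", "a b c d", "hello A B C test", "A b r i d g e"].each do |t|
  puts "#{t.inspect}: #{join_spaced_marker(t).inspect} / #{join_spaced_lookaround(t).inspect}"
end
```

The lookaround version needs no placeholder character and no loop, but it assumes exactly one space between letters. The marker loop accepts any single whitespace character between them.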