dbaspot

Regular expression for matching spaced letters needed - shell


  #1  
11-13-2008, 06:22 AM

Hi all,

please tell me if this isn't the right place to ask.
I am looking for a regular expression which matches "spaced words" in a
sentence, for example "M I L L E R" or "B u s h", and replaces them
with "Miller" or "Bush".
I found out that
sub!(/\s?([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s?/,' \1\2\3\4\5 ')
works for 5 letters (and analogously for any other fixed number). But I just
can't figure out how to match any number of letters, or how to avoid
matching letters of surrounding words.

"A B C D E" -> "ABCDE"
"a b c d" -> "abcd"
"hello A B C test" -> "hello ABC test"

I think you get the idea. Which regexp will do the job?

Regards Jan
  #2  
11-13-2008, 07:13 AM

Jan Fischer wrote:
>
> please tell me, if this isn't the right place to ask.
> I am looking for a regular expression which matches "spaced words" in a
> sentence like, for example, "M I L L E R" or "B u s h" and replaces them
> with "Miller" or "Bush".


I would guess that your file is a (Windows) UTF-16 encoded file. If so,
you can use iconv to convert it to a UTF-8 file which doesn't have that
problem.
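
For what it's worth, here is a minimal Ruby sketch of that idea, assuming
the input really is UTF-16LE (an assumption, the original post does not say
so). The sample bytes are made up for illustration. It shows why such data
can look like spaced letters when read byte by byte, and how to transcode
it inside Ruby instead of with iconv.

```ruby
# Hypothetical sample data: the bytes a UTF-16LE file would hold for "Miller".
raw = "Miller".encode(Encoding::UTF_16LE).b   # .b gives a binary copy of the bytes

# Every ASCII letter is followed by a NUL byte; blanks are substituted
# here only to make that visible.
as_seen = raw.tr("\0", " ")

# Declare the real encoding of the bytes, then transcode to UTF-8.
utf8 = raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)

puts as_seen.inspect
puts utf8
```

If the text still shows real space characters after such a conversion, the
encoding was not the cause and a regexp is needed after all.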



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
  #3  
11-13-2008, 05:14 PM

On 2008-11-13, Jan Fischer wrote:

> please tell me, if this isn't the right place to ask.
> I am looking for a regular expression which matches "spaced words" in a
> sentence like, for example, "M I L L E R" or "B u s h" and replaces them
> with "Miller" or "Bush".
> I found out that
> sub!(/\s?([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s([A-Z])\s?/,' \1\2\3\4\5 ')
> works for 4 Letters. (Analogue for any other fix number.) But I just
> can't figure out how to match any number and how to don't match any
> letters of surrounding words.
>
> "A B C D E" -> "ABCDE"
> "a b c d" -> "abcd"
> "hello A B C test" -> "hello ABC test"


This is not a terribly well-defined problem. Should "A b r i d g e"
be interpreted as "Abridge" or "A bridge"?

Assuming the former, you might want to try something along these
lines:

# Join adjacent single letters until no pair is left.
while (s/(\s|@|^)(\w)(\s)(\w)(\s|@|$)/$1$2\@$4$5/g) {}
# Remove the placeholder.
s/@//g;

where "@" is any string or character that does not appear in
your data (and does not match \s). Dealing with punctuation is
left as an exercise.
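
Since the original question used Ruby's sub!, here is a rough Ruby sketch
of the same placeholder loop, plus a single-pass variant with lookarounds.
The method names and the default marker "@" are made up for illustration,
and the second method assumes the letters are separated by exactly one
plain space. Both take the "Abridge" reading and ignore punctuation.

```ruby
# Port of the placeholder loop: join pairs of single letters with a marker
# until nothing changes, then strip the marker.
# Assumes the marker never occurs in the data and does not match \s.
def join_spaced_loop(text, marker = "@")
  s = text.dup
  m = Regexp.escape(marker)
  pattern = /(\s|#{m}|^)(\w)\s(\w)(\s|#{m}|$)/
  loop do
    # gsub! returns nil once no substitution was made
    break unless s.gsub!(pattern) { "#{$1}#{$2}#{marker}#{$3}#{$4}" }
  end
  s.gsub(marker, "")
end

# Single-pass variant: a run of single word characters separated by one
# space each, not touching any other non-blank character on either side.
def join_spaced_lookaround(text)
  text.gsub(/(?<!\S)\w(?: \w)+(?!\S)/) { |run| run.delete(" ") }
end

["A B C D E", "a b c d", "hello A B C test"].each do |sample|
  puts "#{sample.inspect} -> #{join_spaced_loop(sample).inspect}"
  puts "#{sample.inspect} -> #{join_spaced_lookaround(sample).inspect}"
end
```

Note that both versions will also glue together genuine one-letter words
that happen to be adjacent, which is the ambiguity mentioned above.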

--
André Majorel
You measure democracy by the freedom it gives its dissidents, not
the freedom it gives its assimilated conformists -- Abbie Hoffman.
All times are GMT -4.