There is an interesting thread on the python.list about comparing two strings while disregarding whitespace, without using re. As the discussion went on, two different requirements emerged:
- normalize whitespace: that is, "a\n b c" == "a b \n c" but "ab c" != "a bc"
- totally ignore whitespace: both "a\n b c" == "a b \n c" and "ab c" == "a bc"
The first solution looks like this:
import string
NULL = string.maketrans("", "")
WHITE = string.whitespace
def compare(a, b):
    """Compare two strings, disregarding whitespace -> bool"""
    return a.translate(NULL, WHITE) == b.translate(NULL, WHITE)
This one first makes a do-nothing translation table and then uses translate to delete all the whitespace. So this meets requirement (2).
However, this won't work with unicode strings. The plain string translate takes 2 arguments, the first being a 256-character "translation table" returned by string.maketrans, while the unicode translate takes only 1 argument, and it is a dict.
Here is how the unicode translate works:
u'baynaynay'
or use it to delete characters:
u'bnn'
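The calls that produced the two results above are not shown. As a guess, they are consistent with the input "banana" and a one-entry dict keyed by the code point of "a". The sketch below uses Python 3, where str.translate takes the same kind of dict as the Python 2 unicode translate; the input word and the variable names are assumptions for illustration only.

```python
# Python 3 sketch: str.translate takes a dict keyed by code point (ord).
# The word "banana" is an assumed input, chosen because it reproduces
# the two results quoted above.
word = "banana"

# Map "a" to the replacement string "ay".
replaced = word.translate({ord("a"): "ay"})

# Map "a" to None to delete it.
deleted = word.translate({ord("a"): None})

print(replaced)  # baynaynay
print(deleted)   # bnn
```

A value of None in the dict means "drop this character", which is what the whitespace-deleting solution relies on.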
So for requirement (2), it looks like this:
import string
nowhite = dict.fromkeys(ord(c) for c in string.whitespace)
def compare(a, b):
    """Compare two strings, disregarding whitespace -> bool"""
    return a.translate(nowhite) == b.translate(nowhite)
It's annoying to have two separate solutions for plain strings and unicode. This might be why translate is marked "obsolete" in the Python documentation.
So someone made a wrapper with the isinstance check:
import string
NULL = string.maketrans("", "")
WHITE = string.whitespace
NO_WHITE_MAP = dict.fromkeys(ord(c) for c in WHITE)
def compare(a, b):
    """Compare two basestrings, disregarding whitespace -> bool"""
    if isinstance(a, unicode):
        astrip = a.translate(NO_WHITE_MAP)
    else:
        astrip = a.translate(NULL, WHITE)
    if isinstance(b, unicode):
        bstrip = b.translate(NO_WHITE_MAP)
    else:
        bstrip = b.translate(NULL, WHITE)
    return astrip == bstrip
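The two isinstance branches differ only in the variable they handle, so they could be folded into one helper. Here is a rough sketch of that idea for Python 3, assuming bytes stands in for the plain string and str for unicode; strip_white and WHITE_BYTES are made-up names, not part of the original thread.

```python
import string

# Python 3 sketch; bytes plays the role of the Python 2 plain string.
WHITE_BYTES = string.whitespace.encode("ascii")
NO_WHITE_MAP = dict.fromkeys(ord(c) for c in string.whitespace)


def strip_white(s):
    """Return s with ASCII whitespace removed (hypothetical helper)."""
    if isinstance(s, str):
        # str.translate takes one dict; None values delete the character.
        return s.translate(NO_WHITE_MAP)
    # bytes.translate takes a table (None means identity) and bytes to delete.
    return s.translate(None, WHITE_BYTES)


def compare(a, b):
    """Compare two strings of the same type, disregarding whitespace -> bool"""
    return strip_white(a) == strip_white(b)


print(compare("ab c", "a bc"))    # True
print(compare(b"ab c", b"a bc"))  # True
```

Note that in Python 3 a bytes value never compares equal to a str value, so this sketch only makes sense when both arguments have the same type.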
And for requirement (1), there is a much cleverer and easier way:
def compare(a, b):
    return a.split() == b.split()
This works because split() with no arguments splits the string on runs of whitespace, which does all the normalizing. It also works with Unicode.
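To see why comparing the split results satisfies requirement (1), it helps to look at the intermediate lists. A small sketch using the example strings from the requirements (the variable names are only for illustration):

```python
# split() with no arguments splits on runs of any whitespace and drops
# leading and trailing whitespace.
first = "a\n b c".split()
second = "a b \n c".split()

print(first)   # ['a', 'b', 'c']
print(second)  # ['a', 'b', 'c']

# Word boundaries are preserved, so these two stay different.
print("ab c".split())  # ['ab', 'c']
print("a bc".split())  # ['a', 'bc']
```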
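The Unicode translate can be adapted toward this requirement too:
import string
nowhite = dict.fromkeys((ord(c) for c in string.whitespace), u" ")
def compare(a, b):
    """Compare two strings, normalizing whitespace -> bool"""
    return a.translate(nowhite) == b.translate(nowhite)
The only change is how the dict is formed: by default dict.fromkeys uses None as the value, so the whitespace is deleted, but we can map each whitespace character to a single " " instead. Note that this replaces characters one for one and does not collapse runs, so it only makes different kinds of whitespace equivalent; "a\n b c" and "a b \n c" still compare unequal.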
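One thing worth checking with the single-space mapping is what happens to runs of whitespace. A minimal Python 3 sketch (str.translate there takes the same kind of dict as the unicode translate; to_space is a made-up name) suggests that runs keep their length, and that something like split-and-join is still needed to collapse them:

```python
import string

# Python 3 sketch: every ASCII whitespace character maps to one space.
to_space = dict.fromkeys((ord(c) for c in string.whitespace), " ")

left = "a\n b c".translate(to_space)
right = "a b \n c".translate(to_space)

print(repr(left))     # 'a  b c'
print(repr(right))    # 'a b   c'
print(left == right)  # False

# Splitting and re-joining is one way to collapse the runs as well.
print(" ".join(left.split()) == " ".join(right.split()))  # True
```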
And finally, the re way, which of course works with Unicode too:
- totally ignore::
    import re
    def compare(a, b):
        """Compare two basestrings, disregarding whitespace -> bool"""
        return re.sub(r"\s+", "", a) == re.sub(r"\s+", "", b)
- normalize::
    def compare(a, b):
        """Compare two basestrings, normalizing whitespace -> bool"""
        # \s+ rather than \s*: \s* also matches the empty string between characters
        return re.sub(r"\s+", " ", a) == re.sub(r"\s+", " ", b)
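The choice of quantifier matters for the normalizing variant. A short Python 3 sketch of the difference (variable names are illustrative): `\s*` can match the empty string, so a space is inserted even where there was no whitespace, while `\s+` only replaces actual runs.

```python
import re

# \s* can match the empty string, so a space is inserted at every position,
# even in text that contains no whitespace at all.
star = re.sub(r"\s*", " ", "ab")

# \s+ needs at least one whitespace character, so "ab" is left alone
# and each run of whitespace becomes one space.
plus_plain = re.sub(r"\s+", " ", "ab")
plus_run = re.sub(r"\s+", " ", "a\n b c")

print(repr(star))        # ' a b '
print(repr(plus_plain))  # 'ab'
print(repr(plus_run))    # 'a b c'
```

Unlike split(), the `\s+` substitution keeps a single leading or trailing space, so strings that differ only in surrounding whitespace would still compare unequal unless they are stripped first.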