2. Idioms
if x <= y and y <= z:
    print('ok')
Better
if x <= y <= z:
    print('ok')
Examples
>>> 2.999 < 3 == 3.0
True
>>> 1e5 >= 100 == 1e2 <= 1000
True
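A chained comparison behaves like the `and` of its pairwise comparisons, except that each middle operand is evaluated only once. The sketch below illustrates this; `middle` and `calls` are hypothetical names introduced only to count evaluations.

```python
calls = []

def middle():
    """Hypothetical helper that records every time it is evaluated."""
    calls.append(1)
    return 5

chained = 1 <= middle() <= 10                  # middle() runs once
chained_calls = len(calls)

calls.clear()
expanded = 1 <= middle() and middle() <= 10    # middle() runs twice
expanded_calls = len(calls)

print(chained, chained_calls)     # True 1
print(expanded, expanded_calls)   # True 2
```

The difference matters mainly when the middle expression is expensive or has side effects.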
value = 0
if cond:
    value = 1
Better
value = 1 if cond else 0
Intuitively it's like how we write in maths, f(x) = |x| = x if x > 0 else -x
if x:
    y = x
else:
    y = 'fallback'
Better: use or
y = x or 'fallback'
or returns the first operand if it is truthy, and the second operand otherwise. It resembles a null coalescing operator, except that every falsy value (0, '', [], False), not just None, triggers the fallback. Examples:
'' or 'default' # 'default'
0 or 1 # 1
None or 0 # 0
[] or [3] # [3]
None or [] # []
False or 0 # 0
x or 'fallback' is the same as x if x else 'fallback'.
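Because `or` falls back on any falsy value, it can silently replace a legitimate 0 or empty string. A minimal sketch of the difference, assuming a setting where 0 is a valid value; the function names `port_or_default` and `port_if_not_none` are hypothetical:

```python
def port_or_default(port, default=8080):
    """Uses `or`: any falsy port (None, 0, '') is replaced by the default."""
    return port or default

def port_if_not_none(port, default=8080):
    """Replaces only a missing (None) port; a port of 0 is kept."""
    return port if port is not None else default

print(port_or_default(None))    # 8080
print(port_or_default(0))       # 8080, because 0 is falsy
print(port_if_not_none(None))   # 8080
print(port_if_not_none(0))      # 0
```

Prefer the explicit `is not None` test whenever a falsy value is meaningful input.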
if city == 'Nairobi' or city == 'Kampala' or city == 'Lagos':
    found = True
Better: use the in keyword
city = 'Nairobi'
found = city in {'Nairobi', 'Kampala', 'Lagos'}
Here we used a set of cities, though we could also have used
- a tuple: ('Nairobi', 'Kampala', 'Lagos')
- a list: ['Nairobi', 'Kampala', 'Lagos']
A set is advantageous when the number of cities is very large, because a set membership test takes constant time on average while a tuple or list is scanned element by element. In summary, use in where possible:
- Contains: if x in items
- Iteration: for x in items
strings = ['ab', 'cd', 'ef']
cs = ''
for s in strings:
    cs += s
# cs is 'abcdef'
The code above uses the Shlemiel the painter's algorithm and can be accidentally quadratic 👎, since each += may copy the whole string built so far. Instead use join
''.join(strings)
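The string on which join is called is the separator placed between the items, so an empty string reproduces plain concatenation. A short sketch; the variable names `joined`, `spaced` and `csv_row` are illustrative only:

```python
strings = ['ab', 'cd', 'ef']

joined = ''.join(strings)     # empty separator: plain concatenation
spaced = ' '.join(strings)    # single space between items

# join accepts only strings, so other values must be converted first.
csv_row = ','.join(str(n) for n in [1, 2, 3])

print(joined)    # abcdef
print(spaced)    # ab cd ef
print(csv_row)   # 1,2,3
```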
for i in range(len(my_list)):
    print(my_list[i])
Better 👇
for elem in my_list:
    print(elem)
for i in range(len(my_list)):
    print(i, my_list[i])
Better: use enumerate
for idx, element in enumerate(my_list):
    print(idx, element)
enumerate returns an iterator that yields (index, element) pairs.
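Since enumerate is lazy, wrapping it in list makes the pairs visible. It also accepts an optional start argument for the first index. A small sketch using a made-up `my_list`:

```python
my_list = ['a', 'b', 'c']

# Materialise the lazy iterator to inspect the (index, element) pairs.
pairs = list(enumerate(my_list))

# start shifts the first index, which is handy for 1-based numbering.
numbered = list(enumerate(my_list, start=1))

print(pairs)      # [(0, 'a'), (1, 'b'), (2, 'c')]
print(numbered)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```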
colors = ['red', 'green', 'blue', 'yellow']
for i in range(len(colors)-1, -1, -1):
    print(colors[i])
Better: use slicing [::-1]
for color in colors[::-1]:
    print(color)
Even Better: use reversed 👌. It returns an iterator, so no reversed copy of the list is built.
for color in reversed(colors):
    print(color)
Example Problem: Let's say a polynomial 4x^5 + 2x^2 - x + 3 is represented by its coefficients (4, 0, 0, 2, -1, 3). Given coefficients as this tuple, we need to calculate the value of the polynomial at a given x.
sum(c * x ** idx for idx, c in enumerate(reversed(coefficients)))
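To check the expression above on the example polynomial, it can be wrapped in a function. This is a sketch; the name `evaluate` is hypothetical:

```python
def evaluate(coefficients, x):
    """Evaluate a polynomial whose coefficients are listed highest degree first."""
    # reversed() pairs the constant term with exponent 0, the next with 1, ...
    return sum(c * x ** idx for idx, c in enumerate(reversed(coefficients)))

coefficients = (4, 0, 0, 2, -1, 3)    # 4x^5 + 2x^2 - x + 3

print(evaluate(coefficients, 2))      # 4*32 + 2*4 - 2 + 3 = 137
print(evaluate(coefficients, 0))      # 3, only the constant term remains
```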
names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']
n = min(len(names), len(colors))
for i in range(n):
    print(names[i], '--->', colors[i])
Better: use zip
for name, color in zip(names, colors):
    print(name, '--->', color)
zip too returns an iterator; it stops as soon as the shortest input is exhausted.
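With inputs of unequal length, zip drops the unmatched tail, which matches the min(len(...)) loop above. When the leftover items matter, itertools.zip_longest pads the shorter input instead. A brief sketch; the fill value '?' is an arbitrary choice:

```python
from itertools import zip_longest

names = ['raymond', 'rachel', 'matthew']
colors = ['red', 'green', 'blue', 'yellow']

# zip stops at the shortest input, so 'yellow' is dropped.
short = list(zip(names, colors))

# zip_longest keeps going and fills the gaps with fillvalue.
padded = list(zip_longest(names, colors, fillvalue='?'))

print(len(short))    # 3
print(padded[-1])    # ('?', 'yellow')
```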
- Make (an iterable of) bigrams of items in iterable: zip(mylist, mylist[1:])
words = 'A girl has no name'.split()
bigrams = list(zip(words, words[1:]))
# bigrams is [('A', 'girl'), ('girl', 'has'), ('has', 'no'), ('no', 'name')]
- Transpose/Unzip an iterable of tuples: zip(*data)
Prerequisite: *args and **kwargs
data = [(1, 2, 3), (4, 5, 6)]
transposed = list(zip(*data))
# transposed is [(1, 4), (2, 5), (3, 6)]
zip(*) is equivalent to unzip/transpose
ls1 = [1, 2, 3, 4, 5]
ls2 = list('abcde')
c = list(zip(ls1, ls2))
# c is [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
d = list(zip(*c))  # equivalent to unzip
# d is [(1, 2, 3, 4, 5), ('a', 'b', 'c', 'd', 'e')]
Summary: The built-ins enumerate, zip, and reversed return iterators and are syntax goodies (syntactic sugar) that cover many usual cases to make code more readable and pretty.
color_weights = {'blue': 1, 'green': 2, 'red': 3}
yellow_weight = color_weights['yellow'] if 'yellow' in color_weights else -1
Better: use get
yellow_weight = color_weights.get('yellow', -1)
colors = ['red', 'green', 'red', 'blue', 'green', 'red']
d = {}
for color in colors:
    if color not in d:
        d[color] = 0
    d[color] += 1
# {'blue': 1, 'green': 2, 'red': 3}
Better
d = {}
for color in colors:
    d[color] = d.get(color, 0) + 1
Use collections 💪
from collections import Counter
Counter(colors)
Use defaultdict
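A defaultdict builds a missing value on first access by calling the factory it was given, which removes the "if key not in d" check. Below is a minimal sketch of the same colour count, plus a grouping example that is an illustration added here and reuses the names list from the zip section:

```python
from collections import defaultdict

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

# int() returns 0, so every new colour starts counting from zero.
counts = defaultdict(int)
for color in colors:
    counts[color] += 1

# list() returns [], so every new key starts with an empty group.
names = ['raymond', 'rachel', 'matthew']
by_length = defaultdict(list)
for name in names:
    by_length[len(name)].append(name)

print(dict(counts))       # {'red': 3, 'green': 2, 'blue': 1}
print(dict(by_length))    # {7: ['raymond', 'matthew'], 6: ['rachel']}
```

For plain counting Counter is still the shorter option; defaultdict is more useful when the default is a container such as a list or set.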
Let's simulate an experiment that shuffles n cards, each with a unique label in 0...n-1, and then checks whether any kth card's label is k.
We will use the sample function from the random module for that. sample draws without replacement, so sample(range(n), n) is equivalent to shuffling the list 0...n-1.
from random import sample
idx_labels = enumerate(sample(range(n), n))
To proceed with the experiment:
for idx, label in idx_labels:
    if idx == label:
        print(True)
        break
else:
    print(False)
Better: use any
outcome = any(idx == label for idx, label in idx_labels)
print(outcome)
We could also have passed a list: any([idx == label for idx, label in idx_labels]), but the generator expression used above is more memory-efficient and lets any stop at the first match instead of building the whole list first.
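Putting the pieces together, the experiment can be repeated many times to estimate how often at least one card lands on its own position. This is a sketch under assumptions: the function name `has_fixed_point`, the deck size 52 and the trial count are all choices made here, not taken from the text above.

```python
from random import sample

def has_fixed_point(n):
    """Shuffle labels 0..n-1 and report whether any card k carries label k."""
    shuffled = sample(range(n), n)    # a random permutation of 0..n-1
    # any() stops at the first match thanks to the generator expression.
    return any(idx == label for idx, label in enumerate(shuffled))

trials = 10_000
hits = sum(has_fixed_point(52) for _ in range(trials))
fraction = hits / trials
print(fraction)    # fraction of shuffles with at least one match
```

For a large deck the fraction should settle near 1 - 1/e, roughly 0.63, which is the classical result for permutations with at least one fixed point.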
def squares(n):
    return [i*i for i in range(n)]
def even(iterable):  # avoid shadowing the built-in iter
    return [i for i in iterable if i % 2 == 0]
even_squares = even(squares(n))
Better: use yield
def squares(n):
    for i in range(n):
        yield i*i
def even(iterable):  # avoid shadowing the built-in iter
    for i in iterable:
        if i % 2 == 0:
            yield i
list(even(squares(10)))
Using yield instead of return makes an otherwise normal function a generator. It is an easier way to create an iterator than defining the __iter__ and __next__ methods. Each yield hands control back to the caller, and the generator resumes from that point on the next request, so the two interleave; for this reason generators are often described as a simple form of coroutine.
Resuming a generator can be cheaper than making a fresh function call, since the generator's stack frame is kept between resumptions instead of being created each time.
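Because generators produce values only on demand, they can even describe an endless sequence, and the consumer decides how much to take. The sketch below shows this; the unbounded `squares` is a variation written for illustration, not the bounded version above:

```python
from itertools import count, islice

def squares():
    """Yield 0, 1, 4, 9, ... without ever building a list."""
    for i in count():
        yield i * i

def even(iterable):
    """Pass through only the even values of any iterable."""
    for value in iterable:
        if value % 2 == 0:
            yield value

# islice pulls just five values; the infinite pipeline never runs further.
first_five = list(islice(even(squares()), 5))
print(first_five)    # [0, 4, 16, 36, 64]
```

The list-returning versions could not do this, since they would have to finish building the whole list before returning.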
Using yield from
def countdown(n):
    yield from range(n, 0, -1)
>>> list(countdown(5))
[5, 4, 3, 2, 1]
foo = open('/tmp/foo', 'w')
try:
    foo.write('sometext')
finally:
    foo.close()
👆 This code is equivalent to 👇. Use with
with open('/tmp/foo', 'w') as handle:
    handle.write('sometext')
squares = list(map(lambda x: x**2, range(1,10)))
even_squares = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, range(1,10))))
List comprehensions 👇 are more readable and pythonic! 🤘
squares = [x**2 for x in range(1,10)]
even_squares = [x**2 for x in range(1,10) if x % 2 == 0]
Specialized tools usually outperform or are more accurate than general purpose tools
- math.sqrt(x) is more accurate than x ** 0.5
- math.log2() is exact for powers of two
from math import log, log2
all(log(2 ** x, 2) == x for x in range(100))  # False
all(log2(2 ** x) == x for x in range(100))  # True
- In PySpark, key_value_rdd.countByKey() is way faster than key_value_rdd.groupByKey().mapValues(len).collect() because less shuffling is involved.
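Another instance of the same principle, added here as an illustration and not taken from the list above: for integers, math.isqrt (Python 3.8+) works in exact integer arithmetic, while the general-purpose `** 0.5` goes through floating point. The value of `root` is chosen so that it cannot be represented as a float.

```python
from math import isqrt

# Floats between 2**53 and 2**54 are all even, so this odd number
# has no exact float representation.
root = 2 ** 53 + 1
n = root * root

print(isqrt(n) == root)         # True: exact integer square root
print(int(n ** 0.5) == root)    # False: the float result cannot equal root
```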