
1. Reference

Kamal Banga edited this page Jul 22, 2019 · 126 revisions
Table of Contents
  • Little Things
  • Files
  • Lists
  • Sets
  • Dictionaries
  • Iterators
  • Little Things

    Numerics

    • 1e2 is short-hand notation for 100.0. Similarly, 1e3 is 1000.0, 1e-4 is 0.0001 and 8e4 is 80000.0.
    • Improve the readability of large numbers with underscores: 1_000 is 1000 and 1_000_000 is the same as int(1e6) python-3.6. To get a string with thousands separators, use format(n, ','); format(1_000_000, ',') gives '1,000,000'.
    • / is true division and // is floor division.
      38 / 5 # = 7.6     '/' is true division operator and returns float
      38 // 5 # = 7      '//' is floor division operator and returns int for int operands
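
    The numeric notes above can be sketched as follows. The variable names are illustrative only. One detail worth seeing in action: floor division rounds toward negative infinity, which matters for negative operands.

```python
# Scientific notation always produces a float.
hundred = 1e2                      # 100.0
million = 1_000_000                # underscores are purely visual

# Format an integer with thousands separators.
pretty = format(million, ',')      # '1,000,000'

# True division returns a float; floor division rounds down (toward -infinity).
true_div = 38 / 5                  # 7.6
floor_pos = 38 // 5                # 7
floor_neg = -38 // 5               # -8, not -7
floor_float = 38.0 // 5            # 7.0 (a float, because one operand is a float)

print(hundred, pretty, true_div, floor_pos, floor_neg, floor_float)
```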

    Data types

    • In Python, there is no char type like in C/C++/Java. There are only strings, so ' and " can be used interchangeably as quote characters, and string methods like isdigit work on strings of one or more characters.
    • Assignment never copies data. a = b = []; b.append(3) sets a to be [3] too. a = b = [] is unlike a, b = [], []. Use copy or deepcopy for copying.
    • We can exchange variables using tuple unpacking: (b, a) = (a, b). A tuple (a, b) can also be written as a, b, hence, simplest way to exchange variables is b, a = a, b. 👏
    • Every iterator is an iterable, but an iterable may or may not be an iterator. A list is an iterable, not an iterator. Looping over an iterator exhausts it and leaves it empty. Analogy: a book full of pages is an iterable and the bookmark is an iterator.
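
    To illustrate the note on assignment versus copying, here is a small sketch using the standard copy module. The list contents and variable names are made up for the example.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # no copy: both names refer to one list
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # fully independent copy

original[0].append(99)            # mutate an inner list
original.append([5, 6])           # mutate the outer list

print(alias)    # [[1, 2, 99], [3, 4], [5, 6]]
print(shallow)  # [[1, 2, 99], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```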

    Tricks

    • Given a list, nums = [1, 2, 3, 4, 5], nums[::-1] returns a reversed copy of the list. Slicing takes the form [start:stop:step].
    • Remove duplicates from a list: list(set([1, 8, 4, 5, 5, 8, 1])) gives, e.g., [8, 1, 4, 5]; the order of the result is arbitrary because sets are unordered. Similarly, to count the number of unique elements in a list: len(set(some_list))
    • Remove duplicates maintaining the insertion order: list(dict.fromkeys([1, 8, 4, 5, 5, 8, 1])) gives [1, 8, 4, 5]. Since python-3.7, dictionary retains the insertion order.
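
    A few more slices may make the [start:stop:step] form concrete. This is only a sketch; the variable names are illustrative.

```python
nums = [1, 2, 3, 4, 5]

reversed_nums = nums[::-1]   # step of -1 walks backwards: [5, 4, 3, 2, 1]
every_other = nums[::2]      # step of 2: [1, 3, 5]
middle = nums[1:4]           # stop index is exclusive: [2, 3, 4]
last_two = nums[-2:]         # negative start counts from the end: [4, 5]

# Order-preserving de-duplication (relies on dicts keeping insertion order).
unique_in_order = list(dict.fromkeys([1, 8, 4, 5, 5, 8, 1]))  # [1, 8, 4, 5]

print(reversed_nums, every_other, middle, last_two, unique_in_order)
```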

    Syntax Quirks

    • Python's libraries like json and pickle usually have two functions for loading, load and loads. load reads from a file object, while loads reads from an in-memory string (typically a str for json, a bytes object for pickle). The s in loads stands for string.
    • == checks equality while is operator compares identities.
      a = b = [1, 2, 3] # a and b refer to the same object
      a == b # True
      a is b # True
      
      c, d = [1, 2, 3], [1, 2, 3] # c and d refer to different objects whose value is same
      c == d # True
      c is d # False; to inspect further, print id(c) and id(d)
    • f-strings (formatted string literals) are awesome! ⭐️ python-3.6
      age = 30
      age_fstr = f'age = {age}' 
      # age_fstr is 'age = 30'
      first_name, last_name = 'James', 'Bond'
      print(f'The name is {last_name}, {first_name} {last_name}.')

    Extras

    • Add a progress bar to a long-running loop using the module tqdm
      from tqdm import tqdm
      
      for _ in tqdm(range(int(1e8))):
        pass
    • Easter eggs: import this (prints The Zen of Python); import antigravity
    • Use a code formatter, e.g., Black. pip install black and use as black mycode.py
    • Try an autocomplete plugin, e.g., Jedi

    Timeline of major features

    3.0
    • zip, map and filter return iterators instead of lists
    • print is a function, no longer a statement
    3.7
    • Dictionaries maintain insertion order


    📁 Files

    Read a file line-by-line into list

    with open(filepath) as handle: 
      mylist = handle.read().splitlines()

    splitlines returns a list of lines in the string removing the trailing '\n'

    Read the first N lines of a file

    with open(filepath) as handle:
      head = [next(handle) for _ in range(N)]

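    Another way is itertools.islice(handle, N), which stops cleanly if the file has fewer than N lines, whereas the next-based version raises StopIteration.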
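
    A minimal sketch of the islice approach. The temporary file and the value of N are assumptions made so the example is self-contained; in real code filepath would point at an existing file.

```python
import os
import tempfile
from itertools import islice

N = 3  # number of lines to read (illustrative value)

# Create a small throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\nfour\nfive\n')
    filepath = tmp.name

with open(filepath) as handle:
    # islice stops quietly if the file has fewer than N lines.
    head = [line.rstrip('\n') for line in islice(handle, N)]

os.remove(filepath)
print(head)  # ['one', 'two', 'three']
```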

    Iterate over file

    If a file is too large to fit into memory or would hog the RAM, consider iterating over it line-by-line

    with open(filepath) as handle:
      for line in handle:
        print(line) # do something

    File mode codes

    File mode         Code
    Read (default)    r
    Write             w
    Append            a
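
    The three modes can be tried out with a throwaway file. This is a sketch; the file name notes.txt and the temporary directory are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'notes.txt')  # hypothetical file name

with open(path, 'w') as handle:   # 'w' creates the file, or truncates it if it exists
    handle.write('first\n')

with open(path, 'a') as handle:   # 'a' appends to the end
    handle.write('second\n')

with open(path) as handle:        # 'r' is the default mode
    lines = handle.read().splitlines()

print(lines)  # ['first', 'second']
```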

    Lists

    • Get the last element: some_list[-1]
      [1, 5, 3, 2][-1] returns 2

    • Check if an element exists in the list: in operator
      4 in [1, 5, 3, 5, 2] returns False. Has linear-time complexity

    • Count occurrences of an item in list: count(item)
      [1, 5, 3, 5, 2].count(5) returns 2
      To count occurrences of multiple items in a list, use collections.Counter since count will perform multiple passes and would degrade performance.

    • Find the index of an item: index(item)
      [1, 5, 3, 5, 2].index(5) returns 1
      An index call has linear-time complexity 🤔, returns only the index of the first match, and raises ValueError ⛔️ if the element is not present in the list.

    • To add an element, use append, and to add all elements of a list, use extend.

      x = [1, 3, 5]
      x.append(4) 
      # x is [1, 3, 5, 4]
      
      x.extend([6, 9])
      # x is [1, 3, 5, 4, 6, 9]
    • Flattening a nested list; the syntax is like nested for loops.

      flat_list = [item for sublist in original_list for item in sublist]

      is equivalent to

      flat_list = []

      for sublist in original_list:
          for item in sublist:
              flat_list.append(item)
    • List comprehension with an if condition has the form

      vals = [expression 
          for value in collection 
          if condition]

      is equivalent to

      vals = []
      for value in collection:
          if condition:
              vals.append(expression)

    Sets

    A set is an unordered collection of unique elements.

    Initialization

    colors = set() # initialise an empty set
    colors = {'red', 'blue', 'green'} # initialise non-empty set; is called set-literal notation
    
    colors.add('yellow') # add method adds an element to set
    colors.update(['magenta', 'violet']) # update method adds multiple elements

    ⚠️ colors = {} initialises an empty dictionary not a set

    Operations on sets

    a = s1 | s2     # Union of s1 and s2 
    b = s1 & s2     # Intersection of s1 and s2 
    c = s1 - s2     # Set difference (items in s1, but not in s2)
    d = s1 ^ s2     # Symmetric difference (items in s1 or s2, but not both)
    e = s1 <= s2    # True if s1 is a subset of s2

    If, however, one needs to work with many sets, the set methods can be called on the class with the sets unpacked as arguments

    list_of_sets = [s1, s2, ..., sn]
    a = set.union(*list_of_sets)                # s1 | s2 | ... | sn
    b = set.intersection(*list_of_sets)         # s1 & s2 & ... & sn 
    c = set.difference(*list_of_sets)           # s1 - s2 - ... - sn
    d = set.symmetric_difference(s1, s2)        # symmetric difference is not defined for a list of sets
    e = set.issubset(s1, s2)                    # True if s1 is a subset of s2
    • To summarize, s1 & s2 is same as s1.intersection(s2) and set.intersection(s1, s2).
    • Sets are only partially ordered under <=: {1, 2} <= {1, 3} is False and so is {1, 3} <= {1, 2}, so there is no order between {1, 2} and {1, 3}. In contrast, {1} <= {1, 2} and {1, 3} <= {1, 2, 3} are both True. (Here <= means ⊆.)
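
    A runnable version of the many-sets calls, with three small example sets standing in for s1 ... sn (the values are made up for illustration).

```python
s1, s2, s3 = {1, 2, 3, 4}, {2, 3, 5}, {3, 4, 6}
list_of_sets = [s1, s2, s3]

union = set.union(*list_of_sets)                # {1, 2, 3, 4, 5, 6}
common = set.intersection(*list_of_sets)        # {3}
only_in_first = set.difference(*list_of_sets)   # s1 - s2 - s3 == {1}

# Subset tests form a partial order: some pairs are simply not comparable.
print({1, 2} <= {1, 3}, {1, 3} <= {1, 2})   # False False
print({1} <= {1, 2}, {1, 3} <= {1, 2, 3})   # True True
```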

    Set comprehension

    prime_pairs = {(x, x+2) for x in range(2,100) if is_prime(x) and is_prime(x+2)}
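
    The comprehension above relies on an is_prime helper that is not defined on this page. One possible sketch, using plain trial division (an assumption, not necessarily the helper the author had in mind):

```python
def is_prime(n):
    """Trial division; fine for small n, not meant to be fast."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

prime_pairs = {(x, x + 2) for x in range(2, 100) if is_prime(x) and is_prime(x + 2)}
print(sorted(prime_pairs))
```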

    Dictionaries

    Dictionary comprehension

    stock_prices = {'ACME': 45.2, 'AAPL': 612.7, 'IBM': 205.5, 'HPQ': 37.2, 'FB': 10.7}
    highvalue_stocks = {key: value for key, value in stock_prices.items() if value > 200}

    Construct a dictionary from pairs

    names = ['raymond', 'rachel', 'matthew']
    colors = ['red', 'green', 'blue']
    
    d = dict(zip(names, colors))
    # d is {'raymond': 'red', 'rachel': 'green', 'matthew': 'blue'}

    Looping over dictionary

    1. Looping over keys
    for key in some_dict:
      pass  # do something
    2. Looping over key-value pairs
    for key, value in some_dict.items():
      pass  # do something
    3. Inverting a dictionary: 2 ways (both assume the values are unique and hashable)
    inverted1 = {v: k for k, v in d.items()}      # dictionary comprehension
    inverted2 = dict(zip(d.values(), d.keys()))   # dictionary constructor

    Booleans

    any and all

    any will return True when at least one of the elements is Truthy. all will return True when all the elements are Truthy.

    any(l == 't' for l in 'python') # True
    all(l == 't' for l in 'python') # False

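    Short-circuiting

    Both functions short-circuit: any stops at the first Truthy element and all stops at the first Falsy element. Their results for different inputs:

    Iterable                                             any      all
    All Truthy values                                    True     True
    All Falsy values                                     False    False
    At least one Truthy value and at least one Falsy     True     False
    Empty Iterable                                       False    True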
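
    Short-circuiting can be observed by recording how many elements any and all actually consume. The noisy helper below is hypothetical and exists only for this demonstration.

```python
def noisy(values, seen):
    """Yield each value, recording it in `seen` so we can tell how far iteration got."""
    for v in values:
        seen.append(v)
        yield v

seen_any = []
result_any = any(noisy([0, 0, 1, 0, 1], seen_any))   # stops at the first truthy value

seen_all = []
result_all = all(noisy([1, 0, 1, 1], seen_all))      # stops at the first falsy value

print(result_any, seen_any)   # True [0, 0, 1]
print(result_all, seen_all)   # False [1, 0]
print(any([]), all([]))       # False True
```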

    * and **

    Commonly referred to as *args and **kwargs (args for arguments and kwargs for keyword arguments),

    • * is the iterable unpacking operator
    • ** is the dictionary unpacking operator
    >>> *[1]
    SyntaxError: can't use starred expression here
    >>> *[1],
    (1,)
    >>> *[1], 2
    (1, 2)
    >>> [*[1, 2, 3]]
    [1, 2, 3]
    >>> [*(1, 2, 3)]
    [1, 2, 3]
    >>> {*range(4), 4, *(5, 6, 7)}
    {0, 1, 2, 3, 4, 5, 6, 7}
    >>> {**{'a': 1, 'c': 3}, **{'b': 2, 'd': 4}}
    {'a': 1, 'c': 3, 'b': 2, 'd': 4}

    As shown above, two dictionaries d1 and d2 can be merged with {**d1, **d2}; for duplicate keys, the value from d2 wins. python-3.5

    * helps us write a function that accepts any number of function arguments

    def avg(first, *rest):
        return (first + sum(rest))/(1 + len(rest))
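
    The same operators work at the call site and for keyword arguments. A short sketch follows; the describe function and the sample data are hypothetical, added only to show ** in a signature.

```python
def avg(first, *rest):
    return (first + sum(rest)) / (1 + len(rest))

def describe(name, **details):
    """Collect arbitrary keyword arguments into the dict `details`."""
    parts = [f'{key}={value}' for key, value in details.items()]
    return f"{name}: {', '.join(parts)}"

scores = [3, 4, 8]
mean = avg(*scores)                        # unpack a list into positional arguments

options = {'age': 30, 'city': 'Pune'}
summary = describe('Riya', **options)      # unpack a dict into keyword arguments

print(mean)      # 5.0
print(summary)   # Riya: age=30, city=Pune
```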



    Functions

    Default arguments

    def func(l=5):
        return l**2

    Collections

    Named tuples

    from collections import namedtuple
    Point = namedtuple('Point', 'x y')
    pt1 = Point(1.0, 5.0)
    pt2 = Point(2.5, 1.5)
    
    line_length_squared = (pt1.x-pt2.x)**2 + (pt1.y-pt2.y)**2

    Counter
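
    As noted in the Lists section, collections.Counter counts every item in a single pass. A minimal sketch with made-up data:

```python
from collections import Counter

votes = ['red', 'blue', 'red', 'green', 'blue', 'red']
counts = Counter(votes)

print(counts['red'])            # 3
print(counts['purple'])         # 0 (missing keys count as zero)
print(counts.most_common(2))    # [('red', 3), ('blue', 2)]
```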


    Strings

    import string
    string.ascii_lowercase # 'abcdefghijklmnopqrstuvwxyz'
    string.punctuation # '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

    Split strings

    >>> fruits = ' apple banana   mango'
    >>> fruits.split()
    ['apple', 'banana', 'mango']
    
    >>> fruits = 'apple; banana,  mango'
    >>> fruits.split()
    ['apple;', 'banana,', 'mango']

    split() splits on whitespace by default, treating runs of whitespace as a single separator and ignoring leading and trailing whitespace. For multiple delimiters, we can use regular expressions.

    >>> import re
    >>> fruits = 'apple; banana,  mango'
    >>> re.split(r'[,;]\s*', fruits)
    ['apple', 'banana', 'mango']

    startswith

    >>> url = 'http://google.com'
    >>> url[:4] == 'http'
    True
    >>> url.startswith('http') # more readable
    True
    >>> filename = 'spam.txt'
    >>> filename.endswith('.txt')
    True
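
    Both startswith and endswith also accept a tuple of candidates, which avoids chaining several checks with or. A small sketch with illustrative data:

```python
urls = ['http://google.com', 'https://python.org', 'ftp://example.org']
web_urls = [u for u in urls if u.startswith(('http://', 'https://'))]

filenames = ['spam.txt', 'eggs.csv', 'ham.py']
data_files = [f for f in filenames if f.endswith(('.txt', '.csv'))]

print(web_urls)     # ['http://google.com', 'https://python.org']
print(data_files)   # ['spam.txt', 'eggs.csv']
```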

    Profiling

    python -m cProfile -o output.prof myscript.py
    snakeviz output.prof
    

    Sampling

    import random
    random.sample(population, SAMPLE_SIZE)

    Return a new SAMPLE_SIZE-length list of unique elements chosen from the population sequence. Used for random sampling without replacement. Since python-3.11 the population must be a sequence; passing a set raises TypeError.

    random.sample(list(my_set), 5) returns a list of 5 unique elements randomly sampled from my_set. Useful for inspecting a set.
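
    Because recent Python versions require a sequence here (sets are rejected from python-3.11), a version-independent sketch converts the set first. The set contents and the seed are arbitrary choices for the example.

```python
import random

my_set = set(range(100))
random.seed(0)   # only to make repeated runs comparable; not required

# sorted() turns the set into a list with a stable order before sampling.
picked = random.sample(sorted(my_set), 5)

print(picked)
```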


    🙈 Ignore Stuff: The Underscore _

    1. If you need to run a loop n times and there's no need of a loop variable
    for _ in range(10):
      pass  # do something

    Go likes this idea so much that it refuses to compile code with an unused variable.

    2. Ignore a value while unpacking an iterable
    name, _, phone_number = ('riyaz', 'bangalore', '9393939393')

    Unpacking iterables: The *

    user_record = ('Dave', '[email protected]', '773-555-1212', '847-555-1212')
    name, email, *phone_numbers = user_record
    # name is 'Dave', email is '[email protected]', phone_numbers is ['773-555-1212', '847-555-1212']

    To ignore last elements (here, phone_numbers)

    name, email, *_ = user_record

    Iterators

    • An iterator is any object whose class has a __next__ method and an __iter__ method that returns self.
    • It has no __prev__, no __len__ and so on; only __next__.
    • Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions, or by evaluating a generator expression.
    • An iterator does not support len. len(i for i in range(10)) gives

    TypeError: object of type 'generator' has no len()

    • Length of an iterator is unknown until you iterate through it.
    • It is possibly infinite, e.g., itertools.count() is an infinite iterator of whole numbers.
    • You can only iterate once, because iterating an iterator empties it.
    • One trick for finding the length is sum(1 for _ in iterator). The obvious way is to make it a list first: len(list(iterator)). Both consume the iterator.
    import random
    
    def gen(n):
        for i in range(n):
            if random.randint(0, 1) == 0:
                yield i
    
    iterator = gen(10) # its length is unknown until iterated
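
    The iterator protocol described in the bullets above can also be implemented by hand. The Countdown class below is a hypothetical example, a sketch of the protocol rather than anything from the standard library.

```python
class Countdown:
    """Iterator that yields n, n-1, ..., 1 and is then exhausted."""

    def __init__(self, n):
        self.current = n

    def __iter__(self):
        return self          # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

countdown = Countdown(3)
first_pass = list(countdown)    # [3, 2, 1]
second_pass = list(countdown)   # [] because the iterator is already exhausted

print(first_pass, second_pass)
```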

    Manually iterating an iterator 👇

    r = iter([1, 2, 5, 9])
    try:
        while True:
            print(next(r), end=' ')
    except StopIteration:
        pass

    prints 1 2 5 9

    Generator Expressions

    nums = [1, 2, 3, 4, 5]
    s = sum(x * x for x in nums)

    No intermediate list is created, which makes this a good fit for very large sequences of numbers 🏃


    Catching Exceptions

    try:
      doSomething()
    except: 
      pass

    try:
      doSomething()
    except Exception as e: 
      print(e)

    The first form, a bare except, also catches KeyboardInterrupt, SystemExit, etc., which derive directly from BaseException rather than Exception. Prefer the second form, or better, name the specific exception you expect.
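
    The class hierarchy behind this can be checked directly. In the sketch below, risky is a hypothetical function standing in for doSomething.

```python
# KeyboardInterrupt and SystemExit sit outside the Exception branch.
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True
print(issubclass(SystemExit, Exception))             # False

def risky():
    raise ValueError('bad input')   # hypothetical failure

try:
    risky()
except ValueError as e:     # catch the narrowest exception you can handle
    message = str(e)

print(message)              # bad input
```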


    Regex


    Decorators

    The function decorator syntax:

    @decorator
    def F():
      ...

    is equivalent to

    def F():
      ...
    
    F = decorator(F)

    Example usage:

    def uppercase(func):
      def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
      return wrapper
    
    @uppercase
    def greet(name):
      return f'Hello {name}'

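    Here, the name greet is rebound to wrapper, which behaves like the following (greet on the last line being the original, undecorated function)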

    def wrapper(name):
      return greet(name).upper()
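
    One side effect of this rebinding is that greet.__name__ and the docstring would now be those of wrapper. A common remedy is functools.wraps, sketched here on the same example:

```python
from functools import wraps

def uppercase(func):
    @wraps(func)   # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@uppercase
def greet(name):
    """Return a greeting."""
    return f'Hello {name}'

print(greet('Bond'))     # HELLO BOND
print(greet.__name__)    # greet
```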

    Save objects to disk/Serialization

    Pickle is Python's native binary serialization tool; it converts in-memory Python objects to/from bytestreams

    import pickle
    
    obj = ('serialization & de-serialization', [1, 4, 16], {1: 'hello', 2: 'world'})
    
    # save to disk / serialize
    with open('filename.pickle', 'wb') as handle:
        pickle.dump(obj, handle)
    
    # read from disk / de-serialize
    with open('filename.pickle', 'rb') as handle:
        obj2 = pickle.load(handle)
    
    print(obj == obj2)

    Pickle gives us a trick to perform a deepcopy of nested objects:

    deepcopy = lambda x: pickle.loads(pickle.dumps(x))

    dumps and loads dump to and load from an in-memory bytes object instead of a file; as stated above, s stands for string.


    Pretty print

    import pprint as pp
    animals = [{'animal': 'dog', 'legs': 4, 'breeds': ['Border Collie', 'Pit Bull', 'Huskie']}, {'animal': 'cat', 'legs': 4, 'breeds': ['Siamese', 'Persian', 'Sphynx']}]
    pp.pprint(animals, width=1)

    Python Modules

    Command line utilities

    • python -m json.tool
      Pretty print JSON in terminal. Try echo '{"a": 1, "b": 2}' | python -m json.tool

    • python -m http.server
      Starts a simple web server

    • python -m timeit
      Measure time taken for a snippet to run. Try

    $ python -m timeit "1 + 2"
    100000000 loops, best of 3: 0.0162 usec per loop
    $ python -m timeit "sum(x*x for x in range(int(1e5)))"
    100 loops, best of 3: 9.79 msec per loop
    • python3.4 -m pip install
      Install a library in a version-specific Python environment, e.g. python3.4 -m pip install requests

    Reserved keywords

    False     class       finally    is          return
    None      continue    for        lambda      try
    True      def         from       nonlocal    while
    and       del         global     not         with
    as        elif        if         or          yield
    assert    else        import     pass
    break     except      in         raise
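
    The exact list depends on the interpreter version (for example, async and await became reserved keywords in python-3.7), so it can be safer to ask the running interpreter. A small sketch using the standard keyword module:

```python
import keyword

print(len(keyword.kwlist))           # number of reserved keywords in this interpreter
print(keyword.iskeyword('lambda'))   # True
print(keyword.iskeyword('print'))    # False: print is a function in Python 3
```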