A View from the Keyboard: December 2015

Saturday 26 December 2015

Weekly Python #11 - Duck Typing

Duck Typing

If it walks like a duck, and quacks like a duck ....

If you are new to Python you may well have heard of Duck Typing but you might not be too sure of what it means. In this article we will explore Duck Typing with one of the most powerful concepts in Python : Iterators, a very powerful example of duck typing.

An Iterator is any object which implements functionality to allow the user to move through a collection of data one item at a time. Iterator behaviours are both simple to use and implement. Many parts of the python standard library implement iterators : strings, lists, dictionaries, tuples, generators, and even files.

Because every iterator shares the same behaviour, then your application can write one function which will work with every single iterator - no matter what type it is, and your application can implement it's own iterators, and the same functions which work on standard library iterators, will work the same way on your iterator too.

That is the theory, lets see something in practice :

>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...     for item in iterator:
...         yield item
...         yield item

The function above will take any iterator, and will return a generator which yields each item in the original duplicated. This function relies on Duck Typing: it needs whatever is passed in as the iterator argument to be implemented as an Iterator: It will work on any standard iterator (Note that Iterables (such as Lists, strings, tuples) are also Iterators, where as generator expressions, files are only Iterators).

>>> # A list of numbers
>>> [i for i in dup([0,1,2,3,4,5,6,7,8,9])]
[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9]
>>>
>>> # A String
>>> [i for i in dup("abcdef")]
['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'f']
>>>
>>> # A Tuple
>>> [i for i in dup( (0,1,2,3,4,5,6) )]
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
>>>
>>> # A Generator expression
>>> [i for i in dup( (i for i in range(0,7) )]
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]

This function would also work if you passed in an open file object, and would generate a duplicate for every line in the file, and with a dictionary it will generate a duplicated sequence of keys.

With a language such as C or C++, you will struggle to implement such a generic function, which would operate on strings, vectors, arrays and files, since C is a statically typed language which means you need to declare ahead the arguments that your function, and strings, vectors, arrays are all different types.

With Duck typing, as we have seen it isn't the type that matters, it is the behaviour (in this case data which behaves like an Iterable). In general you don't even worry about testing that the arguments are the correct/expected type (let the lower levels worry about raising exceptions and cascading them upwards); if your code is using hasattr or isinstance to check the type or behaviour exists before you use it, then you aren't utilizing the full power of Duck-Typing.

Custom Errors

You might consider that if you need to raise a custom error rather than use the exception generated by the lower level code, then the right thing to do is to check that the attributes have the right behaviour before you use them :

>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...      if not hasattr(iterator,"__iter__"):
...           raise TypeError("You can only duplicate items from an iterator")
...
...      for item in iterator:
...          yield item
...          yield item

Although this will work, it is not the best use of the language, and not very Pythonic (i.e. best practice for Python code). A Better solution is to use the power of Duck Typing, and exception handling.

>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...      try:
...          for item in iterator:
...              yield item
...              yield item
...      except TypeError:
...          raise TypeError("You can only duplicate items from an iterator")

Not only is this version best practice (if you really insist on a custom Exception message), but it will be quicker as it wont need to execute the hasattr test each time the function is called.

using isinstance or hasattr

There is one case where using isinstance or hasattr is completely valid thing to do (and even recommended) and that is when your code is taking data as an argument into a class __init__ which needs to be an expected type, but your code doesn't use that data until much later (as it is always better to report errors as soon as possible).

Tuesday 22 December 2015

Weekly Python #10 - Employing disabled workers

Employing disabled workers

This article is a short post which has nothing to do with Python, but is about my other area of interest : Disability and Employment rights.

In most modern jurisdictions (certainly true in the EU), it is an obligation of employees to treat disabled workers fairly and not show any bias due to their disability. The definition of disability is pretty wide; for instance in the UK it is generally defined as anyone with a recognized chronic (i.e. long term) condition.

For an employer with worker who becomes disabled, it is responsibility of the employer to make reasonable adjustments to the job to ensure that the worker can still do their role, but what does the obligation to fairness mean when you are looking to take on a brand new employee.

During both CV and interview stage a prospective employee should concentrate on the skills of the applicants - and what they can bring to the organization, rather than what they might not be able to do based on their disability. All relevant skills and experience should be seen as positives for that candidate, and training needs are clearly costs for employing that candidate. My understanding of the law ^[1] suggests that if there are extra costs associated with employing a disabled candidate that these should not be counted against that person unless they would be unreasonable for the company to bear.

I have an example recently which is worth recounting :

My current employer (I am not giving any names) has a strategy to try to bring everyone together into a number of key locations (in order to improve collaboration, and reduce infrastructure and building costs). The strategy has clearly stated exceptions for employees who are disabled or otherwise unable to be based in one of these locations. I recently had a telephone interview for a design role: being a member of a team completing software and network designs.

During the interview though, I wasn't really asked about my skills or experience, or what I could bring to the role. The interviewer focused almost entirely on the company's buildings strategy and how me working from home was incompatible with that strategy, and that it was "a waste of time" to talk to me any further. The role had no operational need for everyone to be be based in the same building; no secure networks, no key customers to work with daily. There was no rationale given for why being in the office was essential to the role, although I can understand that one person working from home will impact how the team works together (most of which could be overcome with an appropriate use of technology). The only "explanation" given was the building strategy, despite it not being mandatory for all employees.

I am not suggesting that every role can be executed efficiently by everyone (a person in a wheelchair would probably find it difficult to be a scaffold rigger), but there are many cases where a disabled person would be able to do a given role just as efficiently as a non-disabled person, and therefore the critical decision should be whether that person has the right mix of technical, business and personal skills to do the role.

I accept that there is sometimes a fine line between two candidates and I am not suggesting that an employer should always choose the disabled candidate, but the employer should be looking at a disability as something to adjust to, rather than something which prevents the job from being done.

[1] I am not an employment lawyer, or a trained recruiter. I am a s/w designer and developer with 27 years experience of systems and network experience, and 2 years experience of living with a disability.

Sunday 13 December 2015

Weekly Python #9 - Adding to a list - only one way ?

Adding to a list

Is there really only one way ?

It is one of the principles of python that "There should be one-- and preferably only one --obvious way to do it." You will notice though when it comes to many of the parts of the standard language, that there is actually many different ways to do somethings - for instance adding a single item to a list.

>>> a = [0,1,2,3,4,5,6,7,8,9,10]
>>> a.append(11)
>>> a += [12]
>>> a = a + [13]
>>> a.extend([14])
>>> a
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]

All of these appear to do the same things, but behind the scenes they all do different things - and have different advantages.

a.append

Using the append method is by far the most obvious method : It is very readable (and readability is key when writing software). It is implemented a single method call on the list method, and once the append is complete there is no return value. it all makes for a lightning quick execution

$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a.append(1001)'
1000000 loops, best of 3: 0.0452 usec per loop

0.0452 micro seconds (a micro second is one millionth of a second) to append 1,000,000 elements to a list.

Using +=

For some people this idiom is just as readable as an append but it is actually a bit less efficient, as there a few things that have to be done before the item can be appended to the list.

the element (12 in our case) has to be made into a list
The __iadd__ method is called
The __iadd__ method does a type check to ensure that the argument is a list

$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a += [1001]'
1000000 loops, best of 3: 0.081 usec per loop

Using the += method takes nearly 2 times longer than using append for adding single elements.

It would probably be far more efficient to use this method to add another existing list (rather than looping around the list and appending each element) - we will look at this later

Using +

A first glance doing a = a + [13] is going to be similar in performance to the previous statement a += [12] but there is a lot more going under the skin here.

the element (12 in our case) has to be made into a list
The __add__ method is called
The __add__ method does a type check to ensure that the argument is a list
The __add__ method creates a brand new empty list to hold the result of the addition, and will need to copy the original a list and argument list into this new empty list. The name a will be bound to the new list returned by the __add__ method.

These difference go some way to explaining to performance differences :

$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a = a + [1001]'
10000 loops, best of 3: 25.4 usec per loop

So using a straight list addition as above - it takes over 500 times longer to add a single element to a list; and it can't be recommended for use. It will also use more memory (as it creates this brand new list and fills it, before the old one is unbound), and this could also be significant for your application.

Note

This significant performance impact seems to not be connected to the size of the original list - even adding a single element to an empty list takes a similar amount of time, and creation of a new empty list doesn't seem to be the issue either. At the time of writing the performance impact seems to be caused by the re-binding of the new list to one of the names in the expression.

The only advantage is that if you have another name bound to the original list, then that original list wont be changed - but there are better ways to ensure that this happens - by taking a copy of the original list before you change it :

>>> a = [0,1,2,3,4,5,6,7,8,9,10]
>>> b = a[:]            # Take an explicit shallow copy of a
>>> a.append(11)        # Use append - which we know is a lot quicker
>>> a, b
[0,1,2,3,4,5,6,7,8,9,10,11], [0,1,2,3,4,5,6,7,8,9,10]
>>>

extend

The extend method is intended to be used to combine two lists together; a.extend(x) is equivalent to a += x (where a and x are both lists). We can also use extend to add single elements to a list :

$ python -m timeit -s 'a = [i for i in range(1000)]' 'a.extend([1001])'
10000000 loops, best of 3: 0.0924 usec per loop

So extend is approximately equivalent to using a += x - as expected.

Adding multiple elements efficiently

So far we have looked at adding single elements to a list, but often our code needs combine two lists; so it would be useful to look at the relative performance of a += x vs a.extend(x} where x is a list.

$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 'a += b'
100000 loops, best of 3: 2.25 usec per loop

$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 'a.extend(b)'
100000 loops, best of 3: 2.3 usec per loop

So within a margin of error the two are roughly equivalent. Just out of interest how much slower is appending each element :

$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 
'for e in b:' '   a.append(e)'
10000 loops, best of 3: 47.2 usec per loop

Appending each element of a 1000 element list is 23 times slower than adding the two lists together, and it is not unreasonable to expect that the gap between the two approaches will increase as the lists get longer.

Conclusion

Although (on the face of things) there are 4 different methods for adding an element to a list, and at least 3 for combining lists together, there is really only one way that you should use; append for single items and extend of combining lists. The guiding "One obvious way" principle espoused by Python experts still holds, although sometimes it isn't obvious which way is the obvious one.

Friday 4 December 2015

Python Weekly #8 - I know Python - now what ?

I know Python - so now what ?

In this article I wanted to write a few words about a question that I have seen regularly on a number of different support forums. The question is generally of the form given in the title : New programmers have adsorbed the syntax and control structures of Python, and now want to know what to do with it.

Have you really learned Python ?

The first thing that springs to mind is whether the person asking the question has really learned the language at all. The syntax of Python is relatively simple, and the basic data types (str, list, int, bool etc) are relatively easy to learn and intuitive, but Python is so much more than that - for instance - do you know what this does before trying it :

>>> import antigravity

This is just one (albeit just a bit of fun) of over 100 different importable modules that come delivered with python (The Standard Library) that allows you to do wonderful things with it. Beyond that there is the world of pyPi (python Package index) - a world of over 70,000 modules and packages developed across the world and freely available for you to use in a few easy steps.

These package provide functionality that far extends the basic syntax, and it is all reusable for free.

Now I am not suggest you should know everyone of the 70,000+ modules on pyPi or all the details even of the 100+ modules that form the Standard Library, but you should be aware that they exist, how to find them, how to read the documentation, and at least have a working knowledge of some of the main ones (See my previous article Batteries included - 5 Standard Library Packages everyone should know). In the case of the pyPi contributions you should also know how to install these to your local environment (Hint : you will need to use pip).

Practice, Practice, Practice

The best way to learn anything and get proficient at it is to continue to practice. There are a number of ways to hone your skills in Python and these include :

Project Euler : "a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems". A heavy mathematical bent, but well worth trying some to hone your skills, especially when it comes to writing efficient and performant code. There are 526 Project Euler problems created at the time of writing.
The Python Challenge : Billed as "The first programming riddle on the internet", this is a chain of problems which are solved by deciphering a set of clues, and then writing a short scripts to calculate/derive the answer - which is then the form of the URL for the next riddle". For the most part they rely on the standard library. There are 33 steps to the riddle currently.
Python programming challenges : A small set of 7 programming challenges set to support the UK GCSE Computer Programming qualification (GCSE is aimed at students aged around 16).
CodingBat Python problems : "CodingBat is a free site of live coding problems to build coding skill in Python created by who is computer science lecturer at Stanford. The coding problems give immediate feedback, so it's an opportunity to practice and solidify understanding of the concepts.

These (and similar sites) allow you to practice without the difficulty of trying to come up with your own ideas - and if you had your own "Big Idea" then you wouldn't be reading this article.

What are you interested in ?

If you really want to get moving on your own big idea, then my advice is to look around you - is there something in your life, that you do that would benefit from a computer program to help you do it. This could be something from your work environment or your home life (Caution if you are going to tackle a work based problem - check that you wont be in breach of any work based security rules before you start. I would hate for you to loose your job because you followed my advice, identified a work problem and installed Python on your work PC/laptop without permission).

The reason behind this advice is simple: If you need something, (or even think you need something) it is far more likely that you will work on it until it is complete (or at least good enough). If you start on something that you don't need, or you are not enthused about, you will be far less likely to complete it.

I will give you an example based on my first application developed in 2008. I am an amateur photographer, and I use flickr to store the majority of my pictures. At the time the flickr uploader on the website was clunky, slow, unreliable and required a lot of work post upload to add tags, titles, to the pictures, as well needing to add pictures to folders etc. It was not suprising then that my first project was to write my own PC based uploader, that allowed me to take groups of pictures - add them into folders, add tags, titles, descriptions, change permissions etc. and then to upload them to flickr at a single click. This application required me to work with a number of different libraries and packages to complete the functionality including :

pyGTK - Python bindings for the Gnome Toolkit - A fulll featured GUI framework
PIL - Python Image Library - an image manipulation toolset
flickrapi - A Python interface onto the http based Flickr API.

I completed this - and used it "in anger" for several years (fixing bugs and adding more features) as I went along until in around 2012 the web site based uploader was re-developed and is now far better than it was. My pyFlickr application although complete is now never used.

I have to admit that my development directory is littered with programming projects which I have started but never completed - mainly because at the end of the day I didn't need it.

Get Involved

If you haven't got a project that you need to write, but you still want to keep your Python skills fresh, then look around for a project that others have started but that you can work on (by it's very nature much of the work published on pyPi and other places is Open Source, and many projects are looking for collaborators, either to fix bugs or assist in developing new features). Find a project that interests you and contact the authors.

Many of the bigger projects will require some proof of experience and knowledge of python - a project you have completed yourself - or contributions to other projects. Nearly all projects will require you to have a good working knowledge of one of the python testing frameworks and knowledge of how to use the chosen method of change control, and I would suggest that you should be proficient in being able to profile your code in terms of performance and memory usage, and in being able to measure metrics like code coverage etc.