The 2022 edition of the EuroPython convention took place in Dublin from the 11th to the 17th of July, and I had the opportunity to spend the week there. This was the first time I attended a convention like this, and I really enjoyed the talks and meeting other people. I decided to write this blog post to sum up the most interesting things I learned and share them with the community.
More on the convention
This was the first time in two years that EuroPython was an in-person event. It took place in the Convention Center of Dublin, Ireland, and gathered around 1200 Python enthusiasts from around the world. There were more than a hundred different talks, often 5 or 6 at a time. I obviously couldn’t attend all of them, and had to choose the ones that sounded the most interesting to me.
I struggled to find a way to organize all the things I learned from this week of conferences. In the end, I decided to divide the talks into categories, and write a section for each. I tried my best to find links to the work of the original speakers, and all the credit for the information in this article belongs to them.
Speakers are credited at the beginning of each section; feel free to check their work when it is available.
Security
- CPython bugs & risky features by disconnect3D
- Writing secure code in Python by Yan Orestes
Python is secure most of the time, and compared to C++, a developer has less work to do to write secure code. However, we should still be careful about how we use Python, as bad practices can cause real problems.
General advice
CVEs (Common Vulnerabilities and Exposures) are discovered very often, and some of them are critical for security. Thankfully, the Python community works hard to fix those issues when they are discovered, and provides regular updates for the Python interpreter and for packages. It is then the job of software engineers to keep Python and the packages they use up to date.
Each version of Python receives fixes over time, until it reaches its end of life (the end of life of Python releases can be found here). Some functions can also be marked as deprecated in new releases, and thus shouldn’t be used anymore.
The python interpreter
Some of you probably use the Python interpreter pretty often by just running the `python` command. This opens an interactive prompt where you can write and execute Python code one line at a time. disconnect3D showed during his talk that this command may lead to security issues. Indeed, the program will try to fetch some dependencies in the directory where you launched the command, and if one of these dependencies was replaced by a malicious one, the code will be executed. This can be an issue if, as a sysadmin, you open a Python interpreter in a folder where users upload files for example.
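Python 3.11 addresses the issue by adding a `-P` option (and the `PYTHONSAFEPATH` environment variable) that stops the interpreter from adding this unsafe directory to its import path. For earlier versions of Python, it is probably a good idea to alias the `python` command to `python -I` (isolated mode) to remove this risky behavior.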
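To see the difference for yourself, here is a minimal sketch that compares the import path of a child interpreter with and without `-I`. The helper name `child_sys_path` is my own, and the exact paths printed depend on your installation.

```python
import json
import subprocess
import sys


def child_sys_path(*flags: str) -> list[str]:
    """Start a child interpreter with the given flags and return its sys.path."""
    code = "import json, sys; print(json.dumps(sys.path))"
    result = subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# With -c (as with the interactive prompt), Python normally puts an empty
# string at the front of sys.path, which means "the current directory".
default_path = child_sys_path()

# In isolated mode, that entry is not added.
isolated_path = child_sys_path("-I")

print("default :", default_path[:2])
print("isolated:", isolated_path[:2])
```

Note that isolated mode also ignores the `PYTHON*` environment variables and the user site-packages directory, so it may hide packages you expect to find.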
Coding advice
When developing in Python, some mistakes can lead to security vulnerabilities. Yan Orestes made us aware of the potential security concerns we should have as software engineers:
- The `eval` function is not secure and shouldn’t be used (especially with external input), as there is always a way around any restriction placed on it.
- When loading data serialized with `pickle`, the callable returned by the object’s `__reduce__` dunder method is executed. This can lead to arbitrary code execution if we are not careful. An alternative is to use a safer serialization format such as `json`.
- `pip` is the most widely used package manager for Python developers. To install a package, pip will first look for a binary, then try to build the package from source, and finally, if none of the previous methods worked, it will run the `setup.py` file. This is where things can go wrong, as any code can be written in this file. Some malicious packages used this method combined with name squatting (the package has a name similar to a widely used package, e.g. `reqeust` instead of `request`) to hack into computers. This can be prevented with the `--only-binary` and `--require-hashes` flags when running `pip install`.
- Randomness is a difficult subject in computer science, and Python’s `random` module is a predictable pseudo-random generator that is not suitable for security purposes. Alternatives are the `secrets` module, `os.urandom` and `random.SystemRandom`.
- Extracting `.tar.gz` files can be dangerous, and the archive should be inspected before extracting it.
- `assert` is not equivalent to an `if` statement! Indeed, `assert` also depends on the value of the `__debug__` boolean, which is set to `False` when running `python` with the `-O` flag, so the check is skipped entirely.
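The `pickle` point is easier to grasp with a harmless demonstration. This is a minimal sketch of my own, not code from the talk: the `Innocent` class is hypothetical, and it returns the built-in `int` where a real attack would return something like `os.system`.

```python
import json
import pickle


class Innocent:
    """Looks like plain data, but tells pickle how to rebuild it."""

    def __reduce__(self):
        # pickle stores "call int('42')" instead of the object itself.
        # A malicious payload would name a dangerous callable here.
        return (int, ("42",))


payload = pickle.dumps(Innocent())

# Loading the bytes calls int("42"): we get back the call's result,
# not an Innocent instance. The loader never chose to run that call.
result = pickle.loads(payload)
print(type(result).__name__, result)

# JSON can only produce basic types (dict, list, str, numbers, bool, None),
# so parsing untrusted JSON cannot trigger a call like the one above.
safe = json.loads('{"user": "alice", "age": 30}')
print(safe)
```

The takeaway is that `pickle.loads` should only ever be given data you produced yourself or fully trust.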
A more detailed explanation of those recommendations can be found in the slides for this talk.
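Two of these recommendations, the ones about randomness and about `assert`, can be sketched with the standard library alone. The `withdraw` function and its rules are made up for the illustration.

```python
import random
import secrets

# An unpredictable token, suitable for things like password-reset links.
token = secrets.token_hex(16)  # 16 random bytes, written as 32 hex characters

# SystemRandom exposes the familiar random API on top of the OS entropy source.
pin = random.SystemRandom().randrange(10**6)


def withdraw(balance: int, amount: int) -> int:
    """Return the new balance, rejecting invalid amounts."""
    # An explicit check still runs under `python -O`.
    # Written as `assert 0 < amount <= balance`, it would silently disappear.
    if amount <= 0 or amount > balance:
        raise ValueError("invalid amount")
    return balance - amount


print(token, f"{pin:06d}", withdraw(10, 3))
```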
Auditing tools
Finally, some code auditing tools are available:
Tools
- Bullet proof Python - Property based testing with Hypothesis by Michael Seifert - Workshop
- From pip to poetry - Python (many) ways of packaging and publishing by Vinícius Gubiani Ferreira
- Lint All the Things! by Luke Lee
- Robyn: An async Python web framework with a Rust runtime by Sanskar Jethi
- Automate cleaning code in few easy steps! by Ester
- Why is it slow? Strategies for solving performance problems by Caleb Hattingh
- Automated Refactoring Large Python Codebase by Jimmy Lai
A lot of talks presented interesting tools to use with Python. I compiled the ones I found the most useful below.
Hypothesis
Hypothesis is a testing library that automates the choice of parameters to test your functions. Check the following example:
This method will be tested with 100 different inputs, each being either a list of integers or a list of floats. Then, the result of the max function is compared against the last element of the sorted list to check that it indeed returns the maximum value of the list. With Hypothesis, if you are creative enough in your assert statements, you can write tests that cover many more inputs than you could by hand. Hypothesis also tends to test edge cases (empty lists, `None` values, etc.).
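The example from the workshop is not reproduced here, so this is a minimal sketch of a test matching that description, assuming Hypothesis is installed (`pip install hypothesis`). The test name is my own. I restrict the strategy to non-empty lists and exclude NaN, because `max` raises on an empty list and NaN does not compare consistently.

```python
from hypothesis import given, settings, strategies as st

# Either a non-empty list of integers or a non-empty list of floats.
number_lists = st.lists(st.integers(), min_size=1) | st.lists(
    st.floats(allow_nan=False), min_size=1
)


@settings(max_examples=100)  # 100 examples is also Hypothesis's default
@given(number_lists)
def test_max_is_last_of_sorted(values):
    # Property: the maximum equals the last element once the list is sorted.
    assert max(values) == sorted(values)[-1]


if __name__ == "__main__":
    # A Hypothesis test can be called like a plain function;
    # pytest would also collect it automatically.
    test_max_is_last_of_sorted()
```

When a property fails, Hypothesis shrinks the failing input to a small example, which usually makes the bug much easier to spot.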
Poetry
Although most people use pip as their package manager for Python, there are alternatives that are more recent, more complete and more secure. One of them is poetry, and it was introduced in several talks during EuroPython. Here is a summary comparing pip to poetry.
Feature | pip | poetry |
---|---|---|
Beginner friendly | Yes | Yes |
Heavily relies on other tools | Yes | No |
Built-in virtualenv management | No | Yes |
Python 2 support | Yes | Dropping soon |
Manage package and distribution by itself | No | Yes |
Poetry has a number of advantages over pip, and I won’t go in depth on this subject here. Check out the poetry website for more information.
Code checks
Many tools exist in Python for checking the quality of your code. Here are the most used ones:
- Black - Autoformat your Python files.
- Isort - Automatically sort your import statements.
- Flake8 - Get hints about bad code practices; it can be extended with your own lints.
- Mypy - Check the type hints in your project.
All those tools can be run locally, for example using a git pre-commit hook. However, some checks, such as `mypy`, may take too much time to run in a pre-commit hook. A good practice is to run all those tools in a CI pipeline on your repository (for example using GitHub Actions).
If you didn’t use those tools from the beginning of your project and you have a huge codebase to refactor, you may want to automate the refactoring. This is possible if you create a script that applies those tools to a few files, then automatically opens a pull request on your favorite code hosting platform. GitLab and GitHub both provide APIs that let you do this through a script. Then, a reviewer can be assigned to manually check that the changes are correct.
You may also want to check the performance of your Python code. For this purpose, there are multiple tools depending on your needs:
- pytest-profiling: this tool displays a heat graph for your program.
- py-spy: A tool that is external to your project and adds very little overhead to your program’s execution, as it runs in a separate process.
- OpenTelemetry: An instrumentation tool to highlight the performance of distributed systems.
Robyn
Robyn is an async web framework for Python with a Rust runtime. Other async web frameworks already exist for Python, so what sets Robyn apart is its Rust runtime, and it was presented as the fastest of them. It also supports sync functions, and its syntax is very similar to Flask’s, although there are some key differences between the two. The project is under active development and has not yet reached a stable 1.0 release.
Python’s internals
- Raise better errors with Exception Groups by Or Chen
- Protocols - Static duck typing for decoupled code by Ran Zvi
- Writing Faster Python 3 by Sebastian Witowski
- How we are making Python 3.11 faster by Mark Shannon
- Multithreaded Python without the GIL by Sam Gross
- Clean Architectures in Python by Leonardo Giordani
Some talks helped me understand how to write better code in Python and how the interpreter works under the hood. The next sections highlight some of them.
Write better Python
Faster code
There are many ways to write code that does what you want, but some methods will be faster than others. Sebastian Witowski shared some improvements that can make your code faster during his talk. Among those:
- The fastest way to remove duplicates from a `list` is to convert it to a `set` and back to a `list` again. Use `list(dict.fromkeys(DUPLICATES))` if you want to preserve the order of the list.
- Generators are memory efficient while lists are speed efficient.
- We sometimes have to choose between asking for permission and asking for forgiveness. This means that we can write two algorithms that have the same behavior but one is checking if it can perform an action and the other is performing the action and handling an exception if it failed. Asking for permission is faster if the condition is not met, but asking for forgiveness is faster if the code doesn’t raise an exception.
- numba is a Just In Time (JIT) compiler for Python that allows for faster runtime in some cases.
- Initializing a new dictionary using `{}` is faster than using `dict()`.
- numpy provides a lot of optimized functions and data structures.
More performance comparisons can be found here on GitHub.
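Here is a minimal sketch of the first and third tips, using only the standard library. The function names `get_lbyl` and `get_eafp` are my own, and the snippet shows behavior only; measure with `timeit` on your own data before drawing conclusions about speed.

```python
duplicates = [3, 1, 3, 2, 1]

# Fast, but the resulting order is not guaranteed.
unique_any_order = list(set(duplicates))

# Keeps the first occurrence of each item, in the original order.
unique_in_order = list(dict.fromkeys(duplicates))  # [3, 1, 2]


def get_lbyl(mapping: dict, key, default=None):
    """Ask for permission ("look before you leap"): check, then act."""
    if key in mapping:
        return mapping[key]
    return default


def get_eafp(mapping: dict, key, default=None):
    """Ask for forgiveness: act, then handle the failure."""
    try:
        return mapping[key]
    except KeyError:
        return default


config = {"debug": True}
print(unique_in_order)
print(get_lbyl(config, "debug"), get_eafp(config, "missing", False))
```

Both lookup functions return the same results; they differ only in which path is cheap, the hit or the miss.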
Protocols
A Python feature that is sometimes useful and helps achieve better code is Protocols. Take the following code snippet as an example:
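The snippet from the talk is not reproduced here. A minimal sketch of the kind of code meant, modelled on the example in PEP 544 (the `Bucket` class and `collect` function are illustrative names), could look like this:

```python
from typing import Iterable, Iterator, Sized


class Bucket(Sized, Iterable[int]):
    """A container that declares its abilities by inheriting from ABCs."""

    def __init__(self, items: list[int]) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[int]:
        return iter(self._items)


def collect(items: Iterable[int]) -> int:
    """Accept anything the type checker recognises as an iterable of ints."""
    return sum(items)


total = collect(Bucket([1, 2, 3]))
print(total, len(Bucket([1, 2])))
```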
The problem here is that you must explicitly inherit from the `Sized` and `Iterable` Abstract Base Classes (ABCs) to register your class as a subtype of them. This is particularly difficult to do with library types, as the type objects may be hidden deep in the implementation of the library. Also, extensive use of ABCs might impose additional runtime costs. Consider the following snippet as a solution:
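Again, the original snippet is missing, so here is a sketch in the spirit of PEP 544, with illustrative names. The class has no base class at all, yet a static type checker such as mypy should accept it wherever an `Iterable[int]` is expected, because it implements `__iter__`.

```python
from typing import Iterable, Iterator


class Bucket:
    """No ABC in the bases: only the required dunder methods are implemented."""

    def __init__(self, items: list[int]) -> None:
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[int]:
        return iter(self._items)


def collect(items: Iterable[int]) -> int:
    return sum(items)


# Type checks and runs, even though Bucket inherits only from object.
total = collect(Bucket([1, 2, 3]))
print(total, Bucket.__mro__)
```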
Protocols let you avoid inheriting from these classes and rely only on the implementation of the required dunder methods. The static type check then passes implicitly.
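The same idea works for your own interfaces, not only the built-in ones. This is a minimal sketch using `typing.Protocol`; `SupportsClose`, `Resource` and `close_all` are hypothetical names chosen for the illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable  # optional: also allows isinstance() checks at runtime
class SupportsClose(Protocol):
    def close(self) -> None: ...


class Resource:
    """Never mentions SupportsClose, but has a matching close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def close_all(things: list[SupportsClose]) -> None:
    for thing in things:
        thing.close()


resource = Resource()
close_all([resource])
print(resource.closed, isinstance(resource, SupportsClose))
```

The caller and the implementation stay decoupled: `Resource` can live in a library that has never heard of `SupportsClose`.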
Building a code architecture
According to an old definition, a clean architecture is something that is usable, maintainable, and beautiful. This is what every developer should try to achieve with their code. Writing code is about controlling a flow of data, and there are several methods to organize this flow in a structured way. The idea behind the clean architecture presented by Leonardo Giordani relies on the separation between the data that is external to our system and the data that is used to compute our business logic. The book Clean Architectures in Python describes it far better than I can; feel free to check it out.
Python 3.11
Feature: ExceptionGroups
Error messages have been improving in recent releases of the Python interpreter, and this continues in Python 3.11.
Additionally, a new feature is added: exception groups (`ExceptionGroup`). This new feature makes it possible to raise multiple exceptions at once and handle them separately. This can be useful in several cases:
- Several concurrent tasks can fail at the same time.
- Cleanup code (finally block) can cause its own errors.
- Code can try several different alternatives that all raise exceptions.
In Python 3.11, exceptions can be grouped in an `ExceptionGroup` and treated individually using the `except*` syntax.
The future of Python
A talk that was particularly interesting for the future of Python is the one that Sam Gross gave. Since its early releases, the CPython interpreter has used a lock to prevent multiple threads from executing Python code at the same time. This lock is called the Global Interpreter Lock, or GIL. The GIL makes it easy to implement garbage collection (using reference counting) and other interpreter internals, but it prevents threads from running Python code in parallel, which slows down multithreaded programs a lot. Several people have tried to remove the GIL over the years without much success. However, Sam Gross recently managed to remove it and ran some very promising benchmarks. Hopefully, we can expect the GIL to be removed in a future version of Python using this work. The project is available here on GitHub. Sam Gross proposed to include his work in the next Python release (3.12) behind a compile flag.
Other talks
I couldn’t include summaries of all the talks I attended. Here are the ones that didn’t make it into the previous sections; feel free to check them out too:
- Packaging security with Nix by Ryan Lahfa
- When to refactor your code into generators and how by Jan-Hein Bührman
- Making Python better one error message at a time by Pablo Galindo Salgado
- Demystifying Python’s Internals: Diving into CPython by implementing a pipe operator by Sebastiaan Zeeff
- Norvig’s lispy: beautiful and illuminating Python code by Luciano Ramalho
- Build a production ready GraphQL API using Python by Patrick Arminio
- Python’s role in unlocking the secrets of the Universe with the James Webb Space Telescope by Patrick Kavanagh
- HPy: a better C API for Python by Ronan Lamy
- Killer Robots Considered Harmful by Laura Nolan
- Online voting system used for primary elections for the French Presidential, must be secure right ? by Emmanuel Leblond
- Self-explaining APIs by Roberto Polli