Europython 2022 - My take on the event

 

The 2022 edition of the Europython convention took place in Dublin between 11th and 17th of July, and I had the opportunity to spend the week at this event. This was the first time I attended a convention like this, and I really enjoyed the talks and meeting other people at this event. I decided to write this blog post to sum up the most interesting things I learned from this event and share it with the community.

More on the convention

This was the first time in two years that Europython was an in person event. It took place in the Convention Center of Dublin, Ireland, and gathered around 1200 python enthusiasts from around the world. There were more than a hundred different talks, often 5 or 6 at a time. I obviously couldn’t attend all of them, and had to choose the ones that sounded the more interesting to me.

Dublin's convention center

I struggled to find a way to organize all the things I learned from this week of conferences. In the end, I decided to divide the talks into categories, and write a section for each. I tried my best to find links to the work of the original speakers, and all the credit for the information in this article belongs to them.

Speakers are credited at the beginning of each section, feel free to check their work when it is available.

Security

Python is secure most of the time, and if we compare it to C++, there is less work to do as a developer to write secure code. However, we should still be careful how we use Python as bad practices can be problematic.

General advices

CVEs (Common Vulnerabilities and Exposures) are discovered very often, and some of them can be very critical for security. Thankfully, the Python community works hard to fix those issues when they are discovered, and provide regular updates for the Python interpreter and for packages. It then is the job of the software engineers to take care of updating Python and the packages they use.

The versions of Python receive some fixes over time, until they reach their end of life (the end of life of Python releases can be found here). Some methods can also be marked as deprecated in new releases, and thus shouldn’t be used anymore.

The python interpreter

Some of you probably use the python interpreter pretty often by just running the python command. This opens an interactive prompt where you can write and execute python code one line at a time. disconnect3D showed during his talk that this command may lead to security issues. Indeed, the program will try to fetch some dependencies in the directory where you launched the command, and if one of these dependencies was replaced by a malicious one, the code will be executed. This can be an issue if, as a sysadmin, you open a python interpreter in a folder where users upload files for example.

The issue seems to be fixed in python 3.11, but for earlier versions of python, it is probably a good idea to alias the python command to python -I to remove this risky behavior.

Coding advices

When developing in Python, some mistakes can lead to security vulnerabilities. Yan Orestes made us aware of the potential security concerns we should have as software engineers:

  • The eval method is not secure and shouldn’t be used (especially with an external input), as there is always a way around it.
  • When loading a data serialized with pickle, the dunder method __reduce__ is called on the serialized object. This can lead to arbitrary code execution if we are not careful. An alternative is to use a safer serialization format such as json.
  • pip is the most widely used package manager for Python developers. To install a package, pip will first look for a binary, then try to build the package from source, and finally if none of the previous methods worked, it will run the setup.py file. This is were things can turn wrong, as any code can be written in this file. Some malicious packages used this method combined with name squatting (the package has a name similar to a widely used package, e.g.: reqeust instead of request) to hack into computers. This can be prevented with the --only-binary and --require-hashes flags when running pip install.
  • Randomness is a difficult subject in computer science, and Python’s random module is not the best at generating random numbers. Alternatives are the secrets module, os.urandom and random.SystemRandom.
  • Extracting .tar.gz files can be dangerous, and an inspection of the file should be done before extracting it.
  • assert is not equivalent to an if statement! Indeed assert also checks the value of the __debug__ boolean which is set to False when running python with the -O flag.

A more detailed explanation of those recommendations can be found in the slides for this talk.

Auditing tools

Finally, some code audition tools are available:

Tools

  • Bullet proof Python - Property based testing with Hypothesis by Michael Seifert - Workshop
  • From pip to poetry - Python (many) ways of packaging and publishing by Vinícius Gubiani Ferreira
  • Lint All the Things! by Luke Lee
  • Robyn: An async Python web framework with a Rust runtime by Sanskar Jethi
  • Automate cleaning code in few easy steps! by Ester
  • Why is it slow? Strategies for solving performance problems by Caleb Hattingh
  • Automated Refactoring Large Python Codebase by Jimmy Lai

A lot of talks presented Some interesting tools to use with Python. I compiled the ones I found the more useful below.

Hypothesis

Hypothesis is a testing library that automates the choice of parameters to test your functions. Check the following example:

@given(
    st.one_of(
        st.lists(st.integers(), min_size=1),
        st.lists(st.floats(allow_nan=False), min_size=1),
    )
)
def test_max_returns_max(values):
    assert max(values) == sorted(values)[-1]

This method will be tested with 100 different inputs, those inputs being either a list of integers or a list of floats. Then, the max function is tested against the last element of the sorted list to check if it indeed returns the maximum value of the list. With Hypothesis, if you are creative enough in your assert statements, you can write testing methods that will cover much more inputs than if you did it manually. Hypothesis also tends to often test edge cases (empty lists, None values, etc).

Poetry

Although most people use pip as their package manager for Python, there are alternatives that are more recent, complete and secure. One of them is poetry, and it was introduced in several talks during Europython. Here is a summary comparing pip to poetry.

  pip poetry
Beginner friendly Yes Yes
Heavily relies on other tools Yes No
Built-in virtualenv management No Yes
Python 2 support Yes Dropping soon
Manage package and distribution by itself No Yes

Poetry has a number of advantages over pip, and I won’t go in depth on this subject here. Check out the poetry website for more information.

Code checks

Many tools exist in Python for checking the quality of your code. Here are the most used ones:

  • Black - Autoformat your Python files.
  • Isort - Automatically sort your import statements.
  • Flake8 - Get hints about bad code practices, this can be extended with your own lints.
  • Mypy - Check the type hints in your project.

All those tools can be run locally using a git precommit hook for example. However, some checks may take to much time to run in a precommit hook, such as mypy. A good practice is to use all those tools in a CI pipeline on your repository (for example using Github Actions).

If you didn’t use those tools from the beginning of your project and you have a huge codebase to refactor, you may want to automate the refactor. This is possible if you create a script that apply those tools on a few files, then automatically opens a pull request on your favorite code hosting platform. Gitlab and Github do provide API that will let you do this through a script. Then, a reviewer can be assigned to manually check if the changes are correct.

You may also want to check the performances of your Python code. For this purpose, they are multiple tools depending on your needs:

  • pytest-profiling: this tool displays a heat graph for your program.
  • pyspy: An tool that is external to your project and adds a very low overhead to your program execution as it runs in a separate process.
  • Opentelemetry: An instrumentation tool to highlight the performances of distributed systems.

Robyn

Robyn is an async web framework for Python with a Rust runtime. It is currently the only async web framework for Python, and also the fastest framework. It also supports sync methods, and its syntax is very similar to Flask, but there are some key differences between the two. The project is currently under active development and didn’t reach a stable 1.0 version for now.

Python’s internals

  • Raise better errors with Exception Groups by Or Chen
  • Protocols - Static duck typing for decoupled code by Ran Zvi
  • Writing Faster Python 3 by Sebastian Witowski
  • How we are making Python 3.11 faster by Mark Shannon
  • Multithreaded Python without the GIL by Sam Gross
  • Clean Architectures in Python by Leonardo Giordani

Some talks helped me to understand how to write better code in Python and how the compiler works under the hood. The next section will highlight some of them.

Write better Python

Faster code

There are many ways to write code that does what you want, but some methods will be faster than others. Sebastian Witowski shared some improvements that can make your code faster during his talk. Among those:

  • The fastest way to remove duplicates from a list is to convert it to a set and back to a list again. Use list(dict.fromkeys(DUPLICATES)) if you want to preserve the order of the list.
  • Generators are memory efficient while lists are speed efficient.
  • We sometimes have to choose between asking for permission and asking for forgiveness. This means that we can write two algorithms that have the same behavior but one is checking if it can perform an action and the other is performing the action and handling an exception if it failed. Asking for permission is faster if the condition is not met, but asking for forgiveness is faster if the code doesn’t raise an exception.
  • numba is a Just In Time (JIT) compiler for Python that allows for faster runtime in some cases.
  • Initializing a new dictionary using {} is faster than using dict().
  • numpy provides a lot of improved methods and data structures. More performances comparison can be found here on GitHub.

Protocols

A feature of Python that is sometimes useful and help achieve better code is Protocols. Take the following code snippet as an example:

from typing import Sized, Iterable, Iterator

class Bucket(Sized, Iterable[int]):
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...

The problem here is that you must explicitly inherit the Sized and Iterable Abstract Base Classes (ABCs) to register them as subtypes of their parents. This is particularly difficult to do with library types as the type objects may be hidden deep in the implementation of the library. Also, extensive use of ABCs might impose additional runtime costs. Consider the following snippet as a solution:

from typing import Iterator, Iterable

class Bucket:
    ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[int]: ...

def collect(items: Iterable[int]) -> int: ...
result: int = collect(Bucket())  # Passes type check

Protocols allow to avoid inheriting from classes and only rely on the implementation of the required dunder methods. The static type checking passes implicitly.

Building a code architecture

According to an old definition, a clean architecture is something that is useable, maintainable, and beautiful. This is what every developer should try to achieve with their code. Writing code is about controlling a flow of data, and there are several methods to organize this flow in a structured way. The idea behind the clean architecture presented by Leonardo Giordani relies on the separation between the data that is external to our system and the one that is used to compute our business logic. The book Clean architecture in Python will describe it way better than me, feel free to check it out.

Python 3.11

Feature: ExceptionGroups

Error messages started to be improved in the Python interpreter in the recent releases of Python. This will continue in Python 3.11. Additionally, a new feature is added: ExceptionGroups. This new feature allows to raise multiple exceptions and handle them separately. This can be useful in several cases:

  • Several concurrent tasks can fail at the same time.
  • Cleanup code (finally block) can cause its own errors.
  • Code can try several different alternatives that all raise exceptions. In Python 3.11, exceptions can be grouped in an ExceptionGroup and treated individually using the except* keyword.

The future of Python

A talk that was particularly interesting for the future of Python is the one that Sam Gross gave. Since the early releases of Python, the CPython compiler is using a lock to prevent the access to the environment by multiple threads while executing the code. This lock is called the Global Interpreter Lock, or GIL. The GIL makes it easy to implement garbage collection (using reference counting) and other stuff but appears to slow down Python a lot when trying to run multithreaded code. Several people tried to remove the GIL in the past years without much success. However, Sam Gross recently managed to remove it and ran some very promising benchmarks. Hopefully, we can expect the GIL to be removed in a future version of Python using this work. The project is available here on GitHub. Sam Gross proposed to include his work in the next Python release (3.12) behind a compile flag.

Other talks

I couldn’t include summaries of all the talks I attended. Here are the ones that didn’t make it in the previous sections, feel free to check them too:

  • Packaging security with Nix by Ryan Lahfa
  • When to refactor your code into generators and how by Jan-Hein Bührman
  • Making Python better one error message at a time by Pablo Galindo Salgado
  • Demystifying Python’s Internals: Diving into CPython by implementing a pipe operator by Sebastiaan Zeeff
  • Norvig’s lispy: beautiful and illuminating Python code by Luciano Ramalho
  • Build a production ready GraphQL API using Python by Patrick Arminio
  • Python’s role in unlocking the secrets of the Universe with the James Webb Space Telescope by Patrick Kavanagh
  • HPy: a better C API for Python by Ronan Lamy
  • Killer Robots Considered Harmful by Laura Nolan
  • Online voting system used for primary elections for the French Presidential, must be secure right ? by Emmanuel Leblond
  • Self-explaining APIs by Roberto Polli