In this article, we want to highlight the packages and tools our team uses to set up a new python microservice. Python has a great community that provides many tools and packages, which improves our development experience a lot, but choosing the right tool or package for your service isn't always easy.
Here is the list of tools we use in our machine learning team to kickstart python services:
- pyenv: python version management
- poetry: python packaging and dependency management
- FastAPI: web framework with async support
- pydantic: type and data validation for python
- SQLAlchemy: ORM for SQL databases
- structlog: structured logging for python
- black: opinionated python code formatter
- flake8: tool to check python code style and quality
- mypy: static typing for python
- interrogate: Docstring coverage
- pre-commit: framework for managing and executing pre-commit hooks
If your python journey started as mine did, you installed packages in your system python. Then after resetting your broken OS, you learned about virtual environments and used those. This works fine until you want to start a new project on python 3.10, but still have an old project that runs on 3.7 and you don't have time to migrate it.
This is where pyenv comes into play. It's a great tool to manage python versions. After setting it up, installing and using a new python version is as easy as this:
# install a new version
pyenv install 3.10.6
# use it in this shell
pyenv shell 3.10.6
# set it as your global python version
pyenv global 3.10.6
# or as a default in the current directory
pyenv local 3.10.6
You can have as many python versions as you need and if you, by chance, break any of them you can easily do a clean install.
poetry is our tool of choice for dependency management and virtual environments. It's easy to use, has a great dependency resolver, and comes with built-in virtual environments.
There are a lot of web frameworks in python to choose from. We went with FastAPI because of its minimal approach and support for concurrency. It also automatically generates OpenAPI documentation. FastAPI is built on Starlette and pydantic: the former provides most of the functionality needed for a web framework, while the latter brings data validation to python.
Here's a small example of how to define a pydantic class with data validation using typing and constraints. You can find a much more detailed tutorial in the pydantic documentation.
from typing import Literal

from pydantic import BaseModel, conint

class Pet(BaseModel):
    # weight must be an int > 0
    weight: conint(gt=0)
    # type must be either "cat" or "dog"
    pet_type: Literal["cat", "dog"]
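To see the validation in action, here's a quick self-contained sketch (the model is repeated so the snippet runs on its own):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError, conint

class Pet(BaseModel):
    # weight must be an int > 0
    weight: conint(gt=0)
    # type must be either "cat" or "dog"
    pet_type: Literal["cat", "dog"]

pet = Pet(weight=4, pet_type="cat")
print(pet.weight)  # 4

try:
    Pet(weight=-1, pet_type="hamster")
except ValidationError as exc:
    # both violations are reported in one go
    print(len(exc.errors()))  # 2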
FastAPI has great support for pydantic; you can define a POST endpoint using the Pet class like this:

from fastapi import FastAPI

app = FastAPI()

# the route path is illustrative
@app.post("/pets")
async def create_pet(pet: Pet):
    return pet
The request body for this endpoint then looks like this:

{
    "weight": 4,
    "pet_type": "cat"
}
If you want to use snake_case in your python service, but the API interface should be in camelCase, you can use this base class to automatically add a camelCase alias for every attribute.
from humps.main import camelize
from pydantic import BaseModel

# a reusable base class for camelCase aliases
class CamelModel(BaseModel):
    class Config:
        alias_generator = camelize
        allow_population_by_field_name = True
When we decided to embrace async as our default for all our microservices, SQLAlchemy did not support async sessions, so we had to try other solutions. ormar is a small ORM package with async support for Postgres. As it was still in early development when we gave it a shot, we switched to encode/databases and SQLAlchemy Core. With the release of version 1.4, SQLAlchemy added support for async sessions and we switched back to it as it's the most tried and tested ORM package for python.
The author of SQLAlchemy also wrote a database migration tool called alembic which we use to manage our database schemas.
To reduce boilerplate code, we tried out SQLAlchemy's DeferredReflection feature. With this, you don't need to define columns for your database objects, but instead load that information from the database at a later stage.
While you write less code, you also lose autocompletion, and you may wonder why a table doesn't have a column you know it should have, only because you forgot to reflect the table object.
That's why we decided to go back to defining columns in code after trying this once.
We opted for structlog to enforce structured logging. While you could also do that with the standard logging library, structlog is built for it and comes with great features, e.g. a context manager to bind variables. Every log message emitted inside the manager also includes the bound variables:

import structlog

log = structlog.get_logger()
# bound_contextvars needs merge_contextvars in the processor chain,
# which recent structlog versions include by default
with structlog.contextvars.bound_contextvars(b=2):
    log.info("hi")
# depending on your renderer, logs something like: event='hi' b=2
We use several tools to enforce code quality.
- black: An opinionated code formatter, so we avoid arguing about trivia such as line length or single quotes vs double quotes.
- mypy: A static type checker. Enforcing type hints makes the code easier to understand and maintain. It also improves auto-completion and catches many possible bugs early.
- flake8: Our code linter of choice. Needs a few tweaks to work well with black.
- interrogate: The newest addition. It checks docstring coverage and fails if coverage is below a set threshold. We added this because we struggle to maintain good docstring coverage over time.
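As an example of the tweaks mentioned above, here is a flake8 configuration that plays well with black (the values are illustrative):

```ini
# setup.cfg
[flake8]
# black formats to 88 characters by default
max-line-length = 88
# E203 (whitespace before ':') and W503 (line break before binary operator)
# conflict with black's formatting choices
extend-ignore = E203, W503
```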
Our GitHub workflows ensure that we can only merge a pull request if the checks of all the above-mentioned tools pass. In addition, we started using pre-commit to run all checks automatically before changes are pushed. We first tried pre-commit hooks but found them too disruptive. For us, pre-push hooks work great because we can keep commits small but still see if a check fails without looking at the pull request.
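A sketch of how such a pre-push setup can look (the repo revisions are illustrative; the hooks are installed with pre-commit install --hook-type pre-push):

```yaml
# .pre-commit-config.yaml
# run hooks on push instead of on every commit
default_stages: [push]
repos:
  - repo: https://github.com/psf/black
    rev: 22.8.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
```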
Examples of how we configured these tools can be found here.
While this list is not complete and we re-evaluate tools whenever we start a new service, I hope this summary is a good starting point for your next python service.
Published by Philipp Glock