Getting a Handle on our Environment Management for Model Deployment and Engineering Pipelines
Organization Hierarchy: Library >> Packages >> Modules
Dependency: If Code A requires Code B to operate, Code B is a dependency of Code A, where “Code” can be a program, library, or package. Dependencies can have dependencies, which have dependencies, and so on.
-
Example
Dependencies fork if a dependency is not updated to the latest version.
Suppose Code A is v1.0. Code B is at v2.0, where Code B is a dependency of Code A.
A new software update is released for Code B, allowing Code B to update to v1.1.
Code A will still be dependent on Code B v1.0, so our project requires the older v1.0 of Code B.
But Code B is usually not just a dependency to Code A. Other code will require the newest version of Code B.
From one software update, we have two slightly different copies of Code B. Code B has forked.
Repeated forking places additional strain on the compiler and causes the program to take longer to execute.
Naturally, a growing web of dependencies can be difficult to handle. Thankfully, Python Dependency Managers offer a way to keep dependencies from trampling on one another.
Virtual Environments
Virtual environments allow us to isolate dependencies so that they do not interfere with each other, acting as directories which allow us to access the packages located within it.
-
Example of a Directory
Virtual Environment 1 stores Package A v1.1, Package B v1.0, and Package C v1.0.
Virtual Environment 2 stores Package A v1.2 and Package B v1.3.
As explained in the “Dependency” example, projects or programs within a project may use different versions of the same package. Because Python is unable to tell the difference between two versions of the same package, we need to isolate the different versions of the package and access them via their virtual environment.
Virtualenv
-
A tool from the venv Python library which creates a virtual environment
Python Dependency Managers
- Conda
- Language-agnostic package, dependency, and environment management.
- Poetry
- Python dependency management and packaging
- Read More
Companies that use Conda:
- Nutmeg – automatic portfolio management
- 10x genomics – targeted RNA gene expression, some data analytics
- Farmioc – Agricultural Data Analytics Platform.
Companies do use Poetry in their dev stack, but stackshare.io, the database holding companies’ dev stacks offered scant evidence of this. Companies tend not to reveal the stack they use, especially hyper-competitive startups.
Because our code is made up of more than just Python, a language-agnostic software like Conda seems to be the better choice.
However, the general consensus from several sources (and experienced developers) is that Poetry and Conda can accomplish the same goal if wielded properly. Many developers use Conda, Poetry, or a combination of the two — it’s a matter of personal preference. In our case, we use Poetry for Python and Conda when our code is written in several languages. Whatever is most comfortable to use will be the most productive and efficient in the long-run. In other words, “if it ain’t broke, don’t fix it.”
Pyenv
A tool for managing multiple versions of Python, Pyenv allows you to:
- Change the global python version
- Create Python virtual environments
- Install more than one version of Python, and create directories to access each version
CI/CD Actions
CI - Continuous Integration - an automated process where code is tested for bugs and merged into a repository (in our case, GitHub)
CD - Continuous Delivery/Deployment - From the repository, the code can be deployed to a live production environment (this is where the product is available to consumers).
GitHub Actions
GitHub Actions is CI/CD workflow management. It’s what allows us to pour our code into a third party cloud application (AWS Athena in our case), revise our code, and push out another iteration.
Docker
Virtual environments tend to use a whole lot of computing resources (remember, we’re recreating entire virtual machines within an OS). Instead of creating virtual environments to manage dependencies, we can isolate such dependencies in containers created through the software Docker (a process known as Dockerization). Instead of running a virtual machine and recreating an OS, Docker containers operate using the system’s native OS, requiring far less computing resources than virtual environments.
- Explore more of everything covered in this doc:
- https://medium.com/semantixbr/getting-started-with-conda-or-poetry-for-data-science-projects-1b3add43956d
- https://stackoverflow.com/questions/70851048/does-it-make-sense-to-use-conda-poetry
- https://blogs.sap.com/2022/05/08/why-you-should-use-poetry-instead-of-pip-or-conda-for-python-projects/
- https://stackshare.io/poetry
- https://news.ycombinator.com/item?id=27559655
- https://discovery.hgdata.com/product/anaconda
- https://pythonspeed.com/articles/conda-vs-pip/