Overview of Open Source Software Supply Chains
Open Source Software (OSS) is like a community recipe book for coding. Imagine a cookbook that anyone can read, use, and add their own recipes to; that’s what open-source software is like. In the tech world, developers share the “source code,” or the original programming instructions, so that anyone can read, use, or modify it. This contrasts with “closed source,” where only the original creators can alter the code. With OSS, the more people who can look at and work with the code, the better it often becomes in terms of features and security.
Now, let’s talk about Software Supply Chains. Think of building software like building a car. A car has many parts—engine, tires, airbags—from various suppliers. Similarly, a software project uses different “parts,” which might be chunks of code or software libraries, many of which come from open-source projects. Just like in a car supply chain, where you’d want every part to be safe and reliable, in a software supply chain, you want to ensure that all the components are secure and function as expected.
The concept of a Software Supply Chain becomes especially important in the context of Open Source Software. Since many people can modify OSS, knowing where your ‘parts’ are coming from is crucial. Are they secure? Are they updated? Are there known vulnerabilities? What licensing requirements do they have?
OSS is essential in modern software development and data analysis. By some industry estimates, around 90% of codebases include at least some OSS components. It is therefore crucial for every project, from a single analysis managed by a data scientist to a business-critical application used by millions of users, to understand what its software supply chain looks like and the risks that come with it.
Dependencies, Packages, Libraries: Understanding the Python Supply Chain Ecosystem
The Python ecosystem is built on packages and libraries and the dependencies between them, which together form the foundation of most open-source Python projects.
- Packages are Python software modules that have been built and released to the open-source community. These can be installed and used to perform specific roles in your software project instead of writing that code from scratch. For example, pandas is a popular package choice for data exploration and manipulation that would take a long time to build from scratch.
- Libraries, on the other hand, are comprehensive collections of Python modules. A prime example is Python’s standard library - a one-stop shop for diverse modules ranging from file I/O to system calls. Another example is PyTorch, Meta’s extensive collection of machine-learning tools that can be installed as a single library.
- Dependencies are the interconnected external software packages your projects or applications rely upon to operate. For instance, TensorFlow, a commonly used machine learning library, is often a necessary dependency for machine learning projects. By installing TensorFlow, however, you are also installing more than 20 other packages that TensorFlow itself relies on. As a result, you have just added more than 21 packages and libraries to your software supply chain, all linked together. A vulnerability in any one of them can affect everything that depends on it. Fear not, though! We’ll cover Supply Chain Security in part 2 of this series.
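To see how far a single install reaches, the dependency web of an installed package can be walked with the standard library's importlib.metadata. The following is a minimal sketch, not a full resolver: the helper names (requirement_name, direct_dependencies, transitive_dependencies) are hypothetical, optional "extras" are skipped, other environment markers are not evaluated, and packages that are not installed are treated as having no dependencies.

```python
import re
from importlib import metadata

# Leading project name of a requirement string such as "numpy>=1.22; python_version < '3.11'"
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement: str) -> str:
    """Return the normalized project name from a requirement string."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # Normalize as package indexes do: lowercase, runs of "-", "_", "." become "-"
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def direct_dependencies(package: str) -> set[str]:
    """Names of the packages that an installed package declares as requirements."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return set()  # assumption: not installed means nothing to report
    names = set()
    for requirement in requirements:
        _, _, marker = requirement.partition(";")
        if "extra" in marker:
            continue  # simplification: ignore optional extras
        names.add(requirement_name(requirement))
    return names


def transitive_dependencies(package: str) -> set[str]:
    """All packages reachable from `package` through declared requirements."""
    root = requirement_name(package)
    seen: set[str] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for dependency in direct_dependencies(current):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    seen.discard(root)  # a dependency cycle should not list the package itself
    return seen


if __name__ == "__main__":
    # Prints an empty list if tensorflow is not installed in this environment
    print(sorted(transitive_dependencies("tensorflow")))
```

Every name in the resulting set is a part in your supply chain, whether or not you ever import it directly.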
Package Installation via Pip, Pipenv, Poetry
Python offers a variety of streamlined mechanisms for package installation. The three key players are pip, pipenv, and poetry.
- Pip, or “Pip Installs Packages,” is the foundation. It installs Python packages directly, for example with the command pip install safety.
- Pipenv is a step ahead, merging dependency management with environment handling. Running pipenv install pandas installs the pandas package while updating your Pipfile.
- Poetry is an advanced tool providing an all-in-one solution for dependency resolution, package management, and packaging. It simplifies tasks like versioning and publishing packages.
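When an install has to be triggered from a Python program rather than typed into a terminal, a common approach is to run pip as a subprocess of the current interpreter, so the package lands in the same environment the program is using. A minimal sketch, in which pip_install_command and install are hypothetical helper names used only for illustration:

```python
import subprocess
import sys


def pip_install_command(package: str) -> list[str]:
    """Build the command that installs `package` for the running interpreter."""
    # "python -m pip" ties pip to this interpreter instead of whichever pip is on PATH
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str) -> None:
    """Run pip; raises subprocess.CalledProcessError if pip exits with an error."""
    subprocess.run(pip_install_command(package), check=True)
```

Calling install("safety") would then have the same effect as the pip install safety command mentioned above. Note that it downloads and installs code, so it should only be pointed at packages you trust.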
The Security Perspective: Unveiling Software Supply Chain Security
Despite the efficiency they offer, software supply chains also pose distinct security risks. Threat actors often exploit these chains, resulting in dire consequences such as data breaches, malware distribution, and even system-wide vulnerabilities.
When embracing open-source software, the importance of security cannot be stressed enough. Our research has uncovered critical vulnerabilities, such as ReDoS (Regular expression Denial of Service), in widely used Python packages. These vulnerabilities pose significant risks to projects and to the organizations that own them.
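To make ReDoS concrete, here is a textbook illustration; it is not one of the vulnerabilities found in our research, and the names VULNERABLE, SAFE, and match_seconds are chosen only for this sketch. A pattern with nested quantifiers forces Python's backtracking regex engine to try exponentially many ways of splitting the input when the match ultimately fails, so a short malicious string can keep a process busy for a long time.

```python
import re
import time

# Nested quantifiers: "(a+)+" can split a run of "a" characters in exponentially many ways
VULNERABLE = re.compile(r"^(a+)+$")
# Accepts exactly the same strings (one or more "a"), with no nested quantifier
SAFE = re.compile(r"^a+$")


def match_seconds(pattern: re.Pattern, text: str) -> float:
    """Return how long a single match attempt takes, in seconds."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # The trailing "!" makes the match fail, which triggers the backtracking
    for n in (12, 14, 16, 18):
        text = "a" * n + "!"
        print(n, match_seconds(VULNERABLE, text), match_seconds(SAFE, text))
```

On a typical machine the time for the vulnerable pattern grows sharply with each step in length, while the safe pattern stays effectively constant. The input sizes here are kept small on purpose; much longer inputs can make the vulnerable match run for a very long time.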
Understanding software supply chains and their security issues is vital in the current digital era. Here at Safety, we believe in streamlining Python dependency security, reducing vulnerability noise, and integrating dependency security effectively into your existing security workflows. In the next article in this series, we will dive deep into Software Supply Chain Security and security best practices.
To learn more about Safety, ask questions about this article, or provide feedback, please get in touch with us at firstname.lastname@example.org.