- Create environment
- Dedicated notebook environment
- Install packages
- Remove unnecessary packages
- Document environment
- Recreate environment
Related: project organization < data science projects
> [!Warning]
> Don't repeat the mistake I keep making. Always activate the target environment before installing packages; otherwise they land in your base environment and you have to install them a second time in the right one!
```bash
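# create an empty environment
mamba create -n <my-env>
# or create it with packages
mamba create -n <my-env> pandas scikit-learn matplotlib
# or also pin the Python version
mamba create -n <my-env> python=3.13 pandas scikit-learn matplotlib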
```
Install as many packages as you can in a single command so `mamba` only has to resolve the dependencies once.
To create an environment from a file:
```bash
# if environment.yml exists in the current directory
mamba env create
# to specify file path
mamba env create -f environment.yml
```
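To guard against the base-environment mistake described in the warning above, a small shell wrapper can refuse to install unless a project environment is active. This is a minimal sketch: `safe_install` is a hypothetical helper name, and it assumes your shell's activation hook exports `CONDA_DEFAULT_ENV`, which is worth confirming with `echo $CONDA_DEFAULT_ENV` after activating.

```bash
# Hypothetical helper: only install when a non-base environment is active.
safe_install() {
    # Assumption: activation exports CONDA_DEFAULT_ENV with the active env name.
    if [ -z "${CONDA_DEFAULT_ENV:-}" ] || [ "${CONDA_DEFAULT_ENV}" = "base" ]; then
        echo "No project environment active; run 'mamba activate <my-env>' first." >&2
        return 1
    fi
    # Pass every requested package to a single install so dependencies resolve once.
    mamba install "$@"
}
```

After `mamba activate <my-env>`, calling `safe_install pandas scikit-learn` behaves like a plain `mamba install`; in base, or with nothing activated, it stops with a message instead.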
## recreate the environment
Recreate the environment before you finalize the project, and periodically during development, to ensure it's shareable, reproducible, and free of unnecessary dependencies. Every few weeks or months is reasonable, depending on how frequently you work on the project.
If necessary, first create or update the `environment.yml` file. Manually inspect the file and remove any unnecessary packages. Packages may become unnecessary because you found a better library, took a different approach that didn't require the library, or abandoned the functionality that required it.
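One way to create or refresh the file is to export it from the existing environment. The sketch below wraps that in a hypothetical `export_env` helper. It assumes your `mamba` version supports `--from-history` on `env export`, which limits the output to packages you explicitly requested and makes the manual review much shorter; if the flag is unavailable, dropping it yields the full list of installed packages instead.

```bash
# Hypothetical helper: write environment.yml for the named environment.
export_env() {
    local env_name="$1"
    # --from-history (assumed to be supported by your mamba version) lists only
    # the packages you asked for, not every transitive dependency.
    mamba env export -n "$env_name" --from-history > environment.yml
}
```

Run `export_env <my-env>` from the project root, then review `environment.yml` by hand before deleting anything.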
Next, delete the old environment.
```bash
mamba activate base
mamba env remove -n myenv
```
Recreate the environment and run your tests to ensure everything works.
```bash
mamba env create -f environment.yml
```
For smaller projects, a quick review of the `environment.yml` file is sufficient to clean out any unused dependencies. If you are using a [[dedicated notebook environment]], you are less likely to have installed multiple unnecessary packages. However, some projects will require a more systematic approach.
One approach is to remove packages one at a time and then re-run the app to ensure it still works.
```bash
mamba remove somepackage
python app.py
```
Another approach is to use `pipreqs` to check the packages you import in Python scripts located inside your project. Note that this will not show the packages imported in any notebooks in your project directory.
```bash
mamba install pipreqs
pipreqs . --ignore <myenv>
```
This generates a `requirements.txt` file listing only the packages found in your scripts. Make sure everything you import is listed in your `environment.yml` file (or possibly installed as a dependency of another package).
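The comparison between the two files can be roughed out in the shell. The helper below (`missing_from_env` is a hypothetical name) strips version specifiers from `requirements.txt` and prints every package that has no matching `- package` entry in `environment.yml`. It is a sketch under simple assumptions: one dependency per line in both files, and identical package names in both. PyPI and conda names can differ, so treat the output as a list to check by hand, not a verdict.

```bash
# Hypothetical helper: print packages in requirements.txt that are not listed
# as dependencies in environment.yml.
missing_from_env() {
    local req_file="${1:-requirements.txt}"
    local env_file="${2:-environment.yml}"
    # Drop version specifiers (==, >=, ~=, ...), then skip blank and comment lines.
    sed -E 's/[[:space:]]*[=<>~!].*$//' "$req_file" \
        | grep -vE '^[[:space:]]*(#|$)' \
        | while read -r pkg; do
            # Look for "- pkg", optionally followed by a version pin.
            if ! grep -qiE "^[[:space:]]*-[[:space:]]*${pkg}([=<> ]|\$)" "$env_file"; then
                echo "$pkg"
            fi
        done
}
```

Calling `missing_from_env` with no arguments reads `requirements.txt` and `environment.yml` from the current directory. Anything it prints is either missing from the environment file or arrives as a dependency of another package.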
## dedicated notebook environment
For more complex data science projects, create two environments:
- `my-env`: your primary environment for the project with only the packages needed for the model or app.
- `my-env-nb`: a notebook environment for exploring within notebooks. Some of the packages installed here will not be needed for the final project.
In the notebook environment, install packages like `ipykernel`, `matplotlib`, and `seaborn`, along with any other packages you want to try out.
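The two-environment setup can be scripted so both environments start from the same core packages. This is a minimal sketch: `create_project_envs` is a hypothetical helper, the `-nb` suffix follows the naming used above, and the exploration packages are just the ones mentioned in this note.

```bash
# Hypothetical helper: create a lean project environment and a notebook twin.
create_project_envs() {
    local name="$1"
    shift
    # Primary environment: only the packages the model or app needs.
    mamba create -y -n "$name" "$@"
    # Notebook environment: the same packages plus exploration tools.
    mamba create -y -n "${name}-nb" "$@" ipykernel matplotlib seaborn
}
```

For example, `create_project_envs my-env python=3.13 pandas scikit-learn` creates `my-env` and `my-env-nb`. Packages you only want to try out then go into `my-env-nb`, which keeps the primary environment clean.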