SQLAlchemy

SQLAlchemy is a [[Python]] package that handles interfacing with a [[database]]. You can easily switch from [[SQLite]] as a test database to [[Postgres]] as a deployment database without changing any code. This tutorial will introduce you to [Flask-SQLAlchemy](https://flask-sqlalchemy.palletsprojects.com/en/2.x/), an extension for [[Flask]] which is configured for easier use with Flask apps (e.g., [[Dash]] apps). ## installation ```python conda install -c conda-forge flask-sqlalchemy ``` Also install Flask if needed. If you're using Postgres (as we will in this tutorial), also install psycopg2: ```python conda install -c conda-forge psycopg2 ``` ## set up Circular imports can be a problem with `flask-sqlalchemy`, to avoid that we'll set up a package within our project to handle the namespaces. The project directory will look like this: ``` root/ | app/ -- __init__.py -- models.py index.py ``` `index.py` will run the app. Within `app/`, `__init__.py` will set up the app, and `models.py` will contain the data definition language. ## create an empty database First we need to create the database. For this project, we'll use a Postgres database locally for testing and then again after we deploy to Heroku. See [this tutorial](postgres-tutorial) for the steps to create the Postgres database on your local machine. Follow steps 1 - 3, then return here. ## configure the application In your `__init__.py` file, copy/paste the following code: ```python from flask import Flask from flask_sqlalchemy import SQLAlchemy app = Flask(__name__) # switch ENV variable on deployment. ENV = 'dev' if ENV == 'dev': app.debug = True app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://postgres:<password>@localhost/<database>' else: app.debug = False app.config['SQLALCHEMY_DATBASE_URI'] = '' # avoid warning in console app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False db = SQLAlchemy(app) ``` Update the `SQLALCHEMY_DATBASE_URI` string with your password and database name. If instead you'd prefer to use a SQLite database, simply substitute `sqlite:///test.db` for the `SQLALCHEMY_DATABASE_URI` environment variable. Note that the models package has not been configured yet, which we will do in the next section. In your `index.py` file, paste the following code: ```python from app import app if __name__== '__main__': app.run() ``` You can run the app using `python index.py`. For now, there is nothing in the app to run. ## define database structure Database tables are created as python classes for easy access within the application. ```python from datetime import datetime # used for timestamps from app import db # database object created in __init__.py class Project(db.Model): __tablename__ = 'projects' id = db.Column(db.Integer, primary_key=True) project_name = db.Column(db.String(200), unique=True, nullable=False) def __repr__(self): return f'Project('{project_name}') class HabitatOutcome(db.Model): __tablename__ = 'habitat_outcomes' id = db.Column(db.Integer, primary_key=True) date = db.Column(db.DateTime, default=datetime.utcnow) # do not add parens to utcnow project_id = db.Column(db.Integer, db.ForeignKey('projects.id'), nullable=False) quantity = db.Column(db.Float, nullable=False) unit = db.Column(db.String(20), nullable=False) benefit_type = db.Column(db.String(100), nullable=False) def __repr__(self): return f'{quantity} {unit}s of {benefit_type}' ``` The class Project represents a table for projects with an auto-incrementing id as a primary key and project_name as the only attribute. Defining a string representation using the `__rep__` method is good practice to help with debugging. The class `HabitatOutcome` represents a quantity of benefit and is associated with projects through the `project_id` foreign key. The table name in the database will be converted from the class `CamelCase` name to `camel_case` by default. Otherwise a table name can be specified with the`__tablename__` dunder method. I prefer to specify table names to have singular class names and plural table names. Note that both the unit field and the benefit_type field should have explicit options to enforce data integrity. Setting up tables to store options is discussed below, but for simplicity this tutorial will not set those up for now. ### relationships As we saw above, it is straightforward to specify relationships between entities with a foreign key. SQLAlchemy also allows us to store with the table a query which will access information about these relationships. For example we can extend the `Project` class to get habitat outcomes associated with a project from any instance of the class: ```python class Project(db.Model): ... habitat_outcomes = db.relationship('HabitatOutcome', backref='project', lazy=True) ``` `habitat_outcomes` is not a column in the table `project` but rather a query that will return a list of all habitat_outcomes associated with a project. The query can be accessed as a property of each project with `project.habitat_outcomes`. The `backref` parameter will allow us to get the project associated with any habitat outcomes using `habitat_outcome.project'`. We can use the relationship in the string representation of the habitat outcome. ```python class HabitatOutcome(db.Model): ... def __repr__(self): return f'{quantity} {unit}s of {benefit_type} from {self.project.project_name}' ``` ### many-to-many relationships If you want to use many-to-many relationships you will need to define a helper table that is used for the relationship. For this helper table it is strongly recommended to not use a model but an actual table. You will need to specify this table before the two tables it relates. ```python projects_programs = db.Table( 'projects_programs', db.Column('project_id', db.Integer, db.ForeignKey('projects.id', primary_key=True)), db.Column('program_id', db.Integer, db.ForeignKey('programs.id', primary_key=True)) ) ``` Now we can include the programs associated with a project using a many-to-many relationship like this: ```python class Project(db.Model): ... programs = db.relationship('Program', secondary=projects_programs, lazy = 'subquery', backref=db.backref('projects', lazy=True)) ``` Now, `project.programs` will list all of the programs associated with the project. Of course, you need to add a table to store programs: ```python class Program(db.Model): __tablename__ = 'programs' id = db.Column(db.Integer, primary_key=True) program_name = db.Column(db.String(200), nullable=False) ``` ### enumeration or option tables When you want to restrict user inputs to a set of values, like data validation from a list in Excel, you should use tables to store those options. These should be specified as classes. ```python class ConservationPlanningArea(db.Model): __tablename__ = 'conservation_planning_areas' id = db.Column(db.Integer, primary_key=True), cpa_name = db.Column(db.String(200). nullable=False) ``` When you specify the field in the parent table, the field will be an integer with a foreign key in the child table: ```python class Project(db.Model): ... conservation_planning_area = db.Column(db.Integer, db.ForeignKey('conservation_planning_areas.id'), nullable=False) ``` ## create the database Once the data structure is specified in `models.py`, creating the database is very easy. In the console, type: ```python from app import db # app references app package, not the app from app.models import * # load all tables/classes into namespace db.create_all() ``` Open pgAdmin and refresh the tables to see if the database was set up correctly. Note that re-running `db.create_all()` will only add new tables and will not override existing data, add new fields/relationships, or fail if tables already exist. Use `db.drop_all()` to delete all tables and start over. See [here](https://dev.to/zchtodd/sqlalchemy-cascading-deletes-8hk) for an in depth discussion on deleting records. ## adding data Typically, you will add data programmatically through your application. If you have existing data to pre-populate your database, you may either batch upload `.csv` files (continue the [Postgres tutorial](postgres-tutorial)) or write a script to read and insert data. Once you've deployed your application, you'll want to insert data provided by users with functions in your code. Here we go over some of the basic commands for creating, reading, updating, and deleting data using SQLAlchemy. It will be up to you to adapt these concepts to your application. Here we'll insert and delete data to the local database instance. You may need to repeat these steps for your production database after setting it up. First, we'll need to add conservation planning areas to the database. This must be added first because we'll need to specify a conservation planning area when we add a project. ```python feather_river = ConservationPlanningArea(cpa_name='Feather River') db.session.add(feather_river) db.session.commit() ``` Let's check that the area was added. ```python ConservationPlanningArea.query.all() >>>[<ConservationPlanningArea 1>] ``` We can access properties of the Project with the dot accessor. ```python feather_river.id >>>1 ``` Now, let's add a project to the database. You must both add and commit the project to the database. ```python project_1 = Project(project_name='Credit Project A', conservation_planning_area=feather_river.id) db.session.add(project_1) db.session.commit() ``` If you add something incorrectly, rollback the session with `db.session.rollback()`. We can also access records based on their ids: ```python Project.get(1) >>><Project 1> ``` Next, we'll add a few habitat outcomes. ```python outcome_1 = HabitatOutcome(project_id=project_1.id, quantity=10, unit='acres', benefit_type='GGS') outcome_2 = HabitatOutcome(project_id=project_1.id, quantity=200, unit='lineal feet', benefit_type='Shaded Riparian') db.session.add(outcome_1) db.session.add(outcome_2) db.session.commit() ``` Now let's check the relationships we set up. First, we'll check that the habitat outcomes are indeed associated with the right project: ```python project_1.habitat_outcomes >>> ``` We can also access the project associated with each habitat outcome using the `backref` we set up: ```python outcome_1.project >>> outcome_2.project >>> ``` ## querying data We've seen some basic queries above, including `.query.all()` and using dot accessors to access properties of each record. Here we'll look at some more advanced queries including filters, joins, aggregations, and pivots (cross-tab). For advanced queries, you can submit any SQL statement with `db.session.execute("FROM projects SELECT *")`. Try saving these queries in a `.sql` file, then passing the contents of that file to `execute()`. This is a good way to create views in the database, which you can then access with `db.session.execute("FROM view SELECT *")`. Another tactic is to read a SQL statement into a `pandas` dataframe using `df = pd.read_sql()` and continue manipulation in pandas. I recommend using SQL to the extent possible, saving common data queries as views in the database, and reading those views into pandas. It is more performant, reduces duplication of code across applications serving the database, and a better starting place for your team. This is not supported out-of-the-box with SQLAlchemy, but see [here](https://stackoverflow.com/questions/9766940/how-to-create-an-sql-view-with-sqlalchemy) if you want more on how to create views and represent them in the ORM. ## moving to production See the [Postgres tutorial](postgres-tutorial) for help setting up a database on Heroku, beginning with step 7. Once the database is set up and your code is deployed to Heroku, you can create the database tables with the command: ```python heroku run python from app import db from app import models db.create_all exit() ``` Connect to the database on pgAdmin and confirm the database was set up correctly. To view data from the database on Heroku, you can follow this pattern: ```python pg: psql --app <app_name> select * from <table_name> ``` ## database schema To get information about the database schema: ```python from flask_sqlalchemy import inspect insp = inspect(db.engine) insp.get_table_names() >>>['table_1', 'tabl'] ``` > [!Tip]- Additional Resources > - Corey Shafer Flask app series > * [Package Structure](https://www.youtube.com/watch?v=44PvX0Yv368)