# Rough notes
- Add a `sql` directory.
- Include the DDL in a `ddl.sql` file.
- Include any views in `view_<view_name>.sql` files.
- If using a Python library, include the scripts that create/update the database in `scripts/`.
- Store historic versions of sql commands in a `sql_notes.txt` file. For example, I had to downgrade to a SQLite version that didn't support interim tables and so had to rewrite all of my sql commands; I stored the old versions in this file so I wouldn't lose that thinking.
## Directory organization
You won't need a `scripts/` folder if you're not using Python.
Following the DBML recommendations, include a `database.dbml` file in the root directory.
```text
.
+-- data/
+-- scripts/
|   +-- 010-create-database.py
|   +-- 020-import-tables.py
|   +-- 030-create-views.py
|   +-- utils.py
|   +-- models.py
+-- sql/
|   +-- create_tables/
|   |   +-- create_table1.sql
|   +-- insert/
|   |   +-- insert_table1.sql
|   +-- create_views/
|   |   +-- view_view1.sql
|   +-- drop_all_tables.sql
+-- tests/
+-- initialize_db.bat
+-- database.dbml
+-- README.md
```
The example code below comes from two projects on my Surface:
1. nv-ccs-database: uses SQLite
2. dwr-sidewinder: uses SQLAlchemy
## SQLite
`initialize_db.bat`
```bat
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/010_create_database.py
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/020_import_policy_tables.py
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/030_create_views.py
pause
```
For some reason, my simple .bat file stopped working between sessions. It was probably finding a different install of Python that was missing the `xlrd` package (although I think both of my Python installs include `xlrd`). Ideally, I'd be working in a conda environment and the .bat file would activate it first. Here's some code that might work:
```bat
call activate [my_env]
python my_script.py
call conda deactivate
```
This requires that anyone else using the batch file has also replicated my environment. First, export the environment:
`conda list --explicit > <environment name>.txt`
To create an environment from this text file:
`conda env create --file <environment name>.txt` or `conda create --name <name> --file <file name>.txt`
This [conda cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf) is great.
`models.py`
```python
import pandas as pd


class CreditCalculator:
    def __init__(self, calc_file):
        self.calc_file = calc_file

    @property
    def map_units_df(self):
        # read in the map units table from the credit calculator workbook
        map_units_df = pd.read_excel(
            self.calc_file,
            '1.1 Enter Map Unit Data',
            skiprows=5, header=None,
            usecols="C:G,J:O,V")  # read TRUE/FALSE as bool in indirect_benefits
        # drop blank rows (no value in the first column)
        map_units_df = map_units_df[map_units_df.iloc[:, 0].notnull()]
        return map_units_df
```
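A hypothetical usage sketch of this class from one of the import scripts (the workbook path is my assumption, not the project's actual file):
```python
# models.py wraps the source spreadsheet; the import scripts instantiate the
# class and pull DataFrames off its properties. The path below is hypothetical.
from models import CreditCalculator

calc = CreditCalculator('data/credit_calculator.xlsx')
map_units = calc.map_units_df   # DataFrame built from the '1.1 Enter Map Unit Data' sheet
print(map_units.head())
```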
`utils.py`
```python
import sqlite3
from sqlite3 import Error
import sys


def create_connection(db_file):
    """
    Create a database connection to the sqlite database specified by db_file
    :param db_file: database file
    :return: connection object or None
    """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except Error as e:
        print(e)
    return conn


def create_table(conn, create_table_sql):
    """
    Create a table from the create_table_sql statement
    :param conn: connection object
    :param create_table_sql: a CREATE TABLE statement
    :return: None
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)


def insert_data(conn, sql_file, data):
    """
    Insert provided data into the specified table using a .sql file
    :param conn: connection object
    :param sql_file: a .sql file containing the sql command
    :param data: a list of data
    :return: the unique identifier (rowid) for the inserted data
    """
    with open(sql_file, 'r') as file:
        sql = file.read()
    try:
        c = conn.cursor()
        c.execute(sql, data)
    except Error as e:
        print(e)
        sys.exit(1)
    return c.lastrowid


def create_view(conn, sql_file):
    """
    Execute a sql file
    :param conn: connection object
    :param sql_file: a .sql file containing the sql command
    :return: None
    """
    with open(sql_file, 'r') as file:
        sql = file.read()
    try:
        c = conn.cursor()
        c.execute(sql)
    except Error as e:
        print(e)
        sys.exit(1)
```
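For context, here's a minimal sketch of how a script like `010_create_database.py` might wire these helpers to the `sql/` directory (an assumed layout and database path, not the actual script):
```python
from pathlib import Path

from utils import create_connection, create_table, create_view

DB_FILE = 'data/database.db'    # hypothetical location of the SQLite file

conn = create_connection(DB_FILE)
# run every CREATE TABLE script, then every CREATE VIEW script
for sql_file in sorted(Path('sql/create_tables').glob('*.sql')):
    create_table(conn, sql_file.read_text())
for sql_file in sorted(Path('sql/create_views').glob('*.sql')):
    create_view(conn, str(sql_file))
conn.commit()
conn.close()
```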
In `ddl.sql`
```sql
DROP TABLE IF EXISTS table1 CASCADE;
DROP TABLE IF EXISTS table2 CASCADE;

CREATE TABLE IF NOT EXISTS table1 (
    id serial PRIMARY KEY,
    name text
);

CREATE TABLE IF NOT EXISTS table2 (
    id serial PRIMARY KEY,
    link integer REFERENCES table1
);
```
## SQLAlchemy
If using SQLAlchemy, create a class to store the data read from the spreadsheet or csv.
```python
import pandas as pd


class xlForecast:
    def __init__(self, file_xlsx):
        self.file = file_xlsx

    @property
    def agencies(self):
        return pd.read_excel(self.file, 'options', usecols="A").dropna()

    @property
    def implementers(self):
        return pd.read_excel(self.file, 'implementers', usecols="B:E")
```
To create a database object with SQLAlchemy in a Flask app:
```python
import os

import dash
import dash_bootstrap_components as dbc
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

# assumptions added so the snippet stands alone: ENV and external_stylesheets
# were defined elsewhere in the project
ENV = os.environ.get('ENV', 'local')
external_stylesheets = [dbc.themes.BOOTSTRAP]

server = Flask(__name__)
app = dash.Dash(__name__, server=server,
                suppress_callback_exceptions=True,
                external_stylesheets=external_stylesheets)

if ENV == 'deploy':
    app.server.debug = False
    # Heroku supplies DATABASE_URL as postgres://, but newer SQLAlchemy
    # versions only accept the postgresql:// scheme
    uri = os.environ['DATABASE_URL']
    uri = uri.replace("postgres://", "postgresql://", 1)
    app.server.config['SQLALCHEMY_DATABASE_URI'] = uri
elif ENV == 'local':
    app.server.debug = True
    with open('secrets/db_uri.txt') as f:
        uri = f.read()
    uri = uri.replace("postgres://", "postgresql://", 1)
    app.server.config['SQLALCHEMY_DATABASE_URI'] = uri

# avoids a warning in the console
app.server.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app.server)
```
To create the database tables (instead of writing DDL):
```python
from app import db

# create many-to-many helper tables first
projects_programs = db.Table(
    'projects_programs',
    db.Column('project_id', db.Integer, db.ForeignKey('projects.id'), primary_key=True),
    db.Column('program_id', db.Integer, db.ForeignKey('programs.id'), primary_key=True)
)


# next create enum tables as classes
class Agency(db.Model):
    __tablename__ = 'agencies'
    id = db.Column(db.Integer, primary_key=True)
    agency_name = db.Column(db.String, nullable=False)


# create each database table as a class
class Project(db.Model):
    __tablename__ = 'projects'
    # columns
    id = db.Column(db.Integer, primary_key=True)
    project_name = db.Column(db.String, unique=True, nullable=False)
    # relationships
    habitat_parcels = db.relationship('HabitatParcel', backref='project', lazy=True)
    mitigation_parcels = db.relationship('MitigationParcel', backref='project', lazy=True)
    # many-to-many through the helper table; the backref gives Program a 'projects' list
    programs = db.relationship('Program', secondary=projects_programs,
                               lazy='subquery',
                               backref=db.backref('projects', lazy=True))

    def __repr__(self):
        return f"Project('{self.project_name}')"
```
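Once the tables exist, records can be pulled back through Flask-SQLAlchemy's query API; a small sketch with made-up data:
```python
# illustrative only; assumes a Project named 'Example Project' has been inserted
project = Project.query.filter_by(project_name='Example Project').first()
print(project)             # -> Project('Example Project')
print(project.programs)    # Program rows linked through the projects_programs table
```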
Script to populate the database from the spreadsheet:
```python
from app import db
from apps.data import xlForecast
from apps.models import *

db.create_all()

F = 'data/Database of Expected Projects.xlsx'

# map xlForecast properties to SQLAlchemy classes
prop_to_class_map = {
    'agencies': Agency,
    'implementers': Implementer,
}
prop_to_table_map = {
    'projects_programs': projects_programs
}

# load data
data = xlForecast(F)

# insert data into the database
for prop, tbl_class in prop_to_class_map.items():
    print(prop)
    for row in getattr(data, prop).to_dict('records'):
        db.session.add(tbl_class(**row))
    db.session.commit()

for prop, tbl_name in prop_to_table_map.items():
    print(prop)
    for row in getattr(data, prop).to_dict('records'):
        db.session.execute(tbl_name.insert().values(**row))
    db.session.commit()
```
To read a view into Python:
```python
import pandas as pd

from app import app, db

df = pd.read_sql_table('view_project_outcomes', con=db.engine)
```
There are two primary strategies for editing a table in a database:
1. Drop the table and reload the data (see the sketch after this list). This assumes that all data are stored in the original data source.
2. Insert into the existing table. This is best when you only need to append new records to a table.
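Here's a sketch of the drop-and-reload strategy for SQLite (file names and rows are hypothetical):
```python
import sqlite3

conn = sqlite3.connect('data/database.db')   # hypothetical database path
cur = conn.cursor()

# drop the old table, recreate it, then re-import everything from the source
cur.execute("DROP TABLE IF EXISTS table1")
with open('sql/create_tables/create_table1.sql') as f:
    cur.execute(f.read())

rows_from_source = [(1, 'name1'), (2, 'name2')]  # stand-in for rows re-read from the spreadsheet/csv
with open('sql/insert/insert_table1.sql') as f:
    insert_sql = f.read()
cur.executemany(insert_sql, rows_from_source)

conn.commit()
conn.close()
```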
### Create table
```sql
CREATE TABLE IF NOT EXISTS table_name (
id integer,
name varchar
);
```
### Read csv and insert rows
```sql
INSERT INTO table_name (
id,
name
)
VALUES (?, ?);
```
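In Python, the csv rows get bound to the `?` placeholders one row at a time, for example with the `insert_data` helper from `utils.py` (the file names below are hypothetical):
```python
import csv

from utils import create_connection, insert_data

conn = create_connection('data/database.db')   # hypothetical database path
with open('data/table1.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)                               # skip the header row
    for row in reader:
        # each row is a list like ['1', 'name1'], bound to the ? placeholders in order
        insert_data(conn, 'sql/insert/insert_table1.sql', row)
conn.commit()
conn.close()
```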
### Create view
```sql
CREATE VIEW IF NOT EXISTS view_name AS
...
```
`WITH` statements (common table expressions) let you create interim tables that simplify your sql statement. Here's an example:
```sql
CREATE VIEW IF NOT EXISTS view_baseline AS
WITH weighted_baseline AS (
    SELECT d.map_unit_id,
           d.indirect_benefits_area,
           d.mz3,
           d.mz4,
           d.mz5,
           b.season,
           SUM(CASE WHEN b.mgmt_zone = 'MZ III' THEN d.mz3 * b.baseline END) AS mz3_weighted,
           SUM(CASE WHEN b.mgmt_zone = 'MZ IV' THEN d.mz4 * b.baseline END) AS mz4_weighted,
           SUM(CASE WHEN b.mgmt_zone = 'MZ V' THEN d.mz5 * b.baseline END) AS mz5_weighted
    FROM view_desktop_results AS d
    LEFT JOIN standard_baseline AS b
    GROUP BY d.map_unit_id,
             b.season
)
SELECT map_unit_id,
       season,
       (mz3_weighted + mz4_weighted + mz5_weighted) AS baseline
FROM weighted_baseline;
```
### Edit record
Be sure not to edit records that will be overwritten on the next import. Instead, edit the data source, drop the existing table, and read in the new one.
1. Open pgAdmin
2. Navigate to the table in the Browser tree
3. Right-click and select Edit All Rows
4. Make edit
5. Save (F6)
### Drop tables
Sometimes you'll want to drop many tables or all tables in a database and start fresh with the same schema.
```sql
DROP TABLE IF EXISTS table_name CASCADE;
DROP VIEW IF EXISTS view_name;
```
These statements can also be run from the Postgres command-line tool `psql`, e.g. `psql -d <database> -f sql/drop_all_tables.sql`.