# Rough notes
- Add a `sql` directory.
- Include the DDL in a `ddl.sql` file.
- Include any views in `view_<view_name>.sql` files.
- If using a Python library, include the scripts that create/update the database in `scripts/`.
- Store historic versions of sql commands in a `sql_notes.txt` file. For example, I had to downgrade to a SQLite version that didn't support interim tables and so had to rewrite all of my sql commands; I stored the old versions in this file so I wouldn't lose that thinking.
## Directory organization
You won't need a `scripts/` folder if you're not using Python.
Following the DBML recommendations, include a `database.dbml` file in the root directory.
```text
.
+-- data/
+-- scripts/
|   +-- 010-create-database.py
|   +-- 020-import-tables.py
|   +-- 030-create-views.py
|   +-- utils.py
|   +-- models.py
+-- sql/
|   +-- create_tables/
|   |   +-- create_table1.sql
|   +-- insert/
|   |   +-- insert_table1.sql
|   +-- create_views/
|   |   +-- view_view1.sql
|   +-- drop_all_tables.sql
+-- tests/
+-- initialize_db.bat
+-- database.dbml
+-- README.md
```
The example code below comes from two projects on my Surface:
1. nv-ccs-database: uses SQLite
2. dwr-sidewinder: uses SQLAlchemy
## SQLite
`initialize_db.bat`
```bat
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/010_create_database.py
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/020_import_policy_tables.py
"C:\Users\Erik\Anaconda3\python.exe" %~dp0scripts/030_create_views.py
pause
```
For some reason, my simple .bat file stopped working between sessions. It was probably finding a different install of Python that was missing the `xlrd` package (although I think both of my Python installs include `xlrd`). Ideally, I'd be working in a conda environment and the .bat file would activate it first. Here's some code that might work:
```bat
call activate [my_env]
python my_script.py
call conda deactivate
```
This requires that anyone else using the batch file has also replicated my environment. First, export the environment:
`conda list --explicit > <environment name>.txt`
To create an environment from this text file:
`conda env create --file <environment name>.txt` or `conda create --name <name> --file <file name>.txt`
This [conda cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf) is great.
`models.py`
```python
import pandas as pd


class CreditCalculator:
    def __init__(self, calc_file):
        self.calc_file = calc_file

    @property
    def map_units_df(self):
        # read in the map units table from the credit calculator workbook
        map_units_df = pd.read_excel(
            self.calc_file,
            '1.1 Enter Map Unit Data',
            skiprows=5, header=None,
            usecols="C:G,J:O,V")  # read TRUE/FALSE as bool in indirect_benefits
        # drop blank rows (no value in the first column)
        map_units_df = map_units_df[map_units_df.iloc[:, 0].notnull()]
        return map_units_df
```
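A hypothetical usage sketch of this class from one of the import scripts (the workbook path is my assumption, not the project's actual file):
```python
# models.py wraps the source spreadsheet; the import scripts instantiate the
# class and pull DataFrames off its properties. The path below is hypothetical.
from models import CreditCalculator

calc = CreditCalculator('data/credit_calculator.xlsx')
map_units = calc.map_units_df   # DataFrame built from the '1.1 Enter Map Unit Data' sheet
print(map_units.head())
```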
`utils.py`
```python
import sqlite3
from sqlite3 import Error
import sys


def create_connection(db_file):
    """
    Create a database connection to the sqlite database specified by db_file
    :param db_file: database file
    :return: connection object or None
    """
    conn = None
    try:
        conn = sqlite3.connect(db_file)
    except Error as e:
        print(e)
    return conn


def create_table(conn, create_table_sql):
    """
    Create a table from the create_table_sql statement
    :param conn: connection object
    :param create_table_sql: a CREATE TABLE statement
    :return: None
    """
    try:
        c = conn.cursor()
        c.execute(create_table_sql)
    except Error as e:
        print(e)


def insert_data(conn, sql_file, data):
    """
    Insert provided data into the specified table using a .sql file
    :param conn: connection object
    :param sql_file: a .sql file containing the sql command
    :param data: a list of data
    :return: the unique identifier (rowid) for the inserted data
    """
    with open(sql_file, 'r') as file:
        sql = file.read()
    try:
        c = conn.cursor()
        c.execute(sql, data)
    except Error as e:
        print(e)
        sys.exit(1)
    return c.lastrowid


def create_view(conn, sql_file):
    """
    Execute a sql file
    :param conn: connection object
    :param sql_file: a .sql file containing the sql command
    :return: None
    """
    with open(sql_file, 'r') as file:
        sql = file.read()
    try:
        c = conn.cursor()
        c.execute(sql)
    except Error as e:
        print(e)
        sys.exit(1)
```
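For context, here's a minimal sketch of how a script like `010_create_database.py` might wire these helpers to the `sql/` directory (an assumed layout and database path, not the actual script):
```python
from pathlib import Path

from utils import create_connection, create_table, create_view

DB_FILE = 'data/database.db'    # hypothetical location of the SQLite file

conn = create_connection(DB_FILE)
# run every CREATE TABLE script, then every CREATE VIEW script
for sql_file in sorted(Path('sql/create_tables').glob('*.sql')):
    create_table(conn, sql_file.read_text())
for sql_file in sorted(Path('sql/create_views').glob('*.sql')):
    create_view(conn, str(sql_file))
conn.commit()
conn.close()
```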
In `ddl.sql`
```sql
DROP TABLE IF EXISTS table1 CASCADE;
DROP TABLE IF EXISTS table2 CASCADE;

CREATE TABLE IF NOT EXISTS table1 (
    id serial PRIMARY KEY,
    name text
);

CREATE TABLE IF NOT EXISTS table2 (
    id serial PRIMARY KEY,
    link integer REFERENCES table1
);
```
## SQLAlchemy
If using SQLAlchemy, create a class to store the data read from the spreadsheet or csv.
```python
import pandas as pd


class xlForecast:
    def __init__(self, file_xlsx):
        self.file = file_xlsx

    @property
    def agencies(self):
        return pd.read_excel(self.file, 'options', usecols="A").dropna()

    @property
    def implementers(self):
        return pd.read_excel(self.file, 'implementers', usecols="B:E")
```
To create a database object with SQLAlchemy in a Flask app:
```python
import os

import dash
import dash_bootstrap_components as dbc
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

# assumptions added so the snippet stands alone: ENV and external_stylesheets
# were defined elsewhere in the project
ENV = os.environ.get('ENV', 'local')
external_stylesheets = [dbc.themes.BOOTSTRAP]

server = Flask(__name__)
app = dash.Dash(__name__, server=server,
                suppress_callback_exceptions=True,
                external_stylesheets=external_stylesheets)

if ENV == 'deploy':
    app.server.debug = False
    # Heroku supplies DATABASE_URL as postgres://, but newer SQLAlchemy
    # versions only accept the postgresql:// scheme
    uri = os.environ['DATABASE_URL']
    uri = uri.replace("postgres://", "postgresql://", 1)
    app.server.config['SQLALCHEMY_DATABASE_URI'] = uri
elif ENV == 'local':
    app.server.debug = True
    with open('secrets/db_uri.txt') as f:
        uri = f.read()
    uri = uri.replace("postgres://", "postgresql://", 1)
    app.server.config['SQLALCHEMY_DATABASE_URI'] = uri

# avoids a warning in the console
app.server.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app.server)
```
To create the database tables (instead of writing DDL):
```python
from app import db

# create many-to-many helper tables first
projects_programs = db.Table(
    'projects_programs',
    db.Column('project_id', db.Integer, db.ForeignKey('projects.id'), primary_key=True),
    db.Column('program_id', db.Integer, db.ForeignKey('programs.id'), primary_key=True)
)


# next create enum tables as classes
class Agency(db.Model):
    __tablename__ = 'agencies'
    id = db.Column(db.Integer, primary_key=True)
    agency_name = db.Column(db.String, nullable=False)


# create each database table as a class
class Project(db.Model):
    __tablename__ = 'projects'
    # columns
    id = db.Column(db.Integer, primary_key=True)
    project_name = db.Column(db.String, unique=True, nullable=False)
    # relationships
    habitat_parcels = db.relationship('HabitatParcel', backref='project', lazy=True)
    mitigation_parcels = db.relationship('MitigationParcel', backref='project', lazy=True)
    # many-to-many through the helper table; the backref gives Program a 'projects' list
    programs = db.relationship('Program', secondary=projects_programs,
                               lazy='subquery',
                               backref=db.backref('projects', lazy=True))

    def __repr__(self):
        return f"Project('{self.project_name}')"
```
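Once the tables exist, records can be pulled back through Flask-SQLAlchemy's query API; a small sketch with made-up data:
```python
# illustrative only; assumes a Project named 'Example Project' has been inserted
project = Project.query.filter_by(project_name='Example Project').first()
print(project)             # -> Project('Example Project')
print(project.programs)    # Program rows linked through the projects_programs table
```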
Script to populate the database from the spreadsheet:
```python
from app import db
from apps.data import xlForecast
from apps.models import *

db.create_all()

F = 'data/Database of Expected Projects.xlsx'

# map xlForecast properties to SQLAlchemy classes
prop_to_class_map = {
    'agencies': Agency,
    'implementers': Implementer,
}
prop_to_table_map = {
    'projects_programs': projects_programs
}

# load data
data = xlForecast(F)

# insert data into the database
for prop, tbl_class in prop_to_class_map.items():
    print(prop)
    for row in getattr(data, prop).to_dict('records'):
        db.session.add(tbl_class(**row))
    db.session.commit()

for prop, tbl_name in prop_to_table_map.items():
    print(prop)
    for row in getattr(data, prop).to_dict('records'):
        db.session.execute(tbl_name.insert().values(**row))
    db.session.commit()
```
To read a view into Python:
```python
import pandas as pd

from app import app, db

df = pd.read_sql_table('view_project_outcomes', con=db.engine)
```
There are two primary strategies for editing a table in a database:
1. Drop the table and reload the data (see the sketch after this list). This assumes that all data are stored in the original data source.
2. Insert into the existing table. This is best when you only need to append new records to a table.
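Here's a sketch of the drop-and-reload strategy for SQLite (file names and rows are hypothetical):
```python
import sqlite3

conn = sqlite3.connect('data/database.db')   # hypothetical database path
cur = conn.cursor()

# drop the old table, recreate it, then re-import everything from the source
cur.execute("DROP TABLE IF EXISTS table1")
with open('sql/create_tables/create_table1.sql') as f:
    cur.execute(f.read())

rows_from_source = [(1, 'name1'), (2, 'name2')]  # stand-in for rows re-read from the spreadsheet/csv
with open('sql/insert/insert_table1.sql') as f:
    insert_sql = f.read()
cur.executemany(insert_sql, rows_from_source)

conn.commit()
conn.close()
```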
### Create table
```sql
CREATE TABLE IF NOT EXISTS table_name (
id integer,
name varchar
);
```
### Read csv and insert rows
```sql
INSERT INTO table_name (
id,
name
)
VALUES (?, ?);
```
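In Python, the csv rows get bound to the `?` placeholders one row at a time, for example with the `insert_data` helper from `utils.py` (the file names below are hypothetical):
```python
import csv

from utils import create_connection, insert_data

conn = create_connection('data/database.db')   # hypothetical database path
with open('data/table1.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)                               # skip the header row
    for row in reader:
        # each row is a list like ['1', 'name1'], bound to the ? placeholders in order
        insert_data(conn, 'sql/insert/insert_table1.sql', row)
conn.commit()
conn.close()
```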
### Create view
```sql
CREATE VIEW IF NOT EXISTS view_name AS
...
```
`WITH` statements (common table expressions) let you create interim tables that simplify your sql statement. Here's an example:
```sql
CREATE VIEW IF NOT EXISTS view_baseline AS
WITH weighted_baseline AS (
    SELECT d.map_unit_id,
           d.indirect_benefits_area,
           d.mz3,
           d.mz4,
           d.mz5,
           b.season,
           SUM(CASE WHEN b.mgmt_zone = 'MZ III' THEN d.mz3 * b.baseline END) AS mz3_weighted,
           SUM(CASE WHEN b.mgmt_zone = 'MZ IV' THEN d.mz4 * b.baseline END) AS mz4_weighted,
           SUM(CASE WHEN b.mgmt_zone = 'MZ V' THEN d.mz5 * b.baseline END) AS mz5_weighted
    FROM view_desktop_results AS d
    LEFT JOIN standard_baseline AS b
    GROUP BY d.map_unit_id,
             b.season
)
SELECT map_unit_id,
       season,
       (mz3_weighted + mz4_weighted + mz5_weighted) AS baseline
FROM weighted_baseline;
```
### Edit record
Be sure not to edit records that will be overwritten on the next import. Instead, edit the data source, drop the existing table, and read in the new one.
1. Open pgAdmin
2. Navigate to the table in the Browser tree
3. Right-click and select Edit All Rows
4. Make edit
5. Save (F6)
### Drop tables
Sometimes you'll want to drop many tables or all tables in a database and start fresh with the same schema.
```sql
DROP TABLE IF EXISTS table_name CASCADE;
DROP VIEW IF EXISTS view_name;
```
These statements can also be run from the Postgres command-line tool `psql`, e.g. `psql -d <database> -f sql/drop_all_tables.sql`.