[psql](https://www.postgresql.org/docs/current/app-psql.html) is a [[command line application]] for [[Postgres]] databases. `psql` is installed by default with Postgres (unless specifically excluded during the installation process).
## Getting started
Before getting started on your first database, you may want to create a project directory using the [[project organization for databases]].
1. [[add psql to PATH]]
2. Open [[Bash]] from the project directory (alternatively, you may use SQL Shell (psql), which is installed with Postgres, though the instructions will differ).
3. Log in to the `postgres` account.
```bash
psql -U postgres
```
4. Enter the password when prompted. Note that no characters may appear as you type. Use the password you set when installing Postgres.
```bash
>>> Password for user postgres:
```
5. Create a new database. If successful, psql prints `CREATE DATABASE`. You can also use [[pgAdmin]] to view the database.
```sql
CREATE DATABASE <databasename>;
```
6. Connect to the new database ( `\c <databasename>` is also valid).
```psql
\connect <databasename>
```
See [[#Common workflows]] for details on how to use `psql` to add tables, insert data, create views, and read from and write to CSV.
7. Quit the interactive session when you are done.
```psql
\q
```
## Common workflows
### Connect to a database
When coming back to a database project, login and specify the database at the same time. You will be prompted for the password each time.
```bash
psql -U postgres -d <databasename>
```
### Create a table
Use the CREATE TABLE command to create a table.
```sql
CREATE TABLE persons (
    id BIGSERIAL PRIMARY KEY NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);
```
### Primary key
Include the keywords `PRIMARY KEY NOT NULL` when defining the column to specify it as the primary key (strictly, `PRIMARY KEY` already implies `NOT NULL`).
#### Sequence as primary key
Use the data type BIGSERIAL or SERIAL for integer primary keys so that they auto-increment.
#### Compound primary key
You can also set a compound primary key (e.g., two or more fields together). Be careful to specify fields that will never contain duplicates!
```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE persons (
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (first_name, last_name)
);
-- OR USE ALTER TABLE
ALTER TABLE persons ADD PRIMARY KEY (first_name, last_name);
```
#### UUID primary key
A universally unique identifier (UUID) is a unique identifier that can be used as a primary key. The benefit of a UUID, as opposed to a sequence, is that it is unique across all tables (and even across databases). It also makes it much harder for attackers to mine a database by guessing primary keys.
To use UUIDs in Postgres, you must first enable the extension.
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```
Use `\df` to list the UUID generation functions available. You'll need to select one of these to generate a UUID when inserting a record. I recommend `uuid_generate_v4()`.
To create a primary key using UUID, simply list `UUID` as the data type when creating the table. Foreign key references to a `UUID` should also use the data type `UUID`. When inserting a record, call the UUID function to create a unique identifier. You do not need to check for duplicates since these are guaranteed to be universally unique.
```sql
CREATE TABLE cars (
    car_uid UUID NOT NULL PRIMARY KEY
);
-- foreign key references should also be UUID
CREATE TABLE persons (
    person_uid UUID NOT NULL PRIMARY KEY,
    car_uid UUID REFERENCES cars (car_uid)
);
-- use a function to generate a uuid for each record
INSERT INTO persons (person_uid)
VALUES (uuid_generate_v4());
```
### Insert a record
Use the INSERT command to insert a record in an existing table.
```sql
INSERT INTO persons (
    first_name,
    last_name
) VALUES ('John', 'Doe');
```
Fields not specified, such as the `id` field in this example, will be populated with their default values.
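For a minimal illustration (the `notes` table here is hypothetical), any column with a default behaves this way:
```sql
-- hypothetical table: id and created_at both have defaults
CREATE TABLE notes (
    id BIGSERIAL PRIMARY KEY,
    body TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
-- only body is supplied; id auto-increments and created_at gets NOW()
INSERT INTO notes (body) VALUES ('first note');
```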
### Update a record
Use the primary key to access the record. You can update multiple fields for any record by comma separating each `field = value` pair. You can also update multiple records at the same time, depending on the `WHERE` query. Always use the `WHERE` clause!
```sql
UPDATE <tablename> SET <field> = <value>, <field_2> = <value_2> WHERE id = <id>;
```
### Delete a record
As with updating a record, always use the `WHERE` clause when deleting records.
```sql
DELETE FROM <tablename> WHERE id = <id>;
```
>[!warning] Worst Practice
>Using the `CASCADE` command to overcome the foreign key constraint when deleting records is not best practice. `CASCADE` will delete any record in your database that includes that foreign key. Instead, you should manually delete or update related records to have full control over what is deleted.
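A sketch of that manual approach, reusing the hypothetical `persons`/`cars` tables from the UUID example and wrapping the steps in a [[#Transactions|transaction]]:
```sql
BEGIN;
-- detach (or delete) the rows that reference the car first
UPDATE persons SET car_uid = NULL WHERE car_uid = '<uuid>';
-- now the parent row can be deleted without violating the foreign key
DELETE FROM cars WHERE car_uid = '<uuid>';
COMMIT;
```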
### Read from CSV
Often, your data will initially be stored in a CSV. Before reading from a CSV, you must [[#create a table]] with the corresponding columns. PostgreSQL provides a helper [COPY](https://www.postgresql.org/docs/15/sql-copy.html) command for reading from `.csv` files. The command appends the data to whatever is in the table already.
Specify the fields if your CSV does not contain every column in your database table; those fields not specified will receive their default value. Most commonly, specify fields when your database has an auto-incrementing `id` field and your CSV does not have a unique identifier.
```sql
COPY <tablename>
(<field_1>, <field_2>)
FROM '<full/path/to/file>'
DELIMITER ','
CSV HEADER;
```
Alternatively, use a scripting language to create a `.sql` file that creates the table and inserts each record one at a time. The file should look like this:
```sql
CREATE TABLE <tablename> (
    <field_1> <field_1_type>,
    <field_2> <field_2_type>
);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
...
```
Save the file and copy its full path. Open a `psql` session and use `\i`, which executes commands from a file (`\ir` accepts a relative path).
```psql
\i <full/path/to/file>
```
### Import CSV from web
You can read a file directly from the web using [[curl]], avoiding the need to download the file locally. Use quotes around the URL.
```sql
COPY <tablename>
FROM PROGRAM 'curl -L "<url>"'
DELIMITER ',' CSV HEADER;
```
To test the download link, open [[Bash]] and use the same command `curl -L "<url>"`, replacing `<url>` with the URL to your data; you should see the contents printed in the terminal in comma-separated rows.
> [!tip]
> You can use [[Google Sheets]] to serve data.
In fact, you can read any data that a program can write to standard output ([[STDOUT]]). See this [blog post](https://www.depesz.com/2013/02/28/waiting-for-9-3-add-support-for-piping-copy-tofrom-an-external-program/) for more detail.
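For example (the script name here is hypothetical), any program that writes CSV to standard output can feed `COPY`:
```sql
COPY <tablename>
FROM PROGRAM 'python3 <your_script.py>'
DELIMITER ',' CSV HEADER;
```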
### Export to CSV
To export, give the COPY command a table name or a parenthesized query (for a view, use `SELECT * FROM <view>`).
```sql
COPY (<query>)
TO 'path/to/file.csv'
DELIMITER ','
CSV HEADER;
```
### Transactions
It is best practice to wrap commands in a transaction. [Transactions](https://www.postgresql.org/docs/current/tutorial-transactions.html) will preserve database integrity by rolling back all commands executed if one fails. To create a transaction, simply begin and end the series of commands with `BEGIN` and `COMMIT`.
```sql
BEGIN;
-- SQL Commands
COMMIT;
```
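You can also abort a transaction explicitly with `ROLLBACK`, which undoes everything since `BEGIN`:
```sql
BEGIN;
DELETE FROM persons WHERE last_name = 'Doe';
-- changed your mind? nothing is permanent until COMMIT
ROLLBACK;
```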
### Constraints
#### Unique constraint
```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE <tablename> (
    col1 TEXT,
    col2 TEXT,
    CONSTRAINT <constraint_name> UNIQUE (col1)
);
-- OR USE ALTER TABLE
ALTER TABLE <tablename> ADD CONSTRAINT <constraint_name> UNIQUE (<col>);
```
#### Check constraint
Use a `CHECK` constraint to validate database inputs. See also [[#enumerators]].
```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE <tablename> (
    gender TEXT CHECK (gender IN ('Female', 'Male'))
);
-- OR USE ALTER TABLE
ALTER TABLE <tablename> ADD CONSTRAINT <constraint_name> CHECK (gender IN ('Female', 'Male'));
```
#### On conflict do nothing
Use the `ON CONFLICT` clause combined with `DO NOTHING` to skip records that violate constraints. For example, inserting a record with an existing primary key violates the primary key's uniqueness constraint and will fail.
```sql
INSERT INTO <tablename> (id, first_name, last_name)
VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO NOTHING;
```
This is useful when updating a table where some of the records in the data source are already in the table.
### Upsert
Use the `ON CONFLICT` clause combined with `DO UPDATE` to update a record when a constraint is violated. This is commonly referred to as an "upsert".
```sql
INSERT INTO <tablename> (id, first_name, last_name)
VALUES (1, 'John', 'Doe')
ON CONFLICT (id) DO UPDATE SET first_name = EXCLUDED.first_name, last_name = EXCLUDED.last_name;
```
This is useful when a user is editing a record in your application or when you have edited many records in the data source.
Some caveats:
- If using a partial index, you must constrain to only the indexed part of the table using a `WHERE` statement (see [here](https://stackoverflow.com/questions/42022362/no-unique-or-exclusion-constraint-matching-the-on-conflict)).
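A sketch of that caveat, assuming a hypothetical `email` column on `persons` with a partial unique index over active rows only:
```sql
-- uniqueness is only enforced WHERE active
CREATE UNIQUE INDEX persons_active_email_idx ON persons (email) WHERE active;
-- the conflict target must repeat the index predicate
INSERT INTO persons (email, active)
VALUES ('john@example.com', true)
ON CONFLICT (email) WHERE active DO NOTHING;
```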
### Enumerators
[Enumerated (enum) data types](https://www.postgresql.org/docs/current/datatype-enum.html) are lists of valid options for a field, for example the days of the week or responses on a Likert scale.
To create an enumerator use the `CREATE TYPE` command. Once created, you can use the enumerator just like any other data type when creating a table.
```sql
CREATE TYPE likert AS ENUM ('strongly agree', 'agree', 'neutral', 'disagree', 'strongly disagree');
CREATE TABLE survey_results (
    question INT,
    result likert
);
```
Enumerators are ordered sets, and so comparison operators are valid. For example, you could create an enumerator for the list of valid [[ordinal]] (i.e., ordered categorical) values and use a comparison operator to query records.
```sql
SELECT * FROM survey_results WHERE result >= 'agree' ORDER BY result;
```
Each enumerated type is separate: the same value in two different enumerators is not equivalent. Enumerators are also unlike numeric and text data types, so the usual numeric and string operators and functions won't work on them.
Alternatives to enumerators include using a [[#Check constraint]] or creating a full table for the enumerated options and relating the field through a foreign key. For this second case, consider using the value itself as the primary key, as shown below. The downside of these options is that they are not ordered the way an enumerator is, so comparison operators will not work.
```sql
CREATE TABLE likert (
    response TEXT PRIMARY KEY
);
INSERT INTO likert (response)
VALUES ('strongly agree'), ('agree'), ('neutral'), ('disagree'), ('strongly disagree');
CREATE TABLE survey_results (
    question INT,
    result TEXT REFERENCES likert (response) ON UPDATE CASCADE
);
```
### Relationships
If your [[#primary key]] uses BIGSERIAL, use BIGINT as the type of the foreign key (the field that defines the relationship). If using SERIAL, use INT.
There are two patterns for specifying relationships. First, you can write the reference inline when specifying the table. Alternatively, you can use an ALTER TABLE command to add the relationship. When using the first strategy, the table to which you are making the relationship must already exist (or be specified first in the `.sql` file).
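A sketch of both patterns, using hypothetical `authors` and `books` tables and `BIGINT` to match a `BIGSERIAL` key:
```sql
-- pattern 1: inline reference (authors must already exist)
CREATE TABLE authors (
    id BIGSERIAL PRIMARY KEY,
    name TEXT
);
CREATE TABLE books (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    author_id BIGINT REFERENCES authors (id)
);
-- pattern 2: add the relationship afterwards (instead of the inline REFERENCES)
ALTER TABLE books
ADD CONSTRAINT books_author_fk FOREIGN KEY (author_id) REFERENCES authors (id);
```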
### Queries
Use `\x` to toggle expanded display and make results easier to read.
### Joins
```sql
SELECT * FROM <table1>
JOIN <table2> ON <table1.field> = <table2.field>;
```
### Working with dates
```sql
CREATE TABLE dates (
    date DATE,
    time TIMESTAMP
);
INSERT INTO dates (
    date,
    time
) VALUES (DATE '2022-11-02', TIMESTAMP '1999-01-08 04:05:06');
```
### Time intervals
```sql
SELECT NOW() + INTERVAL '1 YEAR';
```
See the [docs](https://www.postgresql.org/docs/current/datatype-datetime.html) for more detail.
### Age
```sql
SELECT AGE(<birthday>);
```
### Extract
Use `EXTRACT` to get a day, week, month, year, decade or century from a date.
```sql
SELECT EXTRACT(DAY FROM NOW());
```
### Casting
Casting converts a value to another format, often used for datetimes. Use `::` (double colon) to cast.
```sql
SELECT NOW()::DATE;
```
The `::` syntax is PostgreSQL-specific; the SQL-standard `CAST` syntax works as well:
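```sql
-- SQL-standard equivalent of NOW()::DATE
SELECT CAST(NOW() AS DATE);
```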
## Common psql commands
**psql command** | **result**
-------------|---------
`\?` | get help
`\q` | quit
`\l` | list databases
`\d` | list tables
`\d <table>` | list fields in table
`\i` | execute commands from a file
`\ir` | execute commands from file, relative path
`\x` | toggle expand display
`\copy`, `COPY` | copy from/to CSV
`\! clear` | clear the screen
`DROP TABLE <table>` | drop a table
## Extensions
```sql
SELECT * FROM pg_available_extensions;
```
- `uuid-ossp`: universally unique identifiers
- `h3`: H3 hexagonal grid bindings
- `PostGIS`: geospatial data types and queries
- [plv8](https://pgxn.org/dist/plv8/doc/plv8.html): JavaScript in SQL
[[SQL]]
## Data Types
[Data types](https://www.postgresql.org/docs/current/datatype.html#DATATYPE-TABLE)
Name | Description
--------| --------
`integer` | typical choice for integer
`real` | inexact floating point up to 6 decimal places (see also `double precision` and `numeric`)
`text` | variable length character string (see also `varchar(n)`)
`boolean` | true/false
`date` | date
`timestamp` | date and time (see options for time zones)
`bigserial` | auto-incrementing integer up to 9,223,372,036,854,775,807
`uuid` | universally unique identifier
`money` | currency amount
`point` | geometric point
`polygon` | geometric polygon
`jsonb` | textual json data (see also `json`)
`varchar(n)` | string not to exceed length n
`serial` | auto-incrementing integer up to 2,147,483,647
`double precision` | inexact floating point up to 15 decimal places
`numeric` | exact arbitrary-precision numbers (slower calculations)
`time` | time only
`interval` | interval of time
### Notes
Data types `json` and `jsonb` store JSON data (`text` could also be used). `jsonb` is preferred because it speeds up processing, although it slows down input. Read the [docs](https://www.postgresql.org/docs/current/datatype-json.html) before using this data type to understand its limitations.
### PostgreSQL JSON data type
JSON and JSONB both store JSON data. JSONB is generally preferred because, while it is slower to load, it is faster to query. The "B" in JSONB stands for binary. JSONB discards insignificant whitespace and duplicate keys (keeping only the last value for a duplicated key). This shouldn't be a problem for well-constructed JSON, but it's important to know.
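You can see the duplicate-key behavior directly by casting a literal:
```sql
-- JSONB keeps only the last value for a duplicated key
SELECT '{"a": 1, "a": 2}'::jsonb;  -- returns {"a": 2}
```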
To insert JSON data, wrap the dictionary in single quotes and use double quotes for any string-like keys or values.
```sql
CREATE TABLE my_json(
    id SERIAL PRIMARY KEY,
    data JSONB
);
INSERT INTO my_json (data)
VALUES ('{"name": "Coffee Value Chains", "year": 2022, "results": {"indicator1": 100, "indicator2": 150}, "contacts": ["Sergio", "Monica"]}');
```
#### Basic queries
Use the `->` operator to query within a JSON object.
```sql
-- Select a property
SELECT data -> 'name' as name FROM my_json;
-- Select a property based on a condition
SELECT data -> 'results' FROM my_json WHERE data -> 'name' = '"Coffee Value Chains"';
```
You can chain multiple dash-arrow ( `->` ) operators to query within nested JSON. Alternatively, use the hash-arrow ( `#>` ) operator to query by passing a path of properties (wrapped in curly braces `{}` ). Queries for JSON and JSONB are the same.
```sql
-- Select nested JSON object field by property
SELECT data -> 'results' -> 'indicator1' as indicator1 FROM my_json;
-- Alternative to select nested JSON object field by property
SELECT data #> '{results, indicator1}' as indicator1 FROM my_json;
```
The previous queries all return JSON objects so that you can continue to query them. The dash-double-arrow ( `->>` ) operator will return text from the query instead of JSON.
```sql
-- Select a JSON object field as text
SELECT data ->> 'name' as name FROM my_json; -- you cannot chain after returning text
```
If your JSON contains arrays, you can index into the array using the query techniques described above combined with the element's index to get a specific element from an array.
```sql
-- Select a JSON array element
SELECT data -> 'contacts'-> 0 as first_contact FROM my_json;
-- Alternative to select a JSON array element
SELECT data #> '{contacts, 0}' FROM my_json;
```
You can also unpack an array using the `jsonb_array_elements_text()` function.
```sql
SELECT id, jsonb_array_elements_text(data -> 'contacts') FROM my_json;
```
#### Contains queries
Use the contains operator ( `@>` ) to return records that include a specific property or property value.
```sql
-- Select all records where a top-level property contains a value
SELECT * FROM my_json WHERE data @> '{"year": 2022}';
-- Select all records that contain a specific nested property value
SELECT * FROM my_json WHERE data @> '{"results": {"indicator1": 100}}';
-- Select all records that contain a specific value(s) in an array
SELECT * FROM my_json WHERE data -> 'contacts' @> '["Monica"]';
```
Use the property-test ( `?` ) operator to query records based on the presence of one or more top-level properties.
```sql
-- Select all records that contain a specific key
SELECT * FROM my_json WHERE data ? 'results';
-- Select all records that contain any of these keys
SELECT * FROM my_json WHERE data ?| array['results', 'contacts'];
-- Select all records that contain all of these keys
SELECT * FROM my_json WHERE data ?& array['results', 'contacts'];
```
#### Updating JSON objects
#rough
Use the `jsonb_set()` function to update JSONB objects.
```sql
UPDATE my_json SET data = jsonb_set(data, '{year}', '2023');
```
If any records do not have the property you want to update, you must first create it. To avoid overwriting those records that do have the property, use the `COALESCE` command with an empty object ( `'{}'` ).
```sql
UPDATE my_json SET data = jsonb_set(data, '{targets}', COALESCE(data -> 'targets', '{}'));
SELECT jsonb_set(data, '{targets, indicator1}', (SELECT data -> 'results' -> 'indicator1' FROM my_json j1 WHERE j1.id = j2.id)) FROM my_json j2;
```
#### Indexing JSON objects
JSONB supports GIN indexes via the operator classes `jsonb_ops` (the default) and `jsonb_path_ops`. `jsonb_ops` creates an index item for every property and value in the JSON, whereas `jsonb_path_ops` is more compact and faster to process but supports fewer operators.
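A minimal sketch on the `my_json` table from above (the index names are made up):
```sql
-- default operator class (jsonb_ops): supports ?, ?|, ?& and @>
CREATE INDEX my_json_data_idx ON my_json USING GIN (data);
-- jsonb_path_ops: smaller and faster, but essentially only supports @>
CREATE INDEX my_json_data_path_idx ON my_json USING GIN (data jsonb_path_ops);
```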
> [!tip]- Additional Resources
> - [DevHints.io/postgresql-json](https://devhints.io/postgresql-json)
> - [PostgreSQL JSON Functions & Operators](https://www.postgresql.org/docs/current/functions-json.html)
> - [PostgreSQL Tutorial JSON](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-json/)
> [!tip]- Additional Resources
> - [Amigoscode YT series on PostgreSQL](https://www.youtube.com/playlist?list=PLwvrYc43l1MxAEOI_KwGe8l42uJxMoKeS)
> - [PostgresSQL Tutorials](https://www.postgresqltutorial.com/)
## Working with Latin characters
Use either UTF8 or LATIN9 as the database encoding when working with Latin languages (e.g., Spanish, Italian). Google Sheets uses UTF8, so go with that unless you specifically need LATIN9.
```psql
\encoding UTF8
```
Whitelist text characters before importing if you continue to have encoding errors. In rare cases, a string may contain a byte sequence that cannot be encoded in your chosen scheme even though the offending character does not appear when you review the data in the source.
```javascript
array_of_strings.map(ele => String(ele).replace(/[^\p{Letter}\p{Mark}\p{Number}.,<>"'\[\]\|+@\/#!¡$%\^&\*;:{}=\-_`~()?¿\s]/gui, ''))
```
`\p{Letter}` is a character class representing all letters and `\p{Mark}` represents the accents and other special characters (this is best practice as opposed to only selecting ASCII characters `[a-zA-Z]`).
`\p{Number}` captures all numeric values.
You can try `\p{Punctuation}` for all punctuation, but it includes some non-standard punctuation (e.g., left and right single and double quotes). I've listed all punctuation on my keyboard instead, including the inverted exclamation and question marks common in Spanish, and escaped characters that have special meaning in regex ( `\[\]\|\^\*\-\/` ).
`g` replaces all instances, rather than just the first; `u` allows for use of the Unicode character classes and `i` makes the regex case-insensitive.
See the [regular expressions tutorial](https://www.regular-expressions.info/tutorial.html) for more detail.