[psql](https://www.postgresql.org/docs/current/app-psql.html) is a [[command line application]] for [[Postgres]] databases. `psql` is installed by default with Postgres (unless specifically excluded during the installation process).

## getting started

Before getting started on your first database, you may want to create a project directory using the [[project organization for databases]].

1. [[add psql to PATH]]
2. Open [[Bash]] from the project directory (alternatively, you may use SQL Shell (psql), installed with Postgres, though the instructions will differ).
3. Log in to the `postgres` account.
   ```bash
   psql -U postgres
   ```
4. You will be prompted for the password. Note that when typing the password you may not see any characters appear. Input the password you set when installing Postgres.
   ```bash
   >>> Password for user postgres:
   ```
5. Create a new database. If successful, the prompt will print `CREATE DATABASE`. You can also use [[pgAdmin]] to view the database.
   ```sql
   CREATE DATABASE <databasename>;
   ```
6. Connect to the new database (`\c <databasename>` is also valid).
   ```
   \connect <databasename>
   ```

See [[#Common workflows]] for details on how to use `psql` to add tables, insert data, create views, and read from and write to CSV.

Quit the interactive SQL session.

```psql
\q
```

## Common workflows

### Connect to a database

When coming back to a database project, log in and specify the database at the same time. You will be prompted for the password each time.

```bash
psql -U postgres -d <databasename>
```

### Create a table

Use the CREATE TABLE command to create a table.

```sql
CREATE TABLE persons (
    id BIGSERIAL PRIMARY KEY NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);
```

### Primary key

Include the keywords `PRIMARY KEY NOT NULL` when defining the column to specify it as a primary key.

#### Sequence as primary key

Use the data type BIGSERIAL or SERIAL when specifying integer primary keys so that they auto-increment.

#### Compound primary key

You can also set a compound primary key (e.g., two or more fields together). Be careful to specify fields that will never contain duplicates!

```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE persons (
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (first_name, last_name)
);

-- OR USE ALTER TABLE
ALTER TABLE persons
ADD PRIMARY KEY (first_name, last_name);
```

#### UUID primary key

A universally unique identifier (UUID) is a unique identifier that can be used as a primary key. The benefit of a UUID, as opposed to a sequence, is that it is unique across all tables (and even other databases). It also prevents hackers from mining a database by guessing primary keys. To use UUIDs in Postgres, you must first install the extension.

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```

Use `\df` to list the UUID generation functions available. You'll need to select one of these to generate a UUID when inserting a record. I recommend `uuid_generate_v4()`.

To create a primary key using UUID, simply list `UUID` as the data type when creating the table. Foreign key references to a `UUID` should also use the data type `UUID`. When inserting a record, call the UUID function to create a unique identifier. You do not need to check for duplicates since these are guaranteed to be universally unique.

```sql
CREATE TABLE cars (
    car_uid UUID NOT NULL PRIMARY KEY
);

-- foreign key references should also be UUID
CREATE TABLE persons (
    person_uid UUID NOT NULL PRIMARY KEY,
    car_uid UUID REFERENCES cars (car_uid)
);

-- use a function to generate a uuid for each record
INSERT INTO persons (person_uid) VALUES (uuid_generate_v4());
```
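You can also let Postgres generate the UUID for you by setting the function as the column default. A minimal sketch, not from the original note, assuming the `uuid-ossp` extension above (on Postgres 13+ the built-in `gen_random_uuid()` works without any extension):

```sql
-- same cars table as above, now generating the UUID automatically
CREATE TABLE cars (
    car_uid UUID NOT NULL PRIMARY KEY DEFAULT uuid_generate_v4()
);

-- no need to supply car_uid explicitly
INSERT INTO cars DEFAULT VALUES;
```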
### Insert a record

Use the INSERT command to insert a record into an existing table.

```sql
INSERT INTO persons (
    first_name,
    last_name
)
VALUES ('John', 'Doe');
```

Fields not specified, such as the `id` field in this example, will be populated with their default values.

### Update a record

Use the primary key to access the record. You can update multiple fields for any record by comma-separating each `field = value` pair. You can also update multiple records at the same time, depending on the `WHERE` query. Always use the `WHERE` clause!

```sql
UPDATE <tablename>
SET <field> = <value>,
    <field_2> = <value_2>
WHERE id = <id>;
```

### Delete a record

As with updating a record, always use the `WHERE` clause when deleting records.

```sql
DELETE FROM <tablename>
WHERE id = <id>;
```

>[!warning] Worst Practice
>Using the `CASCADE` option to overcome the foreign key constraint when deleting records is not best practice. `CASCADE` will delete any record in your database that includes that foreign key. Instead, you should manually delete or update related records to have full control over what is deleted.

### Read from CSV

Often, your data will initially be stored in a CSV. Before reading from a CSV, you must [[#create a table]] with the corresponding columns. PostgreSQL provides a helper [COPY](https://www.postgresql.org/docs/15/sql-copy.html) command for reading from `.csv` files. This will append the data to whatever is in the table already.

Specify the fields if your CSV does not contain every column in your database table; the fields not specified will receive their default value. Most commonly, specify fields when your database has an auto-incrementing `id` field and your CSV does not have a unique identifier.

```sql
COPY <tablename> (<field_1>, <field_2>)
FROM '<full/path/to/file>'
DELIMITER ',' CSV HEADER;
```

Alternatively, use a scripting language to create a `.sql` file that creates the table and inserts each record one at a time. The file should look like this:

```sql
CREATE TABLE <tablename> (
    <field_1> <field_1_type>,
    <field_2> <field_2_type>
);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
INSERT INTO <tablename> (<field_1>, <field_2>) VALUES (<field_1_val>, <field_2_val>);
...
```

Save the file and copy the full path to the file. Open a `psql` session and use `\i`, which executes commands from a file. `\ir` allows you to use a relative path.

```sql
\i <full/path/to/file>
```

### Import CSV from web

You can read a file directly from the web using [[curl]], avoiding the need to download the file locally. Use quotes around the URL.

```sql
COPY <tablename> FROM PROGRAM 'curl -L "<url>"' HEADER CSV DELIMITER ',';
```

To test the download link, open [[Bash]] and use the same command `curl -L "<url>"`, replacing `<url>` with the URL to your data; you should see the contents printed in the terminal in comma-separated rows.

> [!tip]
> You can use [[Google Sheets]] to serve data. In fact, you can read any data that can be accessed programmatically from the standard output of your terminal ([[STDOUT]]).

See this [blog post](https://www.depesz.com/2013/02/28/waiting-for-9-3-add-support-for-piping-copy-tofrom-an-external-program/) for more detail.
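Note that server-side `COPY ... FROM '<file>'` reads the file as the database server process, which requires an absolute path and server filesystem permissions. The `\copy` meta-command (listed in [[#Common psql commands]]) runs the same operation client-side, so relative paths work. A minimal sketch, assuming the `persons` table from above and a hypothetical `data/persons.csv`:

```psql
\copy persons (first_name, last_name) FROM 'data/persons.csv' DELIMITER ',' CSV HEADER
```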
### Export to CSV

To export, you can provide a table, view, or query to the COPY command.

```sql
COPY (<tablename/view/query>) TO 'path/to/file.csv' DELIMITER ',' CSV HEADER;
```

### Transactions

It is best practice to wrap commands in a transaction. [Transactions](https://www.postgresql.org/docs/current/tutorial-transactions.html) preserve database integrity by rolling back all commands executed if one fails. To create a transaction, simply begin and end the series of commands with `BEGIN` and `COMMIT`.

```sql
BEGIN;
-- SQL Commands
COMMIT;
```

### Constraints

#### Unique constraint

```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE <tablename> (
    col1 TEXT,
    col2 TEXT,
    CONSTRAINT <constraint_name> UNIQUE (col1)
);

-- OR USE ALTER TABLE
ALTER TABLE <tablename>
ADD CONSTRAINT <constraint_name> UNIQUE (<col>);
```

#### Check constraint

Use a check constraint for validation of database inputs. See also [[#enumerators]].

```sql
-- SPECIFY IN CREATE TABLE
CREATE TABLE <tablename> (
    gender TEXT CHECK (gender IN ('Female', 'Male'))
);

-- OR USE ALTER TABLE
ALTER TABLE <tablename>
ADD CONSTRAINT <constraint_name> CHECK (gender IN ('Female', 'Male'));
```

#### On conflict do nothing

Use the `ON CONFLICT` clause combined with `DO NOTHING` to skip records that violate constraints. For example, inserting a record with an existing primary key violates the primary key's uniqueness constraint and will fail.

```sql
INSERT INTO <tablename> (first_name, last_name)
VALUES ('John', 'Doe')
ON CONFLICT (id) DO NOTHING;
```

This is useful when updating a table where some of the records in the data source are already in the table.

### Upsert

Use the `ON CONFLICT` clause combined with `DO UPDATE` to update a record when a constraint is violated. This is commonly referred to as an "upsert".

```sql
INSERT INTO <tablename> (first_name, last_name)
VALUES ('John', 'Doe')
ON CONFLICT (id) DO UPDATE
SET first_name = EXCLUDED.first_name,
    last_name = EXCLUDED.last_name;
```

This is useful when a user is editing a record in your application or when you have edited many records in the data source.

Some caveats:

- If using a partial index, you must constrain the conflict target to only the indexed part of the table using a `WHERE` clause, as sketched below (see [here](https://stackoverflow.com/questions/42022362/no-unique-or-exclusion-constraint-matching-the-on-conflict)).
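A minimal sketch of that caveat, assuming a hypothetical `persons` table with `email` and `active` columns: the `WHERE` predicate in the conflict target must repeat the index predicate.

```sql
-- hypothetical partial unique index: emails unique among active persons only
CREATE UNIQUE INDEX persons_active_email_idx ON persons (email) WHERE active;

-- the conflict target must repeat the index predicate
INSERT INTO persons (email, active)
VALUES ('jdoe@example.com', true)
ON CONFLICT (email) WHERE active DO UPDATE
SET active = EXCLUDED.active;
```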
### Enumerators

[Enumerated (enum) data types](https://www.postgresql.org/docs/current/datatype-enum.html) are lists of valid options for a field, for example the days of the week or responses on a Likert scale. To create an enumerator, use the `CREATE TYPE` command. Once created, you can use the enumerator just like any other data type when creating a table.

```sql
CREATE TYPE likert AS ENUM ('strongly agree', 'agree', 'neutral', 'disagree', 'strongly disagree');

CREATE TABLE survey_results (
    question INT,
    result likert
);
```

Enumerators are ordered sets, and so comparison operators are valid. For example, you could create an enumerator for the list of valid [[ordinal]] (i.e., ordered categorical) values and use a comparison operator to query records.

```sql
SELECT * FROM survey_results
WHERE result >= 'agree'
ORDER BY result;
```

Each enumerated type is separate. The same value in two enumerators is not equivalent. Enumerators are unlike numeric and text data types, so regular numeric and string operators and functions won't work.

Alternatives to enumerators include using a [[#check]] constraint or creating a full table for the enumerated options and relating the field through a foreign key. For this second case, consider using the value itself as the primary key, as shown below. The downside of these options is that they are not ordered in the same way an enumerator is, and so comparison operators will not work.

```sql
CREATE TABLE likert (
    response TEXT PRIMARY KEY
);

INSERT INTO likert (response)
VALUES
    ('strongly agree'),
    ('agree'),
    ('neutral'),
    ('disagree'),
    ('strongly disagree');

CREATE TABLE survey_results (
    question INT,
    result TEXT REFERENCES likert (response) ON UPDATE CASCADE
);
```

### Relationships

If your [[#primary key]] uses BIGSERIAL, use BIGINT as the type for the foreign key (the field that defines the relationship). If using SERIAL, use INT as the type for the field with the relationship.

There are two patterns for specifying relationships, as sketched below. First, you can write the reference inline when specifying the table. Alternatively, you can use an ALTER TABLE command to add the relationship. When using the first strategy, the table to which you are making the relationship must already exist (or be specified first in the `.sql` file).
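A minimal sketch of both patterns, using hypothetical `persons` and `cars` tables with BIGSERIAL keys:

```sql
-- the referenced table must exist first
CREATE TABLE persons (
    id BIGSERIAL PRIMARY KEY NOT NULL
);

-- Pattern 1: inline reference
CREATE TABLE cars (
    id BIGSERIAL PRIMARY KEY NOT NULL,
    person_id BIGINT REFERENCES persons (id)
);

-- Pattern 2: for a table created without the inline reference,
-- add the relationship afterwards
ALTER TABLE cars
ADD CONSTRAINT cars_person_id_fkey FOREIGN KEY (person_id) REFERENCES persons (id);
```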
### Queries

Use `\x` to toggle expanded display and make results easier to read.

### Joins

```sql
SELECT * FROM <table1>
JOIN <table2> ON <table1.field> = <table2.field>;
```

### Working with dates

```sql
CREATE TABLE dates (
    date DATE,
    time TIMESTAMP
);

INSERT INTO dates (date, time)
VALUES (DATE '2022-11-02', TIMESTAMP '1999-01-08 04:05:06');
```

### Time intervals

```sql
NOW() + INTERVAL '1 YEAR';
```

See the [docs](https://www.postgresql.org/docs/current/datatype-datetime.html) for more detail.

### Age

```sql
AGE(<birthday>)
```

### Extract

Use `EXTRACT` to get a day, week, month, year, decade, or century from a date.

```sql
EXTRACT(DAY FROM NOW());
```

### Casting

Casting converts a value to another format, and is often used for datetimes. Use `::` (double colon) to cast. Standard SQL's `CAST()` (e.g., `CAST(NOW() AS DATE)`) works in PostgreSQL as well.

```sql
NOW()::DATE;
```

## Common psql commands

**psql command** | **result**
-------------|---------
`\?` | get help
`\q` | quit
`\l` | list databases
`\d` | list tables
`\d <table>` | list fields in table
`\i` | execute commands from a file
`\ir` | execute commands from file, relative path
`\x` | toggle expanded display
`\copy`, `COPY` | copy from/to CSV
`\! clear` | clear prompt
`DROP TABLE <table>` | drop a table

## Extensions

```sql
SELECT * FROM pg_available_extensions;
```

- `uuid-ossp`: universally unique identifiers
- `h3`: H3 bindings
- `PostGIS`: geospatial stuff
- [plv8](https://pgxn.org/dist/plv8/doc/plv8.html): JavaScript in [[SQL]]

## Data Types

[Data types](https://www.postgresql.org/docs/current/datatype.html#DATATYPE-TABLE)

Name | Description
--------| --------
`integer` | typical choice for integer
`real` | inexact floating point up to 6 decimal places (see also `double precision` and `numeric`)
`text` | variable-length character string (see also `varchar(n)`)
`boolean` | true/false
`date` | date
`timestamp` | date and time (see options for time zones)
`bigserial` | auto-incrementing integer up to 9,223,372,036,854,775,807
`uuid` | universally unique identifier
`money` | currency amount
`point` | geometric point
`polygon` | geometric polygon
`jsonb` | textual JSON data (see also `json`)
`varchar(n)` | string not to exceed length n
`serial` | auto-incrementing integer up to 2,147,483,647
`double precision` | inexact floating point up to 15 decimal places
`numeric` | high-precision floats (slow calculations)
`time` | time only
`interval` | interval of time
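As a quick illustration, a hypothetical table using several of these types:

```sql
CREATE TABLE products (
    product_uid UUID PRIMARY KEY,   -- universally unique identifier
    name TEXT,                      -- variable-length string
    price NUMERIC(10, 2),           -- exact high-precision decimal
    in_stock BOOLEAN,               -- true/false
    released DATE,                  -- date only
    metadata JSONB                  -- binary JSON data
);
```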
### Notes

Data types `json` and `jsonb` store JSON data. Data type `text` could also be used. `jsonb` is preferred because it speeds up processing (although it slows down input). Read the [docs](https://www.postgresql.org/docs/current/datatype-json.html) before using this data type to understand its limitations.

### PostgreSQL JSON data type

JSON and JSONB both store JSON data. JSONB is generally preferred because, while it is slower to load, it is faster to query. The "B" in JSONB stands for binary. JSONB discards insignificant whitespace and duplicate keys (keeping only the last value for a key). This shouldn't be a problem for well-constructed JSON, but it's important to know.

To insert JSON data, wrap the object in single quotes and use double quotes for any string-like keys or values.

```sql
CREATE TABLE my_json (
    id SERIAL PRIMARY KEY,
    data JSONB
);

INSERT INTO my_json (data)
VALUES ('{"name": "Coffee Value Chains", "year": 2022, "results": {"indicator1": 100, "indicator2": 150}, "contacts": ["Sergio", "Monica"]}');
```

#### Basic queries

Use the `->` operator to query within a JSON object.

```sql
-- Select a property
SELECT data -> 'name' AS name FROM my_json;

-- Select a property based on a condition
SELECT data -> 'results' FROM my_json
WHERE data -> 'name' = '"Coffee Value Chains"';
```

You can chain multiple dash-arrow (`->`) operators to query within nested JSON. Alternatively, use the hash-arrow (`#>`) operator to query by passing a path of properties (wrapped in curly braces `{}`). Queries for JSON and JSONB are the same.

```sql
-- Select nested JSON object field by property
SELECT data -> 'results' -> 'indicator1' AS indicator1 FROM my_json;

-- Alternative to select nested JSON object field by property
SELECT data #> '{results, indicator1}' AS indicator1 FROM my_json;
```

The previous queries all return JSON objects so that you can continue to query them. The dash-double-arrow (`->>`) operator will return text from the query instead of JSON.

```sql
-- Select a JSON object field as text
SELECT data ->> 'name' AS name FROM my_json;
-- you cannot chain after returning text
```

If your JSON contains arrays, you can index into the array using the query techniques described above combined with the element's index to get a specific element from an array.

```sql
-- Select a JSON array element
SELECT data -> 'contacts' -> 0 AS first_contact FROM my_json;

-- Alternative to select a JSON array element
SELECT data #> '{contacts, 0}' FROM my_json;
```

You can also unpack an array using the `jsonb_array_elements_text()` function.

```sql
SELECT id, jsonb_array_elements_text(data -> 'contacts')
FROM my_json;
```

#### Contains queries

Use the contains operator (`@>`) to return records that include a specific property or property value.

```sql
-- Select all records that contain a specific property value
SELECT * FROM my_json WHERE data @> '{"year": 2022}';

-- Select all records that contain a specific nested property value
SELECT * FROM my_json WHERE data @> '{"results": {"indicator1": 100}}';

-- Select all records that contain a specific value(s) in an array
SELECT * FROM my_json WHERE data -> 'contacts' @> '["Monica"]';
```

Use the property-test (`?`) operator to query records based on the presence of one or more top-level properties.

```sql
-- Select all records that contain a specific key
SELECT * FROM my_json WHERE data ? 'results';

-- Select all records that contain any of these keys
SELECT * FROM my_json WHERE data ?| array['results', 'contacts'];

-- Select all records that contain all of these keys
SELECT * FROM my_json WHERE data ?& array['results', 'contacts'];
```

#### Updating JSON objects

#rough

Use the `jsonb_set()` function to update JSONB objects.

```sql
UPDATE my_json
SET data = jsonb_set(data, '{year}', '2023');
```

If any records do not have the property you want to update, you must first create it. To avoid overwriting those records that do have the property, use the `COALESCE` function with an empty object (`'{}'`).

```sql
UPDATE my_json
SET data = jsonb_set(data, '{targets}', COALESCE(data -> 'targets', '{}'));

SELECT jsonb_set(data, '{targets, indicator1}',
    (SELECT data #> '{results, indicator1}' FROM my_json j1 WHERE j1.id = j2.id))
FROM my_json j2;
```

#### Indexing JSON objects

JSONB supports GIN indexes by means of the `jsonb_ops` and `jsonb_path_ops` operator classes. `jsonb_ops` (the default) creates an index item for every property and value in the JSON, whereas `jsonb_path_ops` is more compact and faster to process but supports fewer operations.
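A minimal sketch on the `my_json` table from above (the index names are illustrative):

```sql
-- default operator class (jsonb_ops): supports ?, ?|, ?& and @>
CREATE INDEX my_json_data_idx ON my_json USING GIN (data);

-- jsonb_path_ops: smaller and faster to search, but only supports containment (@>)
CREATE INDEX my_json_data_path_idx ON my_json USING GIN (data jsonb_path_ops);
```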
> [!tip]- Additional Resources
> - [DevHints.io/postgresql-json](https://devhints.io/postgresql-json)
> - [PostgreSQL JSON Functions & Operators](https://www.postgresql.org/docs/current/functions-json.html)
> - [PostgreSQL Tutorial JSON](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-json/)

> [!tip]- Additional Resources
> - [Amigoscode YT series on PostgreSQL](https://www.youtube.com/playlist?list=PLwvrYc43l1MxAEOI_KwGe8l42uJxMoKeS)
> - [PostgresSQL Tutorials](https://www.postgresqltutorial.com/)

## Working with Latin characters

Use either UTF8 or LATIN9 as the database encoding when working with Latin languages (e.g., Spanish, Italian). Google Sheets uses UTF8, so go with that unless you specifically need LATIN9.

```
\encoding UTF8
```

Whitelist text characters before importing if you continue to have encoding errors. In some rare cases, a string might contain a byte sequence that cannot be encoded with your chosen encoding scheme even though the character does not appear when you review the data in the data source.

```javascript
array_of_strings.map(ele =>
  String(ele).replace(
    /[^\p{Letter}\p{Mark}\p{Number}.,<>"'\[\]\|+@\/#!¡$%\^&\*;:{}=\-_`~()?¿\s]/gui,
    ''
  )
)
```

`\p{Letter}` is a character class representing all letters, and `\p{Mark}` represents the accents and other special characters (this is best practice as opposed to only selecting ASCII characters with `[a-zA-Z]`). `\p{Number}` captures all numeric values. You can try `\p{Punctuation}` for all punctuation, but it includes some punctuation that is non-standard (e.g., left and right single and double quotes). I've listed all punctuation on my keyboard instead, including the upside-down exclamation and question marks common in Spanish, and escaped characters that have special meaning in regex (`\[\]\|\^\*\-\/`). The `g` flag replaces all instances, rather than just the first; `u` allows for use of the Unicode character classes; and `i` makes the regex case-insensitive.

See https://www.regular-expressions.info/tutorial.html