Image of a plain as a blank canvas for our custom software development

SQL Tutorial

Arteco - Information Technologies

  • :)
  • :0
  • :D
  • ;)
  • :]
foto Ramón Arnau

Ramón Arnau

Gerente de Arteco Consulting SL

Structured Query Language is the most widely used language for securely managing data through its referential integrity. Find out how!

The vast majority of applications delegate the responsibility of safeguarding information in a durable and consistent manner over time to SQL database systems. The following lines introduce the basic concepts of SQL in a simple manner.

This article describes how to perform the most basic operations on two of the most widely used database management systems: MySQL and PostgreSQL. If you don't have any database management system installed locally, try adding one to your PC using simple steps with Docker:

What is SQL

Applications need to be accompanied by a storage system that is durable and consistent over time. The program delegates to the persistence system the responsibility of safeguarding the data regardless of how many times the application is restarted. They can even serve more than one application simultaneously, allowing load balancing of the applications on the same set of data. Information management systems, commonly called databases, have evolved greatly over time, becoming complex systems that offer many more functionalities than mere data hosting.

The most widely used databases today are relational databases, based on mathematical set theory for data storage. They also take into account the relationship between different sets to establish strong links between the contained data. Among relational systems, the most used are those that allow interaction with them through the SQL language, which stands for Structured Query Language. SQL is a language that allows the manipulation of data sets and the extraction of information stored in the database system with a high level of control.

Organization of Data

SQL systems define that each set of data or collection must be housed in its own table, which indicates the type of data it stores. And each piece of information will be a row in the table. For example, the person table will be a collection of individual person records in the form of rows, similar to how a list of people would be represented in a spreadsheet. Therefore, each record will consist of one or more pieces of person data such as name, surname, ID number, etc. Obviously, each person will have their own data, so each row should have the necessary slots to store the name and surname of each person, just like the columns in a spreadsheet.

In summary, tables are composed of columns, and as in Java where an attribute of a class can have a type, the same will happen in SQL systems. Columns can have a SQL type to indicate that such a column should contain dates, numbers, texts, or one of the other accepted types. Some of these can be marked as mandatory, others can be defined with a default value, and others can simply be empty, a data representation that will be represented with NULL. How each column should behave is specified at the time of table creation. Continuing with the example of the collection of people and using standard SQL, the creation of the table would be:

create table PERSONA (
    nombre char(50),
    apellidos char(255),
    nif char(9),
    fechaNacimiento date
);

The above code for table creation requires a prior connection to an existing database and may vary slightly depending on the database to be used. Each database has its own column types, some of which are highly optimized for storing data in a specific way, especially when complex operations need to be performed on them or when storing large volumes of information. Official documentation of the database management system of the same version being used should be consulted. In any case, the table creation code will work on most systems and basically creates a table with four columns, three of them for storing text with char(<length>), plus a date-type column with date.

Note: Traditionally, MySQL databases used default tables that do not accept referential integrity, justified by a performance improvement called MyIsam. Ensure that the tables being created use the InnoDB storage system replacement. It can be enforced during table creation with the engine parameter if necessary. Fortunately, MySQL versions have the InnoDB table system activated by default.

create table PERSONA(
    -- omitidos los campos
    ...
    )
-- activado de las restricciones en MySQL
ENGINE=InnoDB;

Fixed-width columns, created with char, reserve the specified number of characters for each row, regardless of whether they are fully utilized or not. However, if the system only stored foreign individuals who have only one last name, much space would be wasted. Therefore, it is common to use a variable-width text column to which a maximum must also be indicated, such as varchar(50). It should be noted that due to technical inheritance, most systems have a maximum length of char of 255 characters and 65,535 for varchar. If the length exceeds this limit, another data type like longtext or clob is typically used.

MySQL PostgreSQL Comentario
int int 4 bytes de almacenamiento numérico
bit boolean 1 byte para almacenar true o false.
decimal(M,D) decimal(M,D) Número decimal de precisión fija.
float real Número decimal de precisión aprox. 4 bytes.
date date 4 bytes. Sólo fecha, sin hora.
datetime timestamp 8 bytes. Fecha más hora.
char(d) char(d) Texto de longitud fija.
varchar(d) varchar(d) Texto de longitud variable.
longtext text Texto con la longitud que se requiera.
longblob bytea Archivos binarios de la longitud necesaria.

The table summarizes the main data types that can be used in MySQL and PostgreSQL, but it is always advisable to have the official documentation of the system being used at hand. These are just some of the most commonly used types; there are many more. More information about the types for each system can be found at:

Returning to the creation of the PERSON table, it is possible to mark some fields as mandatory and others with a default value. By way of example, the creation of the table can be rewritten to indicate mandatory fields. Obviously, two tables with the same name cannot exist, so the previous table must be deleted beforehand with drop table <table_name>:

-- Borrado de la tabla anterior
drop table PERSONA;

-- MySQL default value para Date
create table PERSONA (
    nombre char(50) not null,
    apellidos char(255),
    nif char(9),
    fechaNacimiento date default CURRENT_TIMESTAMP
);

-- PostgreSQL default value para Date
create table PERSONA (
    nombre char(50) not null,
    apellidos char(255),
    nif char(9),
    fechaNacimiento date default NOW()
);

Only one of the create table statements shown should be executed, the one suitable for the database being used. Both CURRENT_TIMESTAMP and NOW are built-in functions in the corresponding database management system that return the current system date. Each system provides a collection of functions that can be used, for example, to convert a text string into a date.

-- MySQL String to Date
STR_TO_DATE('21-5-2013','%d-%m-%Y');

-- PostgreSQL String to Date
to_date('21-5-2013','%d-%m-%Y');

The functions accept two parameters: the text string containing the date and the date format to use. It will be necessary to invoke the delimited text conversion functions with ‘ to convert to objects and thus be able to insert records into the tables from the SQL console.

It will not always be necessary to delete the table and recreate it. In fact, it will not be desirable as it causes the loss of any existing data. Database management systems allow modification of columns in existing tables with alter table <table_name>. Through modification, it will be possible to add, delete, modify, or rename the definition of any column. Changing a column from one type to another may not always be possible. If the table contains data and the database management system does not know how to perform the type conversion, the operation will not be successful.

-- Borrado de columna nombre
alter table PERSONA drop column nombre;

-- Crear la columna no nula de tipo varchar
alter table PERSONA add column nombre varchar(50) not null;

-- Modificar el nombre del campo fecha
alter table PERSONA rename column fechaNacimiento to fec_nac;

-- Pasar nif a varchar
alter table PERSONA modify nif varchar(9);

Modifying an existing table allows for incremental changes to be applied to a database. It would not be very practical to transfer data from an obsolete table schema to a new one for every change. Therefore, it is important to analyze program enhancements to try to reflect modifications to new features as incremental changes to the data model.

Similar to what can happen during a type change for a column, where the statement may fail if the change cannot be completed due to the inability to automatically convert the data contained in the table, adding a non-null constraint may fail if there are already records with null values stored. In that case, all records must first be updated to give them a non-null value before including the constraint. This situation will occur when including any of the conditions explained below.

Data Constraints

Data constraints must be differentiated into two sets: those that affect only the table containing the data and those constraints involving at least two tables.

Table Constraints

Table-specific constraints allow for controlling the behavior of any of the columns it contains. These constraints will result in an error in data handling operations if an attempt is made to insert or store any data that does not comply with all constraints.

Non-Null Fields

This constraint was briefly discussed during the creation of the PERSONA table and during the modification of a column using alter table. Both are a summarized way of creating a non-null constraint on the column.

-- Poner constraint en MySQL
alter table PERSONA modify nombre varchar(50) NOT NULL;

-- Eliminar constraint en MySQL
alter table PERSONA modify nombre varchar(50);

-- Poner constraint en PostgreSQL
alter table PERSONA alter column nombre set not null;

-- Eliminar constraint en PostgreSQL
alter table PERSONA alter column nombre drop not null;

As can be seen, each database management system has its own particular way of manipulating the database. Nevertheless, ANSI SQL is a standard that attempts to standardize SQL statements to provide a common interface for programs and developers, although not very successfully, as ultimately the programmer will need to refer to the official database engine manual.

Primary Keys

It is highly recommended to have a way to unequivocally refer to a single record of any table in the database, allowing for any modification operation of a specific record to be performed quickly. To achieve this, a column or set of columns is defined as its identifier. The identifier is therefore a value or values that cannot be repeated and allow retrieving a data row from among all those in the table. For example, the nif field could be a valid identifier for the PERSONA table. If it is decided that this field should be the identifier of the table's records, it must be marked as a mandatory and unique field, both conditions being implicit when defining the column as a primary key.

-- MySQL y PostgreSQL
alter table PERSONA add primary key (nif);

Alternatively, the primary key condition could have been indicated at the time of table creation.

-- MySQL y PostgreSQL
create table PERSONA (
    nif varchar(9) PRIMARY KEY,
    nombre varchar(50) not null,
    apellidos char(255),
    fechaNacimiento date
);

In a hypothetical situation where it is determined that two people can have the same nif but in different countries, it would not be sufficient to have a primary key based solely on the nif. The country code should also be included as part of the identifier. This would result in needing both the country code and the nif to identify a row, and this combination should return at most one record from the database, resulting in a composite key. Composite keys are defined at the end of the table declaration or through an alter table statement:

-- MySQL y PostgreSQL
create table PERSONA (
    codPais varchar(3),
    nif varchar(9),
    nombre varchar(50) not null,
    apellidos char(255),
    fechaNacimiento date,
    PRIMARY KEY(nif,codPais)
);

When one or more fields form a primary key, the database creates an internal index where it maintains pointers indicating the position of each record in the table for quick retrieval of any of its rows, always based on the identifier. This avoids iterative searching through all its records. Therefore, the programmer must specify that the field, for example nif, will be part of the primary key, either at table creation or at a later time with the modification of it.

For convenience and to avoid working with more than one column, it is common to create a non-null, incremental numeric field from which atomic numbers are obtained for use as identifiers. This is why URLs like …/persona?id=453 are often found to refer to the data row of the person identified with the number 453. If a new person is saved, it will have the number 454, and so on. If an update needs to be made to a specific one, the process will retrieve the row based on its identifier and update all the data in the row, except for that identifier, as this number must be immutable to always retrieve the same record. Some databases like MySQL automatically assign the next available value to the column used as an identifier. Others like PostgreSQL require creating a database object responsible for issuing non-repeated values. These objects are called sequences.

The creation of the table of people with an automatic identifier in MySQL is as follows:

-- MySQL
create table PERSONA (
    id int NOT NULL AUTO_INCREMENT,
    nif varchar(9),
    nombre varchar(50) not null,
    apellidos char(255),
    fechaNacimiento date,
    PRIMARY KEY(id)
);

So, each time a new person is inserted into the table without providing a value for the id field, MySQL will treat it as a new record in the table and assign it an automatic value. If, on the other hand, a record with a previously used identifier is inserted, the existing record will be completely overwritten by the new one.

The auto-increment option is achieved in PostgreSQL by using a sequence. Therefore, it is necessary to create the sequence to obtain similar behavior. Each call to the sequence will get a different identifier, which must be passed as the value of the field used as the primary key of the record to insert.

-- PostgreSQL
create table PERSONA (
    id int PRIMARY KEY,
    nif varchar(9),
    nombre varchar(50) not null,
    apellidos char(255),
    fechaNacimiento date
);

create sequence persona_sequence
start 1
increment 1;

-- consumo de la secuencia
nextval('persona_sequence');

Whenever possible, it is recommended to work with identifiers of a single column because it will simplify the process in all subsequent operations with the table. In cases where a composite key must be used, it is preferable to create a unique identifier plus a unique key for the composite fields. Although it may seem redundant, the unique key will prevent more than one record with the same values, and the identifier will simplify the location of a row in the table.

Unique Keys

Just like the identifier intrinsically includes a unique key, databases allow creating a unique key constraint for a column or a set of columns. And it does not necessarily have to identify the row, as it allows the column or columns to contain NULL values. If the unique key is for a single field, simply add unique after its definition in the table creation.

-- MySQL y PostgreSQL
create table PERSONA (
    ...
    nif varchar(9) unique,
    ...
);

If the unique key consists of more than one field, it must be defined at the end of the table creation by listing all the columns. Optionally, you can give a name to the constraint.

-- MySQL y PostgreSQL
create table PERSONA (
    ...
    codPais varchar(3),
    nif varchar(9),
    ...,
    CONSTRAINT uk_pais_nif
    UNIQUE KEY (codPais, nif)
);

It is also possible to create the uniqueness constraint regardless of whether it is for one or more columns with the modification of an existing table:

-- MySQL y PostgreSQL
alter table PERSONA add
    constraint uk_pais_nif
    unique key (codPais, nif);

As mentioned, the main difference between a primary key and a unique key is that the latter allows having more than one row where the fields that make up the key are null. In contrast, the primary key does not allow more than one null value.

Other Constraints

Tables not only allow unique key constraints but also more complex constraints involving calculations, such as ensuring a date field is not earlier than a given date, or that the salary field is greater than a certain amount.

-- MySQL y PostgreSQL
create table PERSONA (
    ...
    salario decimal(9,2) check (salario > 1000.00),
    ...,
);

The check type checks will be performed on all insertions and modifications, throwing an error if they are not satisfied for any of the records being operated on. The check should be placed within parentheses and can be a valid SQL expression. It is advisable to review the official documentation of the database management system to identify possible uses and scenarios.

Constraints Across Other Tables

Constraints that affect more than one table are the strength of relational database management systems. And precisely because they allow relating records from different tables consistently. Thanks to the relational system, a record can refer to another, allowing the management system to protect the data from accidental operations and also facilitating data retrieval during queries.

For example, through constraints between tables, a record like the one in the person table can point to another collection like a company, reflecting the relationship between company and employee. To do this, from the person record, it must indicate which company they work for. It is understood that a person only works for a single company at a given time. If this relationship is defined at the SQL level, the management system will prevent the deletion of any company containing workers linked to it. If the company has a unique identifier generated automatically, whether by auto-increment or by sequence, the schema would be as follows:

-- MySQL y PostgreSQL
create table Empresa (
    id int primary key,
    nombre varchar(50)
);

create table Trabajador (
    id bigint primary key,
    nombre varchar(50) not null,
    empresa_id int,
    foreign key (empresa_id) references Empresa(id)
);

-- alternativamente se puede crear la FK
-- en un momento posterior en MySQL
alter table Trabajador
    add FOREIGN key fk_persona_empresa(empresa_id)
    references Empresa(id);

-- en un momento posterior en PostgreSQL
alter table Trabajador
    ADD CONSTRAINT fk_persona_empresa
    FOREIGN KEY (empresa_id) references Empresa(id);

If the destination table has a composite identifier consisting of two or more columns, the table that points to it must include as a foreign key all the fields that make up the primary key of the referenced table. In that case, the combination of fields, both from the source table and the destination table, must be separated by a comma in the parentheses of the foreign key definition.

-- Clave externa compuesta
alter table Trabajador
    ADD ...
    FOREIGN KEY (origen_1, origen_2, ...)
    references Empresa(destino_1, destino_2, ...);

The use of composite primary keys requires propagating all the columns to those tables that point to it, which is why using a single field as an identifier is preferable as it will produce simpler relationships with other tables.

Once the link is defined, no database operation can delete a company as long as there is at least one worker pointing to it. This protection is called referential integrity and prevents many problems when maintaining consistent data in an information system. As will be seen in the following points, the user can easily obtain lists of workers associated with one or another company, using the power offered by SQL for query execution.

Data Manipulation

Once the tables and relationships between them are defined, the next step is to add information to the recently created collections. Below are the operations that can be performed on them.

Data Insertion

To insert data into a collection, use the insert statement followed by the table name, the list of fields, and the values for each of the fields listed in the statement. Again, the syntax may vary slightly between database systems.

-- MySQL y PostgreSQL
create table Inventario (
    id int primary key,
    nombre varchar(50) not null,
    cantidad int not null
);

insert into Inventario(id,nombre,cantidad)
values (1,'Radio FM', 3);

Text fields must contain values in single quotes. The number of columns must match between the list of fields and the list of values. And of course, the position of the values must correspond to the order of the fields.

In the values, expressions or SQL function calls can be indicated, such as converting text strings to dates, or calling the next value of a sequence.

-- MySQL
create table Usuario (
    id int not null auto_increment,
    nombre varchar(50) not null,
    alta date not null,
    primary key(id)
);

insert into Usuario(id, nombre, alta)
values (NULL,'rarnau', CURRENT_DATE());


-- PostgreSQL
create table Usuario (
    id int not null primary key,
    nombre varchar(50) not null,
    alta date not null
);

create sequence Usuario_seq;

insert into Usuario(id, nombre, alta)
    values (nextval('Usuario_seq'),'rarnau', NOW());

If any of the fields is a foreign key to another table, the value indicated in the insert statement must contain an existing value in the identifier field of the destination table. If not, referential integrity will throw an error indicating that the value attached to the statement is incorrect.

Data Modification

Record modifications are performed with update statements. These must indicate which fields to update, containing a value or an expression, and usually the filtering condition that selects the records to be modified.

For example, the following lines create a new column in the user table, adding a default value to all but the record identified by 100, to which it assigns a specific value. If the filtering does not find any matching records, no modification is made to the data, without producing an error.

alter table Usuario add column notas varchar(200);

-- Modifica todos los usuarios
update Usuario set notas = 'sin notas';

-- Modifica un registro (si lo encuentra)
update Usuario set notas = 'super usuario' where id = 100;

Updates are performed on a set of records that meet the conditions specified after the where clause. It will be very common to include the id as part of the filter to affect only one row in the collection, but this won't always be the case. AND and OR conditions can be chained together to select the desired records. As an example, an update with a more sophisticated filter could be:

alter table Usuario add column departamento varchar(200);

-- MySQL
update Usuario set
    notas = 'expirados',
    departamento = 'IT'
    where id > 100
    and (alta < STR_TO_DATE('2010-01-01', 'YYYY-MM-DD')
    OR alta is null) ;

-- PostgreSQL
update Usuario set
    notas = 'expirados',
    departamento = 'IT'
    where id > 100
    and (alta < TO_DATE('2010-01-01', 'YYYY-MM-DD')
    OR alta is null);

The difference between the calls is that MySQL requires the STR_TO_DATE function to convert a text string to a date, while the same operation in PostgreSQL is performed by TO_DATE. Note that changes have been made to more than one column, separating the list of fields by commas.

Data Deletion

Very similar to modification is deletion using delete, being simpler to use as it only needs to know on which set of elements the deletion should be applied. The filters can be as complex as needed, but remember that records pointed to by others cannot be deleted.

delete from Usuario where id > 1000;

Deletions cannot be undone, so care must be taken with operations that delete or modify more than one record. When it is necessary to perform operations on several records, it is advisable to test that the filters are correctly applied to the desired set of records. Therefore, before executing any delete or update, a query with select using the same criteria should be performed.

Data Querying

Queries or data extractions from tables are performed with select statements and are another great tool provided by SQL databases. Through queries, grouped reports, paginated reports, data cross-referencing between tables, ordered results, and complex calculations on virtually all columns managed by the DBMS can be obtained. The select statement constructions are very broad; only the basic concepts will be described to perform management applications. However, the reader should be aware that there is an entire SQL grammar that allows for very complex operations on stored data. In short, the syntax for data extraction is as follows:

SELECT t1.campo1, t2.campo2, ...
FROM Tabla t1, Tabla t2, ...
WHERE t1.campo3 = ...
AND t2.campo3 > ...
AND (t1.campo4 = ... OR t2.campo5 = ...)
ORDER BY t1.campo6 ASC, t2.campo7 DESC;

The first thing that appears is the reserved word select, indicating the beginning of a query. The next segment is the list of fields desired to appear in the results. It continues with the from clause, which is a list of one or more tables separated by commas and to which an alias is applied to reference the columns of each table. It is quite common for two or more columns to have fields with the same name, and it is through the aliases that the management system knows exactly which column to use.

Next come the filters after the where. In them, fields from any table mentioned in the from can be specified, accompanied by checking expressions such as equal to X, greater than Y, different from NULL, etc. Filters can nest AND and OR expressions, and parentheses must be used to resolve ambiguities about which operation to apply first. Lastly, the list of columns by which to order and whether it should be ascending or descending is indicated. The columns in the sorting criteria must match columns appearing in the results list, i.e., after the select reserved word.

A query can contain many more elements than those mentioned here, but this definition is a good starting point to get acquainted with SQL queries.

To incrementally increase complexity starting with a simple query, the simplest operation that can be performed on the users' table is as follows:

SELECT * from Usuario;

This query operation does not have filtering, sorting, or field selection. As a result, the management system will offer all the columns and all the records from the users' table. If the table had millions of records, this operation could penalize the performance of the management system easily. So, it is advisable to indicate some filters, such as filtering by the identifier.

SELECT u.nombre, u.alta
from Usuario u where u.id = 1;

This last query will return at most a single tuple of results. Additionally, it will not return all the stored information for that user since only the name and registration date have been requested. If it were necessary to know which are the latest users registered, the results could be sorted with:

SELECT u.nombre, u.alta
from Usuario u
order by u.alta desc;

Therefore, the programmer can control the results they wish to obtain by including one or more clauses accepted by the select operation.

Result Pagination

To avoid requesting overly heavy reports, it is convenient to limit the number of records desired for each query execution. Additionally, if the management system offers the possibility of navigating through the results list, paginated queries that offer more reasonable results from the point of view of computer resource consumption can be easily obtained.

Pagination of queries with MySQL is done by including the limit <offset>, <rowcount> clause at the end of the select, allowing to indicate how many rows to skip and how many records to obtain. For PostgreSQL, these parameters can be added by including the offset <offset> limit <limit> clause.

-- MySQL
SELECT \* from Usuario u
    order by u.alta desc
    limit 50, 10;

-- PostgreSQL
SELECT \* from Usuario u
    order by u.alta desc
    offset 50 limit 10;

In both cases, the query skips the first 50 records and considers the next 10 to display in the results list. When implementing paginated lists from the application, the offset needs to be calculated as the page number to display multiplied by the page size, obtaining the number of records to skip. The second number should contain the page size.

-- MySQL
SELECT ...
    limit <pagina_solicitada * tamanyo_pagina>,
    <tamanyo_pagina>;

-- PostgreSQL
SELECT ...
    offset <pagina_solicitada * tamanyo_pagina>
    limit <tamanyo_pagina>;

The above code assumes that the first page is 0. Additionally, in applications, the total number of pages is usually displayed, including the total number of pages of matching records after applying any filters. So, from the application, two queries are always executed: the first one that obtains the records from the page the user has requested, and the second one that obtains the number of matching records, taking into account the filters, to get the total number of pages. To count records, group operations on queries are needed.

Result Aggregation

It will not always be necessary to obtain information from rows as they are in the database. Sometimes, it will be necessary to count how many matching records there are, for example, to display a paginated list to the user. Other times, results will need to be grouped to obtain, for example, the total invoices issued per service area. For this purpose, SQL databases include calculation functions such as maximum or MAX, minimum MIN, average AVG, or simply counting lines COUNT. To see the usage of one of them, imagine you want to count the total number of records in the user table:

SELECT count(u.id) from Usuario u;

The above query only returns one row corresponding to the count of values in the u.id column. Since there was no filtering, the result is the number of users in the table. If any filters had been specified, the query would have given the total number of records that meet the selection.

The following example shows how to use summation, but not to sum all records, but to sum those that have something in common. For example, to sum sales by product type. As partial results are needed in this case, an aggregator must be used in the summation operation:

create table Ventas (
    id int primary key,
    item_id int not null,
    precio decimal(6,2) not null
);

SELECT v.item_id,
    sum(v.precio) as total
    from Ventas v
    group by v.item_id;

It is understood that item_id is a foreign key to a table where all items for sale are housed. The above query returns the identifier of each different item sold, together with the sum of the amounts accumulated by item sales. This makes it easy to know which item has contributed the most revenue overall.

However, although the obtained report is very useful, the user must keep in mind what each item is based on its identifier. It can be easy to get lost if there are several hundred different items sold. It would be interesting to display the names of the items alongside the totals. Since the names of the items are in the inventory table, the query must combine fields from both the sales and inventory tables through table intersection as described in the next section.

Table Intersection

Table intersection allows combining columns in queries that come from different tables, such as displaying the names of the products sold by joining the inventory table with the sales table. It is important in SQL to minimize duplicate information in tables. Therefore, the sales table should only contain the item identifier and no other item characteristics if they are invariable between different sales. So, the item name will only be in the inventory table.

Continuing with the previous exercise, where the request is to list the best-selling products, both tables must be combined to produce a useful report for the user, having the following inventory and sales schema:

create table Ventas (
    id int primary key,
    item_id int not null,
    fecha date not null,
    precio decimal(6,2) not null
);

create table Inventario (
    id int primary key,
    nombre varchar(50) not null,
    cantidad int not null
);

-- MySQL
alter table Ventas
    add FOREIGN key fk_venta_inventario(item_id)
    references Inventario(id);

-- PostgreSQL
alter table Ventas
    ADD CONSTRAINT fk_venta_inventario
    FOREIGN KEY (item_id) references Inventario(id);

-- Creando algunos registros

insert into Inventario(id,nombre,cantidad)
    values(1,'Radio FM',1);
insert into Inventario(id,nombre,cantidad)
    values(2,'Radio AM',1);
insert into Inventario(id,nombre,cantidad)
    values(3,'Radio UHF',1);

insert into Ventas(id,item_id,fecha,precio)
    values (1,1,'2019-08-01',3.5);
insert into Ventas(id,item_id,fecha,precio)
    values (2,1,'2019-08-02',4);
insert into Ventas(id,item_id,fecha,precio)
    values (3,2,'2019-08-03',2);

As a note, it is highlighted that the conversion of text strings to dates is automatic if the date is written following the YYYY-MM-DD format both in MySQL and PostgreSQL.

Once the tables have been created and some data inserted, the query that provides information about the item identifier and name, together with the total sales made during the month of August 2019 for each of them, is as follows:

select i.id as item_id,
    i.nombre as item_nombre,
    sum(v.precio) as total
    from Inventario i, Ventas v
    where v.item_id = i.id
    and v.fecha >= '2019-08-01'
    and v.fecha <= '2019-08-31'
    group by i.id
    order by total desc;

For the management system to perform the intersection of both tables, it is necessary to indicate in the query which field will link the records from one table to another. For this purpose, it has been defined that the comparison of the identifiers of the items in the sales, must be equal to the identifiers of the same items in the inventory table with v.item_id = i.id. An expression of this type will always be necessary to establish connections between fields that are foreign keys of related tables.

Apart from date filters, the other important point is that the partial results of the summation of v.precio according to each different item are required, hence the grouping operation GROUP BY i.id.

The intersection of the tables as done in the example is called an inner join, because there must be a match of two records, one at each end, for a row to be shown in the results list. If the system does not find the related record either on the left or right side of the expression, the corresponding line would not be offered as a result.

This way of writing an inner join is called implicit, since the management system is required to join the tables to fulfill one of the conditions of the where clause, which compares the identifier of one with the foreign key of the other. But there is another way to force this fact more directly, using the words inner join explicitly in the statement. The previous query explicitly would be as follows:

-- reescritura con inner join
select i.id as item_id,
    i.nombre as item_nombre,
    sum(v.precio) as total
    from Inventario i
    inner join Ventas v
    on v.item_id = i.id
    where v.fecha >= '2019-08-01'
    and v.fecha <= '2019-08-31'
    group by i.id
    order by total desc;

Pay attention to how the conjunction between tables is described using inner join <table> on <comparison>. This is the correct way to make intersections, but as mentioned, this type of relationship requires that there are records on both sides of the relationship, causing item 3, Radio UHF, not to appear.

However, sometimes it is useful to display one of the endpoints even if the related record is not found. In that case, the columns of the unmatched record would be NULL unless otherwise specified in the select construction. This case would appear, for example, if you want to display all items, even if some of them have had no sales. For this purpose, another type of conjunction called left or right join can be used during the query. If the previous query is changed to a left join, the table hosted on the left side of the expression will take control, forcing the records it contains to appear.

-- reescritura con inner join
select i.id as item_id,
    i.nombre as item_nombre,
    sum(v.precio) as total
    from Inventario i
    left join Ventas v
    on v.item_id = i.id
    group by i.id
    order by total desc;

The result of the left join operation is the following table:

item_id item_nombre total
1 Radio FM 7,5
2 Radio FM 2
3 Radio FM NULL

The use of inner or left join will depend on the result needed in each case, but it is usually interesting to know these concepts, especially when the results do not match the tables used and the applied filters. If the expected records do not appear, it is quite likely that the reason is due to performing an inner join.

There are many options for making queries, and many other interesting options have been left out of this text to get the most out of SQL statements. Although with what has been explained there is more than enough to create a simple management application, mastery of SQL by the programmer is a basic and desirable requirement for anyone wishing to participate in a professional project. You can expand your knowledge of the area with numerous resources available on the Internet such as:

Transactions

SQL relational systems also include the possibility of executing a set of statements transactionally, meaning that either all of them are executed correctly or none are executed. This functionality is necessary when it is required that the database be consistent most of the time.

Imagine for a moment that some batch updates are being executed in the database, and suddenly one of them contains an error or does not pass some constraint defined in the table. That would imply that some statements may have been executed and others not. As such, there is no certainty about how the data has been left, making it extremely difficult to identify what steps need to be taken to restore the database to its previous state, where the information was correct in its entirety. Furthermore, if it occurs in a concurrent environment with several simultaneous requests. Hence, it is desirable for databases to incorporate transactionality. If the system provides this functionality, the database will automatically undo all changes made since the beginning of the transaction.

To execute statements in a transaction, the beginning and end of it must be indicated. The beginning is denoted by the begin statement, while the end depends on whether everything has gone correctly with commit. However, if an error is detected and it is desired that no changes remain persistent, everything will be undone with rollback.

The operation can be checked with a simple update that is rolled back at the end.

begin;
delete from Ventas;
rollback;
-- El borrado no se realiza
select * from Ventas;

Transactional systems offer a layer of security that, together with relational constraints, produce solid and reliable applications no matter how many updates are made. It is advisable to keep this in mind when choosing one management system over another for critical applications.

Execution of Code in the Database

SQL databases allow other functionalities that may be useful in some scenarios. The first of these is the creation of triggers that can execute procedural logic when certain situations occur, such as an insertion, modification, or deletion of a particular table. The second, along the same lines, is to create procedures and functions that contain application logic in the form of SQL statements.

PL/SQL Triggers

Small programs that are executed when manipulation occurs on a set of data are known as triggers. They are often used for audit controls or more sophisticated checks than those that can be done with check type constraints before or after insertions, modifications, or deletions.

As an example, a possible trigger is included to give the reader a brief idea that triggers exist and how they can be implemented at a later time. The syntax used is defined in PL/SQL of Procedural Language.

For example, an trigger is introduced to check amounts before an update with PL/SQL in MySQL so that this number is always between 0 and 100:

CREATE TABLE account(
    acct_num INT,
    amount DECIMAL(10,2)
);

CREATE TRIGGER upd_check
    BEFORE UPDATE ON account
        FOR EACH ROW
    BEGIN
    IF NEW.amount < 0 THEN
        SET NEW.amount = 0;
    ELSEIF NEW.amount > 100 THEN
        SET NEW.amount = 100;
    END IF;
END;

The following example, somewhat more complex, represents validations that can throw errors in PostgreSQL through a trigger that is executed before each insertion or update. The following lines check that the salary of employees (records in the emp table) is not null and greater than zero.

CREATE TABLE emp (
    empname text,
    salary integer,
    last_date timestamp,
    last_user text
);

CREATE FUNCTION emp_stamp() RETURNS trigger AS $emp_stamp$
BEGIN
-- Check that empname and salary are given
IF NEW.empname IS NULL THEN
RAISE EXCEPTION 'empname cannot be null';
END IF;
IF NEW.salary IS NULL THEN
RAISE EXCEPTION
'% cannot have null salary', NEW.empname;
END IF;

-- Who works for us when she must pay for it?
IF NEW.salary < 0 THEN
RAISE EXCEPTION
'% cannot have a negative salary', NEW.empname;
END IF;

-- Remember who changed the payroll when
NEW.last_date := current_timestamp;
NEW.last_user := current_user;
RETURN NEW;
END;
$emp_stamp$ LANGUAGE plpgsql;

CREATE TRIGGER emp_stamp BEFORE INSERT OR UPDATE ON emp
FOR EACH ROW EXECUTE PROCEDURE emp_stamp();

Triggers can be useful when there are multiple applications consuming data from the same database. Ideally, each application should apply the necessary validation rules and checks. However, to ensure data integrity, it should be the database as the central piece that performs the checks, preventing any checks from being missed between systems.

PL/SQL Stored Procedures

Using the same PL/SQL language, database management systems have the ability to define procedures and functions that can be invoked using SQL statements. These can contain logic like that shown in triggers with loops, conditionals, and execution of SQL operations on tables. All of this is executed by the database when invoking a function or procedure.

In principle, PL/SQL differentiates between functions and procedures based on the criterion that functions are intended to return one or more values given some parameters that can be executed in SQL statements like max or avg. However, procedures are not expected to return a result; they are normally used to execute SQL statements and make changes or build reports from the database.

The following procedure counts the rows of the sales table in MySQL and saves it in the indicated variable @a.

delimiter //

CREATE PROCEDURE simpleproc (OUT param1 INT)
BEGIN
SELECT COUNT(\*) INTO param1 FROM Ventas;
END//

delimiter ;

CALL simpleproc(@a);
SELECT @a;

On the other hand, in the following code snippet, a function is built that accepts an input parameter of type text and fixed width 20, returning a string of 50 letters by concatenating a prefix and suffix.

CREATE FUNCTION hello (s CHAR(20))
RETURNS CHAR(50) DETERMINISTIC
RETURN CONCAT('Hello, ',s,'!');

SELECT hello('world');

These procedures are executed by and within the database management system. The database compiles this code and generates binary code that will execute efficiently. Thus, database management systems are sophisticated software tools that encompass a large number of functionalities. This is just a small sample of the possibilities they offer. It is advisable to occasionally review the new features they include, as they remain important pieces within the services offered by the Information Technology department, constantly evolving.

The statements written by the programmer to perform functions and procedures in databases should be treated the same as Java code in a project. And like all source code, it must be controlled by a version control system and contain a versioning and deployment plan for modifications in different testing and production environments.

Conclusiones

SQL is the fundamental language for handling secure and reliable data. If you want to master this powerful language and improve your database skills, visit the website of Arteco Consulting SL now! Our experts can guide you through the basics of SQL and take you to the next level in data management. Don't miss the opportunity to advance your career in web development and programming with us!

Stay Connected

Newsletter

Stay up to date with the latest in technology and business! Subscribe to our newsletter and receive exclusive updates directly to your inbox.

Online Meeting

Don't miss the opportunity to explore new possibilities. Schedule an online meeting with us today and let's start building the future of your business together!

  • :D
  • :)
  • ;]
  • :0
  • :}
  • ;)
  • :>

Join the Team

We have a large portfolio of trainees who combine their academic training with experience at Arteco, learning firsthand from those on the front lines. We carry out an intensive training program aimed at rapid incorporation into real development teams.

Person running through the desert representing Arteco Consulting's Team Building
  • :)
  • :0
  • :D
  • ;)
  • :]