June 5, 2024
Currently, many IT teams work with agile methodologies such as Scrum and use tools such as Jira Software or Azure Boards to track activities during the Sprint. By default, these tools offer several charts for monitoring Sprint metrics. However, for more complex analyses involving each company's business rules, a customized solution is necessary.
As an alternative to that customization, this project builds a database for a Burndown chart enriched with the time spent by the team, adding an extra dimension of analysis to a metric already widely used to monitor the Sprint.
To develop the project, a sample board was created in Jira with fictitious data, simulating an IT team that uses Scrum.
Then, based on the data in Jira, an ELT process is carried out to make the data available for building the final chart in Power BI, as shown in the following image:
These tools were chosen because they allow the project to be carried out at zero cost, making the solution applicable even to small companies without adding maintenance costs. Furthermore, if a company already uses these or similar tools, the project can be adapted to its specifications and needs.
Cost breakdown per tool:
To extract data from Jira, it must be configured as a source in Airbyte: in Sources, search for and select the Jira connector. First, you need an API token to connect to Jira. If your Jira account does not already have a token, simply follow the Jira documentation linked in Airbyte to obtain one. The source can then be configured in Airbyte, with options to filter the project and set a start date:
The data will be loaded into a PostgreSQL database to be stored and transformed. To keep the solution in an isolated environment, PostgreSQL will run in a Docker container. Docker Desktop is secure, out-of-the-box containerization software that offers developers and teams a robust, hybrid toolkit to build, share, and run applications anywhere.
To install Docker locally, simply follow the instructions for your operating system at https://www.docker.com/products/docker-desktop/.
Docker allows us to create a fully customized container with the necessary software, files, and configurations from a Dockerfile. The Dockerfile produces a Docker image that serves as a template for creating containers with the configurations it defines.
Due to the Airbyte update process, which will be explained in section 9, some customizations are necessary in the container. Therefore, the container will be created from a customized image, built with the following Dockerfile (a file named Dockerfile, without extension):
# Get postgres image in version 16.1 as base
FROM postgres:16.1
# Update and install cron and vim
RUN apt-get update && apt-get install -y cron vim
# Set cron time zone to Sao Paulo
RUN echo "America/Sao_Paulo" > /etc/timezone
# Set the container time zone to Sao Paulo
RUN ln -sf /usr/share/zoneinfo/America/Sao_Paulo /etc/localtime
# Copy crontab file with schedule settings
COPY crontab /var/spool/cron/crontabs/root
# Copy the shell scripts to be executed by crontab
COPY drop_view.sh /tmp
COPY create_view.sh /tmp
# Give execution permission to the files
RUN chmod +x /tmp/drop_view.sh
RUN chmod +x /tmp/create_view.sh
With the Dockerfile created, open a terminal in the folder where the file is located and run the command below (note: the drop_view.sh, create_view.sh, and crontab files must already exist; their contents are presented in section 9):
docker build -t postgres-cron-vim:16.1 .
Then run the command below to create the container, passing the environment variables:
docker run -d --name dashboardscrum -p 5579:5432 -e POSTGRES_DB=DashboardScrum -e POSTGRES_USER=user -e POSTGRES_PASSWORD=XXXX postgres-cron-vim:16.1
Note: PostgreSQL's default port is 5432, but if the local machine is already running PostgreSQL or port 5432 is otherwise in use, port mapping can be applied (-p 5579:5432). In that case, access is through port 5579 on the local machine, which is redirected to port 5432 inside the container, where PostgreSQL runs on its default port.
To access the database, pgAdmin can be used, available for download at https://www.pgadmin.org/download/. In this project, pgAdmin was installed locally, but it can also be installed and accessed through a Docker container.
After accessing the pgAdmin screen, right-click on Servers > Register > Server...
Then carry out the configuration according to the port and environment variables defined in the container creation command.
To load data into PostgreSQL, it must be configured as a destination in Airbyte: in Destinations, search for and select the Postgres connector and configure it according to the environment variables defined in the container creation command:
To configure the connection in Airbyte, in Connections, select the source (Jira) and the destination (Postgres). When selecting streams, choose the tables with their respective sync modes (by default, some other tables come pre-selected and must be deselected):
Note: in the stream selection it is also possible to choose which columns will be extracted from each table, so the selected columns can be adapted to the company's needs.
Next, configure the connection's name and set its schedule to manual, so that the first load into the database can be triggered by hand.
After configuring the connection, open it under Connections and click "Sync now" to perform the first load into the database.
To build the Burndown chart, the following information will be needed:
To make this data available for use in Power BI, a view containing this information will be built.
After exploratory analysis and data manipulation in pgAdmin, four functions were designed to be used in building the view. The next topics present their respective code.
CREATE FUNCTION cont_dias_sprint(sprint_name VARCHAR)
RETURNS SETOF NUMERIC AS $$
-- Select the number of Sprint days minus 1 for division in the planned column
SELECT COUNT(intervalo) - 1
FROM generate_series(
-- Generate a series of dates between the start and end days of the sprint
(SELECT "startDate" FROM public.sprints WHERE name = sprint_name),
(SELECT "endDate" FROM public.sprints WHERE name = sprint_name),
INTERVAL '1 day'
) AS intervalo
-- Extract the days of the week and filter to remove Saturday and Sunday
WHERE EXTRACT(dow FROM intervalo) NOT IN (6, 0);
$$ LANGUAGE sql;
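The weekday-counting logic of cont_dias_sprint can be sketched in Python. This is illustrative only (the project itself uses the SQL function above); it operates on plain dates rather than the sprints table:

```python
from datetime import date, timedelta

def cont_dias_sprint(start: date, end: date) -> int:
    """Count the weekdays (Mon-Fri) between start and end, inclusive,
    minus 1 -- mirroring the SQL function used as the divisor for the
    planned (ideal) burndown line."""
    days = 0
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0-4 = Monday-Friday
            days += 1
        current += timedelta(days=1)
    return days - 1

# A two-week sprint starting on a Monday has 10 weekdays -> 9 intervals
print(cont_dias_sprint(date(2024, 6, 3), date(2024, 6, 14)))  # 9
```

Subtracting 1 matters because a sprint with N working days has N - 1 daily decrements between its first and last point on the planned line.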
CREATE FUNCTION estimado_total(sprint_name VARCHAR)
RETURNS SETOF NUMERIC AS $$
-- Sum the estimate for each card and bring it into relation with the sprint
SELECT SUM((CAST((i.fields -> 'timeoriginalestimate') AS INTEGER))/3600)
FROM public.issues AS i
JOIN public.sprint_issues AS si ON i.key = si.key
JOIN public.sprints AS s ON si."sprintId" = s.Id
WHERE s.name = sprint_name
GROUP BY s.name;
$$ LANGUAGE sql;
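The seconds-to-hours conversion in estimado_total, including the truncation caused by integer division, can be sketched in Python. The function below is illustrative only, operating on a plain list of per-issue estimates instead of the Jira tables:

```python
def estimado_total(estimates_seconds):
    """Sum per-issue 'timeoriginalestimate' values (in seconds) converted
    to hours. Integer division per row mirrors the SQL CAST(...)/3600,
    which truncates fractional hours."""
    return sum(e // 3600 for e in estimates_seconds)

# Cards estimated at 2h, 4h and 1.5h: the 1.5h card truncates to 1h
print(estimado_total([7200, 14400, 5400]))  # 7
```

If fractional hours should be preserved, the SQL would need to divide by 3600.0 instead; the truncating behavior above matches the view as written.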
CREATE OR REPLACE FUNCTION planejado_ideal(sprint_name VARCHAR)
RETURNS TABLE (dia DATE, planejado NUMERIC) AS $$
DECLARE
-- Declare variable to be used in the loop starting with the total estimated value for the sprint
current_value NUMERIC := estimado_total(sprint_name);
-- Declare increment variable
increment_value NUMERIC := estimado_total(sprint_name) / cont_dias_sprint(sprint_name);
BEGIN
FOR dia IN
-- Loop to run through sprint days
SELECT intervalo
FROM generate_series(
(SELECT "startDate" FROM public.sprints WHERE name = sprint_name),
(SELECT "endDate" FROM public.sprints WHERE name = sprint_name),
INTERVAL '1 day') AS intervalo
WHERE EXTRACT(dow FROM intervalo) NOT IN (6, 0)
LOOP
-- Starting with the first value as the estimated sprint total
planejado := current_value;
RETURN NEXT;
-- Update next column value being the previous one minus the increment
current_value := current_value - increment_value;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
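The loop in planejado_ideal can be sketched in Python. Here num_days is the sprint's weekday count and the divisor is num_days - 1, matching cont_dias_sprint's "count minus 1"; the function is illustrative only:

```python
def planejado_ideal(total_hours, num_days):
    """Generate the ideal burndown line: start at the sprint's total
    estimate and subtract an equal increment each weekday, mirroring
    the plpgsql loop. num_days is the number of weekdays in the sprint."""
    increment = total_hours / (num_days - 1)
    return [total_hours - i * increment for i in range(num_days)]

# A 40h sprint over 5 weekdays burns 10h per day on the ideal line
print(planejado_ideal(40, 5))  # [40.0, 30.0, 20.0, 10.0, 0.0]
```

Note that the line starts at the full estimate and reaches exactly zero on the last day, which is why the divisor is one less than the number of days plotted.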
CREATE OR REPLACE FUNCTION horas_restantes(sprint_name VARCHAR, prev_horas_entregues NUMERIC, datas DATE)
RETURNS TABLE(dia DATE, horas_restantes NUMERIC) AS $$
DECLARE
-- Set cursor to run through all days of each sprint
tbl_cursor CURSOR FOR (
SELECT s.name AS sprint, data.data :: DATE
FROM public.sprints AS s
CROSS JOIN generate_series(s."startDate", s."endDate", interval '1 day') AS data
WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0) AND s.name = sprint_name
ORDER BY data
);
tbl_row RECORD;
-- Set initial value as estimated total for sprint
horas_restantes NUMERIC := estimado_total(sprint_name);
-- Define control variables so that each iteration can use the previous day's delivered hours
horas_entregues NUMERIC := 0;
current_value NUMERIC := 0;
BEGIN
OPEN tbl_cursor;
LOOP
-- Set stopping criteria for the cursor
FETCH FROM tbl_cursor INTO tbl_row;
EXIT WHEN NOT FOUND;
-- Select delivered hours
SELECT COALESCE(SUM((issues.fields ->> 'timeoriginalestimate')::INTEGER / 3600), 0)
INTO horas_entregues
FROM issues
WHERE to_date(issues.fields ->> 'resolutiondate', 'YYYY-MM-DD') = tbl_row.data;
-- Subtract the value in the remaining hours column from the value of delivered hours
horas_restantes := horas_restantes - current_value;
-- Store the value of delivered hours on date x, so that in the next iteration this value
-- will be subtracted, representing the hours delivered on the previous date (x-1)
current_value := horas_entregues;
-- Returning the date and remaining hours in each iteration
RETURN QUERY SELECT tbl_row.data, horas_restantes;
END LOOP;
CLOSE tbl_cursor;
END;
$$ LANGUAGE plpgsql;
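The one-day lag implemented with current_value in the cursor loop above (each day's row subtracts the previous day's delivered hours) can be sketched in Python. The list-based function below is illustrative only:

```python
def horas_restantes(total_hours, delivered_per_day):
    """Walk the sprint days, subtracting the PREVIOUS day's delivered
    hours from the running remainder -- mirroring the one-iteration lag
    in the plpgsql cursor loop."""
    remaining = total_hours
    prev_delivered = 0
    result = []
    for delivered_today in delivered_per_day:
        remaining -= prev_delivered
        result.append(remaining)
        prev_delivered = delivered_today
    return result

# 40h sprint with 0h, 16h, 8h, 8h and 8h delivered on each weekday:
# each day's point reflects what was closed up to the previous day
print(horas_restantes(40, [0, 16, 8, 8, 8]))  # [40, 40, 24, 16, 8]
```

The lag means the chart shows hours still open at the start of each day, which is the usual convention for a burndown point.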
CREATE OR REPLACE VIEW public.vw_dadosburndown AS
WITH dados_sprint AS (
SELECT
s.name AS sprint,
data.data :: DATE
FROM public.sprints AS s
CROSS JOIN generate_series(s."startDate", s."endDate", INTERVAL '1 day') AS data
WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0)
ORDER BY data
),
dados_horas_entregues AS(
SELECT
SUM((CAST((fields -> 'timeoriginalestimate') AS INTEGER))/3600) AS horas_entregues,
to_date(fields ->> 'resolutiondate', 'YYYY-MM-DD') AS data_encerramento
FROM public.issues
GROUP BY data_encerramento
),
prev_dados_tempo_gasto AS(
SELECT
CAST((jsonb_array_elements(fields -> 'worklog' -> 'worklogs') -> 'timeSpentSeconds') AS INTEGER)/3600 AS tempo_gasto,
to_date(jsonb_array_elements(fields -> 'worklog' -> 'worklogs') ->> 'updated', 'YYYY-MM-DD') AS dia_gasto
FROM public.issues
),
dados_tempo_gasto AS(
SELECT
dia_gasto,
SUM(tempo_gasto) AS tempo_gasto
FROM prev_dados_tempo_gasto
GROUP BY dia_gasto
)
SELECT
ds.sprint,
ds.data,
dhe.horas_entregues,
dtg.tempo_gasto,
plan.planejado,
he.horas_restantes
FROM dados_sprint AS ds
LEFT JOIN LATERAL planejado_ideal(ds.sprint) AS plan ON ds.data = plan.dia
LEFT JOIN dados_horas_entregues AS dhe ON ds.data = dhe.data_encerramento
LEFT JOIN dados_tempo_gasto AS dtg ON ds.data = dtg.dia_gasto
JOIN horas_restantes(ds.sprint, dhe.horas_entregues, ds.data) AS he ON ds.data = he.dia
ORDER BY ds.sprint,ds.data;
In pgAdmin, the view returns the following table of data. Next, PostgreSQL will be connected to Power BI to build the Burndown chart with time spent.
In Power BI, the "PostgreSQL database" source must be used. For this project, Power BI was accessed from another machine, so in the server field it is necessary to configure the IP and port (if a different port was mapped when creating the container):
Afterward, enter the username and password according to the environment variables defined in the container creation command:
And finally, select the created view from the available tables:
As can be seen in the chart, for the sprint in question, deliveries ran late (remaining hours above the planned line) until 10/06/2024. Then more time spent is recorded on that day and the next, which may explain the improved performance in the last days of the sprint. Furthermore, tracking the time spent per day can also help a team mature when it does not yet have good time control.
To update the data, a schedule can be set in Airbyte. However, this project requires a small adaptation. The "sprints" table does not have a column that would allow an incremental append of data, so its sync mode is "Full refresh | Overwrite". In this mode, Airbyte drops (deletes) the table and then rebuilds it with the updated data. However, since the "sprints" table is used by the database view, it cannot be dropped while the view exists. To overcome this, the proposed solution is to schedule the drop and re-creation of the view with cron inside the container, and run the Airbyte update between these two commands.
This is why, when creating the container, the files drop_view.sh, create_view.sh, and crontab were copied: respectively, the script to drop the view, the script to recreate the view, and the cron schedule. The next topics present these files.
#!/bin/bash
# Set environment variables for PostgreSQL
export PGHOST=localhost
export PGUSER=user
export PGPASSWORD=XXXX
export PGDATABASE=DashboardScrum
# Execute the drop view command
psql -c "DROP VIEW IF EXISTS public.vw_dadosburndown;"
#!/bin/bash
# Set environment variables for PostgreSQL
export PGHOST=localhost
export PGUSER=user
export PGPASSWORD=XXXX
export PGDATABASE=DashboardScrum
# Variable with view creation
VIEW_DEFINITION="
CREATE OR REPLACE VIEW public.vw_dadosburndown AS
WITH dados_sprint AS (
SELECT
s.name AS sprint,
data.data :: DATE
FROM public.sprints AS s
CROSS JOIN generate_series(s.\"startDate\", s.\"endDate\", INTERVAL '1 day') AS data
WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0)
ORDER BY data
),
dados_horas_entregues AS(
SELECT
SUM((CAST((fields -> 'timeoriginalestimate') AS INTEGER))/3600) AS horas_entregues,
to_date(fields ->> 'resolutiondate', 'YYYY-MM-DD') AS data_encerramento
FROM public.issues
GROUP BY data_encerramento
),
prev_dados_tempo_gasto AS(
SELECT
CAST((jsonb_array_elements(fields -> 'worklog' -> 'worklogs') -> 'timeSpentSeconds') AS INTEGER)/3600 AS tempo_gasto,
to_date(jsonb_array_elements(fields -> 'worklog' -> 'worklogs') ->> 'updated', 'YYYY-MM-DD') AS dia_gasto
FROM public.issues
),
dados_tempo_gasto AS(
SELECT
dia_gasto,
SUM(tempo_gasto) AS tempo_gasto
FROM prev_dados_tempo_gasto
GROUP BY dia_gasto
)
SELECT
ds.sprint,
ds.data,
dhe.horas_entregues,
dtg.tempo_gasto,
plan.planejado,
he.horas_restantes
FROM dados_sprint AS ds
LEFT JOIN LATERAL planejado_ideal(ds.sprint) AS plan ON ds.data = plan.dia
LEFT JOIN dados_horas_entregues AS dhe ON ds.data = dhe.data_encerramento
LEFT JOIN dados_tempo_gasto AS dtg ON ds.data = dtg.dia_gasto
JOIN horas_restantes(ds.sprint, dhe.horas_entregues, ds.data) AS he ON ds.data = he.dia
ORDER BY ds.sprint,ds.data;"
# Executing the view creation command
psql -c "$VIEW_DEFINITION"
0 3 * * 1-5 /tmp/drop_view.sh >> /tmp/cron_drop.log 2>&1
0 5 * * 1-5 /tmp/create_view.sh >> /tmp/cron_create.log 2>&1
This way, the scripts run at 3:00 am and 5:00 am, Monday to Friday, and their output and errors are recorded in the cron_drop.log and cron_create.log files in the container's /tmp directory.
Note: with >> all executions are appended to the log; with > only the output of the last execution is kept.
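The same overwrite-versus-append behavior can be illustrated in Python, where file mode "w" behaves like > and "a" like >> (the function and file names here are illustrative):

```python
import os
import tempfile

def write_runs(path: str, mode: str, runs) -> str:
    """Write each run's output to `path` using the given file mode:
    'w' mirrors the shell's '>' (truncate on every open),
    'a' mirrors '>>' (append, preserving history). Returns the
    file's final contents."""
    for run in runs:
        with open(path, mode) as f:
            f.write(run + "\n")
    with open(path) as f:
        return f.read()

log = os.path.join(tempfile.mkdtemp(), "cron_demo.log")
print(write_runs(log, "w", ["run 1", "run 2"]))  # only "run 2" survives
print(write_runs(log, "a", ["run 3", "run 4"]))  # earlier contents are kept
```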
To activate the cron jobs, open a shell in the container's terminal and run the following command:
service cron start
To check whether cron is actually running, use the command:
service cron status
Finally, schedule the update in Airbyte, which can be done in the connection's Settings, as shown in the following image:
This way, at 3:00 am the view is dropped from the database, at 4:00 am Airbyte updates the database, and at 5:00 am the view is recreated, leaving the data available to be refreshed in Power BI as usual. This is just a suggested configuration; the scheduling can be changed according to the company's needs.
The Burndown with time spent chart is just one example of what can be done by extracting raw data from Jira and transforming it, and this transformation can be adapted to each company's needs and business rules. It is worth remembering that there are plug-ins and applications that can be added to Jira for customized charts; however, they are not always free, and building the solution this way gives much greater flexibility to design the charts and implement the company's business rules.
If you have any questions or suggestions, or just want to talk about this project, feel free to reach out.
E-mail: gabrielapbarros15@gmail.com
LinkedIn: https://www.linkedin.com/in/gabriela-pereira-barros/