Dashboard Scrum¶

June 5, 2024

DashboardIcon.png

Summary

  • 1. Business problem
  • 2. Solution proposal
    • 2.1 Tools and cost
  • 3. Set up the source on Airbyte
  • 4. Preparing the database
    • 4.1 Using Docker
    • 4.2 Dockerfile build
    • 4.3 Creating image and container on Docker
    • 4.4 Accessing the database
  • 5. Set up the destination on Airbyte
  • 6. Set up the connection on Airbyte
  • 7. Data transform
    • 7.1 Function that returns the sprint day count
    • 7.2 Function that returns the estimated total hours of the sprint
    • 7.3 Function that determines ideal planning line
    • 7.4 Function that calculates remaining hours
    • 7.5 View
  • 8. Data visualization
  • 8.1 Connection to Power BI
    • 8.2 Burndown with Time Spent Chart
  • 9. Updating the data
    • 9.1 Command to delete view (drop_view.sh)
    • 9.2 Command to create view (create_view.sh)
    • 9.3 Scheduling command in CRON (crontab)
    • 9.4 Starting CRON in the container
    • 9.5 Scheduling update in Airbyte
  • 10. Conclusion

1. Business problem ¶

Currently, many IT teams work with agile methodologies such as Scrum and use tools such as Jira Software or Azure Boards to track activities during the sprint. By default, these tools offer standard charts for monitoring sprint metrics; however, more complex analyses involving each company's business rules require a customized solution.

2. Solution proposal ¶

To provide that customization, this project builds a database that feeds a Burndown chart enriched with the team's time spent, adding an extra dimension of analysis to a metric already widely used to monitor the sprint.

To develop the project, a sample board was created on Jira, with fictitious data, simulating an IT team that uses Scrum.

Then, based on the data in Jira, an ELT process will be carried out to make the data available to build the final graph in Power BI, as shown in the following image:

English-ELTDashboardScrum.png

2.1 Tools and cost ¶

These tools were chosen because they allow the project to be carried out at zero cost, making the solution viable even for small companies with no budget for maintenance. Furthermore, if a company already uses these or similar tools, the project can be adapted to its specifications and needs.

Cost breakdown per tool:

  • Jira: free for up to 10 users (https://www.atlassian.com/software/jira/pricing).
  • Airbyte: free for local deployment with some limitations, for example, it can be used by only one user (https://airbyte.com/pricing).
  • Docker: free up to a limit of 250 employees or US$10 million in annual revenue (https://www.docker.com/pricing/).
  • PostgreSQL: completely free according to the official website license (https://www.postgresql.org/about/licence/).
  • PgAdmin: completely free according to the official website license (https://www.pgadmin.org/licence/).
  • Power BI: free for local use and without the possibility of scheduling (https://powerbi.microsoft.com/en-us/pricing/).

3. Set up the source on Airbyte ¶

To extract data from Jira, it must be configured as a source in Airbyte. In Airbyte, under Sources, search for and select the Jira connector. You will need a Jira API token to connect; if your Jira account does not have one yet, follow the Jira documentation linked from Airbyte to create it. The source can then be configured, with options to filter by project and to set a start date:

JiraFonte.png

4. Preparing the database ¶

4.1 Using Docker ¶

The data will be loaded into a PostgreSQL database to be stored and transformed. To keep the solution in an isolated environment, PostgreSQL will run inside a Docker container. Docker Desktop is secure, out-of-the-box containerization software that offers developers and teams a robust toolkit to build, share, and run applications anywhere.

To install Docker locally, simply follow the instructions for your operating system at https://www.docker.com/products/docker-desktop/.

Docker lets us define a fully customized container, with the required software, files, and configuration, in a Dockerfile. Building the Dockerfile produces a Docker image, which then serves as the template for creating containers with those settings.

4.2 Dockerfile build ¶

Because of the Airbyte update process, which will be explained in section 9, the container needs some customizations. It will therefore be created from a custom image, built with the following Dockerfile (a file named Dockerfile, with no extension):

# Get postgres image in version 16.1 as base
FROM postgres:16.1

# Update and install cron and vim
RUN apt-get update && apt-get install -y cron vim

# Set cron time zone to Sao Paulo
RUN echo "America/Sao_Paulo" > /etc/timezone

# Set the container time zone to Sao Paulo
RUN ln -sf /usr/share/zoneinfo/America/Sao_Paulo /etc/localtime

# Copy crontab file with schedule settings
COPY crontab /var/spool/cron/crontabs/root

# Copy the shell scripts to be executed by crontab
COPY drop_view.sh /tmp
COPY create_view.sh /tmp

# Give execution permission to the files
RUN chmod +x /tmp/drop_view.sh
RUN chmod +x /tmp/create_view.sh

4.3 Creating image and container on Docker ¶

With the Dockerfile created, open a terminal in the folder containing the file and run the command below (note: the drop_view.sh, create_view.sh, and crontab files must already exist; their contents are presented in section 9):

docker build -t postgres-cron-vim:16.1 .

Then run the command below to create the container, passing the environment variables.

docker run -d --name dashboardscrum -p 5579:5432 -e POSTGRES_DB=DashboardScrum -e POSTGRES_USER=user -e POSTGRES_PASSWORD=XXXX postgres-cron-vim:16.1

Note: PostgreSQL's default port is 5432, but if the local machine is already running PostgreSQL, or port 5432 is otherwise in use, port mapping can be applied (-p 5579:5432). In that case, access goes through port 5579 on the local machine and is redirected to port 5432 inside the container, where PostgreSQL runs on its default port.

4.4 Accessing the database ¶

To access the database, pgAdmin can be used, available for download at https://www.pgadmin.org/download/. In this project, pgAdmin was installed locally, but it can also be installed and accessed through a Docker container.

After accessing the pgAdmin screen, right-click on Servers > Register > Server...

Then configure the connection according to the port and environment variables defined in the container creation command.

English-ConexãoPgAdmin.png

5. Set up the destination on Airbyte ¶

To load data into PostgreSQL, it must be configured as a destination in Airbyte. In Airbyte, under Destinations, search for and select the Postgres connector and configure it according to the environment variables defined in the container creation command:

DBdestino.png

6. Set up the connection on Airbyte ¶

To configure the connection in Airbyte, in connections, select the source (Jira) and destination (Postgres). When selecting streams, select the tables, with their respective Sync mode (by default, some other tables are already selected and must be deselected):

  • issues - Incremental | Append + Deduped
  • sprint_issues - Incremental | Append + Deduped
  • sprints - Full refresh | Overwrite

Sele%C3%A7%C3%A3oStreamAirbytepng-2.png

Note: in the stream selection it is also possible to choose which columns will be extracted from each table, so the selection can be adapted to the company's needs.

Next, configure the connection's name and set the schedule to Manual, so that the first load into the database can be triggered by hand.

Conex%C3%A3oAirbyte-2.png

After configuring the connection, access it in connections and click "Sync now" to perform the first load in the database.

ConexãoAirbyte.png

7. Data transform ¶

To build the Burndown chart, the following information will be needed:

  • Sprint;
  • Date;
  • Delivered hours per day;
  • Time spent per day;
  • Planned hours per day;
  • Remaining hours per day (calculated as total hours minus what was delivered the previous day).

In order to make this data available for use in Power BI, a view containing this information will be built.

After an exploratory analysis and some data manipulation in pgAdmin, four functions were built to be used in composing the view. The next topics present their code.

7.1 Function that returns the sprint day count ¶

CREATE FUNCTION cont_dias_sprint(sprint_name VARCHAR)
RETURNS SETOF NUMERIC AS $$
-- Select the number of Sprint days minus 1 for division in the planned column
SELECT COUNT(intervalo) - 1
FROM generate_series(
    -- Generate a series of dates between the start and end days of the sprint
    (SELECT "startDate" FROM public.sprints WHERE name = sprint_name),
    (SELECT "endDate" FROM public.sprints WHERE name = sprint_name),
    INTERVAL '1 day'
) AS intervalo
-- Extract the days of the week and filter to remove Saturday and Sunday
WHERE EXTRACT(dow FROM intervalo) NOT IN (6, 0);
$$ LANGUAGE sql;
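The logic of cont_dias_sprint, generating a daily series between the sprint's start and end dates, dropping weekends, and subtracting one, can be sketched in Python. This is only an illustrative sketch with hypothetical dates, not part of the project code:

```python
from datetime import date, timedelta

def count_sprint_days(start: date, end: date) -> int:
    """Count weekdays between start and end (inclusive), minus 1,
    mirroring the cont_dias_sprint SQL function."""
    days = 0
    current = start
    while current <= end:
        # weekday(): Monday=0 ... Sunday=6; keep Monday to Friday only
        if current.weekday() < 5:
            days += 1
        current += timedelta(days=1)
    return days - 1

# A two-week sprint from Monday 2024-06-03 to Friday 2024-06-14
print(count_sprint_days(date(2024, 6, 3), date(2024, 6, 14)))  # 10 weekdays - 1 = 9
```

The subtraction of one day exists because the result is used as a divisor when spreading the total estimate across the sprint: n days produce n - 1 decrements between the first and last points of the planned line.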

7.2 Function that returns the estimated total hours of the sprint ¶

CREATE FUNCTION estimado_total(sprint_name VARCHAR)
RETURNS SETOF NUMERIC AS $$
-- Sum the estimate for each card and bring it into relation with the sprint
SELECT SUM((CAST((i.fields -> 'timeoriginalestimate') AS INTEGER))/3600)
FROM public.issues AS i
JOIN public.sprint_issues AS si ON i.key = si.key
JOIN public.sprints AS s ON si."sprintId" = s.Id
WHERE s.name = sprint_name
GROUP BY s.name;
$$ LANGUAGE sql;
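The arithmetic in estimado_total is simply seconds-to-hours conversion followed by a sum. A Python sketch with hypothetical issue records (Jira stores timeoriginalestimate in seconds):

```python
# Hypothetical issue records mimicking Jira's 'timeoriginalestimate' field (seconds)
issues = [
    {"key": "SCRUM-1", "timeoriginalestimate": 28800},  # 8 h
    {"key": "SCRUM-2", "timeoriginalestimate": 14400},  # 4 h
    {"key": "SCRUM-3", "timeoriginalestimate": 7200},   # 2 h
]

# Same arithmetic as estimado_total: integer seconds / 3600 = hours, then sum
total_hours = sum(i["timeoriginalestimate"] // 3600 for i in issues)
print(total_hours)  # 14
```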

7.3 Function that determines ideal planning line ¶

CREATE OR REPLACE FUNCTION planejado_ideal(sprint_name VARCHAR)
RETURNS TABLE (dia DATE, planejado NUMERIC) AS $$

DECLARE
  -- Declare variable to be used in the loop starting with the total estimated value for the sprint
  current_value NUMERIC(8,6) := estimado_total(sprint_name);
  -- Declare increment variable
  increment_value NUMERIC(8,6) := estimado_total(sprint_name) / cont_dias_sprint(sprint_name);

BEGIN

  FOR dia IN
    -- Loop to run through sprint days
    SELECT intervalo
    FROM generate_series(
        (SELECT "startDate" FROM public.sprints WHERE name = sprint_name),
        (SELECT "endDate" FROM public.sprints WHERE name = sprint_name),
        INTERVAL '1 day') AS intervalo
    WHERE EXTRACT(dow FROM intervalo) NOT IN (6, 0)
  LOOP
    -- Starting with the first value as the estimated sprint total
    planejado := current_value;
    RETURN NEXT;
    -- Update next column value being the previous one minus the increment
    current_value := current_value - increment_value;
  END LOOP;

  RETURN;

END;

$$ LANGUAGE plpgsql;
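The loop in planejado_ideal produces a straight line from the total estimate down to zero, subtracting a fixed increment after emitting each value. The same idea in a short Python sketch (illustrative only; values are hypothetical):

```python
def ideal_line(total_hours: float, sprint_days: int) -> list[float]:
    """Ideal burndown values per sprint day: start at the total estimate
    and decrease by a fixed increment (total / days), as in the
    planejado_ideal PL/pgSQL function."""
    increment = total_hours / sprint_days
    return [total_hours - increment * day for day in range(sprint_days + 1)]

# A 40-hour sprint spread over 4 decrements (5 weekday points)
print(ideal_line(40, 4))  # [40.0, 30.0, 20.0, 10.0, 0.0]
```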

7.4 Function that calculates remaining hours ¶

CREATE OR REPLACE FUNCTION horas_restantes(sprint_name VARCHAR, prev_horas_entregues NUMERIC, datas DATE)
RETURNS TABLE(dia DATE, horas_restantes NUMERIC) AS $$

DECLARE
    -- Set cursor to run through all days of each sprint
    tbl_cursor CURSOR FOR (
        SELECT s.name AS sprint, data.data :: DATE
        FROM public.sprints AS s
        CROSS JOIN generate_series(s."startDate", s."endDate", interval '1 day') AS data
        WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0) AND s.name = sprint_name
        ORDER BY data
    );
    tbl_row RECORD;
    -- Set initial value as estimated total for sprint
    horas_restantes NUMERIC := estimado_total(sprint_name);
    -- Define control variables so that in each iteration can be select the previous day's delivered hours
    horas_entregues NUMERIC := 0;
    current_value NUMERIC := 0;

BEGIN

    OPEN tbl_cursor;

    LOOP
        -- Set stopping criteria for the cursor
        FETCH FROM tbl_cursor INTO tbl_row;
        EXIT WHEN NOT FOUND;
        -- Select delivered hours
        SELECT COALESCE(SUM((issues.fields ->> 'timeoriginalestimate')::INTEGER / 3600), 0)
        INTO horas_entregues
        FROM issues
        WHERE to_date(issues.fields ->> 'resolutiondate', 'YYYY-MM-DD') = tbl_row.data;
        -- Subtract the value in the remaining hours column from the value of delivered hours
        horas_restantes := horas_restantes - current_value;
        -- Store the value of delivered hours on date x, so that in the next interaction this value 
        -- will be subtracted, representing the value of hours delivered on the previous date (x-1)
        current_value := horas_entregues;
        -- Returning the date and remaining hours in each iteration
        RETURN QUERY SELECT tbl_row.data, horas_restantes;
    END LOOP;

    CLOSE tbl_cursor;

END;

$$ LANGUAGE plpgsql;
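The cursor logic in horas_restantes has a deliberate one-day lag: each day subtracts the hours delivered on the previous day, so the line only drops after a delivery. A Python sketch of that lag, with hypothetical per-day deliveries:

```python
def remaining_hours(total: float, delivered_per_day: list[float]) -> list[float]:
    """Remaining hours per day: each day subtracts the hours delivered
    on the *previous* day, mirroring the cursor in horas_restantes."""
    remaining = total
    prev_delivered = 0.0
    result = []
    for delivered in delivered_per_day:
        remaining -= prev_delivered
        result.append(remaining)
        prev_delivered = delivered  # carried into the next iteration
    return result

# 40-hour sprint; 8 h delivered on day 1, 12 h on day 3, 10 h on days 4 and 5
print(remaining_hours(40, [8, 0, 12, 10, 10]))  # [40.0, 32.0, 32.0, 20.0, 10.0]
```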

7.5 View ¶

CREATE OR REPLACE VIEW public.vw_dadosburndown AS
WITH dados_sprint AS (
    SELECT 
        s.name AS sprint, 
        data.data :: DATE
    FROM public.sprints AS s
    CROSS JOIN generate_series(s."startDate", s."endDate", INTERVAL '1 day') AS data
    WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0)
    ORDER BY data
),
dados_horas_entregues AS(
    SELECT 
        SUM((CAST((fields -> 'timeoriginalestimate') AS INTEGER))/3600) AS horas_entregues,
        to_date(fields ->> 'resolutiondate', 'YYYY-MM-DD') AS data_encerramento
    FROM public.issues
    GROUP BY data_encerramento
),
prev_dados_tempo_gasto AS(
    SELECT
        CAST((jsonb_array_elements(fields -> 'worklog' -> 'worklogs') -> 'timeSpentSeconds') AS INTEGER)/3600 AS tempo_gasto,
        to_date(jsonb_array_elements(fields -> 'worklog' -> 'worklogs') ->> 'updated', 'YYYY-MM-DD') AS dia_gasto
    FROM public.issues
),
dados_tempo_gasto AS(   
    SELECT
        dia_gasto,
        SUM(tempo_gasto) AS tempo_gasto
    FROM prev_dados_tempo_gasto
    GROUP BY dia_gasto
)
SELECT
    ds.sprint,
    ds.data,
    dhe.horas_entregues,
    dtg.tempo_gasto,
    plan.planejado,
    he.horas_restantes
FROM dados_sprint AS ds
LEFT JOIN LATERAL planejado_ideal(ds.sprint) AS plan ON ds.data = plan.dia
LEFT JOIN dados_horas_entregues AS dhe ON ds.data = dhe.data_encerramento
LEFT JOIN dados_tempo_gasto AS dtg ON ds.data = dtg.dia_gasto
JOIN horas_restantes(ds.sprint, dhe.horas_entregues, ds.data) AS he ON ds.data = he.dia
ORDER BY ds.sprint,ds.data;
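The prev_dados_tempo_gasto CTE flattens the nested worklog array inside each issue's fields JSON (via jsonb_array_elements) and the following CTE sums hours per day. The equivalent flattening and grouping in Python, using hypothetical worklog data shaped like Jira's structure:

```python
from collections import defaultdict

# Hypothetical 'fields' JSON for two issues, shaped like Jira's worklog structure
issues_fields = [
    {"worklog": {"worklogs": [
        {"timeSpentSeconds": 7200, "updated": "2024-06-10T15:00:00.000-0300"},
        {"timeSpentSeconds": 3600, "updated": "2024-06-11T10:00:00.000-0300"},
    ]}},
    {"worklog": {"worklogs": [
        {"timeSpentSeconds": 10800, "updated": "2024-06-10T18:30:00.000-0300"},
    ]}},
]

# Flatten worklog entries (like jsonb_array_elements) and sum hours per day
time_spent = defaultdict(int)
for fields in issues_fields:
    for log in fields["worklog"]["worklogs"]:
        day = log["updated"][:10]               # keep only YYYY-MM-DD
        time_spent[day] += log["timeSpentSeconds"] // 3600

print(dict(time_spent))  # {'2024-06-10': 5, '2024-06-11': 1}
```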

8. Data visualization ¶

In pgAdmin, the view returns the following table of data. Next, PostgreSQL will be connected to Power BI to build the Burndown chart with time spent.

English-DadosView.png

8.1 Connection to Power BI ¶

In Power BI, select the "PostgreSQL database" source. In this project, Power BI was accessed from another machine, so the server field must be configured with the host's IP and port (if a different port was mapped when creating the container):

English-ConexãoPowerBI_1.png

Afterward, enter username and password according to the environment variables defined in the container creation command:

English-ConexãoPowerBI_2.png

And finally select the view created from the available tables:

English-ConexãoPowerBI_3.png

8.2 Burndown with Time Spent Chart ¶

English-Burndown.png

As the chart shows, deliveries for this sprint were behind schedule (remaining hours above the planned line) until 10/06/2024. After that, more time spent is recorded on that day and the next, which may reflect improved performance in the sprint's final days. Additionally, tracking time spent per day can help a team mature when it does not yet have good time control.

9. Updating the data ¶

To update the data, a schedule can be set in Airbyte. However, this project requires a small adaptation. The "sprints" table has no column that would allow an incremental append, so its synchronization mode is "Full refresh | Overwrite". In this mode, Airbyte drops the table and rebuilds it with the updated data. Since the "sprints" table is used by the database view, however, it cannot be dropped while the view exists. To work around this, the proposed solution schedules the drop and recreation of the view with CRON inside the container, with the Airbyte update running between these two commands.

This is why the files drop_view.sh, create_view.sh, and crontab were copied when creating the container: they contain, respectively, the command to drop the view, the command to create the view, and the CRON schedule. The next topics present these files.

9.1 Command to delete view (drop_view.sh) ¶

#!/bin/bash

# Set environment variables for PostgreSQL
export PGHOST=localhost
export PGUSER=user
export PGPASSWORD=XXXXX
export PGDATABASE=DashboardScrum

# Execute the drop view command
psql -c "DROP VIEW public.vw_dadosburndown;"

9.2 Command to create view (create_view.sh) ¶

#!/bin/bash

# Set environment variables for PostgreSQL
export PGHOST=localhost
export PGUSER=user
export PGPASSWORD=XXXXX
export PGDATABASE=DashboardScrum

# Variable with view creation
VIEW_DEFINITION="
CREATE OR REPLACE VIEW public.vw_dadosburndown AS
WITH dados_sprint AS (
  SELECT 
      s.name AS sprint, 
      data.data :: DATE
  FROM public.sprints AS s
  CROSS JOIN generate_series(s.\"startDate\", s.\"endDate\", INTERVAL '1 day') AS data
  WHERE EXTRACT(dow FROM data.data) NOT IN (6, 0)
  ORDER BY data
),
dados_horas_entregues AS(
  SELECT 
    SUM((CAST((fields -> 'timeoriginalestimate') AS INTEGER))/3600) AS horas_entregues,
    to_date(fields ->> 'resolutiondate', 'YYYY-MM-DD') AS data_encerramento
  FROM public.issues
  GROUP BY data_encerramento
),
prev_dados_tempo_gasto AS(
  SELECT
    CAST((jsonb_array_elements(fields -> 'worklog' -> 'worklogs') -> 'timeSpentSeconds') AS INTEGER)/3600 AS tempo_gasto,
    to_date(jsonb_array_elements(fields -> 'worklog' -> 'worklogs') ->> 'updated', 'YYYY-MM-DD') AS dia_gasto
  FROM public.issues
),
dados_tempo_gasto AS( 
  SELECT
    dia_gasto,
    SUM(tempo_gasto) AS tempo_gasto
  FROM prev_dados_tempo_gasto
  GROUP BY dia_gasto
)
SELECT
  ds.sprint,
  ds.data,
  dhe.horas_entregues,
  dtg.tempo_gasto,
  plan.planejado,
  he.horas_restantes
FROM dados_sprint AS ds
LEFT JOIN LATERAL planejado_ideal(ds.sprint) AS plan ON ds.data = plan.dia
LEFT JOIN dados_horas_entregues AS dhe ON ds.data = dhe.data_encerramento
LEFT JOIN dados_tempo_gasto AS dtg ON ds.data = dtg.dia_gasto
JOIN horas_restantes(ds.sprint, dhe.horas_entregues, ds.data) AS he ON ds.data = he.dia
ORDER BY ds.sprint,ds.data;"

# Executing the view creation command
psql -c "$VIEW_DEFINITION"

9.3 Scheduling command in CRON (crontab) ¶

0 3 * * 1-5 /tmp/drop_view.sh >> /tmp/cron_drop.log 2>&1
0 5 * * 1-5 /tmp/create_view.sh >> /tmp/cron_create.log 2>&1

This way, the scripts run at 3:00 am and 5:00 am, Monday to Friday, with their output and errors recorded in the cron_drop.log and cron_create.log files in the container's /tmp folder.

Note: with >>, the output of every execution is appended and kept. With >, the file is truncated each time, so only the last execution's log remains.
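The difference between >> and > parallels Python's append ("a") and truncate ("w") file modes. A quick self-contained illustration (a temporary file is used here purely for demonstration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cron.log")

# "w" mirrors ">": each run truncates the file, so only the last line survives
for run in ("first run", "second run"):
    with open(path, "w") as f:
        f.write(run + "\n")
print(open(path).read())  # only "second run"

# "a" mirrors ">>": each run appends, so the full history is kept
for run in ("first run", "second run"):
    with open(path, "a") as f:
        f.write(run + "\n")
print(open(path).read())  # "second run", "first run", "second run"
```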

9.4 Starting CRON in the container ¶

To activate the CRON jobs, open a terminal inside the container and run the following command:

service cron start

To check if CRON is actually running, you can use the command:

service cron status

9.5 Scheduling update in Airbyte ¶

Finally, schedule the update in Airbyte, which can be done in the connection's settings, as shown in the following image:

Autalização Airbyte.png

This way, at 3:00 am the view is dropped from the database, at 4:00 am Airbyte updates the database, and at 5:00 am the view is recreated, leaving the data available to Power BI as usual. This is only a suggested configuration; the schedule can be changed according to the company's needs.

10. Conclusion ¶

The Burndown with time spent chart is just one example of what can be done by extracting raw data from Jira and transforming it, and the transformation can be adapted to each company's needs and business rules. It is worth noting that there are plug-ins and applications that add custom charts to Jira, but they are not always free, and building the solution this way gives much greater flexibility to design the charts and to implement the company's business rules.

Contact¶

If you have any questions or suggestions, or just want to talk about this project, feel free to reach out.

E-mail: gabrielapbarros15@gmail.com

LinkedIn: https://www.linkedin.com/in/gabriela-pereira-barros/