BI System Blueprint

BI
ETL
Author

Aleksei

Published

January 6, 2025

Intro

In this post, I’d like to present a blueprint for a BI system that I’ve been working on. The system is designed to provide insights into the company’s performance and support data-driven decision-making.

Architecture

The BI system primarily revolves around Microsoft Power BI, which serves as the key instrument for creating dozens of reports with hundreds of indicators. Data is sourced and structured using various tools and pipelines, with Google BigQuery acting as the central data warehouse.

The system consists of several modules that work together to deliver actionable insights to end-users. The flowchart below illustrates the main components of the system and their interactions.

---
config:
  theme: default
  look: classic
---
flowchart BT
 subgraph s5["Google BigQuery"]
        n51["Views"]
        n52["Tables"]
  end
 subgraph s6["Extract and transform"]
        n11["ETL scripts"]
        n17["dbt"]
  end
 subgraph s7["Data sources"]
        n61["DBs"]
        n62["APIs"]
        n63["Files"]
        n64["..."]
  end
    n5["Airflow"] -.-> n11 & n17 & s3["Power BI"] & n20["ML models"]
    n52 -- Data --> n20 & n51
    n51 -- Data --> s3
    n11 -- Table data --> n52
    n17 -- Models --> n51
    n20 -- Forecasts --> n52
    s7 -- Data --> n11
    style s5 fill:transparent
    style s6 fill:transparent
    style s7 fill:transparent
Figure 1: Data and control flows in the BI system

Modules

Airflow

The ETL process is orchestrated by Apache Airflow, an open-source platform for programmatically authoring, scheduling, and monitoring workflows. Some tasks are triggered by the arrival of new data, while others run at predefined intervals.

Extract and Transform

The first step in the process is collecting data from various sources such as databases, APIs, and files. The collected data is then stored in Google BigQuery.

---
config:
  theme: default
  look: classic
---
flowchart BT
 subgraph s7["PBX"]
        n53["Records"]
        n12["CDR DB"]
  end
 subgraph s8["ERP"]
        n1["ERP reporting<br>subsystem"]
  end
    n1 -- CSV files --> n59["SMB share"]
    n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"]
    n7["PM software"] -- Tasks --> n11
    n8["CMS"] -- SKU metadata --> n11
    n9["IMAP"] -- Messages --> n11
    n12 -- Calls metadata --> n11
    n53 -- WAV files --> n54["Text to Speech"]
    n11 -- Table data --> n57["Google BigQuery"]
    n54 -- Transcribed<br>records --> n11
    n58["Google Analytics"] -- Events flow --> n57
    n59 -- Reports --> n11
    style s7 fill:transparent
    style s8 fill:transparent
Figure 2: ETL subsystem

dbt

While the data in BigQuery is already structured, further transformation may be required to optimize it for analysis. dbt (data build tool) facilitates this by allowing you to define transformations, often as SQL views, and execute them in a reproducible manner.

ML models

In addition to traditional BI analytics, the system incorporates machine learning models to predict future outcomes based on historical data. The predicted values are stored in BigQuery and integrated into the reporting process.

Power BI

The final step is visualizing the data using Power BI, a business analytics tool that provides interactive visualizations and business intelligence capabilities.

The data is fetched from BigQuery and stored in Power BI datasets. Although a direct connection to BigQuery is available, importing the data is necessary to fully leverage the DAX language for creating complex calculations. Data refreshes are triggered by Airflow DAGs after the ETL tasks are completed.

---
config:
  theme: default
  look: classic
---
flowchart BT
 subgraph s3["Datasets"]
        n22["Sales"]
        n23["Supply"]
        n24["Finance"]
        n25["HR"]
  end
 subgraph s4["Reports and dashboards"]
        n26["Report 2"]
        n27["Report 3"]
        n29["Report 1"]
        n40["Dashboard 1"]
        n42["..."]
        n41["Report n"]
  end
 subgraph s5["Google BigQuery"]
        n51["Views"]
  end
    n51 --> n24 & n22 & n25 & n23
    n22 --> n26 & n40
    n23 --> n27 & n40 & n41
    n24 --> n29
    n25 --> n40
    style s3 fill:transparent
    style s4 fill:transparent
    style s5 fill:transparent
Figure 3: Power BI reports and dashboards

Conclusion

In summary, the BI system outlined in this post is purpose-built to deliver deep insights into the company’s performance and empower data-driven decision-making. By integrating tools like ETL pipelines, Airflow for orchestration, dbt for data transformation, and Power BI for advanced analytics and visualization, it establishes a scalable and robust analytics platform tailored to meet evolving business needs.

---
config:
  theme: default
  look: classic
---
flowchart BT
 subgraph s3["Datasets"]
        n22["Sales"]
        n23["Supply"]
        n24["Finance"]
        n25["HR"]
  end
 subgraph s4["Reports and dashboards"]
        n26["Report 1"]
        n27["Report 2"]
        n29["..."]
  end
 subgraph s5["Google BigQuery"]
        n51["Views"]
        n52["Tables"]
  end
 subgraph s7["PBX"]
        n53["Records"]
        n12["CDR DB"]
  end
 subgraph s8["ERP system"]
        n1["ERP reporting<br>subsystem"]
  end
    n51 --> s3
    s3 --> s4
    n1 -- CSV files --> n55["SMB share"]
    n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"]
    n7["PM software"] -- Tasks --> n11
    n8["CMS"] -- SKU metadata --> n11
    n9["IMAP"] -- Messages --> n11
    n12 -- Calls metadata --> n11
    n11 -- Table data --> n52
    n17["dbt"] -- Models --> n51
    n52 -- Data --> n51 & n20["ML models"]
    n20 -- Forecasts --> n52
    n53 -- WAV files --> n54["Text to Speech"]
    n54 -- Transcribed<br>records --> n11
    n55 -- Reports --> n11

    style s3 fill:transparent
    style s4 fill:transparent
    style s5 fill:transparent
    style s7 fill:transparent
    style s8 fill:transparent
Figure 4: BI system overview