---
config:
theme: default
look: classic
---
flowchart BT
subgraph s5["Google BigQuery"]
n51["Views"]
n52["Tables"]
end
subgraph s6["Extract and transform"]
n11["ETL scripts"]
n17["dbt"]
end
subgraph s7["Data sources"]
n61["DBs"]
n62["APIs"]
n63["Files"]
n64["..."]
end
n5["Airflow"] -.-> n11 & n17 & s3["Power BI"] & n20["ML models"]
n52 -- Data --> n20 & n51
n51 -- Data --> s3
n11 -- Table data --> n52
n17 -- Models --> n51
n20 -- Forecasts --> n52
s7 -- Data --> n11
style s5 fill:transparent
style s6 fill:transparent
style s7 fill:transparent
Intro
In this post, I’d like to present a blueprint for a BI system that I’ve been working on. The system is designed to provide insights into the company’s performance and support data-driven decision-making.
Architecture
The BI system primarily revolves around Microsoft Power BI, which serves as the key instrument for creating dozens of reports with hundreds of indicators. Data is sourced and structured using various tools and pipelines, with Google BigQuery acting as the central data warehouse.
The system consists of several modules that work together to deliver actionable insights to end-users. The flowchart below illustrates the main components of the system and their interactions.
Modules
Airflow
The ETL process is orchestrated by Apache Airflow, an open-source platform for programmatically authoring, scheduling, and monitoring workflows. Some tasks are triggered by the arrival of new data, while others run at predefined intervals.
Extract and Transform
The first step in the process is collecting data from various sources such as databases, APIs, and files. The collected data is then stored in Google BigQuery.
---
config:
theme: default
look: classic
---
flowchart BT
subgraph s7["PBX"]
n53["Records"]
n12["CDR DB"]
end
subgraph s8["ERP"]
n1["ERP reporting<br>subsystem"]
end
n1 -- CSV files --> n59["SMB share"]
n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"]
n7["PM software"] -- Tasks --> n11
n8["CMS"] -- SKU metadata --> n11
n9["IMAP"] -- Messages --> n11
n12 -- Calls metadata --> n11
n53 -- WAV files --> n54["Text to Speech"]
n11 -- Table data --> n57["Google BigQuery"]
n54 -- Transcribed<br>records --> n11
n58["Google Analytics"] -- Events flow --> n57
n59 -- Reports --> n11
style s7 fill:transparent
style s8 fill:transparent
dbt
While the data in BigQuery is already structured, further transformation may be required to optimize it for analysis. dbt (data build tool) facilitates this by allowing you to define transformations, often as SQL views, and execute them in a reproducible manner.
ML models
In addition to traditional BI analytics, the system incorporates machine learning models to predict future outcomes based on historical data. The predicted values are stored in BigQuery and integrated into the reporting process.
Power BI
The final step is visualizing the data using Power BI, a business analytics tool that provides interactive visualizations and business intelligence capabilities.
The data is fetched from BigQuery and stored in Power BI datasets. Although a direct connection to BigQuery is available, importing the data is necessary to fully leverage the DAX language for creating complex calculations. Data refreshes are triggered by Airflow DAGs after the ETL tasks are completed.
---
config:
theme: default
look: classic
---
flowchart BT
subgraph s3["Datasets"]
n22["Sales"]
n23["Supply"]
n24["Finance"]
n25["HR"]
end
subgraph s4["Reports and dashboards"]
n26["Report 2"]
n27["Report 3"]
n29["Report 1"]
n40["Dashboard 1"]
n42["..."]
n41["Report n"]
end
subgraph s5["Google BigQuery"]
n51["Views"]
end
n51 --> n24 & n22 & n25 & n23
n22 --> n26 & n40
n23 --> n27 & n40 & n41
n24 --> n29
n25 --> n40
style s3 fill:transparent
style s4 fill:transparent
style s5 fill:transparent
Conclusion
In summary, the BI system outlined in this post is purpose-built to deliver deep insights into the company’s performance and empower data-driven decision-making. By integrating tools like ETL pipelines, Airflow for orchestration, dbt for data transformation, and Power BI for advanced analytics and visualization, it establishes a scalable and robust analytics platform tailored to meet evolving business needs.
---
config:
theme: default
look: classic
---
flowchart BT
subgraph s3["Datasets"]
n22["Sales"]
n23["Supply"]
n24["Finance"]
n25["HR"]
end
subgraph s4["Reports and dashboards"]
n26["Report 1"]
n27["Report 2"]
n29["..."]
end
subgraph s5["Google BigQuery"]
n51["Views"]
n52["Tables"]
end
subgraph s7["PBX"]
n53["Records"]
n12["CDR DB"]
end
subgraph s8["ERP system"]
n1["ERP reporting<br>subsystem"]
end
n51 --> s3
s3 --> s4
n1 -- CSV files --> n55["SMB share"]
n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"]
n7["PM software"] -- Tasks --> n11
n8["CMS"] -- SKU metadata --> n11
n9["IMAP"] -- Messages --> n11
n12 -- Calls metadata --> n11
n11 -- Table data --> n52
n17["dbt"] -- Models --> n51
n52 -- Data --> n51 & n20["ML models"]
n20 -- Forecasts --> n52
n53 -- WAV files --> n54["Text to Speech"]
n54 -- Transcribed<br>records --> n11
n55 -- Reports --> n11
style s3 fill:transparent
style s4 fill:transparent
style s5 fill:transparent
style s7 fill:transparent
style s8 fill:transparent