--- config: theme: default look: classic --- flowchart BT subgraph s5["Google BigQuery"] n51["Views"] n52["Tables"] end subgraph s6["Extract and transform"] n11["ETL scripts"] n17["dbt"] end subgraph s7["Data sources"] n61["DBs"] n62["APIs"] n63["Files"] n64["..."] end n5["Airflow"] -.-> n11 & n17 & s3["Power BI"] & n20["ML models"] n52 -- Data --> n20 & n51 n51 -- Data --> s3 n11 -- Table data --> n52 n17 -- Models --> n51 n20 -- Forecasts --> n52 s7 -- Data --> n11 style s5 fill:transparent style s6 fill:transparent style s7 fill:transparent
Intro
In this post, I’d like to present a blueprint for a BI system that I’ve been working on. The system is designed to provide insights into the company’s performance and support data-driven decision-making.
Architecture
The BI system primarily revolves around Microsoft Power BI, which serves as the key instrument for creating dozens of reports with hundreds of indicators. Data is sourced and structured using various tools and pipelines, with Google BigQuery acting as the central data warehouse.
The system consists of several modules that work together to deliver actionable insights to end-users. The flowchart below illustrates the main components of the system and their interactions.
Modules
Airflow
The ETL process is orchestrated by Apache Airflow, an open-source platform for programmatically authoring, scheduling, and monitoring workflows. Some tasks are triggered by the arrival of new data, while others run at predefined intervals.
Extract and Transform
The first step in the process is collecting data from various sources such as databases, APIs, and files. The collected data is then stored in Google BigQuery.
--- config: theme: default look: classic --- flowchart BT subgraph s7["PBX"] n53["Records"] n12["CDR DB"] end subgraph s8["ERP"] n1["ERP reporting<br>subsystem"] end n1 -- CSV files --> n59["SMB share"] n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"] n7["PM software"] -- Tasks --> n11 n8["CMS"] -- SKU metadata --> n11 n9["IMAP"] -- Messages --> n11 n12 -- Calls metadata --> n11 n53 -- WAV files --> n54["Text to Speech"] n11 -- Table data --> n57["Google BigQuery"] n54 -- Transcribed<br>records --> n11 n58["Google Analytics"] -- Events flow --> n57 n59 -- Reports --> n11 style s7 fill:transparent style s8 fill:transparent
dbt
While the data in BigQuery is already structured, further transformation may be required to optimize it for analysis. dbt (data build tool) facilitates this by allowing you to define transformations, often as SQL views, and execute them in a reproducible manner.
ML models
In addition to traditional BI analytics, the system incorporates machine learning models to predict future outcomes based on historical data. The predicted values are stored in BigQuery and integrated into the reporting process.
Power BI
The final step is visualizing the data using Power BI, a business analytics tool that provides interactive visualizations and business intelligence capabilities.
The data is fetched from BigQuery and stored in Power BI datasets. Although a direct connection to BigQuery is available, importing the data is necessary to fully leverage the DAX language for creating complex calculations. Data refreshes are triggered by Airflow DAGs after the ETL tasks are completed.
--- config: theme: default look: classic --- flowchart BT subgraph s3["Datasets"] n22["Sales"] n23["Supply"] n24["Finance"] n25["HR"] end subgraph s4["Reports and dashboards"] n26["Report 2"] n27["Report 3"] n29["Report 1"] n40["Dashboard 1"] n42["..."] n41["Report n"] end subgraph s5["Google BigQuery"] n51["Views"] end n51 --> n24 & n22 & n25 & n23 n22 --> n26 & n40 n23 --> n27 & n40 & n41 n24 --> n29 n25 --> n40 style s3 fill:transparent style s4 fill:transparent style s5 fill:transparent
Conclusion
In summary, the BI system outlined in this post is purpose-built to deliver deep insights into the company’s performance and empower data-driven decision-making. By integrating tools like ETL pipelines, Airflow for orchestration, dbt for data transformation, and Power BI for advanced analytics and visualization, it establishes a scalable and robust analytics platform tailored to meet evolving business needs.
--- config: theme: default look: classic --- flowchart BT subgraph s3["Datasets"] n22["Sales"] n23["Supply"] n24["Finance"] n25["HR"] end subgraph s4["Reports and dashboards"] n26["Report 1"] n27["Report 2"] n29["..."] end subgraph s5["Google BigQuery"] n51["Views"] n52["Tables"] end subgraph s7["PBX"] n53["Records"] n12["CDR DB"] end subgraph s8["ERP system"] n1["ERP reporting<br>subsystem"] end n51 --> s3 s3 --> s4 n1 -- CSV files --> n55["SMB share"] n6["Helpdesk software"] -- Tickets --> n11["ETL scripts"] n7["PM software"] -- Tasks --> n11 n8["CMS"] -- SKU metadata --> n11 n9["IMAP"] -- Messages --> n11 n12 -- Calls metadata --> n11 n11 -- Table data --> n52 n17["dbt"] -- Models --> n51 n52 -- Data --> n51 & n20["ML models"] n20 -- Forecasts --> n52 n53 -- WAV files --> n54["Text to Speech"] n54 -- Transcribed<br>records --> n11 n55 -- Reports --> n11 style s3 fill:transparent style s4 fill:transparent style s5 fill:transparent style s7 fill:transparent style s8 fill:transparent