Modules 2 and 3 landed three classes of data into Fabric - mirrored operational tables, an uploaded facility catalog, and live KQL streams. None of it is yet shaped for the dashboards, semantic models, and agents you'll build in the next four modules.
Fabric gives you several ways to shape data once it has landed. This module presents four independent transformation paths. Pick whichever path you prefer - you only need one to produce curated output. The paths are fully optional and interchangeable; choose based on your team's tooling.
lh_urbanpulse_bronze from M2 (mirrors + uploaded catalog).There is no single required method for transformation in Fabric. Each path below is optional and self-contained - pick the one that matches your team's skills. You only need one to complete this module. Running more than one is purely for comparison.
What landed in M2 is operational shape: each mirrored table looks like the source system, and the uploaded facility catalog is denormalized for storage, not analysis. That's fine for an audit trail, but it's not what a Power BI report or a Data Agent should see. Operators ask questions like "how many beds are available in Region North?" - which spans Hospitals, Wards, and Regions, and requires consistent column names and types.
Transforming the Bronze layer into a curated Silver layer (clean, joined, typed) plus a Gold layer of BI-ready aggregates is what makes the rest of the lab possible. The medallion split also lets you evolve raw schemas without breaking downstream consumers.
| Approach | Best for | You'll use it in |
|---|---|---|
| Notebook (Spark Python / Spark SQL) | Joins, type cleanup, derived columns, ML prep, complex logic | Path A |
| SQL view on the SQL endpoint | Lightweight aggregates and friendly names for BI / agents | Path B |
| Materialized Lake View | SQL transforms you want persisted as Delta with managed refresh | Path D |
| Dataflow Gen2 (Power Query) | Visual, no-code prep - filtering, type changes, joins | Path C |
From your workspace, open lh_urbanpulse_bronze.
In the left-hand explorer you should see the tables you created in M2:
Hospitals, Wards, Staff — mirrored from Azure SQL.regions, trainRoutes — mirrored from Cosmos DB for NoSQL.facility_catalog — uploaded Parquet (or shortcut, depending on your M2 path).Click any table and use the Preview pane to confirm rows are present.
Stay in Lakehouse view and use table Preview to check data quality
before transforming. For example, verify that Hospitals.hospitalId lines up with
Wards.hospitalId, and that regions.regionId has expected values.
After this pre-check, choose one path below and execute only that path (or multiple paths for comparison).
Now you'll create a curated silver_hospital_facility Delta table that joins
Hospitals, Wards, and regions into a single analytics-friendly
shape. A Spark notebook is the right tool because the join spans tables originating from two different
mirrors (Azure SQL + Cosmos DB) and we want consistent typing on the way out.
From the workspace, click + New item → Notebook.
Name it nb_urbanpulse_silver.
In the left explorer of the notebook, click + Lakehouse and pick
lh_urbanpulse_bronze as the default Lakehouse. This makes
spark.sql("SELECT * FROM Hospitals") resolve without any extra path.
Paste the following into the first cell. This produces one wide row per ward, with the hospital and region attributes joined in:
silver_df = spark.sql("""
SELECT
w.wardId AS ward_id,
w.name AS ward_name,
w.capacity AS beds,
h.hospitalId AS hospital_id,
h.name AS hospital_name,
h.address AS hospital_address,
h.facilityType AS hospital_type,
r.regionId AS region_id,
r.name AS region_name,
CURRENT_TIMESTAMP() AS curated_at
FROM Wards w
JOIN Hospitals h ON h.hospitalId = w.hospitalId
LEFT JOIN regions r ON r.regionId = h.regionId
""")
silver_df.write \
.format("delta") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.saveAsTable("silver_hospital_facility")
Run the cell. The output should report a successful Delta write with a row count.
Add a second cell:
display(spark.sql("""
SELECT region_name, COUNT(*) AS wards, SUM(beds) AS total_beds
FROM silver_hospital_facility
GROUP BY region_name
ORDER BY total_beds DESC
"""))
You should see one row per region with the total number of wards and beds. This is the shape every downstream consumer (BI report, ontology, hospital agent) will read from.
saveAsTable registers
the table in the Lakehouse metastore so it shows up automatically in the SQL endpoint, in Power BI's
Direct Lake mode, and in Fabric Data Agents. No extra plumbing.
This path is SQL-first and independent of the notebook path. You'll create serving views directly from Bronze tables on the Lakehouse SQL analytics endpoint.
From the workspace, open lh_urbanpulse_bronze and switch the upper-right toggle
to SQL analytics endpoint.
Click New SQL query.
Paste and run:
CREATE OR ALTER VIEW gold_region_capacity_sqlpath AS
SELECT
r.regionId AS region_id,
r.name AS region_name,
COUNT(DISTINCT h.hospitalId) AS hospitals,
COUNT(*) AS wards,
SUM(w.capacity) AS total_beds
FROM Wards w
JOIN Hospitals h ON h.hospitalId = w.hospitalId
LEFT JOIN regions r ON r.regionId = h.regionId
GROUP BY r.regionId, r.name;
The transit topology lives in trainRoutes from the Cosmos mirror. Add a
lightweight Gold view for it too:
CREATE OR ALTER VIEW gold_transit_lines_sqlpath AS
SELECT
routeId AS line,
color AS line_color,
stopCount AS station_count
FROM trainRoutes;
SELECT * FROM gold_region_capacity_sqlpath ORDER BY total_beds DESC;
SELECT * FROM gold_transit_lines_sqlpath ORDER BY line;
Both should return rows. This confirms the SQL endpoint path works without running Path A.
This path is visual/no-code. Use it when transformation owners are analysts or low-code teams.
From your workspace, click + New item → Dataflow Gen2.
Name it df_urbanpulse_silver.
Add queries for Hospitals, Wards, and regions from lh_urbanpulse_bronze.
Merge Wards with Hospitals on hospitalId, then merge with regions on regionId = regionId.
Expand columns and rename fields to a serving shape (for example: capacity → beds, name → hospital_name, region name → region_name).
Set destination to Lakehouse table silver_hospital_facility_df and publish the dataflow.
Run the dataflow once and confirm row output in the Lakehouse.
This path is SQL-driven with persisted output and Fabric-managed refresh. Materialized Lake Views are defined with Spark SQL inside a notebook bound to the Lakehouse (they are not created on the read-only SQL analytics endpoint).
lh_urbanpulse_bronze from Module 2
has Lakehouse schemas turned off, so it cannot host an MLV directly. This path provisions a dedicated
schema-enabled gold Lakehouse and shortcuts the bronze tables into it (zero copy).
In your workspace, choose New item → Lakehouse. Name it
lh_urbanpulse_gold and check Lakehouse schemas (preview) before
creating it.
Confirm the workspace (and the notebook you use below) runs on Fabric Runtime 1.3, which is required for Materialized Lake Views.
In lh_urbanpulse_gold, expand Tables, then under the dbo
schema choose New shortcut → Microsoft OneLake.
Select lh_urbanpulse_bronze as the source and add shortcuts for
Hospitals, Wards, and regions. They land under
dbo with no copy.
Open a notebook with lh_urbanpulse_gold as the default Lakehouse, then run this
in a Spark SQL cell (or a Python cell wrapped in spark.sql(...)):
CREATE SCHEMA IF NOT EXISTS gold;
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold.region_capacity AS
SELECT
r.regionId AS region_id,
r.name AS region_name,
COUNT(DISTINCT h.hospitalId) AS hospitals,
COUNT(*) AS wards,
SUM(w.capacity) AS total_beds
FROM dbo.Wards w
JOIN dbo.Hospitals h ON h.hospitalId = w.hospitalId
LEFT JOIN dbo.regions r ON r.regionId = h.regionId
GROUP BY r.regionId, r.name;
Fabric manages refresh as underlying data changes. Validate with:
SELECT * FROM gold.region_capacity ORDER BY total_beds DESC;
Use these official docs if you want deeper implementation patterns for each path.
You only need the output from the one path you chose to pass this module:
silver_hospital_facility exists as a managed Delta table in the Bronze Lakehouse.gold_region_capacity_sqlpath and gold_transit_lines_sqlpath views return rows from the SQL endpoint.silver_hospital_facility_df exists from the Dataflow Gen2 run.gold.region_capacity returns rows from the Materialized Lake View, managed by lh_urbanpulse_gold.Module 5 reads these tables and views directly to build the live KQL dashboard, and Module 6 layers a Direct Lake semantic model on top.