Module 2 · Build Phase 1

Data Ingestion

Real-world SaaS deployments rarely deal with a single data shape. UrbanPulse needs to bring in file-based, relational, and NoSQL data, and Fabric provides a distinct ingest pattern for each. This module walks through all three:

Upload - load a facility catalog Parquet file directly into the Lakehouse Files zone, then convert to a managed Delta table.
Shortcut - mount the same facility catalog from Azure Blob Storage as a virtualized OneLake folder (zero copy).
Mirror - replicate operational databases - Azure SQL (Hospitals / Wards / Staff) and Cosmos DB for NoSQL (regions, train routes) - into Fabric with zero-ETL CDC.

All three patterns converge on a single Bronze Lakehouse with a single SQL endpoint, ready for downstream analytics and AI in later modules.

⏱ 45 minutes

🎯 Goal: 3 ingest patterns, 1 Lakehouse

✅ Requirement: Import from at least one data source

🛠 Upload · Mirror · Shortcut

Learning Objectives

Upload a Parquet facility catalog + photo cards from the lab repo into the Lakehouse Files zone.
Convert raw Parquet into managed Delta tables with one click.
Mirror an Azure SQL database into Fabric - zero-ETL, near-real-time.
Mirror a Cosmos DB for NoSQL database (regions, train routes) into Fabric.
Run a single SQL query that joins all three sources via the Lakehouse SQL endpoint.

Why Three Ingest Patterns?

Real SaaS deployments hit data in many shapes. Fabric's value is that you don't have to pick one tool - you pick the right pattern per source and they all converge on OneLake.

Figure 2.1 - Three ingest patterns, one Lakehouse, one consumer surface. The "OneLake" promise.

Step 1 - Create the Lakehouse & Upload Parquet UPLOAD

Create your Bronze Lakehouse
From your workspace home, click + New item → type "Lakehouse" in the search box → pick Lakehouse.

In the dialog:
- Name: lh_urbanpulse_bronze
- Lakehouse schemas (preview): leave unchecked - this lab uses the classic Tables/ + Files/ layout
Click Create. Fabric opens the new, empty Lakehouse Explorer.

📷

SCREENSHOT NEEDED

module-2/00-create-lakehouse.png

"New Lakehouse" dialog with name lh_urbanpulse_bronze filled in

ⓘ
Creating a Lakehouse also auto-provisions an SQL analytics endpoint and a default semantic model alongside it. You'll see all three appear in your workspace - that's expected.
Confirm the empty Lakehouse Explorer

The Explorer pane should now show two top-level folders: Tables (empty) and Files (empty). You're ready to ingest.

Bring the facility catalog into the Lakehouse

The facility catalog is provisioned in an Azure Storage account. The container facilitycatalog contains:

facility_catalog.parquet - the 5-row facility table you'll promote to Delta
facilities/ - a folder of 5 facility photo JPGs (referenced by the photo_path column)

Fabric supports two patterns for getting this data into your Lakehouse. Both are valid; pick the one that matches your scenario:

Pattern	When to use it	Data movement
Option A - Shortcut	Source data already lives in ADLS Gen2 / Blob Storage and you want a live, virtualized view with no copy. Recommended for production SaaS deployments.	Zero copy - data stays in the source storage account; OneLake reads it on demand.
Option B - Upload	You need a local copy in OneLake (e.g., the source will be decommissioned, or you want full control over the file lifecycle).	One-time copy from local disk into the Lakehouse `Files` zone.

Option A - Create a OneLake shortcut to the storage account

This option requires a short-lived Shared Access Signature (SAS) token on the container. Generate it first, then create the shortcut.

Generate the SAS token (Azure portal):

Open the storage account from your lab handout → Storage browser → Blob containers → right-click facilitycatalog → Generate SAS.
Signing key: Key 1. Permissions: select Read and List.
Expiry: set far enough out to cover the full lab session (e.g., end of day or end of week).
Allowed protocols: HTTPS only.
Click Generate SAS token and URL, then copy the Blob SAS token value (the string starting with sp=r&st=...; not the full URL).

Create the shortcut (Fabric Lakehouse Explorer):

Right-click Files → New shortcut → Azure Blob Storage.
Account name or URL: paste the container URL, e.g. https://<storage-acct>.blob.core.windows.net/facilitycatalog.
Connection: select Create new connection.
Authentication kind: Shared Access Signature (SAS).
SAS token: paste the token you copied from the Azure portal.
Click Next. Give the shortcut the name facilitycatalog and leave the sub-path blank to mount the container root.
Click Create. The shortcut appears under Files/facilitycatalog/ with both facility_catalog.parquet and facilities/ visible immediately - no copy, no upload.

ⓘ

Why SAS instead of Account key? Many enterprise storage accounts have shared-key access disabled as a security baseline; SAS tokens provide narrowly-scoped, time-bounded access that works in those environments. Organizational account (Entra) authentication is also supported if your identity has Storage Blob Data Reader on the container.

📷

SCREENSHOT NEEDED

module-2/02-shortcut-adls.png

Lakehouse Files explorer with the facilitycatalog shortcut expanded, showing facility_catalog.parquet and facilities/ mounted from Azure Blob Storage

📷

SCREENSHOT NEEDED

module-2/03-shortcut-files-tree.png

Lakehouse Files tree after the shortcut is created, with facilitycatalog visible alongside the standard Files folder

Option B - Download from Azure, then upload to the Lakehouse

Two ways to grab the source files:

From the lab guide (easiest): open the Data Setup appendix and download the urbanpulse-lab-data.zip bundle. The Parquet file and the five facility photos are inside it.
From the Azure portal: open the storage account listed in your lab handout → Storage browser → Blob containers → facilitycatalog. Select facility_catalog.parquet and the facilities/ folder → Download.

In the Fabric Lakehouse Explorer, right-click Files → Upload → Upload files for the parquet, and Upload folder for the facilities directory.

📷

SCREENSHOT NEEDED

module-2/01-files-upload.png

Lakehouse Files explorer with facility_catalog.parquet and facilities / uploaded directly under Files

ⓘ

Either path produces a queryable Parquet file. The remaining steps in this section work the same way regardless of which option you chose - the only difference is that with Option A the file lives at Files/facilitycatalog/facility_catalog.parquet and with Option B it lives at Files/facility_catalog.parquet.

Promote the Parquet to a Delta table

In Files, right-click facility_catalog.parquet (whether under the shortcut folder or directly under Files) → Load to Tables → New table.

Name it facility_catalog and click Load.

ⓘ
Behind the scenes Fabric runs a Spark job that reads the Parquet, applies the schema, and writes a managed Delta table to Tables/facility_catalog. Takes ~30 seconds.

📷

SCREENSHOT NEEDED

module-2/04-load-to-tables.png

Load to Tables dialog with the table name set to facility_catalog before clicking Load
Preview the table

Refresh the Tables list. Click facility_catalog. You should see rows with columns including facility_id, facility_name, region_id, photo_path, beds_total, beds_icu.

📷

SCREENSHOT NEEDED

module-2/05-table-preview.png

Lakehouse table preview pane showing rows of facility_catalog with all columns

Step 2 - Mirror an Azure SQL Database MIRROR

Mirroring brings your operational SQL data into Fabric continuously, with no ETL pipelines to author. Fabric reads the source's CDC log and writes Delta into OneLake.

✨

Using the shared Fabric lab workspace? Shortcut the mirror instead of creating it. If your instructor has already mirrored the Azure SQL data into a shared Fabric workspace that you have member access to, you do not need to create your own Mirrored database. Skip the create-and-connect steps below and shortcut the existing mirror straight into your Lakehouse:

Open lh_urbanpulse_bronze. Right-click Tables → New shortcut → Microsoft OneLake.
In the data-source picker, select the shared workspace, then the mirrored database (for example mir_urbanpulse_cityops).
Select Hospitals, Wards, and Staff and click Create.

The tables appear in your Bronze Lakehouse with no copy and no source credentials. Member access on the shared workspace is what lets you create the OneLake shortcut. Then jump ahead to Step 4.

Create a Mirrored Database

From the workspace, click + New item → search "Mirrored". Choose Mirrored Azure SQL Database.

Name it: mir_urbanpulse_cityops.
Connect to the lab SQL DB
Open the Azure SQL database resource in the Azure portal under your assigned resource group. Copy the server name, database name, and SQL admin credentials from the resource blades (see Module 0 · Setup & Environment Check) and paste them into the connection wizard:
Connection
```
Server:   (from Azure portal → SQL database → Connection strings)
Database: (from Azure portal → SQL database overview)
Auth:     SQL authentication
Username: (SQL admin login from your SQL server)
Password: (SQL admin password from your SQL server)
```
Choose tables to mirror

Tick all three: Hospitals, Wards, Staff. Click Mirror database.

📷

SCREENSHOT NEEDED

module-2/02-mirror-tables.png

Mirror table picker with Hospitals, Wards, Staff selected
Watch the initial snapshot

The mirror creates a snapshot first, then switches to incremental CDC. Status pane should reach Replicated in under two minutes.

📷

SCREENSHOT NEEDED

module-2/03-mirror-sql-replicated.png

Mirrored Azure SQL replication status page showing Hospitals, Wards, and Staff with status Replicated and row counts populated

✓
Why this matters for ISVs: When you onboard a new customer, mirroring lets you pull their operational SQL data without deploying anything to their environment. No agents, no scheduled exports, no broken pipelines.
Shortcut the mirrored tables into the Bronze Lakehouse

Open lh_urbanpulse_bronze. Right-click Tables → New shortcut → Microsoft OneLake. Pick mir_urbanpulse_cityops and select Hospitals, Wards, and Staff. Click Create.

The three SQL tables now appear in the Bronze Lakehouse alongside the Parquet upload, ready to be joined with the Cosmos and uploaded data in a single SQL query.

📷

SCREENSHOT NEEDED

module-2/04-mirror-sql-shortcut.png

Bronze Lakehouse with OneLake shortcuts to Hospitals, Wards, and Staff from mir_urbanpulse_cityops visible in the Tables tree

ⓘ
Why two items? A Mirrored database is its own Fabric item with its own OneLake folder. The OneLake-to-OneLake shortcut into the Bronze Lakehouse is the standard pattern for keeping mirrors and uploaded data in one queryable surface without copying.

Step 3 - Mirror a Cosmos DB for NoSQL Database MIRROR

Mirroring brings Cosmos DB documents into Fabric continuously with no ETL pipelines to author. Fabric reads the Cosmos change feed and lands the documents as managed Delta tables in OneLake, queryable through the same SQL endpoint as your Lakehouse.

✨

Using the shared Fabric lab workspace? Shortcut the mirror instead of creating it. If your instructor has already mirrored the Cosmos DB data into a shared Fabric workspace that you have member access to, skip the create-and-connect steps below and shortcut the existing mirror into your Lakehouse:

Open lh_urbanpulse_bronze. Right-click Tables → New shortcut → Microsoft OneLake.
In the data-source picker, select the shared workspace, then the mirrored database (for example mir_urbanpulse_routing).
Select regions and trainRoutes and click Create.

Both containers appear in your Bronze Lakehouse with no copy and no Cosmos keys. Member access on the shared workspace is what lets you create the OneLake shortcut. Then continue to Step 4.

Create a Mirrored Cosmos DB

From the workspace, click + New item → search "Mirrored". Choose Mirrored Azure Cosmos DB.

Name it: mir_urbanpulse_routing.
Connect to the Cosmos account

Open your Azure Cosmos DB account in the Azure portal (under your assigned resource group). Go to Settings → Keys, copy the URI and primary key (or read-only key), and paste them into the connection pane. See Module 0 · Setup & Environment Check for context.

Choose database: urbanpulse.

📷

SCREENSHOT NEEDED

module-2/06-mirror-cosmos-connect.png

Mirrored Cosmos DB connection pane with the endpoint and key entered, and the urbanpulse database selected
Select the containers to mirror

Tick regions and trainRoutes. Click Mirror database.

Initial snapshot lands within ~1 minute; the change feed keeps the Delta copy current after that.

📷

SCREENSHOT NEEDED

module-2/07-mirror-cosmos-containers.png

Container picker with regions and trainRoutes selected, ready to click Mirror database
Confirm replication is running

Open the Mirrored database item. The Replication status page lists both containers as replicated tables once the initial snapshot completes. Status should read Running with row counts populated for regions and trainRoutes.

📷

SCREENSHOT NEEDED

module-2/08-mirror-cosmos-replication-status.png

Replication status page showing regions and trainRoutes replicated with status Running
Shortcut the mirrored tables into the Bronze Lakehouse

Open lh_urbanpulse_bronze. Right-click Tables → New shortcut → Microsoft OneLake. Pick mir_urbanpulse_routing and select both regions and trainRoutes. Click Create.

This makes both Cosmos collections appear as tables in the same Bronze Lakehouse alongside the Parquet upload and the SQL mirror, so a single SQL query in the next step can join all three sources.

📷

SCREENSHOT NEEDED

module-2/09-mirror-cosmos-shortcut.png

Bronze Lakehouse with OneLake shortcuts to regions and trainRoutes from mir_urbanpulse_routing visible in the Tables tree

ⓘ
Why two items? A Mirrored database is its own Fabric item with its own OneLake folder. The OneLake-to-OneLake shortcut into the Bronze Lakehouse is the standard pattern for keeping mirrors and uploaded data in one queryable surface without copying.

Step 4 - One SQL Query Across All Three Sources

The payoff. Three different ingest mechanisms, but to a downstream consumer they're all just tables in your Lakehouse SQL endpoint.

Switch to the SQL analytics endpoint

Top-right of the Lakehouse, click the dropdown that says Lakehouse and choose SQL analytics endpoint.

Run the unified query

Click New SQL query and paste:

T-SQL

-- A query that proves the three sources unify into one Lakehouse view.
-- Hospital, its mirrored ward count, its region, and the train routes that serve that region.
SELECT
    h.hospitalId,
    h.name              AS hospital_name,
    h.facilityType,
    f.beds_total,
    f.beds_icu,
    f.photo_path        AS sample_photo,                 -- uploaded Parquet → Delta
    (
        SELECT COUNT(*) FROM dbo.Wards w WHERE w.hospitalId = h.hospitalId
    )                   AS ward_count,                   -- mirrored from Azure SQL
    r.name              AS region_name,                  -- mirrored from Cosmos
    r.population        AS region_population,
    (
        SELECT STRING_AGG(tr.color, ', ')
        FROM trainRoutes tr
        CROSS APPLY OPENJSON(tr.regionsServed) WITH (rid VARCHAR(16) '$') s
        WHERE s.rid = h.regionId
    )                   AS train_lines_serving_region
FROM dbo.Hospitals      AS h                              -- mirrored from Azure SQL
JOIN dbo.facility_catalog AS f ON f.facility_id = h.hospitalId  -- uploaded Parquet → Delta
JOIN regions     AS r  ON r.regionId   = h.regionId      -- mirrored from Cosmos
ORDER BY h.hospitalId;

Inspect the result

You should see one row per hospital, with bed counts from the uploaded Parquet, ward counts from the mirrored SQL DB, and region + train-line info from the mirrored Cosmos DB. That's three different physical data stores in one query.

Validation

Lakehouse Tables shows facility_catalog (uploaded), Hospitals/Wards/Staff (mirrored from SQL), and regions + trainRoutes (mirrored from Cosmos).
The unified query above returns one row per hospital.
OneLake explorer shows all six tables in one workspace, surfaced through one Lakehouse SQL endpoint.

What You Just Built

Object	Type	How it got there	Use case
`facility_catalog`	Delta	Parquet file upload + Load to Tables	Static reference data, region/photo lookups
`Hospitals` / `Wards` / `Staff`	Mirrored Delta	Mirrored Azure SQL DB (CDC)	Operational OLTP data, near-real-time
`regions` / `trainRoutes`	Mirrored Delta	Mirrored Azure Cosmos DB for NoSQL (change feed)	NoSQL data (geo + nested arrays), zero-ETL CDC

Module 3 brings in the real-time half of the medallion: live Event Hub data flowing through Eventstream into the Eventhouse.

References

What is Microsoft Fabric? - platform overview and core concepts.
OneLake Overview - the unified data lake, workspaces, and architecture.
OneLake Shortcuts - reference external data without copying it.
Mirroring in Microsoft Fabric - near real-time replication of operational databases.

Data Ingestion

Learning Objectives

Why Three Ingest Patterns?

Step 1 - Create the Lakehouse & Upload Parquet UPLOAD

Create your Bronze Lakehouse

Confirm the empty Lakehouse Explorer

Bring the facility catalog into the Lakehouse

Option A - Create a OneLake shortcut to the storage account

Option B - Download from Azure, then upload to the Lakehouse

Promote the Parquet to a Delta table

Preview the table

Step 2 - Mirror an Azure SQL Database MIRROR

Create a Mirrored Database

Connect to the lab SQL DB

Choose tables to mirror

Watch the initial snapshot

Shortcut the mirrored tables into the Bronze Lakehouse

Step 3 - Mirror a Cosmos DB for NoSQL Database MIRROR

Create a Mirrored Cosmos DB

Connect to the Cosmos account

Select the containers to mirror

Confirm replication is running

Shortcut the mirrored tables into the Bronze Lakehouse

Step 4 - One SQL Query Across All Three Sources

Switch to the SQL analytics endpoint

Run the unified query

Inspect the result

Validation

What You Just Built

References