Module 2 · Build Phase 1

Data Ingestion

Real-world SaaS deployments rarely deal with a single data shape. UrbanPulse needs to bring in file-based, relational, and NoSQL data, and Fabric provides a distinct ingest pattern for each. This module walks through all three:

All three patterns converge on a single Bronze Lakehouse with a single SQL endpoint, ready for downstream analytics and AI in later modules.

45 minutes
🎯 Goal: 3 ingest patterns, 1 Lakehouse
Requirement: Import from at least one data source
🛠 Upload · Mirror · Shortcut

Learning Objectives

Why Three Ingest Patterns?

Real SaaS deployments hit data in many shapes. Fabric's value is that you don't have to pick one tool - you pick the right pattern per source and they all converge on OneLake.

📁 Facility catalog Parquet file-based · static 🗄 Azure SQL · Hospitals relational · OLTP 🌐 Cosmos DB · Regions JSON · NoSQL · partitioned PATTERN: UPLOAD Files → Load to Tables (Delta) PATTERN: MIRROR CDC into managed Delta · zero-ETL PATTERN: MIRROR CDC into managed Delta · NoSQL source lh_urbanpulse_bronze OneLake · Delta tables single SQL endpoint single Direct Lake source SQL endpoint Notebooks Power BI
Figure 2.1 - Three ingest patterns, one Lakehouse, one consumer surface. The "OneLake" promise.

Step 1 - Create the Lakehouse & Upload Parquet UPLOAD

  1. Create your Bronze Lakehouse

    From your workspace home, click + New item → type "Lakehouse" in the search box → pick Lakehouse.

    In the dialog:

    • Name: lh_urbanpulse_bronze
    • Lakehouse schemas (preview): leave unchecked - this lab uses the classic Tables/ + Files/ layout

    Click Create. Fabric opens the new, empty Lakehouse Explorer.

    📷
    SCREENSHOT NEEDED
    module-2/00-create-lakehouse.png
    "New Lakehouse" dialog with name lh_urbanpulse_bronze filled in
    Creating a Lakehouse also auto-provisions an SQL analytics endpoint and a default semantic model alongside it. You'll see all three appear in your workspace - that's expected.
  2. Confirm the empty Lakehouse Explorer

    The Explorer pane should now show two top-level folders: Tables (empty) and Files (empty). You're ready to ingest.

  3. Bring the facility catalog into the Lakehouse

    The facility catalog is provisioned in an Azure Storage account. The container facilitycatalog contains:

    • facility_catalog.parquet - the 5-row facility table you'll promote to Delta
    • facilities/ - a folder of 5 facility photo JPGs (referenced by the photo_path column)

    Fabric supports two patterns for getting this data into your Lakehouse. Both are valid; pick the one that matches your scenario:

    Pattern When to use it Data movement
    Option A - Shortcut Source data already lives in ADLS Gen2 / Blob Storage and you want a live, virtualized view with no copy. Recommended for production SaaS deployments. Zero copy - data stays in the source storage account; OneLake reads it on demand.
    Option B - Upload You need a local copy in OneLake (e.g., the source will be decommissioned, or you want full control over the file lifecycle). One-time copy from local disk into the Lakehouse Files zone.

    Option A - Create a OneLake shortcut to the storage account

    This option requires a short-lived Shared Access Signature (SAS) token on the container. Generate it first, then create the shortcut.

    Generate the SAS token (Azure portal):

    1. Open the storage account from your lab handout → Storage browserBlob containers → right-click facilitycatalogGenerate SAS.
    2. Signing key: Key 1. Permissions: select Read and List.
    3. Expiry: set far enough out to cover the full lab session (e.g., end of day or end of week).
    4. Allowed protocols: HTTPS only.
    5. Click Generate SAS token and URL, then copy the Blob SAS token value (the string starting with sp=r&st=...; not the full URL).

    Create the shortcut (Fabric Lakehouse Explorer):

    1. Right-click FilesNew shortcutAzure Blob Storage.
    2. Account name or URL: paste the container URL, e.g. https://<storage-acct>.blob.core.windows.net/facilitycatalog.
    3. Connection: select Create new connection.
    4. Authentication kind: Shared Access Signature (SAS).
    5. SAS token: paste the token you copied from the Azure portal.
    6. Click Next. Give the shortcut the name facilitycatalog and leave the sub-path blank to mount the container root.
    7. Click Create. The shortcut appears under Files/facilitycatalog/ with both facility_catalog.parquet and facilities/ visible immediately - no copy, no upload.
    Why SAS instead of Account key? Many enterprise storage accounts have shared-key access disabled as a security baseline; SAS tokens provide narrowly-scoped, time-bounded access that works in those environments. Organizational account (Entra) authentication is also supported if your identity has Storage Blob Data Reader on the container.
    📷
    SCREENSHOT NEEDED
    module-2/02-shortcut-adls.png
    Lakehouse Files explorer with the facilitycatalog shortcut expanded, showing facility_catalog.parquet and facilities/ mounted from Azure Blob Storage
    📷
    SCREENSHOT NEEDED
    module-2/03-shortcut-files-tree.png
    Lakehouse Files tree after the shortcut is created, with facilitycatalog visible alongside the standard Files folder

    Option B - Download from Azure, then upload to the Lakehouse

    Two ways to grab the source files:

    • From the lab guide (easiest): open the Data Setup appendix and download the urbanpulse-lab-data.zip bundle. The Parquet file and the five facility photos are inside it.
    • From the Azure portal: open the storage account listed in your lab handout → Storage browserBlob containersfacilitycatalog. Select facility_catalog.parquet and the facilities/ folder → Download.

    In the Fabric Lakehouse Explorer, right-click FilesUploadUpload files for the parquet, and Upload folder for the facilities directory.

    📷
    SCREENSHOT NEEDED
    module-2/01-files-upload.png
    Lakehouse Files explorer with facility_catalog.parquet and facilities / uploaded directly under Files
    Either path produces a queryable Parquet file. The remaining steps in this section work the same way regardless of which option you chose - the only difference is that with Option A the file lives at Files/facilitycatalog/facility_catalog.parquet and with Option B it lives at Files/facility_catalog.parquet.
  4. Promote the Parquet to a Delta table

    In Files, right-click facility_catalog.parquet (whether under the shortcut folder or directly under Files) → Load to TablesNew table.

    Name it facility_catalog and click Load.

    Behind the scenes Fabric runs a Spark job that reads the Parquet, applies the schema, and writes a managed Delta table to Tables/facility_catalog. Takes ~30 seconds.
    📷
    SCREENSHOT NEEDED
    module-2/04-load-to-tables.png
    Load to Tables dialog with the table name set to facility_catalog before clicking Load
  5. Preview the table

    Refresh the Tables list. Click facility_catalog. You should see rows with columns including facility_id, facility_name, region_id, photo_path, beds_total, beds_icu.

    📷
    SCREENSHOT NEEDED
    module-2/05-table-preview.png
    Lakehouse table preview pane showing rows of facility_catalog with all columns

Step 2 - Mirror an Azure SQL Database MIRROR

Mirroring brings your operational SQL data into Fabric continuously, with no ETL pipelines to author. Fabric reads the source's CDC log and writes Delta into OneLake.

Using the shared Fabric lab workspace? Shortcut the mirror instead of creating it. If your instructor has already mirrored the Azure SQL data into a shared Fabric workspace that you have member access to, you do not need to create your own Mirrored database. Skip the create-and-connect steps below and shortcut the existing mirror straight into your Lakehouse:
  1. Open lh_urbanpulse_bronze. Right-click TablesNew shortcutMicrosoft OneLake.
  2. In the data-source picker, select the shared workspace, then the mirrored database (for example mir_urbanpulse_cityops).
  3. Select Hospitals, Wards, and Staff and click Create.
The tables appear in your Bronze Lakehouse with no copy and no source credentials. Member access on the shared workspace is what lets you create the OneLake shortcut. Then jump ahead to Step 4.
  1. Create a Mirrored Database

    From the workspace, click + New item → search "Mirrored". Choose Mirrored Azure SQL Database.

    Name it: mir_urbanpulse_cityops.

  2. Connect to the lab SQL DB

    Open the Azure SQL database resource in the Azure portal under your assigned resource group. Copy the server name, database name, and SQL admin credentials from the resource blades (see Module 0 · Setup & Environment Check) and paste them into the connection wizard:

    Connection
    Server:   (from Azure portal → SQL database → Connection strings)
    Database: (from Azure portal → SQL database overview)
    Auth:     SQL authentication
    Username: (SQL admin login from your SQL server)
    Password: (SQL admin password from your SQL server)
  3. Choose tables to mirror

    Tick all three: Hospitals, Wards, Staff. Click Mirror database.

    📷
    SCREENSHOT NEEDED
    module-2/02-mirror-tables.png
    Mirror table picker with Hospitals, Wards, Staff selected
  4. Watch the initial snapshot

    The mirror creates a snapshot first, then switches to incremental CDC. Status pane should reach Replicated in under two minutes.

    📷
    SCREENSHOT NEEDED
    module-2/03-mirror-sql-replicated.png
    Mirrored Azure SQL replication status page showing Hospitals, Wards, and Staff with status Replicated and row counts populated
    Why this matters for ISVs: When you onboard a new customer, mirroring lets you pull their operational SQL data without deploying anything to their environment. No agents, no scheduled exports, no broken pipelines.
  5. Shortcut the mirrored tables into the Bronze Lakehouse

    Open lh_urbanpulse_bronze. Right-click TablesNew shortcutMicrosoft OneLake. Pick mir_urbanpulse_cityops and select Hospitals, Wards, and Staff. Click Create.

    The three SQL tables now appear in the Bronze Lakehouse alongside the Parquet upload, ready to be joined with the Cosmos and uploaded data in a single SQL query.

    📷
    SCREENSHOT NEEDED
    module-2/04-mirror-sql-shortcut.png
    Bronze Lakehouse with OneLake shortcuts to Hospitals, Wards, and Staff from mir_urbanpulse_cityops visible in the Tables tree
    Why two items? A Mirrored database is its own Fabric item with its own OneLake folder. The OneLake-to-OneLake shortcut into the Bronze Lakehouse is the standard pattern for keeping mirrors and uploaded data in one queryable surface without copying.

Step 3 - Mirror a Cosmos DB for NoSQL Database MIRROR

Mirroring brings Cosmos DB documents into Fabric continuously with no ETL pipelines to author. Fabric reads the Cosmos change feed and lands the documents as managed Delta tables in OneLake, queryable through the same SQL endpoint as your Lakehouse.

Using the shared Fabric lab workspace? Shortcut the mirror instead of creating it. If your instructor has already mirrored the Cosmos DB data into a shared Fabric workspace that you have member access to, skip the create-and-connect steps below and shortcut the existing mirror into your Lakehouse:
  1. Open lh_urbanpulse_bronze. Right-click TablesNew shortcutMicrosoft OneLake.
  2. In the data-source picker, select the shared workspace, then the mirrored database (for example mir_urbanpulse_routing).
  3. Select regions and trainRoutes and click Create.
Both containers appear in your Bronze Lakehouse with no copy and no Cosmos keys. Member access on the shared workspace is what lets you create the OneLake shortcut. Then continue to Step 4.
  1. Create a Mirrored Cosmos DB

    From the workspace, click + New item → search "Mirrored". Choose Mirrored Azure Cosmos DB.

    Name it: mir_urbanpulse_routing.

  2. Connect to the Cosmos account

    Open your Azure Cosmos DB account in the Azure portal (under your assigned resource group). Go to Settings → Keys, copy the URI and primary key (or read-only key), and paste them into the connection pane. See Module 0 · Setup & Environment Check for context.

    Choose database: urbanpulse.

    📷
    SCREENSHOT NEEDED
    module-2/06-mirror-cosmos-connect.png
    Mirrored Cosmos DB connection pane with the endpoint and key entered, and the urbanpulse database selected
  3. Select the containers to mirror

    Tick regions and trainRoutes. Click Mirror database.

    Initial snapshot lands within ~1 minute; the change feed keeps the Delta copy current after that.

    📷
    SCREENSHOT NEEDED
    module-2/07-mirror-cosmos-containers.png
    Container picker with regions and trainRoutes selected, ready to click Mirror database
  4. Confirm replication is running

    Open the Mirrored database item. The Replication status page lists both containers as replicated tables once the initial snapshot completes. Status should read Running with row counts populated for regions and trainRoutes.

    📷
    SCREENSHOT NEEDED
    module-2/08-mirror-cosmos-replication-status.png
    Replication status page showing regions and trainRoutes replicated with status Running
  5. Shortcut the mirrored tables into the Bronze Lakehouse

    Open lh_urbanpulse_bronze. Right-click TablesNew shortcutMicrosoft OneLake. Pick mir_urbanpulse_routing and select both regions and trainRoutes. Click Create.

    This makes both Cosmos collections appear as tables in the same Bronze Lakehouse alongside the Parquet upload and the SQL mirror, so a single SQL query in the next step can join all three sources.

    📷
    SCREENSHOT NEEDED
    module-2/09-mirror-cosmos-shortcut.png
    Bronze Lakehouse with OneLake shortcuts to regions and trainRoutes from mir_urbanpulse_routing visible in the Tables tree
    Why two items? A Mirrored database is its own Fabric item with its own OneLake folder. The OneLake-to-OneLake shortcut into the Bronze Lakehouse is the standard pattern for keeping mirrors and uploaded data in one queryable surface without copying.

Step 4 - One SQL Query Across All Three Sources

The payoff. Three different ingest mechanisms, but to a downstream consumer they're all just tables in your Lakehouse SQL endpoint.

  1. Switch to the SQL analytics endpoint

    Top-right of the Lakehouse, click the dropdown that says Lakehouse and choose SQL analytics endpoint.

  2. Run the unified query

    Click New SQL query and paste:

    T-SQL
    -- A query that proves the three sources unify into one Lakehouse view.
    -- Hospital, its mirrored ward count, its region, and the train routes that serve that region.
    SELECT
        h.hospitalId,
        h.name              AS hospital_name,
        h.facilityType,
        f.beds_total,
        f.beds_icu,
        f.photo_path        AS sample_photo,                 -- uploaded Parquet → Delta
        (
            SELECT COUNT(*) FROM dbo.Wards w WHERE w.hospitalId = h.hospitalId
        )                   AS ward_count,                   -- mirrored from Azure SQL
        r.name              AS region_name,                  -- mirrored from Cosmos
        r.population        AS region_population,
        (
            SELECT STRING_AGG(tr.color, ', ')
            FROM trainRoutes tr
            CROSS APPLY OPENJSON(tr.regionsServed) WITH (rid VARCHAR(16) '$') s
            WHERE s.rid = h.regionId
        )                   AS train_lines_serving_region
    FROM dbo.Hospitals      AS h                              -- mirrored from Azure SQL
    JOIN dbo.facility_catalog AS f ON f.facility_id = h.hospitalId  -- uploaded Parquet → Delta
    JOIN regions     AS r  ON r.regionId   = h.regionId      -- mirrored from Cosmos
    ORDER BY h.hospitalId;
  3. Inspect the result

    You should see one row per hospital, with bed counts from the uploaded Parquet, ward counts from the mirrored SQL DB, and region + train-line info from the mirrored Cosmos DB. That's three different physical data stores in one query.

Validation

  • Lakehouse Tables shows facility_catalog (uploaded), Hospitals/Wards/Staff (mirrored from SQL), and regions + trainRoutes (mirrored from Cosmos).
  • The unified query above returns one row per hospital.
  • OneLake explorer shows all six tables in one workspace, surfaced through one Lakehouse SQL endpoint.

What You Just Built

ObjectTypeHow it got thereUse case
facility_catalog Delta Parquet file upload + Load to Tables Static reference data, region/photo lookups
Hospitals / Wards / Staff Mirrored Delta Mirrored Azure SQL DB (CDC) Operational OLTP data, near-real-time
regions / trainRoutes Mirrored Delta Mirrored Azure Cosmos DB for NoSQL (change feed) NoSQL data (geo + nested arrays), zero-ETL CDC

Module 3 brings in the real-time half of the medallion: live Event Hub data flowing through Eventstream into the Eventhouse.

References