Oracle AI Data Platform Workbench Samples
This repository contains a curated collection of sample notebooks demonstrating how to build data pipelines, run machine learning workloads, and integrate AI capabilities using Oracle AI Data Platform (AIDP) Workbench — a unified, governed workspace for data engineering, ML, and AI development powered by Apache Spark.
What is Oracle AI Data Platform Workbench?
Oracle AI Data Platform Workbench is a unified, governed workspace for building, managing, and deploying AI and data-driven solutions. It brings together notebooks, agent development, orchestration, and catalog management in a single collaborative platform — empowering teams to explore data, fine-tune models, and operationalize AI with trust and speed.
Learn more about AIDP Workbench →
Repository Structure
oracle-aidp-samples/
├── getting-started/ # Foundational notebooks for new users
│ ├── Delta_Lake/ # Delta Lake feature walkthroughs
│ └── migration/ # Migrating workloads to AIDP
├── data-engineering/
│ ├── ingestion/ # Connectors and data loading patterns
│ └── transformation/ # Pipeline architectures and table formats
│     ├── liquid-clustering/
│     ├── medallion-lake/
│     ├── scd/
│     └── streaming/
├── ai/
│ ├── agent-flows/ # Agent orchestration and scheduling
│ └── ml-datascience/ # ML, LLM, and AI service integrations
└── shared-utils/ # Reusable utilities and data generators
Sample Catalog
Getting Started
Foundational examples to help you get up and running on AIDP Workbench.
| Notebook | Description |
|---|---|
| Access ALH Data | Write and query data in Oracle Autonomous AI Lakehouse (ALH) using PySpark insertInto and SQL INSERT statements with external catalogs. |
| Access Object Storage Data | Read and write data from OCI Object Storage using direct access, external volumes, and external tables. |
| Analyse Data Using PySpark | PySpark fundamentals: catalog and schema setup, table creation, data insertion, schema exploration, and matplotlib visualizations. |
| Analyse Data Using SQL | Core SQL operations on AIDP including DataFrame creation, transformations, aggregations, and simple visualizations. |
| ALH External Catalog MERGE | End-to-end MERGE workflow into an ALH table via an AIDP external catalog: insert/update/delete with merge keys and OOS-staging skip optimization. |
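The MERGE workflow in the last row applies inserts, updates, and deletes matched on merge keys. As an engine-independent illustration of those semantics (not the notebook's code and not an AIDP API), the plain-Python sketch below merges a batch of change records into a target keyed by a hypothetical `id` column. The `op` field and the function name `merge_changes` are assumptions made for this example.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]

def merge_changes(target: Dict[int, Row], changes: Iterable[Row], key: str = "id") -> Dict[int, Row]:
    """Apply upserts and deletes to a copy of `target`, matching rows on `key`."""
    merged = {k: dict(v) for k, v in target.items()}
    for change in changes:
        k = change[key]
        if change.get("op") == "delete":
            merged.pop(k, None)           # matched (or absent) key: delete
        else:
            row = {c: v for c, v in change.items() if c != "op"}
            if k in merged:
                merged[k].update(row)     # matched key: update
            else:
                merged[k] = row           # unmatched key: insert
    return merged

# Illustrative data only.
target = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Bob"}}
changes = [
    {"id": 2, "name": "Robert", "op": "upsert"},
    {"id": 3, "name": "Cy", "op": "upsert"},
    {"id": 1, "op": "delete"},
]
result = merge_changes(target, changes)
print(result)
```

In a SQL MERGE statement the three branches map roughly to the matched-update, not-matched-insert, and matched-delete clauses; the exact syntax supported through an external catalog is covered in the notebook itself.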
Delta Lake
| Notebook | Description |
|---|---|
| Use Delta Lake Table | Comprehensive guide covering Delta table operations: updates, merges, time travel, liquid clustering, and vacuuming. |
| Delta Change Data Feed | Capture row-level changes (inserts, updates, deletes) from Delta tables for CDC, incremental processing, and streaming pipelines. |
| Handle Schema Evolution | Add and evolve columns in Delta tables without rewriting existing data, leveraging automatic schema evolution. |
| Delta UniForm Tables | Create Delta UniForm tables that automatically synchronize Iceberg metadata for cross-format interoperability. |
Migration
| Notebook | Description |
|---|---|
| Migrate Files from Databricks to AIDP | Recursively export notebooks and files from a Databricks workspace to AIDP using the databricks-sdk library. |
| Download from Git to AIDP | Download notebooks and files from a Git repository as a ZIP archive and extract them directly into an AIDP workspace volume. |
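The Git download approach in the last row (fetch a repository as a ZIP archive, then extract it into a volume) can be sketched with the Python standard library alone. This is a minimal sketch, not the notebook's implementation: the function names, the URL, and the destination path are placeholders chosen for illustration.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

def download_zip(url: str, timeout: float = 30.0) -> bytes:
    """Fetch an archive over HTTP(S) and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def extract_zip(data: bytes, dest: Path) -> list:
    """Extract a ZIP archive into `dest`, refusing entries that would escape it."""
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            target = (dest / name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(dest)
        return archive.namelist()

# Hypothetical usage; both the URL and the volume path are placeholders.
# data = download_zip("https://github.com/<org>/<repo>/archive/refs/heads/main.zip")
# extract_zip(data, Path("/path/to/workspace/volume/samples"))
```

Checking each entry's resolved path before extracting is a defensive choice for archives from sources you do not control.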
Data Engineering — Ingestion
Patterns for connecting to and loading data from a wide range of sources.
| Notebook | Description |
|---|---|
| Read/Write Oracle Ecosystem Connectors | Connect to Oracle Database, Oracle Exadata, ALH, and ATP with external catalog support and SQL pushdown. |
| Read/Write External Ecosystem Connectors | Read/write operations with Hive Metastore, Microsoft SQL Server, PostgreSQL, and MySQL. |
| Read-Only Ingestion Connectors | Use read-only connectors for MySQL HeatWave, REST APIs, Oracle Fusion BICC, Kafka, and other sources. |
| Connect Using Custom JDBC Driver | Integrate custom JDBC drivers (e.g., SQLite, Snowflake) with Spark for connecting to databases not bundled by default. |
| Execute Oracle ALH SQL | Execute SQL statements directly against Oracle ALH using the oracledb Python package. |
| Ingest Data Using YAML | Config-driven ingestion from cloud storage (CSV, JSON) and JDBC sources with schema validation and data quality checks. |
| Ingest from Multi-Cloud | Ingest data from Azure Data Lake Storage (ADLS) and AWS S3 with proper JAR configuration and credential management. |
| Ingest into Apache Iceberg (OCI Native) | End-to-end Apache Iceberg workflow: table creation, querying, schema evolution, time travel, and metadata inspection using OCI native protocol and Hadoop catalog. |
| Pipe-Delimited File Ingestion | Read pipe-delimited (`\|`) files from OCI Object Storage and register them as external tables. |
| Read Excel Files | Read Excel (.xlsx) files using the Spark Excel connector and convert them to Spark DataFrames or CSV. |
| Streaming from OCI Streaming Service | Consume messages from OCI Streaming (Kafka-compatible) using Spark Structured Streaming with SASL/OAUTHBEARER authentication. |
| Streaming from Volume Path | Process CSV files from a workspace volume using one-time micro-batch streaming with Trigger.Once(). |
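To make the pipe-delimited format from the table above concrete, here is a small standard-library sketch that parses such a file with a header row. It only illustrates the format, not the Spark-based notebook. The sample data and the function name `read_pipe_delimited` are made up for this example.

```python
import csv
import io

def read_pipe_delimited(text: str) -> list:
    """Parse pipe-delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [dict(row) for row in reader]

# Illustrative data only.
sample = "order_id|customer|amount\n1001|Acme|250.00\n1002|Globex|99.50\n"
rows = read_pipe_delimited(sample)
print(rows[0])  # {'order_id': '1001', 'customer': 'Acme', 'amount': '250.00'}
```

All values come back as strings. In Spark, the CSV reader's `sep` option plays the same role as `delimiter` here, and types are applied through a schema.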
Data Engineering — Transformation
Architectural patterns and pipeline templates for data transformation at scale.
Medallion Architecture
Implements the Bronze → Silver → Gold lakehouse pattern with data quality checks and aggregations. Industry variants available:
| Notebook | Description |
|---|---|
| Education | Education analytics pipeline |
| Energy | Energy consumption and reporting |
| Financial Services | Financial transactions and risk |
| Healthcare | Patient records and clinical data |
| Hospitality | Hotel bookings and guest analytics |
| Insurance | Policy and claims processing |
| Manufacturing | Production line and quality data |
| Media | Content engagement and subscriptions |
| Real Estate | Property listings and transactions |
| Retail | Sales, inventory, and customer data |
| Telecommunications | Network usage and customer churn |
| Transportation | Logistics and fleet tracking |
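The Bronze → Silver → Gold flow described above can be illustrated without Spark. The sketch below is a simplified, plain-Python stand-in for the pattern, not code from any of the industry notebooks: Bronze holds raw records, Silver applies quality checks and type casts, and Gold aggregates. The column names and sample rows are assumptions.

```python
from collections import defaultdict

# Bronze: raw records kept as ingested, including bad rows.
bronze = [
    {"store": "north", "amount": "120.50"},
    {"store": "north", "amount": "n/a"},     # unparseable amount
    {"store": "south", "amount": "80.00"},
    {"store": None, "amount": "15.00"},      # missing store
    {"store": "south", "amount": "20.00"},
]

def to_silver(rows):
    """Silver: keep rows that pass quality checks and cast amounts to float."""
    clean = []
    for row in rows:
        if not row.get("store"):
            continue                          # reject rows without a store
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue                          # reject rows with a bad amount
        clean.append({"store": row["store"], "amount": amount})
    return clean

def to_gold(rows):
    """Gold: aggregate cleaned rows into per-store totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'north': 120.5, 'south': 100.0}
```

In a lakehouse, each layer would typically be persisted as its own table so that downstream consumers read from Silver or Gold rather than reprocessing raw data.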
Delta Liquid Clustering
Demonstrates Delta Lake liquid clustering for automatic query optimization and data layout management. Industry variants available:
| Notebook | Description |
|---|---|
| Education | Student performance analytics with ML prediction |
| Energy | Smart grid monitoring and anomaly detection |
| Financial Services | Transaction analytics and reporting |
| Healthcare | Patient data access patterns |
| Hospitality | Booking and occupancy analytics |
| Insurance | Claims and policy data optimization |
| Manufacturing | Production and quality metrics |
| Media | Content and engagement data |
| Real Estate | Property and transaction data |
| Retail | Sales and inventory analytics |
| Telecommunications | Network and customer usage data |
| Transportation | Fleet and logistics optimization |
Apache Iceberg UniForm Liquid Clustering
Combines Delta UniForm with Apache Iceberg Liquid Clustering for open-format, cross-engine table optimization. Industry variants available:
| Notebook | Description |
|---|---|
| Education | Student performance data |
| Energy | Grid and sensor data |
| Financial Services | Transaction and risk data |
| Healthcare | Clinical and patient records |
| [Hospitality](data-engineering/transformation/liquid- |
…