
Chapter 7: API & Orchestration Layer

In the previous chapter, we explored the delta-extractor's brain: the Database Metadata Model. This "card catalog" stores all our connections, pipeline definitions, and run history, giving our system a memory.

We now have all the pieces of a professional data pipeline: extraction engines, a transformation assembly line, a cloud uploader, and a brain to keep track of it all. But how do we put them all together and command them effectively? So far, we've only run single, manual jobs from the command line. It's time to upgrade from a manual tool to an automated service.

Welcome to the final chapter, where we meet the conductor of our data orchestra: the API & Orchestration Layer.

The Problem: From Manual Drone Pilot to Fleet Commander

Imagine our delta-extractor is a powerful delivery drone. Using the config.yaml and running python main.py is like being a pilot who flies one drone manually for a single delivery. This is great for a one-time job, but what if you need to run a fleet of drones?

  • You want to schedule regular delivery routes (e.g., "run the sales report every night at 2 AM").
  • You need a central dashboard to add new drones (SAP connections) or define new routes (pipelines) without rewriting files.
  • You want to check the status of all your deliveries from a single place.

You need a control center. The API & Orchestration Layer is that control center for our fleet of data extraction "drones". It transforms delta-extractor from a simple command-line utility into a robust, web-based service that you can manage programmatically.

Key Concepts: Inside the Control Center

Our control center is built with a few modern, powerful tools.

1. The Dashboard: FastAPI

Our control center needs a user interface. Since this is a programmatic service, the "interface" is a web API. We use a framework called FastAPI to build this.

FastAPI is a Python library that makes it incredibly easy to create high-performance web APIs. Think of it as the manufacturer that builds our control panel. It provides all the wiring and buttons so we can focus on what happens when a button is pushed. It even automatically generates interactive documentation for our API, like a user manual for our control panel!
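
To see how little code a working API takes, here is a minimal, self-contained example. It is a generic sketch to illustrate FastAPI itself, not part of delta-extractor:

# hello_api.py -- a minimal FastAPI app (illustrative, not from delta-extractor)
from fastapi import FastAPI

app = FastAPI()

# One "button": GET /health returns a small JSON status
@app.get("/health")
def health_check():
    # FastAPI converts the dict to a JSON response automatically
    return {"status": "ok"}

Start it with uvicorn hello_api:app, and FastAPI serves the interactive documentation at /docs with zero extra work.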

2. The Buttons: A REST API

Our control panel communicates using a standard set of rules called a REST API. This is the universal language for web services. You interact with it using standard HTTP methods, which act like buttons on our dashboard:

  • GET: Show me something. (e.g., GET /connections shows all saved SAP connections).
  • POST: Create something new. (e.g., POST /pipelines creates a new extraction pipeline).
  • PUT: Update something. (e.g., PUT /connections/1 updates the details for connection #1).
  • DELETE: Remove something. (e.g., DELETE /pipelines/5 removes pipeline #5).

By using this standard, any application—from a simple command-line script to a complex web dashboard—can interact with and control our delta-extractor.
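
For example, pressing the GET "button" from a terminal is a one-liner (assuming the server is running locally, as shown in the next section):

# Ask the API to list all saved SAP connections
curl "http://127.0.0.1:8000/connections/"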

3. The Automation Brain: Orchestration

This is the most important part. The API layer doesn't just provide buttons; it orchestrates the entire process. When you tell it to run a pipeline, it uses the Database Metadata Model to look up all the necessary information, and then it systematically calls all the other components in the correct order:

  1. Start the correct Extraction Engine.
  2. Wait for it to finish, then trigger the Data Transformation to Parquet pipeline.
  3. When that's done, start the Cloud Storage Uploader.
  4. Log the results of the run back into the database.

It's the conductor ensuring every part of the orchestra plays its part at the right time.
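
To make the conductor metaphor concrete, here is a minimal sketch of such an orchestration function. The helper names (run_extraction, transform_to_parquet, upload_to_cloud, log_run) are illustrative stand-ins, not delta-extractor's actual function names:

# orchestration sketch -- helper names are hypothetical, not the real API
def run_pipeline(pipeline_config):
    # 1. Start the correct Extraction Engine
    raw_data = run_extraction(pipeline_config)

    # 2. Transform the extracted data to Parquet
    parquet_files = transform_to_parquet(raw_data, pipeline_config)

    # 3. Upload the Parquet files to cloud storage
    upload_to_cloud(parquet_files, pipeline_config)

    # 4. Log the results of the run back into the database
    log_run(pipeline_config, status="success")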

How to Use It: Commanding the Fleet

Instead of running python main.py, you first start the API server from your terminal:

# This command starts the web server
uvicorn main:app --reload

Your delta-extractor is now running as a live web service! You can open a browser and go to http://127.0.0.1:8000/docs to see the automatically generated interactive documentation for your API.

Now, you can use any tool that can make web requests (like the command-line tool curl or a Python script) to manage your extractions.

Example 1: Creating a new SAP connection

Let's say you want to add a new connection without editing any files. You would send a POST request to the /connections endpoint with the details in JSON format.

# Using curl to send a POST request
curl -X POST "http://127.0.0.1:8000/connections/" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "SAP_Test_System",
        "protocol": "RFC",
        "rfc_config": { "ashost": "test.sap.server", "sysnr": "10", ... }
      }'

This tells the API to create a new connection named "SAP_Test_System" and save its details to the database. No more config.yaml!
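
The same request works from a Python script. Here is a sketch using the requests library (assuming it is installed; the payload mirrors the curl example above, with the remaining RFC settings elided):

# create_connection.py -- same POST request as the curl example
import requests

payload = {
    "name": "SAP_Test_System",
    "protocol": "RFC",
    "rfc_config": {"ashost": "test.sap.server", "sysnr": "10"},  # plus any other RFC settings
}

response = requests.post("http://127.0.0.1:8000/connections/", json=payload)
print(response.status_code, response.json())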

Example 2: Triggering a pipeline

Once you've defined a pipeline (let's say it has an ID of 1), you can trigger it with a simple POST request.

# Tell the API to run the pipeline with ID 1
curl -X POST "http://127.0.0.1:8000/pipelines/1/run"

The API will receive this, look up pipeline 1 in the database, and kick off the entire end-to-end extraction process in the background.

Under the Hood: The Conductor at Work

When you send a request to run a pipeline, a well-orchestrated sequence of events happens.

The API acts as the central coordinator, fetching the "sheet music" from the database and instructing the "musicians" (the other components) to play.

Let's look at the simplified code that powers this.

Step 1: Setting up the FastAPI App

The main.py file is where our API server is born. It's incredibly simple.

# main.py (simplified)
from fastapi import FastAPI

# Import the "control panels" (routers) for different resources
from routers import pipeline, odp_connection

# Create the main application "dashboard"
app = FastAPI()

# Plug in the routers so their endpoints become part of the app
app.include_router(pipeline.router)
app.include_router(odp_connection.router)
# ... and so on for other routers

This code creates a central FastAPI app and then "includes" different routers. A router is just a collection of related API endpoints (like all the "buttons" for managing pipelines).

Step 2: Defining an Endpoint

Inside a file like routers/pipeline.py, we define what happens when a "button" is pushed.

# routers/pipeline.py (simplified)
from fastapi import APIRouter
from database import crud
from background_jobs import run_pipeline_in_background

# Create a router for all pipeline-related endpoints
router = APIRouter()

# Define what happens when a user POSTs to /pipelines/{id}/run
@router.post("/pipelines/{pipeline_id}/run")
def run_a_pipeline(pipeline_id: int):
    # 1. Get the pipeline's "recipe" from the database
    pipeline_config = crud.get_pipeline_by_id(pipeline_id)

    # 2. Start the full extraction job in the background
    run_pipeline_in_background(pipeline_config)

    return {"message": "Pipeline run started!"}

This snippet is the heart of the orchestration logic.

  1. @router.post(...): This "decorator" tells FastAPI that this function should handle POST requests to the specified URL.
  2. crud.get_pipeline_by_id(...): It calls a function from our Database Metadata Model to fetch the pipeline's configuration.
  3. run_pipeline_in_background(...): It then passes that configuration to a function that kicks off the entire extraction job.

This simple, clean structure allows delta-extractor to go from a manual tool to a fully automated service.
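
One note on the background hand-off: a common way to do this in FastAPI is the built-in BackgroundTasks helper, which schedules work to run after the HTTP response is sent. Here is one possible variant of the endpoint above using it (the crud and run_pipeline_in_background names are carried over from the simplified snippet):

# variant of the endpoint using FastAPI's built-in BackgroundTasks
from fastapi import APIRouter, BackgroundTasks
from database import crud
from background_jobs import run_pipeline_in_background

router = APIRouter()

@router.post("/pipelines/{pipeline_id}/run")
def run_a_pipeline(pipeline_id: int, background_tasks: BackgroundTasks):
    pipeline_config = crud.get_pipeline_by_id(pipeline_id)
    # Schedule the job; FastAPI runs it after the response is returned
    background_tasks.add_task(run_pipeline_in_background, pipeline_config)
    return {"message": "Pipeline run started!"}

This keeps the endpoint responsive: the caller gets an immediate acknowledgment while the extraction runs in the background.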

Conclusion: Your Journey is Complete!

Congratulations! You've reached the end of the delta-extractor tutorial. You've journeyed through every major component of a modern data pipeline:

  • The Extraction Engines that pull data out of SAP.
  • The Data Transformation to Parquet assembly line that reshapes it.
  • The Cloud Storage Uploader that delivers the results to their destination.
  • The Database Metadata Model that gives the system its memory.
  • The API & Orchestration Layer that ties everything together into an automated service.

You are now equipped with the knowledge to use, manage, and extend delta-extractor to solve your real-world data extraction challenges. Happy extracting!

