Chapter 7: API & Orchestration Layer
In the previous chapter, we explored the delta-extractor's brain: the Database Metadata Model. This "card catalog" stores all our connections, pipeline definitions, and run history, giving our system a memory.
We now have all the pieces of a professional data pipeline: extraction engines, a transformation assembly line, a cloud uploader, and a brain to keep track of it all. But how do we put them all together and command them effectively? So far, we've only run single, manual jobs from the command line. It's time to upgrade from a manual tool to an automated service.
Welcome to the final chapter, where we meet the conductor of our data orchestra: the API & Orchestration Layer.
The Problem: From Manual Drone Pilot to Fleet Commander
Imagine our delta-extractor is a powerful delivery drone. Using the config.yaml and running python main.py is like being a pilot who flies one drone manually for a single delivery. This is great for a one-time job, but what if you need to run a fleet of drones?
- You want to schedule regular delivery routes (e.g., "run the sales report every night at 2 AM").
- You need a central dashboard to add new drones (SAP connections) or define new routes (pipelines) without hand-editing configuration files.
- You want to check the status of all your deliveries from a single place.
You need a control center. The API & Orchestration Layer is that control center for our fleet of data extraction "drones". It transforms delta-extractor from a simple command-line utility into a robust, web-based service that you can manage programmatically.
Key Concepts: Inside the Control Center
Our control center is built with a few modern, powerful tools.
1. The Dashboard: FastAPI
Our control center needs a user interface. Since this is a programmatic service, the "interface" is a web API. We use a framework called FastAPI to build this.
FastAPI is a modern Python web framework that makes it easy to create high-performance web APIs. Think of it as the manufacturer that builds our control panel. It provides all the wiring and buttons so we can focus on what happens when a button is pushed. It even automatically generates interactive documentation for our API, like a user manual for our control panel!
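To see how little code this takes, here is a minimal, standalone FastAPI app with a single endpoint. It is not part of delta-extractor; it just demonstrates the pattern.
# hello_api.py -- a standalone example, not part of delta-extractor
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health_check():
    # FastAPI turns this dictionary into a JSON response automatically
    return {"status": "ok"}
Start it with uvicorn hello_api:app --reload and FastAPI serves interactive documentation for it at http://127.0.0.1:8000/docs, with no extra work on our part.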
2. The Buttons: A REST API
Our control panel communicates using a standard set of rules called a REST API. This is the universal language for web services. You interact with it using standard HTTP methods, which act like buttons on our dashboard:
- GET: Show me something. (e.g., GET /connections shows all saved SAP connections.)
- POST: Create something new. (e.g., POST /pipelines creates a new extraction pipeline.)
- PUT: Update something. (e.g., PUT /connections/1 updates the details for connection #1.)
- DELETE: Remove something. (e.g., DELETE /pipelines/5 removes pipeline #5.)
By using this standard, any application—from a simple command-line script to a complex web dashboard—can interact with and control our delta-extractor.
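As a quick illustration, a few lines of Python with the requests library are enough to press these buttons. The endpoint paths below follow the examples above; treat this as a sketch, since the exact routes and response fields depend on how the API is defined.
# rest_client_example.py -- illustrative sketch of calling the API from Python
import requests

BASE_URL = "http://127.0.0.1:8000"

# GET: list all saved SAP connections
connections = requests.get(f"{BASE_URL}/connections").json()
print(connections)

# DELETE: remove pipeline #5
response = requests.delete(f"{BASE_URL}/pipelines/5")
print(response.status_code)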
3. The Automation Brain: Orchestration
This is the most important part. The API layer doesn't just provide buttons; it orchestrates the entire process. When you tell it to run a pipeline, it uses the Database Metadata Model to look up all the necessary information, and then it systematically calls all the other components in the correct order:
- Start the correct Extraction Engine.
- Wait for it to finish, then trigger the Data Transformation to Parquet pipeline.
- When that's done, start the Cloud Storage Uploader.
- Log the results of the run back into the database.
It's the conductor ensuring every part of the orchestra plays its part at the right time.
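In code, that conducting role boils down to a function that calls each stage in order and records the outcome. The sketch below is purely illustrative: the stage functions are passed in as parameters because the real component names and signatures in delta-extractor may differ.
# orchestration sketch -- illustrative only, real component APIs may differ
def run_pipeline(pipeline_config, extract, transform, upload, log_run):
    # 1. Start the correct extraction engine (RFC or HTTP/OData)
    raw_data = extract(pipeline_config)
    # 2. Transform the raw result into Parquet files
    parquet_files = transform(raw_data)
    # 3. Ship the Parquet files with the cloud storage uploader
    upload(parquet_files)
    # 4. Log the result of the run back into the metadata database
    log_run(pipeline_config, status="success")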
How to Use It: Commanding the Fleet
Instead of running python main.py, you first start the API server from your terminal:
# This command starts the web server
uvicorn main:app --reload
Your delta-extractor is now running as a live web service! You can open a browser and go to http://127.0.0.1:8000/docs to see the automatically generated interactive documentation for your API.
Now, you can use any tool that can make web requests (like the command-line tool curl or a Python script) to manage your extractions.
Example 1: Creating a new SAP connection
Let's say you want to add a new connection without editing any files. You would send a POST request to the /connections endpoint with the details in JSON format.
# Using curl to send a POST request
curl -X POST "http://127.0.0.1:8000/connections/" \
-H "Content-Type: application/json" \
-d '{
"name": "SAP_Test_System",
"protocol": "RFC",
"rfc_config": { "ashost": "test.sap.server", "sysnr": "10", ... }
}'
This tells the API to create a new connection named "SAP_Test_System" and save its details to the database. No more config.yaml!
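Behind an endpoint like this, FastAPI typically validates the incoming JSON against a Pydantic model. The sketch below shows what such a model could look like for the payload above; the field names are taken from the example, and the real schema in delta-extractor may differ or contain more fields.
# schemas.py -- illustrative sketch of a request model for /connections
from pydantic import BaseModel

class RfcConfig(BaseModel):
    ashost: str
    sysnr: str
    # ... further RFC settings would go here

class ConnectionCreate(BaseModel):
    name: str
    protocol: str
    rfc_config: RfcConfig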
Example 2: Triggering a pipeline
Once you've defined a pipeline (let's say it has an ID of 1), you can trigger it with a simple POST request.
# Tell the API to run the pipeline with ID 1
curl -X POST "http://127.0.0.1:8000/pipelines/1/run"
The API will receive this, look up pipeline 1 in the database, and kick off the entire end-to-end extraction process in the background.
Under the Hood: The Conductor at Work
When you send a request to run a pipeline, a well-orchestrated sequence of events happens.
The API acts as the central coordinator, fetching the "sheet music" from the database and instructing the "musicians" (the other components) to play.
Let's look at the simplified code that powers this.
Step 1: Setting up the FastAPI App
The main.py file is where our API server is born. It's incredibly simple.
# main.py (simplified)
from fastapi import FastAPI
from routers import pipeline, odp_connection

# Create the main application "dashboard"
app = FastAPI()

# Plug in the control panels for different resources
app.include_router(pipeline.router)
app.include_router(odp_connection.router)
# ... and so on for other routers
This code creates a central FastAPI app and then "includes" different routers. A router is just a collection of related API endpoints (like all the "buttons" for managing pipelines).
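For illustration, a router module usually groups its endpoints under a shared URL prefix. The sketch below shows what a connections router could look like; it is a hypothetical skeleton, not the exact delta-extractor code.
# routers/odp_connection.py -- hypothetical skeleton for illustration
from fastapi import APIRouter

# Every endpoint on this router is served under the /connections prefix
router = APIRouter(prefix="/connections", tags=["connections"])

@router.get("/")
def list_connections():
    # In the real project this would read connections from the metadata database
    return []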
Step 2: Defining an Endpoint
Inside a file like routers/pipeline.py, we define what happens when a "button" is pushed.
# routers/pipeline.py (simplified)
from fastapi import APIRouter
from database import crud
from background_jobs import run_pipeline_in_background

# Create a router for all pipeline-related endpoints
router = APIRouter()

# Define what happens when a user POSTs to /pipelines/{id}/run
@router.post("/pipelines/{pipeline_id}/run")
def run_a_pipeline(pipeline_id: int):
    # 1. Get the pipeline's "recipe" from the database
    pipeline_config = crud.get_pipeline_by_id(pipeline_id)
    # 2. Start the full extraction job in the background
    run_pipeline_in_background(pipeline_config)
    return {"message": "Pipeline run started!"}
This snippet is the heart of the orchestration logic.
- @router.post(...): This "decorator" tells FastAPI that this function should handle POST requests to the specified URL.
- crud.get_pipeline_by_id(...): It calls a function from our Database Metadata Model to fetch the pipeline's configuration.
- run_pipeline_in_background(...): It then passes that configuration to a function that kicks off the entire extraction job.
This simple, clean structure allows delta-extractor to go from a manual tool to a fully automated service.
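Before we wrap up, a note on how "start the job in the background" can be implemented: one common option in FastAPI is its built-in BackgroundTasks helper, which runs a function after the HTTP response has been sent. The variation below is a sketch of that pattern with a placeholder job function; the real delta-extractor may use a different mechanism, such as a dedicated task queue.
# background-task variation of the endpoint above (illustrative sketch)
from fastapi import APIRouter, BackgroundTasks

router = APIRouter()

def run_pipeline_in_background(pipeline_config):
    # Placeholder for the real end-to-end job: extract, transform, upload, log
    ...

@router.post("/pipelines/{pipeline_id}/run")
def run_a_pipeline(pipeline_id: int, background_tasks: BackgroundTasks):
    pipeline_config = {"id": pipeline_id}  # stand-in for crud.get_pipeline_by_id(...)
    # Schedule the job and return immediately; FastAPI runs it after the response
    background_tasks.add_task(run_pipeline_in_background, pipeline_config)
    return {"message": "Pipeline run started!"}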
Conclusion: Your Journey is Complete!
Congratulations! You've reached the end of the delta-extractor tutorial. You've journeyed through every major component of a modern data pipeline:
- You started with the simple Extraction Entrypoint for manual runs.
- You dove into the powerful RFC-based and modern HTTP/OData-based engines for pulling data.
- You saw how the Transformation to Parquet pipeline cleans and prepares the data for analysis.
- You learned how the Cloud Storage Uploader ships the final product to a central location.
- You explored the Database Metadata Model, the system's brain that gives it memory.
- And finally, you've seen how the API & Orchestration Layer ties everything together, turning a simple tool into a powerful, automated service.
You are now equipped with the knowledge to use, manage, and extend delta-extractor to solve your real-world data extraction challenges. Happy extracting!