Chapter 1: Extraction Entrypoint
Welcome to the delta-extractor tutorial! We're excited to have you on board. Over the next few chapters, we'll explore how this powerful tool can help you extract data from various systems. Let's start with the very beginning: how to turn the machine on.
Imagine you have a complex machine, like a professional coffee maker. It can make all sorts of drinks, but to get started, you just need to know which button to press. In our project, that "on" button is the Extraction Entrypoint.
What's the Point? The Ad-Hoc Report
Let's imagine a common scenario. Your manager asks you for a quick, one-time report on sales data from your company's SAP system. You don't need a fancy, automated pipeline that runs every hour. You just need to run the extraction once, right now, from your computer.
This is the perfect job for the Extraction Entrypoint. It's designed to be the manual "on" switch for the entire data extraction machine, ideal for single, ad-hoc jobs.
The Recipe: Your config.yaml File
Our extraction machine doesn't just turn on; it needs instructions. It needs to know what to extract and how. These instructions are written in a simple file named config.yaml.
Think of the entrypoint as a chef and the config.yaml as their recipe. The chef reads the recipe to understand what to cook.
Here’s a very simple example of what a config.yaml recipe might look like:
# config.yaml
# What's the method of communication?
protocol: "RFC"
# --- Settings for the RFC protocol ---
rfc:
  ashost: "your-sap-server.com"
  sysnr: "00"
  client: "100"
  user: "YOUR_USERNAME"
  # more settings...
This simple recipe tells our chef (the entrypoint) two crucial things:
- protocol: "RFC": "We'll be communicating using the RFC method."
- rfc: ...: "Here are the specific connection details for the RFC system."
Based on the protocol specified, the entrypoint knows exactly which part of the machine to start up.
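Before anything is started, it can help to check that the recipe is complete. The sketch below is a hypothetical helper, not code from delta-extractor: the name validate_config and the rule that each protocol's settings live under its lowercase name (as the rfc: section does above) are assumptions made for illustration. It works on the dictionary a YAML parser would produce from the example.

```python
# Hypothetical helper: not part of delta-extractor as shown in this chapter.
SUPPORTED_PROTOCOLS = {"RFC", "HTTP"}  # the two protocols this chapter mentions


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    protocol = config.get("protocol")
    if protocol not in SUPPORTED_PROTOCOLS:
        problems.append(f"unsupported protocol: {protocol!r}")
        return problems
    # Assumption: each protocol's settings live under its lowercase name,
    # as the `rfc:` section does in the example recipe.
    section = protocol.lower()
    if not isinstance(config.get(section), dict):
        problems.append(f"missing '{section}' settings section")
    return problems


# The example recipe, as the dictionary a YAML parser would produce.
example = {
    "protocol": "RFC",
    "rfc": {
        "ashost": "your-sap-server.com",
        "sysnr": "00",
        "client": "100",
        "user": "YOUR_USERNAME",
    },
}

print(validate_config(example))              # []
print(validate_config({"protocol": "FTP"}))  # ["unsupported protocol: 'FTP'"]
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with the recipe in a single pass.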
How to Run an Extraction
Running a one-time extraction is as simple as telling the chef to start cooking. You do this from your command line.
- Prepare your recipe: Create a config.yaml file in the project's root directory.
- Run the command: Open your terminal, navigate to the project folder, and run:
python main.py
That's it! When you run this command, the entrypoint will:
- Read your config.yaml file.
- Create any necessary folders for storing the output data (e.g., a data/ folder).
- See that protocol is set to "RFC".
- Kick off the RFC-based Extraction Engine to do the actual work.
You don't need to do anything else. The entrypoint handles the initial setup and delegates the task to the correct engine.
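The folder-creation step in that list could look something like the sketch below. This is an illustration only: the function name ensure_output_dir and the default folder name "data" are assumptions, and the real entrypoint may organize its output differently. The demo runs inside a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(base_dir, name="data"):
    """Create base_dir/name if it does not exist yet and return its path."""
    out = Path(base_dir) / name
    # parents=True creates missing parent folders;
    # exist_ok=True makes repeated runs harmless.
    out.mkdir(parents=True, exist_ok=True)
    return out


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_output_dir(tmp)
    second = ensure_output_dir(tmp)  # safe to call again
    print(first.name, first.is_dir(), first == second)  # data True True
```

Because the call is idempotent, running the extraction a second time does not fail just because the folder is already there.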
Under the Hood: How It Works
So, what really happens when you type python main.py and press Enter? Let's peek inside the machine.
The process is a straightforward sequence of events.
The entrypoint acts as a traffic controller: it reads the instructions and directs the flow to the correct specialist component.
Let's look at a simplified piece of code that represents the core logic inside main.py.
Step 1: Reading the Recipe (config.yaml)
The first thing the script does is find and load the configuration file.
# main.py (simplified)
import yaml # A library to read YAML files
# --- Load the configuration ---
with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)
protocol = config.get("protocol")
This small snippet opens the config.yaml file, reads its contents, and stores the value of the protocol key in a variable.
Step 2: Choosing the Right Engine
Next, it uses a simple if/elif/else chain to decide which extraction engine to start, based on the protocol.
# main.py (continued)
from rfc_engine import run_rfc_extraction
from http_engine import run_http_extraction
# --- Decide which engine to run ---
if protocol == "RFC":
    print("Starting RFC extraction...")
    run_rfc_extraction(config)  # Hand off to the RFC engine
elif protocol == "HTTP":
    print("Starting HTTP extraction...")
    run_http_extraction(config)  # Hand off to the HTTP engine
else:
    print("Error: Protocol not supported!")
This logic is the heart of the entrypoint. It acts as a switchboard, connecting your request to the correct module. We'll dive into what run_rfc_extraction and run_http_extraction do in the next chapters. For now, just know that the entrypoint's job is to call the right one.
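The same switchboard idea can also be written as a lookup table, which some readers find easier to extend. The sketch below is an alternative illustration, not the project's actual code: the two engine functions are stand-in stubs that only return a string, and the ENGINES table and dispatch function are hypothetical names.

```python
# Stand-in stubs: the real engines are covered in later chapters.
def run_rfc_extraction(config):
    return "rfc engine started"


def run_http_extraction(config):
    return "http engine started"


# Hypothetical lookup table mapping each protocol name to its engine.
ENGINES = {
    "RFC": run_rfc_extraction,
    "HTTP": run_http_extraction,
}


def dispatch(config):
    """Call the engine registered for config['protocol']."""
    protocol = config.get("protocol")
    engine = ENGINES.get(protocol)
    if engine is None:
        # Raising (instead of printing) lets the caller decide how to fail.
        raise ValueError(f"Protocol not supported: {protocol!r}")
    return engine(config)


print(dispatch({"protocol": "RFC"}))  # rfc engine started
```

With this shape, supporting another protocol would mean adding one entry to the table rather than another branch.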
Conclusion
You've just learned about the simplest way to use delta-extractor: the Extraction Entrypoint. It's your manual "on" switch for running single, ad-hoc jobs. You give it a config.yaml recipe, run a simple command, and it takes care of starting the right process.
Now that you understand how to start an extraction, let's look at the first engine it can trigger. In our next chapter, we'll explore the powerful engine used to talk to many SAP systems.
Ready to continue? Let's move on to the RFC-based Extraction Engine.
Generated by AI Codebase Knowledge Builder