Chapter 2: RFC-based Extraction Engine
In the last chapter, we learned how to start the delta-extractor using the Extraction Entrypoint and a config.yaml file. We saw that setting protocol: "RFC" makes the entrypoint hand the job over to a specialist.
Welcome to that specialist's workshop! This chapter is all about the RFC-based Extraction Engine, the component responsible for high-speed, direct communication with SAP systems.
The Problem: We Need a Direct Line to SAP
Imagine your company's data lives in a massive, secure vault called SAP. You need to extract millions of sales records for a big analysis. Sending a request through the public mail (like a standard web request) would be slow, insecure, and inefficient. What you really want is a private, armored vehicle that can drive right into the vault, load up the data, and bring it back quickly.
The RFC-based Extraction Engine is that armored vehicle. It uses a special communication method called Remote Function Call (RFC) to create a private, high-speed tunnel directly into the heart of an SAP system. This is the preferred method for large, performance-critical data extractions.
Key Concepts: The Engine's Toolkit
To understand how our engine works, let's look at its three main tools.
1. The Language: RFC (Remote Function Call)
RFC is the native language that SAP systems use to talk to each other and to certified external programs. It is not the general-purpose language of the web (HTTP); think of it as an efficient shorthand spoken only by SAP insiders. By speaking RFC, our engine gains direct, privileged access.
2. The Translator: The pyrfc Library
Our delta-extractor is written in Python, but SAP speaks RFC. How do they communicate? They use a translator! The pyrfc library is a special Python package that translates our Python commands into RFC commands that SAP understands, and translates SAP's RFC responses back into Python data.
```
+-------------------+      +--------------+      +--------------+
| Our Python App    | <--> | pyrfc        | <--> | SAP System   |
| (delta-extractor) |      | (Translator) |      | (Speaks RFC) |
+-------------------+      +--------------+      +--------------+
```
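To see what that translation looks like from the Python side, here is a minimal sketch. Since no live SAP system is available here, FakeConnection is a hypothetical stand-in for pyrfc.Connection; the real object has the same shape: call(function_name, **parameters) takes plain Python values in and returns a plain Python dict back. (STFC_CONNECTION is SAP's standard connectivity-test function module, which echoes its input.)

```python
# Hypothetical stand-in for pyrfc.Connection, used because no live SAP
# system is available in a book. The real object has the same shape:
# call(function_name, **params) -> dict of Python values.
class FakeConnection:
    def call(self, function_name, **params):
        # pyrfc would serialize params into RFC, invoke the remote
        # function module, and deserialize the reply into Python types.
        if function_name == "STFC_CONNECTION":
            return {"ECHOTEXT": params["REQUTEXT"], "RESPTEXT": "SAP echo"}
        raise ValueError(f"unknown function module: {function_name}")

conn = FakeConnection()
result = conn.call("STFC_CONNECTION", REQUTEXT="hello")
print(result["ECHOTEXT"])  # → hello
```

With the real library, only the first line changes: conn = Connection(...); the call itself is identical.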
3. The Strategy: Pointers and Packages
If you need to move a million books from a library, you don't try to carry them all at once. You'd grab a cart, load a few dozen books (a "package"), wheel them out, and come back for the next batch.
The RFC engine does the same thing with data. It doesn't ask for all million rows at once. Instead, it:
- Opens a "Pointer": It tells SAP, "I want to start reading the sales data. Please mark my spot." SAP creates a pointer, like putting a bookmark in a giant book.
- Fetches in "Packages": The engine then asks, "Give me the first 50,000 rows." SAP sends this package. The engine processes it and then asks for the next 50,000, and so on, until SAP says, "You've reached the end of the book."
This approach is highly efficient and prevents the system from being overwhelmed by a massive, single data dump.
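The pointer-and-package strategy can be sketched in plain Python. This toy version replaces SAP with an in-memory list, but the control flow (keep a bookmark, fetch fixed-size packages until one comes back empty) is exactly the pattern described above:

```python
def fetch_in_packages(rows, package_size):
    """Toy illustration: yield fixed-size packages from a data source."""
    pointer = 0  # SAP's "bookmark" into the data
    while True:
        package = rows[pointer:pointer + package_size]
        if not package:  # an empty package means we reached the end
            return
        yield package
        pointer += package_size

# A million rows would arrive in chunks of 50,000; here, 10 rows in 4s:
packages = list(fetch_in_packages(list(range(10)), 4))
print([len(p) for p in packages])  # → [4, 4, 2]
```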
How to Use It: Configuring the Engine
You already know how to turn the engine on from the last chapter! The config.yaml file is where you provide the specific instructions for the RFC engine.
Let's look at the rfc section of the config file.
```yaml
# config.yaml

# This tells the entrypoint to use the RFC Engine
protocol: "RFC"

# --- Settings specifically for the RFC Engine ---
rfc:
  ashost: "your-sap-server.com"  # Address of the SAP "vault"
  sysnr: "00"                    # The specific system/instance number
  client: "100"                  # The client or tenant within the system
  user: "YOUR_USERNAME"          # Your username for the vault
  # The password is often stored securely, not in this file

extraction_settings:
  extraction_type: "Z_SALES_DATA_EXTRACTOR"  # The specific data report to run
  package_size: 50000                        # How many rows to fetch in each "package"
```
When the RFC engine starts, it reads this configuration to get the address and credentials of the SAP system, which data to ask for (extraction_type), and how big each chunk of data should be (package_size).
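Once a YAML library (such as PyYAML) has parsed the file into a Python dict, the engine can sanity-check it before connecting. The validate_rfc_config helper below is illustrative, not the project's actual code:

```python
# Hypothetical helper: check that the parsed config carries everything
# the RFC engine needs before it tries to connect.
REQUIRED_RFC_KEYS = ("ashost", "sysnr", "client", "user")

def validate_rfc_config(config):
    rfc = config.get("rfc", {})
    missing = [key for key in REQUIRED_RFC_KEYS if key not in rfc]
    if missing:
        raise ValueError(f"rfc section is missing keys: {missing}")
    return rfc

# The parsed equivalent of the config.yaml shown above:
config = {
    "protocol": "RFC",
    "rfc": {"ashost": "your-sap-server.com", "sysnr": "00",
            "client": "100", "user": "YOUR_USERNAME"},
    "extraction_settings": {"extraction_type": "Z_SALES_DATA_EXTRACTOR",
                            "package_size": 50000},
}
print(validate_rfc_config(config)["ashost"])  # → your-sap-server.com
```

Failing fast here gives a clear error message instead of a cryptic connection failure deep inside pyrfc.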
Under the Hood: A Step-by-Step Extraction
When the Extraction Entrypoint calls the RFC engine, a well-defined sequence of events unfolds.
Let's look at some simplified code to see how this is implemented.
Step 1: Establishing the Connection
The first step is to use the pyrfc library and the details from config.yaml to open that private tunnel to SAP.
```python
# Simplified code from the RFC engine
from pyrfc import Connection

def connect_to_sap(rfc_config):
    # Use credentials from the config file
    conn = Connection(
        ashost=rfc_config['ashost'],
        sysnr=rfc_config['sysnr'],
        client=rfc_config['client'],
        user=rfc_config['user'],
        # ... and other credentials
    )
    return conn
This snippet creates a Connection object, which is our active line to the SAP server.
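One detail the simplified snippet leaves out is cleanup: an open RFC connection holds a session on the SAP server, so the engine should release it even if the extraction fails. A minimal sketch of that pattern follows; pyrfc's Connection provides a close() method, and the StubConnection class exists only so the example is self-contained without a real SAP system:

```python
def run_with_connection(conn, work):
    # Run an extraction step, then release the RFC session no matter what.
    try:
        return work(conn)
    finally:
        conn.close()

# Self-contained stand-in so the pattern can be demonstrated without SAP:
class StubConnection:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

conn = StubConnection()
result = run_with_connection(conn, lambda c: "42 rows")
print(result, conn.closed)  # → 42 rows True
```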
Step 2: The Extraction Loop
Once connected, the engine starts fetching data in a loop. It calls a specific function on the SAP side, asking for data packages until there are none left.
```python
# Simplified representation of the fetching loop
def fetch_data_in_packages(conn, settings):
    # 1. Tell SAP to prepare the data and get a request ID
    result = conn.call("INITIATE_EXTRACTION", TABLE=settings['extraction_type'])
    request_id = result['REQUEST_ID']

    while True:
        # 2. Ask for the next package of data
        package_result = conn.call(
            "FETCH_PACKAGE",
            REQUEST_ID=request_id,
            PACKAGE_SIZE=settings['package_size']
        )

        # 3. If the package is empty, we're done!
        if not package_result['DATA']:
            break  # Exit the loop

        # 4. Otherwise, process the data
        process_data(package_result['DATA'])
```
This while loop is the heart of the engine. It efficiently pulls data piece by piece, which is why this method is so powerful for large datasets.
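The loop hands each package to a process_data function that the snippet leaves undefined; what it does depends on the rest of the pipeline. One plausible sketch (purely illustrative, not the project's actual code) appends each package of row dicts to a CSV stream so downstream tools can pick the data up:

```python
import csv
import io

def process_data(rows, out):
    """Illustrative sink: append a package of row dicts as CSV lines."""
    writer = csv.DictWriter(out, fieldnames=sorted(rows[0].keys()))
    if out.tell() == 0:  # write the header only for the first package
        writer.writeheader()
    writer.writerows(rows)

# Two small "packages" arriving one after another:
buffer = io.StringIO()
process_data([{"ORDER": "1001", "AMOUNT": "250"}], buffer)
process_data([{"ORDER": "1002", "AMOUNT": "99"}], buffer)
print(buffer.getvalue().splitlines()[0])  # → AMOUNT,ORDER
```

Because packages arrive one at a time, the sink only ever holds one package in memory, which is what keeps the whole extraction lightweight.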
Conclusion
You've now seen inside the high-performance engine of our delta-extractor. The RFC-based Extraction Engine is a specialist tool that creates a direct, fast, and secure connection to an SAP system. It uses the pyrfc library to speak SAP's native language and fetches data in efficient packages, making it ideal for heavy-duty extractions.
But what if you don't have direct RFC access? Many modern systems, including SAP, expose their data over the web using standard APIs. For that, we need a completely different kind of engine.
Let's explore that alternative in our next chapter: the HTTP/OData-based Extraction Engine.
Generated by AI Codebase Knowledge Builder