Chapter 2: RFC-based Extraction Engine
In the last chapter, we learned how to start the delta-extractor using the Extraction Entrypoint and a config.yaml file. We saw that setting protocol: "RFC" makes the entrypoint hand the job over to a specialist.
Welcome to that specialist's workshop! This chapter is all about the RFC-based Extraction Engine, the component responsible for high-speed, direct communication with SAP systems.
The Problem: We Need a Direct Line to SAP
Imagine your company's data lives in a massive, secure vault called SAP. You need to extract millions of sales records for a big analysis. Sending a request through the public mail (like a standard web request) would be slow, insecure, and inefficient. What you really want is a private, armored vehicle that can drive right into the vault, load up the data, and bring it back quickly.
The RFC-based Extraction Engine is that armored vehicle. It uses a special communication method called Remote Function Call (RFC) to create a private, high-speed tunnel directly into the heart of an SAP system. This is the preferred method for large, performance-critical data extractions.
Key Concepts: The Engine's Toolkit
To understand how our engine works, let's look at its three main tools.
1. The Language: RFC (Remote Function Call)
RFC is the native language that SAP systems use to talk to each other and to certified external programs. It is not the general-purpose language of the web (HTTP); think of it as an efficient shorthand spoken only by SAP insiders. By speaking RFC, our engine gains direct, privileged access.
2. The Translator: The pyrfc Library
Our delta-extractor is written in Python, but SAP speaks RFC. How do they communicate? They use a translator! The pyrfc library is a special Python package that translates our Python commands into RFC commands that SAP understands, and translates SAP's RFC responses back into Python data.
```
+-------------------+      +--------------+      +--------------+
| Our Python App    | <--> | pyrfc        | <--> | SAP System   |
| (delta-extractor) |      | (Translator) |      | (Speaks RFC) |
+-------------------+      +--------------+      +--------------+
```
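To see what that translation looks like from the Python side, here is a minimal sketch. Since no live SAP system is available here, FakeConnection is a hypothetical stand-in for pyrfc.Connection; the real object has the same shape: call(function_name, **parameters) takes plain Python values in and returns a plain Python dict back. (STFC_CONNECTION is SAP's standard connectivity-test function module, which echoes its input.)

```python
# Hypothetical stand-in for pyrfc.Connection, used because no live SAP
# system is available in a book. The real object has the same shape:
# call(function_name, **params) -> dict of Python values.
class FakeConnection:
    def call(self, function_name, **params):
        # pyrfc would serialize params into RFC, invoke the remote
        # function module, and deserialize the reply into Python types.
        if function_name == "STFC_CONNECTION":
            return {"ECHOTEXT": params["REQUTEXT"], "RESPTEXT": "SAP echo"}
        raise ValueError(f"unknown function module: {function_name}")

conn = FakeConnection()
result = conn.call("STFC_CONNECTION", REQUTEXT="hello")
print(result["ECHOTEXT"])  # → hello
```

With the real library, only the first line changes: conn = Connection(...); the call itself is identical.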
3. The Strategy: Pointers and Packages
If you need to move a million books from a library, you don't try to carry them all at once. You'd grab a cart, load a few dozen books (a "package"), wheel them out, and come back for the next batch.
The RFC engine does the same thing with data. It doesn't ask for all million rows at once. Instead, it:
- Opens a "Pointer": It tells SAP, "I want to start reading the sales data. Please mark my spot." SAP creates a pointer, like putting a bookmark in a giant book.
- Fetches in "Packages": The engine then asks, "Give me the first 50,000 rows." SAP sends this package. The engine processes it and then asks for the next 50,000, and so on, until SAP says, "You've reached the end of the book."
This approach is highly efficient and prevents the system from being overwhelmed by a massive, single data dump.
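The pointer-and-package strategy can be sketched in plain Python. This toy version replaces SAP with an in-memory list, but the control flow (keep a bookmark, fetch fixed-size packages until one comes back empty) is exactly the pattern described above:

```python
def fetch_in_packages(rows, package_size):
    """Toy illustration: yield fixed-size packages from a data source."""
    pointer = 0  # SAP's "bookmark" into the data
    while True:
        package = rows[pointer:pointer + package_size]
        if not package:  # an empty package means we reached the end
            return
        yield package
        pointer += package_size

# A million rows would arrive in chunks of 50,000; here, 10 rows in 4s:
packages = list(fetch_in_packages(list(range(10)), 4))
print([len(p) for p in packages])  # → [4, 4, 2]
```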
How to Use It: Configuring the Engine
You already know how to turn the engine on from the last chapter! The config.yaml file is where you provide the specific instructions for the RFC engine.
Let's look at the rfc section of the config file.
```yaml
# config.yaml

# This tells the entrypoint to use the RFC Engine
protocol: "RFC"

# --- Settings specifically for the RFC Engine ---
rfc:
  ashost: "your-sap-server.com"  # Address of the SAP "vault"
  sysnr: "00"                    # The specific system/instance number
  client: "100"                  # The client or tenant within the system
  user: "YOUR_USERNAME"          # Your username for the vault
  # The password is often stored securely, not in this file

extraction_settings:
  extraction_type: "Z_SALES_DATA_EXTRACTOR"  # The specific data report to run
  package_size: 50000                        # How many rows to fetch in each "package"
```
When the RFC engine starts, it reads this configuration to get the address and credentials of the SAP system, which data to ask for (extraction_type), and how big each chunk of data should be (package_size).
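Once a YAML library (such as PyYAML) has parsed the file into a Python dict, the engine can sanity-check it before connecting. The validate_rfc_config helper below is illustrative, not the project's actual code:

```python
# Hypothetical helper: check that the parsed config carries everything
# the RFC engine needs before it tries to connect.
REQUIRED_RFC_KEYS = ("ashost", "sysnr", "client", "user")

def validate_rfc_config(config):
    rfc = config.get("rfc", {})
    missing = [key for key in REQUIRED_RFC_KEYS if key not in rfc]
    if missing:
        raise ValueError(f"rfc section is missing keys: {missing}")
    return rfc

# The parsed equivalent of the config.yaml shown above:
config = {
    "protocol": "RFC",
    "rfc": {"ashost": "your-sap-server.com", "sysnr": "00",
            "client": "100", "user": "YOUR_USERNAME"},
    "extraction_settings": {"extraction_type": "Z_SALES_DATA_EXTRACTOR",
                            "package_size": 50000},
}
print(validate_rfc_config(config)["ashost"])  # → your-sap-server.com
```

Failing fast here gives a clear error message instead of a cryptic connection failure deep inside pyrfc.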
Under the Hood: A Step-by-Step Extraction
When the Extraction Entrypoint calls the RFC engine, a well-defined sequence of events unfolds.
Let's look at some simplified code to see how this is implemented.
Step 1: Establishing the Connection
The first step is to use the pyrfc library and the details from config.yaml to open that private tunnel to SAP.
```python
# Simplified code from the RFC engine
from pyrfc import Connection

def connect_to_sap(rfc_config):
    # Use credentials from the config file
    conn = Connection(
        ashost=rfc_config['ashost'],
        sysnr=rfc_config['sysnr'],
        client=rfc_config['client'],
        user=rfc_config['user'],
        # ... and other credentials
    )
    return conn
This snippet creates a Connection object, which is our active line to the SAP server.
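One detail the simplified snippet leaves out is cleanup: an open RFC connection holds a session on the SAP server, so the engine should release it even if the extraction fails. A minimal sketch of that pattern follows; pyrfc's Connection provides a close() method, and the StubConnection class exists only so the example is self-contained without a real SAP system:

```python
def run_with_connection(conn, work):
    # Run an extraction step, then release the RFC session no matter what.
    try:
        return work(conn)
    finally:
        conn.close()

# Self-contained stand-in so the pattern can be demonstrated without SAP:
class StubConnection:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

conn = StubConnection()
result = run_with_connection(conn, lambda c: "42 rows")
print(result, conn.closed)  # → 42 rows True
```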
Step 2: The Extraction Loop
Once connected, the engine starts fetching data in a loop. It calls a specific function on the SAP side, asking for data packages until there are none left.
```python
# Simplified representation of the fetching loop
def fetch_data_in_packages(conn, settings):
    # 1. Tell SAP to prepare the data and get a request ID
    result = conn.call("INITIATE_EXTRACTION", TABLE=settings['extraction_type'])
    request_id = result['REQUEST_ID']

    while True:
        # 2. Ask for the next package of data
        package_result = conn.call(
            "FETCH_PACKAGE",
            REQUEST_ID=request_id,
            PACKAGE_SIZE=settings['package_size']
        )

        # 3. If the package is empty, we're done!
        if not package_result['DATA']:
            break  # Exit the loop

        # 4. Otherwise, process the data
        process_data(package_result['DATA'])
```
This while loop is the heart of the engine. It efficiently pulls data piece by piece, which is why this method is so powerful for large datasets.
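The loop hands each package to a process_data function that the snippet leaves undefined; what it does depends on the rest of the pipeline. One plausible sketch (purely illustrative, not the project's actual code) appends each package of row dicts to a CSV stream so downstream tools can pick the data up:

```python
import csv
import io

def process_data(rows, out):
    """Illustrative sink: append a package of row dicts as CSV lines."""
    writer = csv.DictWriter(out, fieldnames=sorted(rows[0].keys()))
    if out.tell() == 0:  # write the header only for the first package
        writer.writeheader()
    writer.writerows(rows)

# Two small "packages" arriving one after another:
buffer = io.StringIO()
process_data([{"ORDER": "1001", "AMOUNT": "250"}], buffer)
process_data([{"ORDER": "1002", "AMOUNT": "99"}], buffer)
print(buffer.getvalue().splitlines()[0])  # → AMOUNT,ORDER
```

Because packages arrive one at a time, the sink only ever holds one package in memory, which is what keeps the whole extraction lightweight.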
Conclusion
You've now seen inside the high-performance engine of our delta-extractor. The RFC-based Extraction Engine is a specialist tool that creates a direct, fast, and secure connection to an SAP system. It uses the pyrfc library to speak SAP's native language and fetches data in efficient packages, making it ideal for heavy-duty extractions.
But what if you don't have direct RFC access? Many modern systems, including SAP, expose their data over the web using standard APIs. For that, we need a completely different kind of engine.
Let's explore that alternative in our next chapter: the HTTP/OData-based Extraction Engine.
Generated by AI Codebase Knowledge Builder