Chapter 1: The Configuration Hub
Welcome to the hsr-cloud-accelerator! We're excited to have you on board. This series of tutorials will guide you through the core concepts of the project, one step at a time. Let's start with the most fundamental piece: the project's control panel.
The Problem: Managing a Mess of Settings
Imagine you're building a house. You have a blueprint for the foundation, another for the plumbing, and a separate one for the electrical wiring. What if the master blueprint changes the location of a wall? You'd have to manually update the plumbing and electrical plans, hoping you don't miss anything. If you forget one, you might end up with pipes or wires in the wrong place!
Software projects are similar. They have lots of "settings": database passwords, API keys, project names, file locations, etc. If these are scattered across many different files, changing just one thing can be a nightmare. It's easy to make a mistake and break the entire application.
The Solution: A Single Source of Truth
The hsr-cloud-accelerator solves this with a simple but powerful idea: the Configuration Hub. This is one central file that holds all the important settings for the entire project.
Think of it as the project's DNA. It contains all the genetic instructions needed for every part of the system to build and run itself correctly.
This central file is called run/main_config.yaml.
What is main_config.yaml?
It's a simple, human-readable text file written in a format called YAML. YAML uses indentation to organize information, making it easy to see the structure of your settings.
Let's look at a small piece of it:
# File: run/main_config.yaml

# Project related variables
GCP_PROJECT:
  PROJECT_ID : "dev-analytics-gcp-bt"
  PROJECT_REGION : "asia-south1"

# Storage bucket details
BUCKET:
  NAME : "babai-tiffins-dev"
  RAW : "zoho_inventory"
As you can see, settings are grouped into logical sections like GCP_PROJECT and BUCKET. This file is the only place you'll need to edit to change core project settings.
How It Works: The Magic Generator
So, if we only have one configuration file, how do all the other parts of the application get their settings? They don't read the main_config.yaml file directly.
Instead, we use a special script, run/generate_config.py, which acts as a "Configuration Generator". Here’s what it does:
- It reads the main_config.yaml file.
- It translates the settings into different formats needed by other components.
- It writes new, specialized configuration files (like .env and .sh files).
This process ensures that every component receives the exact same, up-to-date information from our single source of truth.
Let's visualize the flow: main_config.yaml is read by generate_config.py, which writes out config.sh and .env. One file in, several specialized files out.
Let's See It in Action
Imagine we want to set up our project.
Step 1: We edit main_config.yaml
We open run/main_config.yaml and ensure our project ID and bucket name are correct.
# File: run/main_config.yaml
GCP_PROJECT:
  PROJECT_ID : "dev-analytics-gcp-bt"
  # ...

BUCKET:
  NAME : "babai-tiffins-dev"
  # ...
Step 2: We run the generator script
In our terminal, we execute the generator script, typically with a command like python generate_config.py from inside the run directory (the exact invocation may differ in your setup). This kicks off the "magic."
Step 3: The script creates specialized configurations
The script reads our YAML file and produces two important types of files:
- A shell script (config.sh) for infrastructure setup. It looks like this:

# Generated File: run/scripts/config.sh
#!/bin/bash
PROJECT_ID="dev-analytics-gcp-bt"
RAW_BUCKET="babai-tiffins-dev"
# ... and so on

This file contains variables that our deployment scripts will use to create or update resources in Google Cloud.

- An environment file (.env) for our application code. It looks like this:

# Generated File: run/pubsub/.env
GCP_PROJECT=dev-analytics-gcp-bt
CA_RAW=babai-tiffins-dev/zoho_inventory
# ... and so on

Our Python applications will load this file to get access to things like bucket names and API keys while they are running (a short sketch of that loading step follows this list).
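As a concrete illustration, here is a minimal sketch of how a Python application could consume the generated .env file. It assumes the python-dotenv package and the path shown above; the project's actual loading code may differ.

# Hypothetical consumer of the generated .env file (assumes python-dotenv)
import os

from dotenv import load_dotenv

# Read the KEY=VALUE pairs from the generated file into the process environment
load_dotenv("run/pubsub/.env")

project = os.environ["GCP_PROJECT"]  # "dev-analytics-gcp-bt"
raw_path = os.environ["CA_RAW"]      # "babai-tiffins-dev/zoho_inventory"
print(f"Reading raw inventory data from gs://{raw_path} in project {project}")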
Notice how the project ID from our single main_config.yaml file was correctly placed into both generated files (as PROJECT_ID in config.sh and GCP_PROJECT in .env)! If we ever need to change it, we only change it in one place, re-run the script, and our entire project is updated instantly and consistently.
Under the Hood: A Peek at the Code
How does generate_config.py actually work? Let's look at a simplified version of its logic.
The script first opens and reads the main_config.yaml file using a library called PyYAML.
# File: run/generate_config.py
import yaml

# ... (other code) ...

# 1. Find and open the main configuration file
with open("main_config.yaml", encoding="utf-8") as stream:
    # 2. Load the YAML content into a Python dictionary
    config = yaml.safe_load(stream)
This turns our neat YAML structure into a Python dictionary named config that the script can easily access.
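For instance, loading the small YAML piece from earlier produces a nested dictionary equivalent to this (values taken directly from the example above):

# The loaded dictionary mirrors the YAML structure
config = {
    "GCP_PROJECT": {
        "PROJECT_ID": "dev-analytics-gcp-bt",
        "PROJECT_REGION": "asia-south1",
    },
    "BUCKET": {
        "NAME": "babai-tiffins-dev",
        "RAW": "zoho_inventory",
    },
}

# Nested values are reached with ordinary dictionary lookups
print(config["GCP_PROJECT"]["PROJECT_ID"])  # dev-analytics-gcp-bt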
Next, it calls a function to format the settings for a shell script.
# File: run/generate_config.py
def get_shell_config(data):
    # Pluck values from the data dictionary
    project_id = data["GCP_PROJECT"]["PROJECT_ID"]
    bucket_name = data["BUCKET"]["NAME"]

    # Format them into a shell script string
    return f"""#!/bin/bash
PROJECT_ID="{project_id}"
RAW_BUCKET="{bucket_name}"
"""
This function simply takes the config dictionary, pulls out the values it needs, and plugs them into a pre-defined text template.
Finally, the script saves this newly created text into the config.sh file.
# File: run/generate_config.py
# ... (after loading config) ...

# 3. Generate the shell config string
shell_config = get_shell_config(config)

# 4. Write that string to a new file
with open("scripts/config.sh", "w") as fp:
    fp.write(shell_config)
The same process is repeated for the .env file, just with a different formatting function. It's a simple but incredibly effective pattern: Read -> Format -> Write.
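The .env formatter isn't shown in this chapter, but based on the generated file above it would look something like the sketch below. Note that get_env_config is a hypothetical name for illustration, not necessarily the one used in the real script.

# Hypothetical .env formatter, mirroring get_shell_config above
def get_env_config(data):
    # Pluck the same values from the config dictionary
    project_id = data["GCP_PROJECT"]["PROJECT_ID"]
    bucket_name = data["BUCKET"]["NAME"]
    raw_folder = data["BUCKET"]["RAW"]

    # Format them as the KEY=VALUE lines an .env file expects
    return f"""GCP_PROJECT={project_id}
CA_RAW={bucket_name}/{raw_folder}
"""

# Written out the same way as config.sh
with open("pubsub/.env", "w") as fp:
    fp.write(get_env_config(config))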
Conclusion
You've just learned about the most important concept in the hsr-cloud-accelerator: the Configuration Hub.
- It is the single source of truth for all project settings.
- It lives in one file: run/main_config.yaml.
- A generator script, run/generate_config.py, reads this file to automatically create configurations for all other parts of the application.
This approach keeps our project organized, consistent, and easy to manage.
Now that you understand how we configure our project, we're ready to see how these configurations are used. In the next chapter, we'll dive into the first major component of our system.
Next up: Data Ingestion Pipeline