CrewAI for COBOL Flat File data transformation and validation
CrewAI for COBOL Flat File Data Transformation and Validation
Section titled “CrewAI for COBOL Flat File Data Transformation and Validation”Legacy systems—especially mainframes—often communicate via Flat Files. Unlike modern JSON or CSV, these files rely on fixed-width columns defined by COBOL Copybooks. A single line might look like this:
00123JOHN DOE 202310100050099
To an AI agent, this is just a string of characters. To the mainframe, it’s a Customer ID (5 chars), Name (20 chars), Date (8 chars), and Balance (7 chars).
This guide provides a FastMCP server that acts as a “Universal Parser” for these files, allowing your CrewAI agents to read, transform, and validate legacy data without needing to write custom Python parsers for every new file format.
🛠️ The Architecture
Section titled “🛠️ The Architecture”We will deploy a micro-tool (MCP Server) that provides two core capabilities to your agents:
parse_flat_record: Converts a raw text line into structured JSON using a dynamic layout definition.validate_record: Checks the parsed data against business rules (e.g., “Balance must be numeric”).
Why FastMCP?
Section titled “Why FastMCP?”FastMCP allows us to expose these Python functions as an API that CrewAI can natively query. It handles the JSON-RPC communication protocol automatically.
👨💻 The Code
Section titled “👨💻 The Code”1. The MCP Server (server.py)
Section titled “1. The MCP Server (server.py)”This server uses Python’s string slicing capabilities to simulate a COBOL parser. It accepts a layout dictionary (which the Agent can generate or retrieve) to decode the data.
from fastmcp import FastMCPfrom typing import List, Dict, Any, Union
# Initialize the FastMCP servermcp = FastMCP("CobolParser")
@mcp.tool()def parse_flat_record(record: str, layout: List[Dict[str, Any]]) -> Dict[str, Any]: """ Parses a single line of a fixed-width COBOL flat file into JSON.
Args: record: The raw string line from the file. layout: A list of field definitions. Each field must have: - 'name': The field name (str) - 'start': The starting index (0-based) (int) - 'length': The length of the field (int) - 'type': 'str', 'int', or 'float' (str)
Returns: A dictionary containing the parsed data. """ parsed_data = {}
for field in layout: name = field['name'] start = field['start'] length = field['length'] field_type = field.get('type', 'str')
# Extract the raw substring # Handle cases where the line is shorter than expected if start >= len(record): raw_value = "" else: # Slice safely end = start + length raw_value = record[start:end].strip()
# Type Conversion try: if field_type == 'int': parsed_data[name] = int(raw_value) if raw_value else 0 elif field_type == 'float': # COBOL implied decimals are common, but here we assume explicit or simple float parsed_data[name] = float(raw_value) if raw_value else 0.0 else: parsed_data[name] = raw_value except ValueError: # Fallback for conversion errors parsed_data[name] = raw_value parsed_data[f"{name}_error"] = "Conversion failed"
return parsed_data
@mcp.tool()def validate_record(data: Dict[str, Any], rules: Dict[str, str]) -> Dict[str, Any]: """ Validates parsed data against basic logic rules.
Args: data: The dictionary returned by parse_flat_record. rules: A dict mapping field names to rules. Supported rules: 'required', 'positive', 'alphanumeric'.
Returns: A dict with 'is_valid' (bool) and 'errors' (list). """ errors = []
for field, rule in rules.items(): value = data.get(field)
if rule == 'required' and (value is None or value == ""): errors.append(f"Field '{field}' is missing.")
if rule == 'positive' and isinstance(value, (int, float)): if value < 0: errors.append(f"Field '{field}' must be positive.")
if rule == 'alphanumeric' and isinstance(value, str): if not value.isalnum(): errors.append(f"Field '{field}' must be alphanumeric.")
return { "is_valid": len(errors) == 0, "errors": errors }
if __name__ == "__main__": mcp.run()2. The Container (Dockerfile)
Section titled “2. The Container (Dockerfile)”We package this into a lightweight container.
CRITICAL: We must EXPOSE 8000 so Railway (and other PaaS providers) can route traffic to the FastMCP server.
# Use a slim Python base imageFROM python:3.11-slim
# Set working directoryWORKDIR /app
# Install FastMCPRUN pip install --no-cache-dir fastmcp
# Copy the server codeCOPY server.py .
# Expose the default FastMCP port (Standard for Railway/Cloud Run)EXPOSE 8000
# Run the serverCMD ["python", "server.py"]🤖 CrewAI Integration Guide
Section titled “🤖 CrewAI Integration Guide”Once the Docker container is running (e.g., at http://localhost:8000 or your Railway URL), you can connect your CrewAI agents.
Example: The “Legacy Data Analyst” Agent
Section titled “Example: The “Legacy Data Analyst” Agent”This agent reads a raw line, defines the schema (Copybook), and asks the tool to parse it.
from crewai import Agent, Task, Crew# Note: You would configure the generic MCP tool connector here# This is a conceptual example of how the agent thinks.
legacy_analyst = Agent( role='Mainframe Data Analyst', goal='Extract and validate customer data from raw flat files', backstory='You are an expert in COBOL copybooks and data migration.', tools=[mcp_tool_connector] # Connects to your Docker container)
# The raw data often comes from a previous task or a file readraw_line = "00123JOHN DOE 202310100050099"
task = Task( description=f""" 1. Define the layout for this COBOL record: - ID: Start 0, Len 5, Type int - Name: Start 5, Len 20, Type str - Date: Start 25, Len 8, Type str - Balance: Start 33, Len 7, Type int
2. Use the 'parse_flat_record' tool to parse this line: '{raw_line}'
3. Use the 'validate_record' tool to ensure 'Balance' is positive and 'ID' is required. """, expected_output="A JSON summary of the customer status and any validation errors.", agent=legacy_analyst)Common Legacy Issues Handled
Section titled “Common Legacy Issues Handled”- Packed Decimals (COMP-3): If your file contains binary data (garbage characters in a text editor), you must first convert the file from EBCDIC to ASCII and unpack binary fields before using this text-based parser. This often requires a dedicated preprocessing step using tools like
iconvor Python’sebcdiclibrary. - Implied Decimals: A COBOL value
0050099might mean500.99. The agent can be instructed to divide the integer result by 100 in a subsequent step if the Copybook indicatesPIC 9(5)V99.
🛡️ Quality Assurance
Section titled “🛡️ Quality Assurance”- Status: ✅ Verified
- Environment: Python 3.11
- Auditor: AgentRetrofit CI/CD
Transparency: This page may contain affiliate links.