
CrewAI Processing COBOL Flat Files with Python `pandas`

Slug: crewai-cobol-flat-files-python-pandas
Tags: CrewAI, COBOL, Python, Pandas, Legacy Migration, MCP


One of the most stubborn artifacts in enterprise infrastructure is the COBOL flat file. Unlike modern CSVs or JSON, these fixed-width files (often generated by mainframes running IBM z/OS) rely on strict character positioning rather than delimiters.

For a modern AI agent (e.g., CrewAI, OpenAI Operator), a record like `001JOHN DOE 05000020231001` is gibberish. It doesn’t know that characters 0–3 are the ID, 3–13 are the Name, and so on.
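To make the problem concrete, here is a minimal sketch of what a parser must do by hand for a record like that one (padded here so the name field is exactly 10 characters; the boundaries after the name, a 6-character amount and an 8-character date, are assumptions for illustration):

```python
# A fixed-width payroll record: ID (3 chars), Name (10 chars), then, as
# assumed here for illustration, Amount (6 chars) and Date (8 chars).
record = "001JOHN DOE  05000020231001"

fields = {
    "id": record[0:3],
    "name": record[3:13].strip(),  # strip the space padding
    "amount": record[13:19],
    "date": record[19:27],
}
print(fields)  # {'id': '001', 'name': 'JOHN DOE', 'amount': '050000', 'date': '20231001'}
```

Every field boundary is implicit in the layout; nothing in the bytes themselves marks where one column ends and the next begins.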

This guide provides a production-ready Model Context Protocol (MCP) server that gives your agents the ability to parse, query, and analyze legacy fixed-width data using the robust pandas library.


We don’t need to write complex parsers from scratch. Python’s pandas library includes `read_fwf` (read fixed-width formatted), a powerhouse function designed specifically for this legacy format.

By wrapping this in a FastMCP server, we create a tool that allows an agent to say: “Read the payroll file at /data/payroll.dat using this schema” and instantly receive structured JSON.

  • Dynamic Schema: The agent can define column widths on the fly based on documentation it reads.
  • Data Cleaning: Automatically handles whitespace stripping typical of fixed-width files.
  • JSON Output: Converts mainframe data into agent-readable JSON.
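All three behaviors can be sketched with `read_fwf` on an in-memory sample (the company rows below are made up for illustration):

```python
import io
import json
import pandas as pd

# Made-up two-record sample in a [5, 20, 8] layout:
# Customer ID (5), Company Name (20), Last Order Date (8).
rows = [
    ("00001", "ACME CORP", "20231001"),
    ("00002", "GLOBEX", "20220415"),
]
raw = "\n".join(f"{i}{c:<20}{d}" for i, c, d in rows) + "\n"

df = pd.read_fwf(
    io.StringIO(raw),
    widths=[5, 20, 8],          # dynamic schema, supplied at call time
    names=["ID", "Company", "Date"],
    header=None,
    dtype=str,                  # strings preserve leading zeros in IDs
)
df = df.apply(lambda col: col.str.strip())  # whitespace cleaning

print(df.to_json(orient="records"))         # agent-readable JSON
```

The same widths/names/JSON pattern is exactly what the MCP tool below parameterizes.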

This MCP server exposes a tool called `parse_fixed_width` that CrewAI can call.

Dependencies (`requirements.txt`):

```text
pandas
fastmcp
```

The server (`server.py`):

```python
import json
from typing import List, Optional

import pandas as pd
from fastmcp import FastMCP

# Initialize FastMCP
mcp = FastMCP("COBOL-Flat-File-Parser")

@mcp.tool()
def parse_fixed_width(
    filepath: str,
    widths: List[int],
    names: Optional[List[str]] = None,
    limit: int = 100,
) -> str:
    """
    Parses a COBOL-style fixed-width flat file into structured JSON.

    Args:
        filepath: The absolute path to the flat file (e.g., /data/sales.dat).
        widths: A list of integers representing the width of each field.
            Example: [10, 20, 8] means first 10 chars, next 20, next 8.
        names: Optional list of column names. Must match the number of widths.
        limit: Max number of rows to return to save context window (default 100).
    """
    try:
        # Validate inputs
        if names and len(names) != len(widths):
            return (
                f"Error: Number of names ({len(names)}) does not match "
                f"number of widths ({len(widths)})."
            )

        # Read the fixed-width file using pandas.
        # header=None assumes the file is pure data (common in mainframe dumps).
        df = pd.read_fwf(
            filepath,
            widths=widths,
            names=names,
            header=None,
            dtype=str,  # Keep as string to preserve leading zeros in IDs
        )

        # Basic cleaning: strip whitespace from all string columns
        df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

        # Limit rows for the agent's context window
        record_count = len(df)
        df_limited = df.head(limit)

        # Convert to JSON
        json_data = df_limited.to_json(orient="records")
        return json.dumps({
            "status": "success",
            "total_records_in_file": record_count,
            "records_returned": len(df_limited),
            "data": json.loads(json_data),
        }, indent=2)
    except FileNotFoundError:
        return f"Error: File not found at {filepath}. Ensure volume is mounted."
    except Exception as e:
        return f"Error parsing file: {str(e)}"

if __name__ == "__main__":
    # Serve over HTTP on port 8000 to match the Dockerfile below
    # (with no arguments, mcp.run() defaults to stdio transport).
    mcp.run(transport="http", host="0.0.0.0", port=8000)

We containerize this server to run alongside your CrewAI instance. This Dockerfile has the server listen on port 8000, making it compatible with Railway, Render, or local Docker networks.

```dockerfile
# Use a lightweight Python base
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install dependencies
# We install directly to keep the image simple for this example
RUN pip install --no-cache-dir pandas fastmcp

# Copy the server code
COPY server.py .

# Create a directory for mounting data volumes
RUN mkdir -p /data

# Expose the Railway/MCP port
EXPOSE 8000

# Run the FastMCP server
ENTRYPOINT ["python", "server.py"]
```
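Assuming the two files above are saved as `server.py` and `Dockerfile`, a typical build-and-run sequence looks like this (the image tag and the mounted host path are placeholders):

```shell
# Build the image (tag name is arbitrary)
docker build -t cobol-mcp .

# Run it, mounting a local directory of flat files at /data
# and publishing the MCP port
docker run --rm -p 8000:8000 -v "$(pwd)/data:/data" cobol-mcp
```

The volume mount matters: `parse_fixed_width` reads paths inside the container, so files must appear under `/data` there.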

Once your MCP server is running (e.g., at http://localhost:8000), you can connect it to CrewAI.

Suppose your agent finds a file named CUSTOMER_DUMP.TXT alongside PDF documentation stating:

  • Customer ID: 5 chars
  • Company Name: 20 chars
  • Last Order Date: 8 chars

The agent uses the tool to transform the raw file into data it can reason about.

```python
from crewai import Agent, Task, Crew
# Tool wiring is adapter-specific: recent crewai_tools releases ship an MCP
# adapter (MCPServerAdapter), or you can wrap the HTTP endpoint in a custom
# tool. Here, `cobol_parser_tool` stands in for that wrapped MCP tool.

# Define the Agent
legacy_analyst = Agent(
    role='Legacy Data Specialist',
    goal='Extract actionable insights from mainframe dump files',
    backstory='You are an expert in COBOL data structures and legacy ETL processes.',
    tools=[cobol_parser_tool],  # The MCP tool we created
    verbose=True
)

# Define the Task
extract_task = Task(
    description="""
    Analyze the flat file at '/data/CUSTOMER_DUMP.TXT'.
    According to the documentation, the schema widths are [5, 20, 8].
    The columns are ['ID', 'Company', 'Date'].
    Parse the file and tell me which companies ordered in 2023.
    """,
    agent=legacy_analyst,
    expected_output="A list of companies with orders in 2023."
)

# Assemble and run the Crew
crew = Crew(agents=[legacy_analyst], tasks=[extract_task])
result = crew.kickoff()
print(result)
```
  • Encoding Issues: If `read_fwf` fails with strange characters, the file may be in EBCDIC. You may need to add a conversion step (e.g., using the `ebcdic` Python library or Python's built-in EBCDIC codecs) before passing it to pandas.
  • Ragged Right Columns: Sometimes the last column in a COBOL file has variable length (newline handling). pandas handles this well, but ensure your widths match the maximum expected length.
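For the EBCDIC case, here is a minimal conversion sketch using Python's built-in `cp037` codec (the US/Canada EBCDIC code page; mainframe exports vary, so confirm which code page your system uses):

```python
# "HELLO" encoded in EBCDIC code page 037
ebcdic_bytes = b"\xc8\xc5\xd3\xd3\xd6"

# Decode to a normal Python string; from here the text can be written
# out as UTF-8 and fed to pandas.read_fwf as usual.
text = ebcdic_bytes.decode("cp037")
print(text)  # HELLO
```

Note that byte-for-byte widths still hold after decoding for single-byte code pages, so the same `widths` schema applies to the converted file.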

  • Status: ✅ Verified
  • Environment: Python 3.11
  • Auditor: AgentRetrofit CI/CD
