SEW Optimizer Tutorial

This tutorial walks through setting up and running the SEW (Self-Evolving Workflow) optimizer in EvoAgentX. We'll use the HumanEval benchmark as an example to demonstrate how to optimize a multi-agent workflow.

1. Overview

The SEW optimizer in EvoAgentX lets you:

  • Automatically optimize multi-agent workflows (prompts and workflow structure)
  • Evaluate optimization results on benchmark datasets
  • Work with different workflow representation schemes (Python, YAML, BPMN, etc.)

2. Setting Up the Environment

First, let's import the necessary modules for setting up the SEW optimizer:

from evoagentx.config import Config
from evoagentx.models import OpenAILLMConfig, OpenAILLM
from evoagentx.workflow import SEWWorkFlowGraph 
from evoagentx.agents import AgentManager
from evoagentx.benchmark import HumanEval 
from evoagentx.evaluators import Evaluator 
from evoagentx.optimizers import SEWOptimizer 
from evoagentx.core.callbacks import suppress_logger_info

Configure the LLM Model

Similar to other components in EvoAgentX, you'll need a valid OpenAI API key to initialize the LLM.

import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # read the key from the environment
llm_config = OpenAILLMConfig(model="gpt-4o-mini", openai_key=OPENAI_API_KEY)
llm = OpenAILLM(config=llm_config)

3. Setting Up the Components

Step 1: Initialize the SEW Workflow

The SEW workflow is the core component that will be optimized. It represents a sequential workflow that aims to solve the code generation task.

sew_graph = SEWWorkFlowGraph(llm_config=llm_config)
agent_manager = AgentManager()
agent_manager.add_agents_from_workflow(sew_graph)

Step 2: Prepare the Benchmark

For this tutorial, we'll use a modified version of the HumanEval benchmark that splits the test data into development and test sets:

import numpy as np

class HumanEvalSplits(HumanEval):
    def _load_data(self):
        # load the original test data
        super()._load_data()
        # shuffle with a fixed seed so the split is reproducible
        np.random.seed(42)
        num_dev_samples = int(len(self._test_data) * 0.2)
        random_indices = np.random.permutation(len(self._test_data))
        # the first 20% of the shuffled indices form the dev set, the rest stay as the test set
        self._dev_data = [self._test_data[i] for i in random_indices[:num_dev_samples]]
        self._test_data = [self._test_data[i] for i in random_indices[num_dev_samples:]]

# Initialize the benchmark
humaneval = HumanEvalSplits()

The SEWOptimizer will evaluate the performance on the development set by default. Please make sure the benchmark has a development set properly set up. You can either:

  • Use a benchmark that already provides a development set (like HotPotQA)
  • Split your dataset into development and test sets (like in the HumanEvalSplits example above)
  • Implement a custom benchmark with development set support
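The dev/test split itself does not depend on EvoAgentX. The sketch below applies the same seeded-shuffle idea to a plain list using only the standard library. The helper `split_dev_test` and its parameters are hypothetical names chosen for illustration and are not part of the EvoAgentX API. The records are stand-ins, not real HumanEval data.

```python
import random

def split_dev_test(examples, dev_ratio=0.2, seed=42):
    """Shuffle indices with a fixed seed and split into (dev, test).

    Hypothetical helper for illustration; not an EvoAgentX function.
    """
    rng = random.Random(seed)              # local RNG, global state is untouched
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    num_dev = int(len(examples) * dev_ratio)
    dev = [examples[i] for i in indices[:num_dev]]
    test = [examples[i] for i in indices[num_dev:]]
    return dev, test

# HumanEval contains 164 problems; stand-in records keep the example self-contained.
examples = [{"task_id": f"Example/{i}"} for i in range(164)]
dev, test = split_dev_test(examples)
print(len(dev), len(test))  # 32 132
```

Fixing the seed keeps the dev and test sets identical across runs, so scores from different optimization runs stay comparable.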

Step 3: Set Up the Evaluator

The evaluator is responsible for assessing the performance of the workflow during optimization. For more detailed information about how to set up and use the evaluator, please refer to the Benchmark and Evaluation Tutorial.

def collate_func(example: dict) -> dict:
    # convert raw example to the expected input for the SEW workflow
    return {"question": example["prompt"]}

evaluator = Evaluator(
    llm=llm, 
    agent_manager=agent_manager, 
    collate_func=collate_func, 
    num_workers=5, 
    verbose=True
)
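To see what collate_func does in isolation, the sketch below applies it to a single record. The record is an abbreviated, made-up stand-in that only mirrors the field names of a HumanEval entry (`task_id`, `prompt`, `entry_point`); the assumption, taken from the function above, is that the SEW workflow expects a single `question` input.

```python
def collate_func(example: dict) -> dict:
    # convert a raw example to the expected input for the SEW workflow
    return {"question": example["prompt"]}

# Made-up record that mirrors the HumanEval field names (illustration only).
example = {
    "task_id": "Example/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
}

inputs = collate_func(example)
print(sorted(inputs))        # only the "question" key is passed on
print(inputs["question"])    # the function signature and docstring to complete
```

Fields other than `prompt` are dropped here. The tutorial does not pass them through collate_func, so scoring presumably relies on the benchmark's own data.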

4. Configuring and Running the SEW Optimizer

The SEW optimizer can be configured with various parameters to control the optimization process:

optimizer = SEWOptimizer(
    graph=sew_graph,           # The workflow graph to optimize
    evaluator=evaluator,       # The evaluator for performance assessment
    llm=llm,                   # The language model
    max_steps=10,              # Maximum optimization steps
    eval_rounds=1,             # Number of evaluation rounds per step
    repr_scheme="python",      # Representation scheme for the workflow
    optimize_mode="prompt",    # What aspect to optimize (prompt/structure/all)
    order="zero-order"         # Optimization algorithm order (zero-order/first-order)
)
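As rough intuition for how a step budget and repeated evaluation could interact, here is a toy search loop. It is not SEW's algorithm or code: `toy_optimize`, `mutate`, and `score` are hypothetical, and the "workflow" is just a number. The only assumption borrowed from the configuration above is that each of the `max_steps` steps proposes a candidate, and that each candidate is scored `eval_rounds` times.

```python
import random

def toy_optimize(initial, mutate, score, max_steps=10, eval_rounds=1, seed=0):
    """Keep-the-best search loop (illustrative only, not the SEW algorithm)."""
    rng = random.Random(seed)

    def avg_score(candidate):
        # average over several rounds to smooth out noisy evaluations
        return sum(score(candidate, rng) for _ in range(eval_rounds)) / eval_rounds

    best, best_score = initial, avg_score(initial)
    for _ in range(max_steps):
        candidate = mutate(best, rng)          # propose a variation of the current best
        candidate_score = avg_score(candidate)
        if candidate_score > best_score:       # accept only strict improvements
            best, best_score = candidate, candidate_score
    return best, best_score

# Toy problem: the candidate is a number and the score peaks at 3.0.
def mutate(x, rng):
    return x + rng.uniform(-1.0, 1.0)

def score(x, rng):
    return -(x - 3.0) ** 2

best, best_score = toy_optimize(0.0, mutate, score, max_steps=50, eval_rounds=1)
print(best, best_score)
```

In this toy setting, raising `max_steps` gives the search more proposals, while raising `eval_rounds` only matters when the score is noisy. Both multiply the number of evaluations, which in the real optimizer means more LLM calls.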

Running the Optimization

To start the optimization process:

# Optimize the SEW workflow
optimizer.optimize(dataset=humaneval)

# Evaluate the optimized workflow
with suppress_logger_info():
    metrics = optimizer.evaluate(dataset=humaneval, eval_mode="test")
print("Evaluation metrics: ", metrics)

# Save the optimized SEW workflow
optimizer.save("debug/optimized_sew_workflow.json")
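The save path above points into a `debug` directory. Whether `optimizer.save` creates missing parent directories is not stated here, so a cautious sketch is to create the directory before saving. Only the standard library is used, and the `optimizer.save` call is left commented out so the snippet runs on its own.

```python
from pathlib import Path

save_path = Path("debug/optimized_sew_workflow.json")

# Create the parent directory if it does not exist yet (no error if it does).
save_path.parent.mkdir(parents=True, exist_ok=True)

# With the optimizer from the tutorial in scope, the save call would then be:
# optimizer.save(str(save_path))
print(save_path.parent.is_dir())
```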

For a complete working example, please refer to sew_optimizer.py.