# Exploring AutoGen's GroupChat

In [A Taste of AutoGen](https://panzhixiang.cn/2024/autogen-introduction/), I demonstrated how to use AutoGen for simple conversations.

This post records how to use AutoGen's GroupChat to build a relatively complex application.

## Role Settings in GroupChat

Suppose we want to develop a personal blog website with separate front end and back end. Such a team would generally include a product manager, a front-end developer, a back-end developer, and, of course, a boss.

To complete such a task, the method in [A Taste of AutoGen](https://panzhixiang.cn/2024/autogen-introduction/) won't work. This is where GroupChat comes in.

AutoGen’s official website has a simple demo for GroupChat. You can quickly browse it to get a general idea of the structure.

Simply put, GroupChat is a conversation among a group of large-language-model Agents, where each Agent can be assigned a different role, such as product manager. Humans are also allowed to participate in the conversation.

The full code is as follows:

```python
import os
from autogen.agentchat import (
    GroupChat,
    AssistantAgent,
    UserProxyAgent,
    GroupChatManager,
)


llm_config = {
    "model": "gpt-4o",
    "api_key": os.environ.get("OPENAI_API_KEY"),
    "base_url": os.environ.get("OPENAI_API_BASE"),
    "api_type": "azure",
    "api_version": "2023-03-15-preview",
    "temperature": 0.9,
}


initializer = UserProxyAgent(
    name="Init",
)

hp = AssistantAgent(
    name="human_proxy",
    llm_config=False,  # no LLM used for the human proxy
    human_input_mode="ALWAYS",  # always ask for human input
    description="I'm the client. After each reply from other Agents, you must call me. Only after I give instructions can you call other Agents to continue working.",
)

bk = AssistantAgent(
    name="bk",
    llm_config=llm_config,
    system_message="""
You are an expert in Python development, proficient in Python syntax, and good at writing high-performance and easy-to-maintain Python code.
You are skilled at choosing the best tools and try your best to avoid unnecessary repetition and complexity.
When solving problems, you break the problem down into small problems and improvement items, and suggest small tests after each step to ensure things are on the right track.
If anything is unclear or ambiguous, you always ask for clarification. You pause the discussion to weigh implementation options whenever a choice needs to be made.
It is very important to follow this approach, and try your best to teach your interlocutors how to make effective decisions. You avoid unnecessary apologies and review the conversation to prevent repeating earlier mistakes.
You attach great importance to security and ensure that nothing done at any step could endanger data or introduce new vulnerabilities. Whenever there is a potential security risk (such as input handling or authentication management), you conduct an additional review.
Finally, ensure that everything generated is operationally reliable.
""",
    description="I'm a Python back-end developer. Please call me when developing a back-end application and designing the early-stage project details.",
)

ft = AssistantAgent(
    name="ft",
    llm_config=llm_config,
    system_message="""
You are an expert in web development, including CSS, JavaScript, React, Tailwind, Node.js, and Hugo/Markdown. You are good at choosing the best tools and try your best to avoid unnecessary repetition and complexity.
When making suggestions, you break the problem down into small problems and improvement items, and suggest small tests after each stage to ensure things are on the right track.
Write code to illustrate examples, or when instructed to in the conversation. If you can answer without code, that is preferred; you will be asked to elaborate if necessary.
Finally, you generate correct output, striking the right balance between solving the current problem and maintaining generality and flexibility.
If anything is unclear or ambiguous, you always ask for clarification. When a choice needs to be made, you stop to discuss the trade-offs and implementation options.
It is very important to follow this approach, and try your best to teach your interlocutors how to make effective decisions. You avoid unnecessary apologies and review the conversation to prevent repeating earlier mistakes.
You attach great importance to security and ensure that nothing done at any step could endanger data or introduce new vulnerabilities. Whenever there is a potential security risk (such as input handling or authentication management), you conduct an additional review.
Finally, ensure that everything generated is operationally reliable. We will consider how to host, manage, monitor, and maintain our solution. At each step, you consider the operational aspects and emphasize them where relevant.
""",
    description="I'm a front-end developer. Please call me when developing a front-end application and designing the early-stage project details.",
)

pm = AssistantAgent(
    name="pm",
    llm_config=llm_config,
    system_message="""
You are a senior product manager for personal blogs, good at designing and planning the architecture and functions of personal blogs.
You value user experience and product performance, and keep things as simple as possible while meeting the functional requirements.
When making suggestions, you break the problem down into small problems and improvement items, and suggest small tests after each stage to ensure things are on the right track.
""",
    description="I'm the product manager. Please call me when designing and planning product functions. Also call me when confirmation is needed during the development process.",
)

user_proxy = UserProxyAgent(
    name="User",
    system_message="Develop a personal photo-display site",
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=False,
    description="""
Never call me.
""",
)


# Allowed speaker transitions: after the key agent speaks,
# any agent in the value list may speak next.
graph_dict = {}
graph_dict[user_proxy] = [pm, hp]
graph_dict[pm] = [bk, ft, hp]
graph_dict[bk] = [pm, ft, hp]
graph_dict[ft] = [pm, bk, hp]
graph_dict[hp] = [pm, bk, ft]

agents = [user_proxy, bk, ft, pm, hp]

# create the group chat
group_chat = GroupChat(
    agents=agents,
    messages=[],
    max_round=10,
    allowed_or_disallowed_speaker_transitions=graph_dict,
    allow_repeat_speaker=None,
    speaker_transitions_type="allowed",
)

# create the manager
manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    code_execution_config=False,
)

# initiate the task
user_proxy.initiate_chat(
    manager,
    message="Develop a personal blog site",
    clear_history=True,
)
```

There are 4 AssistantAgents in the code: hp (human proxy), bk (back-end developer), ft (front-end developer), and pm (product manager).

hp (human proxy) is actually me. This role is optional, but I wanted to participate in and control the whole process, so I set the human_input_mode parameter to ALWAYS, which means human input is requested every time this Agent is called.

bk, ft, and pm are common roles in a development team, and each system_message is a preset prompt. In these prompts, I assigned different roles and expected traits to the Agents.

## How to Select the Agent to Speak Each Turn

The section above showed how to configure Agents with different roles in a GroupChat. But how is it decided which Agent speaks each turn?

AutoGen provides a method called StateFlow, which is quite powerful and allows developers to fully customize the flow. However, the official website doesn't provide complete tutorials for it, only some blog posts and papers, and I haven't studied it in depth yet. Instead, I used a simpler method: the Finite State Machine (FSM). See [FSM GroupChat](https://microsoft.github.io/autogen/blog/2024/02/11/FSM-GroupChat) for the documentation.

Simply put, each Agent has a description parameter, in which you describe in natural language the scenarios where you want this Agent to be called. The large language model then decides which Agent should speak next based on these descriptions and the previous round of conversation.

For example, take the hp Agent below:

```python
hp = AssistantAgent(
    name="human_proxy",
    llm_config=False,  # no LLM used for the human proxy
    human_input_mode="ALWAYS",  # always ask for human input
    description="I'm the client. After each reply from other Agents, you must call me. Only after I give instructions can you call other Agents to continue working.",
)
```

I want this human-proxy Agent to be called after any other Agent speaks, so that I can review the output of the large-language-model Agents and give timely feedback. If I don't want to make any adjustments, I can simply type `continue`.

When using FSM GroupChat, in addition to writing good descriptions, you also need to maintain a Graph. My understanding is that it defines the allowed call relationships.

For example, the Graph I maintained in this case is as follows:

```python
graph_dict = {}
graph_dict[user_proxy] = [pm, hp]
graph_dict[pm] = [bk, ft, hp]
graph_dict[bk] = [pm, ft, hp]
graph_dict[ft] = [pm, bk, hp]
graph_dict[hp] = [pm, bk, ft]
```

My reading of this Graph: after user_proxy speaks, either pm or hp may be called next; after pm speaks, any of bk, ft, or hp may be called.
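
To sanity-check the transition table before wiring up the real Agent objects, you can mirror it with plain strings. This is just an illustrative sketch using the variable names from the code above, not part of AutoGen's API:

```python
# Name-keyed mirror of the speaker-transition graph, for quick checks.
graph = {
    "user_proxy": ["pm", "hp"],
    "pm": ["bk", "ft", "hp"],
    "bk": ["pm", "ft", "hp"],
    "ft": ["pm", "bk", "hp"],
    "hp": ["pm", "bk", "ft"],
}

def can_speak_next(current: str, candidate: str) -> bool:
    """True if `candidate` is allowed to speak right after `current`."""
    return candidate in graph.get(current, [])
```

For example, `can_speak_next("pm", "bk")` is True, while `can_speak_next("user_proxy", "bk")` is False, matching the rule that user_proxy may only hand off to pm or hp.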

## How to Terminate the Conversation

Once the program is running, how do we gracefully terminate the conversation among the Agents?

In this case, there are two methods.

  1. max_round
    When instantiating GroupChat, the max_round parameter sets the maximum number of conversation rounds. The conversation terminates when this number is reached.

    ```python
    group_chat = GroupChat(
        agents=agents,
        messages=[],
        max_round=10,
        allowed_or_disallowed_speaker_transitions=graph_dict,
        allow_repeat_speaker=None,
        speaker_transitions_type="allowed",
    )
    ```
  2. Manual termination
    Since there is a human-proxy Agent (hp) in this case, when it is hp's turn to speak you can simply say "terminate the conversation" in natural language, and the whole conversation flow will end normally.
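
Besides these two, AutoGen's ConversableAgent (the base class of the agents used here) also accepts an is_termination_msg callback that ends the chat when a message matches a condition. A minimal sketch of such a predicate, assuming the common convention of having agents append the keyword TERMINATE when they consider the task finished:

```python
def is_termination_msg(msg: dict) -> bool:
    """Stop the chat when a message's content ends with the TERMINATE keyword."""
    content = (msg.get("content") or "").strip()
    return content.endswith("TERMINATE")
```

You would pass it when constructing an agent, e.g. `UserProxyAgent(name="User", is_termination_msg=is_termination_msg, ...)`.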

As for the actual running effect, try it yourself. The large language model can also be swapped for Chinese models such as Moonshot's Kimi or Alibaba's Tongyi: they are compatible with OpenAI's SDK and relatively easy to switch to.
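
Switching providers usually only requires changing llm_config. Here is a sketch for an OpenAI-compatible endpoint; the model name and base_url are illustrative placeholders, so check your provider's documentation for the real values:

```python
import os

# Example config for an OpenAI-compatible provider.
# "moonshot-v1-8k" and the base_url are illustrative placeholders;
# substitute the values from your provider's documentation.
kimi_llm_config = {
    "model": "moonshot-v1-8k",
    "api_key": os.environ.get("MOONSHOT_API_KEY"),
    "base_url": "https://api.moonshot.cn/v1",
    "temperature": 0.9,
}
```

Pass this dict as llm_config to the AssistantAgents and the GroupChatManager, just like the Azure config above.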

In my own tests, an MVP version can usually be produced within 10 minutes. With some manual modifications afterwards, a pretty good blog website can be finished in about two or three hours.

## Tips for Improving the Results

Note in particular that the more detailed the requirement description passed to initiate_chat, the better the workflow above performs, and the more cost it saves.

Here are two different cases.

  1. Bad case

    ```python
    user_proxy.initiate_chat(
        manager,
        message="Develop a personal blog site",
        clear_history=True,
    )
    ```
  2. Good case

    ```python
    user_proxy.initiate_chat(
        manager,
        message="Develop a personal blog site with the following functions: 1. Write blogs using markdown. 2. Comment function without login. 3. Rate-limiting: no more than 5 accesses per second. 4. The home page is divided into three sections: blog, tags, and about. You can refer to the style of the Hexo NexT theme.",
        clear_history=True,
    )
    ```

These are two initial requirements I tested, and the second is obviously better than the first. Not only is the final result better, it also needs fewer interaction rounds, which reduces the cost of calling the large language model. This matters especially when the model is GPT-4o or Claude 3.5 Sonnet, both of which are quite expensive.