This guide demonstrates how to build an intelligent code research tool that can analyze and explain codebases using Codegen’s and LangChain. The tool combines semantic code search, dependency analysis, and natural language understanding to help developers quickly understand new codebases.
This example works with any public GitHub repository - just provide the repo name in the format owner/repo
Overview
The process involves three main components:
- A CLI interface for interacting with the research agent
- A set of code analysis tools powered by Codegen
- An LLM-powered agent that combines the tools to answer questions
Let’s walk through building each component.
First, let’s import the necessary components and set up our research tools:
from codegen import Codebase
from codegen.extensions.langchain.agent import create_agent_with_tools
from codegen.extensions.langchain.tools import (
ListDirectoryTool,
RevealSymbolTool,
SearchTool,
SemanticSearchTool,
ViewFileTool,
)
from langchain_core.messages import SystemMessage
We’ll create a function to initialize our codebase with a nice progress indicator:
def initialize_codebase(repo_name: str) -> Optional[Codebase]:
"""Initialize a codebase with a spinner showing progress."""
with console.status("") as status:
try:
status.update(f"[bold blue]Cloning {repo_name}...[/bold blue]")
codebase = Codebase.from_repo(repo_name)
status.update("[bold green]✓ Repository cloned successfully![/bold green]")
return codebase
except Exception as e:
console.print(f"[bold red]Error initializing codebase:[/bold red] {e}")
return None
Then we’ll set up our research tools:
# Create research tools
tools = [
ViewFileTool(codebase), # View file contents
ListDirectoryTool(codebase), # Explore directory structure
SearchTool(codebase), # Text-based search
SemanticSearchTool(codebase), # Natural language search
RevealSymbolTool(codebase), # Analyze symbol relationships
]
Each tool provides specific capabilities:
ViewFileTool
: Read and understand file contents
ListDirectoryTool
: Explore the codebase structure
SearchTool
: Find specific code patterns
SemanticSearchTool
: Search using natural language
RevealSymbolTool
: Analyze dependencies and usages
Step 2: Creating the Research Agent
Next, we’ll create an agent that can use these tools intelligently. We’ll give it a detailed prompt about its role:
RESEARCH_AGENT_PROMPT = """You are a code research expert. Your goal is to help users understand codebases by:
1. Finding relevant code through semantic and text search
2. Analyzing symbol relationships and dependencies
3. Exploring directory structures
4. Reading and explaining code
Always explain your findings in detail and provide context about how different parts of the code relate to each other.
When analyzing code, consider:
- The purpose and functionality of each component
- How different parts interact
- Key patterns and design decisions
- Potential areas for improvement
Break down complex concepts into understandable pieces and use examples when helpful."""
# Initialize the agent
agent = create_agent_with_tools(
codebase=codebase,
tools=tools,
chat_history=[SystemMessage(content=RESEARCH_AGENT_PROMPT)],
verbose=True
)
Step 3: Building the CLI Interface
Finally, we’ll create a user-friendly CLI interface using rich-click:
import rich_click as click
from rich.console import Console
from rich.markdown import Markdown
@click.group()
def cli():
"""🔍 Codegen Code Research CLI"""
pass
@cli.command()
@click.argument("repo_name", required=False)
@click.option("--query", "-q", default=None, help="Initial research query.")
def research(repo_name: Optional[str] = None, query: Optional[str] = None):
"""Start a code research session."""
# Initialize codebase
codebase = initialize_codebase(repo_name)
# Create and run the agent
agent = create_research_agent(codebase)
# Main research loop
while True:
if not query:
query = Prompt.ask("[bold cyan]Research query[/bold cyan]")
result = agent.invoke(
{"input": query},
config={"configurable": {"thread_id": 1}}
)
console.print(Markdown(result["messages"][-1].content))
query = None # Clear for next iteration
You can use the tool in several ways:
- Interactive mode (will prompt for repo):
- Specify a repository:
python run.py research "fastapi/fastapi"
- Start with an initial query:
python run.py research "fastapi/fastapi" -q "Explain the main components"
Example research queries:
- “Explain the main components and their relationships”
- “Find all usages of the FastAPI class”
- “Show me the dependency graph for the routing module”
- “What design patterns are used in this codebase?”
The agent maintains conversation history, so you can ask follow-up questions
and build on previous findings.
Advanced Usage
You can extend the agent with custom tools for specific analysis needs:
from langchain.tools import BaseTool
from pydantic import BaseModel, Field
class CustomAnalysisTool(BaseTool):
"""Custom tool for specialized code analysis."""
name = "custom_analysis"
description = "Performs specialized code analysis"
def _run(self, query: str) -> str:
# Custom analysis logic
return results
# Add to tools list
tools.append(CustomAnalysisTool())
Customizing the Agent
You can modify the agent’s behavior by adjusting its prompt:
CUSTOM_PROMPT = """You are a specialized code reviewer focused on:
1. Security best practices
2. Performance optimization
3. Code maintainability
...
"""
agent = create_agent_with_tools(
codebase=codebase,
tools=tools,
chat_history=[SystemMessage(content=CUSTOM_PROMPT)],
)