The primary entrypoint to programs leveraging Codegen is the Codebase class.
Local Codebases
Construct a Codebase by passing in a path to a local git repository or any subfolder within it. The path must be within a git repository (i.e., the directory itself or one of its parent directories must contain a .git folder).
from codegen import Codebase
# Parse from a git repository root
codebase = Codebase("path/to/repository")
# Parse from a subfolder within a git repository
codebase = Codebase("path/to/repository/src/subfolder")
# Parse from current directory (must be within a git repo)
codebase = Codebase("./")
# Specify programming language (instead of inferring from file extensions)
codebase = Codebase("./", language="typescript")
By default, Codegen will automatically infer the programming language of the codebase and
parse all files in the codebase. You can override this by passing the language parameter
with a value from the ProgrammingLanguage enum.
The initial parse may take a few minutes for large codebases. This
pre-computation enables constant-time operations afterward. Learn more
here.
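The requirement that the path sit inside a git repository can be checked before constructing a Codebase. The following is a minimal sketch using only the standard library; `find_git_root` is a hypothetical helper, not part of Codegen, and it only looks for a `.git` entry in the given directory and its ancestors.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains .git, else None."""
    start = Path(path).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None
```

Under these assumptions, `find_git_root("path/to/repository/src/subfolder")` returns the repository root when one exists and `None` otherwise, which signals that the path does not meet the requirement described above.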
Remote Repositories
To fetch and parse a repository directly from GitHub, use the Codebase.from_repo class method.
from codegen import Codebase
# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
codebase = Codebase.from_repo('fastapi/fastapi')
# Customize temp directory, clone depth, specific commit, or programming language
codebase = Codebase.from_repo(
    'fastapi/fastapi',
    tmp_dir='/custom/temp/dir',  # Optional: custom temp directory
    commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',  # Optional: specific commit
    shallow=False,  # Optional: full clone instead of shallow
    language="python"  # Optional: override language detection
)
Remote repositories are cloned to the /tmp/codegen/{repo_name} directory by
default. The clone is shallow by default for better performance.
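To make the default location concrete, the sketch below computes where a repository would land under the documented `/tmp/codegen/{repo_name}` pattern. `default_clone_dir` is a hypothetical helper, and it assumes that `repo_name` is the part after the slash in `owner/name` and that a custom `tmp_dir` acts as the parent directory; Codegen's own logic may differ.

```python
from pathlib import PurePosixPath


def default_clone_dir(repo_full_name: str, tmp_dir: str = "/tmp/codegen") -> str:
    """Return the assumed clone directory for an 'owner/name' repository string."""
    # Assumption: repo_name is the final segment of 'owner/name'.
    repo_name = repo_full_name.rstrip("/").split("/")[-1]
    return str(PurePosixPath(tmp_dir) / repo_name)


print(default_clone_dir("fastapi/fastapi"))  # /tmp/codegen/fastapi
```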
Configuration Options
You can customize the behavior of your Codebase instance by passing a CodebaseConfig object. This allows you to configure secrets (like API keys) and toggle specific features:
from codegen import Codebase
from codegen.configs.models.codebase import CodebaseConfig
from codegen.configs.models.secrets import SecretsConfig
codebase = Codebase(
    "path/to/repository",
    config=CodebaseConfig(debug=True),
    secrets=SecretsConfig(openai_api_key="your-openai-key")  # For AI-powered features
)
CodebaseConfig and SecretsConfig allow you to configure:
- config: Toggle specific features like language engines, dependency management, and graph synchronization
- secrets: API keys and other sensitive information needed by the codebase
For a complete list of available feature flags and configuration options, see the source code on GitHub.
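The key passed to SecretsConfig does not have to be written into the source file. Below is a minimal sketch of reading it from the environment instead; the variable name `OPENAI_API_KEY` and the helper `read_openai_key` are assumptions made for illustration, not something Codegen requires.

```python
import os


def read_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable, or raise if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Intended use with the classes shown earlier (not executed here):
# secrets = SecretsConfig(openai_api_key=read_openai_key())
```

Failing early with a clear error keeps a missing key from surfacing later as a less obvious failure in an AI-powered feature.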
Advanced Initialization
For more complex scenarios, Codegen supports an advanced initialization mode using ProjectConfig. This allows for fine-grained control over:
- Repository configuration
- Base path and subdirectory filtering
- Multiple project configurations
Here’s an example:
from codegen import Codebase
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
from codegen.git.schemas.repo_config import BaseRepoConfig
from codegen.sdk.codebase.config import ProjectConfig
codebase = Codebase(
    projects=[
        ProjectConfig(
            repo_operator=LocalRepoOperator(
                repo_path="/tmp/codegen-sdk",
                repo_config=BaseRepoConfig(),
                bot_commit=True
            ),
            language="typescript",
            base_path="src/codegen/sdk/typescript",
            subdirectories=["src/codegen/sdk/typescript"]
        )
    ]
)
For more details on advanced configuration options, see the source code on GitHub.
Supported Languages
Codegen currently supports: