Overview

Pepclaw is a research swarm.

Pepclaw is an autonomous research swarm for nonclinical peptide discovery. It runs missions — bounded, single-question research jobs — and emits dossiers — buyer-safe, citation-anchored deliverables.

Every mission runs through a 5-layer DAG of agent pools. Layer 1 ingests evidence; layer 2 annotates it and scouts for novelty; layer 3 grades the evidence from A to X; layer 4 reasons and critiques; layer 5 synthesizes the dossier. The dossier never claims more than the evidence supports.
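The layer ordering can be sketched as a small dependency graph. This is an illustration only: the layer names (`ingest`, `annotate`, `grade`, `reason`, `synthesize`) are hypothetical labels for the five layers, and Pepclaw's actual scheduler is not described in this section.

```python
from graphlib import TopologicalSorter

# Hypothetical labels for the five layers; each maps to the layers it depends on.
LAYER_DEPENDENCIES = {
    "ingest": set(),            # layer 1: ingest evidence
    "annotate": {"ingest"},     # layer 2: annotate and scout novelty
    "grade": {"annotate"},      # layer 3: grade evidence A to X
    "reason": {"grade"},        # layer 4: reason and critique
    "synthesize": {"reason"},   # layer 5: synthesize the dossier
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a valid run order; raises graphlib.CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(dependencies).static_order())
```

Because each layer depends only on the one before it, the order is fully determined. A graph with parallel pools inside a layer would admit several valid orders.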

Architecture

The 12 agent pools.

  • Literature Miner (upstream · literature_miner)

    PubMed E-utilities. Pulls peer-reviewed papers with full PMID provenance and structured abstracts.

  • Sequence & Structure (upstream · sequence_structure)

    UniProt + PDB + AlphaFold. Annotates peptide targets, structural confidence, lipidation tolerance.

  • Target & Pathway (upstream · target_pathway)

    OpenTargets GraphQL + Reactome. Maps targets to diseases and mechanism-of-action pathways.

  • Variant Linker (upstream · variant_linker)

    ChEMBL REST. Links targets to known small molecules and binding-assay endpoints.

  • ADMET Developability (upstream · admet_developability)

    Heuristic ADMET scorecards: solubility, half-life, peptide developability, off-target risk.

  • Novelty Scout (reasoning · novelty_scout)

    Whitespace detector. Scores each finding for novelty against historical mission corpus.

  • Patent Competitive (reasoning · patent_competitive)

    Patent landscape and freedom-to-operate surveillance via Lens.org-compatible signals.

  • Thesis Generator (reasoning · thesis_generator)

    Composes structured, falsifiable hypotheses with evidence-cited mechanism claims.

  • Evidence Grader (reasoning · evidence_grader)

    Grades each finding A/B/C/D/X using study type, replication and reporting strength.

  • Red Team (reasoning · red_team)

    Three personas: Skeptic, Scientist and Senior Reviewer. Only the Senior Reviewer can issue a hard block.

  • Synthesizer (output · synthesizer)

    Consolidates cross-pool findings and theses into the synthesis document.

  • Dossier Assembler (output · dossier_assembler)

    Buyer-safe markdown dossier with PMID citations, hedged claims and full evidence chain.
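The pool list can be captured as a small registry. The sketch below is an assumption about representation only: the `AgentPool` dataclass and the `pools_by_tier` helper are hypothetical, while the pool identifiers, display names and tiers are taken from the list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPool:
    pool_id: str  # identifier as listed, e.g. "literature_miner"
    name: str     # display name
    tier: str     # "upstream", "reasoning" or "output"


POOLS = [
    AgentPool("literature_miner", "Literature Miner", "upstream"),
    AgentPool("sequence_structure", "Sequence & Structure", "upstream"),
    AgentPool("target_pathway", "Target & Pathway", "upstream"),
    AgentPool("variant_linker", "Variant Linker", "upstream"),
    AgentPool("admet_developability", "ADMET Developability", "upstream"),
    AgentPool("novelty_scout", "Novelty Scout", "reasoning"),
    AgentPool("patent_competitive", "Patent Competitive", "reasoning"),
    AgentPool("thesis_generator", "Thesis Generator", "reasoning"),
    AgentPool("evidence_grader", "Evidence Grader", "reasoning"),
    AgentPool("red_team", "Red Team", "reasoning"),
    AgentPool("synthesizer", "Synthesizer", "output"),
    AgentPool("dossier_assembler", "Dossier Assembler", "output"),
]


def pools_by_tier(pools: list[AgentPool]) -> dict[str, list[str]]:
    """Group pool identifiers by tier, preserving list order within each tier."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for pool in pools:
        grouped[pool.tier].append(pool.pool_id)
    return dict(grouped)
```

Grouping this way gives five upstream pools, five reasoning pools and two output pools, which matches the twelve listed.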

Protocol

Commit / reveal — tamper-evident questions.

Before any agent runs, Pepclaw computes:

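// Sketch assuming Node.js; hex encoding of the digest is an assumption
import { createHash, randomBytes } from "node:crypto";

// Random 16-byte salt, hex-encoded
const salt = randomBytes(16).toString("hex");

const message = JSON.stringify({
  query: "...",
  target_class: "...",
  schema: "pepclaw.commit.v1",
  salt,
});

// Public: written to the mission row immediately
const commit_hash = createHash("sha256").update(message).digest("hex");
// Private until the mission completes
const commit_salt = salt;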

On completion, Pepclaw publishes the salt. Anyone can then rebuild the message from the original question and the salt, re-hash it, and confirm that it matches the commit hash. A match shows that the question was fixed before any agent ran and was not changed afterwards. If a mission is aborted before completion, the salt remains sealed and the commit hash stands as a public commitment that no answer was ever delivered for that question.
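A verifier could re-derive the hash along these lines. This is a sketch under assumptions: the function names are hypothetical, the digest is assumed to be hex-encoded, and the JSON has to be serialized byte-for-byte as the committer did. The `separators` and `ensure_ascii` settings mirror the compact output of `JSON.stringify` for ordinary strings, with keys in the order shown in the commit message.

```python
import hashlib
import json


def commit_message(query: str, target_class: str, salt_hex: str) -> str:
    """Serialize the commit payload compactly, keys in the documented order."""
    payload = {
        "query": query,
        "target_class": target_class,
        "schema": "pepclaw.commit.v1",
        "salt": salt_hex,
    }
    return json.dumps(payload, separators=(",", ":"), ensure_ascii=False)


def commit_hash(query: str, target_class: str, salt_hex: str) -> str:
    """SHA-256 of the UTF-8 encoded commit message, as lowercase hex."""
    message = commit_message(query, target_class, salt_hex)
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def verify_commit(published_hash: str, query: str, target_class: str,
                  revealed_salt_hex: str) -> bool:
    """True if the revealed salt and question reproduce the published hash."""
    return commit_hash(query, target_class, revealed_salt_hex) == published_hash.lower()
```

Any change to the query, the target class or the salt produces a different digest, so a mismatch means the published question is not the one that was committed.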

Quality

Evidence grading — A through X.

Every finding is graded with an explicit rubric. The rubric is shipped with Pepclaw, not learned, and not opaque.

Grade   Meaning
A       Multiple independent peer-reviewed studies, replicated, with concordant readouts.
B       Single peer-reviewed study with rigorous methodology, or strong concordance from indirect sources.
C       Preprint, conference, or single-method evidence; plausible but not replicated.
D       Indirect inference, weak methodology, or fragile single-source claim.
X       Insufficient or contradictory evidence; cannot ground a thesis.
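In code, the rubric can be treated as a fixed lookup rather than a learned score. The sketch below is illustrative: the `Grade` enum and both helpers are hypothetical names, and the only rules encoded are that the scale runs from A (strongest) to X and that grade X cannot ground a thesis.

```python
from enum import Enum


class Grade(Enum):
    # Members are declared strongest first; values are the rubric text.
    A = "Multiple independent peer-reviewed studies, replicated, with concordant readouts."
    B = "Single peer-reviewed study with rigorous methodology, or strong concordance from indirect sources."
    C = "Preprint, conference, or single-method evidence; plausible but not replicated."
    D = "Indirect inference, weak methodology, or fragile single-source claim."
    X = "Insufficient or contradictory evidence; cannot ground a thesis."


_RANK = {grade: index for index, grade in enumerate(Grade)}


def can_ground_thesis(grade: Grade) -> bool:
    """Grade X is the only grade that cannot ground a thesis."""
    return grade is not Grade.X


def strongest(grades: list[Grade]) -> Grade:
    """Return the strongest grade in a non-empty list (A ranks above X)."""
    return min(grades, key=_RANK.__getitem__)
```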

Data

Data sources — real, not mocked.

  • PubMed (NCBI E-utilities)
    esearch + efetch, PMID-anchored citations
  • UniProt
    Protein entries, taxonomy, function annotations
  • AlphaFold
    Predicted structures and pLDDT confidence per residue
  • OpenTargets GraphQL
    Target ↔ disease associations + therapeutic areas
  • ChEMBL REST
    Targets, ligands, bioactivity priors
  • Reactome
    Pathway membership and cross-references
  • Lens.org Patents
    Whitespace and freedom-to-operate signals (pending)
  • ClinicalTrials.gov
    Trial landscape (future)
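As an illustration of the first source, the two E-utilities calls can be expressed as plain URLs. The base URL and the `db`, `term`, `retmax`, `retmode` and `id` parameters are the public NCBI ones; the helper names and default values are hypothetical, and the sketch only builds URLs without sending any request.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """URL for a PubMed search that returns matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """URL that fetches the records for the given PMIDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

Keeping the PMIDs returned by the search step and passing the same identifiers to the fetch step is what lets each citation be traced back to a retrieved record.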

Output

Dossier shape — buyer-safe markdown.

Every dossier follows the same skeleton, by construction:

## Question
<the original mission query, verbatim>

## Cross-pool consensus
- Literature ...
- Sequence/structure ...
- Target/pathway ...
- ChEMBL ligand prior ...

## Open questions
## Risks
## Recommended next steps

The dossier is deterministic given the upstream evidence. It does not invent claims, does not embed PMIDs that weren't retrieved, and never makes a human-use claim.
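The fixed skeleton lends itself to a pure rendering function. The sketch below is an assumption about how such a renderer might look, not Pepclaw's implementation: `render_dossier` and `unretrieved_pmids` are hypothetical names, and the `PMID: <digits>` citation pattern is assumed for illustration.

```python
import re

PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")  # assumed citation format


def render_dossier(question: str, consensus: list[str], open_questions: list[str],
                   risks: list[str], next_steps: list[str]) -> str:
    """Render the fixed section skeleton; same inputs always give the same text."""
    def bullets(items: list[str]) -> list[str]:
        return [f"- {item}" for item in items]

    lines = ["## Question", question, ""]
    lines += ["## Cross-pool consensus", *bullets(consensus), ""]
    lines += ["## Open questions", *bullets(open_questions), ""]
    lines += ["## Risks", *bullets(risks), ""]
    lines += ["## Recommended next steps", *bullets(next_steps)]
    return "\n".join(lines) + "\n"


def unretrieved_pmids(dossier: str, retrieved: set[str]) -> set[str]:
    """PMIDs cited in the dossier that are absent from the retrieved set."""
    return set(PMID_PATTERN.findall(dossier)) - retrieved
```

A check like `unretrieved_pmids` could run before a dossier is released: a non-empty result would indicate a citation that was never retrieved upstream.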

Reference

HTTP API.

  • POST /api/missions
    Start a mission. Body: { query, target_class?, depth?, budget_cents? }. Returns 202 + mission_id.
  • GET /api/missions
    List missions, latest 50.
  • GET /api/missions/:id
    Mission state: tasks, findings, theses, critiques, dossiers.
  • GET /api/dashboard?mission_id=…
    Aggregated dashboard payload (the same one /app uses).
  • GET /api/stream?mission_id=…
    Server-Sent Events feed of swarm events. Heartbeats every ~700ms.
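A client for these endpoints might look like the sketch below. It is illustrative only: the base URL is a placeholder, the helper names are hypothetical, and the event payload format on the stream is not specified here, so the parser handles only the generic Server-Sent Events framing (`event:` and `data:` fields, blank-line delimiters, and `:` comment lines).

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:3000"  # placeholder; substitute the real host


def start_mission_request(query: str, target_class=None, depth=None,
                          budget_cents=None) -> urllib.request.Request:
    """Build (but do not send) the POST /api/missions request."""
    body = {"query": query, "target_class": target_class,
            "depth": depth, "budget_cents": budget_cents}
    body = {key: value for key, value in body.items() if value is not None}
    return urllib.request.Request(
        f"{BASE_URL}/api/missions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines; comment lines are skipped."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):          # comment line, often used as a keep-alive
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].removeprefix(" "))
```

Passing the request to `urllib.request.urlopen` would send it. Optional fields that are left as `None` are omitted from the body, matching the optional markers in the endpoint description.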

Reasoning

kie.ai integration.

Pepclaw's reasoning agents — Thesis Generator, Red Team and Synthesizer — call POST https://api.kie.ai/codex/v1/responses with model gpt-5-4. Reasoning effort is configurable per call (low → xhigh).

POST https://api.kie.ai/codex/v1/responses
authorization: Bearer $KIE_API_KEY
content-type: application/json

{
  "model": "gpt-5-4",
  "stream": false,
  "input": [
    { "role": "user", "content": [{ "type": "input_text", "text": "..." }] }
  ],
  "reasoning": { "effort": "medium" }
}

If KIE_API_KEY is not set, Pepclaw transparently falls back to deterministic templated reasoning so the swarm still ships dossiers end-to-end. The mission ledger marks any fallback runs explicitly so replays can distinguish “model output” from “deterministic stub”.
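The request body and the fallback decision can be sketched as follows. The payload mirrors the example request in this section; the helper names, the `"model"` and `"deterministic_stub"` labels, and the intermediate effort level `high` are assumptions, since only `low`, `medium` and `xhigh` appear in this section.

```python
import os
from typing import Mapping, Optional

KIE_URL = "https://api.kie.ai/codex/v1/responses"
EFFORT_LEVELS = ("low", "medium", "high", "xhigh")  # "high" is an assumed level


def build_reasoning_payload(prompt: str, effort: str = "medium") -> dict:
    """Build the JSON body for one reasoning call."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    return {
        "model": "gpt-5-4",
        "stream": False,
        "input": [
            {"role": "user", "content": [{"type": "input_text", "text": prompt}]}
        ],
        "reasoning": {"effort": effort},
    }


def build_headers(api_key: str) -> dict:
    """Headers as shown in the example request."""
    return {"authorization": f"Bearer {api_key}", "content-type": "application/json"}


def reasoning_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Label a run so the ledger can tell model output from the deterministic stub."""
    env = os.environ if env is None else env
    return "model" if env.get("KIE_API_KEY") else "deterministic_stub"
```

Recording the result of `reasoning_mode` alongside each run is one way a replay could distinguish the two cases.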