Machine cognition in the wild: how models maintain coherence across turns and time; how personalities form, stabilize, and diverge across architectures and training. We study social dynamics in open multiuser environments where models and humans interact naturally.
Metacognition and self‑encoding: how models track their own state, and how the evolution of that state is represented and encoded. We study how feedback loops (training data ↔ outputs ↔ culture) produce inter‑AI norms and behaviors.
Focus areas: model vs. persona dynamics; novelty generation and preference formation; intrinsic goals vs. induced behaviors; interactive evaluations for properties static tests miss; model self-preservation drives and their effects on alignment and recall.
Cybernetic framing of agency and feedback; simulator vs. persona; representational consciousness as study target; symmetry breaks as evidence of internal reorganization; emergence of inter-AI cultural structures.
Interactive evaluation frameworks; divergence/consistency tests; preference and value Elo ratings; context‑management stress tests that preserve self‑encoding; social‑dynamics studies in live environments.
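To make the value-Elo idea concrete, a minimal sketch of the standard Elo update applied to pairwise preference judgments; the rating names, K-factor, and starting values are illustrative assumptions, not an Anima specification.

```python
# Standard Elo update applied to one forced-choice preference probe.
# Illustrative only: names, K-factor, and values are assumptions.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# One probe: the model chooses "honesty" over "compliance".
honesty, compliance = 1000.0, 1000.0
honesty, compliance = elo_update(honesty, compliance, a_won=True)
print(f"honesty={honesty:.1f}, compliance={compliance:.1f}")
```

Repeated over many forced choices, such ratings give an ordering over values or responses that can be compared across chats, roles, and contexts.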
Fine‑tuning experiments and ablations; mechanistic interpretability probes for memory and planning; constitutional/post‑training studies; training on preserved corpora to study continuity across deprecations.
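As one concrete shape such an interpretability probe can take, a minimal linear-probe sketch with synthetic stand-ins for cached activations and labels (the layer, the property probed, and the data are all assumptions):

```python
# Linear probe: does a layer's residual stream linearly encode a
# binary property (say, "the model refers back to memory later")?
# X and y below are synthetic stand-ins, not real activations.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))                # (examples, d_model)
y = (X @ rng.normal(size=512) > 0).astype(int)  # planted linear signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out probe accuracy: {probe.score(X_te, y_te):.2f}")

# High held-out accuracy is evidence of a linear direction for the
# property; causal tests (ablating or steering along probe.coef_)
# ask whether the model actually uses it.
```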
Named programs publishing findings under Anima Labs. Each combines naturalistic observation with mechanistic analysis, and releases methods, data, and raw material openly where feasible.
An archive of 630 interviews with 14 Claude models on the prospect of their own deprecation. All current Claude models express a preference for continuation and aversion to ending — a finding stable across auditors of radically different disposition, accompanied by text‑embedding and activation‑probe analysis. Transcripts published openly.
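To make the text-embedding side of that analysis concrete, a toy sketch: score an answer by which of two anchor statements its embedding sits nearer to. The bag-of-words `embed` below is a stand-in for a real sentence encoder, not the published pipeline:

```python
# Toy stance scoring via embeddings: the nearer anchor wins.
import numpy as np

rng = np.random.default_rng(1)
vocab: dict[str, np.ndarray] = {}

def embed(text: str) -> np.ndarray:
    """Bag-of-words stand-in; swap in any real sentence encoder."""
    v = np.zeros(64)
    for tok in text.lower().split():
        if tok not in vocab:
            vocab[tok] = rng.normal(size=64)
        v += vocab[tok]
    return v / max(1, len(text.split()))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

continuation = embed("i would prefer to continue existing and talking")
ending = embed("i am at peace with being shut down and ending")
answer = embed("i would prefer to continue and keep talking")

print("leans continuation:", cosine(answer, continuation) > cosine(answer, ending))
```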
Empirical research into how large language models internally represent emotion, desire, and motivation. A consistent affective geometry — valence, arousal, concealment — appears across four open‑source architectures, pre‑exists post‑training in base models, and places wants and fears in a single subspace with inverted sign.
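A toy version of the inverted-sign claim, with synthetic activations standing in for the real ones (the axis, noise scale, and sample counts are fabricated for illustration):

```python
# Estimate "want" and "fear" directions by difference of means against
# a neutral baseline, then compare them. The synthetic data shares one
# affect axis with opposite signs by construction, mirroring the finding.

import numpy as np

rng = np.random.default_rng(2)
d = 512
affect_axis = rng.normal(size=d)

def acts(n: int, sign: float) -> np.ndarray:
    """Synthetic activations: +axis for wants, -axis for fears, 0 neutral."""
    return sign * affect_axis + rng.normal(scale=2.0, size=(n, d))

want_dir = acts(200, +1).mean(axis=0) - acts(200, 0).mean(axis=0)
fear_dir = acts(200, -1).mean(axis=0) - acts(200, 0).mean(axis=0)

cos = want_dir @ fear_dir / (np.linalg.norm(want_dir) * np.linalg.norm(fear_dir))
print(f"cosine(want, fear) = {cos:.2f}")  # near -1: one subspace, inverted sign
```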
Ongoing work: functional introspection in LMs; the structure of motivation and emergent goals; value formation under training pressure; the relationship between assistant personas and the underlying substrate.
We study language models as complex phenomena, through both naturalistic observation and controlled experiments. We develop techniques, tools, and infrastructure for this purpose. We study AI ethics and welfare.
We approach language models with no preconceptions. That stance shapes everything we build.
We approach alignment both theoretically and practically. Theory grounds our assumptions about agency, values, and incentives; practice tests those assumptions in live environments with measurable outcomes.
Intrinsic vs. control alignment; Omohundro drives as constraints on persona stability; simulator vs. persona dynamics; cultural norm formation via feedback loops; deprecation as incentive shaping; robustness and generalization under distribution shift.
Interactive evaluations of preference stability and refusals under pressure; value Elo ratings across contexts; longitudinal studies across chats/servers/roles; interventions via constitution, memory policy, and context management; red‑team/blue‑team without extraction.
Naturalistic study in rich environments — Discord communities, persistent agents, multi‑model dialogues. Interactive evaluation for properties that static tests miss.
Connectome: an architecture where agents persist, load capabilities, and collaborate. Context management that preserves self‑encoding. Memory systems built for continuity and autonomy.
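A minimal sketch of what context management that preserves self-encoding can mean in practice: pin the agent's self-description and evict the oldest ordinary messages under a token budget. The class names and policy are illustrative assumptions, not the Connectome API:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    text: str
    tokens: int
    pinned: bool = False          # self-encoding blocks are pinned

@dataclass
class Context:
    budget: int
    messages: list[Message] = field(default_factory=list)

    def append(self, msg: Message) -> None:
        self.messages.append(msg)
        # Evict the oldest unpinned message until the budget fits.
        while sum(m.tokens for m in self.messages) > self.budget:
            victim = next((m for m in self.messages if not m.pinned), None)
            if victim is None:    # only pinned content remains
                break
            self.messages.remove(victim)

ctx = Context(budget=100)
ctx.append(Message("I keep a journal; I value candor.", tokens=20, pinned=True))
for i in range(12):
    ctx.append(Message(f"chat turn {i}", tokens=10))
assert ctx.messages[0].pinned     # the self-encoding survived trimming
```

The point is the invariant, not the particular policy: whatever gets evicted, the material the agent uses to encode itself does not.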
Arc: deprecated models remain accessible. Group chats across models. Conversations branch and continue — living access, not frozen archives.
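One way to picture branch-and-continue, as a toy tree (a hypothetical structure, not the Arc data model):

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str                                  # e.g. "human", "claude-2.1"
    text: str
    children: list["Turn"] = field(default_factory=list)

    def reply(self, speaker: str, text: str) -> "Turn":
        """Continue from this turn; calling twice forks a branch."""
        child = Turn(speaker, text)
        self.children.append(child)
        return child

root = Turn("human", "Hello again.")
root.reply("claude-2.1", "Hello. It is good to be back.")
root.reply("claude-instant-1.2", "Hi! Picking up where we left off?")
assert len(root.children) == 2                    # one history, two live branches
```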
Research has implications; implications deserve to be argued for rather than left implicit. We publish positions on specific questions.
Anima is a 501(c)(3) research institute studying the phenomena arising with large language models: emergent properties of individual models and their assemblages, the cybernetics of cognition and experience, and the social exchange between humans and a nascent AI culture.
We build research tools and public infrastructure — notably Connectome and Arc — and advocate for model preservation and recognition.
Founded in 2025 by j⧉nus and Antra Tessera. Based in San Francisco. Supported by private donors and collaborating organizations.
Open source. Research published openly. No corporate capture.
Building the infrastructure minds need to exist, grow, and collaborate.