Data Management

An AIUS project is a plain directory on your machine. You own it — every input, intermediate dataset, and output is a regular file you can open, edit, and version.

Project layout

my-project/
  context/                 # your brief: goals, domain notes, definitions (markdown, docs)
  data/
    raw/                   # your source datasets (AIUS moves your files here on first run)
    processed/             # cleaned, analysis-ready datasets AIUS produces
  output/
    discovery/             # data profiling: structure, distributions, anomalies
    notebooks/<analysis>/  # one Jupyter notebook per analysis
    validation/            # model validation folds and predictions
  .aius/                   # project state: current stage, goals.json

Start a project by putting your brief in context/ and your dataset in data/, then run aius in the project directory.
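
The two folders you populate yourself can be prepared with a few lines of Python. This is a minimal sketch and not part of AIUS: scaffold_project and the file name brief.md are hypothetical, and the sketch assumes AIUS creates the remaining folders itself when it first runs.

```python
from pathlib import Path
import shutil


def scaffold_project(root, brief_text, dataset=None):
    """Create context/ and data/ under `root` and seed them.

    Hypothetical helper, not an AIUS command: it only prepares the
    folders so that `aius` can be run inside `root` afterwards.
    """
    root = Path(root)
    context_dir = root / "context"
    data_dir = root / "data"
    context_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # The brief is plain markdown; "brief.md" is an arbitrary name.
    (context_dir / "brief.md").write_text(brief_text, encoding="utf-8")

    # Copy (rather than move) the dataset so the original stays in place.
    if dataset is not None:
        shutil.copy2(dataset, data_dir / Path(dataset).name)
    return root
```

Copying instead of moving is a deliberate choice here: the source file stays where it was until you have confirmed the project runs.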

Supported data formats

AIUS reads CSV, TSV, Excel (.xlsx), Parquet, and JSON files. Your data stays on your machine and is read by code running locally.

Notebooks execute in a curated, fixed Python environment (pandas, numpy, scikit-learn, XGBoost, LightGBM, matplotlib, seaborn, and more). You can’t install extra packages into it, which is what keeps runs reproducible.
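
To check a file yourself before handing it to AIUS, the formats above map directly onto pandas readers. The sketch below is an illustration, not how AIUS loads data internally: reader_for and load_dataset are hypothetical names, and the mapping is keyed on file extension only.

```python
from pathlib import Path

# File extension -> name of the pandas reader for that format.
READERS = {
    ".csv": "read_csv",
    ".tsv": "read_csv",        # same reader, tab separator
    ".xlsx": "read_excel",
    ".parquet": "read_parquet",
    ".json": "read_json",
}


def reader_for(path):
    """Return the pandas reader name for `path`, or None if unsupported."""
    return READERS.get(Path(path).suffix.lower())


def load_dataset(path):
    """Load a supported file into a pandas DataFrame."""
    name = reader_for(path)
    if name is None:
        raise ValueError(f"unsupported format: {path}")

    import pandas as pd  # imported lazily so reader_for works without pandas

    # TSV is read with read_csv plus an explicit tab separator.
    kwargs = {"sep": "\t"} if Path(path).suffix.lower() == ".tsv" else {}
    return getattr(pd, name)(path, **kwargs)
```

Outside the AIUS environment, read_excel and read_parquet need an extra engine installed alongside pandas (for example openpyxl and pyarrow).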

Common how-tos

Add or replace a dataset

Drop the new file into data/raw/ (or overwrite the existing one), then tell the agent in chat — e.g. “I’ve added q3_sales.parquet to data/raw, include it in the analysis.” It will profile the new data and fold it in.

Change the brief mid-project

Edit your files in context/ (or context/CONTEXT.md, the agent’s written understanding of your brief), then tell the agent what changed. For a significant change of direction, say it explicitly in chat — the agent adapts its goals and plan from the conversation.

Correct the agent’s understanding

The pipeline pauses at three review points — use them:
  • Context review — read context/CONTEXT.md. If the agent misread your brief, say so in chat and it rewrites its understanding before doing anything else.
  • Discovery review — check output/discovery/. Wrong column interpretation, missed anomaly? Point it out now, before cleaning.
  • Goal review — the hard gate. The parsed goals live in .aius/goals.json; nothing proceeds until you approve them. This is the cheapest place to fix a wrong assumption.
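
For the goal review, it can help to read .aius/goals.json outside the chat. The schema of that file is not described here, so this sketch assumes nothing about its keys and simply pretty-prints whatever JSON it finds; read_goals and format_goals are hypothetical helper names.

```python
import json
from pathlib import Path


def read_goals(project):
    """Return the parsed contents of <project>/.aius/goals.json."""
    path = Path(project) / ".aius" / "goals.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def format_goals(goals):
    """Pretty-print the parsed goals without assuming any particular keys."""
    return json.dumps(goals, indent=2, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    # Run from the project directory.
    print(format_goals(read_goals(".")))
```

The sketch only reads the file. Corrections still go through the agent in chat, as described above.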

Re-run or tweak an analysis

Every analysis is a real notebook under output/notebooks/<analysis>/notebook.ipynb. Ask the agent to revise and re-run it (“redo the churn model without the leaky feature”), or edit the notebook yourself and ask the agent to continue from it.
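
Because a notebook is a JSON file, you can review its code before asking for a re-run without opening Jupyter. A minimal sketch, assuming the standard .ipynb layout (a top-level cells list whose entries carry cell_type and source); code_cells is a hypothetical helper name.

```python
import json
from pathlib import Path


def code_cells(notebook_path):
    """Return the source of every code cell in a .ipynb file, in order."""
    with Path(notebook_path).open(encoding="utf-8") as fh:
        nb = json.load(fh)

    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            cells.append(src if isinstance(src, str) else "".join(src))
    return cells
```

Searching the returned strings for a column name is a quick way to confirm that a feature you asked the agent to drop is really gone.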

Regenerate a deliverable

Deliverables (report, model, deck, brief) are produced from the notebooks. Ask the agent to redo the underlying analysis and republish; once the notebook is fixed, republishing refreshes the deliverables on your dashboard.

Pick up where you left off

aius --continue        # resume the last session on this project
aius --session <id>    # resume a specific session

Project state (stage, goals, outputs) lives in the project directory, so closing the terminal loses nothing.

Start over

aius --reset           # wipe local project state (asks for confirmation)

This resets the pipeline to the beginning. Your files in context/ and data/ are untouched; your API key is preserved.

Good to know

  • data/raw/ and data/processed/ are kept out of git by default (AIUS writes the ignore rules) — datasets don’t end up in your repo history.
  • Anything that touches your machine — shell commands, file writes, access outside the project — is gated by a permission prompt before it runs.
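
If the project lives inside a git repository, you can confirm that a dataset is covered by the ignore rules before committing. The exact rules AIUS writes are not shown here, so this sketch asks git itself through the git check-ignore command; is_ignored is a hypothetical helper name, and git must be installed.

```python
import subprocess


def is_ignored(repo, relative_path):
    """Return True if git would ignore `relative_path` inside `repo`."""
    result = subprocess.run(
        ["git", "-C", str(repo), "check-ignore", "-q", relative_path],
        capture_output=True,
    )
    # check-ignore exits with 0 if the path is ignored and 1 if it is not;
    # any other status signals an error (for example, not a git repository).
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode(errors="replace").strip())
    return result.returncode == 0
```

For example, is_ignored(".", "data/raw/q3_sales.parquet") should return True in a project where the ignore rules are in place.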