Data Management

An AIUS project is a plain directory on your machine. You own it — every input, intermediate dataset, and output is a regular file you can open, edit, and version.

Project layout

my-project/
  context/                 # your brief: goals, domain notes, definitions (markdown, docs)
  data/
    raw/                   # your source datasets (AIUS moves your files here on first run)
    processed/             # cleaned, analysis-ready datasets AIUS produces
  output/
    discovery/             # data profiling: structure, distributions, anomalies
    notebooks/<analysis>/  # one Jupyter notebook per analysis
    validation/            # model validation folds and predictions
  .aius/                   # project state: current stage, goals.json

Start a project by putting your brief in context/ and your dataset in data/, then run aius from the project root (my-project/ in the layout above). On first run, AIUS moves your dataset files into data/raw/.
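As a rough illustration of that starting state, the sketch below creates the two input folders and copies a brief and a dataset into them. The helper name scaffold_project and the example file names are hypothetical; AIUS does not require this script, and creating the folders by hand works just as well.

```python
import shutil
from pathlib import Path


def scaffold_project(root, brief, dataset):
    """Create context/ and data/ under root and copy the inputs into them.

    Hypothetical helper for illustration only; it is not part of AIUS.
    """
    root = Path(root)
    context_dir = root / "context"
    data_dir = root / "data"
    context_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Copy rather than move, so the original files stay where they were.
    shutil.copy2(brief, context_dir / Path(brief).name)
    shutil.copy2(dataset, data_dir / Path(dataset).name)
    return root
```

For example, scaffold_project("my-project", "brief.md", "sales.csv") leaves a directory you can enter and run aius in.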

Supported data formats

AIUS reads CSV, TSV, Excel (.xlsx), Parquet, and JSON. Your data stays on your machine and is read by code running locally.

Notebooks execute in a curated, fixed Python environment (pandas, numpy, scikit-learn, XGBoost, LightGBM, matplotlib, seaborn, and more). You can’t install extra packages into it, which is what keeps runs reproducible.
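If you want to check a folder before handing it to AIUS, a small pre-flight script can flag files outside that list. This is a sketch under assumptions: the extension spellings (.csv, .tsv, .parquet, .json) are inferred from the format names above, and split_by_support is a hypothetical helper, not an AIUS command.

```python
from pathlib import Path

# Assumed extensions for the formats listed above; only .xlsx is spelled out in the docs.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".xlsx", ".parquet", ".json"}


def split_by_support(data_dir):
    """Return (supported, unsupported) lists of files found under data_dir."""
    supported, unsupported = [], []
    for path in sorted(Path(data_dir).rglob("*")):
        # Skip directories and hidden files such as .gitkeep.
        if not path.is_file() or path.name.startswith("."):
            continue
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

Running split_by_support("data") and inspecting the second list shows which files would likely need converting first.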

Common how-tos

Add or replace a dataset

Drop the new file into data/raw/ (or overwrite the existing one), then tell the agent in chat — e.g. “I’ve added q3_sales.parquet to data/raw, include it in the analysis.” It will profile the new data and fold it in.

Change the brief mid-project

Edit your files in context/ (or context/CONTEXT.md, the agent’s written understanding of your brief), then tell the agent what changed. For a significant change of direction, say it explicitly in chat — the agent adapts its goals and plan from the conversation.

Correct the agent’s understanding

The pipeline pauses at three review points — use them:

Context review — read context/CONTEXT.md. If the agent misread your brief, say so in chat and it rewrites its understanding before doing anything else.
Discovery review — check output/discovery/. Wrong column interpretation, missed anomaly? Point it out now, before cleaning.
Goal review — the hard gate. The parsed goals live in .aius/goals.json; nothing proceeds until you approve them. This is the cheapest place to fix a wrong assumption.
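Because the parsed goals are an ordinary JSON file, you can read them outside the chat before approving. The sketch below only loads and pretty-prints the file. The schema of goals.json is not described here, so the code makes no assumption about its keys, and load_goals and format_goals are hypothetical helper names.

```python
import json
from pathlib import Path


def load_goals(project_dir="."):
    """Parse .aius/goals.json from the given project directory."""
    goals_path = Path(project_dir) / ".aius" / "goals.json"
    with goals_path.open(encoding="utf-8") as fh:
        return json.load(fh)


def format_goals(goals):
    """Render the parsed goals as indented JSON for easier reading."""
    return json.dumps(goals, indent=2, ensure_ascii=False)
```

Calling print(format_goals(load_goals("my-project"))) gives a readable view of whatever the agent has recorded. If something looks wrong, raise it in chat rather than editing the file by hand, since the agent adapts its goals from the conversation.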

Re-run or tweak an analysis

Every analysis is a real notebook under output/notebooks/<analysis>/notebook.ipynb. Ask the agent to revise and re-run it (“redo the churn model without the leaky feature”), or edit the notebook yourself and ask the agent to continue from it.
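To see which analyses exist before asking for a revision, you can walk that path pattern yourself. The following sketch lists each analysis folder with its total and code cell counts. It relies only on the standard Jupyter notebook JSON layout (a top-level "cells" list whose entries carry a "cell_type"); summarize_notebooks is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_notebooks(project_dir="."):
    """Map each analysis name to (total_cells, code_cells) for its notebook."""
    summary = {}
    pattern = "output/notebooks/*/notebook.ipynb"
    for nb_path in sorted(Path(project_dir).glob(pattern)):
        with nb_path.open(encoding="utf-8") as fh:
            notebook = json.load(fh)
        cells = notebook.get("cells", [])
        code_cells = sum(1 for cell in cells if cell.get("cell_type") == "code")
        # The parent folder name is the <analysis> part of the path.
        summary[nb_path.parent.name] = (len(cells), code_cells)
    return summary
```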

Regenerate a deliverable

Deliverables (report, model, deck, brief) are produced from the notebooks. To refresh one, fix or have the agent redo the underlying analysis, then ask it to republish; the updated deliverable replaces the old one on your dashboard.

Pick up where you left off

aius --continue        # resume the last session on this project
aius --session <id>    # resume a specific session

Project state (stage, goals, outputs) lives in the project directory, so closing the terminal loses nothing.

Start over

aius --reset           # wipe local project state (asks for confirmation)
aius --delete          # delete the remote project AND reset this directory

aius --reset resets the pipeline to the beginning. Your files in context/ and data/ are untouched, and your API key is preserved.

aius --delete goes further: it removes this directory’s project from your AIUS account entirely and resets the directory to its bare inputs (drops .aius, .venv, .git, and generated artifacts). Use it to fully retire a project. It always asks you to type the project name to confirm (the remote delete can’t be skipped with --yes), and your login is preserved.
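If you want to keep a copy of generated results before starting over, one option is to zip output/ to a location outside the project first. This is a sketch, assuming output/ is among the generated artifacts that get dropped; archive_outputs is a hypothetical helper and not an AIUS feature.

```python
import shutil
from pathlib import Path


def archive_outputs(project_dir, dest_dir):
    """Zip the project's output/ folder into dest_dir and return the archive path.

    Returns None when the project has no output/ folder yet.
    """
    project_dir = Path(project_dir).resolve()
    output_dir = project_dir / "output"
    if not output_dir.is_dir():
        return None

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".zip" to the base name and returns the final path.
    base_name = dest_dir / f"{project_dir.name}-output"
    archive = shutil.make_archive(str(base_name), "zip", root_dir=output_dir)
    return Path(archive)
```

Pick a dest_dir outside the project directory so the archive is not affected by the reset itself.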

Good to know

data/raw/ and data/processed/ are kept out of git by default (AIUS writes the ignore rules) — datasets don’t end up in your repo history.
Anything that touches your machine — shell commands, file writes, access outside the project — is gated by a permission prompt before it runs.
