Data Management
An AIUS project is a plain directory on your machine. You own it: every input, intermediate dataset, and output is a regular file you can open, edit, and version.
Project layout
Put your brief in context/ and your dataset in data/, then run aius in that directory.
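If you prefer to script the setup, here is a minimal sketch that creates the two input directories and writes a placeholder brief. The helper name scaffold_project and the file name brief.md are assumptions made for illustration, not names AIUS requires.

```python
from pathlib import Path


def scaffold_project(root: str) -> Path:
    """Create the input folders of an AIUS project directory."""
    project = Path(root)
    # context/ holds the brief; data/ holds the dataset.
    for sub in ("context", "data"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    # Hypothetical brief file name, chosen only for this example.
    brief = project / "context" / "brief.md"
    if not brief.exists():
        brief.write_text("Describe the question you want answered.\n", encoding="utf-8")
    return project
```

After calling scaffold_project("my-project"), copy your dataset into my-project/data/ and run aius from inside that directory.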
Supported data formats
CSV, TSV, Excel (.xlsx), Parquet, and JSON.
Your data stays on your machine and is read by code running locally. Notebooks execute in a curated, fixed Python environment (pandas, numpy, scikit-learn, XGBoost, LightGBM, matplotlib, seaborn, and more) — you can’t install extra packages into it, which is what keeps runs reproducible.
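To check a data folder against the list of supported formats before you start, something like the sketch below may help. It assumes formats are recognized by file extension; only .xlsx is stated on this page, and the other extensions are the conventional ones for each format. The name classify_data_files is hypothetical.

```python
from pathlib import Path

# Conventional extensions for the supported formats (an assumption, see above).
SUPPORTED = {".csv": "CSV", ".tsv": "TSV", ".xlsx": "Excel",
             ".parquet": "Parquet", ".json": "JSON"}


def classify_data_files(data_dir: str) -> dict[str, list[str]]:
    """Split the files under data_dir into supported and unsupported by extension."""
    result: dict[str, list[str]] = {"supported": [], "unsupported": []}
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        key = "supported" if path.suffix.lower() in SUPPORTED else "unsupported"
        result[key].append(path.relative_to(data_dir).as_posix())
    return result
```

Anything reported under "unsupported" is worth converting to one of the listed formats first.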
Common how-tos
Add or replace a dataset
Drop the new file into data/raw/ (or overwrite the existing one), then tell the agent in chat, e.g. “I’ve added q3_sales.parquet to data/raw, include it in the analysis.” It will profile the new data and fold it in.
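The file drop itself is an ordinary copy. As a sketch, assuming you run it from a script of your own (the helper name add_dataset is hypothetical):

```python
import shutil
from pathlib import Path


def add_dataset(source: str, project_root: str = ".") -> Path:
    """Copy a dataset into data/raw/, replacing any file of the same name."""
    raw_dir = Path(project_root) / "data" / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / Path(source).name
    shutil.copy2(source, target)  # overwrites an existing file at target
    return target
```

Copying is only half of the step: you still need to tell the agent in chat that the file is there.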
Change the brief mid-project
Edit your files in context/ (or context/CONTEXT.md, the agent’s written understanding of your brief), then tell the agent what changed. For a significant change of direction, say it explicitly in chat; the agent adapts its goals and plan from the conversation.
Correct the agent’s understanding
The pipeline pauses at three review points. Use them:
- Context review: read context/CONTEXT.md. If the agent misread your brief, say so in chat and it rewrites its understanding before doing anything else.
- Discovery review: check output/discovery/. Wrong column interpretation, missed anomaly? Point it out now, before cleaning.
- Goal review: the hard gate. The parsed goals live in .aius/goals.json; nothing proceeds until you approve them. This is the cheapest place to fix a wrong assumption.
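Because the goals are a regular JSON file, you can read them outside the chat before approving. The sketch below only re-indents the file for reading; it assumes nothing about the schema, which this page does not describe. The name show_goals is hypothetical.

```python
import json
from pathlib import Path


def show_goals(project_root: str = ".") -> str:
    """Return .aius/goals.json re-indented for reading; no schema is assumed."""
    goals_path = Path(project_root) / ".aius" / "goals.json"
    goals = json.loads(goals_path.read_text(encoding="utf-8"))
    return json.dumps(goals, indent=2, ensure_ascii=False)
```

Running print(show_goals()) from the project directory prints the parsed goals as indented JSON.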
Re-run or tweak an analysis
Every analysis is a real notebook under output/notebooks/<analysis>/notebook.ipynb. Ask the agent to revise and re-run it (“redo the churn model without the leaky feature”), or edit the notebook yourself and ask the agent to continue from it.
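Since each analysis is a standard notebook file, you can also inspect it with a few lines of Python before deciding what to change. This sketch relies only on the public Jupyter notebook JSON layout (a top-level "cells" list whose entries have "cell_type" and "source"); the helper name code_cells is hypothetical.

```python
import json
from pathlib import Path


def code_cells(notebook_path: str) -> list[str]:
    """Return the source of each code cell in a Jupyter (nbformat 4) notebook."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source as either one string or a list of lines.
        cells.append(source if isinstance(source, str) else "".join(source))
    return cells
```

Searching the returned strings for a column name is a quick way to find where a suspect feature enters a model.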
Regenerate a deliverable
Ask the agent to redo the underlying analysis and republish. Deliverables (report, model, deck, brief) are produced from the notebooks, so fixing the notebook and asking for a republish refreshes them on your dashboard.
Pick up where you left off
Start over
context/ and data/ are untouched; your API key is preserved.
Good to know
- data/raw/ and data/processed/ are kept out of git by default (AIUS writes the ignore rules), so datasets don’t end up in your repo history.
- Anything that touches your machine (shell commands, file writes, access outside the project) is gated by a permission prompt before it runs.