Measured fixes for large data workflows
Support for pipelines under pressure: memory limits, long runtimes, unstable types, and cost growth.
Reviews also cover automation, LLM systems, and twin builds when they sit inside a data workflow.
Output is a short plan plus specific actions, based on evidence from logs and runs.
Working style
Short scope. Clear checks. Measured changes.
No long calls without artefacts.
1) Pipeline review
Fast review of an existing pipeline to locate failure points and slow steps.
Memory growth and retention checks
Type drift and schema stability checks (see the sketch after this list)
I/O bottlenecks (CSV parsing, joins, writes)
Failure modes: retries, timeouts, partial outputs
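Of the checks above, type drift is the easiest to automate. A minimal sketch, assuming pandas with pyarrow and Parquet stage outputs; the file paths and column sets are illustrative:

```python
# Minimal sketch of a schema-drift check between two runs of the same stage.
# Assumes pandas with pyarrow; file paths are illustrative.
import pandas as pd

def schema_snapshot(df: pd.DataFrame) -> dict:
    """Record column names and dtypes so runs can be compared."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}

def diff_schemas(before: dict, after: dict) -> list[str]:
    """List columns that appeared, disappeared, or changed dtype."""
    issues = []
    for col in sorted(before.keys() | after.keys()):
        if col not in after:
            issues.append(f"dropped column: {col}")
        elif col not in before:
            issues.append(f"new column: {col}")
        elif before[col] != after[col]:
            issues.append(f"dtype drift in {col}: {before[col]} -> {after[col]}")
    return issues

# Usage: compare the same stage output across two runs.
run_a = schema_snapshot(pd.read_parquet("stage3_run_a.parquet"))
run_b = schema_snapshot(pd.read_parquet("stage3_run_b.parquet"))
for issue in diff_schemas(run_a, run_b):
    print(issue)
```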
2) Cost & runtime audit
Reduce total run time and spend, without changing the meaning of outputs.
Compute time per stage (where time is spent; see the sketch after this list)
Storage and file format choices (CSV vs Parquet)
Small-file issues and partition decisions
Repeat-run waste (unneeded recompute)
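A minimal sketch of the first two checks, assuming pandas with pyarrow installed for Parquet; the stage labels and file names are illustrative:

```python
# Minimal sketch of per-stage timing plus a CSV-vs-Parquet comparison.
# Assumes pandas with pyarrow; stage labels and file names are illustrative.
import time
import pandas as pd

def timed(label, fn, *args, **kwargs):
    """Run one stage and report wall-clock time, so hot spots are visible."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

df = timed("read csv", pd.read_csv, "events.csv")
timed("write parquet", df.to_parquet, "events.parquet")   # columnar and compressed
df2 = timed("read parquet", pd.read_parquet, "events.parquet")
```

On repeat runs, reading the Parquet copy typically costs a fraction of re-parsing the CSV, which is often the cheapest fix for repeat-run waste.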
3) Validation pack
Lightweight checks to confirm outputs can be trusted after changes.
Row counts, null rates, and basic distributions (see the sketch after this list)
Join sanity checks and key integrity
Sample stability checks across runs
Run notes for reproducibility
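A minimal sketch of such a pack, assuming pandas; the table names and key columns are illustrative:

```python
# Minimal sketch of a validation pack: row counts, null rates, key integrity,
# and a join sanity check. Table names and key columns are illustrative.
import pandas as pd

def validate(df: pd.DataFrame, key: str) -> dict:
    """Collect cheap checks whose values can be compared across runs."""
    return {
        "rows": len(df),
        "null_rate": df.isna().mean().round(4).to_dict(),   # per-column null share
        "duplicate_keys": int(df[key].duplicated().sum()),  # should be 0 for a primary key
    }

orders = pd.read_parquet("orders.parquet")
print(validate(orders, key="order_id"))

# Join sanity: every order should match a known customer.
customers = pd.read_parquet("customers.parquet")
orphans = ~orders["customer_id"].isin(customers["customer_id"])
print(f"orphan rows: {int(orphans.sum())}")
```

Saving the returned dict alongside each run gives the sample-stability check for free: diff the values between runs.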
4) LLM system review
Review a retrieval or assistant setup (RAG or tool use) with checks for quality, safety, and cost.
Data source selection and indexing rules
Retrieval checks: recall risk, empty hits, duplication (see the sketch after this list)
Answer checks: groundedness, format rules, refusal rules
Cost and latency limits: caching, rate limits, fallbacks
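A minimal sketch of two of the retrieval checks, framed around an assumed retrieve() callable; swap in the real search client, and treat the threshold as illustrative:

```python
# Minimal sketch of two retrieval checks: empty hits and near-duplicate results.
# retrieve() is an assumed callable returning a list of text passages.
def check_retrieval(queries, retrieve, top_k=5):
    """Flag queries that return nothing or mostly duplicated passages."""
    for q in queries:
        hits = retrieve(q, top_k=top_k)
        if not hits:
            print(f"EMPTY: {q!r}")
            continue
        unique = len({h.strip().lower() for h in hits})
        if unique < len(hits) / 2:                 # illustrative threshold
            print(f"DUPLICATED: {q!r} ({unique}/{len(hits)} unique)")

# Usage with any retriever, e.g.:
# check_retrieval(["refund policy", "api rate limits"], retrieve=my_index.search)
```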
5) Automation review
Design or review automation that runs safely: guardrails, fallbacks, monitoring, and audit traces.
Risk list and guardrails (what must never happen)
Fallback logic (manual review, safe defaults; see the sketch after this list)
Monitoring and alerts (failure, drift, cost)
Run logs and change control
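A minimal sketch of the fallback pattern, with an illustrative send_to_review_queue placeholder standing in for a real queue or ticketing system:

```python
# Minimal sketch of a guarded automation step: bounded retries, then a safe
# fallback with a logged alert. send_to_review_queue is a placeholder.
import logging
import time

log = logging.getLogger("automation")

def send_to_review_queue(payload):
    """Placeholder: in practice, write to a review queue and raise an alert."""
    log.error("routed to manual review: %r", payload)

def run_guarded(step, payload, retries=2, backoff=5.0):
    """Try the step a bounded number of times; never loop forever."""
    for attempt in range(retries + 1):
        try:
            return step(payload)
        except Exception as exc:
            log.warning("attempt %d failed: %s", attempt + 1, exc)
            time.sleep(backoff * (attempt + 1))
    send_to_review_queue(payload)   # safe default: a human decides
    return None
```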
6) Twin modelling review
Review a digital or geomatic twin build: data feeds, state updates, spatial joins, and validation rules.
State update loop (feeds, timing, id rules)
Spatial handling (tiling, joins, indexing)
Validation (ground truth, bounds, update rate; see the sketch after this list)
Link to prediction and planning outputs
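A minimal sketch of the validation rules, assuming a dict-based state and feed updates carrying id, lat, lon, and ts fields; field names and thresholds are illustrative:

```python
# Minimal sketch of twin-state validation: id rules, coordinate bounds, and
# update staleness. Field names and thresholds are illustrative.
import time

def validate_update(state: dict, update: dict, max_age_s: float = 60.0) -> list[str]:
    """Return reasons to reject an incoming feed update (empty list = accept)."""
    problems = []
    if update.get("id") not in state:
        problems.append("unknown id")            # id rule: no silent new entities
    lat, lon = update.get("lat"), update.get("lon")
    if lat is None or lon is None or not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        problems.append("out-of-bounds position")
    if time.time() - update.get("ts", 0) > max_age_s:
        problems.append("stale update")          # update-rate check
    return problems

state = {"veh_17": {"lat": 51.5, "lon": -0.1}}   # illustrative known entities
print(validate_update(state, {"id": "veh_99", "lat": 12.0, "lon": 200.0, "ts": 0}))
# -> ['unknown id', 'out-of-bounds position', 'stale update']
```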
What is delivered?
A short report: failure points, measured bottlenecks, and a ranked change list.
When relevant, a template checklist for future runs is included.
What if data is sensitive?
Work can be based on logs, schema summaries, and small redacted samples.
No raw data is required by default.
How is success measured?
Runtime reduction, peak memory reduction, fewer retries, and stable outputs
(row counts, null rates, and key distributions).
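These are compared before and after a change. A minimal comparison sketch; the numbers below are illustrative only, and real values come from run logs:

```python
# Minimal before/after comparison; the metric values are illustrative only.
before = {"runtime_s": 1840, "peak_mem_mb": 9200, "retries": 7}
after  = {"runtime_s": 620, "peak_mem_mb": 3100, "retries": 0}
for key in before:
    change = (after[key] - before[key]) / before[key] * 100
    print(f"{key}: {before[key]} -> {after[key]} ({change:+.0f}%)")
```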
Can this work across domains?
Yes. Focus stays on pipeline behaviour: file formats, joins, sampling, memory,
and correctness checks.