DVC (Data Version Control)

Manage data the way code is managed — Git-like version control for AI/ML and data science.

Free tier available · Technical · API available · Open source

Key strengths

  • Git-like versioning for datasets and ML models
  • Open source with a large, active community
  • Seamlessly integrates with existing Git workflows
  • Supports petabyte-scale data lakes and object stores via lakeFS
  • Works with major cloud storage providers and local filesystems
Free tier + paid plans
San Francisco, USA
Founded 2017
Self-hostable
  • ML pipeline orchestration — Define multi-stage DAG pipelines in dvc.yaml with caching, enabling efficient retraining when only subsets of data or code change.
  • Remote artifact management — Version and store large model checkpoints and datasets on S3, GCS, or Azure without bloating Git repos.
  • CI/CD for ML — Integrate dvc repro and dvc metrics diff into GitHub Actions or GitLab CI to automatically validate model performance on every pull request.
  • Experiment branching — Use dvc exp branch to promote successful experiments to Git branches, keeping experiment history clean and auditable.
  • Data lake versioning at scale (lakeFS) — Apply Git semantics (branch, merge, revert) directly to petabyte-scale object stores for data engineering teams managing complex ETL and AI data pipelines.
  • Programmatic data access — Use dvc.api.open() or dvc.api.read() in Python scripts to fetch versioned datasets from remote storage with a single line of code.
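
The programmatic access described in the last item can be sketched roughly as follows. This is a minimal illustration, not an official recipe: `read_versioned_rows`, `fake_open`, the path `data/train.csv`, and the tag `v1.0` are hypothetical names chosen for the example. The only DVC call involved is `dvc.api.open()`, used as a context manager with its `repo` and `rev` keyword arguments; it is imported lazily so the sketch also runs where DVC is not installed, with a stand-in opener supplying sample data.

```python
import csv
import io
from contextlib import contextmanager


def read_versioned_rows(path, repo=None, rev=None, opener=None):
    """Return the rows of a DVC-tracked CSV file as a list of dicts.

    `opener` defaults to dvc.api.open; passing another callable with the
    same (path, repo=..., rev=...) shape makes the function testable offline.
    """
    if opener is None:
        # Lazy import: only needed when actually reading from a DVC repo.
        import dvc.api
        opener = dvc.api.open
    # The opener yields a file-like object for the requested revision.
    with opener(path, repo=repo, rev=rev) as handle:
        return list(csv.DictReader(handle))


@contextmanager
def fake_open(path, repo=None, rev=None):
    """Hypothetical stand-in for dvc.api.open, used only for this demo."""
    yield io.StringIO("id,label\n1,cat\n2,dog\n")


# Demo with the stand-in opener; drop `opener=` to read from a real repo.
rows = read_versioned_rows("data/train.csv", rev="v1.0", opener=fake_open)
print(rows)
```

Pinning `rev` to a Git tag or commit is what makes the read reproducible: the same call should return the same bytes regardless of what the working tree currently contains.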