DVC (Data Version Control)

Manage data the way code is managed — Git-like version control for AI/ML and data science.

Free tier available · Technical · API available · Open source

Key strengths

  • Git-like versioning for datasets and ML models
  • Open source with a large, active community
  • Seamlessly integrates with existing Git workflows
  • Supports petabyte-scale data lakes and object stores via lakeFS
  • Works with major cloud storage providers and local filesystems

Free tier + paid plans
San Francisco, USA
Founded 2017
Self-hostable

Developer & Technical Documentation

DVC exposes a rich CLI and Python API for integrating into ML pipelines and CI/CD workflows:

  • Pipeline DAGs — Define stages with inputs, outputs, and commands in dvc.yaml. DVC builds a dependency graph and only re-runs stages whose dependencies have changed, enabling efficient, reproducible pipelines.
  • Experiment Tracking — Use dvc exp run, dvc exp show, and dvc exp diff to branch, run, and compare experiments without cluttering your Git history.
  • Remote Storage Backends — Out-of-the-box support for AWS S3, Google Cloud Storage, Azure Blob Storage, SSH/SFTP, HDFS, HTTP, and local paths. Configure via dvc remote add and dvc remote modify.
  • Python API — Access DVC programmatically via import dvc.api to open versioned data files directly in your scripts, enabling clean integration with training code.
  • VS Code Extension — Provides a GUI for managing experiments, visualizing pipeline DAGs, and comparing metrics without leaving the editor.
  • lakeFS Integration — For enterprise-scale needs, lakeFS layers a full Git branching model on top of S3-compatible object stores, enabling atomic commits, zero-copy branching, and data CI/CD at petabyte scale.
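
The "only re-run stages whose dependencies have changed" behavior described under Pipeline DAGs can be pictured roughly as follows. This is a minimal sketch of the general idea (hash each dependency, then compare against the hashes recorded at the last run), not DVC's actual implementation. The function names stage_is_stale and record_run, the JSON lock file, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable


def content_hash(path: Path) -> str:
    """Return a hex digest of the file's bytes (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(deps: Iterable[Path]) -> Dict[str, str]:
    """Map each dependency path to its current content hash."""
    return {str(p): content_hash(p) for p in deps}


def stage_is_stale(deps: Iterable[Path], lock_file: Path) -> bool:
    """True if the stage has never run or any dependency hash has changed."""
    if not lock_file.exists():
        return True
    recorded = json.loads(lock_file.read_text())
    return recorded != snapshot(deps)


def record_run(deps: Iterable[Path], lock_file: Path) -> None:
    """Store the dependency hashes seen by a successful run (hypothetical lock format)."""
    lock_file.write_text(json.dumps(snapshot(deps), indent=2))
```

A runner built on this would call stage_is_stale before executing a stage's command and record_run afterwards, so unchanged stages are skipped on the next invocation.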
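
As a rough illustration of the Python API item above, the helper below reads a versioned CSV file through dvc.api.open, passing a path, an optional repository, and an optional Git revision. The helper name read_rows and its opener parameter are hypothetical, added so the function can be exercised without a DVC repository. The path, repository URL, and tag in the usage comment are placeholders, not real resources.

```python
import csv
from typing import Callable, Dict, List, Optional


def read_rows(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    opener: Optional[Callable] = None,
) -> List[Dict[str, str]]:
    """Read a DVC-tracked CSV at a given revision and return its rows as dicts."""
    if opener is None:
        # Imported lazily so this module still loads where DVC is not installed.
        import dvc.api

        opener = dvc.api.open
    # dvc.api.open is a context manager that yields a file-like object.
    with opener(path, repo=repo, rev=rev, mode="r") as fh:
        return list(csv.DictReader(fh))


# Hypothetical usage (placeholder path, repository URL, and tag):
# rows = read_rows("data/train.csv",
#                  repo="https://github.com/example/project",
#                  rev="v1.0")
```

Pinning rev to a tag or commit keeps the training script tied to one exact dataset version, which is the main reason to go through the API rather than reading the working-copy file directly.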