DVC (Data Version Control)
Manage data the way code is managed: Git-like version control for AI/ML and data science.
Free tier available·Technical·API available·Open source
Key strengths
Git-like versioning for datasets and ML models
Open source with a large, active community
Seamlessly integrates with existing Git workflows
Supports petabyte-scale data lakes and object stores via lakeFS
Works with major cloud storage providers and local filesystems
Free tier + paid plans
San Francisco, USA
Founded 2017
Self-hostable
DVC is a command-line tool, with a companion VS Code extension, that works alongside Git: it stores metadata and pointers in your Git repository and offloads large data files and model artifacts to remote storage (S3, GCS, Azure Blob, SSH, and more). It makes ML pipelines reproducible by describing them as a DAG of stages, and it tracks experiments with lightweight branching semantics. lakeFS, its enterprise counterpart, provides a full Git-like branching model directly on top of object stores (S3-compatible, Azure Data Lake, GCS) for teams managing complex, large-scale data infrastructure.
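
The "pointers in Git, data elsewhere" idea described above can be illustrated with a small, self-contained Python sketch. This is not DVC's code or its on-disk format: the helper names `track` and `restore`, the `.ptr.json` suffix, and the local `cache` directory standing in for remote storage are all assumptions made for the example. It only shows the general pattern of hashing a file, storing it under a content-addressed path, and keeping a tiny text pointer that could be committed to Git instead of the data itself.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    Hypothetical helper for illustration only; not DVC's API or file format.
    """
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    # Content-addressed location: first two hex characters become a subdirectory.
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    # The pointer is small and text-based, so it is the part one would commit to Git.
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    meta = {"path": data_file.name, "md5": digest, "size": data_file.stat().st_size}
    pointer.write_text(json.dumps(meta, indent=2))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    cached = cache_dir / meta["md5"][:2] / meta["md5"][2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target


def demo() -> bool:
    """Track a file, delete it, restore it, and report whether the bytes match."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("id,label\n1,cat\n2,dog\n")
        original = data.read_bytes()
        pointer = track(data, root / "cache")
        data.unlink()  # simulate a fresh checkout that has only the pointer
        restored = restore(pointer, root / "cache")
        return restored.read_bytes() == original


if __name__ == "__main__":
    print("round trip ok:", demo())
```

In this sketch the pointer file is only a few lines long regardless of how large the data file is, which is why a scheme of this kind keeps the Git repository small while the bulk data lives in separate storage.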
