An open-source software framework for data-intensive distributed applications, designed to run them on large clusters of commodity hardware.