Although Apache Spark only reached its 1.0 release in 2014, it has quickly become the go-to solution for processing quantities of data too large to fit on a single machine – otherwise known as “Big Data”. Running Spark on a single machine is a great way to become familiar with the API, but its true power only shows when Spark is deployed across a cluster. To use a cluster of computers, Spark requires you to set up and configure a cluster manager. Spark currently supports three such managers: its built-in Standalone manager, YARN, and Mesos. To get a feel for each of these, I set up a test cluster and documented my experience with each, along with a high-level overview of what each manager brings to the table. I also experimented with a more modern cluster manager – Mesosphere’s DC/OS, which builds on top of Mesos to support a wide range of cluster applications (e.g. HDFS, Cassandra, Kafka).
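In practice, the choice of cluster manager surfaces mainly in the `--master` URL passed to `spark-submit`. A sketch of what submitting the same application to each manager might look like (host names, ports, and `my_app.py` are placeholder assumptions, not from the original text):

```shell
# Standalone: point at the Spark master's host and port (7077 by default).
spark-submit --master spark://master-host:7077 my_app.py

# YARN: the resource manager is discovered from the Hadoop configuration on the client.
spark-submit --master yarn --deploy-mode cluster my_app.py

# Mesos: point at the Mesos master (5050 by default).
spark-submit --master mesos://mesos-host:5050 my_app.py
```

The application code stays the same in each case; only the submission target changes, which is what makes it practical to compare the managers against one another.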
It’s been a few years since test-driven development changed the way I think about software development. While I still strongly believe in TDD’s benefits, experience has taught me that it’s not without its challenges. Maintaining tests has a cost, and despite writing tests as we code, corner cases are not always identified. As the complexity of our code increases, so does the chance that we’ll fail to account for a key scenario, or that a legacy function we’re using won’t behave as expected. In a recent project, we used property-based testing as a means of mitigating these challenges – in particular, to get greater coverage with fewer tests and to discover hidden corner cases. This created the perfect opportunity to explore how property-based testing compares with the better-known xUnit style of tests.
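The core idea of property-based testing is to state an invariant that should hold for *all* inputs, then check it against many randomly generated cases instead of a handful of hand-picked examples. Real libraries such as Hypothesis (Python) or ScalaCheck do this with shrinking and smarter generation; the minimal stdlib-only sketch below (the `check_property` and `int_list` helpers are hypothetical names, not from the original text) just illustrates the mechanics:

```python
import random

def check_property(prop, gen, trials=200, seed=0):
    """Run `prop` against many randomly generated inputs.

    Returns None if the property held for every trial, or the first
    failing input (a counterexample) otherwise.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return case
    return None

def int_list(rng):
    """Generator: a random list of integers with random length 0..20."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

def sort_idempotent(xs):
    """Property: sorting twice gives the same result as sorting once."""
    return sorted(sorted(xs)) == sorted(xs)

# A true property yields no counterexample across all trials.
assert check_property(sort_idempotent, int_list) is None
```

A single property like this stands in for many example-based tests, and when the property is false the runner hands back a concrete failing input – which is exactly how hidden corner cases get discovered.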