Splice Machine this week announced it has open sourced its Spark-powered relational SQL database system.
The company has set up a cloud-based sandbox for developers to put its new open source Splice Machine 2.0 Community Edition to the test. The company also announced the release of a cluster version and the launch of a developer community site.
Splice Machine is a relational database management system, or RDBMS, designed for high-speed performance. The version 2.0 release integrates Apache Spark — a fast, open source engine for large-scale data processing — into its existing Hadoop-based architecture. The result is a flexible hybrid database that enables businesses to handle OLAP and OLTP workloads simultaneously.
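To make the hybrid idea concrete, the sketch below shows one way a single database front end could send short transactional statements and large analytical scans to different engines. It is purely illustrative and is not Splice Machine's code: the `Query` class, the `choose_engine` function, the row-count threshold and the engine labels are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: queries expected to touch more rows than this
# are treated as analytical (OLAP); everything else as operational (OLTP).
OLAP_ROW_THRESHOLD = 10_000


@dataclass
class Query:
    sql: str
    estimated_rows: int  # assumed to come from an optimizer's cost estimate


def choose_engine(query: Query) -> str:
    """Return 'analytical' for large scans, 'operational' for short lookups."""
    if query.estimated_rows > OLAP_ROW_THRESHOLD:
        return "analytical"
    return "operational"


# A single-row lookup and a full-table aggregate, both plain SQL.
lookup = Query("SELECT * FROM orders WHERE id = 42", estimated_rows=1)
report = Query(
    "SELECT region, SUM(total) FROM orders GROUP BY region",
    estimated_rows=5_000_000,
)

print(choose_engine(lookup))
print(choose_engine(report))
```

The point of the sketch is that the application issues ordinary SQL in both cases, and the choice of engine happens behind the scenes.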
The Splice Machine V2.0 sandbox, which is powered by Amazon Web Services, allows developers to initiate a cluster in minutes. They can choose the number of nodes in the cluster and specify each node’s type to accommodate a range of tests from small to enterprise-scale.
Splice Machine is available in a free, full-featured Community edition and a licensed Enterprise edition. The Enterprise license includes 24/7 support and DevOps features such as backup and restore, LDAP support, Kerberos support, encryption, and column-level access privileges.
The company launched a website to support the growing Splice Machine community. The site includes tutorials, videos, a developer forum, a GitHub repository, a Stack Overflow tag and a Slack channel. These resources are rich with code to help developers, data scientists and DevOps teams learn to use Splice Machine.
“We look forward to having tens of thousands of users using and contributing to our software to help build the next exciting applications in the marketplace, and providing a rich community of experts to serve our customers,” said Splice Machine CEO Monte Zweben.
Iffy Road Ahead
Qualifying as an open source project is only the beginning. It is sometimes difficult to capture the attention of developers who already are contributing time and code to the other 30-plus projects recognized by Apache, noted Charles King, principal analyst at Pund-IT.
That means that Splice Machine must make a concerted effort to get on the Apache radar. The company already works with Hadoop and Spark developers, so it stands a good chance of success, said King.
“In a worst case scenario, the Splice Machine project inspires a collective meh, and then it’s back to the drawing board,” he said. “In the best of all possible outcomes, the new project will improve Splice Machine’s solutions and expose them to a wider audience, resulting in new sales and services opportunities.”
The company faces a number of challenges along the way. The biggest challenge, according to King, is standing out in the increasingly crowded field of Apache-related projects. “Overcoming that will likely require Splice Machine to perform some serious evangelizing and professional networking with target developers and groups.”
Splice Machine 2.0 features include a scale-out architecture on commodity hardware with proven autosharding on HBase and Spark. In-memory technology from Apache Spark provides better performance for OLAP queries.
Resource Isolation allows allocation of CPU and RAM resources to operational and analytical workloads, which enables prioritization of queries for workload scheduling. A management console with a Web user interface allows users to see the queries currently running, and provides the ability to drill down into each job to see the progress of the queries and identify potential bottlenecks.
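As a rough illustration of what resource isolation means in practice, the sketch below divides a cluster's cores and memory between an operational pool and an analytical pool. It does not reflect Splice Machine's actual configuration or API: the `ResourcePool` class, the `allocate` function and the share values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    name: str
    cpu_share: float  # fraction of total cores, between 0 and 1
    ram_share: float  # fraction of total memory, between 0 and 1


def allocate(total_cores: int, total_ram_gb: int, pools: list) -> dict:
    """Split cores and memory across pools according to their shares."""
    if sum(p.cpu_share for p in pools) > 1.0:
        raise ValueError("CPU shares exceed 100 percent of the cluster")
    if sum(p.ram_share for p in pools) > 1.0:
        raise ValueError("RAM shares exceed 100 percent of the cluster")
    return {
        p.name: {
            "cores": total_cores * p.cpu_share,
            "ram_gb": total_ram_gb * p.ram_share,
        }
        for p in pools
    }


# Hypothetical split: a quarter of the cluster for transactions,
# three quarters for analytics.
pools = [
    ResourcePool("operational", cpu_share=0.25, ram_share=0.25),
    ResourcePool("analytical", cpu_share=0.75, ram_share=0.75),
]
plan = allocate(total_cores=16, total_ram_gb=64, pools=pools)
print(plan)
```

Keeping the two pools separate is what would let a long-running report proceed without starving short transactional queries of CPU or memory.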
Virtual table interfaces allow developers and data scientists to use SQL with data that is external to the database, such as Amazon S3, HDFS or Oracle. Compaction Optimization of storage files is managed in Spark rather than HBase, to provide significant performance enhancements and operational stability. Apache Kafka-enabled Streaming supports the ingestion of real-time data streams.
What It Does
Splice Machine provides real-time operational and analytical applications while simplifying the Lambda architecture. That means businesses no longer have to manage the complexity of integrating multiple compute engines to ingest, serve or analyze data.
Splice Machine’s Lambda-in-a-Box architecture provides developers and data scientists with a way to store their data all in one place and just write SQL. The new architecture includes the ability to access external data and libraries with ease.
The Splice Machine RDBMS can execute federated queries on data in external databases and files using virtual table interfaces. It also can execute prebuilt Spark libraries for machine learning, stream analysis, data integration and graph modeling. So far, there are more than 130 libraries.
Why Open Source?
Splice Machine moved to open source for many of the same reasons that have driven other software companies to do so, Zweben told LinuxInsider. Among them are broader adoption, insurance for users who want to avoid product lock-in, faster development and better quality.
“The critical risk for all open source companies is the same. We all need to innovate and develop features that are valuable to our customer base to properly monetize our Enterprise edition,” Zweben said.
Strong support from other FOSS (free and open source software) developers will likely be forthcoming, he added. Many Hadoop, Spark and HBase community leaders already have expressed excitement about joining the new community.
They will be instrumental in “mentoring us along the Apache process, and facilitating the growth and adoption of Splice Machine in the marketplace,” Zweben said.
Attaining the Goal
Making it easy for developers to get Splice Machine and test it at scale will be essential for digital marketers, financial institutions, life science and cybersecurity companies that need to process mixed OLTP and OLAP workloads, and that prefer technologies with a vibrant community, according to Zweben.
An open source community gives the technology a lifespan that exceeds the longevity of any single company's product, he noted. It also provides a rich source of skill sets to help develop, expand, customize and operate the technology.
The open source version of Splice Machine is distributed under the Apache 2.0 license. It comes with both engines and most of its other features, including Apache Kafka streaming support.
However, the open source version leaves out a few enterprise-level options such as encryption, Kerberos support, column-level access control and backup/restore functionality.
“I believe Splice Machine’s move to open source is mainly about accelerating uptake among developers who often help drive their organization’s commercial engagements,” said Charles King, principal analyst at Pund-IT.
“Splice Machine initially offered a free version of its relational database solution that was missing a few enterprise features,” he told LinuxInsider. “Becoming a recognized open source project is less ambiguous for users and potential customers. It also means that Splice Machine will benefit from contributions users make to the project.”