Juttle’s Cross Platform Optimization Strategy

Juttle can analyze data that is in any of several data stores. It has processors like reduce, sort, head, and tail that operate over a sequence of data points. For instance, head n emits the first n points it receives and drops the rest. reduce count() increments a counter for each data point and returns the total. The built-in implementations of these processors are written in Node.js.
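To make the semantics of these two processors concrete, here is a minimal Python sketch (not Juttle's actual Node.js implementation). The names `head`, `reduce_count`, and the dict-based `Point` are hypothetical stand-ins chosen for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

# Hypothetical stand-in for a Juttle data point: a plain dict of fields.
Point = Dict[str, Any]


def head(points: Iterable[Point], n: int) -> Iterator[Point]:
    """Emit the first n points received and drop the rest."""
    return islice(points, n)


def reduce_count(points: Iterable[Point]) -> int:
    """Increment a counter for each data point and return the total."""
    total = 0
    for _ in points:
        total += 1
    return total


# Five illustrative points; the field names are assumptions.
points = [{"time": t, "value": t * 10} for t in range(5)]

first_two = list(head(points, 2))
total = reduce_count(points)
```

Both functions consume the sequence one point at a time, which is why the cost of a processor like this grows with the number of points it has to be handed.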

These Node.js processors are functionally correct, but they are often not the most efficient way to operate on data stored in a given database. For instance, the fastest way to count the records in a SQL table is SELECT COUNT(*) FROM my_table. But to count the records in a table using the Node.js implementation of reduce count(), we’d have to SELECT * FROM my_table, build a Juttle data point from every record, and perform the count in JavaScript. It would be much faster if Juttle knew it could use SELECT COUNT(*) to calculate the result of reduce count(). The Juttle Optimizer is the piece of Juttle’s architecture that turns Juttle programs into efficient queries like this.
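The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not the Juttle Optimizer itself: the in-memory database, the table schema, and the function names `count_unoptimized` and `count_optimized` are all assumptions made for the example.

```python
import sqlite3

# Illustrative in-memory table; the schema is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO my_table (value) VALUES (?)",
    [(float(i),) for i in range(1000)],
)


def count_unoptimized(conn: sqlite3.Connection) -> int:
    """Fetch every record, build a point from it, and count in application code."""
    total = 0
    for row in conn.execute("SELECT * FROM my_table"):
        point = {"id": row[0], "value": row[1]}  # per-record work the database could avoid
        total += 1
    return total


def count_optimized(conn: sqlite3.Connection) -> int:
    """Let the database compute the count and return a single row."""
    return conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]


slow_result = count_unoptimized(conn)
fast_result = count_optimized(conn)
```

Both functions return the same number, but the second transfers one row instead of the whole table, which is the kind of rewrite the paragraph above describes.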

Orestes: a Time Series Database Backed by Cassandra and Elasticsearch

I used to work at a data analysis startup called Jut. Jut’s vision was to bring all your data together in a single environment, enabling integrated analysis with our programming language, Juttle. This was challenging because there are many different types of data, and different data types require different models for optimal storage and querying. At the highest level, Jut divided all data into two kingdoms: metrics and events. Today I’ll cover the design and implementation of the metrics side, which was handled by Orestes, a database we built.

How I fixed Elasticsearch

Putting mappings in their place

After my uproarious success fixing node.js, I was ready to fix something else. I checked node.js’s open issues, but I didn’t see any that I’d be able to jump on. So I turned my attention elsewhere. Another technology I have a fair amount of experience with is Elasticsearch – I wrote the Juttle Elastic Adapter, after all.

I moseyed on over to the Elasticsearch issues list, where I found issue 15381: Config index.mapper.dynamic:false is not honored. I didn’t know what that meant, but the issue had the “adoptme” and “low-hanging fruit” labels, so it seemed like a good place to start contributing to Elasticsearch.

Pushing the performance limits of node.js

Building a data analysis platform in JavaScript

Historical note: This was originally published as a post on Jut’s blog. Nobody wanted to pay for the product it describes, so Jut has gone in a very different direction of late, and Jut’s blog is a 404 at the moment. As a technical piece, though, I think it merits keeping alive.

We love node.js and JavaScript. We love them so much, in fact, that when Jut decided to build a streaming analytics platform from scratch, we put node.js at the center of it all. This decision has brought us several benefits, but along with those came a few unique scaling challenges. With some careful programming, we’ve been able to largely overcome node.js’s limitations, and I’ll share with you some of the tricks we used.
