Juttle’s Cross Platform Optimization Strategy

Juttle can analyze data that is in any of several data stores. It has processors like reduce, sort, head, and tail that operate over a sequence of data points. For instance, head n emits the first n points it receives and drops the rest. reduce count() increments a counter for each data point and returns the total. The built-in implementations of these processors are written in Node.js.
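
As a rough illustration of what these processors do, here is a minimal sketch in plain JavaScript. The names `head` and `reduceCount` and the generator-based shape are assumptions made for this example; this is not Juttle's actual source.

```javascript
// Hypothetical stand-ins for Juttle processors, for illustration only.

// head(n): emit the first n points received and drop the rest.
function head(n) {
  return function* (points) {
    let emitted = 0;
    for (const point of points) {
      if (emitted >= n) return;
      yield point;
      emitted += 1;
    }
  };
}

// reduceCount(): increment a counter for each point, then emit the total.
function reduceCount() {
  return function* (points) {
    let count = 0;
    for (const _point of points) {
      count += 1;
    }
    yield { count };
  };
}

const points = [{ value: 10 }, { value: 20 }, { value: 30 }];
const firstTwo = [...head(2)(points)];
const total = [...reduceCount()(points)];
```

Note that `reduceCount` has to touch every point before it can emit anything, which is what makes a generic implementation costly when the points come out of a database.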

These Node.js processors are functionally correct, but they are often not the most efficient way to operate on data stored in a given database. For instance, the fastest way to count the records in a SQL table is SELECT COUNT(*) FROM my_table. But to count the records in a table using the Node.js implementation of reduce count(), we’d have to SELECT * FROM my_table, build a Juttle data point from every record, and perform the count in JavaScript. It would be much faster if Juttle knew it could use SELECT COUNT to calculate the result of reduce count(). The Juttle Optimizer is the piece of Juttle’s architecture that turns Juttle programs into efficient queries like this.
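
To make the idea concrete, here is a minimal sketch of this kind of rewrite, assuming a program can be modelled as a source table plus a list of processor nodes. The function name `planQuery` and the node shapes are hypothetical; the real optimizer's interface is not shown here.

```javascript
// Hypothetical sketch: none of these names come from Juttle itself.
// Decide which SQL to send to the database and which processors
// still have to run in Node.js afterwards.
function planQuery(program) {
  const { table, processors } = program;
  const [first, ...rest] = processors;

  // If the first processor is `reduce count()`, let the database count.
  if (first && first.type === 'reduce' && first.reducer === 'count') {
    return { sql: `SELECT COUNT(*) FROM ${table}`, remaining: rest };
  }

  // Otherwise fetch every record and run all processors in Node.js.
  // (The table name is interpolated directly, which is fine for a sketch
  // but not for untrusted input.)
  return { sql: `SELECT * FROM ${table}`, remaining: processors };
}

const optimized = planQuery({
  table: 'my_table',
  processors: [{ type: 'reduce', reducer: 'count' }],
});

const unoptimized = planQuery({
  table: 'my_table',
  processors: [{ type: 'sort', field: 'time' }],
});
```

In the first plan nothing remains for Node.js to do, while in the second the whole table is fetched and the sort still runs in JavaScript.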


Orestes: a Time Series Database Backed by Cassandra and Elasticsearch

I used to work at a data analysis startup called Jut. Jut’s vision was to bring all your data together in a single environment, enabling integrated analysis using our programming language, Juttle. This was challenging because there are many different types of data, and different data types require different models for optimal storage and querying. At the highest level, Jut divided all data into two kingdoms: metrics and events. Today I’ll cover the design and implementation of the metrics side, which was handled by Orestes, a database we built.


How I fixed Node.js again

Yo dawg, I heard you like requires…

As usual, I was browsing the node.js issues page when I noticed #4467. The project owner jasnell had identified a small script that unexpectedly crashed with the message console.error is not a function. Huh? Last I checked, console.error is totally a function. I had to get to the bottom of this one. It turned out to be quite a rabbit hole, taking me through some of node.js’s most important features. Hold on to your hat!


How I fixed Atom

When good regexes go bad

Atom is the hot new up-and-comer in the world of text editing. It is my editor of choice for building software, and it’s open source, so I decided to check out its issues to see how I could contribute. I came across a strange bug: the Atom user speter had written a line of text that, when you pressed Enter at its end, caused Atom to calculate for half an hour before writing a new line. I was pretty stunned that such a simple and common operation could perform so atrociously, so I decided to jump in and figure out what was going on.


How I fixed libuv

A deep dive into the foundation of Node

It was another lazy, rainy winter afternoon, so once again I was looking to make the world of node.js a better place. Browsing the issues, I was struck by the bizarreness of #4291. The user anseki had written a simple script to read the next line the user types into the terminal. If you resized the window running the program while it was waiting for input, the program crashed with a segmentation fault.

Segmentation faults can happen in C when a program makes an invalid memory access. So somehow, resizing the window triggered an invalid C memory access from this JavaScript program. Most untoward!


How I fixed Elasticsearch

Putting mappings in their place

After my uproarious success fixing node.js, I was ready to fix something else. I checked node.js’s open issues, but I didn’t see any that I’d be able to jump on. So I turned my attention elsewhere. Another technology I have a fair amount of experience with is Elasticsearch – I wrote the Juttle Elastic Adapter, after all.

I moseyed on over to the Elasticsearch issues list, where I found issue 15381: Config index.mapper.dynamic:false is not honored. I didn’t know what that meant, but the issue had the “adoptme” and “low-hanging fruit” labels, so it seemed like a good place to start contributing to Elasticsearch.


How I fixed Node.js

My first open-source contribution

It was the lazy Saturday after Thanksgiving, and to burn off some of the extra calories from the preceding days I decided to break into the open-source world. I’d worked with node.js at Jut, so I was acquainted with the platform, though I had no experience with the source code. So I wandered into the issue tracker for the node.js repository and settled on issue 4049, a memory leak in the ChildProcess module.