Juttle’s Cross Platform Optimization Strategy

Juttle can analyze data held in any of several data stores. It has processors like reduce, sort, head, and tail that operate over a sequence of data points. For instance, head n emits the first n points it receives and drops the rest, while reduce count() increments a counter for each data point and emits the total. The built-in implementations of these processors are written in JavaScript and run on Node.js.
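To make the streaming model concrete, here is a minimal sketch of head and reduce count() as processors that see one point at a time. The processor shape (a process(point, emit) method plus an eof(emit) hook) and the run driver are hypothetical illustrations, not Juttle's actual internal API.

```javascript
// Hypothetical processor shape: process() is called once per point,
// eof() once after the last point. Not Juttle's real internal API.
function head(n) {
  let seen = 0;
  return {
    process(point, emit) {
      // Pass through the first n points, drop everything after.
      if (seen < n) {
        seen++;
        emit(point);
      }
    },
    eof() {},
  };
}

function reduceCount() {
  let count = 0;
  return {
    process(_point) {
      // Every point bumps the counter; nothing is emitted yet.
      count++;
    },
    eof(emit) {
      // Emit a single point carrying the total once input ends.
      emit({ count });
    },
  };
}

// Hypothetical driver: feeds an array of points through one processor.
function run(processor, points) {
  const out = [];
  const emit = (p) => out.push(p);
  for (const p of points) processor.process(p, emit);
  processor.eof(emit);
  return out;
}
```

For example, run(head(2), [{ v: 1 }, { v: 2 }, { v: 3 }]) returns [{ v: 1 }, { v: 2 }], and run(reduceCount(), [{}, {}, {}]) returns [{ count: 3 }]. Note that every point still has to reach the processor, which is exactly the cost the next paragraph is about.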

These Node.js processors are functionally correct, but they are often not the most efficient way to operate on data stored in a particular database. For instance, the fastest way to count the records in a SQL table is SELECT COUNT(*) FROM my_table. But to count those records with the Node.js implementation of reduce count(), we'd have to run SELECT * FROM my_table, build a Juttle data point from every returned record, and perform the count in JavaScript. It would be much faster if Juttle knew it could use SELECT COUNT(*) to calculate the result of reduce count(). The Juttle Optimizer is the piece of Juttle's architecture that turns Juttle programs into efficient queries like this one.
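A toy version of that rewrite might look like the sketch below: it inspects the first processor in a pipeline and, when it recognizes one, pushes the work into the SQL query instead of running it in JavaScript. The pipeline representation ({ op, fn, n } objects), the optimize function, and the time column used to order rows for head are all hypothetical assumptions, not the Juttle Optimizer's real design; it also does no identifier escaping.

```javascript
// Hypothetical pipeline format: an array like [{ op: "reduce", fn: "count" }].
// Returns the SQL to run plus the processors still left for Node.js.
function optimize(table, pipeline) {
  const [first, ...rest] = pipeline;

  // reduce count() can be answered entirely by the database.
  if (first && first.op === "reduce" && first.fn === "count") {
    return { sql: `SELECT COUNT(*) AS count FROM ${table}`, remaining: rest };
  }

  // head n becomes a LIMIT. SQL rows have no inherent order, so we assume
  // a hypothetical "time" column to match head's "first n points" meaning.
  if (first && first.op === "head") {
    return {
      sql: `SELECT * FROM ${table} ORDER BY time LIMIT ${Number(first.n)}`,
      remaining: rest,
    };
  }

  // Unrecognized processor: fetch everything and run the whole pipeline in Node.js.
  return { sql: `SELECT * FROM ${table}`, remaining: pipeline };
}
```

With this sketch, optimize("my_table", [{ op: "reduce", fn: "count" }]) yields SELECT COUNT(*) AS count FROM my_table with nothing left to run in Node.js, while an unknown processor falls back to SELECT * and leaves the full pipeline in place.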


Orestes: a Time Series Database Backed by Cassandra and Elasticsearch

I used to work at a data analysis startup called Jut. Jut's vision was to bring all your data together in a single environment, enabling integrated analysis with our programming language, Juttle. This was challenging because there are many different types of data, and different types require different models for optimal storage and querying. At the highest level, Jut divided all data into two kingdoms: metrics and events. Today I'll cover the design and implementation of the metrics side, which was handled by Orestes, a database we built.


How does Bluebird promisify work?

High-performance code generation in Javascript

In describing the ConcurrencyMaster, I referred to Bluebird promisify as a magic function. Of course, it's not really magical; it's just a computer program. It only seemed magical because I didn't understand how it worked. So this week, I've taken the opportunity to fill this hole in my knowledge by studying the internal workings of promisify. I like to learn about projects by fixing bugs in them, but Bluebird has no open bugs. Instead, we'll just run through the working code and see what makes the magic happen. Here goes!
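Before looking inside Bluebird, it helps to see what promisify has to do at all. The sketch below is not Bluebird's code; it assumes Node-style callbacks, where the last argument is a function called as (err, value). The first version wraps the function in a closure. The second gestures at the code-generation theme by building a wrapper with a fixed parameter list through new Function, under the extra assumption that fn.length counts the callback. Both function names are hypothetical.

```javascript
// Closure-based promisify: forwards any arguments plus a Node-style callback.
function promisify(fn) {
  return function (...args) {
    return new Promise((resolve, reject) => {
      fn.call(this, ...args, (err, value) => {
        if (err) reject(err);
        else resolve(value);
      });
    });
  };
}

// Code-generation variant: emits JavaScript source for a wrapper with exactly
// (fn.length - 1) named parameters, then compiles it with new Function.
function promisifyGenerated(fn) {
  const arity = Math.max(0, fn.length - 1); // assume the last parameter is the callback
  const params = Array.from({ length: arity }, (_, i) => `a${i}`);
  const body = `
    return function (${params.join(", ")}) {
      var self = this;
      return new Promise(function (resolve, reject) {
        fn.call(self, ${params.concat("cb").join(", ")});
        function cb(err, value) {
          if (err) reject(err);
          else resolve(value);
        }
      });
    };`;
  return new Function("fn", body)(fn);
}

// Usage: an asynchronous callback-style function.
const addLater = (a, b, cb) => setImmediate(() => cb(null, a + b));
promisify(addLater)(2, 3).then((sum) => console.log(sum)); // logs 5
```

The generated wrapper avoids rest/spread handling on every call because its parameter list is fixed when it is built, which is the general kind of trade-off code generation makes possible.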


Object Hash Set

The compact hash set for Node.js objects

In today's adventure, we'll explore my single favorite piece of software that I've worked on: the Object Hash Set. The Object Hash Set is a Node.js addon, which means it's C++ code that a Node.js program can load and call like a JavaScript module. You put objects into an Object Hash Set with its add method, and its contains method tells you whether a given object has already been added. The Object Hash Set considers two objects equal if all their keys and values are the same. It is designed to be as space-efficient as possible.
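The add/contains contract itself is easy to state in plain JavaScript, even though the real Object Hash Set is a C++ addon built for compactness. The sketch below is only an illustration of the semantics, not of the addon's design: it assumes equality means the same keys and values compared recursively, and it turns each object into a canonical string with sorted keys so that key order doesn't matter. The class name ObjectHashSetSketch and the canonicalize helper are hypothetical. Storing a full string per object is exactly the kind of space overhead the real implementation works to avoid.

```javascript
// Canonical string form: object keys are sorted, so {a:1,b:2} and {b:2,a:1} match.
function canonicalize(value) {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
}

// Hypothetical JS stand-in for the addon's add/contains behavior.
class ObjectHashSetSketch {
  constructor() {
    this.entries = new Set(); // holds canonical strings, one per distinct object
  }

  add(obj) {
    this.entries.add(canonicalize(obj));
  }

  contains(obj) {
    return this.entries.has(canonicalize(obj));
  }
}

const set = new ObjectHashSetSketch();
set.add({ host: "a", status: 200 });
console.log(set.contains({ status: 200, host: "a" })); // true: same keys and values
console.log(set.contains({ host: "a" })); // false: missing a key
```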


How I fixed Node.js again

Yo dawg, I heard you like requires…

As usual, I was browsing the Node.js issues page when I noticed #4467. The project owner jasnell had identified a small script that unexpectedly crashed with the message console.error is not a function. Huh? Last I checked, console.error is totally a function. I had to get to the bottom of this one. It turned out to be quite a rabbit hole, taking me through some of Node.js's most important features. Hold on to your hat!


How I fixed Atom

When good regexes go bad

Atom is the hot new up-and-comer in the world of text editing. It is my editor of choice for building software, and it’s open source, so I decided to check out its issues to see how I could contribute. I came across a strange bug: the Atom user speter had written a line of text that, when you pressed Enter at its end, caused Atom to calculate for half an hour before writing a new line. I was pretty stunned that such a simple and common operation could perform so atrociously, so I decided to jump in and figure out what was going on.


How I fixed libuv

A deep dive into the foundation of Node

It was another lazy, rainy winter afternoon, so once again I was looking to make the world of Node.js a better place. Browsing the issues, I was struck by the bizarreness of #4291. The user anseki had written a simple script to read the next line the user types into the terminal. If you resized the window running the program while it was waiting for input, the program crashed with a segmentation fault.

Segmentation faults can happen in C when a program makes an invalid memory access. So somehow, resizing the window triggered an invalid C memory access from this JavaScript program. Most untoward!


How I fixed Elasticsearch

Putting mappings in their place

After my uproarious success fixing Node.js, I was ready to fix something else. I checked Node.js's open issues, but I didn't see any that I'd be able to jump on. So I turned my attention elsewhere. Another technology I have a fair amount of experience with is Elasticsearch – I wrote the Juttle Elastic Adapter, after all.

I moseyed on over to the Elasticsearch issues list, where I found issue 15381: Config index.mapper.dynamic:false is not honored. I didn’t know what that meant, but the issue had the “adoptme” and “low-hanging fruit” labels, so it seemed like a good place to start contributing to Elasticsearch.
