I like creating software. At any given time I have one idea that I'm working on. I'm not a big fan of building random stuff; I want to solve real problems with software.
Layerstore was probably the coolest thing I built in my spare time. It took me about six months to build the entire platform from nothing. Seeing the uptrend in microservices and Docker, I figured there should be a marketplace for selling software packaged as Docker images. So I built one. Unfortunately, I decided to kill the idea after Docker released the Docker Store. The Docker Store is exactly what I envisioned, but with an existing user base and brand recognition. There's no need to fight a losing fight.
It is a bit disappointing that Layerstore had to be shut down, but I really enjoyed the process; it was a good learning experience, and it confirmed that my intuition is worth something. One thing I regret about all these side projects is that I didn't capture anything while I was building them. I only produced one blog post on the Layerstore architecture after I shut it down, but there could have been so much more.
My boy Gary Vaynerchuk says that the easiest way to produce content is by documenting, not creating.
I have a new idea in mind. I codenamed it Bluebook. I think there's something broken about how we test software systems. Microservices, as much as I love and hate them, are creating new challenges around testing systems as a whole. I want to build a platform for managing and running system and integration tests that is easier to maintain and understand than a custom continuous delivery pipeline composed of scripts. This time I will try to document how the software is created and evolves over time. As with any side gig, my time is limited mostly to weekends, and I'll do my best to stay consistent with releasing updates every couple of weeks.
Stay tuned.
When video games are written, the game engine is usually written in a low-level language such as C or C++ to achieve the best performance. You need direct access to the hardware, and such languages are perfect for that. Building software in these languages requires a good understanding of the OS, memory management, and internals in general. To make game development more accessible, for instance to UI or level designers, we can introduce a scripting language that hooks into the core engine and performs actions as the game state changes. The idea of scripting your core is amazing.
We can see similar uses in other areas. The NGINX web server supports Lua scripting, which lets you hook into the request-processing logic; from there you can filter or log requests based on custom rules. The Varnish HTTP cache has similar functionality that lets you decide how to cache your content. A friend at my old job scripted report-generation logic in a C service, which allowed us to get arbitrary aggregations from the raw data available in memory. There are plenty of use cases for scripting the core of your technology.
In this article I want to take a look at how we can embed the Lua language into Go applications.
As an engineer, I like the idea of microservices. The microservices architecture is the ultimate playground for distributed systems. Despite all the nice things that come with it, I want to argue that startups really don't need microservices. I was lucky enough to see SOA, monoliths, and microservices throughout my professional career. Today, microservices are receiving lots of buzz, and many companies, big and small, are jumping on this overhyped bandwagon. Microservices are a good thing and a great example of what the future might look like; however, if you're just starting out with your tech company, you do not need them.
If you are running Docker as part of your infrastructure, you are probably also hosting a private Docker registry for storing private Docker images. The vanilla installation is pretty good: you just put Docker Distribution in a private VPC and you are good to go. Now imagine a scenario where you want to build a public registry with custom access control to the images, something similar to Docker Hub. How would you do that? The good news is that I built exactly that for Layerstore, and in this article I'm going to show you how you can do it yourself.
Before we get into the nitty-gritty details, let me give you some background on Layerstore. Layerstore was a Docker marketplace where anyone could sell Docker images, either as individual images or as image bundles. The life cycle of a sale looks something like this:
- Seller reserves an image identifier. This identifier will be used to push and pull images from the registry.
- Seller receives read and write permissions to the reserved image identifier.
- Seller uploads the image with the docker push command, configures the product page, and sets the price.
- Purchaser buys the product and receives read access to the image.
- Purchaser downloads the image onto their servers with the docker pull command.
We are going to explore these steps in detail in a moment. Of course, I am going to skip the irrelevant product parts and concentrate mostly on the Docker registry and the services surrounding it.
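At the heart of the flow above sits a permission lookup that a registry token service consults before letting a Docker client act on an image. As a minimal sketch of that idea (the ACL table, the user and image names, and the allowed_actions helper are all hypothetical illustration, not Layerstore's actual code):

```python
# Hypothetical in-memory ACL: image identifier -> user -> granted actions.
# A real token service would keep this in a database and sign the granted
# actions into a registry auth token for the Docker client.
ACL = {
    "acme/reporting": {
        "seller-1": {"pull", "push"},  # seller: read and write access
        "buyer-7": {"pull"},           # purchaser: read-only access
    },
}

def allowed_actions(user, repository, requested):
    """Return the subset of requested registry actions the user may perform."""
    grants = ACL.get(repository, {}).get(user, set())
    return sorted(set(requested) & grants)
```

A client requesting both pull and push on an image it only bought would end up with a token scoped to pull alone.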
(Chart: memory utilization before and after the fix.)
Some time ago I wrote a worker that periodically polls a third-party service for data. We started noticing that the worker process was getting killed by the kernel for reaching its memory limit. The container for the worker was given 512MB, which should be more than enough for the job it was doing. The amount of data it fetches ranges from 25MB to 100MB, and it uses this data to sync some internal state of our systems with the data provided by the third party. I was able to find weird memory-consumption patterns and refactor the code to take memory usage from ~50% to ~13%, and the worker process stopped getting OOM killed. This post is about the tools I used to find memory problems in a Python application.
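As a taste of the kind of tooling involved, Python's standard-library tracemalloc can attribute allocations to the source lines that made them. The build_payload function below is a made-up stand-in for the worker's fetch step, not the actual code:

```python
import tracemalloc

def build_payload():
    # Made-up stand-in for the worker's fetch: a large list of dicts.
    return [{"id": i, "value": str(i) * 10} for i in range(100_000)]

tracemalloc.start()
data = build_payload()

# Which source lines allocated the most memory?
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

# Overall traced usage, including the high-water mark.
current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1e6:.1f}MB peak={peak / 1e6:.1f}MB")
tracemalloc.stop()
```

Comparing the peak against the container limit is usually the quickest way to see whether a transient spike, rather than steady-state usage, is what triggers the OOM killer.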
Docker has gained a lot of popularity in recent years. Thanks to the movement towards microservices, we are able to get Docker infrastructure from all major cloud providers like AWS or Google Cloud.
This is more of a tutorial-style post in which I plan to walk you through bootstrapping a fully operating Docker infrastructure from scratch on AWS using Terraform. Managing infrastructure by hand is terrible. I'm not going to go into details about why, but I believe Terraform is going to be one of those tools that sticks with us for a while, especially once it matures. We're going to use Terraform to build our infrastructure.
Locks are very important in distributed systems. Sometimes we want to make sure that only one job runs at a time while keeping the system highly available (e.g. a highly available cron server). Of course there are many other use cases for distributed locks, which I'm not going to cover. In this post I am going to show you an example of how to implement distributed locks on DynamoDB.
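What makes DynamoDB suitable here is its conditional write: a PutItem guarded by `attribute_not_exists(...)` succeeds for exactly one of any set of concurrent callers. The sketch below simulates that semantic with an in-memory table so the locking logic is visible without AWS; FakeDynamoTable and all the names are illustrative assumptions, not the post's actual code:

```python
import time

class ConditionFailed(Exception):
    pass

class FakeDynamoTable:
    """In-memory stand-in for a DynamoDB table supporting a conditional put."""
    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, item):
        # Mirrors PutItem with ConditionExpression="attribute_not_exists(lock_key)":
        # atomically fails if the item already exists.
        if key in self.items:
            raise ConditionFailed()
        self.items[key] = item

def acquire_lock(table, lock_key, owner, ttl_seconds=30):
    """Try to take the lock; return True on success, False if already held."""
    try:
        table.put_if_absent(lock_key, {
            "owner": owner,
            "expires": time.time() + ttl_seconds,
        })
        return True
    except ConditionFailed:
        # Someone else holds the lock. A fuller implementation would also
        # steal expired locks with a conditional update on the expiry time.
        return False
```

With the real service, the same logic is a boto3 put_item call with that condition expression; only one competing worker's write goes through.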
I had an opportunity to work on an interesting infrastructure challenge. It goes something like this: we need to persist an incoming data stream of approximately 200 thousand messages per second, and we also need to guarantee data availability and redundancy. This is the typical scale of data I used to deal with at Chartbeat on a daily basis. When working with such high traffic, you're likely to run into questions you might not know the answers to right away.
- How many servers do we need to handle such traffic?
- Do we need to store the data, and how can we do that?
- If we must store the data, how long will we need access to it?
- How much is the new infrastructure going to cost us?
These are just a few of the questions you will have to answer in order to pick the right tools for the job. In this post I will try to answer some of them and show you a sample infrastructure setup that can handle large amounts of traffic while meeting our requirements.
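The back-of-envelope arithmetic behind these questions is worth making explicit. Assuming an average message size of 500 bytes and a replication factor of 3 (both assumptions for illustration, not figures from the post):

```python
msgs_per_sec = 200_000
avg_msg_bytes = 500   # assumption: not stated in the post
replication = 3       # assumption: typical redundancy factor

bytes_per_sec = msgs_per_sec * avg_msg_bytes   # 100 MB/s of raw ingest
bytes_per_day = bytes_per_sec * 86_400         # 8.64 TB/day raw
stored_per_day = bytes_per_day * replication   # ~26 TB/day on disk, replicated

print(f"{bytes_per_sec / 1e6:.0f} MB/s, "
      f"{bytes_per_day / 1e12:.2f} TB/day raw, "
      f"{stored_per_day / 1e12:.2f} TB/day with replication")
```

Numbers like these immediately bound the retention question: at tens of terabytes per day, "how long do we keep it" dominates the cost of the whole setup.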
At Chartbeat we are thinking about adding probabilistic counters to our infrastructure, HyperLogLog (HLL) in particular. One of the challenges with something like this is making it redundant while keeping reasonably good performance. Since HyperLogLog is a relatively new approach to cardinality approximation, there are not many off-the-shelf solutions, so why not try to implement HLL on Cassandra?
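For reference, the core of HyperLogLog fits in a few lines: each hashed value selects one of m registers, and the register keeps the maximum "leading-zero rank" seen. This is a simplified sketch (small-range correction only, no further bias correction) and not the Cassandra-backed implementation the post discusses:

```python
import hashlib
import math

P = 10                 # use 2**10 = 1024 registers
M_SIZE = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M_SIZE)  # standard HLL bias constant

def _hash64(value):
    # 64-bit hash derived from SHA-1; any well-mixed 64-bit hash works.
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def hll_add(registers, value):
    h = _hash64(value)
    idx = h & (M_SIZE - 1)                   # low P bits pick the register
    w = h >> P                               # remaining 64 - P bits
    rank = (64 - P) - w.bit_length() + 1     # leading zeros + 1
    registers[idx] = max(registers[idx], rank)

def hll_estimate(registers):
    raw = ALPHA * M_SIZE * M_SIZE / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M_SIZE and zeros:
        # Small-range correction: linear counting on empty registers.
        return M_SIZE * math.log(M_SIZE / zeros)
    return raw
```

The reason HLL maps well onto a store like Cassandra is that merging two sketches is just a per-register max, which is commutative and idempotent, so replicas can be reconciled in any order.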
This is my first post on machine learning, and hopefully not the last. The main goal of these posts is to serve as a quick reference for simple machine learning problems and their solutions, while allowing me to get a better understanding of the field itself. That said, don't take anything for granted.
In one of the fixes I was working on at work, I had to implement row-level locking in Django. The current stable version of Django, 1.3, does not have built-in support for row-level locking on InnoDB tables. The good news is that the development version already has an update to the QuerySet API that lets you use the select_for_update method to acquire a write lock on rows matching your query. If you can use the development version for your project, you may stop reading and go upgrade Django; otherwise I will see you at the bottom of the post.