Pinu planeet

March 07, 2017

Four Years Remaining: The Difficulties of Self-Identification

Ever since the "Prior Confusion" post I was planning to formulate one of its paragraphs as the following abstract puzzle, but somehow it took me 8 years to write it up.

According to fictional statistical studies, the following is known about a fictional chronic disease "statistite":

  1. About 30% of people in the world have statistite.
  2. About 35% of men in the world have it.
  3. In Estonia, 20% of people have statistite.
  4. Out of people younger than 20 years, just 5% have the disease.
  5. A recent study of a random sample of visitors to the Central Hospital demonstrated that 40% of them suffer from statistite.

Mart, a 19-year-old Estonian male medical student, is standing in the foyer of the Central Hospital, reading these facts from an information sheet and wondering: what are his current chances of having statistite? How should he model himself: should he consider himself as primarily "an average man", "a typical Estonian", "just a young person", or "an average visitor to the hospital"? Could he combine the different aspects of his personality to make better use of the available information? How? In general, what would be the best possible probability estimate, given the data?
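One possible way to combine the figures, offered only as a sketch and not as the answer to the puzzle, is a naive Bayes update: treat each group membership as independent evidence given the disease status, and multiply the prior odds by one odds ratio per group. The independence assumption is exactly the kind of modelling choice the puzzle questions, and the function names below are made up for illustration.

```javascript
// Converts a probability into odds.
function odds(p) {
    return p / (1 - p);
}

// Naive Bayes combination (an assumption, not a given of the puzzle):
// each group's rate contributes the factor odds(rate) / odds(baseRate).
function naiveBayesEstimate(baseRate, groupRates) {
    var combined = odds(baseRate);
    groupRates.forEach(function (rate) {
        combined *= odds(rate) / odds(baseRate);
    });
    // Convert the combined odds back into a probability.
    return combined / (1 + combined);
}

// World rate 30%; men 35%, Estonia 20%, under-20s 5%, hospital visitors 40%.
var estimate = naiveBayesEstimate(0.30, [0.35, 0.20, 0.05, 0.40]);
console.log(estimate.toFixed(3));
```

Dropping or adding a group changes the result noticeably, which illustrates how much the estimate depends on how Mart chooses to describe himself.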

March 02, 2017

Ingmar Tammeväli: In defence of forests…

Until now I have had no need to curse and swear; we have thousands of Delfi commenters for that.

But now I had to write a post that has nothing to do with technology. I am not a forestry specialist, but what my eyes see is horrible.
Since most people already see this horror, I figured it was time to speak up as well.

Some kind of war against the Estonian forest has begun: practically wherever you drive there are devastated forest plots where nothing grows any more.
Various X companies have sprung up that go through land registers and forest notifications and pressure owners by phone: sell, sell.

Without any irony, our beautiful forests already look like a botched Brazilian wax on a sheep…

I have some questions:
* why is it allowed to clear-cut large forest tracts without having to plant anything in their place?
My proposal: before felling may even begin, a forestry official, e.g. from the municipality, makes an assessment, and if felling is done, a deposit of 35% of the forest's value is charged.
In plain language: plant a new forest in its place (within 6 months) and you get the 35% back; don't plant, and you lose the money.

* forwarders wreck the village roads, and so do timber trucks. The requirement to restore them, within 6-7 months I think, is a joke: most forestry companies don't do it and the officials are rather toothless. The police can't be bothered to deal with them; in plain language … they lack the resources.

* why was the age limit at which spruce stands may be felled lowered?

So the whole point of this text: dear politicians, if you have any respect at all for the values of Estonia, stop this mafia-style forest management. It is not management, it is clear-cutting!

OECD: Only one developed industrial country cuts its forests more intensively than Estonia

February 23, 2017

Kuido tehnokajam: Closing a window with a screen overlay by pressing the ESC key

Surprisingly, I could not find a simple solution for this; you have to fiddle with JavaScript. Visually it looks like this: you click somewhere and a window with a screen overlay opens. We use a ModalPopupExtender, which shows the contents of a UserControl on top:
<asp:Label runat="server" ID="HForModal" style="display: none" />
<asp:Panel runat="server" ID="P1" ScrollBars="Auto" Wrap="true"  Width="80%" CssClass="modalPopup">
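A minimal sketch of the JavaScript side, under stated assumptions: the ModalPopupExtender is given a BehaviorID (the name 'MPE1' here is hypothetical), and the popup is closed through the ASP.NET AJAX `$find` lookup and the behavior's `hide()` method. The handler factory `makeEscapeHandler` is an illustrative name, not part of any library.

```javascript
// Returns a keydown handler that calls hidePopup() when ESC is pressed.
function makeEscapeHandler(hidePopup) {
    return function (event) {
        // Modern browsers expose event.key; older ones only keyCode 27.
        var isEscape = event.key === 'Escape' || event.key === 'Esc' ||
            event.keyCode === 27;
        if (isEscape) {
            hidePopup();
        }
        return isEscape;
    };
}

// Browser wiring; skipped when no document is available.
if (typeof document !== 'undefined') {
    document.addEventListener('keydown', makeEscapeHandler(function () {
        // 'MPE1' is a hypothetical BehaviorID set on the ModalPopupExtender.
        var popup = typeof $find === 'function' ? $find('MPE1') : null;
        if (popup) {
            popup.hide();
        }
    }));
}
```

Keeping the key check separate from the popup lookup makes the handler easy to try out without a page.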

Raivo Laanemets: Chrome 56 on Slackware 14.1

Chrome 56 on Slackware 14.1 requires the upgraded mozilla-nss package. Without the upgraded package you get errors on some HTTPS pages, including on itself:

Your connection is not private.

with a detailed error code below:


The error comes from a bug in the NSS package. This is explained here in more detail. Slackware maintainers have released upgrades to the package. Upgrading the package and restarting Chrome fixes the error.

February 18, 2017

Anton Arhipov: Java EE meets Kotlin

Here's an idea - what if one tries implementing a Java EE application in the Kotlin programming language? I thought a simple example, a servlet with an injected CDI bean, would be sufficient for a start.

Start with a build script:

<script src=""></script>

And the project structure is as follows:

Here comes the servlet:

<script src=""></script>

What's cool about it?

First, it is Kotlin, and it works with the Java EE APIs - that is nice! Second, I kind of like the ability to set aliases for the imported classes: import javax.servlet.annotation.WebServlet as web, in the example.

What's ugly about it?

Safe calls everywhere. As we're working with Java APIs, we're forced to use safe calls in Kotlin code. This is ugly.

Next, in Kotlin, the field has to be initialized. So initializing the 'service' field with the null reference creates a "nullable" type. This also forces us to use either the safe call, or the !! operator later in the code. The attempt to "fix" this by using the constructor parameter instead of the field failed for me, the CDI container could not satisfy the dependency on startup.

Alternatively, we could initialize the field with the instance of HelloService. Then, the container would re-initialize the field with the real CDI proxy and the safe call would not be required.


It is probably too early to say anything for sure, as the demo application is so small. One would definitely need to write much more code to uncover the corner cases. However, some of the outcomes are quite obvious:

  • Using Kotlin in a Java web application appears to be quite seamless.
  • The use of Java APIs creates the need for safe calls in Kotlin, which doesn't look very nice.

February 06, 2017

TransferWise Tech Blog: When to Adopt the Next Cool Technology?

What should be the criteria for an organization to decide when it is a good time to update its toolbox?

Recently there has been a lot of discussion about the fatigue around JavaScript and frontend tools in general. Although it seems to be more painful on the frontend, the problem is neither specific to the frontend nor anything new or recent. There are two sides to this. One is the effect it has on one's personal development. The other is how it affects organizations. More specifically, how should an organization decide when it is a good time to bring in new tool/framework/language X?

When we recently discussed this topic my colleague Jordan Valdma came up with the following formula to decide when adoption makes sense:

new features + developer coolness > cost of adoption

Cost of Adoption

Introducing anything new means a loss of efficiency until you have mastered it well enough. Following the model of Shu-Ha-Ri (follow-detach-fluent), it may be relatively easy to get to the first level, "following". However, it is only when moving to the next levels that one starts cashing in more of the potential value. That means looking beyond the specific feature set of the tool, searching for ways to decouple oneself from it and employ it for something more fundamental. One of my favorite examples is using hexagonal architecture with Ruby on Rails.

New Features

By new features I mean the things that are actually valuable for your product. Any new thing has many aspects that are hard to measure and quite subjective. These should not go here. For example, "allows writing code that is more maintainable". This is very hard to prove and seems more like something that one may choose to believe or not. However, there are also things like "supports server-side rendering". If we know our product could take advantage of this, then this is a good objective reason for adoption.

Developer Coolness

I think when it comes to new/cool technologies it is always good to be pragmatic. In an organization that is heavily business/outcome oriented it may seem that there should be no room for non-rational arguments like how someone feels about some new language/library.

However, it is quite dangerous to completely ignore the attractiveness aspect of technology. There are two points to keep in mind. First, all good devs like to expand their skill set. Second, technologies that have a certain coolness about them tend to build stronger communities around them and hence have the potential to grow even more compelling features.

February 01, 2017

TransferWise Tech Blog: Building TransferWise or the road to a product engineer

My five-year anniversary at TransferWise is coming up soon. I looked back and wrote down what came to mind.

I was hired as an engineer. I thought I was hired to write code, and that is what I started doing. Simple duty. Take a task from a ticketing system, implement it and move on to the next one. Easy. One of my first tickets was the following: "add a checkbox to the page with a certain functionality". Easy. I did that, and then Kristo asked for a call and asked me a very simple question: "Why have you done it?". I tried to say something in reply, but other questions followed... You know how I felt? I felt miserable, confused and disoriented. I remember I said clearly, "I feel very stupid." Kristo replied, "It is fine." Then we had a long chat, and I spent the next couple of weeks on that task. I talked to people, trying to understand why that checkbox was needed and what it meant after all. I designed a new layout for the page. I implemented the solution. After that I kept coding. I still believed that it was my duty and what I was hired for. But guess what? Kristo kept asking questions. Slowly but steadily it dawned on me that it is not the coding that I am supposed to be doing. I found myself doing a variety of activities. Talking to customers and analysing their behavior. Supporting new joiners and building the team. Designing pages. Building a vision. Many other things and, of course, writing code.

At some point I understood: this had stopped being easy. It had become very hard and challenging. All sorts of questions were floating through my head, including the following. "Why was I hired at all?". "What should I be doing?". "Am I valuable?". "What is my value?". "What was my impact lately?" An example from my own life helped me to clear this up. I have a piece of land and I set out to build a house. I researched the topic. I earned the money needed to fund it. I chose an architectural plan. I found workers. I organised the delivery of building materials. If I am asked about it, I will clearly say: "I am building a house". I also realised something. What if the workers I found are asked as well? Their reply will be exactly the same: "I am building a house". This fact amazed me. Our activities are quite different, but all together we are building that house.

This analogy helped me massively. I came to a simple conclusion. I am here to build and grow TransferWise. Building TransferWise is what is expected of me. Building TransferWise means a variety of different activities. It may be putting bricks together to create a wall. It may be designing the interior and exterior. It may be organising the delivery of materials. It may be talking to others who have built houses and are living in them. It may be finding and hiring builders. It might be visiting builders in hospital when they get sick.

It also helped me to understand why I am doing it after all. With my own house it is easy, because it is me who will be living there :) Apparently all the other houses in the world are built for someone to live in as well. I can’t imagine builders going for: “Let’s start building walls and then we will figure out how many floors we can get to and see if anyone happens to live in that construction.” It always starts from considering people, their needs and their wishes. In the case of TransferWise, it starts from thinking of the customers who will be using it.

That said, I was foolish when I was evaluating myself by the engineering tasks I had finished. I was foolish to think that what I was used to doing was what I should be doing. Nowadays my aim is to make things happen. My aim is to figure out what needs to be done and do it. My measure of myself is not the lines of code or the number of meetings I've had. It is not about the number of bricks I’ve placed. My goal is to have people living in the houses I’ve built. My goal is to see them living a happy life there. My goal is to see happy TransferWise customers.

Eventually my title changed from an engineer to a product engineer and then to a product manager. I am not skilled at my job and constantly make mistakes. But I try and keep trying. My life has become easy again. I found a better way to be an engineer.

January 22, 2017

Anton Arhipov: Twitterfeed #4

Welcome to the fourth issue of my Twitterfeed. I'm still quite irregular on posting the links. But here are some interesting articles that I think are worth sharing.

News, announcements and releases

Atlassian acquired Trello. OMG! I mean... happy for the Trello founders. I just hope that the product will remain as good as it was.

Docker 1.13 was released. Using compose-files to deploy swarm mode services is really cool! The new monitoring and build improvements are handy. Also Docker is now AWS and Azure-ready, which is awesome!

Kotlin 1.1 beta was published with a number of interesting new features. I have mixed feelings, however. For instance, I really find type aliases an awesome feature, but the definition keyword, "typealias", feels too verbose. Just "alias" would have been much nicer.
Meanwhile, Kotlin support was announced for Spring 5. I think this is great - Kotlin support in the major frameworks will definitely help the adoption.

Is there anyone using Eclipse? [trollface] Buildship 2.0 for Eclipse is available, go grab it! :)

Resonating articles

RethinkDB: Why we failed. Probably the best post-mortem that I have ever read. You will notice a strange kvetch at first about the tough market and how no one wants to pay. But reading on, the author honestly lists what was really wrong. Sad that it didn't take off; it was a great project.

The Dark Path - probably the most controversial blog post I've read recently. Robert Martin has his say on Swift and Kotlin. A lot of people, the proponents of strong typing, reacted to this blog post immediately. "Types are tests!", they said. However, I felt like Uncle Bob just wrote this article to repeat his point about tests: "it doesn't matter if your programming language is strongly typed or not, you should write tests". No one would disagree with this statement, I believe. However, the follow-up article was just strange: "I consider the static typing of Swift and Kotlin to have swung too far in the statically type-checked direction." OMG, really!? Did Robert see Scala or Haskell? Or Idris? IMO, Swift and Kotlin hit the sweet spot with a type system that actually _helps_ the developers without getting in the way. Quite a disappointing read, I have to say.

Java 9

JDK 9 is feature complete. That is great news. Now it would be nice to see how the ecosystem will survive all the issues related to reflective access. Workarounds exist, but there should be a proper solution without such hacks. Jigsaw caused a lot of concerns here and there, but the bet is that in the long run the benefits will outweigh the inconveniences.


The JVM is not that heavy
15 tricks for every web dev
Synchronized decorators
Code review as a gateway
How to build a minimal JVM container with Docker and Alpine Linux
Lagom, the monolith killer
Reactive Streams and the weird case of backpressure
Closures don’t mean mutability.
How do I keep my git fork up to date?

Predictions for 2017

Since it is the beginning of 2017, it is trendy to make predictions for the trends of the upcoming year. Here are some predictions by the industry thought leaders:

Adam Bien’s 2017 predictions
Simon Ritter’s 2017 predictions
Ted Neward’s 2017 predictions

January 04, 2017

TransferWise Tech Blog: Effective Reuse on Frontend

In my previous post I discussed the cost of reuse and some strategies for dealing with it on the backend. What about the frontend? In terms of reuse, the two are very similar. When we have more than just a few teams regularly contributing to the frontend, we need to start thinking about how we approach reuse across different contexts/teams.

Exposing some API of our microservice to other teams makes it a published interface. Once this is done we cannot change it that easily anymore. Same happens on frontend when a team decides to "publish" some frontend component to be reused by other teams. The API (as well as the look) of this component becomes part of the contract exposed to the outside world.

Hence I believe that:

We should split web frontend into smaller pieces — microapps — much the same way as we split backend into microservices. Development and deployment of these microapps should be as independent of each other as possible.

This aligns quite well with the ideas of Martin Fowler, James Lewis and Udi Dahan, who suggest that "microservice" is not a backend-only concept. Instead of by process boundaries, it should be defined by business capabilities and include its own UI if necessary.

Similarly to microservices we want to promote reuse within each microapp while we want to be careful with reuse across different microapps/teams.

January 02, 2017

Raivo Laanemets: Now, 2017-01, summary of 2016 and plans for 2017

This is an update on things related to this blog and my work.

Last month


  • Added a UX improvement: external links have target="_blank" to make them open in a new tab. The justification can be found in this article. It is implemented using a small piece of script in the footer.
  • Updated the list of projects to include work done in 2016.
  • Updated the visual style for better readability. The article page puts more focus on the content and less on the related things.
  • Updated the CV.
  • Found and fixed some non-valid HTML markup on some pages.
  • Wrote announcements for the latest of my Open Source projects: DOM-EEE and Dataline.
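The footer script for external links mentioned in the list above could look roughly like the following. This is a sketch and not the blog's actual code: the helper name `isExternalLink` is made up, and setting `rel="noopener"` is an extra precaution added here rather than something the post mentions.

```javascript
// Returns true when href points to a different host than pageHost.
function isExternalLink(href, pageHost) {
    try {
        // Relative links resolve against the page's own host.
        var url = new URL(href, 'http://' + pageHost + '/');
        return /^https?:$/.test(url.protocol) && url.host !== pageHost;
    } catch (e) {
        return false; // unparsable href: leave the link alone
    }
}

// Browser wiring; skipped when no document is available.
if (typeof document !== 'undefined') {
    var links = document.querySelectorAll('a[href]');
    for (var i = 0; i < links.length; i++) {
        if (isExternalLink(links[i].getAttribute('href'), location.host)) {
            links[i].setAttribute('target', '_blank');
            links[i].setAttribute('rel', 'noopener');
        }
    }
}
```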

I also discovered that mail notifications were not working. The configuration was broken for some time and I had disabled alerts on the blog engine standard error stream. I have fixed the mail configuration and monitor the error log for mail sending errors.


I built an Electron-based desktop app. I usually do not build desktop applications and consider them a huge pain to build and maintain. This was a small project taking 2 weeks and I also used it as a chance to evaluate the Vue.js framework. Vue.js works very well with Electron and was very easy to pick up thanks to the similarities with the KnockoutJS library. I plan to write about both in separate articles.

The second part of my work included a DXF file exporter. DXF is a vector drawing format used by AutoCAD and industrial machines. My job was to convert and combine SVG paths from an online CAD editor into a single DXF file for a laser cutter.

While filing my annual report I was positively surprised by how little paperwork I needed to file. It only required a balance sheet + a profit/loss statement + 3 small additional trivial reports. In previous years I had to file the much more comprehensive report that is now required from mid-size (Estonian scale) companies with about 250 employees.


I have made some changes to my setup:

  • Logging and monitoring was moved to an OVH VPS.
  • Everything else important is moved away from the home server. Some client systems are still waiting to be moved.

The changes were necessary as I might travel a bit in 2017 and it won't be possible to fix my own server at home when a hardware failure occurs. I admit it was one of the stupidest decisions to run my own server hardware.

Besides these changes:

  • now redirects to I am not maintaining a separate company homepage anymore. This gives me more free time for the important things.
  • Rolled out SSL to every site/app of mine where I enter passwords. All the new certs are from Let's Encrypt and are renewed automatically.
  • I am now monitoring my top priority web servers through UptimeRobot.
  • The blog frontend is monitored by Sentry.

Other things

The apartment building's full-scale renovation was finally accepted by the other owners and the contract has been signed with the building company. Construction starts ASAP. I have been looking for a quiet office space to rent, as the construction noise will likely make work in the home office impossible.

Yearly summary and plans

2016 was an incredibly busy and frustrating year for me. A project at the beginning of the year was left partially unpaid after it turned out to be financially unsuccessful for the client. The project did not have a solid contract and legal action against the client would have been very difficult. This put me in a tight situation where I took on more work than I could handle to compensate for my financial situation. As the work accumulated:

  • I was not able to keep up with some projects. Deadlines slipped.
  • I was not able to accept better and better-paying work due to the existing work.
  • Increasing workload caused health issues: arm pains, insomnia.

At the end of the year I had to drop some projects as there were no other ways to decrease the workload. The last 2 weeks were finally pretty OK.

In 2017 I want to avoid such situations. Financially I'm already in a much better position. I will be requiring a bit stricter contracts from my clients and select projects more carefully.

Considering technology, I do not see 2017 bringing many changes. My preferred development platforms are still JavaScript (browsers, Node.js, Electron, PhantomJS) and SWI-Prolog.

December 28, 2016

Anton Arhipov: Twitterfeed #3

Welcome to the third issue of my Twitterfeed. Over two weeks since the last post I've accumulated a good share of links to the news and blog posts, so it is a good time to "flush the buffer".

Let's start with something more fundamental than just the news about frameworks and programming languages. "A tale of four memory caches" is a nice explanation of how browser caching works. Awesome read, nice visuals, useful takeaways. Go read it!

Machine Learning seems to be becoming more and more popular. So here's a nicely structured knowledge base at your convenience: "Top-down learning path: Machine Learning for Software Engineers".

Next, let's see what's new about all the reactive buzz. The trend is highly popular so I've collected a few links to the blog posts about RxJava and related.

First, "RxJava for easy concurrency and backpressure" is my own writeup about the beauty of the RxJava for a complex problem like backpressure combined with concurrent task scheduling.

Dávid Karnok published benchmark results for the different reactive libraries.

"Refactoring to Reactive - Anatomy of a JDBC migration" explains how a reactive approach can be introduced incrementally into legacy applications.

The reactive approach is also suitable for the Internet of Things area. So here's an article about Vert.x being used in the IoT world.

IoT is actually not only about the devices but also about the cloud. Arun Gupta published a nice write up about using the AWS IoT Button with AWS Lambda and Couchbase. Looks pretty cool!

Now onto the news related to my favourite programming tool, IntelliJ IDEA!

IntelliJ IDEA 2017.1 EAP has started! Nice, but I'm not amused. Who needs those emojis anyway?! I hope IDEA developers will find something more useful in the bug tracker to fix and improve.

Andrey Cheptsov experiments with code folding in IntelliJ IDEA. The Advanced Expressions Folding plugin is available for download - give it a try!

Claus Ibsen announced that the work has started on Apache Camel IntelliJ plugin.

Since we are at the news about IntelliJ IDEA, I think it makes sense to see what's up with Kotlin as well. Kotlin 1.0.6 has been released, which is the new bugfix and tooling update. Seems like Kotlin is getting more popularity and people try to use it in conjunction with popular frameworks like Spring Boot and Vaadin.

Looks like too many links already so I'll stop here. I should start posting those more often :)

December 22, 2016

Raivo Laanemets: Announcement: Dataline chart library

Some time ago I built a small library to draw some line charts using the HTML5 canvas. I have been using it in some projects requiring simple responsive line charts. It can do this:

  • Draws min/max/zero line.
  • Draws min/max labels.
  • Single line.
  • Width-responsive.

Multiple lines, ticks, x-axis labels etc. are not supported. There are other libraries that support all of these. It has no dependencies but requires ES5, canvas and requestAnimationFrame support. The library is extremely lightweight and uses very few resources.
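To illustrate what drawing a single responsive line with min/max/zero guides involves, here is a rough sketch of the coordinate mapping. It is not Dataline's actual code: the function names are hypothetical, and including 0 in the value range (so that the zero line is always visible) is an assumption made for this example.

```javascript
// Parses a comma-separated value string such as "1,2,3,-1" into numbers.
function parseValues(attr) {
    return attr.split(',').map(function (s) { return parseFloat(s); });
}

// Maps values to pixel coordinates inside a width x height canvas.
// Canvas y grows downward, so the maximum value maps to y = 0.
function layout(values, width, height) {
    var min = Math.min.apply(null, values.concat(0));
    var max = Math.max.apply(null, values.concat(0));
    var range = max - min || 1; // avoid division by zero for flat data
    var stepX = values.length > 1 ? width / (values.length - 1) : 0;
    function toY(v) {
        return height - ((v - min) / range) * height;
    }
    return {
        points: values.map(function (v, i) {
            return { x: i * stepX, y: toY(v) };
        }),
        zeroY: toY(0), // where the zero line goes
        minY: toY(min), // where the min line and label go
        maxY: toY(max)  // where the max line and label go
    };
}
```

Recomputing the layout with the canvas's current width on resize is what makes such a chart width-responsive.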


This is the HTML code containing the canvas and input data. The data is embedded directly by using the data-values attribute:

<canvas class="chart" id="chart" data-values="1,2,3,-1,-3,0,1,2"></canvas>
<script src="dataline.js"></script>
<script>Dataline.draw('chart');</script>

And the CSS code to set the chart size:

.chart { width: 100%; height: 200px; }

Live rendering output:

<canvas class="chart" data-values="1,2,3,-1,-3,0,1,2" id="chart" style="width: 100%; height: 200px;"></canvas> <script src=""></script> <script>Dataline.draw('chart');</script>

The source code of the library, documentation, and the installation instructions can be found in the project repository.

December 21, 2016

Kuido tehnokajam: Removing the checkbox when grouping a CheckBoxList

With the ASP.NET CheckBoxList component you sometimes need to group the list, for various reasons. The whole trick relies on CSS3. Since this may cause confusion elsewhere in the project, it makes sense to use "inline CSS", for example inside the relevant UserControl:
<style>
    #CheckBoxListOtsinguMajad input:disabled {
        display: none;
    }
</style>
We limit the rule's effect to CheckBoxListOtsinguMajad

December 17, 2016

Raivo Laanemets: Announcement: DOM-EEE

DOM-EEE is a library to extract structured JSON data from DOM trees. The EEE part in the name means Extraction Expression Evaluator. The library takes a specification in the form of a JSON document containing CSS selectors and extracts data from the page DOM tree. The output is also a JSON document.

I started developing the library while dealing with many web scraping projects. There have been huge differences in navigation logic, page fetch strategies, automatic proxying, and runtimes (Node.js, PhantomJS, browser userscripts), but the data extraction code has been similar. I tried to cover these similarities in this library while making it work in the following environments:

  • Browsers (including userscripts)
  • PhantomJS
  • Cheerio (Node.js)
  • jsdom (Node.js)
  • ES5 and ES6 runtimes

The library is a single file that is easy to inject into any of these environments. As the extraction expressions are kept in the JSON format, and the output is a JSON document, any programming platform supporting JSON and HTTP can be coupled to PhantomJS, a headless web browser with a built-in server, to drive the scraping process.

Example usage

This example uses cheerio, a jQuery implementation for Node.js:

var cheerio = require('cheerio');
var eee = require('eee');
var html = '<ul><li>item1</li><li>item2 <span>with span</span></li></ul>';
var $ = cheerio.load(html);
// Extract the text of every <li> that contains a <span>.
var result = eee($.root(),
    {
        items: {
            selector: 'li',
            type: 'collection',
            extract: { text: { selector: ':self' } },
            filter: { exists: 'span' }
        }
    },
    { env: 'cheerio', cheerio: $ });
console.log(result);

This code will print:

{ items: [ { text: 'item2 with span' } ] }


There are a number of similar projects. Most of them assume a specific runtime environment or try to do too much to be portable. Some examples:

  • artoo.js (client side).
  • noodle (Node.js, not portable enough).
  • x-ray (not portable, coupled with HTTP and pagination and 100 other things).


Full documentation of the JSON-based expression language and further examples can be found in the project's code repository.

December 09, 2016

Raivo Laanemets: Hello world from DXF

Last week I worked on code to convert SVG to DXF. SVG (Scalable Vector Graphics) is a vector graphics format supported by most browsers. DXF (Drawing Exchange Format) is another vector format, mostly used by CAD applications. Our CAD editor at Scale Laser, a startup focusing on model railroad builders, uses an SVG-based drawing editor in the browser but the software controlling the actual cutting hardware uses DXF. There were no usable generic SVG to DXF converters available and we had to write our own. We only deal with SVG <path> elements and do not have to support other SVG elements.

DXF is fairly well specified through a 270-page PDF file here. The low-level data serialization format feels ancient compared to more structured XML and JSON. Also, it is quite hard to put together a minimal DXF file which can be opened by most programs claiming DXF compatibility or by AutoCAD itself. AutoCAD is the original program to use the DXF format.

I have put together a minimal file by trial-and-error. I kept adding stuff until I got the file loading in AutoCAD. The file follows and I explain the parts of it.


A DXF file consists of sections. The most important section is ENTITIES that contains graphical objects. Another important section is HEADER:


All sections are made up of group code-value pairs. Such a pair is formatted like:


The group code specifies either the type of the value (string, float, etc) or its semantic meaning (X coordinate) or both the type of the value and the meaning. A section begins with the SECTION keyword and section's name. A section ends with the ENDSEC keyword.

I found it was necessary to specify the file/AutoCAD version in the header. Without it, some tools, including AutoCAD, would give errors upon opening the file. This is accomplished by two code-value pairs:


This corresponds to the versions R11 and R12.


After the header comes the actual content section ENTITIES. It contains a rectangle made up of 4 lines (snippet truncated to show a single line only):


Graphical objects are specified one after another, without any further structure. A line starts with the LINE keyword and ends with the start of another object or with the section end. The line object here has the following properties.

The layer index (group code 8, value 0). I was not able to make the file display on most viewers without it:


The line color (group code 62, value 8 - gray). Nothing was visible in some viewers without setting it:


After that come the start and end coordinates of the line (X1, Y1, X2, Y2 as 10, 20, 11, 21 respectively):


DXF coordinates have no units such as pixel, mm etc. Interpretation of units seems to be implicit and application-specific. For example, our laser software assumes mm as the unit.
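The structure described above can be put together in a short generator. This is a sketch under assumptions, not the exact file from the post: the function names and the rectangle size are made up, each pair is written as the group code on one line and the value on the next without any padding, and the version string AC1009 is the $ACADVER value for R11/R12.

```javascript
// Serializes one group code-value pair: code on one line, value on the next.
function pair(code, value) {
    return code + '\n' + value + '\n';
}

// Wraps a body in SECTION ... ENDSEC with the given section name.
function section(name, body) {
    return pair(0, 'SECTION') + pair(2, name) + body + pair(0, 'ENDSEC');
}

// One LINE entity on layer 0 (code 8) with color 8, gray (code 62),
// followed by X1, Y1, X2, Y2 as group codes 10, 20, 11, 21.
function line(x1, y1, x2, y2) {
    return pair(0, 'LINE') + pair(8, 0) + pair(62, 8) +
        pair(10, x1) + pair(20, y1) + pair(11, x2) + pair(21, y2);
}

// A rectangle made up of four LINE entities, preceded by a HEADER section
// that declares the file version.
function rectangleDxf(x, y, w, h) {
    var header = section('HEADER', pair(9, '$ACADVER') + pair(1, 'AC1009'));
    var entities = section('ENTITIES',
        line(x, y, x + w, y) +
        line(x + w, y, x + w, y + h) +
        line(x + w, y + h, x, y + h) +
        line(x, y + h, x, y));
    return header + entities + pair(0, 'EOF');
}

console.log(rectangleDxf(0, 0, 100, 50));
```

Whether a particular viewer accepts output in exactly this form would need to be checked by opening the file, as described in the post.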

Rendering output

This is the rendering output in de-caff, a simple Java-based DXF viewer:

Minimal DXF rectangle in de-caff

This is the rendering output in AutoCAD 2017:

Minimal DXF rectangle in AutoCAD

The full file containing the rectangle and the header section can be downloaded from here.

December 06, 2016

TransferWise Tech Blog: Product Engineering Principles @ TransferWise


At TransferWise, we have seen phenomenal growth in the last few years - growth in users, transaction volumes, and team. Our engineering team has grown from 50 to 120 in the last 18 months. With this kind of growth, it’s easy for the company culture to evolve and degrade very quickly unless we reiterate and stay mindful of the key principles and values we operate on.

I am often asked by engineers at conferences, potential hires, startup founders and others in the industry what are the key principles we organize around while building the TransferWise product. I thought I'd pen down some thoughts on this.

Before we hit the main principles we follow, here’s a quick primer on how our teams are organized. As of today, TransferWise has about 25 teams, each of which is autonomous and independent and focuses on customer-centric KPIs which eventually drive our growth. A mouthful, but let’s break this down.

Teams are autonomous: Teams own their decisions. They decide how to evolve the product by talking to customers and by looking at data. The teams seek input and are challenged by others in the company on their decisions but eventually they are the final decision makers.

Teams are independent: Teams own their destiny. We try to keep cross team dependencies to a minimum. While there are some cases where a team may need help from another team to get something done, we try and avoid this as much as possible.

Teams focus on customer-centric KPIs: Teams solve customer problems. They are organized around a specific customer (team organized around a specific region, say the US) or a specific problem all customers face (making payments instant across the world). Given teams are autonomous and independent, they can pick what they want to work on but everything they work on has to drive a customer metric. Our product moves money around the world. Our customers tell us constantly that they care about their money moving super fast, conveniently, for cheap. So everything a team does looks to optimize these factors. The team should be able to stand up in front of the entire company and explain how their work impacts the customer and which metric it moves.

Now that we’ve got the team setup out of the way, let’s talk about how Product Engineering works at TransferWise. Here are the key principles we follow.

Hire smart people and empower them

Our product is a function of the people who build it. That means how we hire and who we hire has a massive impact on what ends up becoming our live product. For product engineering, our hiring process includes the following steps:

  • Take home test
  • Technical interview with 2 engineers
  • Optional follow-up technical interview
  • Product interview with a product manager and an engineer
  • Final interview with our VP of engineering or co-founder

While this interview loop may seem familiar, most candidates comment about the product interview being a unique experience. In the product interview, we focus on your ability to put yourself in the shoes of a customer, understand what customer problem you are solving, how you would build something to get validation on an idea and then iterate on it to deliver a stellar customer experience. Read Mihkel’s post on demystifying product interviews to get more details on what we cover and why.

Once hired, product engineers are empowered to move mountains. Engineers choose which problem to solve, why, what the customer impact will be and the prioritization of their tasks. Of course, this should be in line with team goals and not solely based on individual goals.

Weak code ownership

As mentioned above, we believe in teams being independent. A big part of this is that teams don’t have dependencies on other teams. But how does this work in a large organization with an ever evolving and growing product?

Let’s take an example. As our product expands across the world, every country has different rules on what data we are required to verify on our customers. Let’s say that, as we launch in Australia, there is a new regulatory requirement to check some additional data on Australian customers. This requires Team Australia - an autonomous and independent team focused on the Australian customers - to make a change to our verification engine. But the verification engine is owned by the Verification team. In a lot of organizations, Team Australia would request the Verification team to pick up this change on their roadmap. But the Verification team also has a lot of such requests from other teams. They also have their own projects to improve the core verification engine to support all our different regions. So what usually ends up happening in other organizations is that Team Australia can’t move as fast as they desire, as they are dependent on the Verification team and their priorities.

This is why we follow the weak code ownership principle. In this setup, every part of the code is owned by a team, but a team is allowed to change any part of another team's code. This sounds like chaos, but there are some basic enforcement rules around it. The owning team sets the rules that other teams have to follow to play in their codebase.

In the above example, instead of the Verification team making the requested change, Team Australia is empowered to make the change in the verification codebase. But they have to follow the rules set by the Verification team to commit to their code base. These rules are up to the owning team to decide on. They could be something like below:

  • Before taking on any major change, the team making the change must go through a design discussion on said changes with the owning team.
  • The team making the change has to follow certain design patterns
  • No code will be accepted without adequate test coverage
  • All changes have to go through peer-reviewed pull requests

This setup keeps product engineering teams independent: it removes dependencies on other teams and lets each team iterate at the pace it desires.

We compare this setup to an internal open source project. Owners define the rules to play with and own the codebase and others can commit and make changes as long as they follow the rules. As an additional benefit of this setup, owning teams are incentivized to make their code more modular and add relevant tests so that another team cannot easily break things. This leads to code readability and higher quality.

Product engineers focus on customer impact

In a lot of companies engineers never talk to real customers. Some engineers we talk to during our interview process don’t really know who they are building the product for.

Information flow in a lot of companies from customer to engineer:
[Image: information flow from customer to engineer]

Information is lost with every person introduced along the way.

At TransferWise, being a product engineer means you get close to customers. You talk directly to customers, look at data to make decisions and understand how the features you build are evolving and being used by different customers in production. We use Looker and Mixpanel as our analytics engine and this is available to everyone in the company. Anyone can run queries and slice and dice the data the way they desire.

Product engineers also take customer calls, chats and respond directly to customer emails. Here’s an example of a challenge our co-founder Kristo set out to inspire engineers to take more calls and get closer to our customers.

[Image: Kristo's challenge to engineers]

The resulting picture speaks for itself. :-)

[Image: the result of the challenge]

No one else can build your product but you

Given how involved engineers are in analyzing the data, talking to customers, understanding the reason to make a change, and how fast our iteration cycles are, we believe that we cannot just write down our specifications and have someone outside our company build the product. We don’t do offshoring, outsourcing or use consultants to build our product. This doesn’t mean we don’t have independent workers (i.e. non-salaried employees who work at TransferWise engineering). We do. Some of them have been with us for a long time and are critical contributors. But they are embedded within our teams and operate the same way any other employee does. They get close to our customers and take part in all decisions.

Some rules, more common-sense

We have a few rules that are standard across the entire product engineering organization. We believe teams should be able to pick the tools to get the job done within certain limits (more below on limits). All our teams run sprints but it’s up to them to define their sprint cadence. It so happens that most teams run a one-week sprint, but now we are seeing some teams looking to move to a two-week sprint as their projects get more complex. Similarly, some teams follow scrum by the book, while some do kanban and others run their own variation on scrum.

That said, we have a few common sense rules:

  • Product engineers own their code in production. This means managing your own releases, monitoring your code in production, getting alerts when something goes wrong and being present if your system is having an issue. We believe this incentivizes the right behavior. When you know you will be alerted at 2AM when something goes wrong in production, the quality of code that gets shipped tends to be better.
  • We have weekly technical exchange sessions called “TeX”. It’s a forum where product engineers share knowledge on various technical topics. These can range from design discussions and changes made to a specific part of our system to new technologies we should be investigating.
  • We are a JVM shop. We are open to other languages. We have some PHP and Node running around, but our main stack has always been the JVM, with our monolith application written in Groovy on Grails and our microservices written in Java 8 on Spring Boot. We believe language wars are good conversations over beers but try to avoid them at work and get on with building product.
  • If you want to introduce a new language or that shiny new technology to our system, it’s simple! Do a TeX and invite your fellow engineers. Explain to them the specific benefits of introducing this technology. Do an honest pro and con analysis and explain why it’s worth it for the rest of the engineers to go through the learning curve to pick this technology up. This is crucial! As we have weak code ownership, people need to be able to make changes to parts of the system they don’t own. So new technologies not only impact the specific team owning the service but also impact other engineering teams.

Honest blameless postmortems

This one is probably our favorite principle. Everyone makes mistakes and when you move fast, things break. The key is how we recover from these mistakes, what we learn and how we prevent them in the future.

In most companies, an individual isn’t really incentivized to ship fast and learn with the fear of breaking things. One is rarely rewarded for taking massive risks to improve something tenfold as the risk of breaking something is much higher. People tend to get dinged on their annual reviews when they break something leading to a production outage. So what ends up happening is people triple check their work and put in more and more safeguards for that one possible failure that can happen.

We want to build a culture where we aren’t afraid to fail, but are accountable for our failures and make the cost of a failure smaller and smaller.

One major tool we use to reflect, learn and be accountable for our failures is public honest blameless postmortems.

Vincent wrote a great post on what makes a good postmortem. The intent of the postmortem is to go deep into why something happened, how many customers were impacted, what we learned and what measures we put in place to make sure this doesn’t happen again. People challenge postmortems publicly if the postmortem isn’t deep enough or doesn’t have real learnings.
Culturally this is one of the most empowering and powerful tools we have in the company. We started this in product engineering, but it has evolved to the point where most teams across the company do public postmortems.


Like any model of organizing, this model has challenges too. Below are a few challenges we have learned along the way:

  • Duplication of effort: With autonomous independent teams, we can have some overlap in work done by different teams. We try and counter this by having a few people in the organization who spend time across teams and have a view on what different teams are building. This would include engineering leads who spend time with different teams and get an understanding of successes and challenges each team has. So when a team starts building a new service with similarities to another service being worked on by another team, we try to consolidate effort and get both teams on the same page to hopefully not duplicate effort.
  • Collective decision making: Sometimes it’s just hard to get the whole team to align on a decision while taking varied opinions into consideration. We counter some of this by running small teams so there are fewer people who need to get on the same page. Also, when teams get stuck they seek out help from others in the organization who have been in a similar situation before or could help them break a gridlock.
  • Singularity in vision: Given we have devolved decision making to teams, there’s no one person who calls all the shots. We have a company mission but teams can decide their own way to achieve the mission. This can be especially unnerving to some folks given they can't just go over to one person and ask for direction or say "I am doing this as the CEO wants it."
  • Communication: With teams being independent and working on their specific problems, we run the spectrum from teams that over-communicate to make sure others know what they are working on to teams that under-communicate. TransferWise runs primarily on Slack. We have specific channels for sharing things cross team. We also have horizontal virtual teams called guilds where engineers get together to work on a problem that cuts across the organization. For example, we have a performance guild which has representatives from different teams. This is a group of individuals who are interested in building tooling, standards, and practices to help all our teams improve the performance of our systems. They focus on building the required monitoring and alerting for everyone to use. That said, we are still learning how to improve communication across teams as our organization grows.

Why do we operate this way?

As a startup we have a major weapon - speed! When people closest to the customers are making the decisions, we can run more tests and iterate quicker as compared to a setup where teams rely on managers to make decisions. In the latter, managers become bottlenecks slowing down decision making. Additionally, they usually aren’t as close to the day to day operations and the customer feedback loop to make an informed decision. In our setup, we believe we get more fail and pass signals faster.

We fundamentally believe companies that iterate faster, fail faster and learn faster will succeed in the long run. That means to learn faster than others we need to optimize for speed with checks for building a high-quality customer experience that our customers love. This is the main reason for our setup.

We realize that running this way has its drawbacks as listed above but we believe we can take these drawbacks and solve for them while we optimize for speed.

This is, of course, something that has worked for us so far and we will have more learnings as our product and company evolve. We will share those learnings as we go along. We would love to hear your thoughts on what you optimize for, how you do it, and scenarios where you think the above setup doesn’t work.

Thanks to Harald, Jordan, Martin Magar, Taras, Vincent for their input.

December 05, 2016

Anton ArhipovTwitterfeed #2

So this is the second issue of my Twitterfeed, the news that I noticed on Twitter. Much more sophisticated compared to the first post, but still no structure and no definite periodicity.


Java Annotated Monthly - December 2016. Nice collection of articles about Java 9, Java 8, libraries and frameworks, etc. With this, my Twitterfeed is now officially meta! 😃

RebelLabs published a Java Generics cheat sheet. Print it out and put it on the wall in your office!

Server side rendering with Spring and React. Interesting approach to UI rendering with React. Some parts of the UI are rendered at the server side, and some data is then rendered at the client side.

One year as a Developer Advocate. Vlad Mihalcea reflects on his achievements from the first year in the role of a Developer Advocate for Hibernate. Well done!

IDEA 2016.2 Icon Pack. IDEA 2016.3 update came with the new icons and some people don’t really like those. There is now a plugin to replace the new icons with the old icons. Enjoy!

Oh, and talking about IntelliJ IDEA, there is another great blog post related to the 2016.3 release. Alasdair Nottingham writes about Liberty loose applications support in IDEA: Faster application development with IntelliJ IDEA 2016.3

Reactive programming vs Reactive systems. Jonas Boner and Viktor Klang make clear what the difference between the two is. "Messages have a clear (single) destination, while events are facts for others to observe".

Good Programmers Write Bug-Free Code, Don’t They? Yegor Bugayenko has a good point about the relation of good programming to bug-free code.

Cyclops Java by Example: N-Queens. A coding kata for the N-Queens problem using "cyclop's for-comprehensions".

Zero downtime deployment with the database. The name says it all.

RxJava 2.0 interview with David Karnok about the major release. Here comes support for Reactive Streams specification!

Reactor by Example. Reactor is very similar to RxJava, but it is also at the core of Spring Framework’s 5.0 reactive programming model.

An explanation of the different types of performance testing. I think it is quite important to know the difference.


Spec-ulation by Rich Hickey. As usual, must watch!

Microservices evolution: how to break your monolithic database. Microservices are becoming mainstream, it seems. So we need best practices for building microservices based systems.

November 29, 2016

TransferWise Tech BlogWhy Over-Reusing is Bad

One of the Holy Grails of software development has always been reuse. In the following post I want to focus on reuse of application logic in its different forms. I will not cover reuse on more abstract levels like ideas or patterns. This will be a two-part series where I explore this topic from both backend and frontend development perspectives.

I believe in the following idea:

It does not matter how fast you can build the first version of your product. It only matters how fast you can change it later.

On the backend the predominant paradigm right now is microservices. I believe that one of the reasons why this approach is so successful is that it gets reuse quite right. Microservices are really good blocks for reuse as they align very well with the idea of splitting a big system into multiple smaller bounded contexts. As per the concept of bounded context, reusing any parts of the internal logic between different microservices is considered an anti-pattern.

Changing Shared Code is Inherently Hard

But what is so bad about sharing code? Isn't it something good that we should always strive for? Before answering, let's take a step back. Why do we want to reuse stuff? Because we want to be able to change fast. Now here comes the paradox: by reusing code that has already been written we save some coding time, but everything that is shared inherently becomes harder to change. This is so because once our reusable thingy is out there we need to keep all the consumers in mind when changing it. As the number of consumers grows, it becomes harder to juggle the different requirements and nuances of each context. Essentially the risk of reusing the wrong abstraction grows over time. It is just so easy to introduce these tiny additional parameters that enable reusing maybe not all but perhaps something like 80% of the original logic or 60% or 40%.
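To make the "tiny additional parameters" concrete, here is a hypothetical Python sketch. The function names, the flags and the fee figures are all invented for illustration and do not come from any real codebase. The first function is a shared helper that has grown one flag per consumer; the two functions after it show the same behaviour as small, separately owned pieces.

```python
def calculate_fee(amount, rate=0.005, minimum=1.0,
                  skip_minimum=False, flat_surcharge=0.0):
    """Shared helper after two consumers each added 'one small parameter'."""
    fee = amount * rate
    if not skip_minimum:      # flag added for a consumer without a minimum fee
        fee = max(fee, minimum)
    fee += flat_surcharge     # parameter added for a consumer with a flat charge
    return fee


# The duplicated alternative: each consumer owns its own few lines
# and can change them without thinking about the other one.
def transfer_fee(amount):
    return max(amount * 0.005, 1.0)


def card_fee(amount):
    return amount * 0.005 + 0.3


# Both styles give the same numbers today; they differ in how hard
# the next change is.
same_transfer = calculate_fee(1000) == transfer_fee(1000)
same_card = calculate_fee(1000, skip_minimum=True, flat_surcharge=0.3) == card_fee(1000)
```

Every new caller of `calculate_fee` has to understand all the flags, including the ones that exist only for someone else, which is the cost the paragraph above describes.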

The Knowledge Cap

Knowledge cap is another thing to keep in mind. As software development is about building knowledge, any piece that is built by someone else means we will have a potential cap in our team's knowledge. This happens even when this someone else is another team in the same organisation. Often this loss of knowledge is perfectly fine - we don't want every team to re-implement their own versions of AND/OR gates. However, ideally all the assets that are at the core of what the team is doing should be developed by the team itself.

Frequency of Change

In general we can say that reuse makes more sense for more peripheral/infrastructure things like accessing a database or making HTTP calls. However, if some part of our infrastructure code needs to be changed very frequently then it might still make sense to roll out our own technical solution. Ideally high frequency of change means that it is somehow tied to the unique value proposition of our product and extra implementation effort makes sense anyway. So frequency of change should be at least as important a factor (if not more) in deciding whether to reuse or build ourselves.

Clear Boundaries

In case we need to reuse something, the best thing we can do is to make the boundaries between our code and the reused code as clear as possible. This is the reason why microservices offer a superior form of reuse compared to components running in the same process. It requires much more discipline to keep the connection points few and explicit when something resides in the same process as opposed to something that lives on the other side of the network.

So reuse by itself is not bad. However, reuse on the wrong level of granularity or forcefully trying to reuse everything that looks similar at first sight can be much more harmful than duplication. Reuse has a cost as well.

November 28, 2016

TransferWise Tech BlogThe TransferWise Stack - heartbeat of our little revolution


Like any tech startup that's passed its five-year mark, TransferWise has come quite a way from the first lines of code that powered it. Our product is a living organism, with changing needs. What was right yesterday isn't necessarily so today. Nor might our current technology choices withstand the test of time. We hope they do - but maybe they don't. And it's okay.

At conferences and meetups people often walk up to our engineers to ask what languages, tools and technologies we're using. Fairly so, as we haven't done a stellar job of telling our tech-savvier customers and fellow developers much about that. Hopefully we can rectify that a bit by taking time now to reflect in writing.

We'd love to hear back from you about the decisions you would have made differently.

Brief history

Once upon a time, in 2010, there was a founder who wanted to solve a problem. He knew a bit of Java and wanted to get moving quickly. At that time, Groovy and Grails seemed to have brought some of the Ruby on Rails flair to the JVM world. Boom, here's a quick way to bootstrap! By the end of 2013, about a dozen engineers were working on the codebase.

In early 2014, the existing Grails-powered system wasn't cutting it anymore for some workloads. It had been quick and easy to deliver new features but the team had made some system design shortcuts on the way. The time had come to extract some critical batch processing into a separate component. We've been following the path of moving code out ever since.

By late 2016, TransferWise has about 120 engineers working in two dozen teams. Measured by lines of code, more than half our business logic lives in separate services. We're looking at ways to scale the system across data centers and to cope with the next 80 engineers joining during 2017. Let's get into the details of how we intend to enable these moves.

Microservices - Spring Boot and Spring Cloud

Few contenders got to the starting line when picking between the possible groundworks for our service stack. We were sure to stay on the JVM. We wanted something that would promise good support for future "cloud" concerns. These included service discovery, centralized config and transaction tracing.

Many people in the team trust the quality of thinking going into the Spring ecosystem, so Spring Boot quickly gained popularity. It provides a good, extensible platform for building and integrating services. We like its annotation-based autowiring of dependencies, YAML configuration files and reasonable conventions. We use a custom Spring Initializr to bootstrap new services. That helps us to make sure all the needed bootstrap config is in place and nobody gets a headache trying to manually bring in the right dependencies. It's all running on Java 8.

Spring Cloud adds integration for many useful tools in a service-oriented environment. Some are Spring projects, like the Config Server. Some leverage battle tested open source components like Netflix Eureka, Zuul and Hystrix.

We are actively adopting Config Server and Eureka, and use Hystrix and Zuul in appropriate places.

Grails and Groovy

Grails and Groovy currently power all our business logic that's not extracted into microservices. That makes up a bit under half of our lines of code. We've taken a clear direction to deprecate this codebase by the end of next year.

When Groovy came along, it brought with it a chance to write nice, succinct code and leverage the whole Java ecosystem. It continues to be a good language for DSLs, for instance. Groovy used to have more benefits over Java, like functional constructs, easy collection processing and other boilerplate-reducing tidbits. Java gives us better compile-time checking and less runtime overhead. Hence we've gone with Java in our microservices.

Neither is Grails to blame for anything. It allowed the early team to quickly iterate on a product in its infancy. The convenience features of Grails worked against it over the years. Grails hides complexity away from the developer. It makes it easier to shoot oneself in the foot when trying to deliver value. By taking a decision to focus on scalability of the codebase sooner, we would have been able to postpone migration by another year or so. Yet, in our opinion, Grails makes sense as a platform for a single moderately-sized web app - rather than for dozens of microservices. This made the transition inevitable in any case.

It's worth noting that the latest Grails version, 3.x, is also built on top of Spring Boot. As we're quite happy with plain Spring Boot, we are not currently considering it.

Persistence layer - MySQL, PostgreSQL

We're a financial services company. This instructs us to always prioritise consistency over availability. BASE is fun and eventual consistency sounds like a cool topic to wrap one's head around. We want our persistence to meet the ACID criteria - Atomic, Consistent, Isolated and Durable.

TransferWise started with MySQL. The widespread adoption, ease of setup and loads of people with some basic experience made it a good choice. With growth, more questions have come up about our DB engine, like:

  • does it support our analytical workloads?
  • is it easy to build High Availability?

MySQL still holds most of our data in production. Yet, migrating our analytical datastore to PostgreSQL is already underway as our tests show it to be a better fit for our models and tooling. Our financial crime team relies on many nifty PostgreSQL features. We foresee Postgres also being the platform of choice for our future service datastores. The mature availability and resilience features it offers out of the box drive this. We like Postgres.

It's likely that we'll be adopting NoSQL for some use cases down the road, where referential integrity doesn't add value.

Messaging & Kafka

A big part of our business logic swirls around the state changes of a transfer. There are many different things that need to happen around the changes - like fraud checks and analytics updates. Earlier, these were all done synchronously together with the state change.

As the business has grown in complexity, that obvious and simple approach doesn't scale so well. A good rule of thumb in designing data flows is that we can make every signal that doesn't need an immediate response asynchronous. If you can take a document to another department and not wait around for them to process it, you can process it asynchronously in the code as well. We want to unbundle the reactions to an event from the transactional changes of an event. That makes it easier to scale the different parts and isolate potential problems.
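The idea of unbundling reactions from the transactional change can be sketched in a few lines of Python. This is only an illustration: an in-process `queue.Queue` stands in for a message broker, and the function names, the event fields and the `FUNDS_RECEIVED` state are made up for the example rather than taken from the TransferWise codebase.

```python
import queue
import threading

events = queue.Queue()   # stands in for a broker topic (assumption)
processed = []           # records what the consumer did, for demonstration


def change_transfer_state(transfers, transfer_id, new_state):
    """Transactional part: update the state, publish an event, return at once."""
    transfers[transfer_id] = new_state
    events.put({"transfer_id": transfer_id, "state": new_state})


def consumer():
    """Reactions to the event run here, off the path of the state change."""
    while True:
        event = events.get()
        if event is None:    # sentinel value: stop the worker
            events.task_done()
            break
        processed.append(("fraud_check", event["transfer_id"]))
        processed.append(("analytics_update", event["transfer_id"]))
        events.task_done()


worker = threading.Thread(target=consumer, daemon=True)
worker.start()

transfers = {}
change_transfer_state(transfers, 42, "FUNDS_RECEIVED")

events.join()      # demo only: wait until the reactions have run
events.put(None)   # ask the worker to stop
worker.join()
```

The caller of `change_transfer_state` never waits for the fraud check or the analytics update, so a slow or failing reaction does not hold up the state change itself.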

In the first iteration of our payout handling, we experimented with Artemis as a messaging platform. We didn't become confident about running Artemis in a HA setup in production, and now most of our messaging has moved to Apache Kafka.

The front end

That's all fine, but Kafka doesn't let you create smooth user experiences!

In 2014 TransferWise web UIs still used good old jQuery. We were not happy with the testability of that codebase. We also knew we'd need to modularize in a while due to team growth. In September that year, we launched our revamped transfer setup page built using AngularJS. We've now adopted Angular for almost all parts of our website.

We use Bower and NPM for dependency management. Grunt automates the builds. Jasmine and Protractor power the tests and Karma + PhantomJS run the tests. There's also webpack and Babel doing their part. Some parts of the code already use TypeScript. Whew.

Front-end, of course, isn't only code. Practices matter more than using the latest-and-greatest tools.

The teams in TransferWise are quite independent in how they evolve the product in their areas of ownership. This has, at times, meant trouble governing the visual identity and making sure things look neat, clean and consistent across the site.

To help with that we've taken a battle tested approach of implementing our style guide on top of Bootstrap. And, for the record, we think that LESS is more.

We believe in TDD, which is particularly important in places with less compile-time safety - like JavaScript. Ürgo has blogged before about some of the thinking we apply. There's more to share. Stay tuned for future posts covering different aspects of our experience in detail.

Running in production

So, how do we keep it together? How do we make sure our systems are serving our customers well?

We want to keep our feedback loops short and tight, to keep signal-to-noise ratio high. It means that people building the systems must also operate them. If you own a part of the codebase, you also own the service it provides. This means being responsible for functional and non-functional characteristics of a feature. Ownership applies from feature creation to deprecation. To enable that, we need a shared toolkit for troubleshooting, monitoring and alerting.

Infrastructure monitoring and some operational metrics run on Zabbix. We are adopting Graphite/Grafana to keep an eye on business operations aspects of the system.

We use New Relic to check the HTTP endpoints. It's not cheap but works well, thanks to its instrumentation capabilities. We've defined performance and error alerts for our higher volume endpoints. Respective teams get alerted via VictorOps rotations if performance degrades. New Relic also provides some tracing features to see the time spent on different steps of request processing.

When an exception happens in production, it gets sent to Rollbar. Rollbar groups and counts the occurrences and sends alerts for spikes or new items. In general it allows us to spot glitches in the system and estimate how many customers they affect.

To analyze a specific problem or identify patterns in logs, we use the Elastic stack. The stack consists of Logstash, Elasticsearch and Kibana. They are, respectively, used to parse, index and search/visualize logs.

Slack helps us to channel alerts and comms into one place.

There's a video and slides covering some of this from Baltic DevOps 2016 where we were honored to speak.

Vincent has written a great post on our way of doing post-mortems. We use them to learn from customer-affecting production issues.

Bits & pieces

Vahur wrapped up some of the other tooling that we use.

PHP - There are a few tools in the company built in PHP, some legacy, some purposefully kept in PHP.

Spark - While being a pretty buzzword-averse bunch of engineers, we do like to adopt the right tech for the right task. In our case the prime problem domain to benefit from machine learning has been our financial crime prevention subsystem. We're building up machine learning models on Apache Spark.

Ansible & Terraform - Our infrastructure is growing and humans make mistakes. That makes infrastructure a prime automation target. We're adopting Terraform for declarative infrastructure management. We use Ansible to configure the instances.


We build TransferWise for our current and future customers. They don't care much about which libraries, frameworks and components we bring in to move their money. It's our job as engineers to pick the tools and practices that allow us to deliver value quickly and sustainably.

Often we've chosen the simplest tool for getting the job done, instead of the most powerful. At other times we've gone with a framework that's not the newest kid on the block but has wide production adoption.

We're all proper geeks. We love new tech. We love to play and learn, pilot and break stuff. We do so in our hobby projects and geek-out sessions over Coke and beer. When building for our customers, we optimize for their happiness over the chance of adopting the next left-pad 0.0.3 ;) And you're right - it's as boring as building the HyperLoop of financial networks for millions of people could ever be.

TransferWise stack on StackShare

Thanks to Peep, Vincent and Ürgo for their help.

November 23, 2016

Four Years RemainingWhat is the Covariance Matrix?

Basic linear algebra, introductory statistics and some familiarity with core machine learning concepts (such as PCA and linear models) are the prerequisites of this post. Otherwise it will probably make no sense. An abridged version of this text is also posted on Quora.

Most textbooks on statistics cover covariance right in their first chapters. It is defined as a useful "measure of dependency" between two random variables:

    \[\mathrm{cov}(X,Y) = E[(X - E[X])(Y - E[Y])].\]

The textbook would usually provide some intuition on why it is defined as it is, prove a couple of properties, such as bilinearity, define the covariance matrix for multiple variables as {\bf\Sigma}_{i,j} = \mathrm{cov}(X_i, X_j), and stop there. Later on the covariance matrix would pop up here and there in seemingly random ways. In one place you would have to take its inverse, in another - compute the eigenvectors, or multiply a vector by it, or do something else for no apparent reason apart from "that's the solution we came up with by solving an optimization task".

In reality, though, there are some very good and quite intuitive reasons for why the covariance matrix appears in various techniques in one or another way. This post aims to show that, illustrating some curious corners of linear algebra in the process.

Meet the normal distribution

The best way to truly understand the covariance matrix is to forget the textbook definitions completely and depart from a different point instead. Namely, from the definition of the multivariate Gaussian distribution:

We say that the vector \bf x has a normal (or Gaussian) distribution with mean \bf \mu and covariance \bf \Sigma if:

    \[\Pr({\bf x}) =|2\pi{\bf\Sigma}|^{-1/2} \exp\left(-\frac{1}{2}({\bf x} - {\bf\mu})^T{\bf\Sigma}^{-1}({\bf x} - {\bf \mu})\right).\]

To simplify the math a bit, we will limit ourselves to the centered distribution (i.e. {\bf\mu} = {\bf 0}) and refrain from writing out the normalizing constant |2\pi{\bf\Sigma}|^{-1/2}. Now, the definition of the (centered) multivariate Gaussian looks as follows:

    \[\Pr({\bf x}) \propto \exp\left(-0.5{\bf x}^T{\bf\Sigma}^{-1}{\bf x}\right).\]

Much simpler, isn't it? Finally, let us define the covariance matrix as nothing else but the parameter of the Gaussian distribution. That's it. You will see where it will lead us in a moment.

Transforming the symmetric Gaussian

Consider a symmetric Gaussian distribution, i.e. the one with {\bf \Sigma = \bf I} (the identity matrix). Let us take a sample from it, which will of course be a symmetric, round cloud of points:

We know from above that the likelihood of each point in this sample is

(1)   \[P({\bf x}) \propto \exp(-0.5 {\bf x}^T {\bf x}).\]

Now let us apply a linear transformation {\bf A} to the points, i.e. let {\bf y} ={\bf Ax}. Suppose that, for the sake of this example, {\bf A} scales the vertical axis by 0.5 and then rotates everything by 30 degrees. We will get the following new cloud of points {\bf y}:

What is the distribution of {\bf y}? Just substitute {\bf x}={\bf A}^{-1}{\bf y} into (1), to get:

(2)   \begin{align*} P({\bf y}) &\propto \exp(-0.5 ({\bf A}^{-1}{\bf y})^T({\bf A}^{-1}{\bf y}))\\ &=\exp(-0.5{\bf y}^T({\bf AA}^T)^{-1}{\bf y}) \end{align*}

This is exactly the Gaussian distribution with covariance {\bf \Sigma} = {\bf AA}^T. The logic works both ways: if we have a Gaussian distribution with covariance \bf \Sigma, we can regard it as a distribution which was obtained by transforming the symmetric Gaussian by some {\bf A}, and we are given {\bf AA}^T.

More generally, if we have any data, then, when we compute its covariance to be \bf\Sigma, we can say that if our data were Gaussian, then it could have been obtained from a symmetric cloud using some transformation \bf A, and we just estimated the matrix {\bf AA}^T, corresponding to this transformation.
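As a rough numerical check of this claim, the sketch below samples a symmetric Gaussian cloud, applies the example transformation (halve the vertical axis, then rotate by 30 degrees) and compares the sample covariance with {\bf AA}^T. The sample size, the random seed and the variable names are arbitrary choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# The example transformation: scale the vertical axis by 0.5, then rotate by 30 degrees.
theta = np.deg2rad(30)
S = np.diag([1.0, 0.5])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = R @ S

# Rows of X are points x from the symmetric Gaussian; rows of Y are y = A x.
X = rng.standard_normal((100_000, 2))
Y = X @ A.T

# The data is centered by construction, so Y^T Y / n estimates the covariance.
sample_cov = Y.T @ Y / len(Y)

print(np.round(sample_cov, 3))  # should be close to the matrix printed below
print(np.round(A @ A.T, 3))
```

The two printed matrices agree only approximately, since the first one is estimated from a finite sample.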

Note that we do not know the actual \bf A, and it is mathematically totally fair. There can be many different transformations of the symmetric Gaussian which result in the same distribution shape. For example, if \bf A is just a rotation by some angle, the transformation does not affect the shape of the distribution at all. Correspondingly, {\bf AA}^T = {\bf I} for all rotation matrices. When we see a unit covariance matrix we really do not know whether it is the “originally symmetric” distribution, or a “rotated symmetric distribution”. And we should not really care - those two are identical.

There is a theorem in linear algebra, which says that any symmetric matrix \bf \Sigma can be represented as:

(3)   \[{\bf \Sigma} = {\bf VDV}^T,\]

where {\bf V} is orthogonal (i.e. a rotation) and {\bf D} is diagonal (i.e. a coordinate-wise scaling). If we rewrite it slightly, we will get:

(4)   \[{\bf \Sigma} = ({\bf VD}^{1/2})({\bf VD}^{1/2})^T = {\bf AA}^T,\]

where {\bf A} = {\bf VD}^{1/2} (for a covariance matrix the diagonal entries of {\bf D} are non-negative, so the square root is well defined). This, in simple words, means that any covariance matrix \bf \Sigma could have been the result of transforming the data using a coordinate-wise scaling {\bf D}^{1/2} followed by a rotation \bf V. Just like in our example with \bf x and \bf y above.
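Decompositions (3) and (4) can be tried out numerically. The following is a minimal sketch which assumes a made-up 2x2 covariance matrix and uses NumPy's `numpy.linalg.eigh` routine for symmetric matrices; the particular numbers carry no meaning.

```python
import numpy as np

# A made-up symmetric positive definite matrix standing in for Sigma.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# eigh returns the eigenvalues (the diagonal of D) and an orthogonal V
# such that Sigma = V D V^T, as in (3).
eigvals, V = np.linalg.eigh(Sigma)
D = np.diag(eigvals)

# One possible "square root" transformation, as in (4): A = V D^{1/2}.
A = V @ np.diag(np.sqrt(eigvals))

print(np.allclose(V @ D @ V.T, Sigma))   # checks (3)
print(np.allclose(A @ A.T, Sigma))       # checks (4)
print(np.allclose(V.T @ V, np.eye(2)))   # V is orthogonal
```

Note that the orthogonal matrix returned by an eigensolver may include a reflection in addition to a rotation. This does not affect {\bf AA}^T.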

Principal component analysis

Given the above intuition, PCA already becomes a very obvious technique. Suppose we are given some data. Let us assume (or “pretend”) it came from a normal distribution, and let us ask the following questions:

  1. What could have been the rotation \bf V and scaling {\bf D}^{1/2}, which produced our data from a symmetric cloud?
  2. What were the original, “symmetric-cloud” coordinates \bf x before this transformation was applied?
  3. Which original coordinates were scaled the most by \bf D and thus contribute most to the spread of the data now? Can we only leave those and throw the rest out?

All of those questions can be answered in a straightforward manner if we just decompose \bf \Sigma into \bf V and \bf D according to (3). But (3) is exactly the eigenvalue decomposition of \bf\Sigma. I’ll leave you to think for just a bit and you’ll see how this observation lets you derive everything there is about PCA and more.
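One possible way to answer the three questions in code is sketched below. It works on synthetic data; the matrix `A_true`, the sample size and all variable names are hypothetical and serve only this illustration, which should not be read as a complete PCA implementation.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Synthetic data: a symmetric cloud pushed through a hypothetical transformation.
A_true = np.array([[2.0, 0.0],
                   [1.0, 0.5]])
Y = rng.standard_normal((5_000, 2)) @ A_true.T
Y -= Y.mean(axis=0)  # center the data

# Question 1: estimate Sigma and decompose it into V and D, as in (3).
Sigma = Y.T @ Y / len(Y)
eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]        # largest scaling first
eigvals, V = eigvals[order], V[:, order]

# Question 2: undo the rotation and the scaling, x = D^{-1/2} V^T y.
X_rec = (Y @ V) / np.sqrt(eigvals)

# Question 3: keep only the coordinate that was stretched the most.
Y_reduced = Y @ V[:, :1]
explained = eigvals / eigvals.sum()

print(np.round(X_rec.T @ X_rec / len(Y), 6))  # identity: the cloud is symmetric again
print(np.round(explained, 3))                 # share of the total variance per component
```

The recovered coordinates have exactly the identity covariance estimate, because they were whitened with the very matrix estimated from the same sample.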

The metric tensor

Bear with me for just a bit more. One way to summarize the observations above is to say that we can (and should) regard {\bf\Sigma}^{-1} as a metric tensor. A metric tensor is just a fancy formal name for a matrix, which summarizes the deformation of space. However, rather than claiming that it in some sense determines a particular transformation \bf A (which it does not, as we saw), we shall say that it affects the way we compute angles and distances in our transformed space.

Namely, let us redefine, for any two vectors \bf v and \bf w, their inner product as:

(5)   \[\langle {\bf v}, {\bf w}\rangle_{\Sigma^{-1}} = {\bf v}^T{\bf \Sigma}^{-1}{\bf w}.\]

To stay consistent we will also need to redefine the norm of any vector as

(6)   \[|{\bf v}|_{\Sigma^{-1}} = \sqrt{{\bf v}^T{\bf \Sigma}^{-1}{\bf v}},\]

and the distance between any two vectors as

(7)   \[|{\bf v}-{\bf w}|_{\Sigma^{-1}} = \sqrt{({\bf v}-{\bf w})^T{\bf \Sigma}^{-1}({\bf v}-{\bf w})}.\]

Those definitions now describe a kind of a “skewed world” of points. For example, a unit circle (a set of points with “skewed distance” 1 to the center) in this world might look as follows:

And here is an example of two vectors, which are considered “orthogonal”, a.k.a. “perpendicular” in this strange world:

Although it may look weird at first, note that the new inner product we defined is actually just the dot product of the “untransformed” originals of the vectors:

(8)   \[{\bf v}^T{\bf \Sigma}^{-1}{\bf w} = {\bf v}^T({\bf AA}^T)^{-1}{\bf w}=({\bf A}^{-1}{\bf v})^T({\bf A}^{-1}{\bf w}).\]

The following illustration might shed light on what is actually happening in this \Sigma-“skewed” world. Somehow “deep down inside”, the ellipse thinks of itself as a circle and the two vectors behave as if they were (2,2) and (-2,2).

Getting back to our example with the transformed points, we could now say that the point-cloud \bf y is actually a perfectly round and symmetric cloud “deep down inside”, it just happens to live in a skewed space. The deformation of this space is described by the tensor {\bf\Sigma}^{-1} (which is, as we know, equal to ({\bf AA}^T)^{-1}). The PCA now becomes a method for analyzing the deformation of space, how cool is that.
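Identity (8) is easy to verify numerically. The sketch below assumes that the skewed world was produced by the example transformation from earlier (vertical scaling by 0.5 followed by a 30 degree rotation) and takes the images of (2,2) and (-2,2) as the two vectors; the function name `skewed_inner` is made up for this illustration.

```python
import numpy as np

# Assumed transformation: halve the vertical axis, then rotate by 30 degrees.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = R @ np.diag([1.0, 0.5])
Sigma_inv = np.linalg.inv(A @ A.T)

def skewed_inner(v, w):
    """Inner product (5): v^T Sigma^{-1} w."""
    return v @ Sigma_inv @ w

# Two vectors of the skewed world whose "originals" are (2, 2) and (-2, 2).
v = A @ np.array([2.0, 2.0])
w = A @ np.array([-2.0, 2.0])

print(v @ w)                        # ordinary dot product, not zero
print(skewed_inner(v, w))           # zero up to rounding: "orthogonal" in the skewed world
print(np.sqrt(skewed_inner(v, v)))  # the length of the original (2, 2), i.e. sqrt(8)
```

In the ordinary sense the two vectors are not perpendicular, yet the skewed inner product treats them exactly like their untransformed originals.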

The dual space

We are not done yet. There’s one interesting property of “skewed” spaces worth knowing about. Namely, the elements of their dual space have a particular form. No worries, I’ll explain in a second.

Let us forget the whole skewed space story for a moment, and get back to the usual inner product {\bf w}^T{\bf v}. Think of this inner product as a function f_{\bf w}({\bf v}), which takes a vector \bf v and maps it to a real number, the dot product of \bf v and \bf w. Regard the \bf w here as the parameter (“weight vector”) of the function. If you have done any machine learning at all, you have certainly come across such linear functionals over and over, sometimes in disguise. Now, the set of all possible linear functionals f_{\bf w} is known as the dual space to your “data space”.

Note that each linear functional is determined uniquely by the parameter vector \bf w, which has the same dimensionality as \bf v, so apparently the dual space is in some sense equivalent to your data space - just the interpretation is different. An element \bf v of your “data space” denotes, well, a data point. An element \bf w of the dual space denotes a function f_{\bf w}, which projects your data points on the direction \bf w (recall that if \bf w is unit-length, {\bf w}^T{\bf v} is exactly the length of the perpendicular projection of \bf v upon the direction \bf w). So, in some sense, if \bf v-s are “vectors”, \bf w-s are “directions, perpendicular to these vectors”. Another way to understand the difference is to note that if, say, the elements of your data points numerically correspond to amounts in kilograms, the elements of \bf w would have to correspond to “units per kilogram”. Still with me?

Let us now get back to the skewed space. If \bf v are elements of a skewed Euclidean space with the metric tensor {\bf\Sigma}^{-1}, is a function f_{\bf w}({\bf v}) = {\bf w}^T{\bf v} an element of a dual space? Yes, it is, because, after all, it is a linear functional. However, the parameterization of this function is inconvenient, because, due to the skewed tensor, we cannot interpret it as projecting vectors upon \bf w nor can we say that \bf w is an “orthogonal direction” (to a separating hyperplane of a classifier, for example). Because, remember, in the skewed space it is not true that orthogonal vectors satisfy {\bf w}^T{\bf v}=0. Instead, they satisfy {\bf w}^T{\bf \Sigma}^{-1}{\bf v} = 0. Things would therefore look much better if we parameterized our dual space differently. Namely, by considering linear functionals of the form f^{\Sigma^{-1}}_{\bf z}({\bf v}) = {\bf z}^T{\bf \Sigma}^{-1}{\bf v}. The new parameters \bf z could now indeed be interpreted as an “orthogonal direction” and things overall would make more sense.

However, when we work with actual machine learning models, we still prefer to have our functions in the simple form of a dot product, i.e. f_{\bf w}, without any ugly \bf\Sigma-s inside. What happens if we turn a “skewed space” linear functional from its natural representation into a simple inner product?

(9)   \[f^{\Sigma^{-1}}_{\bf z}({\bf v}) = {\bf z}^T{\bf\Sigma}^{-1}{\bf v} = ({\bf \Sigma}^{-1}{\bf z})^T{\bf v} = f_{\bf w}({\bf v}),\]

where {\bf w} = {\bf \Sigma}^{-1}{\bf z}. (Note that we can lose the transpose because \bf \Sigma is symmetric).

What it means, in simple terms, is that when you fit linear models in a skewed space, your resulting weight vectors will always be of the form {\bf \Sigma}^{-1}{\bf z}. Or, in other words, {\bf\Sigma}^{-1} is a transformation, which maps from “skewed perpendiculars” to “true perpendiculars”. Let me show you what this means visually.

Consider again the two “orthogonal” vectors from the skewed world example above:

Let us interpret the blue vector as an element of the dual space. That is, it is the \bf z vector in a linear functional {\bf z}^T{\bf\Sigma}^{-1}{\bf v}. The red vector is an element of the “data space”, which would be mapped to 0 by this functional (because the two vectors are “orthogonal”, remember).

For example, if the blue vector was meant to be a linear classifier, it would have its separating line along the red vector, just like that:

If we now wanted to use this classifier, we could, of course, work in the “skewed space” and use the expression {\bf z}^T{\bf\Sigma}^{-1}{\bf v} to evaluate the functional. However, why don’t we find the actual normal \bf w to that red separating line so that we wouldn’t need to do an extra matrix multiplication every time we use the function?

It is not too hard to see that {\bf w}={\bf\Sigma}^{-1}{\bf z} will give us that normal. Here it is, the black arrow:

Therefore, next time, whenever you see expressions like {\bf w}^T{\bf\Sigma}^{-1}{\bf v} or ({\bf v}-{\bf w})^T{\bf\Sigma}^{-1}({\bf v}-{\bf w}), remember that those are simply inner products and (squared) distances in a skewed space, while {\bf \Sigma}^{-1}{\bf z} is a conversion from a skewed normal to a true normal. Also remember that the “skew” was estimated by pretending that the data were normally-distributed.
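The conversion from a skewed normal to a true normal can be checked with a few lines of NumPy. This sketch reuses the assumed example transformation (vertical scaling by 0.5, then a 30 degree rotation); the vectors `z`, `v_red` and the test point `p` are illustrative choices, not values taken from the figures.

```python
import numpy as np

# Assumed transformation, as in the earlier example.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = R @ np.diag([1.0, 0.5])
Sigma_inv = np.linalg.inv(A @ A.T)

# z plays the role of the "skewed normal", v_red lies on the separating line.
z = A @ np.array([2.0, 2.0])
v_red = A @ np.array([-2.0, 2.0])

# The true normal in the ordinary Euclidean sense.
w = Sigma_inv @ z

print(z @ Sigma_inv @ v_red)  # zero up to rounding: z is "orthogonal" to v_red in the skewed space
print(w @ v_red)              # zero up to rounding: w is perpendicular to v_red in the usual sense

# Both parameterizations define the same functional on an arbitrary point.
p = np.array([0.3, -1.2])
print(np.isclose(z @ Sigma_inv @ p, w @ p))
```

Precomputing `w` once saves the extra matrix multiplication on every evaluation of the functional.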

Once you see it, the role of the covariance matrix in some methods like the Fisher’s discriminant or Canonical correlation analysis might become much more obvious.

The dual space metric tensor

“But wait”, you should say here. “You have been talking about expressions like {\bf w}^T{\bf\Sigma}^{-1}{\bf v} all the time, while things like {\bf w}^T{\bf\Sigma}{\bf v} are also quite common in practice. What about those?”

Hopefully you know enough now to suspect that {\bf w}^T{\bf\Sigma}{\bf v} is again an inner product or a squared norm in some deformed space, just not the “internal data metric space”, that we considered so far. Which space is it? It turns out it is the “internal dual metric space”. That is, whilst the expression {\bf w}^T{\bf\Sigma}^{-1}{\bf v} denoted the “new inner product” between the points, the expression {\bf w}^T{\bf\Sigma}{\bf v} denotes the “new inner product” between the parameter vectors. Let us see why it is so.

Consider an example again. Suppose that our space transformation \bf A scaled all points by 2 along the x axis. The point (1,0) became (2,0), the point (3, 1) became (6, 1), etc. Think of it as changing the units of measurement - before we measured the x axis in kilograms, and now we measure it in pounds. Consequently, the norm of the point (2,0) according to the new metric, |(2,0)|_{\Sigma^{-1}} will be 1, because 2 pounds is still just 1 kilogram “deep down inside”.

What should happen to the parameter ("direction") vectors due to this transformation? Can we say that the parameter vector (1,0) also got scaled to (2,0) and that the norm of the parameter vector (2,0) is now therefore also 1? No! Recall that if our initial data denoted kilograms, our dual vectors must have denoted “units per kilogram”. After the transformation they will be denoting “units per pound”, correspondingly. To stay consistent we must therefore convert the parameter vector (“1 unit per kilogram”, 0) to its equivalent (“0.5 units per pound”, 0). Consequently, the norm of the parameter vector (0.5,0) in the new metric will be 1 and, by the same logic, the norm of the dual vector (2,0) in the new metric must be 4. You see, the “importance of a parameter/direction” gets scaled inversely to the “importance of data” along that parameter or direction.

More formally, if {\bf x}'={\bf Ax}, then

(10)   \begin{align*} f_{\bf w}({\bf x}) &= {\bf w}^T{\bf x} = {\bf w}^T{\bf A}^{-1}{\bf x}'\\ & =(({\bf A}^{-1})^T{\bf w})^T{\bf x}'=f_{({\bf A}^{-1})^T{\bf w}}({\bf x}'). \end{align*}

This means that the transformation \bf A of the data points implies the transformation {\bf B}:=({\bf A}^{-1})^T of the dual vectors. The metric tensor for the dual space must thus be:

(11)   \[({\bf BB}^T)^{-1}=(({\bf A}^{-1})^T{\bf A}^{-1})^{-1}={\bf AA}^T={\bf \Sigma}.\]
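The kilograms-and-pounds example can be replayed numerically. The sketch below assumes the transformation from the example (a scaling by 2 along the x axis) and a made-up helper `norm_in`, which computes the norm of a vector under a given metric tensor.

```python
import numpy as np

A = np.diag([2.0, 1.0])       # the example: scale the x axis by 2
Sigma = A @ A.T               # metric tensor of the dual space, as in (11)
Sigma_inv = np.linalg.inv(Sigma)
B = np.linalg.inv(A).T        # induced transformation of the dual vectors, as in (10)

def norm_in(metric, u):
    """Norm of u under the given metric tensor: sqrt(u^T M u)."""
    u = np.asarray(u, dtype=float)
    return np.sqrt(u @ metric @ u)

# Data point: 2 pounds is still 1 kilogram "deep down inside".
data_norm = norm_in(Sigma_inv, [2.0, 0.0])

# Dual vector: "1 unit per kilogram" becomes "0.5 units per pound".
w_new = B @ np.array([1.0, 0.0])
dual_norm_half = norm_in(Sigma, w_new)
dual_norm_two = norm_in(Sigma, [2.0, 0.0])

print(data_norm)        # 1.0
print(w_new)            # [0.5 0. ]
print(dual_norm_half)   # 1.0
print(dual_norm_two)    # 4.0
print(np.allclose(np.linalg.inv(B @ B.T), Sigma))  # checks (11)
```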

Remember the illustration of the “unit circle” in the {\bf \Sigma}^{-1} metric? This is how the unit circle looks in the corresponding \bf\Sigma metric. It is rotated by the same angle, but it is stretched in the direction where it was squished before.

Intuitively, the norm (“importance”) of the dual vectors along the directions in which the data was stretched by \bf A becomes proportionally larger (note that the “unit circle” is, on the contrary, “squished” along those directions).

But the “stretch” of the space deformation in any direction can be measured by the variance of the data. It is therefore not a coincidence that {\bf w}^T{\bf \Sigma}{\bf w} is exactly the variance of the data along the direction \bf w (assuming |{\bf w}|=1).
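The statement about the variance along a direction can be confirmed on synthetic data. In the sketch below the transformation `A`, the direction `w` and the sample size are arbitrary assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

# Synthetic data: a symmetric cloud pushed through a hypothetical transformation.
A = np.array([[2.0, 0.0],
              [1.0, 0.5]])
Y = rng.standard_normal((10_000, 2)) @ A.T
Y -= Y.mean(axis=0)             # center the data
Sigma = Y.T @ Y / len(Y)        # covariance estimate

w = np.array([3.0, 4.0]) / 5.0  # a unit-length direction

quad_form = w @ Sigma @ w       # w^T Sigma w
proj_var = np.var(Y @ w)        # variance of the data projected on w

print(quad_form, proj_var)      # the two numbers coincide up to rounding
```

The match is exact rather than approximate, because the quadratic form is just the projected variance written in matrix notation.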

The covariance estimate

Once we start viewing the covariance matrix as a transformation-driven metric tensor, many things become clearer, but one thing becomes extremely puzzling: why is the inverse covariance of the data a good estimate for that metric tensor? After all, it is not obvious that {\bf X}^T{\bf X}/n (where \bf X is the data matrix) must be related to the \bf\Sigma in the distribution equation \exp(-0.5{\bf x}^T{\bf\Sigma}^{-1}{\bf x}).

Here is one possible way to see the connection. Firstly, let us take it for granted that if \bf X is sampled from a symmetric Gaussian, then {\bf X}^T{\bf X}/n is, on average, a unit matrix. This has nothing to do with transformations, but just a consequence of pairwise independence of variables in the symmetric Gaussian.

Now, consider the transformed data, {\bf Y}={\bf XA}^T (vectors in the data matrix are row-wise, hence the multiplication on the right with a transpose). What is the covariance estimate of \bf Y?

(12)   \[{\bf Y}^T{\bf Y}/n=({\bf XA}^T)^T{\bf XA}^T/n={\bf A}({\bf X}^T{\bf X}){\bf A}^T/n\approx {\bf AA}^T,\]

the familiar tensor.

This is a place where one could see that a covariance matrix may make sense outside the context of a Gaussian distribution, after all. Indeed, if you assume that your data was generated from any distribution P with uncorrelated variables of unit variance and then transformed using some matrix \bf A, the covariance estimate {\bf Y}^T{\bf Y}/n of the transformed data will still be an estimate of {\bf AA}^T, the metric tensor for the corresponding (dual) space deformation.
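To illustrate that nothing here depends on the Gaussian shape, the sketch below starts from a uniform distribution instead. A uniform variable on [-\sqrt{3}, \sqrt{3}] has zero mean and unit variance; the matrix `A`, the seed and the sample size are arbitrary assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Independent, zero-mean, unit-variance coordinates that are NOT Gaussian.
half_width = np.sqrt(3.0)
X = rng.uniform(-half_width, half_width, size=(200_000, 2))

# A hypothetical transformation; data vectors are rows, hence the transpose.
A = np.array([[2.0, 0.0],
              [1.0, 0.5]])
Y = X @ A.T

cov_estimate = Y.T @ Y / len(Y)

print(np.round(cov_estimate, 2))  # approximately equal to the matrix below
print(A @ A.T)
```

The resulting cloud is a parallelogram rather than an ellipse, yet the covariance estimate recovers the same tensor {\bf AA}^T.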

However, note that out of all possible initial distributions P with a given mean and covariance, the normal distribution is exactly the one with the maximum entropy, i.e. the “most generic”. Thus, if you base your analysis on the mean and the covariance matrix (which is what you do with PCA, for example), you could just as well assume your data to be normally distributed. In fact, a good rule of thumb is to remember that whenever you even mention the term "covariance matrix", you are implicitly fitting a Gaussian distribution to your data.