Pinu planeet

May 23, 2017

TransferWise Tech Blog: When Groovy, assert and threads gang together to fool you

At TransferWise, we use Kafka for our messaging needs, to communicate between our micro-services and with our legacy Grails monolith. A few days ago, I faced the strangest problem with the Kafka message processing in Grails.

The code that consumes Kafka messages works in the following way: It fetches the Kafka message, saves it to the database, acknowledges it to Kafka and then processes it in a separate thread. If the processing fails, there is a job that picks up unprocessed messages for a new try. With this approach, we are sure to never lose a message and we can also easily avoid duplicate processing.
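For illustration, here is a minimal sketch of that flow in Python rather than the actual Grails code; the consumer, db and process objects and their methods are hypothetical stand-ins:

import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("kafka-consumer")
executor = ThreadPoolExecutor(max_workers=4)

def handle_next(consumer, db, process):
    # One iteration of the loop described above.
    message = consumer.fetch_next_message()                   # 1. fetch the Kafka message
    record_id = db.store_message(message)                     # 2. save it to the database first
    consumer.acknowledge(message)                              # 3. acknowledge it to Kafka
    executor.submit(process_record, db, record_id, process)   # 4. process in a separate thread

def process_record(db, record_id, process):
    try:
        process(db.load(record_id))
        db.mark_processed(record_id)                           # prevents duplicate processing
    except Exception:
        # Nothing is lost: a scheduled job later retries unprocessed records.
        log.exception("Processing failed, record %s left for the retry job", record_id)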

The problem was that for some messages, the processing failed on the first attempt and worked on the second. But the logs gave no explanation for the failure! There was no error message, no exception. The only thing we knew was that it was not hanging, just failing silently.

Then I remembered: we process the message in a separate thread. With Spring or Grails, we are used to seeing every exception end up in the log file. But this does not apply to code executed in separate threads. Threads die silently if we don't catch and log exceptions explicitly. Was this my problem?

No, that couldn't be it; the try/catch in the run method was there:

try {
    // ...
} catch (Exception e) {
    log.error("Exception occurred during processing message….”, e)
}

In despair, I had this crazy idea:

[12:22] long shot, but we catch only Exception, maybe it's an exception that doesn't extend Exception...

I didn't really believe in this theory myself. After all, things extending Throwable directly, like Error, are for problems we cannot handle at all: OutOfMemoryError, VirtualMachineError, etc. It's not something that would happen regularly for some events only.

Still, I added catching Throwable there and almost forgot about it.

Later, a colleague released our Grails app and got alarmed by a very scary new exception:

org.codehaus.groovy.runtime.powerassert.PowerAssertionError: assert serviceUrl

That was caused by my two extra lines:

} catch (Throwable e) {
    log.error("Throwable occurred during processing message... ", e)
}

So what couldn't happen was happening: an Error was being thrown.

Deep in one of the methods called by our event processing, there was an assert that was failing:

assert serviceUrl

Wait a minute: in Java, asserts are DISABLED by default.

But this is not Java code, it’s Groovy. And in Groovy, asserts are ENABLED by default.

And a failed assert throws an AssertionError, which extends Error!

Why did this affect only the first attempt at processing the event? Before hitting the assert, there is a check against the age of the event. Events over 1 minute old didn't execute this code path at all, so they didn't fail.

So worth remembering:

assert in Groovy != assert in Java

to catch all exceptions, at least in Groovy, catch Throwable

May 06, 2017

Raivo Laanemets: Now, 2017-04

This is an update on things related to this blog and my work. This month's post is a bit longer than usual: it is a log of the last 4 months.

Blogging

I have not written much recently. I have been busy with lots of things and there has not been enough time left for writing.

EBN campaigns

Some time ago I wrote an article about the European Business Number. As the letters are sent out in different countries, I get a peak in the number of readers. The first peak, from Greece, occurred in February, but since then there have been corresponding peaks from multiple countries, including Estonia.

Reset

I had a large list of article topics I wanted to write about. I deleted it as lots of it had become obsolete or irrelevant for me. I now have a new list.

Work

Due to construction noise at my home I was forced to rent an office at the beginning of February. I was not able to find a good place in my home town. The places that were offered were either too large (100+ square meters), too noisy, required a long contract, or had no central heating. One of the places asked me to provide services in exchange for the room, for which I did not have enough time. The nearest good place I finally found was in Tartu, 25 km from home. The building is managed by Kaanon Kinnisvara and they have lots of office rooms here.

Office table

The room had a Soviet-era table and a newer chair. I found the chair too soft for longer sitting sessions, so I brought my own. In the room there is also a closet for clothes and some shelves. I do not plan to stay long, so I have not acquired more furniture.

Trump influence

My largest project was put on hold again in January. The reasons arose from the results of the recent US presidential election and its effect on minorities, which makes living in the US impossible for some people.

Electron and Vue.js

The Electron-based desktop application I wrote in December saw further development. It was my first Vue.js application and going back allowed me to evaluate it further. Electron was upgraded to Node.js 7.6 with native async/await, and I took advantage of it by rewriting my promise-based code to use the new control structures. In my opinion this makes Node.js, and anything based on it, a superior platform for IO-heavy applications. The Vue.js parts of the application were refactored into components, which made the codebase easier to maintain and extend. I plan to write more about my experience with Vue.js and compare it to React and KnockoutJS. I have used KnockoutJS so far but I consider it too hard to use safely in large applications. I have recently revisited React, and its excellent tool support and much simpler working principles are worth considering.

Electronic voting

My main project during March and April was an electronic voting system for the Estonian Free Party. We built the main part of the system a year ago but it was not possible to use it due to legal reasons. The law does not prohibit electronic voting but neither does it give any useful directions on how to properly implement it. This was finally solved and we proceeded with the implementation. Part of the solution was electronic signatures given using the ID-card and Mobiil-ID; up to this point we had only implemented authentication with them. The application was built on Node.js and the async parts of the code were refactored to use the new async/await language constructs. This made the code cleaner and improvements easier. The voting process itself lasted a couple of days. The biggest trouble was solving technical ID-card issues that some users were having. There are lots of cases where things do not work correctly and we are unable to report the exact error because things are not under our control (browser plugins, hardware to read cards, etc.).

MySQL refactoring

In one of my older projects I had used UUIDs as primary keys. The project was not suitable for them: it had lots of secondary indexes. The main table had over 500k rows and the indexes took 3 times more space than the amount of data in the whole database. In MySQL's InnoDB engine a secondary index points to primary keys, and thus all primary key values have to be stored for each index. Furthermore, I had used the CHAR(36) datatype for the UUIDs, which takes 3*36 bytes when the column character set is utf8mb4. In this project I was able to replace the UUID primary keys with good old auto_increment integer keys.

In another project I had to replace Knex with a simpler solution: SQL queries in files. Knex is pretty OK for CRUD queries but gets in the way once queries join multiple tables, some of which are derived tables from further subqueries. The query-in-file approach was recommended by Gajus.

Toolset: Sublime Text

I improved my text editor setup a bit by finally finding a good set of plugins for Sublime Text.

The plugins are all installable using the Package Control. Besides these plugins I also use a simple highlighter for EJS.

Toolset: Redmine

I tried to convert my Redmine installation's text formatting from Textile to Markdown, but it did not work well, as Redmine's Markdown does not support embedded HTML. This script is a good start but would have required too much tweaking for me, as I had used lots of advanced Textile. I hope that Redmine adds a project-specific text formatting option, which would allow me to use Markdown for all new projects and keep the current Textile-based formatting intact. There is a plugin to do it, but I usually try to avoid installing plugins to avoid a complicated upgrade when a new version of Redmine comes out.

Toolset: Unison

I set up file synchronization for important parts of my home directory between different computers. I cannot use rsync for this, as it is impossible to sync two-way deletes without keeping metadata. The Unison application seems to be an excellent tool for this. In my setup I synchronize each device with a central server to distribute changes between the devices.

New projects

At the moment I'm preparing for one new project. It is a continuation of one of my previous projects.

In May I will be on a vacation and will not work on commercial projects.

I have been thinking about the types of the projects that I work with. Startup-like projects are not anymore my favorite. These types of projects tend to have too many issues with budget and scope. One project having schedule issues means issues for anything else that was queued and scheduled to happen after the project. This is not sustainable and screws up lots of deals. I would be happy to work on a large uncertain project if it was my only one. This is not a situation I'm in today and I do not see myself working in this setting in the near future.

Some recent offerings have been typical enterprise projects to reduce the amount of manual work and boost productivity. This is what I originally wanted to do in 2010, except back then I lacked the experience to get potential jobs and close deals. Some of my last projects have been fixed-price deals. I feel that I have enough experience now to make safe fixed-price estimates. I have used a stable development platform (Node.js and some SWI-Prolog) for almost 5 years up to this point. Back then I was still experimenting with multiple different platforms and was not sure how much time programming something would take, or whether it was practically useful or doable at all.

Hardware/infrastructure

In the last 2 months everything has been migrated away from the server that was running at my home. The migration took some work, as surprisingly many things were dependent on the server and my IP address. I sold the server disks and used the money to set up a desktop machine at the new office.

I also bought a new laptop. It's an HP 250 G4. It came with a Windows 10 installation which I replaced with Debian 8 (MATE desktop). The laptop has excellent compatibility with Linux, at least on the model with an Intel wireless interface.

My main desktop at home is also running Debian 8. The installation had a tricky part getting UEFI+NVMe working, although it was considerably easier than the previous setup with Slackware. The installation process took about 3 hours. I'm running dual boot with Windows 10. The MSI boot selector works well with multiple UEFI boot disks.

I'm now running Debian 8 on all my systems. Together with these I have a dozen client servers, all running Debian. This lets me focus on a single Linux distribution.

Open Source

My backup scripts had grown too complex for shell and had become hard to maintain. I rewrote them in Node.js and put the source on GitHub. The script no longer embeds keys and passwords and is configured with a JSON file instead.

I decided to take over and fix the Node.js interface to SWI-Prolog. I found a dozen forks of it, made to keep the codebase buildable on different Node.js versions. So I forked it again, fixed most of the issues and published a new package. Attempts to contact the original authors were unfruitful and I had to publish it under a new name.

Unfortunately I hit multiple restrictions with the bindings. I'm not an expert on native C++ code, especially across 3 different platforms. Instead, I wrote a new version that separates the Node.js and SWI-Prolog processes and lets them communicate over stdio. This turned out to work much better and I was able to solve the remaining issues (the most important ones for me were support for Unicode and dict data structures). However, all this comes at the cost of an additional serialization into line-terminated JSON that I use for transmitting the data over stdio.
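The framing itself is simple: each request and each response is a single JSON document terminated by a newline. Below is a minimal sketch of the parent side of such a protocol, in Python for illustration; the real package is Node.js talking to swipl, and the child command name here is made up:

import json
import subprocess

# Spawn the child process; "prolog-worker" is a hypothetical command standing in
# for the actual SWI-Prolog wrapper.
proc = subprocess.Popen(["prolog-worker"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

def query(goal):
    # One JSON object per line in each direction.
    proc.stdin.write(json.dumps({"goal": goal}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())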

One of my new projects needs a good PDF output support. I decided to play around with PDFKit a bit and made a simple Markdown to PDF converter. It supports a very limited subset of Markdown but I consider it good enough for producing text documents. The source of the application can be found on GitHub.

A new bicycle

I have finished building a new bicycle. It is put together from new and used parts, sourced from all over the EU. I had had plans to build a new racing bike for the last 2 years but did not work on it actively until I acquired a high-end frame. This happened last September and since then I have gathered parts and put them together.

Merida Team Issue

A custom part

The headset required a small spacer below the upper headset cap that covers the upper bearing. I did not find a suitable spacer so I had to make my own. It was kinda OK to use 2 BB30 bottom bracket spacers for testing, but the cap's inner bevel edge would have put too much stress on the spacer's edge and caused issues in the long term. I made a suitable spacer from an aluminum alloy workpiece and accounted for the bevel.

Custom headset spacer

The bike is currently ready for racing which I intend to do during my vacation in May. I will write more about the bike build details in a separate post.

Home construction

The construction work to rebuild homes (including mine) is proceeding fast. 4 buildings out of 5 already have the new outer heat insulation installed. Air ventilation shafts and new water pipes have been installed.

Nooruse 13 construction progress

The noise and the parts of the work done inside the apartments have caused some stress (moving 102 families into temporary homes would have been very costly), but I hope that the rest of the work will be done soon and normal living conditions are restored.

May 02, 2017

Four Years Remaining: The File Download Problem

I happen to use the Amazon cloud machines from time to time for various personal and work-related projects. Over the years I've accumulated a terabyte or so of data files there. Those are mostly useless intermediate results or expired back-ups, which should be deleted and forgotten, but I could not gather the strength for that. "What if those data files happen to be of some archaeological interest 30 years from now?", I thought. Keeping them just lying there on an Amazon machine is, however, a waste of money - it would be cheaper to download them all onto a local hard drive and tuck it somewhere into a dark dry place.

But what would be the fastest way to download a terabyte of data from the cloud? Obviously, large downstream bandwidth is important here, but so should be a smart choice of the transfer technology. To my great surprise, googling did not provide me with a simple and convincing answer. A question posted to StackOverflow did not receive any informative replies and even got downvoted for reasons beyond my understanding. It's the year 2017, but downloading a file is still not an obvious matter, apparently.

Unhappy with such a state of affairs, I decided to compare some of the standard ways of downloading a file from a cloud machine. Although the resulting measurements are very configuration-specific, I believe the overall results might still generalize to a wider scope.

Experimental Setup

Consider the following situation:

  • An m4.xlarge AWS machine (which is claimed to have "High" network bandwidth) located in the EU (Ireland) region, with an SSD storage volume (400 Provisioned IOPS) attached to it.
  • A 1GB file with random data, generated on that machine using the following command:
    $ dd if=/dev/urandom of=file.dat bs=1M count=1024
  • The file needs to be transferred to a university server located in Tartu (Estonia). The server has a decently high network bandwidth and uses a mirrored-striped RAID for its storage backend.

Our goal is to get the file from the AWS machine into the university server in the fastest time possible. We will now try eight different methods for that, measuring the mean transfer time over 5 attempts for each method.

File Download Methods

One can probably come up with hundreds of ways for transferring a file. The following eight are probably the most common and reasonably easy to arrange.

1. SCP (a.k.a. SFTP)

  • Server setup: None (the SSH daemon is usually installed on a cloud machine anyway).
  • Client setup: None (if you can access a cloud server, you have the SSH client installed already).
  • Download command:

    scp -i ~/.ssh/id_rsa.amazon \
             ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

2. RSync over SSH

  • Server setup: sudo apt install rsync (usually installed by default).
  • Client setup: sudo apt install rsync (usually installed by default).
  • Download command:

    rsync -havzP --stats \
          -e "ssh -i $HOME/.ssh/id_rsa.amazon" \
          ubuntu@$REMOTE_IP:/home/ubuntu/file.dat .

3. Pure RSync

  • Server setup:
    Install RSync (usually already installed):

    sudo apt install rsync

    Create /etc/rsyncd.conf with the following contents:

    pid file = /var/run/rsyncd.pid
    lock file = /var/run/rsync.lock
    log file = /var/log/rsync.log
    
    [files]
    path = /home/ubuntu

    Run the RSync daemon:

    sudo rsync --daemon
  • Client setup: sudo apt install rsync (usually installed by default).
  • Download command:

    rsync -havzP --stats \
          rsync://$REMOTE_IP/files/file.dat .

4. FTP (VSFTPD+WGet)

  • Server setup:
    Install VSFTPD:

    sudo apt install vsftpd

    Edit /etc/vsftpd.conf:

    listen=YES
    listen_ipv6=NO
    pasv_address=52.51.172.88   # The public IP of the AWS machine

    Create password for the ubuntu user:

    sudo passwd ubuntu

    Restart vsftpd:

    sudo service vsftpd restart
  • Client setup: sudo apt install wget (usually installed by default).
  • Download command:

    wget ftp://ubuntu:somePassword@$REMOTE_IP/file.dat

5. FTP (VSFTPD+Axel)

Axel is a command-line tool which can download through multiple connections, thus increasing throughput.

  • Server setup: See 4.
  • Client setup: sudo apt install axel
  • Download command:

    axel -a ftp://ubuntu:somePassword@$REMOTE_IP/home/ubuntu/file.dat

6. HTTP (NginX+WGet)

  • Server setup:
    Install NginX:

    sudo apt install nginx

    Edit /etc/nginx/sites-enabled/default, add into the main server block:

    location /downloadme {
        alias /home/ubuntu;
        gzip on;
    }

    Restart nginx:

    sudo service nginx restart
  • Client setup: sudo apt install wget (usually installed by default).
  • Download command:

    wget http://$REMOTE_IP/downloadme/file.dat

7. HTTP (NginX+Axel)

  • Server setup: See 6.
  • Client setup: sudo apt install axel
  • Download command:

    axel -a http://$REMOTE_IP/downloadme/file.dat

8. AWS S3

The last option we try is first transferring the files onto an AWS S3 bucket, and then downloading from there using S3 command-line tools.

  • Server setup:
    Install and configure AWS command-line tools:

    sudo apt install awscli
    aws configure

    Create an S3 bucket:

    aws --region us-east-1 s3api create-bucket \
        --acl public-read-write --bucket test-bucket-12345 \
        --region us-east-1

    We create the bucket in the us-east-1 region because the S3 tool seems to have a bug at the moment which prevents using it in the EU regions.

    Next, we transfer the file to the S3 bucket:

    aws --region us-east-1 s3 cp file.dat s3://test-bucket-12345
  • Client setup:
    Install and configure AWS command-line tools:

    sudo apt install awscli
    aws configure
  • Download command:

    aws --region us-east-1 s3 cp s3://test-bucket-12345/file.dat .

Results

Here are the measurement results. In case of the S3 method we report the total time needed to upload from the server to S3 and download from S3 to the local machine. Note that I did not bother to fine-tune any of the settings - it may very well be possible that some of the methods can be sped up significantly by configuring the servers appropriately. Consider the results below to indicate the "out of the box" performance of the corresponding approaches.

Although S3 comes up as the fastest method (and might be even faster if it worked out of the box with the European datacenter), RSync is only marginally slower, yet it is easier to use, usually requires no additional set-up and handles incremental downloads very gracefully. I would thus summarize the results as follows:

Whenever you need to download large files from the cloud, consider RSync over SSH as the default choice.

April 26, 2017

TransferWise Tech Blog: Illusion of Reuse

I have quite often seen a situation where trying to achieve more reuse actually ends up building the Big Ball of Mud.

In some cases it is the Enterprise Domain Model that Eric Evans warns us against. Other places which are especially susceptible to this are Transaction Scripts and other procedural thingies including all sorts of Service classes.

An Example

For example, here is a Service called TransferService which has the following methods:

  • validate
  • create
  • getPendingWorkItems
  • reInitiateChargebackPayment
  • details
  • extendedDetails
  • getStatus

Even without knowing the details of these methods it looks like we have mixed different contexts together. validate and create are probably something related to setting up a new transfer. getPendingWorkItems, reInitiateChargebackPayment and getStatus seem to deal with problem solving and tracking of the transfer. Having details and extendedDetails could be a sign that we have different representations of a transfer that are probably useful in different contexts.

Now if we look at the usages of this kind of Service then obviously everybody is using it. We have achieved our ultimate goal - it is used across the entire system. So pat on the back and congratulations on a job well done?

Why is This Bad?

First, such a class is probably quite big. Everyone who wants to use it for their use case needs to filter out all the irrelevant parts of the API to find the specific thing they actually need.

With size comes a higher probability of duplication. It is hard to determine what is already there and what is not. Hence we are more likely to add new stuff that already exists.

Finally, we are less likely to refactor it over time due to its extensive usage and size.

How to Avoid?

The first thing is that we have to be able to actually notice this kind of situation. Smells in unit tests are generally quite good indicators of bad design in the production code as well. In my previous post I wrote about some ideas on how to use unit tests for improving production code.

It is always useful not to let any class grow too big. I have found ~120 lines to be a good maximum size for a test class, and I think production classes should follow a similar limit.

Don't obsess about reuse (also see why over-reuse is bad). It is ok to have some duplication, especially when we are not sure yet whether the similarity is accidental or we are indeed dealing with the same concept. Often it seems we have a method that does almost what we need. In that case an easy option is just to parameterize the existing logic - introduce some if inside the method. Sometimes this is ok. However, the risk is that this may lead to mixing things that evolve due to different forces at different speeds. In that case a better alternative is to find something on a lower level that can be reused completely without any parameterization, or just go ahead with a little bit of duplication, as in the sketch below.
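A hypothetical illustration of the trade-off (the fee formula and names are made up, not code from any real project):

# Option 1: parameterize the existing logic with a flag. The two use cases are now
# coupled and will evolve under different forces inside one function.
def delivery_fee(amount, express=False):
    fee = amount * 0.005
    if express:
        fee += 2.0
    return fee

# Option 2: reuse a smaller piece completely, without parameterization. The use cases
# stay independent and share only the part that is genuinely the same concept.
def base_fee(amount):
    return amount * 0.005

def express_fee(amount):
    return base_fee(amount) + 2.0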

Establish some high-level bounded contexts, each dealing with their own specific problems. Don't reuse across these boundaries.

Used still from Christopher Nolan's movie The Prestige

March 31, 2017

TransferWise Tech Blog: Scaling our analytics database

Business intelligence is at the core of any great company, and TransferWise is no exception.
When I started my job as a data engineer in July 2016, my initial task was to solve a long-running issue with the database used for analytics queries.

The Gordian knot of the analytics database

The original configuration was a MySQL community edition, version 5.6, with an InnoDB buffer pool of 40 GB. The virtual machine had 70 GB of memory and 18 CPUs assigned. The total database size was about 600 GB.

The analysts ran their queries using SQL, Looker and Tableau. In order to get data in almost real time, our live database was replicated into a dedicated schema. In order to protect our customers' personal data, a dedicated schema with a set of views was used to obfuscate the personal information. The same schema was used for pre-aggregating some heavy queries. Other schemas were copied from the microservice databases on a regular basis.

The frog effect

If you drop a frog in a pot of boiling water, it will of course frantically try to clamber out. But if you place it gently in a pot of tepid water and turn up the heat it will be slowly boiled to death.

The performance issues worsened slowly over time. One of the reasons was the constantly increasing size of the database, combined with the personal data obfuscation.
When selecting from a view, if the dataset returned is large enough, the MySQL optimiser materialises the view on disk and executes the query. The temporary files are removed when the query ends.

As a result, the analytics tools were slow under normal load. In busy periods the database became almost unusable. The analysts had to spend a lot of time tuning the existing queries rather than write new ones.

The general thinking was that MySQL was no longer a good fit. However the new solution had to satisfy requirements that were quite difficult to achieve with a single product change.

  • The data for analytics should be almost real time with the live database
  • The PII(personally identifiable information) should be obfuscated for general access
  • The PII should be available in clear for restricted users
  • The system should be able to scale for several years
  • The system should offer modern SQL for better analytics queries

The eye of the storm

The analyst team shortlisted a few solutions covering the requirements: Google BigQuery, Amazon Redshift, Snowflake and PostgreSQL.

Google BigQuery did not have the flexibility required for the new analytics DB. Redshift had more capability but was years behind Snowflake and pure PostgreSQL in terms of modern SQL. So both were removed from the list.

Both PostgreSQL and Snowflake offered very good performance and modern SQL.
But neither of them was able to replicate data from a MySQL database.

Snowflake

Snowflake is a cloud based data warehouse service. It's based on Amazon S3 and comes in different sizes. Its pricing system is very appealing and the preliminary tests showed Snowflake outperforming PostgreSQL.

The replication between our systems and Snowflake would happen using Fivetran, an impressive multi-technology data pipeline. Unfortunately there was just one little catch:
Fivetran doesn't have native support for obfuscation.

Customer data security is of the highest priority at TransferWise - if for any reason customer data needs to move outside our perimeter, it must always be obfuscated.

PostgreSQL

Foreseeing this issue, I decided to spend time building a proof of concept based on the replication tool pg_chameleon. The tool is written in Python and uses the python-mysql-replication library to read the MySQL replication protocol and replay the changes into a PostgreSQL database.

The initial tests on a reduced dataset were successful and adding support for the obfuscation in real time required minimal changes.
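The idea behind the real-time obfuscation is simple; the sketch below is only an illustration of it, not pg_chameleon's actual code, and the table/column configuration and the choice of SHA-256 are assumptions:

import hashlib

# Hypothetical configuration: which columns hold PII, per table.
OBFUSCATED_COLUMNS = {"users": {"email", "phone_number"}}

def obfuscate_row(table, row):
    # Applied to every replicated row change before it is written to the schema
    # exposed to analysts; PII values are replaced by a hash of their content.
    for column in OBFUSCATED_COLUMNS.get(table, ()):
        value = row.get(column)
        if value is not None:
            row[column] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return row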

The initial idea was to use PostgreSQL to obfuscate the data before feeding it into Fivetran.

However, because PostgreSQL's performance was good, with margins for scaling as our data grows, we decided to use just PostgreSQL for our data analytics and keep our customers' data behind our perimeter.

A ninja elephant

PostgreSQL offers better performance, and a stronger security model with improved resource optimisation.

The issues with the views' validity and speed are now just a bad memory.

Analysts can now use the complex analytics functions offered by PostgreSQL 9.5.
Large tables, previously unusable because of their size, are now partitioned with pg_pathman and their data is usable again.

Some code was optimised inside, but actually very little - maybe 10-20% was improved. We’ll do more of that in the future, but not yet. The good thing is that the performance gains we have can mostly be attributed just to PG vs MySQL. So there’s a lot of scope to improve further.
Jeff McClelland - Growth Analyst, data guru

Timing

Procedure                                            MySQL                     PgSQL      PgSQL cached
Daily ETL script                                     20 hours                  4 hours    N/A
Select from small table with complex aggregations    Killed after 20 minutes   3 minutes  1 minute
Large table scan with simple filters                 6 minutes                 2 minutes  6 seconds

Resources

Resource         MySQL     PostgreSQL
Storage          940 GB    670 GB
CPU              18        8
RAM              68 GB     48 GB
Shared Memory    40 GB     5 GB

Lessons learned

Never underestimate the resource consumption

During the development of the replica tool the initialisation process required several improvements.

The resources are always finite and the out-of-memory killer is always happy to remind us of this simple, but hard to understand, concept. Some tables required a custom slice size because their row length triggered the OOM killer when pulling out the data.

However, even after fixing the memory issues the initial copy took 6 days.

Tuning the copy with unbuffered cursors and row number estimates improved the initial copy speed; it now completes in 30 hours, including the time required for the index build.

Strictness is an illusion. MySQL doubly so

MySQL's lack of strictness is not a mystery.

The replication stopped because of the funny way NOT NULL is managed by MySQL.

To prevent any further replication breakdowns, fields with NOT NULL added by ALTER TABLE after the initialisation are created in PostgreSQL as nullable fields.

MySQL automatically truncates character strings at the varchar size. This is a problem if the field is obfuscated on PostgreSQL, because the hashed string might not fit into the corresponding varchar field. Therefore all character varying fields on the obfuscated schema are created as text.

Idle in transaction can kill your database

Over time I saw the PostgreSQL tables used for storing MySQL's row images grow to an unacceptable size (tens of GB). This was caused by misbehaving sessions left idle in transaction.

An idle-in-transaction session holds a database snapshot until it is committed or rolled back. This is bad because normal vacuuming doesn't reclaim the dead rows that could still be seen by the snapshot.

The quick fix was a cron job which removes those sessions. The long term fix was to address why those sessions appeared and fix the code causing the issue.
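For reference, here is a minimal sketch of what such a cleanup job can look like (not the actual cron job; the connection string and the 30-minute threshold are assumptions):

import psycopg2

conn = psycopg2.connect("dbname=analytics user=maintenance")
conn.autocommit = True
with conn.cursor() as cur:
    # Terminate sessions that have been idle in transaction for too long,
    # so vacuum can reclaim the dead rows their snapshots were holding back.
    cur.execute("""
        SELECT pg_terminate_backend(pid)
        FROM pg_stat_activity
        WHERE state = 'idle in transaction'
          AND xact_start < now() - interval '30 minutes'
    """)
conn.close()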

March 29, 2017

Four Years Remaining: Blockchain in Simple Terms

The following is an expanded version of an explanatory comment I posted here.

Alice's Diary

Alice decided to keep a diary. For that she bought a notebook, and started filling it with lines like:

  1. Bought 5 apples.
  2. Called mom.
    ...
  132. Gave Bob $250.
  133. Kissed Carl.
  134. Ate a banana.
    ...

Alice did her best to keep a meticulous account of events, and whenever she had a discussion with friends about something that happened earlier, she would quickly resolve all arguments by taking out the notebook and demonstrating her records. One day she had a dispute with Bob about whether she lent him $250 earlier or not. Unfortunately, Alice did not have her notebook at hand at the time of the dispute, but she promised to bring it tomorrow to prove Bob owed her money.

Bob really did not want to return the money, so that night he got into Alice's house, found the notebook, found line 132 and carefully replaced it with "132. Kissed Dave". The next day, when Alice opened the notebook, she did not find any records about money being given to Bob, and had to apologize for making a mistake.

Alice's Blockchain

A year later Bob's conscience got to him and he confessed his crime to Alice. Alice forgave him, but decided to improve the way she kept the diary, to avoid the risk of forging records in the future. Here's what she came up with. The operating system Linups that she was using had a program named md5sum, which could convert any text to its hash - a strange sequence of 32 characters. Alice did not really understand what the program did with the text, it just seemed to produce a sufficiently random sequence. For example, if you entered "hello" into the program, it would output "b1946ac92492d2347c6235b4d2611184", and if you entered "hello " with a space at the end, the output would be "1a77a8341bddc4b45418f9c30e7102b4".

Alice scratched her head a bit and invented the following way of making record forging more complicated for people like Bob in the future: after each record she would insert the hash obtained by feeding the md5sum program with the text of the record and the previous hash. The new diary now looked as follows:

    0000 (the initial hash, let us limit ourselves with just four digits for brevity)
  1. Bought 5 apples.
    4178 (the hash of "0000" and "Bought 5 apples")
  2. Called mom.
    2314 (the hash of "4178" and "Called mom")
    ...
    4492
  132. Gave Bob $250.
    1010 (the hash of "4492" and "Gave Bob $250")
  133. Kissed Carl.
    8204 (the hash of "1010" and "Kissed Carl")
    ...

Now each record was "confirmed" by a hash. If someone wanted to change the line 132 to something else, they would have to change the corresponding hash (it would not be 1010 anymore). This, in turn, would affect the hash of line 133 (which would not be 8204 anymore), and so on all the way until the end of the diary. In order to change one record Bob would have to rewrite confirmation hashes for all the following diary records, which is fairly time-consuming. This way, hashes "chain" all records together, and what was before a simple journal became now a chain of records or "blocks" - a blockchain.
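The toy four-digit hashes above are just an abbreviation; a real md5 digest has 32 characters. A minimal sketch of the chaining idea in Python:

import hashlib

def confirm(previous_hash, record):
    # The confirmation hash of a record is the hash of the previous hash
    # concatenated with the record text, exactly as in the diary above.
    return hashlib.md5((previous_hash + record).encode("utf-8")).hexdigest()

diary = ["Bought 5 apples.", "Called mom.", "Gave Bob $250.", "Kissed Carl."]
h = "0000"  # the initial hash
for record in diary:
    h = confirm(h, record)
    print(record, h)
# Changing any earlier record changes its hash and, with it, every hash after it.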

Proof-of-Work Blockchain

Time passed, and Alice opened a bank. She still kept her diary, which now included serious banking records like "Gave out a loan" or "Accepted a deposit". Every record was accompanied with a hash to make forging harder. Everything was fine, until one day a guy named Carl took a loan of $1000000. The next night a team of twelve elite Chinese diary hackers (hired by Carl, of course) got into Alice's room, found the journal and replaced the line "143313. Gave out a $1000000 loan to Carl" with a new version: "143313. Gave out a $10 loan to Carl". They then quickly recomputed all the necessary hashes for the following records. For a dozen hackers armed with calculators this did not take too long.

Fortunately, Alice saw one of the hackers retreating and understood what happened. She needed a more secure system. Her new idea was the following: let us append a number (called "nonce") in brackets to each record, and choose this number so that the confirmation hash for the record would always start with two zeroes. Because hashes are rather unpredictable, the only way to do it is to simply try out different nonce values until one of them results in a proper hash:

    0000
  1. Bought 5 apples (22).
    0042 (the hash of "0000" and "Bought 5 apples (22)")
  2. Called mom (14).
    0089 (the hash of "0042" and "Called mom (14)")
    ...
    0057
  132. Gave Bob $250 (33).
    0001
  133. Kissed Carl (67).
    0093 (the hash of "0001" and "Kissed Carl (67)")
    ...

To confirm each record one now needs to try, on average, about 50 different hashing operations for different nonce values, which makes it 50 times harder to add new records or forge them than previously. Hopefully even a team of hackers wouldn't manage in time. Because each confirmation now requires hard (and somewhat senseless) work, the resulting method is called a proof-of-work system.
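Here is a sketch of the nonce search in Python; note that with a real 32-character md5 digest, requiring two leading zeroes means trying considerably more nonces on average than in the four-digit toy example:

import hashlib

def confirm(previous_hash, record):
    return hashlib.md5((previous_hash + record).encode("utf-8")).hexdigest()

def find_nonce(previous_hash, record_text):
    # Try nonce values until the confirmation hash starts with two zeroes,
    # mirroring the "(22)", "(14)" numbers appended to the records above.
    nonce = 0
    while True:
        record = "%s (%d)." % (record_text, nonce)
        digest = confirm(previous_hash, record)
        if digest.startswith("00"):
            return record, digest
        nonce += 1

print(find_nonce("0000", "Bought 5 apples"))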

Distributed Blockchain

Tired of having to search for matching nonces for every record, Alice hired five assistants to help her maintain the journal. Whenever a new record needed to be confirmed, the assistants would start searching for a suitable nonce in parallel, until one of them completed the job. To motivate the assistants to work faster she allowed them to append the name of the person who found a valid nonce, and promised to give promotions to those who confirmed more records within a year. The journal now looked as follows:

    0000
  1. Bought 5 apples (29, nonce found by Mary).
    0013 (the hash of "0000" and "Bought 5 apples (29, nonce found by Mary)")
  2. Called mom (45, nonce found by Jack).
    0089 (the hash of "0013" and "Called mom (45, nonce found by Jack)")
    ...
    0068
  132. Gave Bob $250 (08, nonce found by Jack).
    0028
  133. Kissed Carl (11, nonce found by Mary).
    0041
    ...

A week before Christmas, two assistants came to Alice seeking a Christmas bonus. Assistant Jack showed a diary where he had confirmed 140 records and Mary had confirmed 130, while Mary showed a diary where she, reportedly, had confirmed more records than Jack. Each of them was showing Alice a journal with all the valid hashes, but different entries! It turned out that ever since finding out about the promotion the two assistants had been working hard to keep their own journals, such that all nonces would have their names. Since they had to maintain the journals individually, they had to do all the work of confirming records alone rather than splitting it among the other assistants. This of course made them so busy that they eventually had to miss some important entries about Alice's bank loans.

Consequently, Jack's and Mary's "own journals" ended up being shorter than the "real journal", which was, luckily, correctly maintained by the three other assistants. Alice was disappointed and, of course, gave neither Jack nor Mary a promotion. "I will only give promotions to assistants who confirm the most records in the valid journal", she said. And the valid journal is the one with the most entries, of course, because the most work has been put into it!

After this rule was established, the assistants had no more motivation to cheat by working on their own journals alone - a collective honest effort always produced a longer journal in the end. This rule allowed assistants to work from home and completely without supervision. Alice only needed to check that the journal had the correct hashes in the end when distributing promotions. This way, Alice's blockchain became a distributed blockchain.

Bitcoin

Jack happened to be much more effective at finding nonces than Mary and eventually became a Senior Assistant to Alice. He did not need any more promotions. "Could you transfer some of the promotion credits you got from confirming records to me?", Mary asked him one day. "I will pay you $100 for each!". "Wow", Jack thought, "apparently all the confirmations I did still have some value for me now!". They spoke with Alice and invented the following way to make "record confirmation achievements" transferable between parties.

Whenever an assistant found a matching nonce, they would not simply write their own name to indicate who did it. Instead, they would write their public key. The agreement with Alice was that the corresponding confirmation bonus would belong to whoever owned the matching private key:

    0000
  1. Bought 5 apples (92, confirmation bonus to PubKey61739).
    0032 (the hash of "0000" and "Bought 5 apples (92, confirmation bonus to PubKey61739)")
  2. Called mom (52, confirmation bonus to PubKey55512).
    0056 (the hash of "0032" and "Called mom (52, confirmation bonus to PubKey55512)")
    ...
    0071
  132. Gave Bob $250 (22, confirmation bonus to PubKey61739).
    0088
  133. Kissed Carl (40, confirmation bonus to PubKey55512).
    0012
    ...

To transfer confirmation bonuses between parties a special type of record would be added to the same diary. The record would state which confirmation bonus had to be transferred to which new public key owner, and would be signed using the private key of the original confirmation owner to prove it was really his decision:

    0071
  132. Gave Bob $250 (22, confirmation bonus to PubKey6669).
    0088
  133. Kissed Carl (40, confirmation bonus to PubKey5551).
    0012
    ...
    0099
  284. TRANSFER BONUS IN RECORD 132 TO OWNER OF PubKey1111, SIGNED BY PrivKey6669. (83, confirmation bonus to PubKey4442).
    0071

In this example, record 284 transfers the bonus for confirming record 132 from whoever it belonged to before (the owner of private key 6669, presumably Jack in our example) to a new party - the owner of private key 1111 (who could be Mary, for example). As it is still a record, there is also the usual bonus for having confirmed it, which went to the owner of private key 4442 (who could be John, Carl, Jack, Mary or whoever else - it does not matter here). In effect, record 284 now describes two different bonuses - one due to the transfer, and another for the confirmation. These, if necessary, can be further transferred to different parties later using the same procedure.
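The signing step works like any public-key signature scheme. A minimal sketch using the third-party cryptography package and Ed25519 keys (Bitcoin actually uses a different scheme, ECDSA over secp256k1, and a different record format; this is only an illustration):

from cryptography.hazmat.primitives.asymmetric import ed25519

# Jack's key pair; the public key is what appears in the diary as "PubKey...".
jack_private = ed25519.Ed25519PrivateKey.generate()
jack_public = jack_private.public_key()

statement = b"TRANSFER BONUS IN RECORD 132 TO OWNER OF PubKey1111"
signature = jack_private.sign(statement)

# Anyone holding the public key can check the claim; verify() raises
# InvalidSignature if the statement or the signature was tampered with.
jack_public.verify(signature, statement)
print("signature is valid")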

Once this system was implemented, it turned out that Alice's assistants and all their friends started actively using the "confirmation bonuses" as a kind of internal currency, transferring them between each other's public keys, even exchanging them for goods and actual money. Note that to buy a "confirmation bonus" one does not need to be Alice's assistant or register anywhere. One just needs to provide a public key.

This confirmation bonus trading activity became so prominent that Alice stopped using the diary for her own purposes, and eventually all the records in the diary would only be about "who transferred which confirmation bonus to whom". This idea of a distributed proof-of-work-based blockchain with transferable confirmation bonuses is known as Bitcoin.

Smart Contracts

But wait, we are not done yet. Note how Bitcoin is born from the idea of recording "transfer claims", cryptographically signed by the corresponding private key, into a blockchain-based journal. There is no reason we have to limit ourselves to this particular cryptographic protocol. For example, we could just as well make the following records:

    Transfer bonus in record 132 to whoever can provide signatures, corresponding to PubKey1111 AND PubKey3123.

This would be an example of a collective deposit, which may only be extracted by a pair of collaborating parties. We could generalize further and consider conditions of the form:

    Transfer bonus in record 132 to whoever first provides x, such that f(x) = \text{true}.

Here f(x) could be any predicate describing a "contract". For example, in Bitcoin the contract requires x to be a valid signature, corresponding to a given public key (or several keys). It is thus a "contract", verifying the knowledge of a certain secret (the private key). However, f(x) could just as well be something like:

    \[f(x) = \text{true, if }x = \text{number of bytes in record #42000},\]

which would be a kind of a "future prediction" contract - it can only be evaluated in the future, once record 42000 becomes available. Alternatively, consider a "puzzle solving contract":

    \[f(x) = \text{true, if }x = \text{valid, machine-verifiable}\]

    \[\qquad\qquad\text{proof of a complex theorem},\]

Finally, the first part of the contract, namely the phrase "Transfer bonus in record ..." could also be fairly arbitrary. Instead of transferring "bonuses" around we could just as well transfer arbitrary tokens of value:

  284. Whoever first provides x, such that f(x) = \text{true} will be DA BOSS.
    ...
    x = 42 satisfies the condition in record 284.
    Now and forever, John is DA BOSS!

The value and importance of such arbitrary tokens will, of course, be determined by how they are perceived by the community using the corresponding blockchain. It is not unreasonable to envision situations where being DA BOSS gives certain rights in the society, and having this fact recorded in an automatically-verifiable public record ledger makes it possible to include this knowledge in various automated systems (e.g. consider a door lock which would only open to whoever is currently known as DA BOSS in the blockchain).

Honest Computing

As you see, we can use a distributed blockchain to keep journals, transfer "coins" and implement "smart contracts". These three applications are, however, all consequences of one general, core property. The participants of a distributed blockchain ("assistants" in the Alice example above, or "miners" in Bitcoin-speak) are motivated to precisely follow all rules necessary for confirming the blocks. If the rules say that a valid block is the one where all signatures and hashes are correct, the miners will make sure these indeed are. If the rules say that a valid block is the one where a contract function needs to be executed exactly as specified, the miners will make sure it is the case, etc. They all seek to get their confirmation bonuses, and they will only get them if they participate in building the longest honestly computed chain of blocks.

Because of that, we can envision blockchain designs where a "block confirmation" requires running arbitrary computational algorithms, provided by the users, and the greedy miners will still execute them exactly as stated. This general idea lies behind the Ethereum blockchain project.

There is just one place in the description provided above where miners have some motivational freedom to not be perfectly honest. It is the decision about which records to include in the next block to be confirmed (or which algorithms to execute, if we consider the Ethereum blockchain). Nothing really prevents a miner from refusing to ever confirm a record "John is DA BOSS", ignoring it as if it never existed at all. This problem is overcome in modern blockchains by having users offer an additional "tip money" reward for each record included in the confirmed block (or for every algorithmic step executed on the Ethereum blockchain). This aligns the motivation of the network towards maximizing the number of records included, making sure none is lost or ignored. Even if some miners had something against John being DA BOSS, there would probably be enough other participants who would not turn down the opportunity of getting an additional tip.

Consequently, the whole system is economically incentivised to follow the protocol, and the term "honest computing" seems appropriate to me.

Now that you know how things work, feel free to transfer all your bitcoins (i.e. block confirmation bonuses for which you know the corresponding private keys) to the address 1JuC76CX4FGo3W3i2Xv7L86Vz4chHHg71m (i.e. a public key, to which I know the corresponding private key).

March 27, 2017

Four Years Remaining: Implication and Provability

Consider the following question:

Which of the following two statements is logically true?

  1. All planets of the Solar System orbit the Sun. The Earth orbits the Sun. Consequently, the Earth is a planet of the Solar System.
  2. God is the creator of all things which exist. The Earth exists. Consequently, God created the Earth.

I've seen this question or variations of it pop up as "provocative" posts in social networks several times. At times they might invite lengthy discussions, where the participants would split into camps - some claim that the first statement is true, because Earth is indeed a planet of the Solar System and God did not create the Earth. Others would laugh at the stupidity of their opponents and argue that, obviously, only the second statement is correct, because it makes a valid logical implication, while the first one does not.

Not once, however, have I seen a proper formal explanation of what is happening here. And although it is fairly trivial (once you know it), I guess it is worth writing up. The root of the problem here is the difference between implication and provability - something I myself remember struggling a bit to understand when I first encountered these notions in a course on mathematical logic years ago.

Indeed, any textbook on propositional logic will tell you in one of the first chapters that you may write

    \[A \Rightarrow B\]

to express the statement "A implies B". A chapter or so later you will learn that there is also a possibility to write

    \[A \vdash B\]

to express a confusingly similar statement, that "B is provable from A". To confirm your confusion, another chapter down the road you should discover that A \Rightarrow B is the same as \vdash A \Rightarrow B, which, in turn, is logically equivalent to A \vdash B. Therefore, indeed, whenever A \Rightarrow B is true, A \vdash B is true, and vice versa. Is there a difference between \vdash and \Rightarrow then, and why do we need the two different symbols at all? The "provocative" question above provides an opportunity to illustrate this.

The spoken language is rather informal, and there can be several ways of formally interpreting the same statement. Both statements in the puzzle are given in the form "A, B, consequently C". Here are at least four different ways to put them formally, which make the two statements true or false in different ways.

The Pure Logic Interpretation

Anyone who has enough experience solving logic puzzles would know that both statements should be interpreted as abstract claims about provability (i.e. deducibility):

    \[A, B \vdash C.\]

As mentioned above, this is equivalent to

    \[(A\,\&\, B) \Rightarrow C.\]

or

    \[\vdash (A\,\&\, B) \Rightarrow C.\]

In this interpretation the first statement is wrong and the second is a correct implication.

The Pragmatic Interpretation

People who have less experience with math puzzles would often assume that they should not exclude their common sense knowledge from the task. The corresponding formal statement of the problem then becomes the following:

    \[[\text{common knowledge}] \vdash (A\,\&\, B) \Rightarrow C.\]

In this case both statements become true. The first one is true simply because the consequent C is true on its own, given common knowledge (the Earth is indeed a planet) - the antecedents and provability do not play any role at all. The second is true because it is a valid reasoning, independently of the common knowledge.

This type of interpretation is used in rhetorical phrases like "If this is true, I am a Dutchman".

The Overly Strict Interpretation

Some people may prefer to believe that a logical statement should only be deemed correct if every single part of it is true and logically valid. The two claims must then be interpreted as follows:

    \[([\text{common}] \vdash A)\,\&\, ([\text{common}] \vdash B)\,\&\, (A, B\vdash C).\]

Here the issue of provability is combined with the question about the truthfulness of the facts used. Both statements are false - the first fails on logic, and the second on facts (assuming that God creating the Earth is not part of common knowledge).

The Oversimplified Interpretation

Finally, people very unfamiliar with strict logic would sometimes tend to ignore the words "consequently", "therefore" or "then", interpreting them as a kind of an extended synonym for "and". In their minds the two statements could be regarded as follows:

    \[[\text{common}] \vdash A\,\&\, B\,\&\, C.\]

From this perspective, the first statement becomes true and the second (again, assuming the aspects of creation are not commonly known) is false.

Although the author of the original question most probably did really assume the "pure logic" interpretation, as is customary for such puzzles, note how much leeway there can be when converting a seemingly simple phrase in English to a formal statement. In particular, observe that questions about provability, where you deliberately have to abstain from relying on common knowledge, may be different from questions about facts and implications, where common sense may (or must) be assumed and you can sometimes skip the whole "reasoning" part if you know the consequent is true anyway.

Here is a quiz question to check whether you understood what I meant to explain.

"The sky is blue, and therefore the Earth is round." True or false?

March 26, 2017

TransferWise Tech Blog: 5 Tips for Getting More Out of Your Unit Tests

State of Application Design

In the vast majority of applications I have seen, the domain logic is implemented using a set of Service classes (Transaction Scripts). The majority of these are based on the DB structure. Entities are typically quite thin DTOs that have little or no logic.

The main benefit of this kind of architecture is that it is very simple and indeed often good enough as a starting point. However, the problem is that over time, as the application gets more complex, this kind of approach does not scale too well. Often you end up with Services that call 6 - 8 other Services. Many of these Services have no clear responsibilities but are built in an ad-hoc manner as wrappers of existing Services, adding tiny bits of logic needed for some specific new feature.

So how do you avoid or dig yourself out from this kind of architecture? One approach I have found very useful is looking at the unit tests while writing them. By listening to what my tests are trying to tell me I am able to build a much better design. This is nothing else but the "Driven" part in TDD, which everybody knows about but which is still quite hard to understand.

Indeed, it is quite easy to write tests before production code but at the same time not let these tests have any significant effect on the production code. Sometimes there is also the thinking that testing is supposed to be hard, in which case it is particularly easy to ignore the "smells" coming from the tests.

The following are some rules I try to follow when writing tests. I have found that these ideas help me avoid fighting my tests, and as a result not only are the tests better but so is the production code.

In the following text I use "spec" to refer to a single test class/file.

Rule 1: when a spec is more than 120 lines then split it

When the spec is too long I am not able to grasp it quickly anymore. The specific number does not matter, but I have found around 120 lines to be a good threshold for myself. With a very large test file it gets hard to detect duplication/overlap when adding new test methods. Also, it becomes harder to understand the behavior being tested.

Rule 2: when test names have duplication it is often a sign that you should split the spec

Typically unit tests are 1:1 mapped to each production class. So tests often need to specify what exact part of the target class is being tested. This is especially common for the above mentioned Services which are often just collections of different kinds of procedures.

Let's say that we have a PaymentMethodService which has tests like:

def "when gets payment methods for EUR then returns single card method"()  
def "when gets payment methods for non-EUR then returns debit and credit as separate methods"()  
def "when gets payment methods then returns only enabled methods"()  
def "when gets payment methods for a known user then orders them based on past usage"()  
def "when gets payment methods for transfer amount > 2000 GBP then returns bank transfer as the first method"()  
...

These tests all repeat when gets payment methods. So maybe we can create a new spec for getting payment methods and just drop the duplicated prefix from all of the test names. The result will be:

class GetPaymentMethodsSpec {  
  def "returns only enabled methods"()
  def "when user is known then orders methods based on past usage"()
  def "for transfer amount > 2000 GBP bank transfer is the first method"()
  ...
}

Note that the spec name does not contain the name of any production class. If I can find a good name that contains the tested class I don't mind, but if it gets in the way then I'm willing to let go of Ctrl+Shift+T. This aligns with Uncle Bob's idea that test and production code evolve in different directions.

Rule 3: when you have split an overly long spec, always consider whether you should split/extract something in the production code as well

If there are many tests for something, the tested behavior is complex, and if something is complex it should be split apart. Lines of code are often not a good indicator of complexity, as you can easily hide multiple branches/conditions in a single line.

Continuing the previous example: if we have multiple tests around the ordering of payment methods, it may be a good sign that the ordering could be extracted into a separate class like PaymentMethodOrder, as sketched below.
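As a rough sketch of what such an extraction might look like (illustrative only: the shape of PaymentMethod, the usage counter and the 2000 GBP threshold are guesses taken from the test names above, and Kotlin is used here purely as example syntax):

import java.math.BigDecimal

// Hypothetical value object; the fields are inferred from the test names above.
data class PaymentMethod(val type: String, val pastUsageCount: Int)

// The ordering rules pulled out of PaymentMethodService into their own concept,
// so they can be covered by a small, focused spec of their own.
class PaymentMethodOrder(private val transferAmountGbp: BigDecimal) {

    fun sorted(methods: List<PaymentMethod>): List<PaymentMethod> =
        if (transferAmountGbp > BigDecimal("2000"))
            // Large transfers: bank transfer first, the rest keep their relative order.
            methods.sortedByDescending { it.type == "BANK_TRANSFER" }
        else
            // Otherwise prefer the methods the user has used most in the past.
            methods.sortedByDescending { it.pastUsageCount }
}

PaymentMethodService would then only delegate to PaymentMethodOrder, and the ordering tests could move out of GetPaymentMethodsSpec into a spec of their own.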

Rule 4: when a test contains a lot of interactions, introduce some new concept in the production code

Looking at the tests for such Transaction Script Services, they often contain a lot of interactions. This makes writing the tests very hard, and rightly so: there is clearly too much going on at once, and we are better off splitting it up.

Rule 5: extract a new class when you find yourself wanting to stub out a method of the tested class

When you think you need to partially mock/stub the class you are testing, it is generally a bad idea. What the test is telling you is that you have too much behavior crammed together.

You have 2 choices:

  • don't mock it and use the production implementation
  • if your test becomes too complex or you need too many similar tests, extract that logic into a separate class and test that part of the behavior separately (see the sketch below)
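A minimal sketch of the second option (the names FeeCalculator and PaymentService, and the fee rate, are invented here for illustration; the post itself gives no concrete code for this rule):

import java.math.BigDecimal

// Before: the spec wants to stub out a fee-calculation method on the class under test.
// After: the calculation lives in its own class with its own focused tests,
// and the service simply collaborates with it.
class FeeCalculator {
    // Assumed flat 0.5% rate, purely for illustration.
    fun feeFor(amount: BigDecimal): BigDecimal = amount.multiply(BigDecimal("0.005"))
}

class PaymentService(private val feeCalculator: FeeCalculator) {
    fun totalToCharge(amount: BigDecimal): BigDecimal = amount.add(feeCalculator.feeFor(amount))
}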

You can also check out my post from a few years ago for more tips on writing good unit tests.

Used still from Ridley Scott's Blade Runner

March 20, 2017

Four Years Remaining: The Schrödinger's Cat Uncertainty

Ever since Erwin Schrödinger described a thought experiment in which a cat in a sealed box happened to be "both dead and alive at the same time", popular science writers have been relying on it heavily to convey the mysteries of quantum physics to the layman. Unfortunately, instead of providing any useful intuition, this example has laid a solid base for a whole bunch of misconceptions. Having read or heard something about the strange cat, people tend to jump to profound conclusions, such as "according to quantum physics, cats can be both dead and alive at the same time" or "the notion of a conscious observer is important in quantum physics". All of these are wrong, as is the image of a cat that is "both dead and alive at the same time". The corresponding Wikipedia page does not stress this fact well enough, hence I thought the Internet might benefit from yet another explanatory post.

The Story of the Cat

The basic notion in quantum mechanics is a quantum system. Pretty much anything could be modeled as a quantum system, but the most common examples are elementary particles, such as electrons or photons. A quantum system is described by its state. For example, a photon has polarization, which could be vertical or horizontal. Another prominent example of a particle's state is its wave function, which represents its position in space.

There is nothing special about saying that things have state. For example, we may say that any cat has a "liveness state", because it can be either "dead" or "alive". In quantum mechanics we would denote these basic states using the bra-ket notation as |\mathrm{dead}\rangle and |\mathrm{alive}\rangle. The strange thing about quantum mechanical systems, though, is the fact that quantum states can be combined together to form superpositions. Not only could a photon have a purely vertical polarization \left|\updownarrow\right\rangle or a purely horizontal polarization \left|\leftrightarrow\right\rangle, but it could also be in a superposition of both vertical and horizontal states:

    \[\left|\updownarrow\right\rangle + \left|\leftrightarrow\right\rangle.\]

This means that if you asked the question "is this photon polarized vertically?", you would get a positive answer with 50% probability - in another 50% of cases the measurement would report the photon as horizontally-polarized. This is not, however, the same kind of uncertainty that you get from flipping a coin. The photon is not either horizontally or vertically polarized. It is both at the same time.

Amazed by this property of quantum systems, Schrödinger attempted to construct an example, where a domestic cat could be considered to be in the state

    \[|\mathrm{dead}\rangle + |\mathrm{alive}\rangle,\]

which means being both dead and alive at the same time. The example he came up with, in his own words (citing from Wikipedia), is the following:

A cat is penned up in a steel chamber, along with the following device (which must be secured against direct interference by the cat): in a Geiger counter, there is a tiny bit of radioactive substance, so small, that perhaps in the course of the hour one of the atoms decays, but also, with equal probability, perhaps none; if it happens, the counter tube discharges and through a relay releases a hammer that shatters a small flask of hydrocyanic acid. If one has left this entire system to itself for an hour, one would say that the cat still lives if meanwhile no atom has decayed. The first atomic decay would have poisoned it.

The idea is that after an hour of waiting, the radioactive substance must be in the state

    \[|\mathrm{decayed}\rangle + |\text{not decayed}\rangle,\]

the poison flask should thus be in the state

    \[|\mathrm{broken}\rangle + |\text{not broken}\rangle,\]

and the cat, consequently, should be

    \[|\mathrm{dead}\rangle + |\mathrm{alive}\rangle.\]

Correct, right? No.

The Cat Ensemble

Superposition, which is being "in both states at once" is not the only type of uncertainty possible in quantum mechanics. There is also the "usual" kind of uncertainty, where a particle is in either of two states, we just do not exactly know which one. For example, if we measure the polarization of a photon, which was originally in the superposition \left|\updownarrow\right\rangle + \left|\leftrightarrow\right\rangle, there is a 50% chance the photon will end up in the state \left|\updownarrow\right\rangle after the measurement, and a 50% chance the resulting state will be \left|\leftrightarrow\right\rangle. If we do the measurement, but do not look at the outcome, we know that the resulting state of the photon must be either of the two options. It is not a superposition anymore. Instead, the corresponding situation is described by a statistical ensemble:

    \[\{\left|\updownarrow\right\rangle: 50\%, \quad\left|\leftrightarrow\right\rangle: 50\%\}.\]

Although it may seem that the difference between a superposition and a statistical ensemble is a matter of terminology, it is not. The two situations are truly different and can be distinguished experimentally. Essentially, every time a quantum system is measured (which happens, among other things, every time it interacts with a non-quantum system) all the quantum superpositions are "converted" to ensembles - concepts native to the non-quantum world. This process is sometimes referred to as decoherence.

Now recall the Schrödinger's cat. For the cat to die, a Geiger counter must register a decay event, triggering a killing procedure. The registration within the Geiger counter is effectively an act of measurement, which will, of course, "convert" the superposition state into a statistical ensemble, just like in the case of a photon which we just measured without looking at the outcome. Consequently, the poison flask will never be in a superposition of being "both broken and not". It will be either, just like any non-quantum object should. Similarly, the cat will also end up being either dead or alive - you just cannot know exactly which option it is before you peek into the box. Nothing special or quantum'y about this.

The Quantum Cat

"But what gives us the right to claim that the Geiger counter, the flask and the cat in the box are "non-quantum" objects?", an attentive reader might ask here. Could we imagine that everything, including the cat, is a quantum system, so that no actual measurement or decoherence would happen inside the box? Could the cat be "both dead and alive" then?

Indeed, we could try to model the cat as a quantum system with |\mathrm{dead}\rangle and |\mathrm{alive}\rangle being its basis states. In this case the cat indeed could end up in the state of being both dead and alive. However, this would not be its most exciting capability. Way more surprisingly, we could then kill and revive our cat at will, back and forth, by simply measuring its liveness state appropriately. It is easy to see how this model is unrepresentative of real cats in general, and the worry about them being able to be in superposition is just one of the many inconsistencies. The same goes for the flask and the Geiger counter, which, if considered to be quantum systems, would get the magical abilities to "break" and "un-break", "measure" and "un-measure" particles at will. Those would certainly not be a real-world flask or counter anymore.

The Cat Multiverse

There is one way to bring quantum superposition back into the picture, although it requires some rather abstract thinking. There is a theorem in quantum mechanics, which states that any statistical ensemble can be regarded as a partial view of a higher-dimensional superposition. Let us see what this means. Consider a (non-quantum) Schrödinger's cat. As it might be hopefully clear from the explanations above, the cat must be either dead or alive (not both), and we may formally represent this as a statistical ensemble:

    \[\{\left|\text{dead}\right\rangle: 50\%, \quad\left|\text{alive}\right\rangle: 50\%\}.\]

It turns out that this ensemble is mathematically equivalent in all respects to a superposition state of a higher order:

    \[\left|\text{Universe A}, \text{dead}\right\rangle + \left|\text{Universe B}, \text{alive}\right\rangle,\]

where "Universe A" and "Universe B" are some abstract, unobservable "states of the world". The situation can be interpreted by imagining two parallel universes: one where the cat is dead and one where it is alive. These universes exist simultaneously in a superposition, and we are present in both of them at the same time, until we open the box. When we do, the universe superposition collapses to a single choice of the two options and we are presented with either a dead, or a live cat.

Yet, although the universes happen to be in a superposition here, existing both at the same time, the cat itself remains completely ordinary, being either totally dead or fully alive, depending on the chosen universe. The Schrödinger's cat is just a cat, after all.

March 07, 2017

Four Years Remaining: The Difficulties of Self-Identification

Ever since the "Prior Confusion" post I was planning to formulate one of its paragraphs as the following abstract puzzle, but somehow it took me 8 years to write it up.

According to fictional statistical studies, the following is known about a fictional chronic disease "statistite":

  1. About 30% of people in the world have statistite.
  2. About 35% of men in the world have it.
  3. In Estonia, 20% of people have statistite.
  4. Out of people younger than 20 years, just 5% have the disease.
  5. A recent study of a random sample of visitors to the Central Hospital demonstrated that 40% of them suffer from statistite.

Mart, a 19-year-old Estonian male medical student, is standing in the foyer of the Central Hospital, reading these facts from an information sheet and wondering: what are his current chances of having statistite? How should he model himself: should he consider himself primarily "an average man", "a typical Estonian", "just a young person", or "an average visitor of the hospital"? Could he combine the different aspects of his personality to make better use of the available information? How? In general, what would be the best possible probability estimate, given the data?

March 02, 2017

Ingmar Tammeväli: In defense of the forests…

So far I have had no need to curse and rant here; we have thousands of Delfi commenters for that.

But now I had to write a decidedly non-technical post. I am not a forestry specialist, but what my eyes see is horrible.
Since most people can already see this horror, I felt it was time to speak up as well.

Right now some kind of war has started against Estonian forests: essentially, wherever you drive there are ravaged forest plots where nothing grows anymore.
Shady companies have appeared that comb through property registers and felling notices and pressure owners over the phone: sell, sell.

Essentially, and without irony, our beautiful forests already look like a failed Brazilian wax on a sheep…

Questions I have:
* Why is it allowed to clear-cut large forest tracts without having to plant anything in their place?
My proposal: before any felling may begin at all, a forestry official (for example from the municipality) performs an assessment, and when the felling is done, a deposit of 35% of the forest's value is charged.
In plain language: plant a new forest in its place (within 6 months) and you get the 35% back; do not plant, and you forfeit the money.

* Forest-haulage tractors and timber trucks are destroying village roads. The requirement to restore them within 6-7 months seems to have been a joke; most forestry companies do not do it, and the officials are fairly toothless. The police cannot be bothered to deal with them; in plain language, they have no resources.

* Why was the age limit at which spruce stands may be felled lowered?

The gist of this whole text: dear politicians, if you have any respect at all for Estonia's values, stop this mafia-style forest management. This is not management, it is clear-cutting!

OECD: only one developed industrial country logs its forests more intensively than Estonia


February 23, 2017

Kuido tehnokajam: Closing a screen-overlay window with the ESC key

Surprisingly, I did not find a simple solution for this; you have to fiddle with JavaScript. Visually it looks like this: you click somewhere and a window with a screen overlay opens. We use a ModalPopupExtender, in front of which it shows the contents of a UserControl: <asp:Label runat="server" ID="HForModal" style="display: none" /> <asp:Panel runat="server" ID="P1" ScrollBars="Auto" Wrap="true" Width="80%" CssClass="modalPopup">

Raivo Laanemets: Chrome 56 on Slackware 14.1

Chrome 56 on Slackware 14.1 requires an upgraded mozilla-nss package. Without the upgrade you get errors on some HTTPS pages, including on google.com itself:

Your connection is not private.

with a detailed error code below:

NET::ERR_CERT_WEAK_SIGNATURE_ALGORITHM

The error comes from a bug in the NSS package. This is explained here in more detail. Slackware maintainers have released upgrades to the package. Upgrading the package and restarting Chrome fixes the error.

February 18, 2017

Anton Arhipov: Java EE meets Kotlin

Here's an idea - what if one tried implementing a Java EE application with the Kotlin programming language? I thought a simple example, a servlet with an injected CDI bean, would be sufficient for a start.

Start with a build script:

<script src="https://gist.github.com/antonarhipov/db4f4002c6a1813d349b.js"></script>

And the project structure is as follows:

Here comes the servlet:

<script src="https://gist.github.com/antonarhipov/4fbf350a6a0cdb06ff86.js"></script>

What's cool about it?

First, it is Kotlin, and it works with the Java EE APIs - that is nice! Second, I kind of like the ability to set aliases for the imported classes: import javax.servlet.annotation.WebServlet as web, in the example.


What's ugly about it?

Safe calls everywhere. As we're working with Java APIs, we're forced to use safe calls in Kotlin code. This is ugly.


Next, in Kotlin the field has to be initialized, so initializing the 'service' field with a null reference creates a "nullable" type. This also forces us to use either the safe call or the !! operator later in the code. My attempt to "fix" this by using a constructor parameter instead of the field failed: the CDI container could not satisfy the dependency on startup.


Alternatively, we could initialize the field with an instance of HelloService. Then the container would re-initialize the field with the real CDI proxy and the safe call would not be required.
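Putting the points above together, the servlet presumably looks roughly like the following. This is a sketch reconstructed from the description rather than the author's actual gist; HelloService, its sayHello() method and the URL pattern are assumed for illustration.

import javax.inject.Inject
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse
import javax.servlet.annotation.WebServlet as web // the import alias mentioned above

// Stand-in for the injected CDI bean described in the post.
class HelloService {
    fun sayHello() = "Hello from CDI"
}

@web(urlPatterns = arrayOf("/hello"))
class HelloServlet : HttpServlet() {

    // The field must be initialized, so it becomes a nullable type
    // and forces safe calls (or !!) later in the code.
    @Inject
    var service: HelloService? = null

    override fun doGet(req: HttpServletRequest?, resp: HttpServletResponse?) {
        // Safe calls everywhere, as complained about above.
        resp?.writer?.write(service?.sayHello() ?: "no service")
    }
}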


Conclusions

It is probably too early to say anything for sure, as the demo application is so small. One would definitely need to write much more code to uncover the corner cases. However, some of the outcomes are quite obvious:

  • Using Kotlin in a Java web application appears to be quite seamless.
  • The use of Java APIs creates the need for safe calls in Kotlin, which doesn't look very nice.

February 06, 2017

TransferWise Tech Blog: When to Adopt the Next Cool Technology?

What should be the criteria for an organization to decide when it is a good time to update its toolbox?

Recently there has been a lot of discussion about the fatigue around JavaScript and frontend tools in general. Although it seems to be more painful on the frontend, the problem is neither specific to the frontend nor anything new or recent. There are two sides to this. One is the effect it has on one's personal development. The other is how it affects organizations. More specifically, how should an organization decide when it is a good time to bring in new tool/framework/language X?

When we recently discussed this topic my colleague Jordan Valdma came up with the following formula to decide when adoption makes sense:


new features + developer coolness > cost of adoption

Cost of Adoption

Introducing anything new means a loss of efficiency until you have mastered it well enough. Following the model of Shu-Ha-Ri (follow-detach-fluent), it may be relatively easy to get to the first level, "following". However, it is only when moving to the next levels that one starts cashing in more of the potential value. That means looking beyond the specific feature set of the tool, searching for ways to decouple oneself from it and employ it for something more fundamental. One of my favorite examples is using hexagonal architecture with Ruby on Rails.

New Features

By new features I mean the things that are actually valuable for your product. There are many aspects of any new thing that are hard to measure and quite subjective; these should not go here. For example, "allows writing more maintainable code" is very hard to prove and seems more like something one may choose to believe or not. However, there are also things like "supports server-side rendering". If we know our product could take advantage of this, then it is a good, objective reason for adoption.

Developer Coolness

I think when it comes to new/cool technologies it is always good to be pragmatic. In an organization that is heavily business/outcome oriented it may seem that there should be no room for non-rational arguments like how someone feels about some new language/library.

However, it is quite dangerous to completely ignore the attractiveness aspect of technology. There are two points to keep in mind. First, all good devs like to expand their skill set. Second, technologies that have a certain coolness about them tend to build stronger communities around them and hence have the potential to grow even more compelling features.

February 01, 2017

TransferWise Tech Blog: Building TransferWise or the road to a product engineer

Soon it will be my 5-year anniversary at TransferWise. I looked back. I wrote down what came to mind.

I was hired as an engineer. I thought I was hired to write code, and that is what I started doing. A simple duty: take a task from the ticketing system, implement it and move on to the next one. Easy. One of my first tickets was the following: "add a checkbox to the page with a certain functionality". Easy. I did that, and then Kristo asked for a call and asked me a very simple question: "Why have you done it?". I tried to reply something, but other questions followed... You know how I felt? I felt miserable, confused and disoriented. I remember saying clearly, "I feel very stupid." Kristo replied, "It is fine." Then we had a long chat and I spent the next couple of weeks on that task. I talked to people, trying to understand why that checkbox was needed and what it really meant. I designed a new layout for the page. I implemented the solution. Since then I kept coding. I still believed that it was my duty and this is what I was hired for. But guess what? Kristo kept asking questions. Slowly but steadily it dawned on me that it is not the coding that I am supposed to be doing. I found myself doing a variety of activities: talking to customers and analysing their behavior, supporting new joiners and building the team, designing pages, building a vision, many other things and, of course, writing code.

At some point I understood: this had stopped being easy. It had become very hard and challenging. All kinds of questions were floating through my head, including these: "Why was I hired at all?". "What should I be doing?". "Am I valuable?". "What is my value?". "What was my impact lately?" An example from my own life helped me clear this up. I have a piece of land and I set out to build a house. I researched the topic. I earned the money needed to fund it. I chose an architectural plan. I found workers. I organised the delivery of building materials. If I am asked about it, I will clearly say: "I am building a house". Then I realised: what if the workers I found are asked as well? Their reply will be exactly the same: "I am building a house". This fact amazed me. Our activities are quite different, but together we are all building that house.

This analogy helped me massively. I came to a simple conclusion: I am here to build and grow TransferWise. Building TransferWise is what is expected from me. Building TransferWise means a variety of different activities. It may be putting bricks together to create a wall. It may be designing the interior and exterior. It may be organising the delivery of materials. It may be talking to others who have built houses and are living in them. It may be finding and hiring builders. It might be visiting builders in the hospital when they get sick.

It also helped me understand why I am doing it in the first place. With my own house it is easy, because it is me who will be living there :) All the other houses in the world are likewise constructed for someone to live in. I can’t imagine builders going for: “Let’s start building walls and then we will figure out how many floors we can get to and see if anyone happens to live in that construction.” It always starts from the consideration of people, their needs and their wishes. In the case of TransferWise, from thinking of the customers who will be using it.

That said, I was foolish to evaluate myself by the engineering tasks I had finished. I was foolish to think that what I was used to doing is what I should be doing. Nowadays my aim is to make things happen. My aim is to figure out what needs to be done and do it. My measure of myself is not the lines of code or the number of meetings I've had. It is not about the number of bricks I’ve placed. My goal is to have people living in the houses I’ve built. My goal is to see them living a happy life there. My goal is to see happy TransferWise customers.

Eventually my title changed from engineer to product engineer and then to product manager. I am not fully skilled for my job and I constantly make mistakes. But I try and keep trying. My life has become easy again. I have found a better way to be an engineer.

January 22, 2017

Anton Arhipov: Twitterfeed #4

Welcome to the fourth issue of my Twitterfeed. I'm still quite irregular at posting the links, but here are some interesting articles that I think are worth sharing.

News, announces and releases


Atlassian acquired Trello. OMG! I mean... happy for the Trello founders. I just hope the product remains as good as it was.

Docker 1.13 was released. Using compose-files to deploy swarm mode services is really cool! The new monitoring and build improvements are handy. Also Docker is now AWS and Azure-ready, which is awesome!

Kotlin 1.1 beta was published with a number of interesting new features. I have mixed feelings, however. For instance, I really find type aliases an awesome feature, but the definition keyword, "typealias", feels too verbose. Just "alias" would have been much nicer.
Meanwhile, Kotlin support was announced for Spring 5. I think this is great - Kotlin support in the major frameworks will definitely help the adoption.

Is there anyone using Eclipse? [trollface] Buildship 2.0 for Eclipse is available, go grab it! :)

Resonating articles


RethinkDB: Why we failed. Probably the best post-mortem I have ever read. You will notice a strange kvetch at first about the tough market and how no one wants to pay, but reading on, the author honestly lists what really went wrong. Sad that it didn't take off; it was a great project.

The Dark Path - probably the most contentious blog post I've read recently. Robert Martin gives his take on Swift and Kotlin. A lot of people, the proponents of strong typing, reacted to this blog post immediately. "Types are tests!", they said. However, I felt like Uncle Bob wrote this article just to repeat his point about tests: "it doesn't matter if your programming language is strongly typed or not, you should write tests". No one would disagree with that statement, I believe. However, the follow-up article was just strange: "I consider the static typing of Swift and Kotlin to have swung too far in the statically type-checked direction." OMG, really!? Has Robert seen Scala or Haskell? Or Idris? IMO, Swift and Kotlin hit the sweet spot with a type system that actually _helps_ developers without getting in the way. Quite a disappointing read, I have to say.

Java 9


JDK 9 is feature complete. That is great news. Now it would be nice to see how the ecosystem will survive all the issues related to reflective access. Workarounds exist, but there should be a proper solution without such hacks. Jigsaw has caused a lot of concern here and there, but the bet is that in the long run the benefits will outweigh the inconveniences.

Misc


The JVM is not that heavy
15 tricks for every web dev
Synchronized decorators
Code review as a gateway
How to build a minimal JVM container with Docker and Alpine Linux
Lagom, the monolith killer
Reactive Streams and the weird case of backpressure
Closures don’t mean mutability.
How do I keep my git fork up to date?

Predictions for 2017


Since it is the beginning of 2017, it is trendy to make predictions for the trends of the upcoming year. Here are some predictions by industry thought leaders:

Adam Bien’s 2017 predictions
Simon Ritter’s 2017 predictions
Ted Neward’s 2017 predictions

January 04, 2017

TransferWise Tech Blog: Effective Reuse on Frontend

In my previous post I discussed the cost of reuse and some strategies for dealing with it on the backend. What about the frontend? In terms of reuse the two are very similar. When we have more than just a few teams regularly contributing to the frontend, we need to start thinking about how we approach reuse across different contexts/teams.

Exposing some API of our microservice to other teams makes it a published interface. Once this is done, we cannot change it that easily anymore. The same happens on the frontend when a team decides to "publish" some frontend component to be reused by other teams. The API (as well as the look) of this component becomes part of the contract exposed to the outside world.

Hence I believe that:

We should split web frontend into smaller pieces — microapps — much the same way as we split backend into microservices. Development and deployment of these microapps should be as independent of each other as possible.

This aligns quite well with the ideas of Martin Fowler, James Lewis and Udi Dahan, who suggest that "microservice" is not a backend-only concept. Instead of by process boundaries, it should be defined by business capabilities and include its own UI if necessary.

Similarly to microservices we want to promote reuse within each microapp while we want to be careful with reuse across different microapps/teams.

January 02, 2017

Raivo Laanemets: Now, 2017-01, summary of 2016 and plans for 2017

This is an update on things related to this blog and my work.

Last month

Blogging

  • Added a UX improvement: external links have target="_blank" to make them open in a new tab. The justification can be found in this article. It is implemented using a small piece of script in the footer.
  • Updated the list of projects to include work done in 2016.
  • Updated the visual style for better readability. The article page puts more focus on the content and less on the related things.
  • Updated the CV.
  • Found and fixed some non-valid HTML markup on some pages.
  • Wrote announcements to the last of my Open Source projects: DOM-EEE and Dataline.

I also discovered that mail notifications were not working. The configuration had been broken for some time and I had disabled alerts on the blog engine's standard error stream. I have fixed the mail configuration and now monitor the error log for mail sending errors.

Work

I built an Electron-based desktop app. I usually do not build desktop applications and consider them a huge pain to build and maintain. This was a small project taking 2 weeks and I also used it as a chance to evaluate the Vue.js framework. Vue.js works very well with Electron and was very easy to pick up thanks to the similarities with the KnockoutJS library. I plan to write about the both in separate articles.

The second part of my work included a DXF file exporter. DXF is a vector drawing format used by AutoCAD and industrial machines. My job was to convert and combine SVG paths from an online CAD editor into a single DXF file for a laser cutter.

While filing my annual report I was positively surprised at how little paperwork I needed to file. It only required a balance sheet, a profit/loss statement, and 3 small, trivial additional reports. In previous years I had to file the much more comprehensive report that is now required only from mid-size (on the Estonian scale) companies with about 250 employees.

Infrastructure

I have made some changes to my setup:

  • Logging and monitoring was moved to an OVH VPS.
  • Everything else important is moved away from the home server. Some client systems are still waiting to be moved.

The changes were necessary as I might travel a bit in 2017 and it won't be possible to fix my own server at home when a hardware failure occurs. I admit it was one of my stupidest decisions to run my own server hardware.

Besides these changes:

  • infdot.com now redirects to rlaanemets.com. I am not maintaining a separate company homepage anymore. This gives me more free time for the important things.
  • Rolled out SSL to every one of my sites/apps where I enter passwords. All the new certs are from Let's Encrypt and are renewed automatically.
  • I am now monitoring my top priority web servers through UptimeRobot.
  • The blog frontend is monitored by Sentry.

Other things

The apartment building's full-scale renovation was finally accepted by the other owners and the contract has been signed with the construction company. Construction starts ASAP. I have been looking for possible places to rent a quiet office, as the construction noise will likely make work in the home office impossible.

Yearly summary and plans

2016 was an incredibly busy and frustrating year for me. A project at the beginning of the year was left partially unpaid after it turned out to be financially unsuccessful for the client. The project did not have a solid contract, and legal action against the client would have been very difficult. This put me into a tight situation where I took on more work than I could handle to compensate for my financial situation. As the work accumulated:

  • I was not able to keep up with some projects. Deadlines slipped.
  • I was not able to accept better and higher-paying work because of the existing workload.
  • Increasing workload caused health issues: arm pains, insomnia.

At the end of the year I had to drop some projects, as there was no other way to decrease the workload. The last 2 weeks have finally been pretty OK.

In 2017 I want to avoid such situations. Financially I'm already in a much better position. I will be requiring somewhat stricter contracts from my clients and will select projects more carefully.

Considering technology, I do not see year 2017 bring many changes. My preferred development platforms are still JavaScript (browsers, Node.js, Electron, PhantomJS) and SWI-Prolog.

December 28, 2016

Anton Arhipov: Twitterfeed #3

Welcome to the third issue of my Twitterfeed. In the two weeks since the last post I've accumulated a good share of links to news and blog posts, so it is a good time to "flush the buffer".


Let's start with something more fundamental than just the news about frameworks and programming languages. "A tale of four memory caches" is a nice explanation of how browser caching works. Awesome read, nice visuals, useful takeaways. Go read it!

Machine Learning seems to be getting more and more popular. So here's a nicely structured knowledge base at your convenience: "Top-down learning path: Machine Learning for Software Engineers".

Next, let's see what's new about all the reactive buzz. The trend is highly popular so I've collected a few links to the blog posts about RxJava and related.

First, "RxJava for easy concurrency and backpressure" is my own writeup about the beauty of the RxJava for a complex problem like backpressure combined with concurrent task scheduling.

Dávid Karnok published benchmark results for the different reactive libraries.

"Refactoring to Reactive - Anatomy of a JDBC migration" explains how reactive approach can be introduced incrementally into the legacy applications.

The reactive approach is also suitable for the Internet of Things area. So here's an article about Vert.x being used in the IoT world.

IoT is actually not only about the devices but also about the cloud. Arun Gupta published a nice write up about using the AWS IoT Button with AWS Lambda and Couchbase. Looks pretty cool!

Now onto the news related to my favourite programming tool, IntelliJ IDEA!

IntelliJ IDEA 2017.1 EAP has started! Nice, but I'm not amused. Who needs those emojis anyway?! I hope IDEA developers will find something more useful in the bug tracker to fix and improve.

Andrey Cheptsov experiments with code folding in IntelliJ IDEA. The Advanced Expressions Folding plugin is available for download - give it a try!

Claus Ibsen announced that the work has started on Apache Camel IntelliJ plugin.

Since we are on the topic of IntelliJ IDEA news, I think it makes sense to see what's up with Kotlin as well. Kotlin 1.0.6 has been released, which is a new bugfix and tooling update. It seems Kotlin is gaining popularity, and people are trying to use it in conjunction with popular frameworks like Spring Boot and Vaadin.

Looks like too many links already so I'll stop here. I should start posting those more often :)