Thursday, April 19, 2018

Path Count

I've bumped into this brain teaser recently:

"Given two integer numbers describing an n by m graph, where n represents the height and m represents the width, calculate the number of ways you can get from the top left to the bottom right if you can only go right and down"

That's a fine challenge, and prior to knowing recursive types in Haskell, I don’t know how I would have approached the problem.

Let's draw what the paths would look like first.

Given 1x1, the case is pretty simple:

{-
  The is what the matrix would look like:

  (0,1) - (1,1)
    |       |
  (0,0) - (1,0)
-}

The only way to get from the top left to the bottom right is to "walk" the perimeters:

{-
  (0,1) - (1,1) - (1,0)
  (0,1) - (0,0) - (1,0)
-}

This case is so easy, I don’t even bother with devising a solution. Let's look at a 1 by 2 graph:

{-
  (0-1) - (1,1) - (2,1)
    |       |       |
  (0-0) - (1,0) - (2,0)
-}

Following the rules laid out, there are 3 ways to get from the top left to the bottom right point:

{-
  (0-1) - (1,1) - (2,1) - (2,0)
  (0-1) - (1,1) - (1,0) - (2,0)
  (0-1) - (0,0) - (1,0) - (2,0)
-}

The rule "you can only go right and down" tells us something: it's a binary tree. How could I draw up a recursive tree structure for this?

I'd like to make sure the logic is correct, I put all this logic into an HSpec file. How 'bout this?

-- test/PathCountSpec.hs

import Test.Hspec

main :: IO ()
main = hspec spec

type Point = (Int, Int)
data Tree a = Leaf
            | Node a (Tree a) (Tree a)
            deriving (Show, Eq)

spec :: Spec
spec =
    describe "Path Count" $ do
        it "can calculate tree branches for 1x2 matrix" $ do
            let tree =
                    Node (0,1)
                        (Node (1,1)
                            (Node (2,1) Leaf
                                        (Node (2,0) Leaf Leaf))
                            (Node (1,0) (Node (2,0) Leaf Leaf)
                                        Leaf))
                        (Node (0,0)
                            (Node (1,0)
                                (Node (2,0) Leaf Leaf)
                                Leaf)
                            Leaf)
            {-
               Possible paths:
                (0,1) - (1,1) - (2,1) - (2,0)
                (0,1) - (1,1) - (1,0) - (2,0)
                (0,1) - (0,0) - (1,0) - (2,0)
            -}
            pending

The key here, that the number of times Node (2,0) Leaf Leaf appears is the number of different ways I can get from the top left to the bottom right. All I have to do is counting the number of times this sub-tree appears in the tree itself.

I wrote a function (that I put into the HSpec file itself) to do just that:

leafCount :: Tree Point -> Int
leafCount Leaf = 0
leafCount (Node _ Leaf Leaf) = 1
leafCount (Node _ left right) = leafCount left + leafCount right

When I call this function with the provided tree I have in the spec, I should receive 3. This assertion passes:

leafCount tree `shouldBe` 3

I manually had to build up this tree, next I looked at how I could generate it myself based on the top left and bottom right points. I had to make sure I won’t add branches outside of the matrix, what I accomplished with two guards.

buildTree :: Point -> Point -> Tree Point
buildTree (a,b) end@(c,d)
    | a > c = Leaf
    | b < 0 = Leaf
    | otherwise = Node (a, b) (buildTree (a+1,b) end) (buildTree (a,b-1) end)

This logic will keep "walking" right and down until the guards will stop it.

I added another assertion to make sure this still passes:

buildTree (0,1) (2,0) `shouldBe` tree

I wanted to make sure this logic still holds when I pivot the two numbers. This is the next assertion I added:

buildTree (0,2) (1,0) `shouldBe` tree

Tests are passing. All right, I need to set up an outer function that takes two numbers and returns the number of paths.

Here it is:

pathCount :: Int -> Int -> Int
pathCount m n = leafCount $ fromTree m n
    where fromTree m n = buildTree (0,m) (n,0)

This assertion will exercise this function:

pathCount 1 2 `shouldBe` 3

Everything is green!

I added one more test to make sure the 1x1 graph works:

pathCount 1 1 `shouldBe` 2

And finally I added a 2x2 test as well. This one has 6 different paths to get from the top left to the bottom right:

{-
    Possible paths:
    (0,2) - (1,2) - (2,2) - (2,1) - (2,0)
    (0,2) - (1,2) - (1,1) - (2,1) - (2,0)
    (0,2) - (1,2) - (1,1) - (1,0) - (2,0)
    (0,2) - (0,1) - (1,1) - (2,1) - (2,0)
    (0,2) - (0,1) - (1,1) - (1,0) - (2,0)
    (0,2) - (0,1) - (0,0) - (1,0) - (2,0)
-}
pathCount 2 2 `shouldBe` 6

And when I run the spec again, it all works! You can find the solution in this gist.

Sunday, January 28, 2018

Haskell to MySQL via YeshQL (Part 3.)

In the previous blog post we built a console app in Haskell that talks to MySQL via YeshQL. It creates a client record and counts them with a SQL query.
In the final part in this series, we will add automated tests to our application, we will save a user record along with its parent client and make sure all the saving happens in one unit of work.

This is the commit point where we left it at the end of Part 2.

Add Tests

Let’s set up the great testing tool HSpec in our project.

First, replace the content of test/Spec.hs file with this:

{-# OPTIONS_GHC -F -pgmF hspec-discover #-}

This will let auto-discover any spec files we have in the test directory.

Let's add the first test to the project in the test/Hashmir/DataSpec.hs file:

module Hashmir.DataSpec where

import Test.Hspec

main :: IO ()
main = hspec spec

spec :: Spec
spec = do
    describe "Hashmir Data" $ do
        it "runs a test" $ do
            True `shouldBe` True

We are not testing anything real here, we just want to make sure all the building blocks are in place.

Add the test-suite directive to package.yml file:

...

tests:
  hashmir-test:
    source-dirs: test/
    main: Spec.hs
    dependencies:
      - hashmir
      - hspec == 2.*
    other-modules:
      Hashmir.DataSpec

make build should recompile the app, and stack test will run the entire test suite.

When all is good, you should see this:

Hashmir.Data
  Hashmir Data
    runs a test

Finished in 0.0010 seconds
1 example, 0 failures

It wouldn't save much typing, but I like navigating the projects I am working on from a Makefile, I added these changes to run the tests with make test:

...
test: ## Run the specs
  @stack test

.PHONY: help test

Commit point

Verify Client Create Logic

Creating a record in the database is easy, we already verified it when we ran the app. However, making this automated and repeatable shows some challenges. We need to make sure that every test cleans after itself in the DB. We could wrap each and every spec in a transaction and just roll it back, but that would be quite complex. Dropping and rebuilding the database is fast as it is. Sure, it's a couple of hundred milliseconds, but that is negligible for now.

HSpec provides before hooks, we will hook into that.

Let's change the test/Hashmir/DataSpec.hs like this:

module Hashmir.DataSpec where

import Test.Hspec
import System.Process
import qualified Hashmir.Data as D

main :: IO ()
main = hspec spec

resetDB :: IO ()
resetDB = callCommand "make build-db"

spec :: Spec
spec = before resetDB $ do
    describe "Hashmir Data" $ do
        it "creates a Client record" $ do
            clientId <- D.insertClient "TestClient" "testclient"
            clientId `shouldBe` 1

We call resetDB with every single spec, that function makes a system call to rebuild the DB.

When you try executing the test, stack tries to recompile the app, but it presents an error:

test/Hashmir/DataSpec.hs:4:1: error:
    Failed to load interface for ‘System.Process’
    It is a member of the hidden package ‘process-1.4.3.0’.
    Perhaps you need to add ‘process’ to the build-depends in your .cabal file.

Oh-oh. We need to add the process package to our test-suite, let's modify the package.yml like this:

tests:
  hashmir-test:
    source-dirs: test/
    main: Spec.hs
    dependencies:
      - process
      - hashmir
      - hspec == 2.*
    other-modules:
      Hashmir.DataSpec

After adding the process package, regenerating the cabal file, we can now run our first test successfully:

Hashmir.Data
  Hashmir Data
Dropping and rebuilding database hashmir_test
    runs a test

Finished in 0.1378 seconds
1 example, 0 failures

The beauty of this solution is that we can run it over and over again, the test will pass as the checked clientId will always be 1, since the database is recreated every time.

Commit point

Add a User Record Along With Client

Let's add a failing spec for this first. Add the following content to the test/Hashmir/DataSpec.hs file:

    it "creates a Client and a User record" $ do
        clientId <- D.insertClient "TestClient" "testclient"
        userId <- D.insertUser clientId "joe" "joe@example.com" "password1"
        userId `shouldBe` 1

There is no insertUser function, let's add it. We also need to add the SQL template to the YeshQL code. It's very similar to the Client insert script, here are all the changes for that:

[yesh|
    -- name:countClientSQL :: (Int)
    SELECT count(id) FROM clients;
    ;;;
    -- name:insertClientSQL
    -- :client_name :: String
    -- :subdomain :: String
    INSERT INTO clients (name, subdomain) VALUES (:client_name, :subdomain);
    ;;;
    -- name:insertUserSQL
    -- :client_id :: Integer
    -- :login :: String
    -- :email :: String
    -- :password :: String
    INSERT INTO users (client_id, login, email, password)
    VALUES (:client_id, :login, :email, :password);
|]

And the insertUser function like this:

insertUser :: Integer -> String -> String -> String -> IO Integer
insertUser clientId login email password =
    withConn $ insertUserSQL clientId login email password

When I run make test, this is the output printed on the screen:

Hashmir.Data
  Hashmir Data
Dropping and rebuilding database hashmir_test
    creates a Client record
Dropping and rebuilding database hashmir_test
    creates a Client and a User record

Finished in 0.2642 seconds
2 examples, 0 failures

The lines Dropping and rebuilding database hashmir_test is too much noise, let's remove it from the Makefile.

Hashmir.Data
  Hashmir Data
    creates a Client record
    creates a Client and a User record

Finished in 0.2354 seconds
2 examples, 0 failures

This looks much cleaner.

Commit point

Roll Back Transactions When Error Occurs

The happy path of our application is working well: the User and Client records are inserted properly. First, the Client is saved, its id is used for the User record to establish the proper references. But we should treat these two inserts as one unit of work: if the second fails, it should roll back the first insert.

Let's write a test for it. I'll make the created Client's id intentionally wrong by incrementing it by one.

    it "rolls back the transaction when failure occurs" $ do
        clientId <- D.insertClient "TestClient" "testclient"
        _ <- D.insertUser (clientId + 1) "joe" "joe@example.com" "password1"
        clientCount <- D.withConn $ D.countClientSQL
        clientCount `shouldBe` Just 0

When I run the tests, this is the error I am getting:

Hashmir.Data
  Hashmir Data
    creates a Client record
    creates a Client and a User record
    rolls back the transaction when failure occurs FAILED [1]

Failures:

  test/Hashmir/DataSpec.hs:23:
  1) Hashmir.Data, Hashmir Data, rolls back the transaction when failure occurs
       uncaught exception:
           SqlError (SqlError {seState = "",
               seNativeError = 1452,
               seErrorMsg = "Cannot add or update a child row: a foreign key constraint
                             fails (`hashmir_test`.`users`, CONSTRAINT `client_id`
                             FOREIGN KEY (`client_id`) REFERENCES `clients` (`id`))"})

Randomized with seed 668337839

Finished in 0.3924 seconds
3 examples, 1 failure

The database is protecting itself from an incorrect state, a User record won't be saved with an id that does not match a record in the clients table. This exception is justified, although, it could be handled better with a Maybe type, that's not the point right now. Let's just expect this exception for now to see a proper test failure.

Change the test like this:

    it "rolls back the transaction when failure occurs" $ do
        clientId <- D.insertClient "TestClient" "testclient"
        (D.insertUser (clientId + 1)
                      "joe"
                      "joe@example.com"
                      "password1")
            `shouldThrow` anyException
        clientCount <- D.withConn $ D.countClientSQL
        clientCount `shouldBe` Just 0

The spec now produces the error I would expect:

Failures:

  test/Hashmir/DataSpec.hs:31:
  1) Hashmir.Data, Hashmir Data, rolls back the transaction when failure occurs
       expected: Just 0
        but got: Just 1

Randomized with seed 1723584293

Finished in 0.3728 seconds
3 examples, 1 failure

Finally, we have a spec that fails correctly, as we are not rolling back the created Client record.

The reason the Client record is not rolled back is that we use two different transactions to persist the records: first, the Client record is saved and the connection is committed, and then the User record is attempted to be saved. It fails, the record is not created, but the Client record has already been committed to the database. This is our problem, we should reuse the same connection for both save operations, and only commit it after the second one.

Let's refactor the code to do that. Both the insertClient and insertUser now accept a connection:

insertClient :: H.IConnection conn =>
                      String -> String -> conn -> IO Integer
insertClient name subdomain =
    insertClientSQL name subdomain

insertUser :: H.IConnection conn =>
                    Integer -> String -> String -> String -> conn -> IO Integer
insertUser clientId login email password =
    insertUserSQL clientId login email password

The specs now has to be modified to pass in the connection:

spec :: Spec
spec = before resetDB $ do
    describe "Hashmir Data" $ do
        it "creates a Client record" $ do
            clientId <- D.withConn $ D.insertClient "TestClient" "testclient"
            clientId `shouldBe` 1
        it "creates a Client and a User record" $ do
            userId <- D.withConn (\conn -> do
                clientId <- D.insertClient "TestClient" "testclient" conn
                D.insertUser clientId "joe" "joe@example.com" "password1" conn)
            userId `shouldBe` 1
        it "rolls back the transaction when failure occurs" $ do
            (D.withConn (\conn -> do
                clientId <- D.insertClient "TestClient" "testclient" conn
                D.insertUser (clientId+1) "joe" "joe@example.com" "password1" conn))
                `shouldThrow` anyException
            clientCount <- D.withConn $ D.countClientSQL
            clientCount `shouldBe` Just 0

And finally, the Main function has to be updated as well:

main :: IO ()
main = do
    clientId <- D.withConn $ D.insertClient "TestClient" "testclient"
    putStrLn $ "New client's id is " ++ show clientId
    Just clientCount <- D.withConn D.countClientSQL
    putStrLn $ "There are " ++ show clientCount ++ " records."

When you run the tests, they should all pass now.

Commit point

Summary

In this blog series we set up YeshQL, added logic to insert Client and its dependent User records, we added tests and made sure all the writes are in one transaction.

Our final solution works, but it requires the connection to be passed in. Using a Reader Monad would be a more elegant solution, but that should be a different blog post.

Tuesday, December 12, 2017

Haskell to MySQL via YeshQL (Part 2.)

In the previous blog post, we built the skeleton of a console app in Haskell that talks to MySQL via YeshQL. We used a Makefile to rebuild the database, compile the code and run the app. This is the commit point where we left it at the end of Part 1.

In this article we will add functionality to create clients records, we will extract the logic that acquires, commits and closes the connection, and finally, we will move the database related code into its own module.

Insert a Client record

We can query the clients table, however, there are no records in that table. Let's add one script to the SQL template.

Modify the YeshQL templates by adding the following code:

 [yesh|
     -- name:countClientSQL :: (Int)
     SELECT count(id) FROM clients;
     ;;;
     -- name:insertClientSQL
     -- :client_name :: String
     -- :subdomain :: String
     INSERT INTO clients (name, subdomain) VALUES (:client_name, :subdomain);
 |]

Three semicolons are used to separate SQL statements. The -- name is used to provide a function name that we can refer to in our code. The two lines are the input arguments to the generated function. Those arguments are the values in the INSERT statement on the last line.

Let's add the following Haskell function to insert a new Client record:

insertClient :: String -> String -> IO ()
insertClient name subdomain = do
    conn <- getConn
    clientId <- insertClientSQL name subdomain conn
    commit conn
    disconnect conn
    putStrLn $ "New client's id is " ++ show clientId

The generated insertClientSQL (note the name in the SQL template) is invoked with the two specified arguments plus the connection. The uncommitted connection will not write the record to the table, without that statement the countClientSQL function would return 0 records.
Disconnecting a connection is a good practice (if you can't pool those connections), free up resources when you don't need them.

Invoke the created insertClient function from main function like this:

main = do
    insertClient "TestClient" "testclient"
    countClient

When I try to build the app (use make build for it), I get the following errors:

/Haskell/hashmir/app/Main.hs:34:5: error:
    Variable not in scope: commit :: Connection -> IO a0

It turns out that Database.HDBC library is needed to invoke commit and disconnect.
Add the following import statement to the top of the file:

import qualified Database.HDBC as H

Update the function to use the qualified names for the commit and disconnect calls.

insertClient :: String -> String -> IO ()
insertClient name subdomain = do
    conn <- getConn
    clientId <- insertClientSQL name subdomain conn
    H.commit conn
    H.disconnect conn
    putStrLn $ "New client's id is " ++ show clientId

The project should successfully build, when you run the app (use make run to do it), you should see this output:

% make run
Dropping and rebuilding database hashmir_test
time ~/.local/bin/hashmir-exe
New client's id is 1
There are 1 records.

real 0m0.020s
user 0m0.009s
sys 0m0.006s

The database got rebuilt, one client record was inserted and the count function counted that record.

Commit point

Introducing: withConn

Committing and disconnecting a connection is generally a good practice. Let's match the countClient function with the insertClient function and call commit and disconnect the connection there as well.

countClient :: IO ()
countClient = do
    conn <- getConn
    Just (clientCount) <- countClientSQL conn
    H.commit conn -- added line
    H.disconnect conn -- added line
    putStrLn $ "There are " ++ show clientCount ++ " records."

Now both the countClient and the insertClient has the same duplicated logic:

    conn <- getConn
    ...
    H.commit conn -- added line
    H.disconnect conn -- added line

This reminds me of the use of withFile from the IO module. withFile accepts a lambda where the handle is passed to it and the code in the lambda can use the provided handle. We need the same thing here, withConn would accept an active connection. Consider this function:

withConn :: (Connection -> IO b) -> IO b
withConn f = do
    conn <- getConn
    result <- f conn
    H.commit conn
    H.disconnect conn
    return result

Our refactored insertClient function would look like this:

insertClient :: String -> String -> IO ()
insertClient name subdomain = do
    clientId <-
        withConn (\conn -> do
            insertClientSQL name subdomain conn
        )
    putStrLn $ "New client's id is " ++ show clientId

When you build the project and run it, it should work without errors.

Commit point

Thanks to Haskell's currying, this function can be further simplified. No need to provide the input argument in the lambda:

insertClient :: String -> String -> IO ()
insertClient name subdomain = do
    clientId <-
        withConn (do insertClientSQL name subdomain)
    putStrLn $ "New client's id is " ++ show clientId

This looks much better, but we can further simplify this code:

insertClient :: String -> String -> IO ()
insertClient name subdomain = do
    clientId <- withConn $ do insertClientSQL name subdomain
    putStrLn $ "New client's id is " ++ show clientId

Now that function is easy to read!

Commit point

Let's refactor the countClient function similarly.

countClient :: IO ()
countClient = do
    Just (clientCount) <- withConn countClientSQL
    putStrLn $ "There are " ++ show clientCount ++ " records."

Commit point

Extract Data Access Logic

Having code that prints information on the screen from the functions that talks with the database makes them mixed with responsibilities. They should only return primitives, and the caller main function should print the reports. Let's make those functions a bit more clean:

insertClient :: String -> String -> IO Integer
insertClient name subdomain = do
    withConn $ do insertClientSQL name subdomain

countClient :: IO (Maybe Int)
countClient = do withConn countClientSQL

I really like how Haskell makes the functions that uses IO impure or dirty: once they have been tainted, they are tainted. You should always try to isolate functions that are tainted from the pure ones as the pure ones are easier test - they have no side effects.

The main function is now responsible for reporting the result:

main :: IO ()
main = do
    clientId <- insertClient "TestClient" "testclient"
    putStrLn $ "New client's id is " ++ show clientId
    Just clientCount <- countClient
    putStrLn $ "There are " ++ show clientCount ++ " records."

Commit point

Move Data Logic Into a Library

Our Main.hs file has all the application logic. It looks pretty solid, let's move it into a library module. Copy all database access related code from app/Main.hs into src/Hashmir/Data.hs file like this:

{-#LANGUAGE TemplateHaskell #-}
{-#LANGUAGE QuasiQuotes #-}

module Hashmir.Data where

import Database.YeshQL
import qualified Database.HDBC as H
import Database.HDBC.MySQL

[yesh|
    -- name:countClientSQL :: (Int)
    SELECT count(id) FROM clients;
    ;;;
    -- name:insertClientSQL
    -- :client_name :: String
    -- :subdomain :: String
    INSERT INTO clients (name, subdomain) VALUES (:client_name, :subdomain);
|]

getConn :: IO Connection
getConn = do
    connectMySQL defaultMySQLConnectInfo {
        mysqlHost     = "localhost",
        mysqlDatabase = "hashmir_test",
        mysqlUser     = "hashmir_user",
        mysqlPassword = "shei7AnganeihaeF",
        mysqlUnixSocket = "/tmp/mysql.sock"
    }

withConn :: (Connection -> IO b) -> IO b
withConn f = do
    conn <- getConn
    result <- f conn
    H.commit conn
    H.disconnect conn
    return result

insertClient :: String -> String -> IO Integer
insertClient name subdomain =
    withConn $ insertClientSQL name subdomain

countClient :: IO (Maybe Int)
countClient = withConn countClientSQL

This is the same logic we had in the app/Main.hs file, but now it is in the Hashmir.Data module.

The Main module becomes small once we remove all the code we just moved out of it:

module Main where

import qualified Hashmir.Data as D

main :: IO ()
main = do
    clientId <- D.insertClient "TestClient" "testclient"
    putStrLn $ "New client's id is " ++ show clientId
    Just clientCount <- D.countClient
    putStrLn $ "There are " ++ show clientCount ++ " records."

We also have to tell Cabal where it can find this code. Change the Lib directive to Hashmir.Data in the package.yaml:

library:
  source-dirs: src/
  exposed-modules:
    - Hashmir.Data

The project should build and when you run the app it should insert and return the number of clients records:

% make run
Dropping and rebuilding database hashmir_test
time ~/.local/bin/hashmir-exe
New client's id is 1
There are 1 records.

real 0m0.022s
user 0m0.010s
sys 0m0.007s

Commit point

This last change wraps up our Part 2 in this series. We can now create clients records, count them with a simple withConn function that properly opens, commits and closes the connection.

In the third post in this series, I will show you how we can insert two records in one transaction, how we can deal with errors and how this logic can be tested.

Thursday, November 30, 2017

Haskell to MySQL via YeshQL (Part 1.)

As I was looking for an easy way to talk to Postgres from Clojure, I discovered Yesql. I wanted to find something similar in Haskell, and I found YeshQL. It's a template parsing library on top of HDBC, exactly what I needed to keep SQL and Haskell code separate.

This blog post will show you how you can easily get YeshQL up and running and run queries against MySQL. I'll build a simple CRUD console app that you can follow along, I'll list commit points at the end of each section.

The Clojure tutorial I created for my blog posts is named Kashmir, I'll name this project Hashmir.

You will need stack and ghc installed, I have stack Version 1.5.1 x86_64 hpack-0.17.1 and ghc version 8.0.2. MySQL is also needed, I have version 5.7.20, but I won't use anything fancy as far as the database goes, if you have MySQL installed, I am sure that will do it.

Generate the project with stack

Generate the project with this command: stack new hashmir. Go inside the directory, build and deploy the app with stack build && stack install. Run the skeleton app:

% ~/.local/bin/hashmir-exe
someFunc

Commit point

Using hpack

Since I learned about hpack, I never touch a Haskell project's cabal file any more. This blog post assumes you are familiar with this tool, feel free to learn about hpack more before you proceed.

Add this package.yaml to the project's root directory:

name: hashmir
version: 0.1.0.0
author: Attila Domokos <adomokos@gmail.com>
maintainer: adomokos@gmail.com
copyright: 2017 Attila Domokos
category: Console App
homepage: https://github.com/adomokos/hashmir#readme

ghc-options: -Wall

dependencies:
  - base >= 4.7 && < 5

library:
  source-dirs: src/
  exposed-modules:
    - Lib

executables:
  hashmir-exe:
    source-dirs: app/
    main: Main.hs
    dependencies:
      hashmir
    ghc-options: -threaded -rtsopts -with-rtsopts=-N

Feel free to use your name and Github repo in this file.
Delete and regenerate the project's cabal file with this command: rm -f hashmir.cabal && stack build. stack install should produce the same executable file.

Commit point

Setting up the Database

We will need a MySQL user and a database we can use in this project. Let's add a Makefile to script that out for us. You might need the master or root user in MySQL to create a role and grant access. If the Makefile target does not work, just log on to the MySQL console with your root account and add the role and grant access with the scripts you see in the Makefile.

This is the schema file we will work with. I won't include it in this post, but this script should drop and recreate the tables. Put it into the ./resources directory.

These targets in the Makefile will create the role and will rebuild the database:

create-db-user: ## Creates a DB user with the root MySQL user
    mysql -u root --host $(HOST) -e "CREATE USER '$(DBUSER)'@'$(HOST)' IDENTIFIED BY '$(DBPASSWD)';" > /dev/null 2>&1
    mysql -u root --host $(HOST) -e "GRANT ALL PRIVILEGES ON `$(DBNAME)`.* TO '$(DBUSER)'@'$(HOST)';" > /dev/null 2>&1

build-db: ## Builds the DB
    @echo "Dropping and rebuilding database $(DBNAME)"
    @mysql -u $(DBUSER) --password='$(DBPASSWD)' --host $(HOST) -e "DROP DATABASE IF EXISTS $(DBNAME);" > /dev/null 2>&1
    @mysql -u $(DBUSER) --password='$(DBPASSWD)' --host $(HOST) -e "CREATE DATABASE $(DBNAME);" > /dev/null 2>&1
    @mysql -u $(DBUSER) --password='$(DBPASSWD)' --host $(HOST) $(DBNAME) < resources/schema.sql > /dev/null 2>&1

You should be able to execute make build to rebuild the app, and make run to rebuild the DB and run the app:

% make run
Dropping and rebuilding database crud_yeshql_test
time ~/.local/bin/hashmir-exe
someFunc

real    0m0.011s
user    0m0.002s
sys 0m0.007s

Commit point

Writing the First Query

There are two parts of using YeshQL's code:

  1. The SQL templates
  2. Code that uses the generated functions from the template

Modify the app/Main.hs file like this:

{-#LANGUAGE TemplateHaskell #-}
{-#LANGUAGE QuasiQuotes #-}

module Main where

import Database.YeshQL
import Database.HDBC.MySQL

[yesh|
    -- name:countClientSQL :: (Int)
    SELECT count(id) FROM clients;
|]

getConn :: IO Connection
getConn = do
    connectMySQL defaultMySQLConnectInfo {
        mysqlHost     = "localhost",
        mysqlDatabase = "hashmir_test",
        mysqlUser     = "hashmir_user",
        mysqlPassword = "shei7AnganeihaeF",
        mysqlUnixSocket = "/tmp/mysql.sock"
    }

countClient :: IO ()
countClient = do
    conn <- getConn
    Just (clientCount) <- countClientSQL conn
    putStrLn $ "There are " ++ show clientCount ++ " records."

main :: IO ()
main = countClient

When you try to build the project (I conveniently use make build) one of these errors are displayed:

Failed to load interface for 'Database.YeshQL'

I started referencing Database.YeshQL, however, I did not add that library to the project. This is where hpack is helpful, we only have to add it to the package.yaml file, that will generate a cabal file with the correct references.

Let's modify the dependencies section of package.yaml file like this:

...

dependencies:
  - base >= 4.7 && < 5
  - yeshql
  - HDBC
  - HDBC-mysql

...

When I try to build the project, I get the following error:

Error: While constructing the build plan, the following exceptions were encountered:

In the dependencies for hashmir-0.1.0.0:
    yeshql must match -any, but the stack configuration has no specified version (latest applicable is 3.0.1.3)
needed since hashmir-0.1.0.0 is a build target.

Recommended action: try adding the following to your extra-deps in ~/hashmir/stack.yaml:
- yeshql-3.0.1.3

By modifying the extra-deps: [] array like this in stack.yaml

extra-deps: [
    yeshql-3.0.1.3
]

will download YeshQL for the project and build it successfully.

When I run the app (use make run to do it), this is what I see:

% make run
Dropping and rebuilding database hashmir_test
time ~/.local/bin/hashmir-exe
There are 0 records.

real    0m0.019s
user    0m0.009s
sys 0m0.006s

YeshQL generated a function on-the-fly, named countClientSQL. That's the function I invoked in the countClient function. Since there are no records in the table, 0 was returned from the function.

Commit point

We set up the project, ran the first query against MySQL from Haskell via YeshQL templates. This brings us to the end of part 1 of this series. In the next article, we'll start adding more SQL queries to insert and query various records.

Friday, August 25, 2017

String Calculator with Applicative Functors in Haskell

I like the simplicity of the String Calculator kata. It's a typical example of the map-reduce algorithm, where the string has to be split by a delimiter, mapped into a list of integers and then reduced to their sum. I have often used it as an example to quickly evaluate engineering candidates, try new languages and tools. This was the first challenge I tried to solve in Haskell about a year ago. I found the code the other day, I wanted to see how I could improve upon it with everything I've learned so far.

This is what I found from a year before:

import Test.Hspec
import Data.List.Split (splitOn)

calculator :: String -> Integer
calculator x = sum $ coerceString x ","

coerceString :: String -> String -> [Integer]
coerceString x c = map (coerce) (splitOn c x)
    where
        coerce "" = 0
        coerce a = read a :: Integer

main :: IO ()
main = hspec $ do
    describe "String Calculator" $ do
        it "returns 0 for empty string" $ do
            calculator "" `shouldBe` 0
        it "returns 1 for '1'" $ do
            calculator "1" `shouldBe` 1
        it "returns 3 for '1,2,3'" $ do
            calculator "1,2,3" `shouldBe` 6

This works for simple cases, but what should I do when there is a non-numeric string in the input arguments?

        it "returns 0 for '1,2,!'" $ do
            calculator "1,2,!" `shouldBe` 0

The algorithm crashes as I suspected:

String Calculator
  returns 0 for empty string
  returns 1 for '1'
  returns 3 for '1,2,3'
  returns 0 for '1,2,!' FAILED [1]

Failures:

  string_calculator_01.hs:22:
  1) String Calculator returns 0 for '1,2,!'
       uncaught exception: ErrorCall (Prelude.read: no parse)

Randomized with seed 1943023196

Finished in 0.0023 seconds
4 examples, 1 failure

I could easily wrap the entire logic and return zero when an exception occurs, however, Haskell can do better. Much better.

I can return a Maybe Int from the operation that parses the string to an integer: if the result is Nothing here, the overall result will be Nothing.
There is a string parser in the Text.Read module that does just that, it's called readMaybe.

Here is how it works:

 % ghci
GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/adomokos/.ghci
λ> import Text.Read
λ> (readMaybe "12") :: Maybe Int
Just 12
λ> (readMaybe "!") :: Maybe Int
Nothing
λ>

I need to parse the list of strings, which is achieved by mapping over the values with readMaybe:

GHCi, version 8.0.2: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/adomokos/.ghci
λ> import Text.Read (readMaybe)
λ> import Data.List.Split (splitOn)
λ> let xs = "1,2,3"
λ> splitOn (",") xs
["1","2","3"]
λ> map (\x -> readMaybe x :: Maybe Int) $ splitOn (",") xs
[Just 1,Just 2,Just 3]

I now have a list of Maybe Int values that can be reduced into a single Maybe Int value. Reducing an array of numbers would be super easy (foldl (+) 0 [1,2,3] or foldl1 (+) [1,2,3] or just simply sum [1,2,3]). It's obvious that I will need something similar.

Adding a number to a Maybe Int can be achieved with a Functor:

λ> fmap (+10) (Just 4)
Just 14
λ> (+10) `fmap` (Just 4)
Just 14
λ> (+10) <$> Just 4
Just 14

All three expressions mean the same thing. The first is using the fmap as a conventional function name and arguments style, the second one uses the infix version of fmap and the third one is using a symbol.

This works for adding a number to a Maybe Int, however, I need to use an Applicative Functor to calculate the sum of two Maybe Ints.

λ> (+) <$> Just 10 <*> Just 4
Just 14

Using Applicative Functors, folding the list of Maybe Ints happens like this:

λ> let xs = [Just 1, Just 2, Just 3]
λ> foldl (\acc x -> (+) <$> x <*> acc) (Just 0) xs
Just 6
λ> foldl1 (\acc x -> (+) <$> x <*> acc) xs
Just 6

The solution now works, although a bit hard to read:

import Test.Hspec
import Text.Read (readMaybe)
import Data.List.Split (splitOn)

calculator :: String -> Maybe Int
calculator input =
    foldr (\x acc -> (+) <$> x <*> acc) (Just 0) $
        map (\x -> readMaybe x) $ splitOn "," input

main :: IO ()
main = hspec $ do
    describe "String Calculator" $ do
        it "returns Nothing for empty string" $ do
            calculator "" `shouldBe` Nothing
        it "returns Just 1 for '1'" $ do
            calculator "1" `shouldBe` Just 1
        it "returns Just 3 for '1,2,3'" $ do
            calculator "1,2,3" `shouldBe` Just 6
        it "returns Nothing for '1,2,3,!'" $ do
            calculator "1,2,3,!" `shouldBe` Nothing

It's more meaningful after I've refactored it into chunks:

calculator :: String -> Maybe Int
calculator input =
    let foldingFn acc x =  (+) <$> acc <*> x
        parsedInts = map (\x -> readMaybe x) . splitOn (",")
    in foldr1 (foldingFn) (parsedInts input)

The parsedInts function can be further simplified:

calculator :: String -> Maybe Int
calculator input =
    let foldingFn acc x =  (+) <$> acc <*> x
        parsedInts = map (readMaybe) . splitOn (",")
    in foldr1 (foldingFn) (parsedInts input)

And finally, the sum of list of Maybe values can be calculated by using the sequence function like this:

calculator :: String -> Maybe Int
calculator input =
    let parsedInts = map (readMaybe) . splitOn (",")
    in fmap sum . sequence $ parsedInts input

I find this form to be the most readable, but I liked the journey of getting there through the Applicative Functors.

Wednesday, June 28, 2017

Scenario Outline in RSpec

Some business logic is best described in a table format. The rows and columns can better detail all the different permutations, than words and sentences. Gherkin handles this well with its scenario outlines, however, RSpec does not have such feature.

As I wanted to test all the different permutation of a business logic, I ended up with a ~200 lines spec file. It was hard to review what use cases were covered as I had to scroll up and down to review the file. Had RSpec have a table format, this logic could have been described in 40 lines, which can easily fit into a screen.

It was obvious a simple DSL was needed, but what would that look like? This is what I came up with first:

RSpec.describe 'eating apples' do
  def method_under_test(start, eat, left)
    start - eat == left
  end

  def example_runner(examples)
    lines = examples.split(/\n/)
    lines.delete_at(0) # Don't need the title
    lines.each do |line|
      example = line.split('|')
      start, eat, left = example.reject(&:blank?).map(&:to_i)
      expect(method_under_test(start, eat, left)).to eq(true)
    end
  end

  it 'can tell how many are remaining' do
    examples =
      <<~TABLE
        | start | eat | left |
        | 12    | 5   | 7    |
        | 20    | 5   | 15   |
      TABLE

    example_runner(examples)
  end
end

I am using the example_runner method to parse the examples table and run the specs one by one. The spec itself is clear, the examples TABLE is easy to visualize.

The spec documentation, however, does not reveal the used table. This is what it looks like:

eating apples
  can tell how many are remaining

Let’s say, one of the examples will trigger an error. When I change the second example’s “left” column’s 15 value to 16, this is what I see when I run the spec.

F

Failures:

  1) eating apples can tell how many are remaining
     Failure/Error: expect(method_under_test(start, eat, left)).to eq(true)

       expected: true
            got: false

       (compared using ==)

This is not very useful. I don’t know which example failed. Fortunately RSpec lets us provide more context for an expectation. When I change the line of the assertion with an added context like this,

expect(method_under_test(start, eat, left)).to eq(true), line

the example that failed is properly reported by RSpec:

F

Failures:

  1) eating apples can tell how many are remaining
     Failure/Error: expect(method_under_test(start, eat, left)).to eq(true), line
       | 20    | 5   | 16   |

This could be a decent solution, however, not printing the examples for the documentation output has major drawbacks. Every row in the table should be a separate example, it should be expressed as such. With some clever formatting and using include_example a similar table layout can be achieved. Consider this example:

RSpec.describe 'eating apples' do
  def method_under_test(start, eat, left)
    start - eat == left
  end

  RSpec.shared_examples 'eating_apples' do |example|
    it "calculates the remaining apple for - #{example}" do
      parsed_example = example.split('|')
      start, eat, left = parsed_example.reject(&:blank?).map(&:to_i)

      result = method_under_test(start, eat, left)
      expect(result).to eq(true)
    end
  end

                                     # start | eat | left
                                     # ______|_____|_____
  include_examples 'eating_apples', '  12    | 5   | 7'
  include_examples 'eating_apples', '  20    | 5   | 15'
end

The table is shifted to the right, but it’s clearly visible, and the commented out header provides a different color in my IDE that separates it from the data.

include_examples

The shared_example is also a lot simpler, I don’t need to deal with the header information, as it’s not passed to it, examples are being ran line by line.

By making the test description dynamic, the reporter prints out the tested values:

eating apples
  calculates the remaining apple for -   12    | 5   | 7
  calculates the remaining apple for -   20    | 5   | 15

The triggered error is properly reported by both the line number and the example used for that particular use case.

.F

Failures:

  1) eating apples calculates the remaining apple for -   20    | 5   | 16
     Failure/Error: expect(result).to eq(true)

       expected: true
            got: false

       (compared using ==)

I am not a big fan of adjusted white space for positioning equal signs under one another, but am happy to use white space to format test code to better describe the logic it validates.

RSpec does not have scenario outlines, but hopefully, with a shared_example and a bit of clever formatting the same can be accomplished.

Wednesday, May 31, 2017

Bit Shifting for a Shard ID in Ruby

As our database grew, we had to take a serious look at how can we could split it up by our clients, as most of them wanted to have their own data separate from the others anyway. A few months ago I found a great article from Pinterest, that describes how they sharded their MySQL database.

A sharded entity needs a UUID to uniquely identify the record across all shards. Most of the programming languages can generate a UUID easily, however, what was amazing to me, was that Pinterest generated its own unique ids by encapsulating three distinct numbers in one. Wait, what??! Read that article, it’s definitely worth your time.

While Pinterest encapsulated three numbers into one, we only needed two, a client_id and an entity_id. Our client_id would be a much smaller number than our entity_id, we wanted to reserve more bits for the latter.

It turns out, Ruby has many friendlier tools to deal with binary operations. Let's look at them!

What is the binary representation of the integer number 123?

123-in-binary

Out of the 7 bits, you'll see that the 3rd one is turned off, all the others are turned on, giving us 64 + 32 + 16 + 8 + 2 +1 = 123. How can we get this binary representation in Ruby? It's super easy, just use the to_s(2) method to do it.

pry(main)> 123.to_s(2)
=>"1111011"

This is the exact same string representation as the one in the image above, where the third bit is turned off and represented with a zero.

I'd like to keep the client_id on the left side, but I'd like to reserve bits on the right side. For the sake of simplicity, I will keep this as a small number. Let's add 5 bits to the right-hand side of these bits by using the bitwise left shift operator.

pry(main)> (123 << 5).to_s(2)
=> "111101100000"

The original number, 123, is still represented on the left side, but 5 "0"s were added to the right-hand side. You get the numeric representation of this by leaving out the to_s(2) call at the end:

pry(main)> 123 << 5
=> 3936

This number can be converted back to binary:

pry(main)> 3936.to_s(2)
=> "111101100000"

On the left side I have the binary representation of 123, but how about those bits on the right side? What are those representing? Right now, those bits are all turned off, they will give you 0 ("00000".to_i(2) => 0). How can I store the number 3 on the right side? The bits should look like this:

3-on-right-side

The binary "|" will turn the two rightmost bits on:

pry(main)> (123 MM 5 | 3).to_s(2)
=> "111101100011"

Again, leaving out the to_s(2) will provide the number representation:

pry(main)> (123 << 5 | 3)
=> 3939

The storing part will work, but how can we get our two numbers back from this one combined number? Well, we have to split the bits and convert the binary representation to an integer.

Five bits were used on the right side to store our second number. We need to chop those off to get the number stored on the left side. The bitwise right shift (>>) will do just that:

pry(main)> (3939 >> 5).to_s(2)
=> "1111011"

The string "1111011" is our original 123 in a binary string format. We can convert that to an integer by using the to_i(2) String method:

pry(main)> (3939 >> 5).to_s(2).to_i(2)
=> 123

I right shifted the original number, 3939, converted it to a binary string and converted that to an Integer.

There are more efficient ways to do this by using a binary "&" (3939 >> 5) & 0b1111111 => 123 with the max value the bits can represent. That's what the Pinterest article had, but I found using the Ruby conversion methods a bit more friendly to those of us who are not dealing with binary data on a daily basis.

We have the number on the left side, but what about the number on the right side? When we convert the number representation (3939) to a binary string, we know that the five characters on the right side will represent the bits of our other number. Ruby String’s last(x) will do just that:

pry(main)> (3939 >> 0).to_s(2).last(5)
=> "00011"

Converting this binary String to Integer should be similar to what we've done before:

pry(main)> (3939 >> 0).to_s(2).last(5).to_i(2)
=> 3

Using the binary "&" with the max number the bits can store will do this conversion in one step: (3939 >> 0) & 0b11111 => 3. As a side note, the binary number can be represented as a hexadecimal value: (3939 >> 0) & 0x1F => 3. This is a lot shorter than a series of ones and zeros.

There is a limit of how large the numbers can be as you have a limit of bits to store those. The max number can be determined by flipping the available bits on. For an 7 bit number it's 64 + 32 + 16 + 8 + 4 + 2 + 1 = 127 or 2**x-1, where x is the number of bits. In our case it is 2**7-1 = 127.

We ended up using a 64-bit Integer for our shard_id, which is a BIGINT in MySQL. We store client_id in 22 bits giving us the maximum of 2**22-1 = 4_194_304 and 40 bits for the entity_id with 2**40-1 = 1_099_511_627_775 max value. The remaining two bits are "worth in gold, we can expand the number or store a third (albeit small) number in it.