Friday, January 28, 2011

My LinkedIn connectivity graph

Parser monad versus parser generation

I have been playing with a parser monad for the last two months. Then last week I was tempted to ditch it and bring back a traditional parser generator (fsyacc), with the twist of using monadic code to delay constructions such as auto-indentation that had caused me trouble earlier.
Well, I have to say that I decided to stay with the parser monad. I had three reasons:
  1. Having refactored the monadic parser, it is pretty readable and looks as good as the yacc grammars I've written. In any case it is no longer worse.
  2. I learned a few new tricks during this refactoring that were fundamental "monadic" tricks. As this is much of an educational exercise, sticking with the monadic parser will teach me more.
  3. Monads have a differential relation to lambda calculus and "equational" relations are the theme of this blog!
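To make the comparison concrete, here is a minimal sketch of the kind of parser monad being discussed. All the names (`Parser`, `satisfy`, `twoDigits`) are illustrative, not the blog's actual code; the point is that sequencing reads as straight-line code, which is why the refactored monadic parser can end up as readable as a yacc grammar.

```fsharp
// A parser is a function from input to an optional result plus remaining input.
type Parser<'a> = char list -> ('a * char list) option

type ParserBuilder() =
    member this.Bind(p: Parser<'a>, f: 'a -> Parser<'b>) : Parser<'b> =
        fun input ->
            match p input with
            | Some (v, rest) -> f v rest
            | None -> None
    member this.Return(v: 'a) : Parser<'a> = fun input -> Some (v, input)

let parser = ParserBuilder()

// A primitive: consume one char satisfying a predicate.
let satisfy pred : Parser<char> =
    fun input ->
        match input with
        | c :: rest when pred c -> Some (c, rest)
        | _ -> None

// Sequencing reads like a grammar rule written top to bottom.
let twoDigits : Parser<char * char> =
    parser {
        let! a = satisfy System.Char.IsDigit
        let! b = satisfy System.Char.IsDigit
        return (a, b)
    }
```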

Wednesday, January 26, 2011

Type idiosyncrasies

My friend Doug says:
"It would never occur to me to spend so much time thinking about a syntactical 'type' idiosyncrasy..."
Ah, life would be simpler if I did not need to worry about these kinds of issues! Yet having probably written more than a million lines of code in the past (I started young!), I have come to the conclusion that the higher the abstraction, the better my productivity. Given the limited amount of time I have when working at home, I need to be as productive as possible.

Still, functional programming (FP) is a deadly sport. With procedural or object orientation, it is rare that refactoring completely changes the logic of your design. Such changes are not rare with FP. Here are a few examples:
  • Data can be passed around as arguments, carried within closures, or stored in monadic states
  • Specific operations can be made explicit to a monad. That is, made explicit to the higher level design of your code.
  • You can swap between a type-centric recursion and a functional recursion. That is, types can be defined recursively, or be defined "open" with dangling parameterized type leaves which get resolved by type inference through the recursion of functions.
  • ...
This type of design "swiveling" is not easy and puts your confidence as an architect to the test, especially under time pressure.
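The third bullet above, closed versus open recursion, can be sketched as follows. This is a minimal illustration with made-up type names, not code from the project: the "open" version leaves the recursive positions as a dangling type parameter, tied later by a wrapper type (or by the recursion of the functions that consume it).

```fsharp
// Closed (type-centric) recursion: the type refers to itself directly.
type ExprClosed =
    | Lit of int
    | Add of ExprClosed * ExprClosed

// Open recursion: recursive positions are a dangling type parameter.
type ExprOpen<'t> =
    | LitO of int
    | AddO of 't * 't

// One way to tie the knot afterwards: a fixed-point wrapper type.
type Fix = Fix of ExprOpen<Fix>

let rec evalClosed e =
    match e with
    | Lit n -> n
    | Add (a, b) -> evalClosed a + evalClosed b

let rec evalFix (Fix e) =
    match e with
    | LitO n -> n
    | AddO (a, b) -> evalFix a + evalFix b
```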

Tuesday, January 25, 2011

Tail refactoring

I found myself with this function which I didn't like:

let seq2 p1 p2 f =
    state {
        let! v1 = p1
        let! v2 = p2
        return! f v1 v2
    }
This function is meant to sequence two state monadic functions and then do something monadic with the result.

Noting that f was the last operation applied, I refactored it as:

let seq2 p1 p2 =
    state {
        let! v1 = p1
        let! v2 = p2
        return (v1, v2)
    }

and I added a new operator |>@ which applies a unary monadic function to the result of the left-hand monadic operation. As seq2 is monadic I can write: seq2 x y |>@ f. The advantage is that seq2 is now more general, as I have other operators like |>@.

This refactoring is about generalizing the last operation of a function.
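A self-contained sketch of what |>@ might look like, assuming a plain state monad (the `State` type and `StateBuilder` below are a minimal stand-in for the blog's actual builder): |>@ is just bind, written as a pipeline operator.

```fsharp
// A minimal state monad: a computation is a function from state to (result, state).
type State<'s, 'a> = 's -> 'a * 's

type StateBuilder() =
    member this.Bind(m: State<'s, 'a>, f: 'a -> State<'s, 'b>) : State<'s, 'b> =
        fun s -> let v, s' = m s in f v s'
    member this.Return(v: 'a) : State<'s, 'a> = fun s -> v, s
    member this.ReturnFrom(m: State<'s, 'a>) = m

let state = StateBuilder()

// |>@ applies a unary monadic function to the result of the left-hand
// monadic operation: it is bind in pipeline position.
let (|>@) m f =
    state {
        let! v = m
        return! f v
    }

// seq2 now only pairs the two results...
let seq2 p1 p2 =
    state {
        let! v1 = p1
        let! v2 = p2
        return (v1, v2)
    }
```

...and the old behaviour is recovered by composition: `seq2 x y |>@ (fun (v1, v2) -> f v1 v2)`.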

Funny crash of F# type inference with visual studio 2010

(Warning: this posting was buggy because the less-than and greater-than chars were taken as HTML commands.)

I had a type, something like:
type Type =
| A of X
| B of Y
| C of Z

And I needed to convert it to
type Type'<'X> =
| A of 'X
| B of Y
| C of Z

type Type = Type'<...>

I had added a phantom type to X so I had

type Type = Type'<X<Tag>>

But when typing, I would enter X<> before finishing with the 'Tag' part (because X really has more arguments), and as a result directly create a type recursion in the Type arguments... Bang, Visual Studio goes down. It took me three crashes to understand that I needed to type the text so as never to temporarily have the word "Type" inside the type arguments. For example, start with Tag and then add the Type in front.

Having said this, type inference sometimes needs to delay unification, such as with parameterized types. In these cases it is hard to detect loops, and that may well be why VS crashes.
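For readers unfamiliar with the pattern that triggered all this, here is a minimal phantom-type sketch (the names `Checked`/`Unchecked` and the payload are illustrative, not the actual code): the tag parameter never appears on the right-hand side, it only distinguishes types statically.

```fsharp
// 'tag is a phantom type parameter: it carries no runtime data.
type X<'payload, 'tag> = X of 'payload

type Checked = interface end
type Unchecked = interface end

// The same payload under two statically incompatible types:
let raw : X<int, Unchecked> = X 42

// Converting between them must be done explicitly.
let promote (X v : X<int, Unchecked>) : X<int, Checked> = X v
```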

Tuesday, January 18, 2011

"Cache oblivious" agile management

Cache-oblivious data structures or algorithms scale well over different sizes of caches and different sizes of data usage. So how should we organize cross-team agile communication so as to be oblivious to team sizes and the amount of change that needs to be channeled across teams? Now ideally each team works independently; but the real world is not like that: scrum masters (taking scrum as an example) like to meet and resolve impediments, architects meet to reduce redundancy and try to keep the whole system from growing out of control, and product owners are continuously working on their codependent backlogs.

The simple answer is to try not to worry about this. Ideally your business model allows you to minimize cross-team dependencies and to phase out products quickly enough that the cross dependencies do not grow. But even if you manage this, you may still be building a legacy of infrastructure. As architectures tend towards distributed asynchronous systems, the legacy is now in the message protocols, which, even if they are stateless, become mini programs in their own right. So what can be done?

Your overall software development process needs to degrade gracefully. This implies that no impediment should block the development machine. The consequence is that architecture, scrum-master and HR-related work need special attention!

I'll follow up with my recommendations in a later posting.

Saturday, January 15, 2011

What architecture for a trading system?

Listening to Randy Shoup's interview about his eBay's Architectural Principles presentation, I was at first surprised by how similar their architecture was to my views on how to design a trading system (something I have done for fourteen years!). Then I remembered that eBay may not be a high-frequency trading system, but it is at once an exchange, a back office and a front-end system.

Tuesday, January 11, 2011

Learning functional programming

Tonight I looked at Tony Morris' presentation on how to learn functional programming.
He starts with a focus on pure lazy programming...
I feel that laziness is nice locally but not globally. But I do get the point: in order to "think" functionally, laziness has the advantage of "freeing your mind". So maybe he is right.
(I think we all agree with pureness).

When I try to explain functional programming to programmers, I start by focusing on data differentiation. Sounds complicated, but it is the glue you can always revert back to. I do so by focusing on the notions of zipper, focus and traversal (map or iter). I would also bring in monads pretty quickly because they open up the whole higher-abstraction notion easily.
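The zipper/focus/traversal trio can be shown in a few lines. This is a generic list-zipper sketch, not anything language-specific from the posts above: a focused element plus its left context (stored reversed) and right context, with O(1) focus movement.

```fsharp
// A list zipper: focus plus left context (reversed) and right context.
type Zipper<'a> = { Left: 'a list; Focus: 'a; Right: 'a list }

let ofList xs =
    match xs with
    | [] -> None
    | x :: rest -> Some { Left = []; Focus = x; Right = rest }

// Move the focus one step right, in constant time.
let moveRight z =
    match z.Right with
    | [] -> None
    | x :: rest -> Some { Left = z.Focus :: z.Left; Focus = x; Right = rest }

// A traversal back out: rebuild the list from the zipper parts.
let toList z = List.rev z.Left @ (z.Focus :: z.Right)
```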

Then, back to the presentation: functional programming takes years of training, as does OO. My recommendation: think big, write code at home and learn! I do "kind of" believe that developers who do not write code at home do not really gain enough skills to be on top of their job.

Still listening to Tony, I do agree that unless you learn "deep" functional programming concepts, you do not understand procedural languages!

Finally, it sounds silly, but I do really, really believe that the only way to really understand functional programming (and therefore all other languages) is to write compilers for functional languages! Start small, take years if necessary; you will gain more than from any other private development project!

Narrowing and widening in type inference

The joy of googling yourself!

I was (again) wondering about the deeper theories that tie type inference algorithms that use narrowing to those that use widening. A search on the web led to a comment on LtU... argh, that was my comment!

Nevertheless, since then, I have come to accept that one is the dual of the other and that it does make sense to combine narrowing and widening, either to simplify the type inference algorithm or to implement "approximate" types.

Monday, January 10, 2011

Is laziness good for software architecture?

F# will automatically insert laziness into certain recursive monadic constructions. Haskell is lazy by default. And I found myself wondering if this is a good thing with regards to the overall architecture of an application.

I would tend to say that laziness is not a good thing with regards to your overall design. I have two arguments:
  • Laziness does not help the overall asynchronous model, especially as the future tends towards cloud computing with its network latency
  • Laziness can blur the clarity of what is really happening. And as a consequence the overall design suffers.
I like my systems to be very much in control of what is happening; therefore my recommendation is to have one asynchronous model for the whole system. Accordingly, do not include the "Delay" method in your F# monads, and do fix warnings when the F# compiler says it will use a delayed reference.
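One way to read that recommendation in code, as a sketch (the builder below is a minimal stand-in, and the reading of F#'s behaviour is my understanding, not official doctrine): a state builder with no Delay member builds every step eagerly, so recursion has to be introduced explicitly as a function rather than being silently wrapped in a thunk.

```fsharp
// A computation is a function from state to (result, state).
type State<'s, 'a> = 's -> 'a * 's

// Deliberately no Delay member: workflow steps are built eagerly,
// nothing is thunked behind your back.
type EagerStateBuilder() =
    member this.Bind(m: State<'s, 'a>, f: 'a -> State<'s, 'b>) : State<'s, 'b> =
        fun s -> let v, s' = m s in f v s'
    member this.Return(v: 'a) : State<'s, 'a> = fun s -> v, s
    member this.ReturnFrom(m: State<'s, 'a>) = m

let state = EagerStateBuilder()

// Recursion is explicit: countDown is a function, so each recursive
// step is only built when it is actually called.
let rec countDown n : State<int, unit> =
    if n = 0 then state { return () }
    else
        state {
            let! _ = (fun s -> (), s + 1)   // bump the state once per step
            return! countDown (n - 1)
        }
```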

Wednesday, January 05, 2011

Local versus global monadic states in large applications

(Note: this needs a correction because there is a third way, which is to just transform the monad "top down"; it has restrictions but is very usable too, so I will need to update this.)

Yesterday evening it occurred to me that two choices are possible when programming large monadic applications:
  1. A single common monadic state is shared among the modules, while each module is written independently of this state and is given an accessor to its local state within this global shared state. The application then defines the global state built from the local states, and the accessors to reach these local states.
  2. Each module works on a state which presents primarily the local state of the module but which carries the global state anonymously. One way to do this is to "zipperize" the global state and walk over it with a "focus" (see Huet's "Functional Pearl: The Zipper").
I have been using solution 1 for my prototyping. Having now spent a few hours trying to convert my little application to solution 2, I have the following remarks:
  • The single global state with one accessor per module is easier to implement, but implies that the top-level application design has visibility into all modules and that their assembly is not overly complex.
  • Local module-specific states with an anonymous "left-over" global state are more powerful and allow a finer anonymization through a module hierarchy. This is also more functionally pure. Nevertheless it has the shortcoming that, as each module works "relative" to its local state, much functional glue must be included for each cross-module call. My experience was that it is pretty tricky to get all these conversions right because they are all relative. And in practice the application most likely has a reference global state type, so this style of design reverts to using code to convert between each local module state and the global state (just like the accessor).
Although I conceptually like the local module state concept, I gave up trying to use it after an evening of rewriting.
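Solution 1 can be sketched in a few lines. All names here (`ParserState`, `InferState`, `adapt`, `nextVar`) are illustrative stand-ins, not the actual application: `adapt` is the "accessor", lifting a computation on a module-local state into a computation on the global state via a getter/setter pair.

```fsharp
// A computation is a function from state to (result, state).
type State<'s, 'a> = 's -> 'a * 's

// Hypothetical module-local states and the global state built from them.
type ParserState = { Pos: int }
type InferState = { NextVar: int }
type GlobalState = { Parser: ParserState; Infer: InferState }

// The accessor: run a local-state computation inside the global state.
let adapt (getL: 'g -> 'l) (setL: 'l -> 'g -> 'g)
          (m: State<'l, 'a>) : State<'g, 'a> =
    fun g ->
        let v, l' = m (getL g)
        v, setL l' g

// Each module is written against its local state only...
let nextVar : State<InferState, int> =
    fun s -> s.NextVar, { s with NextVar = s.NextVar + 1 }

// ...and the top-level application wires it into the global state.
let nextVarG : State<GlobalState, int> =
    adapt (fun g -> g.Infer) (fun l g -> { g with Infer = l }) nextVar
```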

Monday, January 03, 2011

Parser generator versus combinatorial parser

I've used lex/yacc type parser generators many times in the past. For a change I used a parser monad for my latest educational effort. The initial reason was that I wanted to have auto-indentation, and getting the lexical indentation state right is hard when the lexer is separate from the parser. With the parser monad, the lexical analysis is just the "lower level" of a single parsing algorithm; it is then much easier to implement constructions like auto-indent, where the lexical interpretation is context specific.

Another big advantage of the parser monad is that it is "just a state monad" and, as I mentioned in an earlier post, multiple states can be combined in the same state monad while still being written separately. Therefore, having written my parser and type inference algorithm separately, I can now bring them together, with the advantage that type errors now show up within their parsing context. This way the type inference reports meaningful errors while not needing to know anything about the language source.

The disadvantage of working with combinatorial parsers is that it takes a lot of core refactoring to make things readable. Another issue is the correctness of the parser: being able to define a parser does not mean it will parse "correctly". Because of issues with F# laziness, I found that only allowing recursion to be brought in as a top-level construction kept me on safer ground.
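One way to keep recursion as a top-level construction is a forward-referenced parser, sketched below with illustrative names (FParsec's `createParserForwardedToRef` follows the same idea): create the slot first, define combinators against it, and tie the knot exactly once at top level, so no parser definition is recursive "inline".

```fsharp
// A parser is a function from input to an optional result plus remaining input.
type Parser<'a> = char list -> ('a * char list) option

// Create a parser slot and a setter that ties the knot later.
let forwardedParser<'a> () : Parser<'a> * (Parser<'a> -> unit) =
    let r : Parser<'a> ref = ref (fun _ -> None)
    (fun input -> r.Value input), (fun p -> r.Value <- p)

// Usage sketch (number, parens, choice are hypothetical combinators):
//   let expr, setExpr = forwardedParser ()
//   setExpr (choice [ number; parens expr ])
```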

Saturday, January 01, 2011

Higher order programming and visual studio 2010

I purchased Visual Studio 2010 (Professional) a few months ago with the idea that it was the most productive development environment around. Yet now I need to revisit this belief.
The reason for this change is my dive into programming with deep functional abstraction: when programming with combinatorial operators and monads, the code no longer has a strong tie to the resulting executable. The debugger becomes near useless: the call stack is anonymous because it refers mostly to bind-like functions that shed no light on what is happening, and breakpoints are often not accepted; when they are, they do not help you much because they do not land on the relevant parts of your code. I find myself thrown back into the stone age, needing to insert trace operations to understand what my code does. The other day I wrote if x1 <> x2 ... It took me nine hours to find the mistake (which I then rewrote with a match statement, because that is the right way not to make that type of mistake).

Therefore I am slowly getting the feeling that OCaml on Linux is as "easy" to use as F# in Visual Studio when the code has lots of functional abstractions. And I am almost ready to say to hell with the ML family and switch to Haskell, because the formal framework is "smoother".