Transforming XML data using the F# XML type provider
10 Jul 2016
At some point you’ll find yourself doing ETL. Whether it’s loading a few GB of data into SQL Server or consuming tiny MQ messages, one step is to parse (and potentially validate) data. Usually, this is rather boring and tedious work. Unless…
Consider the following, vastly simplified messages:
In F#, a very convenient way of working with data whose structure the programmer does not know up front is to use so-called type providers.
An F# type provider is a component that provides types, properties, and methods for use in your program.
The XmlProvider can be configured with a list of sample messages:
The generated type is then used to load a file containing only one message element:
Extraction, fast and easy
The type provider generates types equivalent to the following ones:
For loading, however, the data should be in a format suitable for sending to a database:
Again, this type is generated by a type provider: this time SQLProvider.
Our first approach for mapping from XML to DB types looked like this:
From this small example it might not be obvious: MissingNil.Nil and Nil.Nil are two different types.
So just extracting the matches won’t get us much further.
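To make the problem concrete, here is a minimal sketch with hand-written stand-ins for two provider-generated types. PriceNil, QuantityNil and both helper functions are hypothetical names, and treating Nil = Some true as "no value" is an assumption about how the nil flag is surfaced.

```fsharp
// Hypothetical stand-ins for two provider-generated types. They have the
// same shape but are unrelated as far as the type system is concerned.
type PriceNil(nil: bool option, value: decimal) =
    member this.Nil = nil
    member this.Value = value

type QuantityNil(nil: bool option, value: int) =
    member this.Nil = nil
    member this.Value = value

// Each type needs its own extraction function, although the bodies match.
// Assumption: Nil = Some true marks a nil element.
let priceToOption (n: PriceNil) =
    match n.Nil with
    | Some true -> None
    | _ -> Some n.Value

let quantityToOption (n: QuantityNil) =
    match n.Nil with
    | Some true -> None
    | _ -> Some n.Value

let price = priceToOption (PriceNil(None, 9.99m))            // Some 9.99M
let quantity = quantityToOption (QuantityNil(Some true, 0))  // None
```

Passing a QuantityNil to priceToOption is rejected by the compiler, so every nillable element would need its own copy of the same match.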
By now you are surely concerned about the code duplication involved in extracting nillable, potentially optional values.
Just like I was. Fortunately there is…
The missing piece is a transformation from nillable to nullable:
This requires subsuming all Nil types in a generic Nillable<'T>. Per se, this cannot be done in F# as there are no partial classes.
In a dynamic language we would probably just assign Value.
In a typed mainstream language (such as C#) we might make Nil implement a generic interface (through partial classes) or use reflection.
While being safer, implementing the interface is more work.
Also, the knowledge of how to uniformly treat different types should arguably not be embedded within these types.
As there are Nillable<'T> options as well, and we do know how to transform Options and Nullables, trying to come up with a function Nillable<'T> -> 'T option feels natural.
Also, I wanted to stick with options for validation (out of scope) as long as possible.
In F# (and other languages supporting structural typing) there’s a thing called static type constraints, and specifically member constraints. These allow constraining a parameter to all types that have certain members:
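As a minimal sketch of such a member constraint: PriceNil, NameNil and ofNillable are hypothetical names, and treating Nil = Some true as "no value" is an assumption.

```fsharp
// Hypothetical stand-ins for two provider-generated types; they share
// no interface or base class, only the member names Nil and Value.
type PriceNil(nil: bool option, value: decimal) =
    member this.Nil = nil
    member this.Value = value

type NameNil(nil: bool option, value: string) =
    member this.Nil = nil
    member this.Value = value

// Works for every type ^N that has a member Nil : bool option and a
// member Value of some type 'T. Assumption: Nil = Some true marks a
// nil element; a missing or false flag means the value is present.
let inline ofNillable (n: ^N) =
    let isNil = (^N : (member Nil : bool option) n)
    let value = (^N : (member Value : 'T) n)
    if isNil = Some true then None else Some value

let price = ofNillable (PriceNil(None, 9.99m))    // Some 9.99M
let name = ofNillable (NameNil(Some true, null))  // None
```

The same function accepts both types although neither declares anything in common with the other.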
The most eye-catching features here are the two (^X : (member Y : Z) x) expressions:
^X is a so-called statically resolved type parameter. In contrast to generic ones, statically resolved type parameters get replaced at compile time.
member Y : Z is the constraint on type ^X to have a member Y of type Z.
(member Y : Z) x applies the member to an instance of ^X.
The inline keyword instructs the compiler to generate a copy of the function for every resolved type instead of restricting it to one specific resolution.
Executing the above snippet in fsi yields:
So we got a function from a parameter n of some constrained type ^N to 'T option.
^N doesn’t need to be a Nillable. Having two members, Nil returning a bool option and Value returning an arbitrary value, suffices.
All our *.Nil types satisfy this condition.
To align our implementation with existing option functionality, we extend Option like this:
and create a new Nillable type:
As stated before, this is just for demo purposes. In our production code we stick to options a bit longer and change to Nullables only after validation.
Let’s have a look at the type of the second Nillable.toNullable overload:
Great! The compiler figured all of that out by itself. Imagine having to type all that…
Finally, this leads to concise mapping code:
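As a rough, self-contained sketch of what such a mapping could look like: every type and function name below is hypothetical, the records merely stand in for the provider-generated XML and database types, and two differently named functions are used in place of the overloads mentioned above to keep the example simple.

```fsharp
open System

// Hypothetical stand-ins for provider-generated element types.
type PriceNil(nil: bool option, value: decimal) =
    member this.Nil = nil
    member this.Value = value

type QuantityNil(nil: bool option, value: int) =
    member this.Nil = nil
    member this.Value = value

// Hypothetical XML-side message: Quantity is optional and nillable.
type XmlMessage = { Id: int; Price: PriceNil; Quantity: QuantityNil option }

// Hypothetical database-side shape with nullable columns.
type DbMessage = { Id: int; Price: Nullable<decimal>; Quantity: Nullable<int> }

// Assumption: Nil = Some true marks a nil element.
let inline ofNillable (n: ^N) =
    let isNil = (^N : (member Nil : bool option) n)
    let value = (^N : (member Value : 'T) n)
    if isNil = Some true then None else Some value

// Nillable -> Nullable, and optional nillable -> Nullable.
let inline toNullable n = n |> ofNillable |> Option.toNullable
let inline optionToNullable n =
    n |> Option.bind (fun x -> ofNillable x) |> Option.toNullable

// The mapping itself is one line per field.
let toDb (m: XmlMessage) : DbMessage =
    { Id = m.Id
      Price = toNullable m.Price
      Quantity = optionToNullable m.Quantity }

let xml =
    { XmlMessage.Id = 1
      Price = PriceNil(Some true, 0m)
      Quantity = Some (QuantityNil(None, 3)) }

let row = toDb xml  // Price has no value, Quantity holds 3
```

Keeping the intermediate result as an option until the last step leaves room for validation before switching to Nullable.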
Transform, less fast but fun
Wait, already? What about the Load in ETL? Well, with the transformed DbMessage in hand, this is almost boring.
We’ve seen an application of structural typing that turns the lack of F# partial classes into a competitive advantage. And we’ve seen that type inference especially shines with statically resolved type parameters and member constraints.