An application of structural typing

Transforming XML data using the F# XML type provider

10 Jul 2016

At some point you’ll find yourself doing ETL (extract, transform, load). Whether it’s loading a few GB of data into SQL Server or consuming tiny MQ messages, one step is always to parse (and potentially validate) the data. Usually, this is rather boring and tedious work. Unless…

The problem

Consider the following, vastly simplified messages:

<?xml version="1.0" encoding="UTF-8" ?>
<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Available>Foo</Available>
  <Nil xsi:nil="true"/>
  <Missing>Bar</Missing>
  <Missing_Nil>1337</Missing_Nil>
</message>

<?xml version="1.0" encoding="UTF-8" ?>
<message>
  <Available>Bar</Available>
  <Nil>3.141</Nil>
</message>
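
To see why parsing such messages by hand gets tedious, here is a minimal sketch of hand-rolled extraction for the first message, using System.Xml.Linq from the .NET base library. The helper tryValue is hypothetical (it is not part of the code in this post), and it deliberately collapses "missing" and "nil" into None.

```fsharp
open System.Xml.Linq

// Namespace that defines the xsi:nil attribute.
let xsi = XNamespace.Get "http://www.w3.org/2001/XMLSchema-instance"

// Hypothetical helper: Some text if the child element exists and is not
// marked xsi:nil="true"; None if it is missing or nil.
let tryValue (name : string) (parent : XElement) =
    match parent.Element(XName.Get name) with
    | null -> None
    | element ->
        match element.Attribute(xsi.GetName "nil") with
        | null -> Some element.Value
        | attribute when attribute.Value = "true" -> None
        | _ -> Some element.Value

let first =
    XElement.Parse("""<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Available>Foo</Available>
  <Nil xsi:nil="true"/>
  <Missing>Bar</Missing>
  <Missing_Nil>1337</Missing_Nil>
</message>""")

// Every field needs its own lookup and its own conversion.
let available = first |> tryValue "Available"
let nil = first |> tryValue "Nil" |> Option.map decimal
let missingNil = first |> tryValue "Missing_Nil" |> Option.map int
```

Each additional field means another lookup and another conversion, and nothing checks the element names at compile time.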

Extract

A very convenient way of working with data whose shape is unknown to the programmer is to use F#’s so-called type providers.

An F# type provider is a component that provides types, properties, and methods for use in your program.

The XmlProvider can be configured with a list of sample messages:

type XmlMessage = XmlProvider<"messageSamples.xml", SampleIsList=true, Global=true>

Here, messageSamples.xml contains three sample messages:

<?xml version="1.0" encoding="UTF-8" ?>
<messages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <message>
    <Available>Foo</Available>
    <Nil xsi:nil="true"/>
    <Missing>Bar</Missing>
    <Missing_Nil>1337</Missing_Nil>
  </message>
  <message>
    <Available>Baz</Available>
    <Nil>3.141</Nil>
  </message>
  <message>
    <Available></Available>
    <Nil>2.718</Nil>
    <Missing_Nil xsi:nil="true"/>
  </message>
</messages>

The generated type is then used to load a file containing only one message element:

let message = XmlMessage.Load("message.xml")

Extraction, fast and easy

Transform

The type provider generates types equivalent to the following ones:

type Message = {
    Available : string
    Missing : string option
    MissingNil : MissingNil option
    Nil : Nil
}

and MissingNil = {
    Nil : bool option
    Value : int
}

and Nil = {
    Nil : bool option
    Value : decimal
}

For loading, however, the data should be in a format suitable for sending to a database:

type DbMessage = {
    Available : string
    Missing : string
    MissingNil : Nullable<int>
    Nil : Nullable<decimal>
}

Again, this type is generated by a type provider: this time SQLProvider.

Our first approach for mapping from XML to DB types looked like this:

let map (xmlMessage : XmlMessage.Message) =
    let missingNil =
        xmlMessage.MissingNil
        |> Option.bind (fun n -> match n.Nil with Some true -> None | _ -> Some n.Value)
        |> Option.toNullable
    let nil =
        match xmlMessage.Nil.Nil with Some true -> None | _ -> Some xmlMessage.Nil.Value
        |> Option.toNullable
    {
        Available = xmlMessage.Available
        Missing = xmlMessage.Missing |> Option.toObj
        MissingNil = missingNil
        Nil = nil
    }

It might not be obvious from this small example, but the generated MissingNil and Nil types are two different types, so simply extracting the match into a function won’t get us much further. By now you are surely concerned about the duplicated code for extracting nillable, potentially optional values. Just like I was. Fortunately there is…

A solution

The missing piece is a transformation from nillable to nullable:

type Nillable<'T> = {
    Nil : bool option
    Value : 'T
}

to Nullable<'T>.

This requires subsuming all Nil types under a generic Nillable<'T>. That cannot be done directly in F#, as there are no partial classes with which to extend the provided types.

In a dynamic language we would probably just assign Value. In a typed mainstream language (such as C#) we might make Nil implement a generic interface (through partial classes) or use reflection. While being safer, implementing the interface is more work. Also, the knowledge of how to uniformly treat different types should arguably not be embedded within these types.
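
For comparison, here is a rough sketch of what the interface route could look like with hand-written types. INillable, IntNil and optionOfINillable are hypothetical names; the provided types implement no such interface, which is exactly why this route is closed to us.

```fsharp
// Hypothetical interface; the provided types do not implement anything like it.
type INillable<'T> =
    abstract Nil : bool option
    abstract Value : 'T

// Hand-written stand-in for one provided type. Every such type would
// need the same boilerplate.
type IntNil(nil : bool option, value : int) =
    interface INillable<int> with
        member this.Nil = nil
        member this.Value = value

// Works for anything that explicitly implements the interface.
let optionOfINillable (n : INillable<'T>) =
    match n.Nil with
    | Some true -> None
    | _ -> Some n.Value

let present = optionOfINillable (IntNil(None, 1337) :> INillable<int>)
let absent = optionOfINillable (IntNil(Some true, 0) :> INillable<int>)
```

The conversion itself stays short, but every type has to opt in to the interface, and the knowledge about uniform treatment ends up inside the types.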

Since the Nillable<'T> values may themselves be optional, and we already know how to convert between Options and Nullables, coming up with a function Nillable<'T> -> 'T option feels natural. I also wanted to stick with options for as long as possible, for the sake of validation (out of scope here).

F# (like other languages supporting structural typing) has statically resolved type parameters with constraints, and specifically member constraints. These allow us to constrain a parameter to all types having certain members:

let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

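The most eye-catching features here are the two (^X : (member Y : Z) x) expressions. Each one invokes the member Y of type Z on the value x, whose type ^X is resolved statically at every call site.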

Executing the above snippet in fsi yields:

val inline optionOfNillable :
  n: ^N -> 'T option
    when  ^N : (member get_Nil :  ^N -> bool option) and
          ^N : (member get_Value :  ^N -> 'T)

So we get a function from a parameter n of some constrained type ^N to 'T option. ^N doesn’t need to be a Nillable: having two members, Nil returning a bool option and Value returning an arbitrary value, suffices. All our *.Nil types satisfy this condition.
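
To try this without a type provider, the following sketch applies the same function to two unrelated record types. IntNil and DecimalNil are hypothetical stand-ins for the provided types; the only thing they share is the shape of their members.

```fsharp
// Hypothetical stand-ins for two provided types. They share no base type
// or interface, only the member names Nil and Value.
type IntNil = { Nil : bool option; Value : int }
type DecimalNil = { Nil : bool option; Value : decimal }

let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

let intValue : IntNil = { Nil = None; Value = 1337 }
let nilValue : DecimalNil = { Nil = Some true; Value = 0m }
let decimalValue : DecimalNil = { Nil = Some false; Value = 3.141m }

// The same function is resolved separately for each argument type.
let a = optionOfNillable intValue
let b = optionOfNillable nilValue
let c = optionOfNillable decimalValue
```

Here a is an int option while b and c are decimal options. Passing a value without a Nil or Value member is rejected at compile time.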

To align our implementation with existing option functionality, we extend Option like this:

type Option<'T> with
    static member inline ofNillable n =
        match (^N : (member Nil : bool option) n) with
        | Some true -> None
        | _ -> Some (^N : (member Value : 'T) n)

and create a new Nillable type:

type Nillable =
    static member inline toNullable n =
        Option.ofNillable n |> Option.toNullable
    static member inline toNullable n =
        Option.bind Option.ofNillable n |> Option.toNullable

As mentioned before, converting straight to Nullables is just for demo purposes. In our production code we stick to options a bit longer and change to Nullables only after validation.

Let’s have a look at the type of the second Nillable.toNullable overload:

toNullable : n: ^a option -> System.Nullable<'b>
               when  ^a : (member get_Nil :  ^a -> bool option) and
                   ^a : (member get_Value :  ^a -> 'b) and
                   'b : (new : unit ->  'b) and 'b : struct and
                   'b :> System.ValueType

Great! The compiler figured all of that out by itself. Imagine having to type all that…

Finally, this leads to concise mapping code:

let map (xmlMessage : XmlMessage.Message) =
    {
        Available = xmlMessage.Available
        Missing = xmlMessage.Missing |> Option.toObj
        MissingNil = xmlMessage.MissingNil |> Nillable.toNullable
        Nil = xmlMessage.Nil |> Nillable.toNullable
    }
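
The whole transformation can be tried end to end without either type provider. The sketch below makes several assumptions: the record types are hand-written stand-ins for the provided ones, and nullableOfNillable and nullableOfOptionalNillable are two hypothetical, differently named functions used in place of the Nillable.toNullable overloads.

```fsharp
open System

// Hypothetical stand-ins for the types generated by the type providers.
type IntNil = { Nil : bool option; Value : int }
type DecimalNil = { Nil : bool option; Value : decimal }

type Message =
    { Available : string
      Missing : string option
      MissingNil : IntNil option
      Nil : DecimalNil }

type DbMessage =
    { Available : string
      Missing : string
      MissingNil : Nullable<int>
      Nil : Nullable<decimal> }

let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

// Two plainly named functions instead of overloads (an assumption of this sketch).
let inline nullableOfNillable n =
    optionOfNillable n |> Option.toNullable

let inline nullableOfOptionalNillable n =
    n |> Option.bind (fun x -> optionOfNillable x) |> Option.toNullable

let map (xmlMessage : Message) : DbMessage =
    { Available = xmlMessage.Available
      Missing = xmlMessage.Missing |> Option.toObj
      MissingNil = xmlMessage.MissingNil |> nullableOfOptionalNillable
      Nil = xmlMessage.Nil |> nullableOfNillable }

let missingNil : IntNil = { Nil = None; Value = 1337 }
let nil : DecimalNil = { Nil = Some true; Value = 0m }
let sample : Message =
    { Available = "Foo"; Missing = None; MissingNil = Some missingNil; Nil = nil }

let db = map sample
```

For this sample, db.Missing is null, db.MissingNil holds 1337 and db.Nil has no value.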

Transform, less fast but fun

Conclusion

Wait, already? What about the Load in ETL? Well, with the transformed DbMessage in hand, this is almost boring.

We’ve seen an application of structural typing that turns F#’s lack of partial classes into a competitive advantage. And we’ve seen that type inference especially shines with statically resolved type parameters and member constraints.