Eric Conner GitHub

Fixing Recursive Structs in Thrift Generated Python Code

The Problem

Python code generated by Thrift up to version 0.10.0 does not support recursive structures. The issue was first reported in July 2014. In python, types are discovered dynamically so a recursive reference to a class inside of its definition does not work.

For example, the class definition below

class Recursive(object):
  my_prop = (Recursive,)

causes the error,

Traceback (most recent call last):
  File "testcode.py", line 1, in <module>
    class Recursive(object):
  File "testcode.py", line 2, in Recursive
    my_prop = (Recursive,)
NameError: name 'Recursive' is not defined

For python, Thrift generates code that exhibits this problem because it allows recursive definitions in .thrift files.

struct Recursive {
    1: list<Recursive> Children
}

The above thrift code generates the following python code,

class Recursive:
  thrift_spec = (
    None, # 0
    (1, 
     TType.LIST,
     'Children',
     (TType.STRUCT, (Recursive, Recursive.thrift_spec)), 
     None,), # 1
  )
  ...

The thrift core python code uses these thrift_spec tuples, which contain type information and default values, for serialization and for default arguments to constructors. It is then important to preserve these tuples when supporting recursive structures, but we can see that they cannot exist as written.

There are two pretty apparent problems though:

  1. If thrift_spec is to use Recursive then it needs to be defined and monkey-patched onto the class after the class is defined. There is no other way to make this recursive definition work in python.
  2. thrift_spec refers to itself via Recursive.thrift_spec. In python a tuple is a value type. Any attempt to make a value type self-referential will just result in a copy of that tuple rather than a true self-reference. Basically, self-referential tuples don’t really make sense in python.

The Solution

Facebook’s fbthrift contains a solution to this problem which I followed for the solution in the thrift project. There are 3 major changes,

  1. Move all thrift_spec definitions to the end of the ttypes file. Or, in the case of a service definition, after the service.
  2. Change the initial declaration of thrift_spec for a TStruct object to [<Struct>, None], (to a list which is a reference type)
  3. Use a method call fix_spec to wire up the recursive references to thrift_spec so that they contain [<Struct>, <Struct>.thrift_spec]

With these changes, Thrift now generates code that for the Recursive case above looks like this:

class Recursive(object):
    """
    Attributes:
     - Children
    """
    # ... omitted ...

all_structs.append(Recursive)
Recursive.thrift_spec = (
    None,  # 0
    (1, TType.LIST, 'Children', (TType.STRUCT, [Recursive, None], False), None, ),  # 1
)
fix_spec(all_structs)
del all_structs

And the fix_spec method recursively walks over the thrift_spec definition and re-wires the references:

def fix_spec(all_structs):
    for s in all_structs:
        spec = s.thrift_spec
        for t in spec:
            if t is None:
                continue
            elif t[1] == TType.STRUCT:
                t[3][1] = t[3][0].thrift_spec
            elif t[1] in (TType.LIST, TType.SET):
                _fix_list_or_set(t[3])
            elif t[1] == TType.MAP:
                _fix_map(t[3])


def _fix_list_or_set(element_type):
    if element_type[0] == TType.STRUCT:
        element_type[1][1] = element_type[1][0].thrift_spec
    elif element_type[0] in (TType.LIST, TType.SET):
        _fix_list_or_set(element_type[1])
    elif element_type[0] == TType.MAP:
        _fix_map(element_type[1])


def _fix_map(element_type):
    if element_type[0] == TType.STRUCT:
        element_type[1][1] = element_type[1][0].thrift_spec
    elif element_type[0] in (TType.LIST, TType.SET):
        _fix_list_or_set(element_type[1])
    elif element_type[0] == TType.MAP:
        _fix_map(element_type[1])

    if element_type[2] == TType.STRUCT:
        element_type[3][1] = element_type[3][0].thrift_spec
    elif element_type[2] in (TType.LIST, TType.SET):
        _fix_list_or_set(element_type[3])
    elif element_type[2] == TType.MAP:
        _fix_map(element_type[3])

A bit ugly, but it should get the job done. There is a pull request for this change up at github.



comments powered by Disqus