Personal Programming Notes

To err is human; to debug, divine.

Convert Python Objects to JSON

In this post, we look into converting a plain, simple Python object into JSON. JSON serialization in Java is also shown for comparison. In the next post, we will look into a more advanced method of conversion in which attributes are pretty-printed in order, as in the Java example.

JSON serialization in Java

In Java, it is pretty straightforward to convert Java objects (POJOs) to JSON using the Jackson library. The following code converts an example POJO to JSON:

Example POJO
public class Config {
  public String type;
  public String host;
  public String user;
  public String password;
  public String url;
}

Jackson examples
ObjectMapper mapper = new ObjectMapper();
Config conn = new Config();
conn.type = "hive";
conn.host = "192.168.5.184";
conn.user = "cloudera";
conn.password = "password";
conn.url = "jdbc:hive2://192.168.5.184:10000/DWH";

// POJO to JSON in file
mapper.writeValue(new File("config.json"), conn);
// POJO to JSON in String
String jsonInString = mapper.writerWithDefaultPrettyPrinter()
      .writeValueAsString(conn);

The JSON output is shown below. Note that the keys (e.g., “type”, “host”) appear in the same order as defined in the Config class. This will become important later when we try to convert Python objects to JSON.

JSON representation of Config object
{
  "type" : "hive",
  "host" : "192.168.5.184",
  "user" : "cloudera",
  "password" : "password",
  "url" : "jdbc:hive2://192.168.5.184:10000/DWH"
}

JSON serialization in Python

In Python, the json module converts a serializable object to JSON format. A first attempt at JSON serialization in Python may look like this, with a slightly complex Python object intentionally used as an example:

First attempt at JSON serialization
import json


class Config(object):
    pass


def get_hive_config():
    """ Get pre-defined Hive configuration.

    :return: Config object for Hive.
    """

    conn = Config()
    conn.type = "hive"
    conn.host = "192.168.5.184"
    conn.user = "cloudera"
    conn.password = "password"
    conn.url = "jdbc:hive2://192.168.5.184:10000/DWH"

    return conn


def get_vertica_config():
    """ Get pre-defined Vertica configuration.

    :return: Config object for Vertica.
    """

    conn = Config()
    conn.type = "vertica"
    conn.host = "192.168.5.174"
    conn.user = "dbadmin"
    conn.password = "password"
    conn.url = "jdbc:vertica://192.168.5.174:5433/VMart"

    return conn


def create_config_file(filename, query_generator):

    hive_source = get_hive_config()
    vertica_target = get_vertica_config()

    config = Config()
    config.source = hive_source
    config.target = vertica_target
    config.testName = "count"
    config.queries = query_generator

    with open(filename, 'w') as config_file:
        json.dump(config, config_file)


def main():

    FILE_NAME = "hive_vertica_count.json"
    # generate_count_queries() is not shown in this post.
    query_generator = generate_count_queries()
    create_config_file(FILE_NAME, query_generator)

This first attempt with json.dump(config, config_file) will fail with the following error:

JSON serialization error
TypeError: <__main__.Config object at 0x10ab824d0> is not JSON serializable

As the message indicates, the Config object is not JSON serializable. The json.dump function expects a serializable object: one of the standard Python types listed in the Python to JSON mapping table below, or a subclass of one of them.

Python              JSON
dict                object
list, tuple         array
str, unicode        string
int, long, float    number
True                true
False               false
None                null
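
The mapping in the table can be checked directly with json.dumps. The following is a minimal sketch: the record dictionary and the Plain class are made up for illustration and are not part of the original example.

```python
import json

# A made-up record built only from types the json module understands.
record = {
    "name": "hive",
    "ports": (10000, 10001),   # a tuple becomes a JSON array
    "ratio": 0.5,
    "enabled": True,
    "comment": None,
}

# sort_keys=True makes the key order of the output predictable.
encoded = json.dumps(record, sort_keys=True)
print(encoded)


class Plain(object):
    """Hypothetical class with no JSON support, used only to trigger the error."""
    pass


try:
    json.dumps(Plain())
    failed = False
except TypeError as exc:
    # A plain object is not in the mapping table, so the encoder rejects it.
    failed = True
    print("TypeError: %s" % exc)
```

The exact wording of the TypeError message differs between Python versions, but the cause is the same: the object is not one of the types in the table.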


The solution is to pass, as the default parameter, a function that returns the object’s __dict__ attribute. __dict__ is the internal dictionary that holds the attributes set on an object. References to those attributes are translated to lookups in this dictionary, e.g., o.x is translated to o.__dict__["x"].
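
The relationship between attributes and __dict__ can be seen in a few lines. This is a small sketch that reuses the Config class name from this post; the attribute values are only examples.

```python
class Config(object):
    pass


conn = Config()
conn.type = "hive"
conn.host = "192.168.5.184"

# vars(obj) returns the same dictionary object as obj.__dict__.
attrs = vars(conn)
print(attrs is conn.__dict__)

# Reading an instance attribute gives the same value as the dictionary lookup.
print(conn.host == conn.__dict__["host"])

# Writing through the dictionary is visible as a normal attribute.
conn.__dict__["user"] = "cloudera"
print(conn.user)
```

Note that this holds for attributes set on the instance itself; attributes defined on the class are found through the class rather than the instance dictionary.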

Correct options
with open(filename, 'w') as config_file:
    json.dump(config, config_file, default=vars, indent=4)

Pretty print without ordering
{
    "source": {
        "url": "jdbc:hive2://192.168.5.184:10000/DWH",
        "host": "192.168.5.184",
        "password": "password",
        "type": "hive",
        "user": "cloudera"
    },
    "queries": "...",
    "target": {
        "url": "jdbc:vertica://192.168.5.174:5433/VMart",
        "host": "192.168.5.174",
        "password": "password",
        "type": "vertica",
        "user": "dbadmin"
    },
    "testName": "count"
}
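
To see why default=vars is needed for the nested source and target objects, here is a minimal self-contained sketch. It uses a reduced version of the configuration from this post; the make_config helper is hypothetical, and json.dumps is used instead of json.dump so that no file is written.

```python
import json


class Config(object):
    pass


def make_config():
    """Build a small nested Config, mirroring the shape used in this post."""
    source = Config()
    source.type = "hive"
    source.host = "192.168.5.184"

    config = Config()
    config.source = source        # nested Config object
    config.testName = "count"
    return config


config = make_config()

# vars(config) is a dict, but its "source" value is still a Config object,
# so the encoder fails when it reaches that value.
try:
    json.dumps(vars(config))
    top_level_only_failed = False
except TypeError:
    top_level_only_failed = True

# With default=vars, the encoder calls vars() on every object it cannot
# serialize, so the nested Config object is converted as well.
text = json.dumps(config, default=vars, indent=4)
print(text)
```

The key order of the printed output depends on the Python version, which is why the sketch makes no claim about it.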

Here, we use the built-in vars function to retrieve the object’s __dict__ attribute. Note that simply calling json.dump(vars(config), config_file) will NOT work if any attribute of the object is itself a complex object (e.g., the source and target attributes in this example). For more complex objects, such as those that include sets, we may have to define our own encoder that extends json.JSONEncoder and pass it to the json.dump function. The next post will discuss how to print keys in the order in which they are defined, as in the Java example.
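
As a rough illustration of the custom encoder mentioned above, the sketch below extends json.JSONEncoder to handle sets. The ConfigEncoder name and the tables attribute are hypothetical, and converting a set to a sorted list is one possible choice that assumes the set elements can be compared with each other.

```python
import json


class Config(object):
    pass


class ConfigEncoder(json.JSONEncoder):
    """Hypothetical encoder: sets become sorted lists, other objects use __dict__."""

    def default(self, o):
        if isinstance(o, (set, frozenset)):
            # JSON has no set type, so emit a list with a stable order.
            return sorted(o)
        if hasattr(o, "__dict__"):
            return vars(o)
        # Let the base class raise TypeError for anything else.
        return json.JSONEncoder.default(self, o)


config = Config()
config.testName = "count"
config.tables = {"orders", "customers"}   # made-up attribute holding a set

text = json.dumps(config, cls=ConfigEncoder, sort_keys=True)
print(text)
# prints {"tables": ["customers", "orders"], "testName": "count"}
```

The same encoder can be passed to json.dump through the cls parameter when writing to a file.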