One of my coworkers recently asked me if there is a way to import .ass files back into Houdini after you've exported them, similar to what you can do with .usd and Solaris nowadays. We have a large unmanaged geo archive, and during some digging they stumbled upon a library of objects which were only exported as .ass files. At first glance there don't seem to be any native tools for converting/reimporting .ass files, but the files are saved as ASCII, so it can't be that hard to read them back, right?
examining the file
The .ass file is composed of smaller sections for different parts of the scene, e.g. the camera, lights, shaders and geometry. I'm purely focusing on importing geometry data, specifically the polymesh data type.
As a test I started by exporting a box, and this is what the .ass export spat out:
polymesh
{
name /obj/box/polygons
matrix
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
motion_end 0
id 13317002
nsides 6 1 UINT
4 4 4 4 4 4
vidxs 24 1 b85UINT
B$$-3*%<i&:%s\56%X%l.%WVW2$?cW2
vlist 8 1 b85VECTOR
89+]caDq9989+]caDq99aDq99!89+]c$$$$(aDq9989+]c89+]c!aDq99$$$$'89+]c!aDq99$$$$'89+]caDq9989+]c89+]caDq99
smoothing on
declare _obj_path constant STRING
_obj_path "/obj/box"
}
All the necessary mesh data is neatly organized into individual lines - we've got the mesh name and path, the transformation matrix, the ID, and the stuff we are looking for: the polygon & vertex information!
- nsides - lists how many vertices each polygon in the mesh has
- vidxs - assigns each vertex to its position stored in the vlist array
- vlist - array of positions per point
Each of these array entries is stored with the following syntax:
name <num_elements> <num_motionblur_keys> <data_type>
<elem1> <elem2> <elem3> <elem4> ...
A couple of notable things we can see in how these array attributes are stored:
- the vidxs and vlist arrays have a different number of elements - the length of vlist corresponds to the number of points, while the length of vidxs corresponds to the number of vertices.
- the smaller bits of information, like nsides, are stored in human-readable UINT format, while the longer arrays are encoded - this is marked by the b85 prefix on their data type.
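As a rough sketch of how these header lines could be picked apart (a throwaway helper of my own, not part of any final code - it just splits a header into its four fields):

import re

#Hypothetical helper: matches "name <num_elements> <num_keys> <data_type>"
#with an optional b85 prefix on the data type
ARRAY_HEADER = re.compile(r"^\s*(\w+)\s+(\d+)\s+(\d+)\s+(b85)?(\w+)\s*$")

def parse_array_header(line):
    m = ARRAY_HEADER.match(line)
    if not m:
        return None
    name, num_elements, num_keys, b85, data_type = m.groups()
    return {
        "name": name,
        "num_elements": int(num_elements),
        "num_keys": int(num_keys),
        "encoded": b85 is not None,
        "data_type": data_type,
    }

print(parse_array_header("vidxs 24 1 b85UINT"))
#{'name': 'vidxs', 'num_elements': 24, 'num_keys': 1, 'encoded': True, 'data_type': 'UINT'}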
After figuring all of this out, it's just a case of writing a quick Python script to pull the vertex + transformation info out and load it into the 3D software of choice (Houdini in my case). Unless…
b85 encoding
Under the .ass file binary encoding option in the official documentation:
…is used to compress large arrays (bigger than 16) containing float in their components. They are encoded into a more compact ASCII representation (b85), leading to smaller files and faster load times, while still being mostly human-readable. Also, the binary encoding has exact 32-bit precision…
b85 encoding is included in the Python standard library under the base64 module - it encodes 4 bytes of data into 5 ASCII characters. Since each of our floats is stored as 32 bits, we can decode each tuple of 5 characters as 1 float. Trying to test this very basic logic, I ran into a couple of problems right away:
- The length of the encoded data wasn't divisible by 5 in most cases
- The characters used in the encoding didn't match the encoding map of base b85 or any of its variations (ZeroMQ, Adobe Ascii85, etc.)
I'd like to point out that I have no experience with encoding data, so all of these things were really confusing to me. Even when I managed to get a result out, it was usually just NaN or numbers that didn't make any sense.
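For context, this is roughly what my naive first attempt looked like - a sketch using the standard-library decoder, which on .ass data either throws on characters outside the standard alphabet or silently returns garbage:

import base64
import struct

#Naive sketch: treat every 5 characters as one b85 block holding a 32-bit float
def naive_b85_floats(encoded):
    raw = base64.b85decode(encoded)
    return struct.unpack("<%df" % (len(raw) // 4), raw)

#Decodes without an error, but the values are nonsense - Arnold uses
#a different character-to-digit mapping than the standard alphabet
print(naive_b85_floats("%<_l4"))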
After some more digging, I got extremely lucky and found an .ass b85 decoder in Python, written as one of the pipeline tools for Anima Istanbul. Part of their code is an auto-mapper function which creates a LUT for a custom b85 encoding when given both the decoded and encoded data - plus an already-made LUT specifically for .ass files!
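I haven't reproduced their exact code here, but the auto-mapper idea boils down to something like this simplified sketch of my own: take bytes you already know, line them up against their encoded form, and record which character stands for which base-85 digit:

#Sketch of the auto-mapper idea: given raw bytes and their known .ass
#b85 encoding, recover the character -> digit mapping block by block
def build_b85_lut(raw_bytes, encoded):
    char_to_int = {}
    for block in range(len(encoded) // 5):
        #Each 5-character block encodes one 32-bit little-endian integer
        value = int.from_bytes(raw_bytes[block*4:(block+1)*4], "little")
        for j in range(5):
            digit = value // 85**(4 - j) % 85
            char_to_int[encoded[block*5 + j]] = digit
    return char_to_int

#Feeding it a few known pairs gradually fills out the full 85-entry table
print(build_b85_lut(bytes([4, 4, 4, 4]), "%<_l4"))
#{'%': 1, '<': 24, '_': 59, 'l': 72, '4': 16}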
First draft for getting some readable UINT8 data:
import numpy as np

LUTS = {
    "arnold": {
        "byte_order": "<",
        "expansion_char": "!",
        "special_values": {"$$$$$": "z", "8Fcb9": "y"},
        "char_to_int": {"%": 1, "$": 0, "'": 3, "&": 2, ")": 5, "(": 4, "+": 7, "*": 6, "-": 9, ",": 8, "/": 11, ".": 10, "1": 13, "0": 12, "3": 15, "2": 14, "5": 17, "4": 16, "7": 19, "6": 18, "9": 21, "8": 20, ";": 23, ":": 22, "=": 25, "<": 24, "?": 27, ">": 26, "A": 29, "@": 28, "C": 31, "B": 30, "E": 33, "D": 32, "G": 35, "F": 34, "I": 37, "H": 36, "K": 39, "J": 38, "M": 41, "L": 40, "O": 43, "N": 42, "Q": 45, "P": 44, "S": 47, "R": 46, "U": 49, "T": 48, "W": 51, "V": 50, "Y": 53, "X": 52, "[": 55, "Z": 54, "]": 57, "\\": 56, "_": 59, "^": 58, "a": 61, "`": 60, "c": 63, "b": 62, "e": 65, "d": 64, "g": 67, "f": 66, "i": 69, "h": 68, "k": 71, "j": 70, "m": 73, "l": 72, "o": 75, "n": 74, "q": 77, "p": 76, "s": 79, "r": 78, "u": 81, "t": 80, "w": 83, "v": 82, "x": 84},
        "int_to_char": ["$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x"]
    }
}
#Decodes a 5-character block of data
def __b85_decode_block(data, lut, i=0):
    int_sum = 52200625 * lut[data[i]] + \
              614125 * lut[data[i+1]] + \
              7225 * lut[data[i+2]] + \
              85 * lut[data[i+3]] + \
              lut[data[i+4]]
    return int_sum

def __b85_decode(data, lut):
    out = bytearray()
    #Process data in 5-character blocks
    for i in range(0, len(data), 5):
        int_sum = __b85_decode_block(data, lut, i)
        out.extend(int_sum.to_bytes(4, 'little'))
    return out

def test():
    #Test Data - 12x4
    data = "%<_l4%<_l4%<_l4"
    lut = LUTS['arnold']['char_to_int']
    bdata = __b85_decode(data, lut)
    data = np.frombuffer(bytes(bdata), dtype="<u1")
    print(data)

if __name__ == '__main__':
    test()

Note the expansion_char and special_values entries in the LUT - they pointed me towards 2 common modifications to b85 encoding that save space:
- special_values -> zero/space compression, where a single ASCII character is used to represent a commonly found value, like spaces or zeroes - in .ass files these are 0.0 ($$$$$) and 1.0 (8Fcb9)
- expansion_char -> expansion characters are used as a 'for loop' which duplicates repeating values - say your mesh has half of its UVs set to 0.0; it would be very inefficient to encode all of them individually.
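A quick way to convince yourself the 1.0 mapping checks out, using the block decoder from above:

import struct

#Sanity check: the special value "8Fcb9" should decode to the float 1.0
lut = LUTS["arnold"]["char_to_int"]
int_sum = __b85_decode_block("8Fcb9", lut)
print(hex(int_sum))   #0x3f800000 - the IEEE 754 bit pattern of 1.0
print(struct.unpack("<f", int_sum.to_bytes(4, "little"))[0])   #1.0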
Looking up reasons for the non-divisible size of the encoded data also taught me about padding - when the last block doesn't contain enough characters to decode - a partial block - it gets filled with filler characters which keep the individual byte data intact - in our case the character with the highest possible value, "x": 84.
expansion character logic
I moved forward with using their __b85_decode function for all my decoding, but I quickly found out that even though it resolved the replacement of special values, the expansion character behaviour wasn't implemented or described in any way.
So I did some experimenting by exporting multiple duplicates of a quad in different orders, until an expansion character started appearing on the nsides array attribute:
nsides 11 1 b85UINT
B%<_l4%<_l4%<_l

nsides 12 1 b85UINT
B!%<_l4$$$$'

nsides 13 1 b85UINT
B!%<_l4$$$$'%<

Ignoring the B at the start (for now), the expansion character ! is followed by:
- values to copy - %<_l4 - a UINT8 array of [4 4 4 4]
- number of copies - $$$$' - 3
You can also see an example of a partial block at the end of the 13-quad export - %< (with xxx added as padding).
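You can verify the padding logic by hand with the block decoder from earlier - decode the padded partial block, then keep only the non-padding byte of the little-endian result:

lut = LUTS["arnold"]["char_to_int"]
int_sum = __b85_decode_block("%<" + "xxx", lut)   #pad "%<" out to a full block
print(hex(int_sum))   #0x406c9d5
#Only 1 of the 4 decoded bytes is real data - the last one in little-endian order
print(int_sum.to_bytes(4, "little")[-1])   #4 - the thirteenth nsides value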
After including all the necessary exceptions, this is the new __b85_decode() function:
def __b85_decode(data, lut, special_values=None, exp_char=None):
    #SPECIAL VALUES - swap the single-character shorthands back to full blocks
    if special_values:
        for key in special_values.keys():
            data = data.replace(special_values[key], key)
    #EXPANSION - duplicate the 5-character block following the expansion character
    if exp_char and exp_char in data:
        i = 0
        while i < len(data):
            if data[i] == exp_char:
                to_copy = data[i+1:i+6]
                copy_val = data[i+6:i+11]
                copy_num = __b85_decode_block(copy_val, lut)
                data = data[:i] + "".join([to_copy for x in range(copy_num)]) + data[i+11:]
                i += 5*copy_num
            else:
                i += 1
    #PADDING - fill a partial block at the end with the highest character, "x"
    mod = len(data) % 5
    pad = 0
    if mod > 0:
        pad = 5 - mod
        data += "".ljust(pad, "x")
    #CHECK VALID
    for x in data:
        if x not in lut.keys():
            print("Unknown key %s" % x)
    out = bytearray()
    for i in range(0, len(data), 5):
        int_sum = __b85_decode_block(data, lut, i)
        out.extend(int_sum.to_bytes(4, 'little'))
    #Skip filler bytes and reverse the padded bit
    if mod > 0:
        out = out[:-4] + out[-4+pad:]
    return out
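To check it against the exports from earlier, decoding the 12-quad nsides payload (with the leading B stripped - more on that prefix in a second) gives the expected twelve 4s:

lut = LUTS["arnold"]
bdata = __b85_decode("!%<_l4$$$$'", lut["char_to_int"],
                     lut["special_values"], lut["expansion_char"])
print(np.frombuffer(bytes(bdata), dtype="<u1"))   #[4 4 4 4 4 4 4 4 4 4 4 4]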
so about 32-bit precision…

After exporting various meshes and just straight-up staring at the encoded data Matrix-style, I noticed a repeating pattern - all the encoded data arrays were prefixed by a single letter: B, F, C or G. Removing this prefix started giving me the right values for data prefixed with B - all the other ones were still outputting random values. After examining the output binary data and playing around with duplicating a grid about 1000 times, I found out that Arnold doesn't actually store all INTs as 32-bit - if the largest vertex index can fit into a smaller bit-size, the encoding changes accordingly:
- B - 8-bit precision
- F - 10-bit precision
- C - 16-bit precision
- G - 32-bit precision
These proved to be a little more of a challenge to implement, since all the byte data was stored in reverse (Little Endian), so it required some numpy array reversing, combining and reordering:
def numpy_bit_decode(byte_data, size):
    # 1. Convert bytes to a NumPy array of uint8
    arr = np.frombuffer(byte_data, dtype=np.uint8)
    # 2. Unpack bytes into bits (8 bits per byte)
    bits = np.unpackbits(arr)
    # 2.5. Mirror the bits (since we are converting from little-endian)
    bits = bits.reshape(-1, 8)
    bits = np.flip(bits, 1)
    bits = bits.reshape(-1)
    # 2.8. Truncate the unused end of every 32-bit chunk
    n_32chunks = len(bits) // 32
    remainder = bits[n_32chunks * 32:]
    bits = bits[:n_32chunks * 32].reshape(-1, 32)
    trunc_size = 32 // size * size
    bits = bits[:, :trunc_size]
    bits = bits.reshape(-1)
    bits = np.concatenate([bits, remainder[:trunc_size]])
    # 3. Trim bits to a multiple of -size- so we can reshape
    n_chunks = len(bits) // size
    bits = bits[:n_chunks * size].reshape(-1, size)
    # 3.5. Flip bits back
    bits = np.flip(bits, 1)
    # 4. Create a "weights" vector, e.g. [512, 256, 128, 64, 32, 16, 8, 4, 2, 1] for size 10
    powers_of_2 = 2 ** np.arange(size)[::-1]
    # 5. Matrix multiplication (dot product) converts bits to decimals
    return bits.dot(powers_of_2)
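To show how the pieces could chain together, here's a simplified sketch of a full decode for UINT arrays - PREFIX_BITS and decode_uint_array are my own names for this sketch; the extract_data call in the next snippet plays a similar role:

#Hypothetical wrapper: strip the precision prefix, run the custom b85
#decode, then unpack the result at the right bit size
PREFIX_BITS = {"B": 8, "F": 10, "C": 16, "G": 32}

def decode_uint_array(encoded):
    size = PREFIX_BITS[encoded[0]]
    lut = LUTS["arnold"]
    raw = __b85_decode(encoded[1:], lut["char_to_int"],
                       lut["special_values"], lut["expansion_char"])
    return numpy_bit_decode(bytes(raw), size)

print(decode_uint_array("B!%<_l4$$$$'"))   #[4 4 4 4 4 4 4 4 4 4 4 4]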
Finishing up

After finally getting the correct geometry information, all that was left was a bunch of wrapper functions for taking the geometry arrays, merging them into vectors when needed and importing them into Houdini through its Python API:
...
import hou

piece_geo = hou.Geometry()
pts = extract_data(mesh["vlist"])
primsides = extract_data(mesh["nsides"])
vtxids = extract_data(mesh["vidxs"])
total += len(vtxids)
#Split the flat vertex index list into one array per polygon
split_indices = np.cumsum(np.array(primsides)[:-1])
polys = np.split(np.array([int(x) for x in vtxids]), split_indices)
#Reverse the winding order of each polygon
polys = [[int(x) for x in y[::-1]] for y in polys]
piece_geo.createPoints(pts)
piece_geo.createPolygons(polys)
...

The final code parses the .ass file, looks for all polymesh primitives and loads them into Houdini as individual named packed primitives. You can find it here, if you ever need to reimport some .ass files (for whatever reason).
Once you have the above-linked file saved in your scripts folder, it's just a case of importing it and loading the geo using a Python SOP node with one spare string parameter named file:
import ass_import
from importlib import reload
reload(ass_import)

node = hou.pwd()
geo = node.geometry()
file = node.parm("file").evalAsString()
ass_import.load_polymesh(geo, file)

*(reload is there just to force a reload if any changes are made while Houdini is open)