One of my coworkers recently asked me if there is a way to import .ass files back into Houdini after you've exported them, similar to what you can do with .usd and Solaris nowadays. We have a large unmanaged geo archive, and during some digging they stumbled upon a library of objects which were only exported as .ass files. At first glance there don't seem to be any native tools for converting/reimporting .ass files, but the files are saved as ASCII, so it can't be that hard to read them back, right?
examining the file
The .ass file is composed of smaller sections for different parts of the scene, e.g. the camera, lights, shaders and geometry. I'm purely focusing on importing geometry data, specifically the polymesh data type.
As a test I started by exporting a box, and this is what the .ass export spat out:
polymesh
{
name /obj/box/polygons
matrix
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
motion_end 0
id 13317002
nsides 6 1 UINT
4 4 4 4 4 4
vidxs 24 1 b85UINT
B$$-3*%<i&:%s\56%X%l.%WVW2$?cW2
vlist 8 1 b85VECTOR
89+]caDq9989+]caDq99aDq99!89+]c$$$$(aDq9989+]c89+]c!aDq99$$$$'89+]c!aDq99$$$$'89+]caDq9989+]c89+]caDq99
smoothing on
declare _obj_path constant STRING
_obj_path "/obj/box"
}
All the necessary mesh data is neatly organized into individual lines - we've got the mesh name and path, the transformation matrix, the ID, and the stuff we are looking for: the polygon & vertex information!
- nsides - lists how many vertices each polygon in the mesh has
- vidxs - assigns each vertex to its position stored in the vlist array
- vlist - array of positions per point
Each of these array entries is stored with the following syntax:
name <num_elements> <num_motionblur_keys> <data_type>
<elem1> <elem2> <elem3> <elem4> ...
A couple of notable things we can see in how these array attributes are stored:
- the vidxs and vlist arrays have a different number of elements - the length of vlist corresponds to the number of points, while the length of vidxs corresponds to the number of vertices.
- the smaller bits of information, like nsides, are stored in human-readable UINT format, while the longer arrays are encoded - this is marked by the b85 prefix on their data type.
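As a rough sketch of how these header lines could be picked apart (a throwaway helper of my own, not part of any final code - it just splits a header into its four fields):

import re

#Hypothetical helper: matches "name <num_elements> <num_keys> <data_type>"
#with an optional b85 prefix on the data type
ARRAY_HEADER = re.compile(r"^\s*(\w+)\s+(\d+)\s+(\d+)\s+(b85)?(\w+)\s*$")

def parse_array_header(line):
    m = ARRAY_HEADER.match(line)
    if not m:
        return None
    name, num_elements, num_keys, b85, data_type = m.groups()
    return {
        "name": name,
        "num_elements": int(num_elements),
        "num_keys": int(num_keys),
        "encoded": b85 is not None,
        "data_type": data_type,
    }

print(parse_array_header("vidxs 24 1 b85UINT"))
#{'name': 'vidxs', 'num_elements': 24, 'num_keys': 1, 'encoded': True, 'data_type': 'UINT'}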
After figuring all of this out, it's just a case of writing a quick Python script to pull the vertex + transformation info out and load it into the 3D software of choice (Houdini in my case). Unless…
b85 encoding
Under the .ass file binary encoding option in the official documentation:
…is used to compress large arrays (bigger than 16) containing float in their components. They are encoded into a more compact ASCII representation (b85), leading to smaller files and faster load times, while still being mostly human-readable. Also, the binary encoding has exact 32-bit precision…
b85 encoding is included in the Python standard library under the base64 module - it encodes 4 bytes of data into 5 ASCII characters. Since each of our floats is stored as 32 bits, we can decode each tuple of 5 characters as 1 float. Trying to test this very basic logic, I ran into a couple of problems right away:
- The length of the encoded data wasn't divisible by 5 in most cases
- The characters used in the encoding didn't match the encoding map of base b85 or any of its variations (ZeroMQ, Adobe Ascii85, etc.)
I'd like to point out that I have no experience with encoding data, so all of these things were really confusing to me. Even when I managed to get a result out, it was usually just NaN or numbers that didn't make any sense.
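For context, this is roughly what my naive first attempt looked like - a sketch using the standard-library decoder, which on .ass data either throws on characters outside the standard alphabet or silently returns garbage:

import base64
import struct

#Naive sketch: treat every 5 characters as one b85 block holding a 32-bit float
def naive_b85_floats(encoded):
    raw = base64.b85decode(encoded)
    return struct.unpack("<%df" % (len(raw) // 4), raw)

#Decodes without an error, but the values are nonsense - Arnold uses
#a different character-to-digit mapping than the standard alphabet
print(naive_b85_floats("%<_l4"))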
After some more digging, I got extremely lucky and found an .ass b85 decoder in Python, written as one of the pipeline tools for Anima Istanbul. Part of their code is an auto-mapper function which creates a LUT for a custom b85 encoding when given both the decoded and encoded data - plus an already-made LUT specifically for .ass files!
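I haven't reproduced their exact code here, but the auto-mapper idea boils down to something like this simplified sketch of my own: take bytes you already know, line them up against their encoded form, and record which character stands for which base-85 digit:

#Sketch of the auto-mapper idea: given raw bytes and their known .ass
#b85 encoding, recover the character -> digit mapping block by block
def build_b85_lut(raw_bytes, encoded):
    char_to_int = {}
    for block in range(len(encoded) // 5):
        #Each 5-character block encodes one 32-bit little-endian integer
        value = int.from_bytes(raw_bytes[block*4:(block+1)*4], "little")
        for j in range(5):
            digit = value // 85**(4 - j) % 85
            char_to_int[encoded[block*5 + j]] = digit
    return char_to_int

#Feeding it a few known pairs gradually fills out the full 85-entry table
print(build_b85_lut(bytes([4, 4, 4, 4]), "%<_l4"))
#{'%': 1, '<': 24, '_': 59, 'l': 72, '4': 16}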
First draft for getting some readable UINT8 data:
import numpy as np

LUTS = {
    "arnold": {
        "byte_order": "<",
        "expansion_char": "!",
        "special_values": {"$$$$$": "z", "8Fcb9": "y"},
        "char_to_int": {"%": 1, "$": 0, "'": 3, "&": 2, ")": 5, "(": 4, "+": 7, "*": 6, "-": 9, ",": 8, "/": 11, ".": 10, "1": 13, "0": 12, "3": 15, "2": 14, "5": 17, "4": 16, "7": 19, "6": 18, "9": 21, "8": 20, ";": 23, ":": 22, "=": 25, "<": 24, "?": 27, ">": 26, "A": 29, "@": 28, "C": 31, "B": 30, "E": 33, "D": 32, "G": 35, "F": 34, "I": 37, "H": 36, "K": 39, "J": 38, "M": 41, "L": 40, "O": 43, "N": 42, "Q": 45, "P": 44, "S": 47, "R": 46, "U": 49, "T": 48, "W": 51, "V": 50, "Y": 53, "X": 52, "[": 55, "Z": 54, "]": 57, "\\": 56, "_": 59, "^": 58, "a": 61, "`": 60, "c": 63, "b": 62, "e": 65, "d": 64, "g": 67, "f": 66, "i": 69, "h": 68, "k": 71, "j": 70, "m": 73, "l": 72, "o": 75, "n": 74, "q": 77, "p": 76, "s": 79, "r": 78, "u": 81, "t": 80, "w": 83, "v": 82, "x": 84},
        "int_to_char": ["$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x"]
    }
}
#Decodes a 5-character block of data
def __b85_decode_block(data, lut, i=0):
    int_sum = 52200625 * lut[data[i]] + \
              614125 * lut[data[i+1]] + \
              7225 * lut[data[i+2]] + \
              85 * lut[data[i+3]] + \
              lut[data[i+4]]
    return int_sum

def __b85_decode(data, lut):
    out = bytearray()
    #Process data in 5-character blocks
    for i in range(0, len(data), 5):
        int_sum = __b85_decode_block(data, lut, i)
        out.extend(int_sum.to_bytes(4, 'little'))
    return out

def test():
    #Test Data - 12x4
    data = "%<_l4%<_l4%<_l4"
    lut = LUTS['arnold']['char_to_int']
    bdata = __b85_decode(data, lut)
    data = np.frombuffer(bytes(bdata), dtype="<u1")
    print(data)

if __name__ == '__main__':
    test()

Note the expansion_char and special_values entries in the LUT - they pointed me towards 2 common modifications to b85 encoding that save space:
- special_values -> zero/space compression, where a single ASCII character is used to represent a commonly found value, like spaces or zeroes - in .ass files these are 0.0 ($$$$$) and 1.0 (8Fcb9)
- expansion_char -> expansion characters are used as a 'for loop' which duplicates repeating values - say your mesh has half of its UVs set to 0.0; it would be very inefficient to encode all of them individually.
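A quick way to convince yourself the 1.0 mapping checks out, using the block decoder from above:

import struct

#Sanity check: the special value "8Fcb9" should decode to the float 1.0
lut = LUTS["arnold"]["char_to_int"]
int_sum = __b85_decode_block("8Fcb9", lut)
print(hex(int_sum))   #0x3f800000 - the IEEE 754 bit pattern of 1.0
print(struct.unpack("<f", int_sum.to_bytes(4, "little"))[0])   #1.0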
Looking up reasons for the non-divisible size of the encoded data also taught me about padding - when the last block doesn't contain enough characters to decode - a partial block - it gets filled with filler characters which keep the individual byte data intact - in our case the character with the highest possible value, "x": 84.
expansion character logic
I moved forward with using their __b85_decode function for all my decoding, but I quickly found out that even though it resolved the replacement of special values, the expansion character behaviour wasn't implemented or described in any way.
So I did some experimenting by exporting multiple duplicates of a quad in different orders, until an expansion character started appearing on the nsides array attribute:
nsides 11 1 b85UINT
B%<_l4%<_l4%<_l

nsides 12 1 b85UINT
B!%<_l4$$$$'

nsides 13 1 b85UINT
B!%<_l4$$$$'%<

Ignoring the B at the start (for now), the expansion character ! is followed by:
- values to copy - %<_l4 - a UINT8 array of [4 4 4 4]
- number of copies - $$$$' - 3
You can also see an example of a partial block at the end of the 13-quad export - %< (with xxx added as padding).
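You can verify the padding logic by hand with the block decoder from earlier - decode the padded partial block, then keep only the non-padding byte of the little-endian result:

lut = LUTS["arnold"]["char_to_int"]
int_sum = __b85_decode_block("%<" + "xxx", lut)   #pad "%<" out to a full block
print(hex(int_sum))   #0x406c9d5
#Only 1 of the 4 decoded bytes is real data - the last one in little-endian order
print(int_sum.to_bytes(4, "little")[-1])   #4 - the thirteenth nsides value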
After including all the necessary exceptions, this is the new __b85_decode() function:
def __b85_decode(data, lut, special_values=None, exp_char=None):
    #SPECIAL VALUES - swap the single-character shorthands back to full blocks
    if special_values:
        for key in special_values.keys():
            data = data.replace(special_values[key], key)
    #EXPANSION - duplicate the 5-character block following the expansion character
    if exp_char and exp_char in data:
        i = 0
        while i < len(data):
            if data[i] == exp_char:
                to_copy = data[i+1:i+6]
                copy_val = data[i+6:i+11]
                copy_num = __b85_decode_block(copy_val, lut)
                data = data[:i] + "".join([to_copy for x in range(copy_num)]) + data[i+11:]
                i += 5*copy_num
            else:
                i += 1
    #PADDING - fill a partial block at the end with the highest character, "x"
    mod = len(data) % 5
    pad = 0
    if mod > 0:
        pad = 5 - mod
        data += "".ljust(pad, "x")
    #CHECK VALID
    for x in data:
        if x not in lut.keys():
            print("Unknown key %s" % x)
    out = bytearray()
    for i in range(0, len(data), 5):
        int_sum = __b85_decode_block(data, lut, i)
        out.extend(int_sum.to_bytes(4, 'little'))
    #Skip filler bytes and reverse the padded bit
    if mod > 0:
        out = out[:-4] + out[-4+pad:]
    return out
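To check it against the exports from earlier, decoding the 12-quad nsides payload (with the leading B stripped - more on that prefix in a second) gives the expected twelve 4s:

lut = LUTS["arnold"]
bdata = __b85_decode("!%<_l4$$$$'", lut["char_to_int"],
                     lut["special_values"], lut["expansion_char"])
print(np.frombuffer(bytes(bdata), dtype="<u1"))   #[4 4 4 4 4 4 4 4 4 4 4 4]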
so about 32-bit precision…

After exporting various meshes and just straight-up staring at the encoded data Matrix-style, I noticed a repeating pattern - all the encoded data arrays were prefixed by a single letter: B, F, C or G. Removing this prefix started giving me the right values for data prefixed with B - all the other ones were still outputting random values. After examining the output binary data and playing around with duplicating a grid about 1000 times, I found out that Arnold doesn't actually store all INTs as 32-bit - if the largest vertex index can fit into a smaller bit-size, the encoding changes accordingly:
- B - 8-bit precision
- F - 10-bit precision
- C - 16-bit precision
- G - 32-bit precision
These proved to be a little more of a challenge to implement, since all the byte data was stored in reverse (Little Endian), so it required some numpy array reversing, combining and reordering:
def numpy_bit_decode(byte_data, size):
    # 1. Convert bytes to a NumPy array of uint8
    arr = np.frombuffer(byte_data, dtype=np.uint8)
    # 2. Unpack bytes into bits (8 bits per byte)
    bits = np.unpackbits(arr)
    # 2.5. Mirror the bits (since we are converting from little-endian)
    bits = bits.reshape(-1, 8)
    bits = np.flip(bits, 1)
    bits = bits.reshape(-1)
    # 2.8. Truncate the unused end of every 32-bit chunk
    n_32chunks = len(bits) // 32
    remainder = bits[n_32chunks * 32:]
    bits = bits[:n_32chunks * 32].reshape(-1, 32)
    trunc_size = 32 // size * size
    bits = bits[:, :trunc_size]
    bits = bits.reshape(-1)
    bits = np.concatenate([bits, remainder[:trunc_size]])
    # 3. Trim bits to a multiple of -size- so we can reshape
    n_chunks = len(bits) // size
    bits = bits[:n_chunks * size].reshape(-1, size)
    # 3.5. Flip bits back
    bits = np.flip(bits, 1)
    # 4. Create a "weights" vector, e.g. [512, 256, 128, 64, 32, 16, 8, 4, 2, 1] for size 10
    powers_of_2 = 2 ** np.arange(size)[::-1]
    # 5. Matrix multiplication (dot product) converts bits to decimals
    return bits.dot(powers_of_2)
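To show how the pieces could chain together, here's a simplified sketch of a full decode for UINT arrays - PREFIX_BITS and decode_uint_array are my own names for this sketch; the extract_data call in the next snippet plays a similar role:

#Hypothetical wrapper: strip the precision prefix, run the custom b85
#decode, then unpack the result at the right bit size
PREFIX_BITS = {"B": 8, "F": 10, "C": 16, "G": 32}

def decode_uint_array(encoded):
    size = PREFIX_BITS[encoded[0]]
    lut = LUTS["arnold"]
    raw = __b85_decode(encoded[1:], lut["char_to_int"],
                       lut["special_values"], lut["expansion_char"])
    return numpy_bit_decode(bytes(raw), size)

print(decode_uint_array("B!%<_l4$$$$'"))   #[4 4 4 4 4 4 4 4 4 4 4 4]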
Finishing up

After finally getting the correct geometry information, all that was left was a bunch of wrapper functions for taking the geometry arrays, merging them into vectors when needed and importing them into Houdini through its Python API:
...
import hou

piece_geo = hou.Geometry()
pts = extract_data(mesh["vlist"])
primsides = extract_data(mesh["nsides"])
vtxids = extract_data(mesh["vidxs"])
total += len(vtxids)
#Split the flat vertex index list into one array per polygon
split_indices = np.cumsum(np.array(primsides)[:-1])
polys = np.split(np.array([int(x) for x in vtxids]), split_indices)
#Reverse the winding order of each polygon
polys = [[int(x) for x in y[::-1]] for y in polys]
piece_geo.createPoints(pts)
piece_geo.createPolygons(polys)
...

The final code parses the .ass file, looks for all polymesh primitives and loads them into Houdini as individual named packed primitives. You can find it here, if you ever need to reimport some .ass files (for whatever reason).
Once you have the above-linked file saved in your scripts folder, it's just a case of importing it and loading the geo using a Python SOP node with one spare string parameter named file:
import ass_import
from importlib import reload
reload(ass_import)

node = hou.pwd()
geo = node.geometry()
file = node.parm("file").evalAsString()
ass_import.load_polymesh(geo, file)

*(reload is there just to force a reload if any changes are made while Houdini is open)