
    • CommentAuthorNick Hawk
    • CommentTimeFeb 1st 2015
    Here's how deep learning works.

    It's quite similar to how I imagine our brains learn. Training on abstract representations (hidden layers) seems to be what we do. At least that's my intuition.

    I have hundreds of INNS journals sitting around here, which I used to read religiously monthly. Now just dust in the wind.
    hinton just threw me off into a space i used to inhabit namely efficient representations and watermarking and video compression and the like. imagine you have an image for which a novel compression algorithm occurs to you. you wish to exploit similar pieces of the image. most if not all modern compression will simply recompress the exact same thing twice. we can do better.

    (it's called vector quantisation, actually, and the only well-known codec to use it was cinepak, put out by apple, which was the best game in town until mpeg. it was invented by a crazy talented ozzie called peter bennett; i got hold of the source code while at radius and made it heaps better. heaps.)

    so you have a number of "mosaics" with which the entire image can be described to an accuracy you deem reasonable. then the pic consists of a list of mosaic indices. and that's it (sort of). why am i even talking about this stuff? well, because of what comes next --- the best way to number the mosaics so as to reduce the total number of bits in the mosaic index list.
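    The idea above can be sketched in a few lines. This is a toy illustration, not Cinepak's actual code; the block size, the codebook, and all names are made up for the example. Each image block is matched to its nearest "mosaic" in the codebook, and the encoded picture is just the list of mosaic indices.

    ```python
    def nearest_mosaic(block, codebook):
        """Return the index of the codebook entry closest to `block` (squared error)."""
        best_i, best_err = 0, float("inf")
        for i, mosaic in enumerate(codebook):
            err = sum((a - b) ** 2 for a, b in zip(block, mosaic))
            if err < best_err:
                best_i, best_err = i, err
        return best_i

    def encode(blocks, codebook):
        """Encode a list of image blocks as a list of mosaic indices."""
        return [nearest_mosaic(b, codebook) for b in blocks]

    def decode(indices, codebook):
        """Reconstruct approximate blocks from the index list."""
        return [codebook[i] for i in indices]

    # Two 4-pixel mosaics: flat dark and flat bright.
    codebook = [(0, 0, 0, 0), (255, 255, 255, 255)]
    blocks = [(10, 0, 5, 0), (250, 255, 240, 255), (0, 0, 0, 0)]
    indices = encode(blocks, codebook)   # → [0, 1, 0]
    ```

    Note that two similar pieces of the image end up sharing one mosaic, which is exactly the "don't compress the same thing twice" win.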

    like most things, it's not hard once you know what's up. you can read books on number theory and suchlike but in the end it really comes down to plain old common sense.

    the dumbest thing you could do is assign the biggest numbers to the most frequently occurring indices. so we do the opposite. but first, do we shift the number scale so that half go negative and half positive, making the numbers on average half their magnitude? it hardly matters: you win a bit on magnitude but lose a bit for the sign. so that's dumb too.

    we assign indices (simply a pointer reassignment) in order of decreasing frequency. like a huffman code. now we're done.
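    That renumbering step can be sketched like so (function and variable names are mine, for illustration): the most frequent mosaic gets index 0, the next gets 1, and so on, so a Huffman-like variable-length code spends the fewest bits on the commonest indices.

    ```python
    from collections import Counter

    def renumber_by_frequency(indices):
        """Remap indices so the most frequent becomes 0, the next 1, etc.
        Returns the remapped list and the old->new map (the 'pointer reassignment')."""
        freq = Counter(indices)
        # Sort old indices by decreasing frequency; ties broken by old index.
        order = sorted(freq, key=lambda i: (-freq[i], i))
        mapping = {old: new for new, old in enumerate(order)}
        return [mapping[i] for i in indices], mapping

    indices = [7, 7, 7, 3, 3, 9]
    remapped, mapping = renumber_by_frequency(indices)
    # 7 occurs most often → 0, then 3 → 1, then 9 → 2
    ```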
    So _that's_ why my compressed videos look like shift.
    • CommentAuthorLakes
    • CommentTimeMay 1st 2015
    You need a 640 x 480 monitor for best viewing of those. ;)
    • CommentTimeMay 1st 2015
    Are the mosaics sent with the video like the dictionary is sent with huffman? or is it hardcoded into a codec? (judging from what you've said and what is on Wiki I assume it's not built up like LZW)
    The dictionary is dynamic in that it's updated regularly. But rather than being embedded in the compressed stream, which would worsen the compression ratio, its creation instructions are embedded instead and the decoder creates it from these metadata on the fly.
    • CommentTimeMay 1st 2015
    Ah so it's more like LZW in that respect.
    It used k-means clustering into Voronoi cells. There has been some recent work on clustering algorithms that suggests that this could be improved.
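    For the curious, here is a bare-bones sketch of k-means (plain Lloyd's algorithm) as it applies here: each cluster centre becomes a mosaic, and the Voronoi cell around it is the set of blocks that map to that index. Real codecs use faster and smarter variants; the initialisation and names below are illustrative only.

    ```python
    def kmeans(blocks, k, iters=10):
        """Cluster `blocks` into k cells; return the centres (the codebook)."""
        centres = [list(b) for b in blocks[:k]]          # naive init: first k blocks
        for _ in range(iters):
            cells = [[] for _ in range(k)]
            for b in blocks:                             # assignment step (Voronoi cells)
                i = min(range(k),
                        key=lambda c: sum((x - y) ** 2 for x, y in zip(b, centres[c])))
                cells[i].append(b)
            for i, cell in enumerate(cells):             # update step: move centre to mean
                if cell:
                    centres[i] = [sum(xs) / len(cell) for xs in zip(*cell)]
        return centres

    # Four 2-value blocks that plainly form two clusters.
    blocks = [(0, 0), (1, 1), (10, 10), (11, 11)]
    centres = kmeans(blocks, 2)   # → [[0.5, 0.5], [10.5, 10.5]]
    ```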

    My enhancements never made it into the public domain. These included about 6 dB of compression performance gain (PSNR), and a fix for the sloppy data rate that was overflowing the tiny CD buffers then in use. I was able to guarantee a data rate accurate to within +/- 1 bps.
    That'll just barely buy you a pint in the Nominal Bunch.
    Tough Crowd.
    But I'm here all week.

    Da Man, more recently.
    Sounds like Dawkins, doesn't he? (but ever so slightly less posh)
    Ah... this used to be my topic, exactly. (I studied under Alan Kawamoto at UCSC, among others.) Now I'm not sure I can get into it any more.