
Mike Sugarbaker

Why AI audio is a different ballgame

5 min read

(Sorry I'm late.)

In case you missed it, the latest thing we're calling AI (or Machine Learning or whatever) is ethically very problematic! A few professional illustrators have brought a legal case after having their hard-won, marketable styles straight-up ganked by text-to-image processors. Text, the second most discoursed-about ML-generation medium, is criticized less often on intellectual-property grounds than for its tendency to present statistically likely nonsense as authoritative-sounding truth, but I still worry about its contribution to AI's general brand as something that rips off artists and creators who most often weren't even given the opportunity to opt out.

Why would I be worried personally? Because I'm enchanted by the sounds of Dance Diffusion, an application of the Stable Diffusion method to the generation of audio. Like its image-creating sibling, Dance Diffusion starts either from noise - fully random data - or from some starting data with noise transparently overlaid. It then "denoises" the data toward what its model says is likely. As with image generators, giving the model some starting data can result in what's known as "style transfer" - rendering the preexisting content in a style closer to the model, while keeping the fundamentals of the starting point intact. This is, for the most part, how I've been using DD - to do things like ask a model trained on nothing but drum solos to transform a clip from a rap vocal, and the like. (Models that do text-to-sound generation, that is, creation of audio from written prompts, are beginning to emerge as of this writing but are still fairly limited.)
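If it helps to see the shape of that noise-then-denoise loop, here is a toy sketch in Python. It is not Dance Diffusion's code or API: the function names are made up for illustration, the "model" is just a fixed target list standing in for a trained network, and tying the number of denoising steps to the noise strength is my assumption about how this kind of tool usually behaves.

```python
import random


def add_noise(signal, strength, rng):
    """Blend a signal with random noise: strength 0 keeps it intact, 1 is pure noise."""
    return [(1 - strength) * s + strength * rng.uniform(-1, 1) for s in signal]


def toy_denoise_step(samples, model_target, rate=0.1):
    """Stand-in for a trained model: nudge each sample toward what the 'model' finds likely."""
    return [x + rate * (t - x) for x, t in zip(samples, model_target)]


def style_transfer(start, model_target, noise_strength=0.5, total_steps=20, seed=0):
    """Overlay noise on the starting audio, then denoise toward the model.

    Assumption: more noise means more denoising steps, so a light touch
    stays close to the starting clip and a heavy one lands near the model.
    """
    rng = random.Random(seed)
    samples = add_noise(start, noise_strength, rng)
    for _ in range(round(total_steps * noise_strength)):
        samples = toy_denoise_step(samples, model_target)
    return samples


if __name__ == "__main__":
    clip = [0.0] * 8          # hypothetical starting clip
    drums = [1.0] * 8         # hypothetical "what the model thinks is likely"
    print(style_transfer(clip, drums, noise_strength=0.3))
    print(style_transfer(clip, drums, noise_strength=1.0))
```

With a low `noise_strength` the output stays near the starting clip; at 1.0 the starting clip is thrown away entirely and you get generation from pure noise.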

I have mostly done my style transfers with pre-made models trained on relatively small sets of recordings. My favorite results have come from a model called "unlocked-250k," which gets its first name from its training data, the Unlocked Recordings collection on the Internet Archive. This collection, despite what one might assume from its presence on IA for free download, is mostly under copyright, and not under any sort of unusually permissive license. (None of it is "in print" or otherwise commercially available.) So why are these recordings here, in this model that's in the default model selection for this tool? How does it keep not occurring to nerds that this is a problem?

But here's the thing: when it comes to music, we already have the tools to deal with this situation. They're called ASCAP and BMI. In fact, these tools were created in response to a nearly identical problem: technological changes to the way musicians' IP is distributed, which made said distribution much more indirect and, uh, diffuse.

I doubt these institutions are perfect - I'm certainly not at a point in my music career where I need to know lots about them. (Also I'm completely eliding the issue of voice actors, for whom style transfer is already becoming a threat - but they do have a union!) But I bet they could be talked into handling artists' contributions to the statistical probability of a piece of music's direction and form as it evolves out of noise. Those contributions are individually small, but we generally know exactly what group of artists ought to be getting them, for which recordings. And in the case of very composed starting data, like the hummed basslines and melody fragments I've been making so I can transform them into weird blurts of orchestra, the songwriting is not actually in the picture in the final product (there's some mechanism that gets royalties to arrangers and producers, but I don't know that it's one of the same ones). So I'd expect that what users of audio AIs end up paying, as royalties or fees, wouldn't be as large as if they had sampled something outright. Everybody wins!

I can think of a number of things to quibble back and forth about in such an arrangement (what about consent? You don't get to opt out of having your song played on the radio; is this similar?), but the point is this sort of problem can be understood and handled, and appropriate institutions can be created to attempt to deal with it. Is it conceivable that visual artists could respond to AI by forming a similar layer of institutions? I doubt they have the muscle, plus maybe there is too much work-for-hire happening in illustration, compared to recorded music, to make that approach make sense. But it all puts me in mind of Elinor Ostrom's work debunking (before it was published??!?!) the so-called tragedy of the commons. Everything stays complicated about human beings working together, but invested people with a commitment to each other can work out creative solutions. That gives us an alternative to the simple, absolute cancellation of an entire, fascinating line of research and activity. I hope we take it, in whatever form.

Mike Sugarbaker

Oh hey, it's getting toward time for my annual blog post

Mike Sugarbaker

What is web abolition?

6 min read

Let's do our part for internet content by starting with someone else's Twitter thread:


For several months now I've been having these periodic waves of compulsion to build a web site. I don't even know exactly what web site. Something nostalgic, just for fun; something that helps people create a world made of writing, like the weird, failed alt-wikipedias that obsessed me in my early twenties. Something game-like, maybe... or just a literal game? I've started to cut code for a few of these ideas (I'll still write code, as it turns out, just not for anyone else), but more often I've ruled them out or lost steam before even getting started. Any single one of them would fail to be what I want, which is a weird fever-dream amalgam of all of them.

Hopefully this has all been an extinction burst. The truth is, the web's done. It's best considered legacy now, all of it. This is the oldest of news to zoomers, who've never seen it as a virtuality to explore, but as the place where most forms of bureaucracy live (plus there's a Facebook interface there, I guess, but you'd only use it if you had to). The future almost certainly lies in networks that are roughly monkeysphere-sized, made not for the whole world but for a semi-gated group of people, who are probably connected by some context that isn't solely an online one; Slack or Discord is the paradigm, although given recent events, it sure seems worth looking at open source alternatives.

For a lot longer than several months, I've been involved with efforts to look past the web infrastructure we've got, towards something more useful and less costly and harmful. For a while, this meant the so-called IndieWeb, a collection of efforts to get the things we like about social media back into independent blogs. Then it meant the more radical fixes of the peer-to-peer web. Recently I've had my eye out for entirely new protocols - fresh takes on something like gopher, maybe with some of the interactivity of HTTP, and hopefully some of the writeability that was supposed to be a part of browsers from the beginning. That would be awesome, I've been thinking. Sort of.

And lo, here is Gemini, a new protocol which is almost those things. It was created by a few coders who were still hanging out on Gopher, which alone pleases my soul and gives me that feeling of good-hearted willful obscurity that goes with a lot of my favorite subcultures over the course of my life. It cuts a lot of features out of the web, which is a good and productive way to proceed, but writeability and interactivity are among them, sacrificed for user privacy. (They think the fact that you can send server queries that are rich enough to let you edit the web without editing HTML is inherently tied to everything that's gone wrong! What if they're right? What are my weird, dying dreams of textscapes in that case?) Gemini is like the IndieWeb, and like Beaker turned out to be, in that it doesn't offer anything that an average user, who seems pretty unmoved by privacy and such if their continued enthusiasm for Facebook is any hint, will actually experience as better - as motivation to switch. But the virtuality and explorability of Gemini are delightful, if not strictly necessary anymore. (I recommend Lagrange.)

So what is a movement toward realizing the web as decaying infrastructure, something whose main verb is "crumble," not "connect"? I envision something like a Matrix client with a web browser bolted onto its side, the way that early versions of Netscape also supported gopher. That web browser would include JavaScript blocking and image blocking by default, sandbox all domains from each other's cookies, and possibly make use of gateway servers to further insulate the user. I'm sure that doesn't cover everything, but you get the idea.
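To make the "sandbox all domains from each other's cookies" part a little more concrete, here is a rough Python sketch of one way it could work. Everything here is hypothetical: `SandboxedCookieStore`, `DEFAULT_POLICY`, and `should_load` are names I made up, and keying the jars on the hostname of the page you're visiting is just one plausible reading of the idea, not how any existing browser does it.

```python
from urllib.parse import urlsplit

# Hypothetical defaults for the bolted-on browser: block scripts and images
# unless the user says otherwise.
DEFAULT_POLICY = {"javascript": False, "images": False}


def should_load(resource_kind, policy=DEFAULT_POLICY):
    """Return True if this kind of resource may load; unlisted kinds are allowed."""
    return policy.get(resource_kind, True)


class SandboxedCookieStore:
    """Keeps a separate cookie jar per visited site, so sites can't see each other's cookies."""

    def __init__(self):
        self._jars = {}

    @staticmethod
    def _site(page_url):
        # Assumption: the jar key is the hostname of the page being visited.
        return (urlsplit(page_url).hostname or "").lower()

    def set_cookie(self, page_url, name, value):
        self._jars.setdefault(self._site(page_url), {})[name] = value

    def cookies_for(self, page_url):
        # Return a copy so callers can't reach into another site's jar.
        return dict(self._jars.get(self._site(page_url), {}))


if __name__ == "__main__":
    store = SandboxedCookieStore()
    store.set_cookie("https://blog.example/post", "session", "abc123")
    print(store.cookies_for("https://blog.example/other"))
    print(store.cookies_for("https://tracker.example/pixel"))
    print(should_load("javascript"))
```

A real client would need to deal with subdomains, expiry, and embedded third-party requests, which this sketch ignores on purpose.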

At this late date, we have the gift of knowing what people want from the web, so we could make our client a lot less generic, with features that support RSS, photo feeds, maybe even some IndieWeb-style distributed social stuff. Or what the hell, bolt a Facebook client on there too! Their terms of service don't forbid this from what I can tell (Twitter's do, but Twitter should be abandoned, not embraced and extended). All to the point of positioning the web as something that isn't the center, for those few of us left who need help with that.

I admit I've also been trying idly to think of ways to augment places like Slack and Discord; something that a virtual world, visual or textual, can offer these groups of people... but most likely, my nerd scenes only ever valued such virtualities because we lacked the real-world context these new chat spaces have. Once your community is real and online both, the only adventurous journey that motivates you is... unionizing? Or in the case of non-corporate Slacks, maybe abolishing cops and landlords. If you want something to explore, look out on the streets.

So there's your nice tall glass of goodbye to all that. Don't get me wrong, the web isn't going away or anything. We're stuck with it, just like it itself is stuck with HTML 5 and JavaScript, 25-year-old technologies that were largely intended for other uses (thanks to the Twitter widget above, I have had to add <P> tags to this post manually). Where else are Google results going to come from but a hundred million blogs and forums that just won't quite die? Nothing dies on the internet. Not even Gopher died. If the Internet Archive were a company, I'd want to invest in it;* it's the only operation that feels relevant to the future of "new media," and it's specifically all about its past.

(A fun inside-baseball place to go from this conclusion: when do we short Google? Not immediately, for sure, but it's starting to look a little bit dead-man-walking, like how Yahoo looked ten years ago. Its only hope as a company is either to divest from search almost all the way, or else to really double down on owning the web even more than it does now. Hilariously, one of the best ways to do the latter would be to give direct financial and rhetorical support to the various IndieWeb initiatives. The fact that this will never occur to them would be the most damning evidence that they're in their decline as a company, if it weren't eclipsed by their massive failure to deal on any level with diversity and the general footgun shootout of their corporate culture.)

* You can invest in the sense of donating, which I very much encourage, if you still have the means after donating to things that are much more humanly material.

Mike Sugarbaker

- I don't have anything to say to you people
- I have learned a great deal about the limitations of jokes as a way of relating to strangers but I'm still leaving the above
- I'm probably an asymptomatic carrier
- I still haven't figured out what I'm going to do besides programming
- If I spent as much time playing Dreams as I do watching Dreams streamers I'd be as good at Dreams as I wish I was
- Defund the police
- Abolish the web
- I'm kidding about that last one
- Sort of

Mike Sugarbaker

Some bullet points:

- Still not dead
- Playing Dreams pretty hard. Come check me at https://twitter.com/WoodsenseStudio
- Not a programmer anymore
- Maybe writing a book about roleplaying? We'll see

Mike Sugarbaker

Not dead, just stopped tweeting from here, then stopped tweeting entirely. I own my own Mastodon instance (theha.us) so I’m not going to bother with syndicating to it from here.

If/when I decide what book I’m writing, I’ll post about it here. Or if I make that giant, crazy web site.

Mike Sugarbaker

A new illustrated podcast from the creators of Children's Hour of Knowledge. https://www.youtube.com/watch?v=q_dV3w9ehbU

Mike Sugarbaker

Slow week for Danny - refactoring, probly lots of bugs - but you can use Markdown now! github.com/misuba/Danny dat://danny-tutorial-misuba.hashbase.io/

Mike Sugarbaker

Things are still 0.1 as foretold but here is a wee adventure to get you started: dat://danny-tutorial-misuba.hashbase.io/

Mike Sugarbaker

I dropped a couple of hints about this, but here's v0.1 of my Beaker app, a contemporary take on HyperCard. https://github.com/misuba/Danny