It's pretty rude of OpenAI to make their use of your content opt-out

hiddedevries.nl Blog

OpenAI, the company that makes ChatGPT, now offers a way for websites to opt out of its crawler. By default, it will just use web content as it sees fit. How rude!

The opt-out works by adding a Disallow directive for the GPTBot User Agent in your robots.txt. The GPTBot docs say:

Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.
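For reference, the opt-out is two lines in robots.txt: a User-agent line naming GPTBot and a Disallow line. Here is a minimal JavaScript sketch that holds those two lines and checks a robots.txt body for them. The helper name disallowsEverything is made up for illustration, and the parsing is deliberately naive (no wildcard handling, no grouped User-agent lines), so treat it as a sketch rather than a real robots.txt parser.

```javascript
// The two lines that opt a whole site out of GPTBot.
const gptBotOptOut = ["User-agent: GPTBot", "Disallow: /"].join("\n");

// Hypothetical helper: true if robotsTxt has a group for exactly this
// user agent that disallows the whole site ("/").
// Simplification: only the most recent User-agent line counts as the group.
function disallowsEverything(robotsTxt, userAgent) {
  let inGroup = false;
  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      inGroup = value.toLowerCase() === userAgent.toLowerCase();
    } else if (field === "disallow" && inGroup && value === "/") {
      return true;
    }
  }
  return false;
}

console.log(disallowsEverything(gptBotOptOut, "GPTBot"));
```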

I get the goal of optimising AI models for accuracy and capabilities, but I don't see why it would be ok for these “AI” companies to just take whatever content they want. Maybe your local bakery's goal is to sell tastier croissants. Reasonable goal. Now, can they steal croissants from other companies that make tasty croissants, unless those companies opt out? I guess few people would answer ‘yes’?

Google previously got into legal trouble for their somewhat dubious practice of displaying headlines and snippets from newspapers' articles. It seems reasonable to reuse content when referring to it, at least headlines; most websites do that. Google does it with sources displayed and has links to the original. ChatGPT has neither, which makes their stealing (or reusing) especially problematic.

Taking other people's writing should be an opt-in and probably paid for (even if makers of AI don't think so). The fact that this needs to be said and isn't, say, the status quo, tells me that companies like OpenAI don't see much value in writing or writers. To deploy this software in the way they have, shows a fundamental misunderstanding of the value of arts. As someone who loves reading and writing, that concerns me. OpenAI have enormous funds that they choose to spend on things and not other things.

It is in the very nature of LLMs that very large amounts of content are needed for them to be trained. Opt-in makes that difficult, because it would mean not having a lot of the training content required for the product's functioning. Payment makes that expensive, because if it's lots of content, that means it would cost lots of money. But hey, such difficulties and costs aren't the problem of content writers. OpenAI's use of opt-out instead of opt-in unjustifiably makes it their problem.

For that reason alone, I think the only fair LLMs would be the ones trained on ‘own’ content, like a documentation site that offers a chatbot-route into its content in addition to the main affair (an approach that is still risky for numerous other reasons).

Originally posted as It's pretty rude of OpenAI to make their use of your content opt-out on Hidde's blog.


I had a great time at DEF CON 31

Christine Dodrill's Blog

Notes on using a single-person Mastodon server

Julia Evans

I started using Mastodon back in November, and it’s the Twitter alternative where I’ve been spending most of my time recently, mostly because the Fediverse is where a lot of the Linux nerds seem to be right now.

I’ve found Mastodon quite a bit more confusing than Twitter because it’s a distributed system, so here are a few technical things I’ve learned about it over the last 10 months. I’ll mostly talk about what using a single-person server has been like for me, as well as a couple of notes about the API, DMs and ActivityPub.

I might have made some mistakes, please let me know if I’ve gotten anything wrong!

what’s a mastodon instance?

First: Mastodon is a decentralized collection of independently run servers instead of One Big Server. The software is open source.

In general, if you have an account on one server (like ruby.social), you can follow people on another server (like hachyderm.io), and they can follow you.

I’m going to use the terms “Mastodon server” and “Mastodon instance” interchangeably in this post.

on choosing a Mastodon instance

These were the things I was concerned about when choosing an instance:

An instance name that I was comfortable being part of my online identity. For example, I probably wouldn’t want to be @b0rk@infosec.exchange because I’m not an infosec person.
The server’s stability. Most servers are volunteer-run, and volunteer moderation work can be exhausting – will the server really be around in a few years? For example mastodon.technology and mastodon.lol shut down.
The admins’ moderation policies.
That server’s general reputation with other servers. I started out on mastodon.social, but some servers choose to block or limit mastodon.social for various reasons
The community: every Mastodon instance has a local timeline with all posts from users on that instance, would I be interested in reading the local timeline?
Whether my account would be a burden for the admin of that server (since I have a lot of followers)

In the end, I chose to run my own mastodon server because it seemed simplest – I could pick a domain I liked, and I knew I’d definitely agree with the moderation decisions because I’d be in charge.

I’m not going to give server recommendations here, but here’s a list of the top 200 most common servers people who follow me use.

using your own domain

One big thing I wondered was – can I use my own domain (and have the username @b0rk@jvns.ca or something) but be on someone else’s Mastodon server?

The answer to this seems to be basically “no”: if you want to use your own domain on Mastodon, you need to run your own server. (you can kind of do this, but it’s more like an alias or redirect – if I used that method to direct b0rk@jvns.ca to b0rk@mastodon.social, my posts would still show up as being from b0rk@mastodon.social)

There’s also other ActivityPub software (Takahē) that supports people bringing their own domain in a first-class way.

notes on having my own server

I really wanted to have a way to use my own domain name for identity, but to share server hosting costs with other people. This isn’t possible on Mastodon right now, so I decided to set up my own server instead.

I chose to run a Mastodon server (instead of some other ActivityPub implementation) because Mastodon is the most popular one. Good managed Mastodon hosting is readily available, there are tons of options for client apps, and I know for sure that my server will work well with other people’s servers.

I use masto.host for Mastodon hosting, and it’s been great so far. I have nothing interesting to say about what it’s like to operate a Mastodon instance because I know literally nothing about it. Masto.host handles all of the server administration and Mastodon updates, and I never think about it at all.

Right now I’m on their $19/month (“Star”) plan, but it’s possible I could use a smaller plan with no problems. Right now their cheapest plan is $6/month and I expect that would be fine for someone with a smaller account.

Some things I was worried about when embarking on my own Mastodon server:

I wanted to run the server at social.jvns.ca, but I wanted my username to be b0rk@jvns.ca instead of b0rk@social.jvns.ca. To get this to work I followed the “Setting up a personal fediverse ID” directions from Jacob Kaplan-Moss and it’s been fine.
The administration burden of running my own server. I imported a small list of servers to block/defederate from but didn’t do anything else. That’s been fine.
Reply and profile visibility. This has been annoying and we’ll talk about it next
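A rough sketch of the lookup that makes a handle like b0rk@jvns.ca resolvable, assuming the standard WebFinger mechanism (RFC 7033), where other servers ask the domain in the handle about the account. The function name webfingerUrl is made up for illustration, and this only builds the URL; it says nothing about what the linked directions configure on either domain.

```javascript
// Builds the WebFinger lookup URL for a fediverse handle like "b0rk@jvns.ca".
// The lookup goes to the domain in the handle, not to wherever the server
// software actually runs.
function webfingerUrl(handle) {
  const [user, domain] = handle.replace(/^@/, "").split("@");
  if (!user || !domain) {
    throw new Error(`not a user@domain handle: ${handle}`);
  }
  const url = new URL(`https://${domain}/.well-known/webfinger`);
  url.searchParams.set("resource", `acct:${user}@${domain}`);
  return url.toString();
}

console.log(webfingerUrl("b0rk@jvns.ca"));
```

With a handle on jvns.ca and the server on social.jvns.ca, the request above lands on jvns.ca first, which is why that domain has to answer it or pass it along.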

downsides to being on a single-person server

Being on a 1-person server has some significant downsides. To understand why, you need to understand a little about how Mastodon works.

Every Mastodon server has a database of posts. Servers only have posts that they were explicitly sent by another server in their database.

Some reasons that servers might receive posts:

someone on the server follows a user
a post mentions someone on the server

As a 1-person server, my server does not receive that many posts! I only get posts from people I follow or posts that explicitly mention me in some way.
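Here is a toy model of that rule in JavaScript. It is not Mastodon's real delivery logic; it only encodes the two reasons listed above, and the handles and the serverReceives name are invented for the example.

```javascript
// Toy model: a server stores a post only if one of its users follows the
// author, or the post mentions one of its users.
function serverReceives(server, post) {
  const followsAuthor = server.users.some((u) => u.follows.includes(post.author));
  const mentionsLocal = server.users.some((u) => post.mentions.includes(u.handle));
  return followsAuthor || mentionsLocal;
}

// A 1-person server whose only user follows alice.
const tinyServer = {
  users: [{ handle: "b0rk@jvns.ca", follows: ["alice@example.social"] }],
};

const posts = [
  { author: "alice@example.social", mentions: [] }, // followed author
  { author: "bob@example.social", mentions: ["b0rk@jvns.ca"] }, // mentions local user
  { author: "carol@example.social", mentions: ["alice@example.social"] }, // reply to alice
];

const stored = posts.filter((p) => serverReceives(tinyServer, p));
console.log(stored.map((p) => p.author));
```

In this model carol's reply to alice never reaches the small server, even though alice is followed there.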

This causes several problems:

when I visit someone’s profile on Mastodon who I don’t already follow, my server will not fetch the profile’s content (it’ll fetch their profile picture, description, and pinned posts, but not any of their post history). So their profile appears as if they’ve never posted anything
bad reply visibility: when I look at the replies to somebody else’s post (even if I follow them!), I don’t see all of the replies, only the ones which have made it to my server. If you want to understand the exact rules about who can see which replies (which are quite complicated!), here’s a great deep dive by Sebastian Jambor. I think it’s possible to end up in a state where no one person can see all of the replies, including the original poster.
favourite and boost counts are inaccurate – usually posts show up having at most 1 or 2 favourites / boosts, even if the post was actually favourited or boosted hundreds of times. I think this is because it only counts favourites/boosts from people I follow.

All of these things will happen to users of any small Mastodon server, not just 1-person servers.

bad reply visibility makes conversations harder

A lot of people are on smaller servers, so when they’re participating in a conversation, they can’t see all the replies to the post.

This means that replies can get pretty repetitive because people literally cannot see each other’s replies. This is especially annoying for posts that are popular or controversial, because the person who made the post has to keep reading similar replies over and over again by people who think they’re making the point for the first time.

To get around this (as a reader), you can click “open link to post” or something in your Mastodon client, which will open up the page on the poster’s server where you can read all of the replies. It’s pretty annoying though.

As a poster, I’ve tried to reduce repetitiveness in replies by:

putting requests in my posts like “(no need to reply if you don’t remember, or if you’ve been using the command line comfortably for 15 years — this question isn’t for you :) )”
occasionally editing my posts to include very common replies
very occasionally deleting the post if it gets too out of hand

The Mastodon devs are extremely aware of these issues; there are a bunch of GitHub issues about them.

My guess is that there are technical reasons these features are difficult to add because those issues have been open for 5-7 years.

The Mastodon devs have said that they plan to improve reply fetching, but that it requires a significant amount of work.

some visibility workarounds

Some people have built workarounds for fetching profiles / replies.

Also, there are a couple of Mastodon clients which will proactively fetch replies. For iOS:

Mammoth does it automatically
Mona will fetch posts if I click “load from remote server” manually

I haven’t tried those yet though.

other downsides of running your own server: discovery is much harder

Mastodon instances have a “local timeline” where you can see everything other people on the server are posting, and a “federated timeline” which shows sort of a combined feed from everyone followed by anyone on the server. This means that you can see trending posts and get an idea of what’s going on and find people to follow. You don’t get that if you’re on a 1-person server – it’s just me talking to myself! (plus occasional interjections from my reruns bot).

Some workarounds people mentioned for this:

you can populate your federated timeline with posts from another instance by using a relay. I haven’t done this but someone else said they use FediBuzz and I might try it out.
some mastodon clients (like apparently Moshidon on Android) let you follow other instances

If anyone else on small servers has suggestions for how to make discovery easier I’d love to hear them.

account migration

When I moved to my own server from mastodon.social, I needed to run an account migration to move over my followers. First, here’s how migration works:

Account migration does not move over your posts. All of my posts stayed on my old account. This is part of why I moved to running my own server – I didn’t want to ever lose my posts a second time.
Account migration does not move over the list of people you follow/mute/block. But you can import/export that list in your Mastodon settings so it’s not a big deal. If you follow private accounts they’ll have to re-approve your follow request.
Account migration does move over your followers

The follower move was the part I was most worried about. Here’s how it turned out:

over ~24 hours, most of my followers moved to the new account
one or two servers did not get the message about the account migration for some reason, so about 2000 followers were “stuck” and didn’t migrate. I fixed this by waiting 30 days and re-running the account migration, which moved over most of the remaining followers. There’s also a tootctl command that the admin of the old instance can run to retry the migration
about 200 of my followers never migrated over, I think because they’re using ActivityPub software other than Mastodon which doesn’t support account migration. You can see the old account here

using the Mastodon API is great

One thing I love about Mastodon is – it has an API that’s MUCH easier to use than Twitter’s API. I’ve always been frustrated with how difficult it is to navigate large Twitter threads, so I made a small mastodon thread view website that lets you log into your Mastodon account. It’s pretty janky and it’s really only made for me to use, but I’ve really appreciated the ability to write my own janky software to improve my Mastodon experience.

Some notes on the Mastodon API:

You can build Mastodon client software totally on the frontend in Javascript, which is really cool.
I couldn’t find a vanilla Javascript Mastodon client, so I wrote a crappy one
API docs are here
Here’s a tiny Python script I used to list all my Mastodon followers, which also serves as a simple example of how easy using the API is.
The best documentation I could find for which OAuth scopes correspond to which API endpoints is this github issue
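As a rough idea of what listing followers looks like, here is a JavaScript sketch (this is not the Python script mentioned above). It assumes the verify_credentials and followers endpoints of the Mastodon API and pagination through the HTTP Link header; server and token are placeholders you would supply, and the helper names are made up.

```javascript
// Extracts the rel="next" URL from an HTTP Link header, or null if absent.
// Simplification: assumes the URLs themselves contain no commas.
function nextPageUrl(linkHeader) {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="next"/);
    if (match) return match[1];
  }
  return null;
}

// Lists follower handles for the account the token belongs to.
// Needs a runtime with a global fetch (browsers, Node 18+).
async function listFollowers(server, token) {
  const headers = { Authorization: `Bearer ${token}` };
  const meRes = await fetch(`${server}/api/v1/accounts/verify_credentials`, { headers });
  if (!meRes.ok) throw new Error(`request failed: ${meRes.status}`);
  const me = await meRes.json();

  let url = `${server}/api/v1/accounts/${me.id}/followers?limit=80`;
  const followers = [];
  while (url) {
    const res = await fetch(url, { headers });
    if (!res.ok) throw new Error(`request failed: ${res.status}`);
    for (const account of await res.json()) followers.push(account.acct);
    url = nextPageUrl(res.headers.get("link")); // follow pagination until done
  }
  return followers;
}
```

Usage would be something like listFollowers("https://social.example", token), where both values are hypothetical.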

Next I’ll talk about a few general things about Mastodon that confused or surprised me that aren’t specific to being on a single-person instance.

DMs are weird

The way Mastodon DMs work surprised me in a few ways:

Technically DMs are just regular posts with visibility limited to the people mentioned in the post. This means that if you accidentally mention someone in a DM (“@x is such a jerk”), it’s possible to accidentally send the message to them
DMs aren’t very private: the admins on the sending and receiving servers can technically read your DMs if they have access to the database. So they’re not appropriate for sensitive information.
Turning off DMs is weird. Personally I don’t like receiving DMs from strangers – it’s too much to keep track of and I’d prefer that people email me. On Twitter, I can just turn it off and people won’t see an option to DM me. But on Mastodon, when I turn off notifications for DMs, anyone can still “DM” me, but the message will go into a black hole and I’ll never see it. I put a note in my profile about this.
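To make the accidental-mention problem concrete, here is a small JavaScript sketch. It is a toy model: mentionedAccounts is a made-up helper with a simplified mention pattern, and it only illustrates that whoever is mentioned in a direct-visibility post ends up in its audience.

```javascript
// Hypothetical helper: pulls @user or @user@domain mentions out of post text.
// The pattern is a simplification, not Mastodon's real mention parser.
function mentionedAccounts(text) {
  return [...text.matchAll(/@(\w+(?:@[\w.-]*\w)?)/g)].map((m) => m[1]);
}

// Toy model of a DM: a post whose audience is exactly the mentioned accounts.
function directPostAudience(text) {
  return mentionedAccounts(text);
}

console.log(directPostAudience("@x is such a jerk")); // [ 'x' ]
```

So a message written about someone becomes a message sent to them as soon as it contains their handle.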

defederation and limiting

There are a couple of different ways for a server to block another Mastodon server. I haven’t really had to do this much but people talk about it a lot and I was confused about the difference, so:

A server can defederate from another server (this seems to be called suspend in the Mastodon docs). This means that nobody on a server can follow someone from the other server.
A server can limit (also known as “silence”) a user or server. This means that content from that user is only visible to that user’s followers – people can’t discover the user through retweets (aka “boosts” on Mastodon).

One thing that wasn’t obvious to me is that who servers defederate / limit is sometimes hidden, so it’s hard to suss out what’s going on if you’re considering joining a server, or trying to understand why you can’t see certain posts.

there’s no search for posts

There’s no way to search past posts you’ve read. If I see something interesting on my timeline and want to find it later, I usually can’t. (Mastodon has an Elasticsearch-based search feature, but it only allows you to search your own posts, your mentions, your favourites, and your bookmarks)

These limitations on search are intentional (and a very common source of arguments) – it’s a privacy / safety issue. Here’s a summary from Tim Bray with lots of links.

It would be personally convenient for me to be able to search more easily but I respect folks’ safety concerns so I’ll leave it at that.

My understanding is that the Mastodon devs are planning to add opt-in search for public posts relatively soon.

other ActivityPub software

We’ve been talking about Mastodon a lot, but not everyone who I follow is using Mastodon: Mastodon uses a protocol called ActivityPub to distribute messages.

Here are some examples of other software I see people talking about, in no particular order:

I’m probably missing a bunch of important ones.

what’s the difference between Mastodon and other ActivityPub software?

This confused me for a while, and I’m still not super clear on how ActivityPub works. What I’ve understood is:

ActivityPub is a protocol (you can explore how it works with blinry’s nice JSON explorer)
Mastodon servers communicate with each other (and with other ActivityPub servers) using ActivityPub
Mastodon clients communicate with their server using the Mastodon API, which is its own thing
There’s also software like GoToSocial that aims to be compatible with the Mastodon API, so that you can use a Mastodon client with it
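To illustrate the split, here is a JavaScript sketch of the same post in both shapes. It is heavily simplified: real ActivityPub activities carry ids, timestamps and signatures, and the actor URL, server and token below are placeholders. The function names are made up; the statuses endpoint is from the Mastodon API.

```javascript
// Server-to-server: a minimal ActivityPub "Create" activity wrapping a Note.
function createNoteActivity(actorUrl, content) {
  const audience = ["https://www.w3.org/ns/activitystreams#Public"];
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Create",
    actor: actorUrl,
    to: audience,
    object: { type: "Note", attributedTo: actorUrl, content, to: audience },
  };
}

// Client-to-server: the request a Mastodon client would send to its own
// server through the Mastodon API. Nothing is sent here; this only builds
// a description of the request.
function postStatusRequest(server, token, text) {
  return {
    url: `${server}/api/v1/statuses`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ status: text }),
  };
}

console.log(createNoteActivity("https://social.example/users/alice", "hello"));
console.log(postStatusRequest("https://social.example", "TOKEN", "hello"));
```

A client only ever speaks the second form; turning it into the first and delivering it to other servers is the server's job.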

more mastodon resources

Fedi.Tips seems to be a great introduction
I think you can still use FediFinder to find folks you followed on Twitter on Mastodon
I’ve been using the Ivory client on iOS, but there are lots of great clients. Elk is an alternative web client that folks seem to like.

I haven’t written here about what Mastodon culture is like because other people have done a much better job of talking about it than me, but of course it’s the biggest thing that affects your experience and it was the thing that took me longest to get a handle on. A few links:

Erin Kissane on frictions people run into when joining Mastodon
Kyle Kingsbury wrote some great moderation guidelines for woof.group (note: woof.group is an LGBTQ+ leather instance, be prepared to see lots of NSFW posts if you visit it)
Mekka Okereke writes lots of great posts about issues Black people encounter on Mastodon (though they’re all on Mastodon so it’s a little hard to navigate)

that’s all!

I don’t regret setting up a single-user server – even though it’s inconvenient, it’s important to me to have control over my social media. I think “have control over my social media” is more important to me than it is to most other people though, because I use Twitter/Mastodon a lot for work.

I am happy that I didn’t start out on a single-user server though – I think it would have made getting started on Mastodon a lot more difficult.

Mastodon is pretty rough around the edges sometimes but I’m able to have more interesting conversations about computers there than I am on Twitter (or Bluesky), so that’s where I’m staying for now.

Introducing nixexpr: Nix expressions for JavaScript

Christine Dodrill's Blog

Hero image: a picture of the tip of the Eiffel Tower facsimile in Las Vegas with a partially cloudy sky (Nikon D3300, photo by Xe Iaso)

As a regular reminder, it is a bad idea to give me ideas. Today's bad idea is brought to you by managerial nerd sniping, insomnia, and the letter "Q".

At a high level: writing complicated data structures in JavaScript kinda sucks. Here's an example of the kinds of things that I've been writing as I go down the ElasticSearch tour-de-insanite:

{
  highlight: {
    pre_tags: ['<em>'],
    post_tags: ['</em>'],
    require_field_match: false,
    fields: {
      body_content: {
        fragment_size: 200,
        number_of_fragments: 1,
      },
    },
  },
}

This works, this is perfectly valid code. It creates an object that has a few nested layers of stuff in it, but overall I just don't like how it looks. I think it looks superfluous. What if we could make it look a little bit nicer? How about something like this?

{
  highlight = {
    pre_tags = [ "<em>" ];
    post_tags = [ "</em>" ];
    require_field_match = false;
    fields.body_content.fragment_size = 200;
    fields.body_content.number_of_fragments = 1;
  };
}

This is a Nix expression. It's a data structure that looks like JSON, but you have the power of a programming language at your fingertips. Note the difference between these two parts:

{
  fields: {
    body_content: {
      fragment_size: 200,
      number_of_fragments: 1,
    },
  },
}

{
  fields.body_content.fragment_size = 200;
  fields.body_content.number_of_fragments = 1;
}

These are semantically equal, but you don't have to use so much indentation and layering. These settings are all related, so it makes sense that the way that you use them is on the same level as the way that you define them.
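To see that equivalence from the JavaScript side, here is a small sketch that expands dotted paths into nested objects. The setPath helper is made up for this example and has nothing to do with how Nix or the library does it.

```javascript
// Hypothetical helper: sets target[a][b][c] = value for a path "a.b.c",
// creating intermediate objects as needed.
function setPath(target, dottedPath, value) {
  const keys = dottedPath.split(".");
  let node = target;
  for (const key of keys.slice(0, -1)) {
    node[key] ??= {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

// The flat, Nix-style form...
const flat = {};
setPath(flat, "fields.body_content.fragment_size", 200);
setPath(flat, "fields.body_content.number_of_fragments", 1);

// ...and the nested JavaScript form describe the same data.
const nested = {
  fields: { body_content: { fragment_size: 200, number_of_fragments: 1 } },
};

console.log(JSON.stringify(flat) === JSON.stringify(nested)); // true
```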

If you want to try out this awesome power for yourself, install Nix and then add @xeserv/nixexpr to your JavaScript dependencies.

npm install --save @xeserv/nixexpr

Then you can use it like this:

import { nix } from "@xeserv/nixexpr";

const someValue = "this is a string";

const myData = nix`{
    hello = "world";
    someValue = ${someValue};
}`;

console.log(myData);

I originally wrote this in Go for my scripting automation tool named yeet, but I think it's generically useful enough to exist in its own right in JavaScript. I think that there's a lot of things that the JavaScript ecosystem can gain from Nix, and I'm excited to see what people do with this.

This was made so I could write scripts like this:

// snipped for brevity
const url = slug.push("within.website");
const hash = nix.hashURL(url);

const expr = nix.expr`{ stdenv }:

stdenv.mkDerivation {
  name = "within.website";
  src = builtins.fetchurl {
    url = ${url};
    sha256 = ${hash};
  };

  phases = "installPhase";

  installPhase = ''
    tar xf $src
    mkdir -p $out/bin
    cp web $out/bin/withinwebsite
    cp config.ts $out/config.ts
  '';
}
`;

And then I'd be able to put that Nix expression into a file. I'll get into more details about this in a future post.

<Mara> Something something this isn't best practice something something this is a hack to make dealing with a legacy deployment easier something something

How it works

This is a very cheeky library, and it's all powered by one of the most fun to abuse Nix functions ever: builtins.fromJSON. This function takes a string and turns it into a Nix value at the interpreter level and it's part of the callpath for turning a string into an integer in Nix. It's an amazingly powerful function in its own right, but it gets even more fun when we bring JavaScript into the mix.

Any JavaScript data value (simple objects, strings, numbers, etc) can be formatted as JSON with the JSON.stringify function:

> JSON.stringify({"hi": "there"})
'{"hi":"there"}'

This includes strings. So if we use JSON.stringify to convert a value to a JSON string, then string-encode the result again, we can inject arbitrary JavaScript values into Nix expressions:

let formattedValue = `(builtins.fromJSON ${
  JSON.stringify(JSON.stringify(value))
})`;

The most horrifying part about this hack is that it works.
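To make the double encoding concrete, here is a minimal sketch of a tagged template that builds the Nix text. It is not the library's actual implementation: nixText is a made-up name, and it only produces a string, it never calls Nix.

```javascript
// Sketch of a template tag: every interpolated value is JSON-encoded twice
// and wrapped in builtins.fromJSON, so Nix decodes it back into a value.
function nixText(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    out += `(builtins.fromJSON ${JSON.stringify(JSON.stringify(value))})`;
    out += strings[i + 1];
  });
  return out;
}

const greeting = "this is a string";
console.log(nixText`{ someValue = ${greeting}; }`);
console.log(nixText`{ settings = ${{ hi: "there" }}; }`);
```

The first encoding turns the value into JSON; the second turns that JSON into a quoted string literal that can sit inside the expression.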

What's next?

If this ends up getting used, I may try and make "fast paths" for strings and numbers so that they don't have to go through the JSON encoding/decoding process. But so far this works well enough for my purposes.
