at://samuel.bsky.team/com.whtwnd.blog.entry/3ljlqmchv2b2a

Record JSON

{
  "$type": "com.whtwnd.blog.entry",
  "content": "\u003e Note: this blog post is aimed at a technical audience\n\nAT Protocol (hereafter referred to as “atproto”) is a groundbreaking new technology by Bluesky (the company) used to build Bluesky (the social app). At this point, a lot has been said about *why* this is being built and *how* it works at a high level, but I think it’s long overdue for someone to get their hands dirty and figure out how to build something from the ground up with it.\n\nThis post is *not* going to cover how it works as a whole from the top down. Instead, this series will attempt to build a bottom-up understanding, by explaining and implementing the nitty-gritty. atproto has many parts, many of which may seem strange in isolation, but they all fit together to form a cohesive whole. If you want to get a better understanding of how it works at a high level, I recommend Dan Abramov’s Web Without Walls talk - you’ll probably be pretty lost if you haven’t seen it, so consider it required watching.\n\nhttps://www.youtube.com/watch?v=F1sJW6nTP6E\n\nI am also *not* going to start from the absolute beginning. This should be considered a sequel to [Quick start guide to building applications on AT Protocol](https://atproto.com/guides/applications), which is the official guide to building an atproto demo app. We are going to take the Statusphere example app and extend it, using more advanced atproto concepts.\n\n## Understanding lexicon\n\nIn the statusphere example, we make a **lexicon** schema to describe records of type `xyz.statusphere.status`. 
It looks something like this:\n\n```json\n{\n  \"lexicon\": 1,\n  \"id\": \"xyz.statusphere.status\",\n  \"defs\": {\n    \"main\": {\n      \"type\": \"record\",\n      \"key\": \"tid\",\n      \"record\": {\n        \"type\": \"object\",\n        \"required\": [\"status\", \"createdAt\"],\n        \"properties\": {\n          \"status\": {\n            \"type\": \"string\",\n            \"minLength\": 1,\n            \"maxGraphemes\": 1,\n            \"maxLength\": 32\n          },\n          \"createdAt\": {\n            \"type\": \"string\",\n            \"format\": \"datetime\"\n          }\n        }\n      }\n    }\n  }\n}\n```\n\nIn summary, it describes a `record`, which is of type `object`. This object has two properties, a “status” which is a single grapheme long, and a “createdAt” date, both of which are required.\n\nThis schema describes the shape of records one might encounter on the network. An example might be:\n\n```json\n{\n  \"$type\": \"xyz.statusphere.status\",\n  \"status\": \"🦋\",\n  \"createdAt\": \"2025-03-03T18:43:46.740Z\"\n}\n```\n\nHere’s a question. Is this a valid `status` record?\n\n```json\n{\n  \"$type\": \"xyz.statusphere.status\",\n  \"status\": \"🦋\",\n  \"createdAt\": \"2025-03-03T18:43:46.740Z\",\n  \"comment\": \"a butterfly :)\",\n  \"labels\": {\n    \"$type\": \"com.atproto.label.defs#selfLabels\",\n    \"values\": [\n      { \"val\": \"graphic-media\" }\n    ]\n  }\n}\n```\n\nThe answer is yes! All records are **open unions**. This means that a record might have more content attached to it than described in the lexicon you have. It’s this way for a few reasons:\n\n1. **The lexicon might evolve in the future to have more fields**. Open unions allow forward compatibility - an app will only use the fields it knows about, but will still consider future records that might have more fields to be valid.\n2. **Extensibility**. 
Other developers might want to attach extra data to records that use your lexicon (although make sure you [namespace your fields](https://docs.bsky.app/blog/pinned-posts)!).\n3. **You don’t own the data**. All the data is out there, in the atmosphere, being published and owned by the end users. Lexicons merely describe the *minimum viable shape* of the data that your application will accept and consider valid.\n\nIn the above example, it could be that a future version of the statusphere app has added comments and self-labels. Your statusphere app however only needs the `status` and `createdAt` fields to be present (and match the format) for the object to be considered valid, even though there’s extra stuff in there - just ignore it and use the status.\n\nAnother case is when a field might have multiple variants, such as the embeds on Bluesky posts. You can see in the schema here, we define what kind of records you can use in the embed: images, video, etc:\n\nhttps://github.com/bluesky-social/atproto/blob/main/lexicons/app/bsky/feed/defs.json#L16-L25\n\nHowever, this is an open union, so there could be another kind of embed in there instead - and the post would still be valid. An example of this is when we added video embeds. We added the new record type to the union so that the new versions of the app would know about the new possible variant, but old apps already out there in the wild still wouldn’t know what it was. However, since it’s an open union, they can just ignore the strange new record and render the post without an embed.\n\n## Views\n\nLexicons can define more than records though. A helpful concept in atproto is “views” - records are the raw data type that’s out there in the real world but might be quite minimal, whereas views are processed data that your app may produce for ease of use. 
An example of this in `bsky.app` would be the difference between an `app.bsky.feed.post` record and an `app.bsky.feed.defs#postView` - a post record can be nothing but a text field and a date, whereas a `PostView` is a post record that has been processed by the Bluesky AppView and had helpful metadata attached, such as counting the number of likes and reposts, and attaching the author’s name and avatar.\n\nStatusphere statuses are pretty rough to work with directly! Already, we store them in the database in a non-lexicon format, since we need to save who it is by. Let’s describe a new object called a `#statusView`, which describes what a status might look like after processing.\n\n```json\n{\n  \"lexicon\": 1,\n  \"id\": \"xyz.statusphere.defs\",\n  \"defs\": {\n    \"statusView\": {\n      \"type\": \"object\",\n      \"required\": [\"uri\", \"status\", \"profile\", \"createdAt\"],\n      \"properties\": {\n        \"uri\": { \"type\": \"string\", \"format\": \"at-uri\" },\n        \"status\": {\n          \"type\": \"string\",\n          \"minLength\": 1,\n          \"maxGraphemes\": 1,\n          \"maxLength\": 32\n        },\n        \"createdAt\": { \"type\": \"string\", \"format\": \"datetime\" },\n        \"profile\": { \"type\": \"ref\", \"ref\": \"#profileView\" }\n      }\n    },\n    \"profileView\": {\n      \"type\": \"object\",\n      \"required\": [\"did\", \"handle\"],\n      \"properties\": {\n        \"did\": { \"type\": \"string\", \"format\": \"did\" },\n        \"handle\": { \"type\": \"string\", \"format\": \"handle\" }\n      }\n    }\n  }\n}\n```\n\nNote that:\n- These are not `\"type\": \"record\"`, they are `\"type\": \"object\"`. This means that we can generate a helper type using codegen, but it won’t generate the helper methods to save a record of this type to the network, like it would for a record - it’s purely for internal use.\n- We also have to define a `#profileView`, and reference it in the `profile` field. 
This is because lexicon doesn’t allow you to define nested objects.\n\nWe can then generate a StatusView from a status we get from the database like so:\n\n```ts\nimport { XyzStatusphereDefs } from '@statusphere/lexicon'\nimport { AppContext } from '#/context'\nimport { Status } from '#/db'\n\nexport async function statusToStatusView(\n  status: Status,\n  ctx: AppContext,\n): Promise\u003cXyzStatusphereDefs.StatusView\u003e {\n  return {\n    uri: status.uri,\n    status: status.status,\n    createdAt: status.createdAt,\n    profile: {\n      did: status.authorDid,\n      handle: await ctx.resolver\n        .resolveDidToHandle(status.authorDid)\n        .catch(() =\u003e 'invalid.handle'),\n    },\n  }\n}\n```\n\n## Queries and Procedures\n\nLexicon also allows you to define Queries and Procedures. This lets you automatically codegen API routes from your lexicons, on both frontend and backend. The Statusphere example doesn’t need to do this, because it’s a simple express app with APIs that use FormData and HTML. However, a more complex app might want a strict API contract between their internal services, or even between their services and external third-party services (like Bluesky feed generators). Lexicon lets you define these too! Let’s consider an API to fetch the list of most recent statuses. We can define parameters, like a limit for the number of statuses we want, and then define the API response. 
In this case, we can use our new `xyz.statusphere.defs#statusView`, which means our frontend can use the helpful extra metadata we added to the record!\n\n```json\n{\n  \"lexicon\": 1,\n  \"id\": \"xyz.statusphere.getStatuses\",\n  \"defs\": {\n    \"main\": {\n      \"type\": \"query\",\n      \"description\": \"Get a list of the most recent statuses on the network.\",\n      \"parameters\": {\n        \"type\": \"params\",\n        \"properties\": {\n          \"limit\": {\n            \"type\": \"integer\",\n            \"minimum\": 1,\n            \"maximum\": 100,\n            \"default\": 50\n          }\n        }\n      },\n      \"output\": {\n        \"encoding\": \"application/json\",\n        \"schema\": {\n          \"type\": \"object\",\n          \"required\": [\"statuses\"],\n          \"properties\": {\n            \"cursor\": { \"type\": \"string\" },\n            \"statuses\": {\n              \"type\": \"array\",\n              \"items\": {\n                \"type\": \"ref\",\n                \"ref\": \"xyz.statusphere.defs#statusView\"\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n}\n```\n\nProcedures are a similar story - where queries are GET requests, procedures are POST requests. 
Let’s define one for sending statuses:\n\n```json\n{\n  \"lexicon\": 1,\n  \"id\": \"xyz.statusphere.sendStatus\",\n  \"defs\": {\n    \"main\": {\n      \"type\": \"procedure\",\n      \"description\": \"Send a status into the ATmosphere.\",\n      \"input\": {\n        \"encoding\": \"application/json\",\n        \"schema\": {\n          \"type\": \"object\",\n          \"required\": [\"status\"],\n          \"properties\": {\n            \"status\": {\n              \"type\": \"string\",\n              \"minLength\": 1,\n              \"maxGraphemes\": 1,\n              \"maxLength\": 32\n            }\n          }\n        }\n      },\n      \"output\": {\n        \"encoding\": \"application/json\",\n        \"schema\": {\n          \"type\": \"object\",\n          \"required\": [\"status\"],\n          \"properties\": {\n            \"status\": {\n              \"type\": \"ref\",\n              \"ref\": \"xyz.statusphere.defs#statusView\"\n            }\n          }\n        }\n      }\n    }\n  }\n}\n```\n\n## Code generation\n\nNow we have these shiny new lexicons, what can we do with them? The answer is the `@atproto/lex-cli` package, which allows code generation from lexicons into both frontend and backend code. This is how `@atproto/api` is generated from the lexicons, for example.\n\nIt’s worth noting at this point that I have refactored the Statusphere example into having a separate frontend and backend. The frontend now uses react/vite, and the backend is still express but now uses JSON APIs rather than sending HTML. Take a look:\n\nhttps://github.com/mozzius/statusphere-react\n\nYou’ll note that lexicons sit at the root of the monorepo. `pnpm lexgen` triggers the code generation, which uses `lex gen-api` to generate a client-side package in `@statusphere/lexicon`, and `lex gen-server` to generate backend code in `@statusphere/appview`. 
Let’s have a look at what it generated from our new lexicons, and how it helps.\n\n### Client codegen\n\nI packaged the frontend codegen into `@statusphere/lexicon`, so it could be shared. In `/packages/client/src/lib/api.ts`, we create a `StatusphereAgent` which has the API routes we defined earlier. For example:\n\n```typescript\nconst {data} = await agent.xyz.statusphere.getStatuses({limit: 10})\n```\n\nIf you’ve used `@atproto/api` before, you get it. Basically, we can add extra methods to the atproto agent, as defined by our custom lexicons.\n\n### Backend codegen\n\nHere’s where it gets interesting. `lex gen-server` generates an XRPC server using `@atproto/xrpc-server`. These are the matching backend routes for the agent on the frontend. We can define typesafe route handlers for each of our queries and procedures, then add them to our Express server. Here’s the pattern I used, which is similar to the Bluesky AppView (although feel free to plumb the data through differently).\n\n```typescript\n// src/index.ts\nimport express from 'express'\n// this is codegen'd by lex gen-server\nimport { createServer } from '#/lexicons'\nimport API from '#/api'\n\n// other setup stuff, such as the ingestor\n\nconst app = express()\napp.use(express.json())\napp.use(express.urlencoded({ extended: true }))\n\n// Create our XRPC server\nlet server = createServer({\n  validateResponse: env.isDevelopment,\n  payload: {\n    jsonLimit: 100 * 1024, // 100kb\n    textLimit: 100 * 1024, // 100kb\n    // no blobs\n    blobLimit: 0,\n  },\n})\n\nserver = API(server, ctx)\n\napp.use(server.xrpc.router)\n\n// add other routes, start up the server\n```\n\nThis is where we add all the individual route handlers to the Express server:\n\n```typescript\n// src/api/index.ts\nimport { AppContext } from '#/context'\nimport { Server } from '#/lexicons'\nimport getStatuses from './lexicons/getStatuses'\nimport getUser from './lexicons/getUser'\nimport sendStatus from './lexicons/sendStatus'\n\nexport 
default function (server: Server, ctx: AppContext) {\n  getStatuses(server, ctx)\n  sendStatus(server, ctx)\n  getUser(server, ctx)\n  return server\n}\n```\n\nAnd here’s an example of a route handler - in this case, for `xyz.statusphere.getStatuses`:\n\n```typescript\n// src/api/lexicons/getStatuses.ts\nimport { AppContext } from '#/context'\nimport { Server } from '#/lexicons'\nimport { statusToStatusView } from '#/lib/hydrate'\n\nexport default function (server: Server, ctx: AppContext) {\n  server.xyz.statusphere.getStatuses({\n    handler: async ({ params }) =\u003e {\n      // Fetch data stored in our SQLite database\n      const statuses = await ctx.db\n        .selectFrom('status')\n        .selectAll()\n        .orderBy('indexedAt', 'desc')\n        .limit(params.limit)\n        .execute()\n\n      return {\n        encoding: 'application/json',\n        body: {\n          statuses: await Promise.all(\n            statuses.map((status) =\u003e statusToStatusView(status, ctx)),\n          ),\n        },\n      }\n    },\n  })\n}\n```\n\nThis is super helpful for ensuring your frontend and backend are synced - the source of truth is coming from your lexicons. Adding a new route is a matter of defining a new lexicon, running codegen, and adding a new handler, which is super simple since it’s all already typesafe on frontend and backend.\n\n## Homework\n\nFork https://github.com/mozzius/statusphere-react and extend it with a new XRPC query! Maybe try adding a query to get the most recent status of a given user, or even get a user’s status history. Play around with adding more detailed lexicon view objects - for example, an `emojiStats` object which defines how many times an emoji has been posted.\n",
  "createdAt": "2025-03-05T11:49:39.974Z",
  "theme": "github-light",
  "title": "atproto by example part 1: records and views",
  "visibility": "public"
}