yencabulator 17 hours ago

So `#` starts a line comment but `#t` is a boolean. Yeah, that's never gonna hurt anyone.

dangoodmanUT a day ago

gotta have some examples on the landing before the fold, otherwise i have no reason to explore

  • porcoda 18 hours ago

    Tutorial link too hard to click? Seriously: these comments inevitably come up for most language related pages, and I don’t see how they don’t fall under the site guidelines of “Please don't post shallow dismissals”.

    • yencabulator 17 hours ago

      The comment specifically said "on the landing before the fold", so yours is the shallow dismissal. I agree with the comment; lead with the things we shall judge ye by.

    • paddy_m 17 hours ago

      I made a similar comment. I think they come up because people are genuinely interested in a project and trying to offer the creator a fresh perspective. When creating a project where you're the domain expert, it's so easy to get stuck in your own head, and then you start explaining the project to a newcomer by diving into deep details when they don't understand the starting point.

      Until a project has a lot of traction (think docker, react, django not uv or jq) it's very safe to assume that every visitor to your page doesn't understand the background.

boxed a day ago

> This is a good time to mention that even though from a semantic perspective sets and dictionaries do not carry information about the ordering of their elements.

Except they do in Python. It is extremely useful, surprisingly often.
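
To make the Python behaviour concrete, here is a small sketch. It assumes Python 3.7 or later, where `dict` is guaranteed to iterate in insertion order; the built-in `set` makes no such promise, and equality ignores order for both.

```python
# Dicts remember insertion order when iterated (guaranteed since Python 3.7).
a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

print(list(a))  # ['x', 'y']
print(list(b))  # ['y', 'x']

# Equality ignores that order, which matches the "no ordering information" view.
print(a == b)  # True

# Sets compare by membership only; their iteration order is unspecified.
print({1, 2, 3} == {3, 2, 1})  # True
```

So the order is observable when iterating a dict, but it is not part of what makes two dicts equal.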

fjfaase a day ago

Is it true that Records are the same as Dictionaries, because the labels in the records can have any value?

Interesting how, on one hand, the size of SignedInteger is unlimited, but on the other hand there is a ByteString. A ByteString could also have been represented as a sequence of SignedInteger. I also wonder if it would not be better to have a Unicode character as an atomic unit and represent a string as a sequence of Unicode characters.

This makes me wonder whether this is a high-level data model or yet another data representation.

  • tonyg a day ago

    No, a record is a tagged (sequence of) value(s).

      <tag v1 v2 v3>
    
    If you put a single dictionary-valued "field" in a record, you get a variation with named fields

      <tag {
        field1: value1
        field2: value2
        field3: value3
      }>
    
    Records have positional "fields" because of the Scheme heritage of the design.
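
    A rough way to picture the two shapes above, as a Python sketch. The `Record` class here is a hypothetical stand-in, not the API of any Preserves library, and plain Python strings stand in for symbols.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical model of a record: a label plus positional fields."""
    label: Any
    fields: Tuple[Any, ...] = ()


# <tag v1 v2 v3>: three positional fields.
positional = Record("tag", ("v1", "v2", "v3"))

# <tag {field1: value1 ...}>: a single field that happens to be a dictionary.
named = Record(
    "tag",
    ({"field1": "value1", "field2": "value2", "field3": "value3"},),
)

print(len(positional.fields))     # 3
print(len(named.fields))          # 1
print(named.fields[0]["field2"])  # value2
```

    The "named fields" variant is still a record with exactly one positional field; the names live inside the dictionary, not in the record itself.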

    --

    Re bytestring -- yes there are some concessions to real machines/languages in there that aren't absolutely required. Other examples include booleans and strings, which could have been <true> and <false> and <string [65 66 67]> etc respectively.

    There's a little more on this topic in footnote 2 on the "conventions" page: https://preserves.dev/conventions.html#fn:why-dictionaries

    • carterschonwald 19 hours ago

      Looks like a lot of excellent work!

      Are there any good examples of nontrivial schemas etc?

      • tonyg 7 hours ago

        Thanks. Yes there are: see sections 15, 16 and 17 of https://synit.org/book/, where Preserves, Preserves Schemas, and the Syndicated Actor Model make a reactive replacement system layer for Linux (essentially an alternative to systemd).

yegle 17 hours ago

Does Preserves have a page with a comparison to other common serialization languages?

As someone familiar with Protobuf, comparing Preserves vs Protobuf text format, here's my quick comparison between the two after reading through the tutorial:

- Preserves' Symbol is very close to Protobuf enums. But Symbol can contain characters like dash

- There doesn't seem to be an equivalent of Preserves' Record in Protobuf, but the tutorial's example of using <Unknown ...> to denote a missing <Date ...> can be simulated using the `oneof` field in Protobuf.

- Having to write #t/#f in Preserves is unfortunate. I guess this is the result of schemaless serialization language and potential parsing ambiguity with a Symbol?

- Protobuf has a way to annotate the schema and reuse at runtime, very similar to Preserves' annotations.

  • skybrian 11 hours ago

    Protobufs are designed to support schema evolution without explicit versioning. (All fields are optional so they can be added or dropped, provided field numbers aren’t reused.)

    It looks like Preserves just uses version numbers in its schemas. On the other hand, you can read the data without a schema, similar to JSON.

    • tonyg 7 hours ago

      The version number is the schema language version, not the version of the collection of types described in the file.

      The schema language is extensible/evolvable in that pattern matching ignores extra entries in a sequence and extra key/value pairs in a dictionary. So you could have a "version 1" of a schema with

        Person = <person @name String> .
      
      and a "version 2" with

        Person = @v2 <person @name String @address Address>
               / @v1 <person @name String> .
      
      Then, Person.v2 from "version 2" would be parseable by Person from "version 1", and Person from "version 1" would parse using "version 2" as a Person.v1.
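
      The matching rule described above can be sketched in Python. Everything here is a hypothetical model rather than the Preserves schema implementation: records are `(label, fields)` tuples, `Address` is approximated by "any dict", and trying the `@v2` alternative before `@v1` is an assumption about how alternatives are ordered.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical model: a record is (label, fields); a pattern is a label plus
# (name, check) pairs. Extra trailing fields in the value are ignored.
Record = Tuple[str, Tuple[Any, ...]]
FieldSpec = Tuple[str, Callable[[Any], bool]]


def match(value: Record, label: str, specs: Sequence[FieldSpec]) -> Optional[dict]:
    got_label, fields = value
    if got_label != label or len(fields) < len(specs):
        return None
    out = {}
    for (name, check), field in zip(specs, fields):
        if not check(field):
            return None
        out[name] = field
    return out


def is_string(v: Any) -> bool:
    return isinstance(v, str)


def is_address(v: Any) -> bool:
    return isinstance(v, dict)  # placeholder for the Address type


def parse_with_version_1(value: Record) -> Optional[dict]:
    # Person = <person @name String> .
    return match(value, "person", [("name", is_string)])


def parse_with_version_2(value: Record):
    # Person = @v2 <person @name String @address Address> / @v1 <person @name String> .
    m = match(value, "person", [("name", is_string), ("address", is_address)])
    if m is not None:
        return ("v2", m)
    m = match(value, "person", [("name", is_string)])
    if m is not None:
        return ("v1", m)
    return None


old = ("person", ("Alice",))
new = ("person", ("Alice", {"city": "Utrecht"}))

print(parse_with_version_1(new))  # {'name': 'Alice'}
print(parse_with_version_2(old))  # ('v1', {'name': 'Alice'})
```

      The old parser accepts the new value because the extra `address` field is simply skipped, and the new parser falls back to the `v1` alternative when the address is absent.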

      The schema language is in production but the design is still a work in progress and I expect more changes before a 1.0 release of the schema language.

      (The schema language is completely separate from the preserves data model, by the way -- one could imagine other schema languages being used instead/as well)

djoldman a day ago
  • tonyg a day ago

    Or, in "quick reference card" form: https://preserves.dev/cheatsheet.html

    The syntax isn't the most interesting part though; the thing that distinguishes it from most other data languages out there is that it has semantics (= a rigorous definition of when values are equal and when they aren't). So you can use Preserves semantics with JSON syntax (a subset of Preserves' text syntax) as one way of getting actually-meaningful JSON.

    Plus, comments (and other annotations) ;-)

lionkor a day ago

Why/when/where would I need this?

  • paddy_m a day ago

    I was going to ask the exact same question. The title makes it sound like something I might be interested in, then I visited the page and I have no idea what it does.

    After some brief reading of docs, I'm trying to write one sentence explanations. Maybe this will be helpful to you

    What

    Preserves is a specification and set of libraries in popular languages that lets you reliably exchange data between XML, JSON and EDN.

    Who

    Preserves is built for (data engineers|data framework writers) to reliably interchange data.

    Why

    Formats like JSON in particular are imprecise. Preserves forces you to deal with these vagaries up front

    What else?

    With P-Expressions you can search a Preserves-compliant data source much like you would query JSON with jq.

    Who Not? Who shouldn't use this

    This will not help a data analyst exchange data between CSV and Excel

    • lionkor 17 hours ago

      Thank you! That makes more sense now

      • paddy_m 17 hours ago

        I have no idea if the project author would agree with those sentences; I was just proposing them.

  • tonyg a day ago

    Useful if you have a JSON-keyed table, for example: JSON lacks a useful (standardised) equivalence relation, meaning you get weak and/or implementation-specific guarantees about how key lookup works. Equivalences were the motivation for developing Preserves: I was (and am) working on a generalized approach to messaging middleware, you might say, meaning that things like "patterns over values" and "filters" and "value-keyed tables" are all things I need to talk about. (This all comes out of RabbitMQ/AMQP thinking back in the day and my PhD-and-after work subsequently.)
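
    A short Python sketch of the key-lookup problem, using only the standard `json` module. The outcomes shown are what Python's parser happens to do; other JSON libraries may decide differently, which is exactly the difficulty.

```python
import json

# The same object written with two different key orders.
k1 = '{"a": 1, "b": 2}'
k2 = '{"b": 2, "a": 1}'

print(k1 == k2)                          # False: comparing the text
print(json.loads(k1) == json.loads(k2))  # True: comparing Python dicts

# Number spellings: equal as Python values, different once re-serialised.
one, one_point_zero = json.loads("1"), json.loads("1.0")
print(one == one_point_zero)                        # True
print(json.dumps(one), json.dumps(one_point_zero))  # 1 1.0

# Duplicate keys: Python's json module silently keeps the last one.
print(json.loads('{"a": 1, "a": 2}'))    # {'a': 2}
```

    Whether `k1` and `k2` name the same table row therefore depends on whether you compare text, parsed values, or some re-serialised form, and JSON itself does not say which.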

conartist6 a day ago

CSTML is targeting many of the same weaknesses in JSON. It's fun to see a whole different, competing set of design choices at work. I had a very different take on schema validation and how to use the < syntax.

account-5 18 hours ago

Might be an ignorant question but why not just use XML? It seems like XML could do all this, from my limited reading?

  • tonyg 6 hours ago

    XML would make a fine choice. It lacks atomic data types other than text, and compound data types other than sequences, unless you count element attributes, which are in a kind of awkward position because of the historical development of the language. Preserves has a richer suite of primitive data types and decomposes XML's elements into separate notions of map, sequence, and tagged value.

  • mhalle 13 hours ago

    XML would require a schema to express the concepts in Preserves, or JSON for that matter.

    The reason JSON is lower friction than XML for data representation is that you get basic data representations (numbers, strings, arrays, maps) for free in a natural native syntax that happens to parallel multiple programming languages.

    XML, in contrast, is a meta-language that allows schema to express different data representations. You've got to use attributes and elements to represent data and data types. XSD is a common datatype schema, but it's quite verbose, and data serialization looks very different from what it looks like in a programming language representation.

    Preserves looks like a superset of JSON. It includes additional data representation concepts through syntax extensions, but the idea is the same.

    What I don't see is a standard way to map record types (like "irl" in the tutorial) to a unique identifier like an URI/IRI, or something like a CURIE. That kind of feature would allow Preserves to better describe standardized record types.

layer8 19 hours ago

Looking at the headings in that TOC, “Preserves” is a bit of an unfortunate naming choice grammatically.

  • tonyg 6 hours ago

    Yeah I struggle with "Preserve" vs "Preserves" sometimes. Was there something in particular that struck you as unfortunate, though?

    • layer8 3 hours ago

      "Preserves <something>", for example "Preserves data", reads like "it preserves data". Probably less so in the middle of a sentence, due to the uppercasing, but in the TOC it reads like bullet points enumerating what is preserved.

      • tonyg 2 hours ago

        Thank you! I wonder if something a bit contrived such as small-caps could help. I'll experiment.

        • layer8 an hour ago

          Putting Preserves in italics would be an alternative.

          The name nevertheless feels awkward to me, also in spoken conversation. A made-up word like maybe “Pres” or “Edal” (from “expressive data language”) would work better IMO.

twism a day ago

So EDN?

  • tonyg a day ago

    Yeah EDN is quite similar. Preserves has no nil, allows any value as a tag, gets into the weeds more on when strings are equal or not, doesn't distinguish lists and vectors, and doesn't require each kind of tagged element to define an equivalence. And it has annotations (vs EDN's comments) and embedded values.