AltME: Ren - Data Exchange Format


(not lesser 'cause it's Red, just because it is still in the works :)
One of the bad parts of JSON for me is the hassle of escaping multiline strings when more often than not the JSON is a multi-line string that mustn't have the newlines escaped.
Is it possible to add {} for multi-line strings?
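To illustrate the hassle described above, here is a minimal Python sketch of what a standard JSON encoder does to a multi-line string. Python is used only for illustration, and the sample text is made up.

```python
import json

# A made-up multi-line value, as it might appear in a message body.
text = "first line\nsecond line\nthird line"

# JSON has no multi-line string literal, so every newline becomes the
# two-character escape \n and the value collapses onto a single line.
encoded = json.dumps(text)
print(encoded)

# Decoding restores the original line breaks.
assert json.loads(encoded) == text
```

A {} string form, as asked for above, would let the three lines appear literally in the document instead.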
One comment and two questions about the definition of strings - "A string is a series of zero or more Unicode characters"
1. It would be better to refer to code points than characters.
2. How will the string be encoded? I presume the straightforward answer is: the same as the REN document. (I see that the JSON standard allows for UTF-8, UTF-16 or UTF-32.)
3. Will the ^ escaping provide for specifying hex codepoints such as ^(010000)? If so, it will need to be made clear that the value of the codepoint is to be supplied (so that people don't escape the UTF-8 for the codepoint by accident).
I've read the "spec" a bit more thoroughly, the answer is yes. I would like to ask for 2 and 6 digit hex strings to be allowed (for convenience). I think it needs to be noted that the hex number is the number of the Unicode codepoint.
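The difference between the codepoint number and its UTF-8 bytes can be sketched as follows. This is only an illustration in Python: the helper name decode_caret_hex is hypothetical, and the 2 to 6 digit range is the one requested in the message above, not a settled rule of the spec.

```python
import re

# Hypothetical pattern: ^(XX) through ^(XXXXXX), i.e. 2 to 6 hex digits,
# following the range requested in the discussion (an assumption).
_CARET_HEX = re.compile(r"\^\(([0-9A-Fa-f]{2,6})\)")

def decode_caret_hex(text):
    """Replace each ^(hex) escape with the code point that the hex number names."""
    def replace(match):
        code_point = int(match.group(1), 16)
        # Six hex digits can exceed the Unicode range, so check the upper bound.
        if code_point > 0x10FFFF:
            raise ValueError(f"not a Unicode code point: {match.group(0)}")
        return chr(code_point)
    return _CARET_HEX.sub(replace, text)

# The hex digits are the code point number itself ...
linear_b = decode_caret_hex("^(010000)")
assert linear_b == "\U00010000"

# ... not the UTF-8 bytes of that code point, which are a different number.
assert linear_b.encode("utf-8").hex() == "f0908080"
```

Someone who wrote the UTF-8 byte values by mistake, for example ^(F0908080), would not get U+10000. In this sketch that text does not even match the pattern, because it has eight digits, and is left untouched.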
One point about encoding. If the standard states that REN documents must be UTF-8 encoded, that makes life easier for implementors as they don't need to worry about handling the endian issues of UTF-16 and UTF-32.
It is also very good for REBOL3 and Red implementors as both use UTF-8 by default. (Not too good for REBOL 2 though.)
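The endian issue mentioned above can be seen in a few lines of Python. This is an illustrative sketch only, with an arbitrarily chosen sample character.

```python
# U+00E9 (e with acute accent), chosen arbitrarily as a sample character.
ch = "\u00e9"

utf16_le = ch.encode("utf-16-le")  # little-endian byte order
utf16_be = ch.encode("utf-16-be")  # big-endian byte order
utf8 = ch.encode("utf-8")          # byte-oriented, no byte-order variants

# The same code point yields two different byte sequences in UTF-16,
# so a reader must know or detect which byte order was used.
assert utf16_le == b"\xe9\x00"
assert utf16_be == b"\x00\xe9"

# UTF-8 has a single serialization for it.
assert utf8 == b"\xc3\xa9"
```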
@Chris We have two lexers in Red: one is written in Rebol for the compiler, one is written in Red (and no longer in Red/System) for the runtime part (LOAD support).
Pekr, It can never be 100%, as R2, R3, Red, World, etc. all differ somewhat.
Chris, right.
Peter, I want multiline strings as well. Initially I tried posting proposals, building up from base types, to try and gain consensus. Didn't work so well. So consider the current spec to have gaps. I will adjust the text to refer to code points. On convenience, which applies to multiline strings as well, Redbol compatibility is a primary concern, but also keeping the rules as simple and unambiguous as possible. Unicode isn't my area, so I will defer to you and others on cost/benefit.
To Petr's (Pekr) point about hiding the grammar on a detail page, I did just recently think about doing something like that. Not entirely, but leaving out sub-rules like int, frac, exp, etc. since the main page is just informative. That is, just list the top-level types.
However, the page was intended, as Petr so astutely observed, for the people who might implement parsers and generators for it. The idea being to have a single page that offers a description and reasoning for each datatype, with the informative grammar right there as well.
I think multiline strings offer so much value that we need them. Any strong arguments against?
On encoding, I'm good with requiring UTF8. Any arguments against?
I agree with both multiline strings and UTF8.

one note about multiline (not just strings, in general): a lot of things assume JSON to always be able to fit in a single line, e.g. using the line terminator as the delimiter between JSON objects. (E.g. IPC in node.js is line-terminated JSON.)
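The line-delimited assumption can be sketched like this in Python. The records are made up, and the framing shown is a generic one-record-per-line scheme, not the actual node.js IPC implementation.

```python
import json

# Two made-up records; the second holds a multi-line string.
records = [{"id": 1, "note": "short"}, {"id": 2, "note": "line one\nline two"}]

# Line-delimited framing: one encoded record per line.
stream = "\n".join(json.dumps(r) for r in records)

# Because JSON escapes the embedded newline, splitting on line breaks
# recovers exactly one record per line.
decoded = [json.loads(line) for line in stream.split("\n")]
assert decoded == records

# If the newline were written literally inside the string (as a multi-line
# string literal would allow), the same split would cut record 2 in half.
raw = stream.replace("\\n", "\n")
assert len(raw.split("\n")) == 3
```

A format with literal multi-line strings would therefore need either a flattened output mode or a different record delimiter to be used in this way.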
Interesting. There is certainly something to be said for mashing things into one line in certain contexts. I have a SINGLE-LINE-MOLD func I use for logging, for example, and having tools that could operate in a mix of line-oriented *nix fashion, with each line being structured, is a possibility too.
Gregg, did you miss MOLD/FLAT? :-)
It doesn't remove line breaks.
Ah, Red does. So my func is obsolete. :-)
Though mine, IIRC, replaces line breaks with their escaped counterparts.
Ah, Red does that too, on ML strings. You win. :-)
And thank you. :-)

Back to time!

Last message posted 208 weeks ago.