My implementation of OFFSET? in Red is exactly the same as Kaj's. I didn't think about adding that check either, but I think I have used it on different series at least once in the past, tracking progress in two different queues.
Kaj
By request, I added a /deep refinement to my JSON emitter for emitting nested blocks as objects. The /map refinement now only works on even-sized blocks, instead of considering odd-sized blocks an error:
The JSON loader now supports string escaping, except Unicode escapes, which are implemented but are waiting for Red's LOAD to support char! syntax with parens: #"^(0000)"
The Unicode escapes \u aren't really implemented yet: they always output NULL, but they're only needed for obscure control characters.
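For readers unfamiliar with the escape, here is a minimal sketch in C (not the Red code discussed above) of what decoding the four hex digits after `\u` involves. The function name `parse_u_escape` is hypothetical and only serves the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: decodes the four hex digits that follow "\u"
   in a JSON string. `s` points at the first digit.
   Returns the 16-bit code unit (0..0xFFFF), or -1 if any of the four
   characters is not a hex digit (including an early end of string). */
int parse_u_escape(const char *s) {
    int value = 0;
    for (int i = 0; i < 4; i++) {
        char c = s[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;   /* stops at '\0' too, so no read past the end */
        value = value * 16 + digit;
    }
    return value;
}
```

A caller would still have to turn the returned code unit into the target string encoding, which this sketch leaves out.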
Kaj
To simplify the TNetStrings and JSON converters for Red, I implemented found?, any-word!, any-word?, series?, any-string!, any-string?, any-block! and any-block? in common.red:
The Stone DB is consuming a lot of my time but it's moving forward pretty nicely... current single-thread (in-RAM) imports run at 10 million nodes per second using an average node payload of 40 bytes (which is longer than the average I'd typically use). The majority of the time is spent verifying internal dataset integrity and memory copying.
It takes 3 seconds to basically grab all available process RAM (2GB) and create 30 million data nodes. 1 million nodes takes 50ms on average. I'm getting pretty flat scaling so far, which is a very good sign. Note that the data is completely memory copied into the DB. I'm not pointing to the original import data.
All of these benchmarks are not even using a dedicated import function... this is like the worst-case scenario for import. It's a dumb FOR loop using a fully bounds-checking single insert-node() function.... if I did an import loop which only does the bounds checking and stores counters in a loop, I likely can scale the import a lot.
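To illustrate the difference being described, here is a rough sketch in C under stated assumptions. The `node_store` layout and both function names are hypothetical and are not Stone DB's actual code. The sketch contrasts a per-node insert that checks bounds on every call with a bulk import that checks once for a whole batch of equal-sized payloads and updates the counters once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-RAM node store; an assumption for this sketch only. */
typedef struct {
    unsigned char *data;  /* payload bytes are copied in here */
    size_t used;          /* bytes currently in use (always <= capacity) */
    size_t capacity;      /* total bytes available */
    size_t count;         /* number of nodes stored */
} node_store;

/* Per-node insert: one bounds check and one counter update per call.
   Returns 0 on success, -1 if the payload does not fit. */
int insert_node(node_store *s, const void *payload, size_t len) {
    if (len > s->capacity - s->used) return -1;
    memcpy(s->data + s->used, payload, len);
    s->used += len;
    s->count++;
    return 0;
}

/* Bulk import of n payloads of len bytes each, stored back to back:
   one bounds check, one copy, one counter update for the whole batch.
   Returns 0 on success, -1 if the batch does not fit. */
int import_nodes(node_store *s, const void *payloads, size_t n, size_t len) {
    if (len != 0 && n > (s->capacity - s->used) / len) return -1;
    memcpy(s->data + s->used, payloads, n * len);
    s->used += n * len;
    s->count += n;
    return 0;
}
```

Whether the real gain is large depends on how much of the per-insert time is checking versus copying, which the sketch does not measure.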
I'm now starting work on the higher level interfaces, basically creating database setup on the fly, and hopefully by Friday I should have the file i/o started.
maybe next week I'll start to see how I can create a native Stone DB interface for R3.
TomBon
nice tech you are doing there maxim. count me in for some big data tests. i never used graph DBs before but would like to give it a try for a non-scalable setup, currently solved suboptimally via simple key traversal stored in a nosql core.
- Floating point numbers are now parsed and loaded as file! types, so external data with floats can at least be loaded and the numbers can be detected, so they could be processed further by your own functions.
red>> load-JSON "6.28"
== %6.28
- char! type is now more explicitly supported, in the sense that single character strings will be loaded as char! so they are more efficient.
- object! type is now supported, so it becomes easier to emit TNetStrings with nested dictionaries and JSON data with nested objects. The converters can still (and need to) be compiled: they use the interpreter only very sparingly for objects support.
- All Red data types can now be emitted. Not explicitly supported types are FORMed.
- Several new refinement options, in particular for object support.
- The JSON converter now implements the full specification on json.org except escaped UTF-16 surrogate pairs. There is little reason for them to occur in JSON data.
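For context on the one missing piece: in JSON, a character outside the Basic Multilingual Plane is escaped as two consecutive \u escapes, a high surrogate followed by a low surrogate. The sketch below, in C rather than Red, shows the standard UTF-16 arithmetic for combining them. The function name `combine_surrogates` is hypothetical, and returning U+FFFD for an invalid pair is a choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: combines a UTF-16 high surrogate (0xD800..0xDBFF)
   and low surrogate (0xDC00..0xDFFF) into one Unicode code point.
   Returns U+FFFD (the replacement character) if the pair is not valid. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo) {
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0xFFFD;
    /* Each surrogate carries 10 payload bits; the pair encodes
       code points starting at U+10000. */
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}
```

For example, the escaped pair \uD83D\uDE00 combines to U+1F600.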
Kaj
The JSON converter is still smaller than the official R2 implementation. It's now larger than the R3 implementation, but has more features. It's still an order of magnitude smaller than most JSON implementations in other languages.
Maxim
StoneDB is starting to take shape. I got the preliminary disk storage prototype finished today. I can't give factual speed benchmarks since for now I've got no time to do extensive testing... but it seems to be able to store at least 500000 nodes a second (at about 14MB/s), which is pretty decent for a prototype using default C disk writing functions and absolutely no regard for disk i/o profiling. This is even more acceptable considering it's running on a lame notebook disk. (I should have an SSD after the holidays, so I'll be able to compare :-)
with the current architecture, I should be able to read any cell directly from disk so query set can be larger than physical RAM.
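One way this kind of direct access can work is with fixed-size cells, so a cell's file offset is just its index times the cell size. The sketch below is in C and rests on assumptions: the 40-byte `CELL_SIZE`, the headerless file layout and the function name `read_cell` are all hypothetical, not Stone DB's actual format.

```c
#include <stdio.h>

#define CELL_SIZE 40  /* assumed fixed cell size in bytes, for illustration */

/* Hypothetical helper: reads cell number `index` straight from an open
   file into `out`, which must have room for CELL_SIZE bytes.
   Returns 0 on success, -1 on a bad index, seek failure or short read.
   Note: plain fseek() takes a long, so files beyond what a long can
   address would need a platform-specific 64-bit seek instead. */
int read_cell(FILE *f, long index, void *out) {
    if (index < 0) return -1;
    if (fseek(f, index * CELL_SIZE, SEEK_SET) != 0) return -1;
    return fread(out, CELL_SIZE, 1, f) == 1 ? 0 : -1;
}
```

Only the requested cell is pulled into memory, which is what lets the data on disk exceed physical RAM.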
If all goes well, I should have persistent read/write access to the DB's file data done by the time I go to bed tonight ..... yay!
After that... cell linking which will require a different variable length dataset driver. This new one will allow perpetual appending without any need to copy memory :-)
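A common way to get appending without ever moving existing data is a linked list of fixed-size chunks: when the last chunk is full, a new one is allocated and linked in, and nothing already stored is copied or relocated. The sketch below is in C and is only a guess at the general idea; the names `chunk`, `chunk_list`, `chunk_append`, `chunk_free` and the 4096-byte `CHUNK_SIZE` are hypothetical. The incoming payload is still copied in once; what is avoided is re-copying old data on growth.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumed chunk payload size, for illustration */

typedef struct chunk {
    struct chunk *next;
    size_t used;                     /* bytes used in data[] */
    unsigned char data[CHUNK_SIZE];
} chunk;

typedef struct {
    chunk *head, *tail;
} chunk_list;

/* Appends len bytes (len <= CHUNK_SIZE) and returns a pointer to the
   stored copy. The pointer stays valid until chunk_free(), because
   growing the list never moves existing chunks. NULL on failure. */
void *chunk_append(chunk_list *l, const void *src, size_t len) {
    if (len > CHUNK_SIZE) return NULL;
    if (l->tail == NULL || CHUNK_SIZE - l->tail->used < len) {
        chunk *c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        c->next = NULL;
        c->used = 0;
        if (l->tail != NULL) l->tail->next = c; else l->head = c;
        l->tail = c;
    }
    void *dst = l->tail->data + l->tail->used;
    memcpy(dst, src, len);
    l->tail->used += len;
    return dst;
}

/* Releases every chunk and resets the list to empty. */
void chunk_free(chunk_list *l) {
    chunk *c = l->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    l->head = l->tail = NULL;
}
```

The trade-off is some wasted space at the end of each chunk and no single contiguous buffer, in exchange for stable addresses and constant-cost growth.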