I take that back, this isn't a bug so much as a request that we be more liberal than the JSON-5 spec permits. I might be open to that, but first we should make the error easier to understand. Here's what's happening:
The values of info() and field() entries in a database file are supposed to be valid JSON-5 values, and are currently parsed as such.
Looking at the JSON spec https://datatracker.ietf.org/doc/html/rfc7159 and the diagrams on https://www.json.org/json-en.html, that spec doesn't actually allow *any* unescaped control characters inside string values. The spec says:
A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
Thus all control characters are supposed to be escaped according to the older JSON rules.
The JSON-5 spec at https://spec.json5.org/#strings has similar language, although its BNF also confusingly allows an unescaped SourceCharacter from the ECMAScript language specification at https://262.ecma-international.org/5.1/#sec-6, which is defined as "any Unicode code unit". I think that can be ignored though.
Both of our JSON parser lexers (dbLex.l and yajl_lex.c) follow those strict specifications when it comes to the set of characters allowed inside strings. I could change them to allow unescaped tab characters inside strings (please comment if you have an opinion about that either way), but I will have to fix both parsers to do that.
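To make the rule under discussion concrete, here is a minimal sketch in C of the check a strict lexer is effectively applying to the body of a string, with a switch for the proposed relaxation. This is not code from dbLex.l or yajl_lex.c; the function name find_raw_control and its allow_tab parameter are made up for illustration, and treating a backslash before a raw line ending as a legal JSON-5 line continuation is my reading of the JSON-5 spec.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of either lexer.
 * Scans the bytes between the quotes of a string and returns the offset
 * of the first raw (unescaped) control byte in the range 0x00-0x1F,
 * or -1 if there is none.  When allow_tab is non-zero a raw TAB is
 * tolerated, which is the relaxation being proposed.
 */
long find_raw_control(const char *body, size_t len, int allow_tab)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = (unsigned char)body[i];

        if (c == '\\' && i + 1 < len) {
            unsigned char next = (unsigned char)body[i + 1];

            /* Assumed JSON-5 line continuation: backslash + CR LF */
            if (next == '\r' && i + 2 < len && body[i + 2] == '\n') {
                i += 3;
                continue;
            }
            /* Ordinary escape, or backslash + single CR or LF */
            if (next >= 0x20 || next == '\n' || next == '\r') {
                i += 2;
                continue;
            }
            /* Backslash before some other raw control byte: fall
             * through so that byte is examined on the next pass. */
        }
        if (c < 0x20 && !(allow_tab && c == '\t'))
            return (long)i;
        i++;
    }
    return -1;
}
```

With allow_tab set to 0 a raw tab is rejected just like BEL; with it set to 1 only the tab gets through, and the escaped form \t is accepted either way.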
I agree that the error messages you got aren't particularly helpful. The character being complained about is the initial double-quote at the start of the string – the lexer couldn't match the whole string because of the illegal character between the quotes, so it back-tracked to the very start of it and complained about the quote itself. This also explains the "funny" result with the BEL character, which isn't currently legal anywhere in a .db file.
To give a more friendly error message here I can add error-matching patterns that recognize anything that looks like a string to a human but doesn't to the strict lexer, then tell the user what's wrong with their string. The first part should be relatively straightforward to code, although I'm not sure I want to write code that could analyze any kind of broken string and explain why it isn't a legal JSON string.
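As a rough idea of what the second part might look like for the simplest cases only, here is a sketch in C. It assumes the error pattern has already handed over the text of something that looks like a double-quoted string; the function name explain_string and the message wording are invented for this example, and it deliberately ignores single-quoted JSON-5 strings, invalid escape sequences, and anything following the closing quote.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical diagnostic helper, not existing code.
 * tok is a NUL-terminated candidate string token including its opening
 * quote.  Returns 0 and an empty message if the token looks legal under
 * the simplified rules above, otherwise returns 1 and writes a
 * human-readable explanation into msg.
 */
int explain_string(const char *tok, char *msg, size_t msglen)
{
    size_t i;

    if (tok[0] != '"') {
        snprintf(msg, msglen, "string must start with a double quote");
        return 1;
    }
    for (i = 1; tok[i] != '\0'; i++) {
        unsigned char c = (unsigned char)tok[i];

        if (c == '\\' && tok[i + 1] != '\0') {
            i++;            /* skip the escaped byte without checking it */
            continue;
        }
        if (c == '"') {     /* closing quote found */
            if (msglen > 0)
                msg[0] = '\0';
            return 0;
        }
        if (c < 0x20) {
            snprintf(msg, msglen,
                "unescaped control character U+%04X at offset %zu; "
                "write it as \\u%04x",
                (unsigned)c, i, (unsigned)c);
            return 1;
        }
    }
    snprintf(msg, msglen, "string has no closing double quote");
    return 1;
}
```

Reporting the offset of the offending byte, rather than the opening quote, is the main improvement over the current messages; covering every other way a string can be broken is the part I'm less keen on.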