Mars

Add Char type

Bug #870518 reported by Matt Giuca on 2011-10-08

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Mars	Triaged	Wishlist	Matt Giuca	Mars 1.1

Bug Description

I have decided (after all this) that Mars does need a separate Char type, primarily for two reasons:
- Despite Mars historically being specified as dealing with plain bytes, I am becoming increasingly annoyed by real languages not implementing Unicode properly. So I have decided to lead by example by adding proper Unicode strings to Mars.
- Changing the Int type to Num (bug #870515) -- a floating-point type -- makes the current idiom of treating strings as arrays of integers even sillier (an array of floating-point numbers?). Therefore, having a dedicated Char type will be useful.

There would still be no String type -- a string would be an Array(Char) and all string-related functions would be modified to deal with such a type.

Char would be defined as an integer in the range [0, 0x10ffff], with values representing Unicode code points. Char values would display as quoted character literals, and character/string literals would have type Char and Array(Char), respectively. Character and string literal syntax would be extended with \uxxxx and \Uxxxxxxxx notation for specifying code point values.
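For illustration only, here is a minimal Python sketch of the definition above. The Char class and make_string helper are hypothetical names, not part of Mars, and Python's own \uxxxx and \Uxxxxxxxx escapes stand in for the proposed Mars literal syntax.

```python
from dataclasses import dataclass

MAX_CODE_POINT = 0x10FFFF  # upper bound of the range given in the description


@dataclass(frozen=True, repr=False)
class Char:
    """Hypothetical stand-in for the proposed Char: one Unicode code point."""
    code: int

    def __post_init__(self):
        # Reject anything outside [0, 0x10ffff].
        if not isinstance(self.code, int) or not 0 <= self.code <= MAX_CODE_POINT:
            raise ValueError(f"code point out of range: {self.code!r}")

    def __repr__(self):
        # Display as a quoted character literal, as the description proposes.
        return repr(chr(self.code))


def make_string(text):
    """Model a string as a plain list of Char values, i.e. Array(Char)."""
    return [Char(ord(c)) for c in text]


# Python's escapes play the role of the proposed \uxxxx and \Uxxxxxxxx forms.
short_form = make_string("\u00e9")      # four hex digits
long_form = make_string("\U0001F600")   # eight hex digits
```

Note that the eight-digit form yields a single Char here, since a Char is a code point rather than a UTF-16 code unit.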

We would supply new built-in functions, chr and ord, for conversion between Num and Char. We would also need to be concerned with encodings when reading from and writing to a file, and might need to specify a way to read and write raw bytes as well.
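A rough Python sketch of how such conversions might behave if Num is floating point. The names mars_chr and mars_ord are hypothetical, and rejecting non-integral or out-of-range numbers is an assumption; this report does not say how those cases would be handled.

```python
MAX_CODE_POINT = 0x10FFFF


def mars_chr(num):
    """Hypothetical chr: Num (modelled as a float) to a single character."""
    value = float(num)
    # Assumption: only whole numbers in [0, 0x10ffff] are valid code points.
    # is_integer() is False for NaN and infinities, so those are rejected too.
    if not value.is_integer() or not 0 <= value <= MAX_CODE_POINT:
        raise ValueError(f"not a valid code point: {num!r}")
    return chr(int(value))


def mars_ord(char):
    """Hypothetical ord: a single character to a Num (modelled as a float)."""
    return float(ord(char))
```

Every code point fits exactly in a double-precision float, so the round trip mars_chr(mars_ord(c)) returns c for any single character.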

Related branches

lp://staging/~mgiuca/mars/byte

Matt Giuca (mgiuca) on 2011-10-08

description: updated

Matt Giuca (mgiuca) wrote on 2012-01-05:

On further reflection, this is too big a feature to implement. The biggest problem is that I/O would need to be aware of which encoding the stream is using (and if it forced you to use UTF-8, that would just make things worse).
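To make the encoding problem concrete, here is a small Python illustration (not Mars code): the same two bytes yield different code points depending on which encoding the reader assumes, so a Char-based I/O layer cannot decode a stream without that knowledge.

```python
raw = bytes([0xC3, 0xA9])  # two bytes as they might arrive from a stream

as_utf8 = raw.decode("utf-8")      # decoded as one character
as_latin1 = raw.decode("latin-1")  # decoded as two characters

code_points_utf8 = [ord(c) for c in as_utf8]      # [0xE9]
code_points_latin1 = [ord(c) for c in as_latin1]  # [0xC3, 0xA9]
```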

So, rather than adding a Char type, I will settle for adding a Byte type, defined as an unsigned integer in the range [0, 0xff]. There will be no arithmetic on bytes. Byte literals will be character literals, so they aren't like Java bytes (small integers) -- they actually represent characters. The low 128 values represent ASCII characters, while the high 128 values represent byte values in some arbitrary encoding -- generally UTF-8.
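A minimal Python sketch of the Byte idea, under stated assumptions. The Byte class, its is_ascii method and the to_bytes helper are hypothetical names; showing high bytes as hex escapes and supporting equality are guesses at reasonable behaviour, not something this report specifies.

```python
class Byte:
    """Hypothetical stand-in for the proposed Byte: a value in [0, 0xff]."""
    __slots__ = ("value",)

    def __init__(self, value):
        if not isinstance(value, int) or not 0 <= value <= 0xFF:
            raise ValueError(f"byte out of range: {value!r}")
        self.value = value

    # No __add__, __sub__, etc. are defined, so arithmetic on Byte
    # values raises TypeError, mirroring "no arithmetic on bytes".

    def __eq__(self, other):
        return isinstance(other, Byte) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

    def is_ascii(self):
        # The low 128 values represent ASCII characters.
        return self.value < 0x80

    def __repr__(self):
        # ASCII bytes display as character literals. High bytes have no
        # fixed meaning without an encoding, so show them as hex escapes.
        if self.is_ascii():
            return repr(chr(self.value))
        return f"'\\x{self.value:02x}'"


def to_bytes(text):
    """Encode text as UTF-8 and wrap each byte (UTF-8 is assumed here)."""
    return [Byte(b) for b in text.encode("utf-8")]
```

Under this model an ASCII string maps to one Byte per character, while a non-ASCII character such as U+00E9 becomes two Byte values (0xc3, 0xa9), and the program has to know they belong together.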

Matt Giuca (mgiuca) on 2014-07-05

Changed in mars:
milestone:	1.0 → 1.1
