We believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we’ve built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.
The library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.
When you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.
To join us in bringing numerical computing to the web, get started by checking us out on GitHub, and please consider financially supporting stdlib. We greatly appreciate your continued support!
# utf16ToUTF8Array

[![NPM version][npm-image]][npm-url] [![Build Status][test-image]][test-url] [![Coverage Status][coverage-image]][coverage-url]
> Convert a [UTF-16][utf-16] encoded string to an array of integers using [UTF-8][utf-8] encoding.
## Installation

```bash
npm install @stdlib/string-utf16-to-utf8-array
```
Alternatively,

-   To load the package in a website via a `script` tag without installation and bundlers, use the [ES Module][es-module] available on the [`esm`][esm-url] branch (see [README][esm-readme]).
-   If you are using Deno, visit the [`deno`][deno-url] branch (see [README][deno-readme] for usage instructions).
-   For use in Observable, or in browser/node environments, use the [Universal Module Definition (UMD)][umd] build available on the [`umd`][umd-url] branch (see [README][umd-readme]).

## Usage

```javascript
var utf16ToUTF8Array = require( '@stdlib/string-utf16-to-utf8-array' );
```

#### utf16ToUTF8Array( str )

Converts a [UTF-16][utf-16] encoded string to an `array` of integers using [UTF-8][utf-8] encoding.

```javascript
var out = utf16ToUTF8Array( '☃' );
// returns [ 226, 152, 131 ]
```
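Code points outside the Basic Multilingual Plane, which UTF-16 stores as surrogate pairs, map to four-byte UTF-8 sequences. As an illustrative sketch, the expected output below is derived by hand from the UTF-8 byte layout described in the notes, not copied from the package documentation:

```javascript
var out = utf16ToUTF8Array( '𐐷' );
// expected: [ 240, 144, 144, 183 ]
```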
## Notes

-   [UTF-16][utf-16] encoding uses one 16-bit unit for non-surrogate code points (`U+0000` to `U+D7FF` and `U+E000` to `U+FFFF`).
-   [UTF-16][utf-16] encoding uses two 16-bit units (a surrogate pair) for code points `U+10000` to `U+10FFFF` and encodes `U+10000`-`U+10FFFF` by subtracting `0x10000` from the code point, expressing the result as a 20-bit binary value, and splitting the 20 bits of `0x0`-`0xFFFFF` into upper and lower 10-bits. The respective 10-bits are stored in two 16-bit words: a high and a low surrogate (see the sketch after this list).
-   [UTF-8][utf-8] encodes each code point in one to four bytes, depending on the number of significant bits in the code point's numerical value, using the following byte sequences:

    ```text
    0x00000000 - 0x0000007F:
        0xxxxxxx

    0x00000080 - 0x000007FF:
        110xxxxx 10xxxxxx

    0x00000800 - 0x0000FFFF:
        1110xxxx 10xxxxxx 10xxxxxx

    0x00010000 - 0x001FFFFF:
        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    ```

    where an `x` represents a code point bit. Only the shortest possible multi-byte sequence which can represent a code point is used.
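To make the two steps concrete, the following standalone sketch (illustrative only; not the package's internal implementation, and all variable names are invented here) decodes a surrogate pair into a code point and then applies the four-byte layout shown above:

```javascript
// '𐐷' (U+10437) is stored in UTF-16 as the surrogate pair 0xD801, 0xDC37:
var str = '𐐷';

var hi = str.charCodeAt( 0 ); // => 0xD801 (high surrogate)
var lo = str.charCodeAt( 1 ); // => 0xDC37 (low surrogate)

// Recover the code point: take 10 bits from each surrogate and add back 0x10000:
var pt = ( ( hi - 0xD800 ) << 10 ) + ( lo - 0xDC00 ) + 0x10000;
// pt => 0x10437

// Apply the four-byte layout `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`:
var bytes = [
    0xF0 | ( pt >> 18 ),
    0x80 | ( ( pt >> 12 ) & 0x3F ),
    0x80 | ( ( pt >> 6 ) & 0x3F ),
    0x80 | ( pt & 0x3F )
];
// bytes => [ 240, 144, 144, 183 ]
```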
## Examples

```javascript
var utf16ToUTF8Array = require( '@stdlib/string-utf16-to-utf8-array' );

var values;
var out;
var i;
values = [
    'Ladies + Gentlemen',
    'An encoded string!',
    'Dogs, Cats & Mice',
    '☃',
    'æ',
    '𐐷'
];

for ( i = 0; i < values.length; i++ ) {
    out = utf16ToUTF8Array( values[ i ] );
    console.log( '%s: %s', values[ i ], out.join( ',' ) );
}
```
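For the non-ASCII inputs, the script would be expected to print the following lines (byte values derived by hand from the UTF-8 byte layout in the notes above):

```text
☃: 226,152,131
æ: 195,166
𐐷: 240,144,144,183
```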