🔨 Quick ‘n’ Dirty Tutorial (In Progress)

Warning

🔨 This isn’t finished yet! Check back at the next major or minor version update.

cuneicode is a C library whose headers work in both C and C++. Its implementation is currently written in C++. To use it, either:

  • use one of the many CMake methods (add_subdirectory, FetchContent, or similar), or

  • add all the sources directly to your project and build them.

Warning

Adding sources directly to your project is not guaranteed to work in future major revisions, because certain build steps may start generating code.

Once the library is appropriately included, you can start using cuneicode.

Simple Conversions

To convert from UTF-16 to UTF-8, use the library's free functions whose names carry the matching c16 and c8 markers:

#include <ztd/cuneicode.h>

#include <ztd/idk/size.h>

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {

	const ztd_char16_t utf16_text[] = u"🥺🙏";
	ztd_char8_t utf8_text[9]        = { 0 };

	// Now, actually output it
	const ztd_char16_t* p_input = utf16_text;
	ztd_char8_t* p_output       = utf8_text;
	// ztd_c_array_size INCLUDES the null terminator in the size!
	size_t input_size   = ztd_c_array_size(utf16_text);
	size_t output_size  = ztd_c_array_size(utf8_text);
	cnc_mcstate_t state = { 0 };
	// call the function with the right parameters!
	cnc_mcerror err = cnc_c16snrtoc8sn( // formatting
	     &output_size, &p_output,       // output first
	     &input_size, &p_input,         // input second
	     &state);                       // state parameter
	if (err != CNC_MCERROR_OK) {
		const char* err_str = cnc_mcerror_to_str(err);
		printf(
		     "An (unexpected) error occurred and the conversion could not "
		     "happen! Error string: %s\n",
		     err_str);
		return 1;
	}

	// requires a capable terminal / output, but will be
	// UTF-8 text!
	printf("Converted UTF-8 text: %s\n", (const char*)utf8_text);

	return 0;
}
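The 9-element output buffer in the listing is exactly large enough for this input: each of the two characters is a UTF-16 surrogate pair that becomes four UTF-8 code units, and the null terminator adds one more. As a rough sketch of that arithmetic, the helper below counts the UTF-8 code units needed for a UTF-16 array. The name demo_utf8_length is made up for illustration and is not part of cuneicode; the helper also assumes well-formed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper, NOT part of cuneicode: counts the UTF-8 code units
   needed to hold `size` UTF-16 code units. Assumes well-formed UTF-16. */
static size_t demo_utf8_length(const uint16_t* text, size_t size) {
	size_t bytes = 0;
	for (size_t i = 0; i < size; ++i) {
		uint16_t unit = text[i];
		if (unit < 0x80) {
			bytes += 1; /* ASCII, including the null terminator */
		}
		else if (unit < 0x800) {
			bytes += 2;
		}
		else if (unit >= 0xD800 && unit <= 0xDBFF) {
			bytes += 4; /* surrogate pair: one 4-byte sequence */
			++i;        /* skip the trailing surrogate */
		}
		else {
			bytes += 3; /* rest of the Basic Multilingual Plane */
		}
	}
	return bytes;
}
```

When the input is not known ahead of time, three UTF-8 code units per UTF-16 code unit is always a sufficient upper bound, since a surrogate pair uses two input code units for four output code units.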

The conversion example uses raw printf to print the UTF-8 text. It may not appear correctly on a terminal whose encoding is not UTF-8, which may be the case for older Microsoft terminals, some Linux kernel configurations, and deliberately misconfigured Mac OSX terminals. The function call also provides some other useful information:

  • the amount of data read (initial_input_size - input_size, where initial_input_size is the value input_size held before the call);

  • the amount of data written out (initial_output_size - output_size, where initial_output_size is the value output_size held before the call);

  • a pointer to any extra input left after the operation (p_input);

  • and a pointer to any remaining output space that was not written to after the operation (p_output).
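The bookkeeping in the list above can be sketched without the library. The code below uses a hypothetical stand-in converter, demo_c16_to_c8, which is not a cuneicode function. It only imitates the calling convention shown earlier (sizes decremented, pointers advanced), and its return values are invented for this sketch. The part to look at is demo_convert, which saves the initial sizes before the call and subtracts afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT a cuneicode function. Converts UTF-16 to UTF-8,
   decrementing each size and advancing each pointer as it goes. Returns 0 on
   success, -1 on ill-formed input, -3 when the output is too small (values
   made up for this sketch). */
static int demo_c16_to_c8(size_t* out_size, unsigned char** out,
     size_t* in_size, const uint16_t** in) {
	static const unsigned char lead[5] = { 0, 0, 0xC0, 0xE0, 0xF0 };
	while (*in_size > 0) {
		uint32_t cp  = (*in)[0];
		size_t units = 1;
		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* a leading surrogate must be followed by a trailing one */
			if (*in_size < 2 || (*in)[1] < 0xDC00 || (*in)[1] > 0xDFFF)
				return -1;
			cp    = 0x10000 + ((cp - 0xD800) << 10) + ((*in)[1] - 0xDC00);
			units = 2;
		}
		else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			return -1; /* lone trailing surrogate */
		}
		size_t bytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
		if (*out_size < bytes)
			return -3;
		unsigned char* o = *out;
		if (bytes == 1) {
			o[0] = (unsigned char)cp;
		}
		else {
			/* lead byte: marker bits plus the top bits of the code point */
			o[0] = (unsigned char)(lead[bytes] | (cp >> (6 * (bytes - 1))));
			/* continuation bytes: 10xxxxxx, six bits each */
			for (size_t i = 1; i < bytes; ++i)
				o[i] = (unsigned char)(0x80 | ((cp >> (6 * (bytes - 1 - i))) & 0x3F));
		}
		*out += bytes;
		*out_size -= bytes;
		*in += units;
		*in_size -= units;
	}
	return 0;
}

typedef struct demo_counts {
	int err;           /* result of the stand-in converter */
	size_t read;       /* UTF-16 code units consumed */
	size_t written;    /* UTF-8 code units produced */
	size_t input_left; /* UTF-16 code units not yet converted */
} demo_counts;

static demo_counts demo_convert(const uint16_t* input, size_t input_size,
     unsigned char* output, size_t output_size) {
	/* save the starting sizes: the call overwrites the live variables */
	const size_t initial_input_size  = input_size;
	const size_t initial_output_size = output_size;
	const uint16_t* p_input          = input;
	unsigned char* p_output          = output;
	demo_counts counts;
	counts.err = demo_c16_to_c8(&output_size, &p_output, &input_size, &p_input);
	counts.read       = initial_input_size - input_size;
	counts.written    = initial_output_size - output_size;
	counts.input_left = input_size;
	return counts;
}
```

Under these assumptions, converting the two surrogate pairs from the earlier example into an 8-element buffer reads 4 code units and writes 8. With a 6-element buffer the stand-in stops after the first character, so 2 code units are read, 4 are written, and 2 remain at p_input for a later call with more output space.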

One can convert from other forms of UTF-8/16/32 encodings, and from the execution and wide execution encodings (the encodings used by default for const char[] and const wchar_t[] strings), using the correspondingly prefixed functions.