Rosetta Code

UTF-8 encode and decode

Encode fixed code points into UTF-8 byte arrays and decode them back.

Medium

Source rosettacode/popular/utf_8_encode_and_decode.vibe

# title: UTF-8 encode and decode
# source: https://rosettacode.org/wiki/UTF-8_encode_and_decode
# category: Rosetta Code
# difficulty: Medium
# summary: Encode fixed code points into UTF-8 byte arrays and decode them back.
# tags: popular, strings, encoding, unicode
# vibe: 0.2

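# Encode one code point as an array of UTF-8 byte values.
# Assumes code_point is a valid Unicode scalar value (0..1114111,
# excluding surrogates 55296..57343); no range check is performed.
# Dividing by 64, 4096, or 262144 shifts right by 6, 12, or 18 bits;
# "% 64" keeps the low 6 bits carried by each continuation byte.
def utf8_encode(code_point)
  if code_point < 128
    # 1 byte: 0xxxxxxx
    [code_point]
  elsif code_point < 2048
    # 2 bytes: 110xxxxx 10xxxxxx (lead marker 192, continuation marker 128)
    [192 + (code_point / 64), 128 + (code_point % 64)]
  elsif code_point < 65536
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (lead marker 224)
    [
      224 + (code_point / 4096),
      128 + ((code_point / 64) % 64),
      128 + (code_point % 64)
    ]
  else
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (lead marker 240)
    [
      240 + (code_point / 262144),
      128 + ((code_point / 4096) % 64),
      128 + ((code_point / 64) % 64),
      128 + (code_point % 64)
    ]
  end
end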

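# Decode a single UTF-8 sequence back into its code point.
# Assumes bytes is a well-formed sequence such as utf8_encode returns.
# The sequence length is taken from the array length, not from the lead
# byte, and no validation is performed.
def utf8_decode(bytes)
  if bytes.length == 1
    # 1 byte: the byte is the code point.
    bytes[0]
  elsif bytes.length == 2
    # 2 bytes: strip the 192 lead marker and the 128 continuation marker.
    ((bytes[0] - 192) * 64) + (bytes[1] - 128)
  elsif bytes.length == 3
    # 3 bytes: strip the 224 lead marker; each later byte adds 6 bits.
    ((bytes[0] - 224) * 4096) + ((bytes[1] - 128) * 64) + (bytes[2] - 128)
  else
    # Any other length is treated as a 4-byte sequence (lead marker 240).
    ((bytes[0] - 240) * 262144) + ((bytes[1] - 128) * 4096) + ((bytes[2] - 128) * 64) + (bytes[3] - 128)
  end
end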

def run
  code_points = [36, 162, 8364, 128578]
  rows = []
  index = 0

  while index < code_points.length
    code_point = code_points[index]
    bytes = utf8_encode(code_point)
    rows = rows.push({
      code_point: code_point,
      bytes: bytes,
      decoded: utf8_decode(bytes)
    })
    index = index + 1
  end

  rows
end
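
The decoder in this listing handles one sequence at a time and takes the sequence length from the array length. As a possible extension, the sketch below splits a flat byte array into per-character sequences by inspecting each lead byte, so that each piece could then be handed to utf8_decode. The names utf8_sequence_length and utf8_split are hypothetical and not part of the original example. The code uses only constructs that already appear in the listing and assumes well-formed input.

```ruby
# Hypothetical helper: number of bytes in the sequence that starts with
# lead_byte, or 0 if the byte is a continuation byte (128..191).
def utf8_sequence_length(lead_byte)
  if lead_byte < 128
    1
  elsif lead_byte < 192
    0
  elsif lead_byte < 224
    2
  elsif lead_byte < 240
    3
  else
    4
  end
end

# Hypothetical helper: split a flat byte array into one array per character.
# Assumes well-formed input, such as concatenated utf8_encode results.
def utf8_split(bytes)
  sequences = []
  index = 0

  while index < bytes.length
    count = utf8_sequence_length(bytes[index])

    # A stray continuation byte becomes its own one-byte chunk so that
    # the loop always advances.
    if count == 0
      count = 1
    end

    sequence = []
    offset = 0

    while offset < count
      sequence = sequence.push(bytes[index + offset])
      offset = offset + 1
    end

    sequences = sequences.push(sequence)
    index = index + count
  end

  sequences
end
```

For instance, utf8_split([36, 194, 162, 226, 130, 172]) yields [[36], [194, 162], [226, 130, 172]], the sequences for code points 36, 162, and 8364 from run. Truncated or otherwise malformed input is not detected.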

