Charset Encoding
smpp-core includes SmppCharset, a utility for encoding and decoding SMS message text in GSM 7-bit, UCS-2, and Latin-1 formats.
GSM 7-bit Encoding
The GSM 7-bit default alphabet is the most compact encoding for SMS: a single message holds up to 160 septets, which is 160 characters when none of them come from the extension table.
import io.smppgateway.smpp.charset.SmppCharset;
// Encode to GSM 7-bit
String message = "Hello World!";
byte[] encoded = SmppCharset.encodeGsm7(message);
// Decode from GSM 7-bit
String decoded = SmppCharset.decodeGsm7(encoded);
Supported Characters
GSM 7-bit supports:
- Basic Latin letters (A-Z, a-z)
- Digits (0-9)
- Common punctuation
- Some special characters (@, £, $, ¥, etc.)
- Extended characters via escape sequence (€, [, ], {, }, etc.)
Check Encodability
String message = "Hello €100";
byte[] encoded; // declared outside the branches so it stays usable afterwards
if (SmppCharset.canEncodeGsm7(message)) {
    // Use GSM 7-bit (160 septets per SMS)
    encoded = SmppCharset.encodeGsm7(message);
} else {
    // Fall back to UCS-2 (70 chars per SMS)
    encoded = SmppCharset.encodeUcs2(message);
}
Count Septets
Extended characters (like €) count as 2 septets, because each one is sent as an escape septet followed by the character's own septet:
int septets = SmppCharset.countGsm7Septets("Hello €100");
// Returns 11 (5 + 1 + 2 + 3 = 11, where € counts as 2)
// Check if message fits in single SMS (160 septets)
if (septets <= 160) {
    // Single SMS
} else {
    // Needs segmentation
}
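The counting rule can be sketched without the library. The block below is a minimal, self-contained illustration that assumes the standard GSM 03.38 basic and extension tables. The class Gsm7SeptetSketch and its countSeptets method are hypothetical names for this example, not part of smpp-core, and SmppCharset.countGsm7Septets may handle edge cases differently.

```java
/** Illustrative septet counter for the GSM 03.38 default alphabet (not smpp-core code). */
final class Gsm7SeptetSketch {

    // Basic character set: one septet each. Non-ASCII entries are written as
    // Unicode escapes so the source compiles regardless of file encoding.
    private static final String BASIC =
        "@\u00A3$\u00A5\u00E8\u00E9\u00F9\u00EC\u00F2\u00C7\n\u00D8\u00F8\r\u00C5\u00E5"
      + "\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039E\u00C6\u00E6\u00DF\u00C9"
      + " !\"#\u00A4%&'()*+,-./0123456789:;<=>?"
      + "\u00A1ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00C4\u00D6\u00D1\u00DC\u00A7"
      + "\u00BFabcdefghijklmnopqrstuvwxyz\u00E4\u00F6\u00F1\u00FC\u00E0";

    // Extension table: escape septet (0x1B) plus one more septet, so two in total.
    // The last entry is the euro sign.
    private static final String EXTENDED = "\f^{}\\[~]|\u20AC";

    /** Returns the septet count, or -1 if some character is in neither table. */
    static int countSeptets(String text) {
        int septets = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (BASIC.indexOf(c) >= 0) {
                septets += 1;
            } else if (EXTENDED.indexOf(c) >= 0) {
                septets += 2;
            } else {
                return -1; // not encodable in GSM 7-bit
            }
        }
        return septets;
    }

    public static void main(String[] args) {
        System.out.println(countSeptets("Hello \u20AC100"));      // 11
        System.out.println(countSeptets("[ok]"));                 // 6
        System.out.println(countSeptets("Hello \u4E16\u754C!"));  // -1
    }
}
```

In this sketch a result of -1 plays the role that a false return from canEncodeGsm7 plays in the library.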
UCS-2 Encoding
UCS-2 (encoded as UTF-16BE) covers the Unicode Basic Multilingual Plane (BMP) at two bytes per character, which limits a single message to 70 characters.
// Encode to UCS-2
String message = "Hello 世界!";
byte[] encoded = SmppCharset.encodeUcs2(message);
// Decode from UCS-2
String decoded = SmppCharset.decodeUcs2(encoded);
Use UCS-2 for:
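- Non-Latin scripts (Chinese, Arabic, Hindi, etc.)
- Emojis and special symbols (characters outside the BMP, including most emojis, occupy a surrogate pair in UTF-16BE and count as two of the 70 units)
- Messages that can’t be encoded in GSM 7-bit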
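The 70-character limit follows from the 140-byte payload divided by two bytes per 16-bit unit. The sketch below shows that arithmetic using only the JDK's UTF-16BE charset. It assumes SmppCharset.encodeUcs2 produces the same bytes as UTF-16BE, which this page implies but which is not verified here; Ucs2LengthSketch is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative length check for UCS-2 / UTF-16BE payloads (not smpp-core code). */
final class Ucs2LengthSketch {

    /** Number of 16-bit units the text occupies when encoded as UTF-16BE (no BOM). */
    static int units(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length / 2;
    }

    /** A single SMS carries 140 bytes, that is 70 sixteen-bit units. */
    static boolean fitsSingleSms(String text) {
        return units(text) <= 70;
    }

    public static void main(String[] args) {
        // "Hello", a space, two CJK characters and "!" are nine BMP characters.
        System.out.println(units("Hello \u4E16\u754C!")); // 9
        // A single emoji outside the BMP is stored as a surrogate pair.
        System.out.println(units("\uD83D\uDE00"));        // 2
    }
}
```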
Latin-1 Encoding
Latin-1 (ISO-8859-1) uses one byte per character and covers Western European text:
byte[] encoded = SmppCharset.encodeLatin1(message);
String decoded = SmppCharset.decodeLatin1(encoded);
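Before choosing Latin-1 it can help to confirm that every character is representable, since the euro sign, for example, is not part of ISO-8859-1. The sketch below does this with the JDK's own charset support. Latin1CheckSketch is a hypothetical name, and how SmppCharset.encodeLatin1 treats unmappable characters is not stated on this page, so the replacement behaviour shown is that of the JDK only.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative Latin-1 encodability check using the JDK (not smpp-core code). */
final class Latin1CheckSketch {

    /** True if every character of the text has an ISO-8859-1 byte. */
    static boolean canEncodeLatin1(String text) {
        // A fresh encoder per call keeps the method thread-safe.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(canEncodeLatin1("caf\u00E9")); // true
        System.out.println(canEncodeLatin1("\u20AC5"));   // false

        // String.getBytes silently substitutes '?' for unmappable characters.
        byte[] lossy = "\u20AC5".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(lossy, StandardCharsets.ISO_8859_1)); // ?5
    }
}
```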
Data Coding Values
Set the DataCoding value that matches how the short message bytes were encoded; the receiver relies on it to decode the text:
import io.smppgateway.smpp.types.DataCoding;
// GSM 7-bit default
SubmitSm gsm7Pdu = SubmitSm.builder()
    .shortMessage(SmppCharset.encodeGsm7(message))
    .dataCoding(DataCoding.DEFAULT) // 0x00
    .build();
// UCS-2 (separate variable name so both snippets compile in one scope)
SubmitSm ucs2Pdu = SubmitSm.builder()
    .shortMessage(SmppCharset.encodeUcs2(message))
    .dataCoding(DataCoding.UCS2) // 0x08
    .build();
| DataCoding | Value | Description |
|---|---|---|
| DEFAULT | 0x00 | GSM 7-bit default alphabet |
| | 0x01 | IA5 (CCITT T.50) |
| | 0x03 | Latin-1 (ISO-8859-1) |
| UCS2 | 0x08 | UCS-2 (UTF-16BE) |
Example: Auto-detect Encoding
public SubmitSm createSubmitSm(String message, Address source, Address dest) {
    byte[] encoded;
    DataCoding dataCoding;
    if (SmppCharset.canEncodeGsm7(message)) {
        encoded = SmppCharset.encodeGsm7(message);
        dataCoding = DataCoding.DEFAULT;
    } else {
        encoded = SmppCharset.encodeUcs2(message);
        dataCoding = DataCoding.UCS2;
    }
    return SubmitSm.builder()
        .sourceAddress(source)
        .destAddress(dest)
        .shortMessage(encoded)
        .dataCoding(dataCoding)
        .build();
}
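Once the encoding is chosen, the number of SMS parts a message needs can be estimated from its length. The sketch below assumes concatenation with a 6-octet user data header, which leaves 153 septets per part for GSM 7-bit and 67 characters per part for UCS-2; other segmentation mechanisms may give different limits. SegmentCountSketch and its methods are hypothetical names, and the page does not say whether smpp-core provides an equivalent helper.

```java
/** Illustrative estimate of SMS part counts (not smpp-core code). */
final class SegmentCountSketch {

    /**
     * Parts needed for a message of the given length.
     * Assumes a 6-octet concatenation header on every part of a multipart message.
     */
    static int parts(int units, int singleLimit, int multipartLimit) {
        if (units <= singleLimit) {
            return 1; // fits in one SMS, no header needed
        }
        // Ceiling division over the reduced per-part capacity.
        return (units + multipartLimit - 1) / multipartLimit;
    }

    /** GSM 7-bit: 160 septets alone, 153 per part when concatenated. */
    static int gsm7Parts(int septets) {
        return parts(septets, 160, 153);
    }

    /** UCS-2: 70 sixteen-bit units alone, 67 per part when concatenated. */
    static int ucs2Parts(int units) {
        return parts(units, 70, 67);
    }

    public static void main(String[] args) {
        System.out.println(gsm7Parts(160)); // 1
        System.out.println(gsm7Parts(161)); // 2
        System.out.println(ucs2Parts(71));  // 2
    }
}
```

The estimate is a lower bound for GSM 7-bit text that contains extended characters, because a two-septet escape sequence should not be split across two parts and a real segmenter may need to move it to the next part.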