Split, tokenize, and iterate a string

In this tutorial, we’ll discuss four functions: str_split() and mb_str_split(), which split a string into an array of fixed-length chunks; strtok(), which splits a string into smaller strings (tokens); and chunk_split(), which breaks a string into smaller chunks by inserting a separator at regular intervals.

  1. str_split()
  2. mb_str_split()
  3. strtok()
  4. chunk_split()

Convert a string to an array with str_split()

<?php
 //Syntax
 str_split(string $string, int $length = 1): array

The str_split() function takes two parameters:

  1. $string: the input string.
  2. $length (optional): maximum length of the chunk, default is 1.

This function splits a string into an array of chunks of the specified length, measured in bytes. The last chunk may be shorter if the string does not divide evenly.

<?php
 $str = 'abcdef';
 $ar1 = str_split($str);
 # ['a','b','c','d','e','f']

 $ar2 = str_split($str, 2);
 # ['ab','cd','ef']
 
 print_r($ar1);
 /*Array (
    [0] => a
    [1] => b
    [2] => c
    [3] => d
    [4] => e
    [5] => f )*/
 print_r($ar2);
 /*Array(
    [0] => ab
    [1] => cd
    [2] => ef)*/
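When the string length is not a multiple of $length, the last element is simply shorter. The sketch below shows that case and, since str_split() returns a plain array, one way to iterate over the characters of a single-byte string with foreach. The sample strings and variable names are illustrative assumptions, not taken from the examples above.

```php
<?php
// Length 3 does not divide the 7-byte string evenly,
// so the final chunk holds the single leftover byte.
$chunks = str_split('abcdefg', 3);
print_r($chunks); // elements: abc, def, g

// Iterate over a single-byte string one character at a time.
$seen = [];
foreach (str_split('abc') as $index => $char) {
    $seen[] = "$index:$char";
}
echo implode(' ', $seen), PHP_EOL; // 0:a 1:b 2:c
```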

Note: use mb_str_split() to deal with a multi-byte string.

mb_str_split()

<?php
//Syntax
mb_str_split(string $string, int $length = 1, ?string $encoding = null): array

This function takes three parameters:

  1. $string: the input string.
  2. $length (optional): maximum length of the chunk in characters (not bytes), default is 1.
  3. $encoding (optional): the character encoding. If it is omitted or null, the internal character encoding is used.

The str_split() function works byte by byte, so it handles single-byte characters correctly but breaks multi-byte characters apart. See the following example:

<?php
 $string = '€£Ͻڻ➿';
 $array = str_split($string);
 print_r($array);
 #Prints: [0] => � [1] => � [2] => � ...

The following example uses the mb_str_split() function that can handle multi-byte characters:

<?php
 $string = '€£Ͻڻ➿';
 $array  = mb_str_split($string);
 print_r($array);
/* Prints:
[0] => €
[1] => £
[2] => Ͻ
[3] => ڻ
[4] => ➿ */

 $array  = mb_str_split($string, 2, 'UTF-8');
 print_r($array);
/* Prints:
[0] => €£
[1] => Ͻڻ
[2] => ➿ */
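To see why str_split() garbles such strings, it can help to compare element counts. This is a minimal sketch assuming UTF-8 input and the mbstring extension (mb_str_split() needs PHP 7.4 or later). The characters are written as \u{...} escapes so the result does not depend on how the source file is saved.

```php
<?php
// "€" (U+20AC) is 3 bytes in UTF-8 and "£" (U+00A3) is 2 bytes.
$string = "\u{20AC}\u{00A3}";

$bytes = str_split($string);                 // one element per byte
$chars = mb_str_split($string, 1, 'UTF-8');  // one element per character

echo count($bytes), PHP_EOL; // 5
echo count($chars), PHP_EOL; // 2
echo strlen($string), ' bytes, ', mb_strlen($string, 'UTF-8'), ' characters', PHP_EOL;
```

Each of the five elements returned by str_split() is a fragment of a character, which is why printing them shows replacement symbols.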

Tokenize string with strtok()

<?php
 //Syntax
 strtok(string $string, string $token): string|false

This function takes two parameters:

  1. $string: the input string.
  2. $token: the delimiter characters used to split the string. Each character in $token acts as a separate delimiter.

The first call, strtok($string, $token), returns the first token of the string. Subsequent calls require only the $token parameter, so strtok($token) returns the next token. The function returns false when there are no more tokens. See the following code:

<?php
 $string = 'a.b,c.d';
 $token = '.,';
 // initialized
 echo strtok($string, $token); # a
 // jump to next token
 echo strtok($token); # b
 // jump to next token
 echo strtok($token); # c
 // jump to next token
 echo strtok($token); # d
 // jump to next token
 echo strtok($token); # false, prints nothing
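One behavior worth knowing: strtok() skips empty tokens, so consecutive delimiters do not produce empty strings, whereas explode() keeps them. A small sketch with a made-up input string:

```php
<?php
$delimiters = '.,';
$tokens = [];

// Consecutive delimiters ("..", ",,") are treated as a single gap.
for ($tok = strtok('a..b,,c', $delimiters); $tok !== false; $tok = strtok($delimiters)) {
    $tokens[] = $tok;
}
print_r($tokens); // elements: a, b, c

// explode() splits on one exact delimiter string and keeps the empty piece.
$pieces = explode('.', 'a..b');
print_r($pieces); // elements: a, (empty string), b
```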

You can use a while loop to make the subsequent calls until the function reaches the end of the string. See the following example:

<?php
 $string = 'a.b.c.d';
 $token = '.';
 // initialized
 $tok = strtok($string, $token);
 while ($tok !== false) {
  echo $tok;

  // jump to next token
  $tok = strtok($token);
 }
//Prints: abcd
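The strict comparison (!== false) in the loop matters, because a token such as "0" is falsy in PHP and a loose check like while ($tok) would stop early. The sketch below wraps the loop in a helper. Note that tokenize() is a hypothetical name used only for illustration, not a built-in function.

```php
<?php
/**
 * Hypothetical helper: collect every token of $string into an array.
 */
function tokenize(string $string, string $delimiters): array
{
    $tokens = [];
    $tok = strtok($string, $delimiters);
    while ($tok !== false) { // strict check, so the token "0" is kept
        $tokens[] = $tok;
        $tok = strtok($delimiters);
    }
    return $tokens;
}

print_r(tokenize('1 0 2', ' ')); // elements: 1, 0, 2
```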

Split a string into smaller chunks using chunk_split()

<?php
 //Syntax
 chunk_split(string $string, int $length = 76, string $separator = "\r\n"): string

This function has three parameters:

  1. $string: The string to be chunked.
  2. $length: The chunk length, default value is 76.
  3. $separator: The line ending sequence, default value is \r\n.

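The chunk_split() function inserts $separator after every $length characters, including after the last chunk. By default, it produces chunks of 76 characters, each followed by a CRLF (\r\n). It returns the result as a new string, leaving the original string untouched.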
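A short input makes the behavior easier to see, in particular that the separator is also appended after the last chunk. This sketch uses a made-up string and a '-' separator; rtrim() is one way to drop the trailing separator when it is not wanted.

```php
<?php
$original = 'abcdefgh';
$chunked  = chunk_split($original, 3, '-');

echo $chunked, PHP_EOL;             // abc-def-gh-
echo rtrim($chunked, '-'), PHP_EOL; // abc-def-gh
echo $original, PHP_EOL;            // abcdefgh (the input is not modified)
```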

Example: Split a string into smaller chunks

<?php
 $v = 'By default, the chunk_split() function returns a chunk length of 76 with a trailing CRLF';
 echo chunk_split($v, 15, '<br>');

[Figure: output of chunk_split($v, 15, '<br>')]

The chunk_split() function is usually used along with the base64_encode() function to comply with RFC 2045, which limits Base64-encoded lines to 76 characters, when sending an email attachment. See the following example:

<?php
 $text = 'Welcome to BrainBell.com...'.
         'Welcome to BrainBell.com...'.
         'Welcome to BrainBell.com...'.
         'Welcome to BrainBell.com...'.
         'Welcome to BrainBell.com...'.
         'Welcome to BrainBell.com...';
 $encoded = base64_encode($text);
 $chunked = chunk_split($encoded);
 echo $chunked;

[Figure: Base64-encoded text, chunked with the chunk_split() function, as displayed in a web browser]
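To check the result without a browser, the sketch below inspects the chunked string directly. It assumes the same sample text as above (built with str_repeat() for brevity) and relies on base64_decode() ignoring the inserted line breaks in its default, non-strict mode.

```php
<?php
$text    = str_repeat('Welcome to BrainBell.com...', 6);
$chunked = chunk_split(base64_encode($text)); // defaults: 76 characters, "\r\n"

// Split back into lines, dropping the trailing CRLF first.
$lines = explode("\r\n", rtrim($chunked, "\r\n"));
foreach ($lines as $line) {
    echo strlen($line), PHP_EOL; // no line is longer than 76
}

// Decoding the chunked text gives back the original.
var_dump(base64_decode($chunked) === $text); // bool(true)
```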
