How to find the length of a string in Solidity, from the ENS smart contracts

Why bytes(str).length is not sufficient for getting the length of a string in Solidity, and a look at the strlen method from the ENS contracts

Find the length of a string in Solidity

In the world of JavaScript, finding the length of a string is such an easy thing:
just do str.length and that’s all.

But strings are not so friendly to work with in Solidity.

In Solidity, a string is a group of characters stored as a dynamic array of UTF-8 encoded bytes.

There is no length member on the string type.

I was going through Buildspace’s build-polygon-ens project and found the link to StringUtils.sol

I knew how to find the length of a string in Solidity: we can convert the string into bytes and take the length of that. So it should be as easy as doing bytes(str).length; but the method in this util file was a bit different:

// SPDX-License-Identifier: MIT
// Source:
// https://github.com/ensdomains/ens-contracts/blob/master/contracts/ethregistrar/StringUtils.sol
pragma solidity >=0.8.4;

library StringUtils {
    /**
     * @dev Returns the length of a given string
     *
     * @param s The string to measure the length of
     * @return The length of the input string
     */
    function strlen(string memory s) internal pure returns (uint256) {
        uint256 len;
        uint256 i = 0;
        uint256 bytelength = bytes(s).length;
        for (len = 0; i < bytelength; len++) {
            bytes1 b = bytes(s)[i];
            if (b < 0x80) {
                i += 1;
            } else if (b < 0xE0) {
                i += 2;
            } else if (b < 0xF0) {
                i += 3;
            } else if (b < 0xF8) {
                i += 4;
            } else if (b < 0xFC) {
                i += 5;
            } else {
                i += 6;
            }
        }
        return len;
    }
}

It had this weird loop in the code that I couldn’t understand.

So, the developer in me googled it, but all the articles I came across used bytes(str).length to find the length of the string. I found some similar code on Stack Overflow, but no one actually explained what is happening inside.

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    if (b < 0x80) {
        i += 1;
    } else if (b < 0xE0) {
        i += 2;
    } else if (b < 0xF0) {
        i += 3;
    } else if (b < 0xF8) {
        i += 4;
    } else if (b < 0xFC) {
        i += 5;
    } else {
        i += 6;
    }
}

After 3 hours of self-exploration I was able to figure it out myself (a little slow, but I did it).

So I thought, let’s write it down so it can be helpful for all the folks like me (not so experienced with bits, bytes 0️⃣1️⃣).

How bytes(str).length works

When we convert a string to bytes, this is what Solidity does:

// if we do bytes("xyz"), solidity converts it as 
xyz -> 78 79 7a // 78=x, 79=y, 7a=z
ABC -> 41 42 43 // 41=A, 42=B, 43=C

Note: Any online string-to-hex converter can be used for converting strings to bytes.

As you can see, each character generates 1 byte. That’s why when we do bytes(str).length we get the length of the string.
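To make the conversion concrete, here is a minimal sketch of a contract that exposes the raw bytes and the byte count of a string. The contract name ByteLengthDemo and its function names are made up for this illustration and are not part of the ENS code.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical demo contract, written only for illustration.
contract ByteLengthDemo {
    // Returns the raw UTF-8 bytes that back the string.
    function toBytes(string memory s) public pure returns (bytes memory) {
        return bytes(s);
    }

    // Returns the number of bytes in the string.
    // This matches the number of characters only when every character is ASCII.
    function byteLength(string memory s) public pure returns (uint256) {
        return bytes(s).length;
    }

    // "xyz" is stored as 78 79 7a, so this returns 3.
    function asciiExample() public pure returns (uint256) {
        return byteLength("xyz");
    }
}
```

Calling toBytes("ABC") should hand back 0x414243, matching the table above.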

But there are some characters for which more than one byte is generated. For example:

€ -> e2 82 ac

For the Euro symbol, 3 bytes are generated.

So if we try to find the length of a string that contains the Euro symbol, bytes(str).length will not return the correct string length, because 3 bytes are generated for this single character.
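A small sketch that shows the mismatch, assuming a compiler that supports the unicode"" literal prefix (Solidity 0.7.0 and later). The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical demo contract, written only for illustration.
contract EuroByteLength {
    function euroByteLength() public pure returns (uint256) {
        // Non-ASCII characters in a literal need the unicode"" prefix.
        string memory s = unicode"€";
        // € is encoded as e2 82 ac, so this returns 3 even though
        // the string holds a single character.
        return bytes(s).length;
    }
}
```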

That’s when the for loop we saw above comes to the rescue.

Let’s iterate over this e2 82 ac byte array and check what’s happening inside that loop:

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    // b = e2 for first iteration
    if (b < 0x80) {
        i += 1;
    } else if (b < 0xE0) {
        i += 2;
    } else if (b < 0xF0) {
        i += 3;
    } else if (b < 0xF8) {
        i += 4;
    } else if (b < 0xFC) {
        i += 5;
    } else {
        i += 6;
    }
}

For the first iteration b = e2, and there is a condition on the following line:

if(b < 0x80) 
i += 1;

Let’s decode this. This condition basically compares the decimal values of these hexadecimal bytes:

0x80 -> 128
// our b is e2 at the moment; the decimal value of e2 is 226
0xe2 -> 226

For regular characters, the decimal value of their byte will be < 128; for a, for example, it’s 97.
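The thresholds in the loop line up with the way UTF-8 marks the first byte of each character: the leading bits of that byte say how many bytes the character occupies. The sketch below isolates that decision in a helper. The library name Utf8Lead and the function sequenceLength are hypothetical and do not exist in StringUtils.sol. Like the original loop, it assumes the input is well-formed UTF-8 and that b is the first byte of a character. The 5 and 6 byte branches come from an older definition of UTF-8; current UTF-8 stops at 4 bytes.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Hypothetical helper, written only for illustration.
library Utf8Lead {
    // Given the first byte of a UTF-8 sequence, returns how many
    // bytes the whole sequence occupies.
    function sequenceLength(bytes1 b) internal pure returns (uint256) {
        if (b < 0x80) return 1; // 0xxxxxxx: ASCII, e.g. 0x61 for a
        if (b < 0xE0) return 2; // 110xxxxx, e.g. 0xc2 for ¢
        if (b < 0xF0) return 3; // 1110xxxx, e.g. 0xe2 for €
        if (b < 0xF8) return 4; // 11110xxx
        if (b < 0xFC) return 5; // 111110xx (legacy form)
        return 6;               // 1111110x (legacy form)
    }
}
```

With this helper, each pass of the loop amounts to i += sequenceLength(b) followed by len++.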

So, if we check all the conditions like this:

for (len = 0; i < bytelength; len++) {
    bytes1 b = bytes(s)[i];
    if (b < 0x80) {          // 0x80 = 128 => 226 < 128 ❌
        i += 1;
    } else if (b < 0xE0) {   // 0xE0 = 224 => 226 < 224 ❌
        i += 2;
    } else if (b < 0xF0) {   // 0xF0 = 240 => 226 < 240 ✅
        i += 3;
    }
    ...

So now our i is 3, and len++ runs once, making len equal to 1.

The condition in the for loop is then 3 < 3, which is false, so the loop ends with len equal to 1.

And that’s it: 1 is the correct length of the string "€".

If you want to try some more strings like "€", here is a small list of characters that occupy more than 1 byte:

€ -> e2 82 ac 
à -> c3 83
¢ -> c2 a2

Create a random string, something like abc¢Ã for example, and try it out.
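One way to try it is a small contract that returns both lengths side by side. This is only a sketch: the library is a compact copy of the strlen loop discussed earlier, inlined so the example compiles on its own, and the StrlenDemo contract and its compare function are hypothetical names.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.4;

// Compact copy of the strlen loop, inlined so this example is self-contained.
library StringUtils {
    function strlen(string memory s) internal pure returns (uint256 len) {
        uint256 i = 0;
        uint256 bytelength = bytes(s).length;
        for (len = 0; i < bytelength; len++) {
            bytes1 b = bytes(s)[i];
            if (b < 0x80) i += 1;
            else if (b < 0xE0) i += 2;
            else if (b < 0xF0) i += 3;
            else if (b < 0xF8) i += 4;
            else if (b < 0xFC) i += 5;
            else i += 6;
        }
    }
}

// Hypothetical demo contract, written only for illustration.
contract StrlenDemo {
    using StringUtils for string;

    function compare() public pure returns (uint256 byteLen, uint256 charLen) {
        string memory s = unicode"abc¢Ã";
        byteLen = bytes(s).length; // 1 + 1 + 1 + 2 + 2 = 7
        charLen = s.strlen();      // 5
    }
}
```

Note that the loop counts one unit per UTF-8 sequence, so a visible symbol that is built from several code points (as some emoji are) would be counted as more than one.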

Ta-Da! And now it really works.

Want to connect? Connect with me on Twitter (@pateldeep_eth) or LinkedIn.

Originally published at https://pateldeep.xyz/
