How to detect Emojis With JavaScript | by Kesk -*- | Feb, 2022

Use regular expressions to match emojis in strings


I recently filtered a huge Twitter timeline to analyze it with a deep neural network. As you know, tweets can contain different kinds of content, including emojis. So one of the first steps was to clean the data, in this case by removing all emojis from the timeline.

Although this can be done in many ways, I'll show how to do it with JavaScript because it's easy and fast. Let's begin.

As you might have guessed from the subtitle of this post, we will use regular expressions.

Modern browsers support Unicode property escapes, which let you match emojis based on whether they have the Emoji Unicode property. For example, you can use \p{Emoji} to match emoji characters and \P{Emoji} to match non-emoji characters; both require the u (Unicode) flag on the regex. Note that 0123456789#* and some other characters also have the Emoji property, so they are matched as well. Therefore, a better approach is to use the Extended_Pictographic property, which covers the characters typically understood as emojis, instead of the Emoji property.
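One detail worth checking before the examples: property escapes are only recognized in Unicode mode. The sketch below (variable names are illustrative) shows that without the u flag, legacy parsing treats \p as a plain "p" and the braces as literal characters, so the pattern silently matches the wrong thing instead of throwing an error.

```javascript
// With the u flag, \p{...} is a Unicode property escape.
const withFlag = /\p{Extended_Pictographic}/u

// Without u, legacy (Annex B) parsing reads \p as the letter "p" and the
// braces as literals, so this matches the text "p{Extended_Pictographic}".
const withoutFlag = /\p{Extended_Pictographic}/

console.log(withFlag.test('😀'))                          // true
console.log(withoutFlag.test('😀'))                       // false
console.log(withoutFlag.test('p{Extended_Pictographic}')) // true
```

Because no error is raised, forgetting the flag is easy to miss; every pattern in this post uses u.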

Let’s see some examples.

Use \p{…} to match Unicode properties

If you use the Emoji Unicode property, you may get incorrect results:

const withEmojis = /\p{Emoji}/u
console.log(withEmojis.test('😀'), withEmojis.test('1')) // true true, oops!
Example 1

Therefore, it's better to use the Extended_Pictographic escape, as mentioned earlier:

const withEmojis = /\p{Extended_Pictographic}/u
console.log(withEmojis.test('😀'), withEmojis.test('1')) // true false
Example 2
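Since the goal was to clean a Twitter timeline, here is a minimal sketch of how the Extended_Pictographic escape could be wrapped into two helpers. The names containsEmoji and stripEmojis, and the sample timeline, are hypothetical. The detection regex deliberately omits the g flag: a global regex stores its position in lastIndex, so repeated .test() calls on it can return alternating results.

```javascript
// Detection: no g flag, so .test() keeps no state between calls.
const EMOJI = /\p{Extended_Pictographic}/u
// Removal: replaceAll() throws a TypeError if given a regex without g.
const EMOJI_GLOBAL = /\p{Extended_Pictographic}/gu

function containsEmoji(text) {
  return EMOJI.test(text)
}

function stripEmojis(text) {
  return text.replaceAll(EMOJI_GLOBAL, '')
}

// Hypothetical sample data standing in for tweets.
const timeline = ['Good morning 😀', 'Release notes #42', 'Party time 🎉🎉']

console.log(timeline.filter(containsEmoji)) // [ 'Good morning 😀', 'Party time 🎉🎉' ]
console.log(timeline.map(stripEmojis))      // [ 'Good morning ', 'Release notes #42', 'Party time ' ]
```

Note that '#42' survives because # has the Emoji property but not Extended_Pictographic. Emojis built from several code points can leave invisible joiners or variation selectors behind after stripping.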

Use \P{…} to negate the match.

const noEmojis = /\P{Extended_Pictographic}/u
console.log(noEmojis.test('abc'), noEmojis.test('😀')) // true false
Example 3
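Keep in mind that \P{...} matches a single character that lacks the property, so an unanchored test succeeds as soon as the string contains any non-emoji character, even if emojis are present too. A minimal sketch of a stricter check (onlyNoEmojis is a hypothetical name) anchors the pattern and repeats it over the whole string:

```javascript
// Unanchored: succeeds if ANY character is not an emoji.
const noEmojis = /\P{Extended_Pictographic}/u
console.log(noEmojis.test('hi 😀')) // true, because "h" is not an emoji

// Anchored: every code point from start to end must lack the property.
const onlyNoEmojis = /^\P{Extended_Pictographic}*$/u
console.log(onlyNoEmojis.test('hi'))    // true
console.log(onlyNoEmojis.test('hi 😀')) // false
```

The anchored form is equivalent to negating the positive check, !/\p{Extended_Pictographic}/u.test(text), which some readers may find clearer.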

As you can see, this is an easy way to detect emojis. However, if you use our previous withEmojis regex with a grouped emoji, you will be surprised by the result:

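const withEmojis = /\p{Extended_Pictographic}/ug
const familyEmoji = '👨‍👩‍👧'
console.log(familyEmoji.length) // 8
console.log(familyEmoji.match(withEmojis)) // (3) ['👨', '👩', '👧']
console.log(familyEmoji.replaceAll(withEmojis, '*')) // *** oops!
Example 4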

As you can see, if you use the replaceAll method with our regex, you get three asterisks (***) instead of one (*). This happens because the grouped emoji is rendered as a single symbol but consists of more than one code point: here, three separate emojis joined by invisible zero-width joiner (U+200D) characters, and the regex matches each emoji on its own.
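One way to keep a grouped emoji together while still using only built-in regex features is to let the pattern also consume the joiners and the pictographs that follow them. The sketch below uses a hypothetical, simplified pattern (emojiSequence); it is an approximation for illustration, not the full Unicode emoji sequence grammar.

```javascript
// Simplified pattern: a pictograph, an optional variation selector (U+FE0F)
// or skin-tone modifier, then any number of "ZWJ (U+200D) + pictograph" links.
const emojiSequence =
  /\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\p{Emoji_Modifier})?)*/gu

const familyEmoji = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}' // 👨‍👩‍👧
const thumbsUp = '\u{1F44D}\u{1F3FD}' // 👍🏽 (thumbs up + medium skin tone)

console.log(familyEmoji.replaceAll(emojiSequence, '*'))        // *
console.log(familyEmoji.match(emojiSequence).length)           // 1
console.log(`Nice ${thumbsUp}`.replaceAll(emojiSequence, '*')) // Nice *
```

Flags (pairs of regional indicator symbols) and keycap sequences such as 1️⃣ do not have the Extended_Pictographic property, so this sketch does not match them at all; covering every sequence the Unicode Standard defines is where a dedicated library earns its keep.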

To avoid this and other unusual behaviors, you can use a dedicated emoji-matching library. The one I recommend offers a regular expression that matches all emoji symbols and sequences (including textual representations of emoji) as defined by the Unicode Standard.

I hope this short article is useful to you.
