From the Winter 2024 Issue

UNISQUATTING IDN HOMOGRAPH ATTACKS

Danny Gershman
Founder and CEO | Radius Method

Unisquatting (a portmanteau of Unicode and cybersquatting), also known as the Internationalized Domain Name (IDN) homograph attack, is a fairly new technique that builds on several older Domain Name System (DNS) attacks. One of these, the typosquat (also called Uniform Resource Locator (URL) hijacking), relies on registering a domain name that closely resembles another domain. For instance, a person may accidentally type exaample.com in their browser while expecting to reach example.com.

Let’s say that example.com is a financial institution (Example Bank) for Alice. Alice typically goes to this site and logs in with her username and password. Alice is at home one Saturday morning, drinking her coffee, and decides to check her finances. She accidentally types exaample.com in her browser and presses enter. What she doesn’t know is that Mr. Orange has bought and registered exaample.com using a privacy-focused cryptocurrency like Zcash (making the purchase hard to trace), and developed the site to look and feel just like Example Bank. Mr. Orange reused all the HTML assets from example.com so that his site looks exactly the same. He even went as far as to purchase an SSL certificate. The difference that Alice cannot see is that when she enters her credentials, they do not go to Example Bank for authentication, but are instead sent to Mr. Orange’s server and recorded in its logs. Mr. Orange now has Alice’s username and password and can log into her account.

There are several ways to prevent such an attack. Alice can ensure that multi-factor authentication requiring possession of a physical device is enabled on her account. In that case, even if Mr. Orange steals her username and password, his login attempt will be unsuccessful because he does not have the physical device necessary for the additional verification step. (Note: The physical device can be a dedicated token, such as a YubiKey or an RSA SecurID fob, or a smartphone running an authenticator application that generates a time-based one-time password (TOTP).)

Alice may also have made herself vulnerable by reusing the same username and password for other online accounts. She does this because she doesn’t want to memorize many different passwords and remember which password belongs to which account. (This is a common security error.) If Mr. Orange can find other places where Alice has accounts (for instance, her email provider), he could try to authenticate to those services using the credentials stolen in the typosquatting attack on example.com. A password manager or vault locked with multi-factor authentication (a single complex password plus some combination of physical token possession and/or biometrics) would give Alice a secure way around having to memorize many passwords. It also addresses the issue of password reuse.

Example Bank can also take some precautionary measures. It could register a wide range of typo variations of its domain and set up DNS forwarding so that the typo URLs resolve to the correct example.com site. This reduces the attack surface available to Mr. Orange. Example Bank could also make use of a self-assessment tool that monitors its security posture. Some of these tools can monitor for typosquats and send alerts when inappropriate usage is detected. Once Example Bank is aware of the malicious typosquatting, it can take actions up to and including acquiring the typosquat domains to protect its customers from fraud.
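As a rough illustration of what "typo variations" could mean in practice, the following Python sketch generates three simple kinds of one-edit typos (a doubled character, an omitted character, and two swapped neighbors) for the first label of a domain. The function name typo_variants is hypothetical, and real monitoring tools are likely to cover many more patterns, such as adjacent-key substitutions.

```python
def typo_variants(domain: str) -> set:
    """Return simple one-edit typo variants of a domain's first label."""
    name, dot, rest = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        # Doubled character: example -> exaample
        variants.add(name[:i] + name[i] + name[i:])
        # Omitted character: example -> exmple
        if len(name) > 1:
            variants.add(name[:i] + name[i + 1:])
        # Swapped neighbors: example -> exmaple
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical letters reproduces the original name.
    variants.discard(name)
    return {v + dot + rest for v in variants}


variants = typo_variants("example.com")
print("exaample.com" in variants)
```

The list this produces for example.com includes exaample.com, the domain Mr. Orange registered, so Example Bank could use a list like it as a starting point for defensive registrations or monitoring.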

Mr. Orange, unfortunately, hasn’t been standing still.  As Alice and Example Bank take measures to reduce the attack surface, Mr. Orange has been coming up with more sophisticated ways to compromise Alice’s security.

Initially, the Internet could only use domain names based on the ASCII character set. The ASCII character set is based on the English language and has 128 characters, of which 95 are printable. These include letters, numbers, and some symbols. Domain names are further limited to letters, digits, and hyphens (also known as the LDH subset). Unicode, on the other hand, is a much larger character set, which at the time of publication of this article has roughly 150,000 characters available (including emoji).
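The LDH restriction can be expressed as a short check. The Python sketch below is a minimal illustration, and the name is_ldh_hostname is hypothetical. The 63-character label limit and the rule against leading or trailing hyphens come from the DNS hostname specifications rather than from this article.

```python
import re

# One DNS label: 1 to 63 characters, letters/digits/hyphens only,
# with no hyphen at the start or end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(hostname: str) -> bool:
    """Return True if every dot-separated label uses only the LDH subset."""
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split("."))


print(is_ldh_hostname("example.com"))
# "\u0435" is the Cyrillic lookalike of the Latin "e", so this name is not LDH.
print(is_ldh_hostname("\u0435xample.com"))
```

A name containing even one non-ASCII character fails this check, which is why internationalized names need an ASCII-compatible encoding before they can be placed in DNS.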

The Internet, which has been around since the 1960s, has had to cope with a number of gaps in functionality. Although it would be theoretically possible to require all DNS resolvers in the world to support the complete Unicode character set, doing so would require a massive infrastructure coordination effort that would be, for practical purposes, impossible. In the early 2000s, discussion about how to support a wider character set for domain names led to RFC 3490 [1], which attempted to address the need for Internationalized Domain Names (IDN).

What resulted was an algorithm called Punycode. Punycode allows an application to take a Unicode string and represent it using the LDH subset of ASCII. The owner can then register the Punycode domain and set up appropriate DNS records. For example, let’s say Bob wants to have a website in Hebrew called שלוםעולם.com. When he configures the DNS records for this domain, he would configure its Punycode representation, which is xn--9dbatcdc6a4c.com. From there, Bob can ensure that all the DNS resolvers in the world will be able to connect to his designated endpoints.
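Bob's conversion can be reproduced with Python's built-in punycode codec. This is a minimal sketch: the helper name to_ascii_domain is hypothetical, and it skips the normalization and validation steps that a full IDNA implementation performs before encoding.

```python
def to_ascii_domain(domain: str) -> str:
    """Encode each non-ASCII label with Punycode and add the xn-- prefix."""
    encoded = []
    for label in domain.split("."):
        if label.isascii():
            encoded.append(label)  # plain LDH labels pass through unchanged
        else:
            encoded.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded)


# Bob's Hebrew name, written as code points so that right-to-left rendering
# cannot scramble it: shin, lamed, vav, final mem, ayin, vav, lamed, final mem.
bob = "\u05e9\u05dc\u05d5\u05dd\u05e2\u05d5\u05dc\u05dd.com"
print(to_ascii_domain(bob))  # xn--9dbatcdc6a4c.com
```

Only the Hebrew label is rewritten. The .com label is already ASCII and is left as it is.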

Within the Unicode character set are several groups of lookalike characters. In some cases, Cyrillic characters look exactly the same as Latin characters. So if Mr. Orange wanted to unisquat the example.com domain, he could buy and register xn--xample-2of.com. This domain mixes ASCII and non-ASCII characters: its first letter is the Cyrillic “е” (U+0435), which, as the Unicode table shows, looks exactly like the Latin “e” (U+0065). (See Figure 1.)
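One way to see what is hiding inside Mr. Orange's domain is to decode the Punycode label and list the Unicode name of each character. The sketch below uses only Python's standard library. The helper name describe_label is hypothetical.

```python
import unicodedata


def describe_label(ascii_label: str) -> list:
    """Decode an xn-- label and return (character, Unicode name) pairs."""
    if ascii_label.lower().startswith("xn--"):
        text = ascii_label[4:].encode("ascii").decode("punycode")
    else:
        text = ascii_label
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text]


for ch, name in describe_label("xn--xample-2of"):
    print(f"U+{ord(ch):04X} {name}")
```

The first line printed is U+0435 CYRILLIC SMALL LETTER IE. The remaining six characters are ordinary Latin letters, which is exactly the mix that makes the name look like example.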

 

Figure 1 (Screenshot from unicode-table.com)

 

From this, he could set up DNS records to point to the fake Example Bank website. There are many other characters available in the Unicode character set that could also be used for such a deception.  Tools like https://www.punycoder.com/ can be used to generate these lookalike domains.

This sort of attack is different from typosquatting. It’s unlikely that Alice will accidentally type a Cyrillic character while typing in her browser location bar. As a result, Mr. Orange will need to combine unisquatting with a phishing campaign. He therefore configures a mail exchanger (MX) record so that the unisquat domain passes as a valid email domain, and crafts an email with links to his cloned site. Most anti-phishing security training exercises suggest hovering over a link to ensure that the visible link is not pointing to a different website. In this case, the From address and the links in the email will be visually identical to example.com.

This attack is very difficult to spot, even for a trained eye. Newer versions of some browsers [2] have mechanisms that automatically display the domain as Punycode when its characters are not all from the same Unicode subset. This, however, would not prevent an email click from opening a page and causing malware to infect the victim’s system. By the time the Punycode is observed in the location bar, it would be too late. Some browsers also allow disabling Unicode in domain names altogether, but this would prevent users from visiting legitimate internationalized sites. (See Figure 2.)

The attack can also be detected by reviewing the site’s Secure Sockets Layer (SSL) certificate, because SSL certificates must be issued for the Punycode form of the domain. Although some of this can be handled through training, it’s impractical to manually review certificates on each web visit. As can be seen, there’s a lot of room for improvement here.

One solution would be to have browsers prevent mixing Unicode and ASCII in the same domain name. Another would be to have lookalike characters trigger a prominent warning and temporarily block the site until Alice has a chance to review it. This would be similar to a browser detecting that a site is infected with malware or that it fails a reputation check. This would encourage the Bobs of the world to carefully consider decisions to create their domain names with Unicode.  If they took such precautions, Mr. Orange would have a more difficult time convincing victims that his version of the Example Bank website was legitimate.
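A check of the first kind could look roughly like the following Python sketch. It is only an approximation under stated assumptions: Python's standard library does not expose the Unicode script property, so the first word of each character's Unicode name stands in for its script, and the function names scripts_in_label and is_mixed_script are hypothetical. It is not a description of how any particular browser implements its policy.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue  # ASCII digits and hyphens are not tied to one script
        # Names look like "LATIN SMALL LETTER E" or "CYRILLIC SMALL LETTER IE";
        # the first word is used here as a rough stand-in for the script.
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if the label appears to combine more than one script."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("\u0435xample"))  # Cyrillic first letter + Latin rest
print(is_mixed_script("example"))       # all Latin
```

Mr. Orange's label is flagged because it combines Cyrillic and Latin letters, while a label written entirely in Hebrew, like Bob's, is not. A name spelled entirely in a single lookalike script would pass this check, so a rule like this narrows the problem without eliminating it.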

From an organizational perspective, there are some good tools available that can alert on the registration of squatting domains. Keeping internal security awareness training up to date can mitigate the risks posed by phishing attacks. Running periodic internal phishing campaigns is good practice and can help identify team members who need extended training. Additionally, ensure that single sign-on and multi-factor authentication are provisioned in as many places as possible to reduce the attack surface.


Sources

  1. https://tools.ietf.org/html/rfc3490
  2. Chrome browsers 58.0.3029.81 and later, Opera 44.0.2510.1449 and later, Firefox 57 and later
