Class InternetDomainName


  • public final class InternetDomainName
    extends java.lang.Object
    An immutable well-formed internet domain name, such as com or foo.co.uk. Only syntactic analysis is performed; no DNS lookups or other network interactions take place. Thus there is no guarantee that the domain actually exists on the internet.

    One common use of this class is to determine whether a given string is likely to represent an addressable domain on the web -- that is, for a candidate string "xxx", might browsing to "http://xxx/" result in a webpage being displayed? In the past, this test was frequently done by determining whether the domain ended with a public suffix but was not itself a public suffix. However, this test is no longer accurate. There are many domains which are both public suffixes and addressable as hosts; "uk.com" is one example. Using the subset of public suffixes that are registry suffixes, one can get a better result, as only a few registry suffixes are addressable. However, the most useful test to determine if a domain is a plausible web host is hasPublicSuffix(). This will return true for many domains which (currently) are not hosts, such as "com", but given that any public suffix may become a host without warning, it is better to err on the side of permissiveness and thus avoid spurious rejection of valid sites. Of course, to actually determine addressability of any host, clients of this class will need to perform their own DNS lookups.

    During construction, names are normalized in two ways:

    1. ASCII uppercase characters are converted to lowercase.
    2. Unicode dot separators other than the ASCII period ('.') are converted to the ASCII period.

    The normalized values will be returned from toString() and parts(), and will be reflected in the result of equals(Object).
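
    The two normalization steps above can be sketched in plain Java. This is an illustration only, not the class's actual implementation: the class name DomainNormalizationSketch and the method normalize are hypothetical, and the choice of which code points count as "Unicode dot separators" (the three listed in RFC 3490 section 3.1) is an assumption.

```java
final class DomainNormalizationSketch {
    // Assumption: the non-ASCII dot separators are the three named in RFC 3490 section 3.1
    // (ideographic full stop, fullwidth full stop, halfwidth ideographic full stop).
    private static final String UNICODE_DOTS = "\u3002\uFF0E\uFF61";

    /** Lowercases ASCII letters and maps the assumed Unicode dots to '.'. */
    static String normalize(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (UNICODE_DOTS.indexOf(c) >= 0) {
                out.append('.');                       // step 2: unify dot separators
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) (c - 'A' + 'a'));    // step 1: ASCII uppercase only
            } else {
                out.append(c);                         // everything else is left untouched
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("WWW.Example.COM"));
    }
}
```

    Only ASCII letters are lowercased in this sketch, mirroring the wording of step 1; non-ASCII characters pass through unchanged.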

    Internationalized domain names such as 网络.cn are supported, as are the equivalent IDNA Punycode-encoded versions.

    Since:
    5.0
    • Field Detail

      • DOTS_MATCHER

        private static final CharMatcher DOTS_MATCHER
      • DOT_SPLITTER

        private static final Splitter DOT_SPLITTER
      • DOT_JOINER

        private static final Joiner DOT_JOINER
      • MAX_PARTS

        private static final int MAX_PARTS
        Maximum parts (labels) in a domain name. This value follows from the 255-octet limit described in RFC 2181 part 11, combined with the fact that the encoding of each part occupies at least two bytes (dot plus label externally, length byte plus label internally). Thus, if all labels have the minimum size of one byte, 127 of them will fit.
        See Also:
        Constant Field Values
      • MAX_LENGTH

        private static final int MAX_LENGTH
        Maximum length of a full domain name, including separators, and leaving room for the root label. See RFC 2181 part 11.
        See Also:
        Constant Field Values
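
        The arithmetic behind MAX_PARTS and MAX_LENGTH can be worked through in a few lines. This is a sketch under stated assumptions rather than the class's source: the names below are hypothetical, and it assumes that one octet of the 255-octet limit is reserved for the root label and that every label's length byte corresponds to a dot in the textual form, except for the first label.

```java
final class DomainLimitsSketch {
    static final int WIRE_LIMIT_OCTETS = 255; // limit cited from RFC 2181 part 11
    static final int ROOT_LABEL_OCTETS = 1;   // assumption: the root label is a single zero length byte
    static final int MIN_LABEL_OCTETS = 2;    // length byte plus a one-character label

    /** Largest number of minimum-size labels that fit beside the root label. */
    static int maxParts() {
        return (WIRE_LIMIT_OCTETS - ROOT_LABEL_OCTETS) / MIN_LABEL_OCTETS;
    }

    /** Longest textual name: drop the root label and the first label's length byte. */
    static int maxTextLength() {
        return WIRE_LIMIT_OCTETS - ROOT_LABEL_OCTETS - 1;
    }

    /** Text length of the given number of one-character labels joined by dots. */
    static int textLengthOfMinimalLabels(int labels) {
        return labels + (labels - 1);
    }
}
```

        Under these assumptions, the maximum number of one-character labels exactly fills the maximum textual length.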
      • MAX_DOMAIN_PART_LENGTH

        private static final int MAX_DOMAIN_PART_LENGTH
        Maximum size of a single part of a domain name. See RFC 2181 part 11.
        See Also:
        Constant Field Values
      • name

        private final java.lang.String name
        The full domain name, converted to lower case.
      • parts

        private final ImmutableList<java.lang.String> parts
        The parts of the domain name, converted to lower case.
      • publicSuffixIndex

        private final int publicSuffixIndex
        The index in the parts() list at which the public suffix begins. For example, for the domain name myblog.blogspot.co.uk, the value would be 1 (the index of the blogspot part). The value is negative (specifically, NO_SUFFIX_FOUND) if no public suffix was found.
      • registrySuffixIndex

        private final int registrySuffixIndex
        The index in the parts() list at which the registry suffix begins. For example, for the domain name myblog.blogspot.co.uk, the value would be 2 (the index of the co part). The value is negative (specifically, NO_SUFFIX_FOUND) if no registry suffix was found.
      • DASH_MATCHER

        private static final CharMatcher DASH_MATCHER
      • DIGIT_MATCHER

        private static final CharMatcher DIGIT_MATCHER
      • LETTER_MATCHER

        private static final CharMatcher LETTER_MATCHER
      • PART_CHAR_MATCHER

        private static final CharMatcher PART_CHAR_MATCHER
    • Constructor Detail

      • InternetDomainName

        InternetDomainName​(java.lang.String name)
        Constructor used to implement from(String), and from subclasses.
    • Method Detail

      • findSuffixOfType

        private int findSuffixOfType​(Optional<PublicSuffixType> desiredType)
        Returns the index of the leftmost part of the suffix, or -1 if not found. Note that the value defined as a suffix may not produce true results from isPublicSuffix() or isRegistrySuffix() if the domain ends with an excluded domain pattern such as "nhs.uk".

        If a desiredType is specified, this method only finds suffixes of the given type. Otherwise, it finds the first suffix of any type.
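
        The "leftmost part of the suffix" search can be illustrated with a toy suffix set. This is a simplified sketch, not the actual implementation: SuffixIndexSketch and findSuffixIndex are hypothetical names, the suffix sets are made-up samples rather than the real Public Suffix List, and wildcard and exclusion rules are ignored.

```java
import java.util.List;
import java.util.Set;

final class SuffixIndexSketch {
    static final int NO_SUFFIX_FOUND = -1;

    /**
     * Returns the index of the first part at which the remaining parts,
     * joined by dots, appear in the given suffix set; -1 if none do.
     */
    static int findSuffixIndex(List<String> parts, Set<String> suffixes) {
        for (int i = 0; i < parts.size(); i++) {
            String candidate = String.join(".", parts.subList(i, parts.size()));
            if (suffixes.contains(candidate)) {
                return i; // scanning left to right yields the longest matching suffix
            }
        }
        return NO_SUFFIX_FOUND;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("myblog", "blogspot", "co", "uk");
        // Toy data: all suffixes versus registry-only suffixes.
        System.out.println(findSuffixIndex(parts, Set.of("uk", "co.uk", "blogspot.co.uk")));
        System.out.println(findSuffixIndex(parts, Set.of("uk", "co.uk")));
    }
}
```

        Restricting the set to registry suffixes plays the role of passing a desiredType; with the toy data above, the two calls reproduce the example values given for publicSuffixIndex and registrySuffixIndex.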

      • from

        public static InternetDomainName from​(java.lang.String domain)
        Returns an instance of InternetDomainName after lenient validation. Specifically, validation against RFC 3490 ("Internationalizing Domain Names in Applications") is skipped, while validation against RFC 1035 is relaxed in the following ways:
        • Any part containing non-ASCII characters is considered valid.
        • Underscores ('_') are permitted wherever dashes ('-') are permitted.
        • Parts other than the final part may start with a digit, as mandated by RFC 1123.
        Parameters:
        domain - A domain name (not IP address)
        Throws:
        java.lang.IllegalArgumentException - if domain is not syntactically valid according to isValid(java.lang.String)
        Since:
        10.0 (previously named fromLenient)
      • validateSyntax

        private static boolean validateSyntax​(java.util.List<java.lang.String> parts)
        Validation method used by from(String) to ensure that the domain name is syntactically valid according to RFC 1035.
        Returns:
        true if the domain name is syntactically valid
      • validatePart

        private static boolean validatePart​(java.lang.String part,
                                            boolean isFinalPart)
        Helper method for validateSyntax(List). Validates that one part of a domain name is valid.
        Parameters:
        part - The domain name part to be validated
        isFinalPart - Is this the final (rightmost) domain part?
        Returns:
        Whether the part is valid
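
        The lenient rules listed under from(String) suggest roughly the following per-part check. This is a hedged sketch, not the class's code: PartValidationSketch and its helpers are hypothetical, the 63-character label limit is taken from RFC 1035, and treating a leading or trailing dash (or underscore) as invalid is an assumption based on RFC 1035's label grammar.

```java
final class PartValidationSketch {
    static final int MAX_PART_LENGTH = 63; // label length limit from RFC 1035

    static boolean isDashLike(char c) {
        return c == '-' || c == '_'; // underscores are allowed wherever dashes are
    }

    static boolean isAllowedAscii(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || isDashLike(c);
    }

    static boolean validatePart(String part, boolean isFinalPart) {
        if (part.isEmpty() || part.length() > MAX_PART_LENGTH) {
            return false;
        }
        for (int i = 0; i < part.length(); i++) {
            char c = part.charAt(i);
            // Non-ASCII characters are accepted as-is; only ASCII ones are checked.
            if (c < 128 && !isAllowedAscii(c)) {
                return false;
            }
        }
        char first = part.charAt(0);
        char last = part.charAt(part.length() - 1);
        if (isDashLike(first) || isDashLike(last)) {
            return false; // assumption: dash-like characters may not begin or end a part
        }
        // Only the final (rightmost) part is barred from starting with a digit.
        return !(isFinalPart && first >= '0' && first <= '9');
    }
}
```

        For instance, "3com" would pass as a non-final part but fail as the final part.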
      • parts

        public ImmutableList<java.lang.String> parts()
        Returns the individual components of this domain name, normalized to all lower case. For example, for the domain name mail.google.com, this method returns the list ["mail", "google", "com"].
      • isPublicSuffix

        public boolean isPublicSuffix()
        Indicates whether this domain name represents a public suffix, as defined by the Mozilla Foundation's Public Suffix List (PSL). A public suffix is one under which Internet users can directly register names, such as com, co.uk or pvt.k12.wy.us. Examples of domain names that are not public suffixes include google.com, foo.co.uk, and myblog.blogspot.com.

        Public suffixes are a proper superset of registry suffixes. The list of public suffixes additionally contains privately owned domain names under which Internet users can register subdomains. An example of a public suffix that is not a registry suffix is blogspot.com. Note, however, that every public suffix ends in a registry suffix, since domain name registries collectively control all internet domain names.

        For considerations on whether the public suffix or registry suffix designation is more suitable for your application, see this article.

        Returns:
        true if this domain name appears exactly on the public suffix list
        Since:
        6.0
      • hasPublicSuffix

        public boolean hasPublicSuffix()
        Indicates whether this domain name ends in a public suffix, including if it is a public suffix itself. For example, returns true for www.google.com, foo.co.uk and com, but not for invalid or google.invalid. This is the recommended method for determining whether a domain is potentially an addressable host.

        Note that this method is equivalent to hasRegistrySuffix() because all registry suffixes are public suffixes and all public suffixes have registry suffixes.

        Since:
        6.0
      • publicSuffix

        @CheckForNull
        public InternetDomainName publicSuffix()
        Returns the public suffix portion of the domain name, or null if no public suffix is present.
        Since:
        6.0
      • isUnderPublicSuffix

        public boolean isUnderPublicSuffix()
        Indicates whether this domain name ends in a public suffix, while not being a public suffix itself. For example, returns true for www.google.com, foo.co.uk and myblog.blogspot.com, but not for com, co.uk, google.invalid, or blogspot.com.

        This method can be used to determine whether it will probably be possible to set cookies on the domain, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.

        Since:
        6.0
      • isTopPrivateDomain

        public boolean isTopPrivateDomain()
        Indicates whether this domain name is composed of exactly one subdomain component followed by a public suffix. For example, returns true for google.com, foo.co.uk, and myblog.blogspot.com, but not for www.google.com, co.uk, or blogspot.com.

        This method can be used to determine whether a domain is probably the highest level for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.

        Since:
        6.0
      • topPrivateDomain

        public InternetDomainName topPrivateDomain()
        Returns the portion of this domain name that is one level beneath the public suffix. For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a public suffix. Similarly, for myblog.blogspot.com it returns the same domain, myblog.blogspot.com, since blogspot.com is a public suffix.

        If isTopPrivateDomain() is true, the current domain name instance is returned.

        This method can be used to determine the probable highest level parent domain for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls.

        Throws:
        java.lang.IllegalStateException - if this domain does not end with a public suffix
        Since:
        6.0
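
        Read together, the public-suffix methods above can all be expressed in terms of the index at which the public suffix begins. The following is a sketch under that reading, not the actual source: the class and method names are hypothetical stand-ins operating on a plain list of parts, and rejecting a bare public suffix in topPrivateDomain (because nothing lies beneath it) is an assumption.

```java
import java.util.List;

final class PublicSuffixPredicatesSketch {
    static final int NO_SUFFIX_FOUND = -1;

    static boolean hasPublicSuffix(int suffixIndex) {
        return suffixIndex != NO_SUFFIX_FOUND;
    }

    static boolean isPublicSuffix(int suffixIndex) {
        return suffixIndex == 0; // the whole name is the suffix
    }

    static boolean isUnderPublicSuffix(int suffixIndex) {
        return suffixIndex > 0;  // at least one part precedes the suffix
    }

    static boolean isTopPrivateDomain(int suffixIndex) {
        return suffixIndex == 1; // exactly one part precedes the suffix
    }

    /** Keeps one part to the left of the suffix and drops everything before it. */
    static String topPrivateDomain(List<String> parts, int suffixIndex) {
        if (!isUnderPublicSuffix(suffixIndex)) {
            throw new IllegalStateException("Not under a public suffix: " + String.join(".", parts));
        }
        return String.join(".", parts.subList(suffixIndex - 1, parts.size()));
    }
}
```

        With x.adwords.google.co.uk and a suffix index of 3 (the co part), the last method yields google.co.uk, matching the example in the description.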
      • isRegistrySuffix

        public boolean isRegistrySuffix()
        Indicates whether this domain name represents a registry suffix, as defined by a subset of the Mozilla Foundation's Public Suffix List (PSL). A registry suffix is one under which Internet users can directly register names via a domain name registrar, and have such registrations lawfully protected by internet-governing bodies such as ICANN. Examples of registry suffixes include com, co.uk, and pvt.k12.wy.us. Examples of domain names that are not registry suffixes include google.com and foo.co.uk.

        Registry suffixes are a proper subset of public suffixes. The list of public suffixes additionally contains privately owned domain names under which Internet users can register subdomains. An example of a public suffix that is not a registry suffix is blogspot.com. Note, however, that every public suffix ends in a registry suffix, since domain name registries collectively control all internet domain names.

        For considerations on whether the public suffix or registry suffix designation is more suitable for your application, see this article.

        Returns:
        true if this domain name appears exactly on the public suffix list as part of the registry suffix section (labelled "ICANN").
        Since:
        23.3
      • hasRegistrySuffix

        public boolean hasRegistrySuffix()
        Indicates whether this domain name ends in a registry suffix, including if it is a registry suffix itself. For example, returns true for www.google.com, foo.co.uk and com, but not for invalid or google.invalid.

        Note that this method is equivalent to hasPublicSuffix() because all registry suffixes are public suffixes and all public suffixes have registry suffixes.

        Since:
        23.3
      • registrySuffix

        @CheckForNull
        public InternetDomainName registrySuffix()
        Returns the registry suffix portion of the domain name, or null if no registry suffix is present.
        Since:
        23.3
      • isUnderRegistrySuffix

        public boolean isUnderRegistrySuffix()
        Indicates whether this domain name ends in a registry suffix, while not being a registry suffix itself. For example, returns true for www.google.com, foo.co.uk and blogspot.com, but not for com, co.uk, or google.invalid.
        Since:
        23.3
      • isTopDomainUnderRegistrySuffix

        public boolean isTopDomainUnderRegistrySuffix()
        Indicates whether this domain name is composed of exactly one subdomain component followed by a registry suffix. For example, returns true for google.com, foo.co.uk, and blogspot.com, but not for www.google.com, co.uk, or myblog.blogspot.com.

        Warning: This method should not be used to determine whether a domain is probably the highest level for which cookies may be set. Use isTopPrivateDomain() for that purpose.

        Since:
        23.3
      • topDomainUnderRegistrySuffix

        public InternetDomainName topDomainUnderRegistrySuffix()
        Returns the portion of this domain name that is one level beneath the registry suffix. For example, for x.adwords.google.co.uk it returns google.co.uk, since co.uk is a registry suffix. Similarly, for myblog.blogspot.com it returns blogspot.com, since com is a registry suffix.

        If isTopDomainUnderRegistrySuffix() is true, the current domain name instance is returned.

        Warning: This method should not be used to determine the probable highest level parent domain for which cookies may be set. Use topPrivateDomain() for that purpose.

        Throws:
        java.lang.IllegalStateException - if this domain does not end with a registry suffix
        Since:
        23.3
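
        The difference between the registry-suffix and public-suffix variants is easiest to see side by side. The sketch below uses made-up sample data rather than the real Public Suffix List, and TopDomainContrastSketch and topDomainUnder are hypothetical names; it is meant only to reproduce the examples in the descriptions above.

```java
import java.util.List;
import java.util.Set;

final class TopDomainContrastSketch {
    // Toy data: registry suffixes, and privately owned public suffixes.
    static final Set<String> REGISTRY = Set.of("com", "uk", "co.uk");
    static final Set<String> PRIVATE = Set.of("blogspot.com");

    /**
     * Returns the name one level beneath the leftmost matching suffix.
     * When registryOnly is true, privately owned suffixes are ignored.
     */
    static String topDomainUnder(String name, boolean registryOnly) {
        List<String> parts = List.of(name.split("\\."));
        for (int i = 0; i < parts.size(); i++) {
            String candidate = String.join(".", parts.subList(i, parts.size()));
            boolean isSuffix = REGISTRY.contains(candidate)
                || (!registryOnly && PRIVATE.contains(candidate));
            if (isSuffix) {
                if (i == 0) {
                    throw new IllegalStateException("Not under a suffix: " + name);
                }
                return String.join(".", parts.subList(i - 1, parts.size()));
            }
        }
        throw new IllegalStateException("No suffix found: " + name);
    }

    public static void main(String[] args) {
        System.out.println(topDomainUnder("myblog.blogspot.com", true));
        System.out.println(topDomainUnder("myblog.blogspot.com", false));
    }
}
```

        For myblog.blogspot.com the registry-only search stops at com, while the public-suffix search stops one part earlier at blogspot.com, which is why the two results differ.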
      • hasParent

        public boolean hasParent()
        Indicates whether this domain is composed of two or more parts.
      • parent

        public InternetDomainName parent()
        Returns an InternetDomainName that is the immediate ancestor of this one; that is, the current domain with the leftmost part removed. For example, the parent of www.google.com is google.com.
        Throws:
        java.lang.IllegalStateException - if the domain has no parent, as determined by hasParent()
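
        In terms of the parts list, hasParent() and parent() amount to a size check and dropping the leftmost element. A minimal sketch, using the hypothetical class ParentSketch and plain lists of strings rather than InternetDomainName instances:

```java
import java.util.List;

final class ParentSketch {
    /** A name has a parent when it consists of two or more parts. */
    static boolean hasParent(List<String> parts) {
        return parts.size() > 1;
    }

    /** Removes the leftmost part, or fails if there is nothing left to return. */
    static List<String> parent(List<String> parts) {
        if (!hasParent(parts)) {
            throw new IllegalStateException("Domain has no parent: " + String.join(".", parts));
        }
        return parts.subList(1, parts.size());
    }
}
```

        Checking hasParent() before calling parent() avoids the IllegalStateException for single-part names such as com.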
      • ancestor

        private InternetDomainName ancestor​(int levels)
        Returns the ancestor of the current domain at the given number of levels "higher" (rightward) in the subdomain list. The number of levels must be non-negative and at most N-1, where N is the number of parts in the domain.

        TODO: Reasonable candidate for addition to public API.

      • child

        public InternetDomainName child​(java.lang.String leftParts)
        Creates and returns a new InternetDomainName by prepending the argument and a dot to the current name. For example, InternetDomainName.from("foo.com").child("www.bar") returns a new InternetDomainName with the value www.bar.foo.com. Only lenient validation is performed, as described in from(String).
        Throws:
        java.lang.NullPointerException - if leftParts is null
        java.lang.IllegalArgumentException - if the resulting name is not valid
      • isValid

        public static boolean isValid​(java.lang.String name)
        Indicates whether the argument is a syntactically valid domain name using lenient validation. Specifically, validation against RFC 3490 ("Internationalizing Domain Names in Applications") is skipped.

        The following two code snippets are equivalent:

        
         domainName = InternetDomainName.isValid(name)
             ? InternetDomainName.from(name)
             : DEFAULT_DOMAIN;
         
        
         try {
           domainName = InternetDomainName.from(name);
         } catch (IllegalArgumentException e) {
           domainName = DEFAULT_DOMAIN;
         }
         
        Since:
        8.0 (previously named isValidLenient)
      • matchesWildcardSuffixType

        private static boolean matchesWildcardSuffixType​(Optional<PublicSuffixType> desiredType,
                                                         java.lang.String domain)
        Does the domain name match one of the "wildcard" patterns (e.g. "*.ar")? If a desiredType is specified, the wildcard pattern must also match that type.
      • matchesType

        private static boolean matchesType​(Optional<PublicSuffixType> desiredType,
                                           Optional<PublicSuffixType> actualType)
        If a desiredType is specified, returns true only if the actualType is identical. Otherwise, returns true as long as actualType is present.
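
        The type-matching rule, and its use for wildcard patterns, can be sketched as follows. This is illustrative only: it uses java.util.Optional and a stand-in enum in place of the library's own Optional and PublicSuffixType, the wildcard table is made-up sample data, and splitting the domain at its first dot is an assumption about how a pattern such as "*.ar" is applied.

```java
import java.util.Map;
import java.util.Optional;

final class SuffixTypeMatchSketch {
    enum SuffixType { PRIVATE, REGISTRY } // stand-in for PublicSuffixType

    // Toy table: the key "ar" represents the wildcard pattern "*.ar".
    static final Map<String, SuffixType> WILDCARD_PARENTS = Map.of("ar", SuffixType.REGISTRY);

    /** With a desired type, the actual type must be identical; otherwise any present type matches. */
    static boolean matchesType(Optional<SuffixType> desired, Optional<SuffixType> actual) {
        return desired.isPresent() ? desired.equals(actual) : actual.isPresent();
    }

    /** Checks whether everything after the first dot is the parent of a wildcard pattern. */
    static boolean matchesWildcard(Optional<SuffixType> desired, String domain) {
        int dot = domain.indexOf('.');
        if (dot < 0) {
            return false; // a single label cannot match "*.something"
        }
        String parent = domain.substring(dot + 1);
        return matchesType(desired, Optional.ofNullable(WILDCARD_PARENTS.get(parent)));
    }
}
```

        With this toy table, foo.ar matches when no type or the REGISTRY type is requested, but not when PRIVATE is requested.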
      • toString

        public java.lang.String toString()
        Returns the domain name, normalized to all lower case.
        Overrides:
        toString in class java.lang.Object
      • equals

        public boolean equals​(@CheckForNull
                              java.lang.Object object)
        Equality testing is based on the text supplied by the caller, after normalization as described in the class documentation. For example, a non-ASCII Unicode domain name and the Punycode version of the same domain name would not be considered equal.
        Overrides:
        equals in class java.lang.Object
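
        Because equality is purely textual, callers who want a Unicode name and its Punycode form to compare equal would need to bring both to a single form first. One possible approach, shown here as an assumption rather than a recommendation from this class, is to convert both strings with the JDK's java.net.IDN before comparing or constructing instances; TextualEqualitySketch is a hypothetical helper.

```java
import java.net.IDN;

final class TextualEqualitySketch {
    /** Compares two names after converting each to its ASCII (Punycode) form. */
    static boolean sameAfterAsciiConversion(String a, String b) {
        return IDN.toASCII(a).equalsIgnoreCase(IDN.toASCII(b));
    }

    public static void main(String[] args) {
        String unicode = "\u7f51\u7edc.cn";
        String ascii = IDN.toASCII(unicode);
        System.out.println(unicode.equals(ascii));                    // raw text differs
        System.out.println(sameAfterAsciiConversion(unicode, ascii)); // equal once converted
    }
}
```

        The raw strings differ, which is the situation the equals(Object) description warns about; after conversion they compare equal.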
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object