|
The Domain Name System (DNS) is a hierarchical naming system for computers, services, or any resource
participating in the Internet. It associates various information with domain names assigned to such
participants. Most importantly, it translates domain names meaningful to humans into the numerical
(binary) identifiers associated with networking equipment for the purpose of locating and addressing
these devices world-wide. An often used analogy to explain the Domain Name System is that it serves as
the "phone book" for the Internet by translating human-friendly computer hostnames into IP addresses. For
example, www.example.com translates to 208.77.188.166.
The Domain Name System makes it possible to assign domain names to
groups of Internet users in a meaningful way, independent of each user's
physical location. Because of this, World-Wide Web (WWW) hyperlinks and
Internet contact information can remain consistent and constant even if
the current Internet routing arrangements change or the participant uses
a mobile device. Internet domain names are easier to remember than IP
addresses such as 208.77.188.166 (IPv4) or
2001:db8:1f70::999:de8:7648:6e8 (IPv6). People take advantage of this
when they recite meaningful URLs and e-mail addresses without having to
know how the machine will actually locate them.
The Domain Name System distributes the responsibility of assigning
domain names and mapping those names to IP addresses by designating
authoritative name servers for each domain. Authoritative name servers
are assigned to be responsible for their particular domains, and in turn
can assign other authoritative name servers for their sub-domains. This
mechanism has made the DNS distributed and fault tolerant, and has
helped avoid the need for a single central register to be continually
consulted and updated.
In general, the Domain Name System also stores other types of
information, such as the list of mail servers that accept email for a
given Internet domain. By providing a world-wide, distributed
keyword-based redirection service, the Domain Name System is an
essential component of the functionality of the Internet.
Other identifiers, such as RFID tags, UPC codes, and international
characters in email addresses and host names, could all potentially
make use of the DNS.
The Domain Name System also defines the technical underpinnings of the
functionality of this database service. For this purpose it defines the
DNS protocol, a detailed specification of the data structures and
communication exchanges used in DNS, as part of the Internet Protocol
Suite (TCP/IP). The DNS protocol was developed and defined in the early
1980s and published by the Internet Engineering Task Force.
The practice of using a name as a more human-legible abstraction of a
machine's numerical address on the network predates even TCP/IP and
dates back to the ARPAnet era. The DNS itself was invented in 1983,
shortly after TCP/IP was deployed. Under the older system, each computer
on the network retrieved a file called HOSTS.TXT from a computer at SRI
(now SRI International). The HOSTS.TXT file mapped numerical addresses
to names. A hosts file still exists on most modern operating systems,
either by default or through configuration, and allows users to specify
an IP address (e.g. 208.77.188.166) to use for a hostname (e.g.
www.example.net) without checking DNS. Systems based on a hosts file
have inherent limitations,
because of the obvious requirement that every time a given computer's
address changed, every computer that seeks to communicate with it would
need an update to its hosts file.
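The hosts-file layout is simple enough to sketch in a few lines of Python. The example below is a minimal illustration, assuming the common format of one address followed by one or more names per line, with "#" starting a comment; the function name parse_hosts and the sample entries are hypothetical.

```python
def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Names are stored in lowercase; the first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table


# Sample content, made up for illustration.
SAMPLE = """
# address        names
127.0.0.1        localhost
208.77.188.166   www.example.net example.net
"""

hosts = parse_hosts(SAMPLE)
print(hosts["www.example.net"])
```

Every machine holding such a table needs its own copy updated whenever an address changes, which is the scaling problem described above.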
The growth of networking required a more scalable system that recorded a
change in a host's address in one place only. Other hosts would learn
about the change dynamically through a notification system, thus
completing a globally accessible network of all hosts' names and their
associated IP addresses.
At the request of Jon Postel, Paul Mockapetris invented the Domain Name
System in 1983 and wrote the first implementation. The original
specifications appear in RFC 882 and RFC 883. In November 1987, the
publication of RFC 1034 and RFC 1035 updated the DNS specification and
made RFC 882 and RFC 883 obsolete. Several more-recent RFCs have
proposed various extensions to the core DNS protocols.
In 1984, four Berkeley students (Douglas Terry, Mark Painter, David
Riggle and Songnian Zhou) wrote the first UNIX implementation, which was
maintained by Ralph Campbell thereafter. In 1985, Kevin Dunlap of DEC
significantly re-wrote the DNS implementation and renamed it BIND
(Berkeley Internet Name Domain). Mike Karels, Phil Almquist and Paul Vixie
have maintained BIND since then. BIND was ported to the Windows NT
platform in the early 1990s.
BIND was widely distributed, especially on Unix systems, and is the
dominant DNS software in use on the Internet. With the heavy use and
resulting scrutiny of its open-source code, as well as increasingly
sophisticated attack methods, many security flaws were discovered in
BIND. This contributed to the development of a number of alternative
nameserver and resolver programs. BIND itself was re-written from
scratch in version 9, which has a security record comparable to other
modern Internet software.
The domain name space consists of a tree of domain names. Each node or
leaf in the tree has zero or more resource records, which hold
information associated with the domain name. The tree sub-divides into
zones beginning at the root zone. A DNS zone consists of a collection of
connected nodes authoritatively served by an authoritative nameserver.
(Note that a single nameserver can host several zones.)
Administrative responsibility over any zone may be divided, thereby
creating additional zones. Authority is said to be delegated for a
portion of the old space, usually in the form of sub-domains, to another
nameserver and administrative entity. The old zone ceases to be
authoritative for the new zone.
A domain name usually consists of two or more parts (technically
labels), which are conventionally written separated by dots, such as
example.com.
The rightmost label conveys the top-level domain (for example, the
address www.example.com has the top-level domain com).
Each label to the left specifies a subdivision, or subdomain, of
the domain above it. Note: "subdomain" expresses relative dependence,
not absolute dependence. For example: example.com is a subdomain of the
com domain, and www.example.com is a subdomain of the domain
example.com. In theory, this subdivision can go down 127 levels. Each
label can contain up to 63 octets. The whole domain name may not exceed
a total length of 253 octets. In practice, some domain registries
may have shorter limits.
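The size limits just described can be checked mechanically. The following Python sketch assumes an ASCII name in dotted form and tests only the two limits named above (63 octets per label, 253 octets for the whole name); the function name check_domain_name is illustrative, and registry-specific rules are not modelled.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 253  # dotted text form, not counting a trailing dot


def check_domain_name(name):
    """Return a list of size-limit violations for a dotted ASCII name."""
    problems = []
    text = name[:-1] if name.endswith(".") else name
    if len(text.encode("ascii")) > MAX_NAME_OCTETS:
        problems.append("name longer than 253 octets")
    for label in text.split("."):
        size = len(label.encode("ascii"))
        if size == 0:
            problems.append("empty label")
        elif size > MAX_LABEL_OCTETS:
            problems.append("label longer than 63 octets")
    return problems


print(check_domain_name("www.example.com"))  # []
```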
A hostname refers to a domain name that has one or more associated
IP addresses; i.e., the 'www.example.com' and 'example.com' domains are
both hostnames, whereas the 'com' domain is not.
The Domain Name System is maintained by a distributed database system,
which uses the client-server model. The nodes of this database are the
name servers. Each domain or subdomain has one or more authoritative DNS
servers that publish information about that domain and the name servers
of any domains subordinate to it. The top of the hierarchy is served by
the root nameservers: the servers to query when looking up (resolving) a
top-level domain name (TLD).
The client-side of the DNS is called a DNS resolver. It is responsible
for initiating and sequencing the queries that ultimately lead to a full
resolution (translation) of the resource sought, e.g., translation of a
domain name into an IP address.
A DNS query may be either a recursive query or a non-recursive query.
A non-recursive query is one in which the DNS server may provide a
partial answer to the query (or give an error). A recursive query is one
where the DNS server will fully answer the query (or give an error). DNS
servers are not required to support recursive queries.
The resolver (or another DNS server acting recursively on behalf of the
resolver) negotiates use of recursive service using bits in the query
headers.
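The header bits involved are Recursion Desired (RD), set by the client in a query, and Recursion Available (RA), set by the server in a response. As a rough sketch, assuming the standard 12-octet DNS header layout, a query with the RD bit could be encoded in Python as follows; the fixed query ID and the function names are illustrative choices, and a real resolver would pick a random ID.

```python
import struct

FLAG_RD = 0x0100  # Recursion Desired, set by the client in a query
FLAG_RA = 0x0080  # Recursion Available, set by the server in a response


def build_query(name, qtype=1, recursion_desired=True, query_id=0x1234):
    """Encode a one-question DNS query for class IN (qtype 1 is an A record)."""
    flags = FLAG_RD if recursion_desired else 0
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Each label is sent as a length octet followed by its bytes.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


def recursion_available(response):
    """Report whether a response header has the RA bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & FLAG_RA)
```

A client that receives a response without the RA bit knows the server will not recurse on its behalf and must follow any referrals itself.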
Resolving usually entails iterating through several name servers to find
the needed information. However, some resolvers function simplistically
and can communicate only with a single name server. These simple
resolvers rely on a recursive query to a recursive name server to
perform the work of finding information for them.
In theory, a full host name may have several name segments (e.g.
ahost.ofasubnet.ofabiggernet.inadomain.example). In practice, full host
names will frequently consist of just three segments
(ahost.inadomain.example, and most often www.inadomain.example). For
querying purposes, software interprets the name segment by segment, from
right to left. At each step along the way, the program queries a
corresponding DNS server to provide a pointer to the next server which
it should consult.
A DNS recursor consults three nameservers to resolve the address
www.wikipedia.org.
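The right-to-left walk can be imitated with a toy, in-memory model. The Python sketch below involves no network traffic: the three servers, their referral tables and the address 192.0.2.10 are all invented for illustration, and a real resolver would send DNS queries at each step instead of reading a dictionary.

```python
# Hypothetical servers and data, for illustration only.
SERVERS = {
    "root": {"referrals": {"org.": "org-server"}, "answers": {}},
    "org-server": {
        "referrals": {"wikipedia.org.": "wikipedia-server"},
        "answers": {},
    },
    "wikipedia-server": {
        "referrals": {},
        "answers": {"www.wikipedia.org.": "192.0.2.10"},
    },
}


def resolve(name, start="root"):
    """Follow referrals from the starting server until one gives an answer."""
    if not name.endswith("."):
        name += "."
    server, path = start, []
    while True:
        path.append(server)
        data = SERVERS[server]
        if name in data["answers"]:
            return data["answers"][name], path
        # Use the referral whose zone is the longest suffix of the name.
        matches = [
            zone for zone in data["referrals"]
            if name == zone or name.endswith("." + zone)
        ]
        if not matches:
            raise LookupError(f"{name} not found at {server}")
        server = data["referrals"][max(matches, key=len)]


address, path = resolve("www.wikipedia.org")
print(address, path)
```

In this model the lookup visits three servers in turn: the root, the server for org, and the server for wikipedia.org.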
The mechanism in this simple form has a difficulty: it places a huge
operating burden on the root servers, with every search for an address
starting by querying one of them. Because the root servers are so
critical to the overall function of the system, such heavy use would
create an insurmountable bottleneck for the trillions of queries placed
every day. In practice, caching is used to overcome this problem, and
root nameservers actually deal with very little of the total traffic.
Name servers in delegations appear listed by name, rather than by IP
address. This means that a resolving name server must issue another DNS
request to find out the IP address of the server to which it has been
referred. Since this can introduce a circular dependency if the
nameserver referred to is under the domain for which it is authoritative,
it is occasionally necessary for the nameserver providing the delegation
to also provide the IP address of the next nameserver. This record is
called a glue record.
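Whether a delegation runs into this circular dependency can be decided from the names alone. A minimal Python sketch, assuming plain dotted names and using the hypothetical function name needs_glue:

```python
def needs_glue(zone, nameserver):
    """True when the nameserver's name lies at or below the delegated zone."""
    zone = zone.lower().rstrip(".")
    nameserver = nameserver.lower().rstrip(".")
    # Compare on a label boundary so "notexample.com" is not mistaken
    # for a name under "example.com".
    return nameserver == zone or nameserver.endswith("." + zone)


print(needs_glue("example.com", "ns1.example.com"))  # True
print(needs_glue("example.com", "ns1.example.net"))  # False
```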
DNS also supports wildcard DNS records that will match requests for
non-existent domain names. A wildcard DNS record is specified by using a
"*" as the leftmost label (part) of a domain name, e.g. *.example.com.
The exact rules for when a wildcard will match are specified in RFC
1034, but the rules are neither intuitive nor clearly specified. This
has resulted in incompatible implementations and unexpected results when
they are used.
Because of the huge volume of requests generated by a system like DNS,
the designers wished to provide a mechanism to reduce the load on
individual DNS servers. To this end, the DNS resolution process allows
for caching (i.e. the local recording and subsequent consultation of the
results of a DNS query) for a given period of time after a successful
answer. How long a resolver caches a DNS response (i.e. how long a DNS
response remains valid) is determined by a value called the time to live
(TTL). The TTL is set by the administrator of the DNS server handing out
the response. The period of validity may vary from just seconds to days
or even weeks.
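A resolver-side cache of this kind can be sketched as a dictionary whose entries carry an expiry time. The Python class below is an illustrative assumption, not how any particular resolver is implemented; the clock is injectable so that the behaviour can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Cache answers until their time to live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (expires_at, value)

    def put(self, name, rtype, value, ttl):
        """Store an answer that stays valid for ttl seconds."""
        self._entries[(name.lower(), rtype)] = (self._clock() + ttl, value)

    def get(self, name, rtype):
        """Return the cached answer, or None if absent or expired."""
        key = (name.lower(), rtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the caller must look it up again
            return None
        return value
```

For example, an answer stored with put("www.example.com", "A", "208.77.188.166", 300) is returned by get for the next 300 seconds and then disappears from the cache.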
As a noteworthy consequence of this distributed and caching
architecture, changes to DNS do not always take effect immediately and
globally. This is best explained with an example: If an administrator
has set a TTL of 6 hours for the host www.wikipedia.org, and then
changes the IP address to which www.wikipedia.org resolves at 12:01pm,
the administrator must consider that a person who cached a response with
the old IP address at 12:00 noon will not consult the DNS server again
until 6:00pm. The period between 12:01pm and 6:00pm in this example is
called caching time, which is best defined as a period of time that
begins when you make a change to a DNS record and ends after the maximum
amount of time specified by the TTL expires. This essentially leads to
an important logistical consideration when making changes to DNS: not
everyone is necessarily seeing the same thing you're seeing. RFC 1912
helps to convey basic rules for how to set the TTL.
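The arithmetic of the example above can be written out directly. In this Python sketch the calendar date is an arbitrary assumption; only the times of day and the 6-hour TTL come from the example.

```python
from datetime import datetime, timedelta

ttl = timedelta(hours=6)
cached_at = datetime(2024, 1, 1, 12, 0)   # a resolver caches the old address
changed_at = datetime(2024, 1, 1, 12, 1)  # the administrator changes the record

expires_at = cached_at + ttl              # the resolver asks again at this time
stale_for = expires_at - changed_at       # how long the old address is still used

print(expires_at.strftime("%H:%M"), stale_for)  # 18:00 5:59:00
```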
Note that the term "propagation", although very widely used in this
context, does not describe the effects of caching well. Specifically, it
implies that when you make a DNS change, it somehow spreads to all other
DNS servers (instead, other DNS servers check in with yours as needed),
and that you do not have control over the amount of time the record is
cached (you control the TTL values for all DNS records in your domain,
except your NS records and any authoritative DNS servers that use your
domain name).
Some resolvers may override TTL values, as the protocol supports caching
for up to 68 years or no caching at all. Negative caching (caching the
non-existence of records) is determined by the name servers
authoritative for a zone, which MUST include the Start of Authority
(SOA) record when reporting that no data of the requested type exists.
The minimum of the MINIMUM field of the SOA record and the TTL of the
SOA itself is used to establish the TTL for the negative answer (RFC
2308).
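Both figures can be checked with a little arithmetic. The sketch below assumes the largest TTL is 2^31 - 1 seconds, which is where the figure of 68 years comes from, and takes the negative-answer TTL as the smaller of the SOA record's own TTL and its MINIMUM field, as RFC 2308 describes; the function name negative_ttl is illustrative.

```python
MAX_TTL_SECONDS = 2**31 - 1  # largest TTL value, in seconds
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def negative_ttl(soa_ttl, soa_minimum):
    """TTL for caching a negative answer, following RFC 2308."""
    return min(soa_ttl, soa_minimum)


max_ttl_years = MAX_TTL_SECONDS / SECONDS_PER_YEAR
print(int(max_ttl_years))        # 68
print(negative_ttl(3600, 300))   # 300
```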
Many people incorrectly refer to a mysterious 48-hour or 72-hour
propagation time that follows a DNS change. When one changes the NS
records for one's domain or the IP addresses for hostnames of
authoritative DNS servers using one's domain (if any), there can be a
lengthy period of time before all DNS servers use the new information.
This is because those records are handled by the zone parent DNS servers
(for example, the .com DNS servers if your domain is example.com), which
typically cache those records for 48 hours. However, those DNS changes
will be immediately available for any DNS servers that do not have them
cached. And any DNS changes on your domain other than the NS records and
authoritative DNS server names can be nearly instantaneous, if you
choose for them to be (by lowering the TTL once or twice ahead of time,
and waiting until the old TTL expires before making the change).
The character set of a DNS label is modified US-ASCII. It is eight-bit
clean except for the values 0x41 to 0x5A (uppercase letters) and the
values 0x61 to 0x7A (lowercase letters). These are considered equivalent
ranges for the purpose of searching and matching, but their distinctions
are retained on the wire and in presentation, in support of possible
mixed-case English language trademarks encoded as domain names.
Therefore, out of the 256 possible values that an octet can contain,
only 230 are unique. In practice, only printable US-ASCII letters and
numbers are used, and sometimes a hyphen for internal punctuation.
Internationalization of the DNS protocols has been ongoing for 10 years
and has virtually no market presence thus far.
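The count of 230 follows from folding the 26 uppercase values onto their lowercase counterparts and leaving every other octet alone. A short Python sketch of that comparison rule, with illustrative function names:

```python
def fold_octet(value):
    """Map ASCII uppercase A-Z (0x41-0x5A) onto lowercase; leave the rest."""
    return value + 0x20 if 0x41 <= value <= 0x5A else value


def labels_equal(a, b):
    """Compare two labels (as bytes) for the purpose of matching."""
    return len(a) == len(b) and all(
        fold_octet(x) == fold_octet(y) for x, y in zip(a, b)
    )


distinct = len({fold_octet(v) for v in range(256)})
print(distinct)  # 230
print(labels_equal(b"Example", b"eXAMPLE"))  # True
```

Octets outside the two ASCII letter ranges are compared exactly, so values such as 0xC9 and 0xE9 remain distinct.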
Label case is supposed to be preserved when DNS data is cached and
forwarded, in support of possible trademarks. The internal data
structures that are universally used to support DNS caching, however,
keep only one copy of each label. For example, there will be only one
com TLD (top-level domain) label in a DNS cache, even though millions of
other domain names can be stored "under" that TLD. The net effect here
is that if the first .com domain you encounter uses all uppercase
letters for its TLD domain label, then all other .com domains you
encounter will appear the same way. Therefore, if you cache VIX.COM
first, then you will cache example.COM, even if you really did hear
example.com next.
The trailing period (.) of a fully qualified domain name can be omitted
in presentation, which can mean either that the name is not fully
qualified and has to be searched in the default context or that it is
fully qualified. For example, if you are inside ACM world headquarters
and point your Web browser at internal, your resolver library will
likely assume that you mean internal.hq.acm.org. Disambiguation is
either by application convention or by actual searching (maybe you
really meant internal.acm.org). One common application convention is,
"If there are other period characters in the domain name, then assume
that the name is fully qualified." (So, if you were in the San Francisco
office, your resolver library might assume that internal means
internal.acm.org but it would never guess that internal.hq means
internal.hq.acm.org.)
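The convention quoted above can be sketched as a small function that lists the names a resolver library might try. The search list passed in below is a hypothetical local configuration, and real resolver libraries differ in the details.

```python
def candidate_names(name, search_list):
    """List the fully qualified names to try, in order."""
    if name.endswith("."):
        return [name]            # trailing period: explicitly fully qualified
    if "." in name:
        return [name + "."]      # contains a period: assume fully qualified
    # A single label is searched in each configured domain in turn.
    return [f"{name}.{domain}." for domain in search_list]


print(candidate_names("internal", ["hq.acm.org", "acm.org"]))
print(candidate_names("internal.hq", ["acm.org"]))
```

With the search list shown, the single label internal is tried as internal.hq.acm.org. and then internal.acm.org., while internal.hq is taken as already complete and never expanded.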
RRs used to describe downward delegations must be present both at the
bottom edge of the parent (delegating) zone and at the apex of the child
(delegated) zone. These records are expected to be identical, but
differences are common and the meaning of such differences is undefined.
The system is very robust in the face of this and other undefined
conditions, and protocol agents are prepared to retry pretty hard (and
try every possible data path) before giving up. (Thus are local
configuration errors transformed into silent resource drains on the
world at large.)
RRs are stored and transmitted in semi-atomic sets (<name, class, type>
-> {data}). If a message cannot contain a full RR set, then the
transmitting agent can indicate that truncation has occurred. Receiving
agents are supposed to choose whether the omitted data warrants a new
transaction, but in practice, truncation always leads to a new
transaction since receivers literally don't know what they're missing.
The retry-after-truncation transaction will most likely use a more
expensive transport protocol (TCP vs. UDP).
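Truncation is signalled by the TC bit in the message header. A minimal Python sketch of the receiving side, assuming the standard 12-octet header and using illustrative function names:

```python
import struct

FLAG_TC = 0x0200  # TrunCation bit in the header flags field


def is_truncated(response):
    """True when the responder signalled that the message was cut short."""
    if len(response) < 12:
        raise ValueError("a DNS message header is 12 octets long")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & FLAG_TC)


def next_transport(response):
    """Pick the transport for a follow-up transaction, or None if not needed."""
    return "tcp" if is_truncated(response) else None
```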
RR sets expire from caches atomically, yet each resource record has its
own TTL. The implication is that the lowest TTL governs expiration, yet
these separate TTLs are maintained during transmission and storage. TTL
can be clamped to an implementation-specific value to help manage cache
size. Records received with TTL 0 expire at the conclusion of the
current transaction even if that transaction takes considerable total
time, possibly because of the need to fetch other data from other
servers to gather all the bits and pieces necessary for completion.
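The expiry rule for a set can be expressed in one line. In the Python sketch below, the clamp of 86400 seconds is an arbitrary assumption standing in for the implementation-specific value, and the function name rrset_expiry is illustrative.

```python
def rrset_expiry(received_at, ttls, max_ttl=86400):
    """Time (in seconds) at which a cached RR set expires as a unit."""
    effective = min(min(ttls), max_ttl)  # lowest TTL governs, then clamp
    return received_at + effective


print(rrset_expiry(1000, [300, 60, 3600]))  # 1060
```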
Every RR has a class, yet the protocol specification is unclear as to
whether the class is part of the tuple designator for an RR set or
whether it is, like the rdata field, part of each record's payload.
Modern software interprets class as a zone qualifier, such that every
class can have its own namespace, and all records within any zone (and
therefore within any RR set) will have the same class. Other
interpretations are supported by scripture, but no current
implementation works any other way.
The question section of a query message is allowed to contain more than
one tuple, but this is undefined by the protocol and
is universally unimplemented. Modern convention requires the question
section of a response message to contain a copy of the question section
from the query. Implementers argue that the presence of the question
inside a response helps to disambiguate responses when transaction rates
are very high. This topic has been heavily debated within the DNS
standards community.
A response can contain an additional data section that carries
information that wasn't requested but might, according to the responder,
be helpful in interpreting or consuming the answer. This data is
optional, and its absence can be the direct cause of subsequent
transactions. Its presence is often ignored, however, since this
optional data is by nature less secure than the answer that was actually
requested.
Two zones having different parents (msn.net and aol.com), whose name
servers are each inside the other zone (so, msn.net's name server is
ns.aol.com, and aol.com's name server is ns.msn.net), would be
unreachable. This is because an additional data record in a delegation
from the com zone for aol.com, which included the ns.msn.net address as
additional data, would be ignored as untrustworthy; likewise for a
delegation from net for msn.net containing ns.aol.com's address as
additional data. There is no warning message when you do this, unless
you count your pager's incessant beeping as a warning message (since
that's what would happen next).
This information was gleaned from 20 years of implementation history
inside BIND (Berkeley Internet Name Domain). Most of it is not written
down anywhere, and some of it would still be considered arguable if you
got two or three DNS implementers in a room to talk about it.
Users generally do not communicate directly with a DNS resolver. Instead,
DNS resolution takes place transparently in client applications such as
web browsers, mail clients, and other Internet applications. When an
application makes a request which requires a DNS lookup, such programs
send a resolution request to the local DNS resolver in the local
operating system, which in turn handles the communications required.
The DNS resolver will almost invariably have a cache (see above)
containing recent lookups. If the cache can provide the answer to the
request, the resolver will return the value in the cache to the program
that made the request. If the cache does not contain the answer, the
resolver will send the request to one or more designated DNS servers. In
the case of most home users, the Internet service provider to which the
machine connects will usually supply this DNS server: such a user will
either have configured that server's address manually or allowed DHCP to
set it; however, where systems administrators have configured systems to
use their own DNS servers, their DNS resolvers point to separately
maintained nameservers of the organization. In any event, the name
server thus queried will follow the process outlined above, until it
either successfully finds a result or does not. It then returns its
results to the DNS resolver; assuming it has found a result, the
resolver duly caches that result for future use, and hands the result
back to the software which initiated the request.
An additional level of complexity emerges when resolvers violate the
rules of the DNS protocol. A number of large ISPs have configured their
DNS servers to violate rules (presumably to allow them to run on
less-expensive hardware than a fully-compliant resolver), such as by
disobeying TTLs, or by indicating that a domain name does not exist just
because one of its name servers does not respond.
As a final level of complexity, some applications (such as web-browsers)
also have their own DNS cache, in order to reduce the use of the DNS
resolver library itself. This practice can add extra difficulty when
debugging DNS issues, as it obscures the freshness of data, and/or what
data comes from which cache. These caches typically use very short
caching times -- on the order of one minute. Internet Explorer offers a
notable exception: recent versions cache DNS records for half an
hour.
A Resource Record (RR) is the basic data element in the domain name
system. Each record has a type (A, MX, etc.), a TTL, a class and some
type-specific information. All resource records with the same name,
class and type form a Resource Record Set (RR set). The order in which
the resource records of an RR set are returned by the resolver to an
application is undefined (the server typically uses round-robin DNS).
DNSSEC, however, works on
complete RR sets in a canonical order.
The NAME is the fully qualified domain name of the node in the tree. On
the wire, the name may be shortened using label compression where ends
of domain names mentioned earlier in the packet can be substituted for
the end of the current domain name.
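On the wire, a compressed name ends in a two-octet pointer whose top two bits are set and whose remaining 14 bits give the offset of an earlier name in the same message. The Python sketch below decodes such names; the function name read_name and the hand-built sample message are illustrative, and the hop limit is a simple guard against pointer loops rather than a rule taken from the specification.

```python
def read_name(message, offset):
    """Decode the name at offset; return (name, offset just past it)."""
    labels = []
    end = None   # where parsing resumes after the name in its original place
    hops = 0
    while True:
        length = message[offset]
        if (length & 0xC0) == 0xC0:
            # Compression pointer: 14-bit offset of an earlier name.
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | message[offset + 1]
            hops += 1
            if hops > len(message):
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:
            break  # the zero-length root label ends the name
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    if end is None:
        end = offset
    return ".".join(labels) + ".", end


# "example.com." at offset 0, then "www" followed by a pointer to offset 0.
sample = b"\x07example\x03com\x00" + b"\x03www\xc0\x00"
print(read_name(sample, 0))
print(read_name(sample, 13))
```

The second name occupies only six octets in the message, yet decodes to the full www.example.com.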
The TYPE of the record indicates what the format of the data is, and
gives a hint of its intended use; for instance, the A record is used to
translate from a domain name to an IPv4 address, the NS record lists
which name servers can answer lookups on a DNS zone, and the MX record
is used to translate from a name in the right-hand side of an e-mail
address to the name of a machine able to handle mail for that address.
The RDATA is type-specific information, such as the actual IP address
for A records, or the mail host for MX records. Well known record types
may use label compression in the RDATA field, but "unknown" record types
cannot (see RFC 3597).
The CLASS of a record is almost always set to "IN" or "Internet". There
are also the very rarely used "CH" (Chaos) and "HS" (Hesiod) classes. In
theory, each class can be a completely independent tree with different
zone delegations and different names, but in practice they have all
mirrored the Internet class.
In addition to resource records defined in a zone file, there are also
some pseudo record types that are used only on the wire, such as to
perform zone transfers (AXFR/IXFR) or for EDNS (OPT).
While domain names technically have no restrictions on the characters
they use and can include non-ASCII characters, the same is not true for
host names. Host names are the names most people see and use for things
like e-mail and web browsing. Host names are restricted to a small
subset of the ASCII character set known as LDH: the Letters A-Z in upper
and lower case, Digits 0-9, Hyphen, and the dot to separate LDH-labels;
see RFC 3696 section 2 for details. This prevented the representation of
names and words of many languages natively. ICANN has approved the
Punycode-based IDNA system, which maps Unicode strings into the valid
DNS character set, as a workaround to this issue. Some registries have
adopted IDNA.
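Both halves of this arrangement can be tried out in Python. The sketch below checks the LDH rule with a regular expression (including the RFC 3696 restriction that a label may not begin or end with a hyphen) and converts a Unicode name with Python's built-in idna codec, which implements the older IDNA 2003 rules; the function names and the sample name are illustrative.

```python
import re

# One LDH label: letters, digits and hyphens, with no hyphen at either end.
LDH_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_hostname(name):
    """True when every dot-separated label uses only LDH characters."""
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


def to_ascii(name):
    """Convert a Unicode name to its ASCII form using the idna codec."""
    return name.encode("idna").decode("ascii")


unicode_name = "b\u00fccher.example"  # "bücher.example"
print(is_ldh_hostname(unicode_name))  # False
print(to_ascii(unicode_name))         # xn--bcher-kva.example
```

The converted name consists only of LDH characters, so it can be stored and looked up like any other host name.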
DNS was not originally designed with security in mind, and thus has a
number of security issues.
One class of vulnerabilities is DNS cache poisoning, which tricks a DNS
server into believing it has received authentic information when, in
reality, it has not.
DNS responses are traditionally not cryptographically signed, leading to
many attack possibilities. The Domain Name System Security Extensions
(DNSSEC) modify DNS to add support for cryptographically signed
responses. There are various extensions to support securing zone
transfer information as well.
Even with such cryptographic protection, a DNS server could become
compromised by a virus (or for that matter a disgruntled employee) that
would cause IP addresses served by that server to be redirected to a
malicious address with a long TTL. This could have a far-reaching impact
on potentially millions of Internet users if busy DNS servers cache the
bad IP data. Recovery would require manual purging of all affected DNS
caches, because the long TTL (up to 68 years) would otherwise keep the
bad data in place.
Some domain names can spoof other, similar-looking domain names. For
example, "paypal.com" and "paypa1.com" are different names, yet users
may be unable to tell the difference when the user's typeface (font)
does not clearly differentiate the letter l and the numeral 1. This
problem is much more serious in systems that support internationalized
domain names, since many characters that are different, from the point
of view of ISO 10646, appear identical on typical computer screens. This
vulnerability is often exploited in phishing.
Techniques such as Forward Confirmed reverse DNS can also be used to
help validate DNS results.
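Forward-confirmed reverse DNS checks that the name found by a reverse (PTR) lookup of an address resolves forward to that same address. The Python sketch below keeps the logic separate from the network: the two lookup functions are supplied by the caller and are assumptions of this example, as are the sample name and the 192.0.2.x addresses.

```python
import ipaddress


def reverse_name(address):
    """Name that is queried for the PTR record of an address."""
    return ipaddress.ip_address(address).reverse_pointer


def forward_confirmed(address, lookup_ptr, lookup_addresses):
    """True when some PTR name for the address resolves back to it.

    lookup_ptr(address) and lookup_addresses(name) are caller-supplied
    functions that each return a list of strings.
    """
    target = ipaddress.ip_address(address)
    for name in lookup_ptr(address):
        for candidate in lookup_addresses(name):
            if ipaddress.ip_address(candidate) == target:
                return True
    return False


# Made-up lookup tables standing in for real DNS queries.
PTR = {"192.0.2.1": ["mail.example.com"], "192.0.2.9": ["mail.example.com"]}
FORWARD = {"mail.example.com": ["192.0.2.1"]}

print(reverse_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
print(forward_confirmed("192.0.2.1", lambda a: PTR.get(a, []), lambda n: FORWARD.get(n, [])))
```

In the sample data, 192.0.2.9 also claims the name mail.example.com, but the name does not resolve back to it, so the check fails for that address.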
The right to use a domain name is delegated by domain name registrars
which are accredited by the Internet Corporation for Assigned Names and
Numbers (ICANN), the organization charged with overseeing the name and
number systems of the Internet. In addition to ICANN, each top-level
domain (TLD) is maintained and serviced technically by a sponsoring
organization, the TLD Registry. The registry is responsible for
maintaining the database of names registered within the TLD it
administers. The registry receives registration information from each
domain name registrar authorized to assign names in the corresponding
TLD and publishes the information using a special service, the whois
protocol.
Registrars usually charge an annual fee for the service of delegating a
domain name to a user and providing a default set of name servers. Often
this transaction is termed a sale or lease of the domain name, and the
registrant is called an "owner", but no such legal relationship is
actually associated with the transaction, only the exclusive right to
use the domain name. More correctly, authorized users are known as
"registrants" or as "domain holders".
ICANN publishes a complete list of TLD registries and domain name
registrars in the world. One can obtain information about the registrant
of a domain name by looking in the WHOIS database held by many domain
registries.
For most of the more than 240 country code top-level domains (ccTLDs),
the domain registries hold the authoritative WHOIS (Registrant, name
servers, expiration dates, etc.). For instance, DENIC, the German NIC,
holds the authoritative WHOIS to a .DE domain name. Since about 2001,
most gTLD registries (.ORG, .BIZ, .INFO) have adopted this so-called
"thick" registry approach, i.e. keeping the authoritative WHOIS in the
central registries instead of the registrars.
For .COM and .NET domain names, a "thin" registry is used: the domain
registry (e.g. VeriSign) holds a basic WHOIS (registrar and name
servers, etc.). One can find the detailed WHOIS (registrant, name
servers, expiry dates, etc.) at the registrars.
Some domain name registries, also called Network Information Centres
(NIC), also function as registrars, and deal directly with end users.
But most of the main ones, such as for .COM, .NET, .ORG, .INFO, etc.,
use a registry-registrar model. There are hundreds of Domain Name
Registrars that actually perform the domain name registration with the
end user (see lists at ICANN or VeriSign). By using this method of
distribution, the registry only has to manage the relationship with the
registrar, and the registrar maintains the relationship with the end
users, or 'registrants' -- in some cases through additional layers of
resellers.
In the process of registering a domain name and maintaining authority
over the new name space created, registrars store and use several key
pieces of information connected with a domain:
* Administrative contact. A registrant usually designates an
administrative contact to manage the domain name. The administrative
contact usually has the highest level of control over a domain.
Management functions delegated to the administrative contacts may
include management of all business information, such as name of record,
postal address, and contact information of the official registrant of
the domain and the obligation to conform to the requirements of the
domain registry in order to retain the right to use a domain name.
Furthermore the administrative contact installs additional contact
information for technical and billing functions.
* Technical contact. The technical contact manages the name servers
of a domain name. The functions of a technical contact include assuring
conformance of the configurations of the domain name with the
requirements of the domain registry, maintaining the domain zone
records, and providing continuous functionality of the name servers
(that leads to the accessibility of the domain name).
* Billing contact. The party responsible for receiving billing
invoices from the domain name registrar and paying applicable fees.
* Name servers. Most registrars provide two or more name servers as
part of the registration service. However, a registrant may specify its
own authoritative name servers to host a domain's resource records. The
registrar's policies govern the number of servers and the type of server
information required. Some providers require both a hostname and the
corresponding IP address, while others require just the hostname, which
must then be resolvable either within the new domain or elsewhere. Based
on traditional requirements (RFC 1034), typically a minimum of two
servers is required.
|