About This Paper
A first draft of this paper was released for review by members of the
CNI Access Management list on March 28, 1998 and generated a great
deal of electronic discussion within the closed CNI-AUTHENTICATE
mailing list. This was followed by a meeting in Washington DC on April
5, 1998 to review and discuss the draft paper and comments generated
on the list up to that date. The revision has also benefited from
discussions at a Digital Library Federation/National Science
Foundation Workshop held in Washington on April 6, 1998 on closely
related issues. My thanks to all who contributed.
This version, which incorporates many of the ideas from this process, is
being prepared for distribution at the Spring CNI Task Force meeting
in Washington DC, April 14-15; it is also being placed on the CNI web
site (www.cni.org) for wider dissemination. Note that in some places
time did not permit
me to fully incorporate earlier comments or to research questions that
were identified, and I have tried to indicate where changes will be
made prior to the preparation of the final version. The paper also
still needs some considerable editorial work, and I ask readers to be
forgiving of editorial problems. Comments are invited and should be
sent to <cliff@cni.org>.
About 10 May, 1998, I will prepare a final version of the white paper
which will be placed on the CNI web site.
1.0 Introduction
As institutions implement networked information strategies which call for
sharing and licensing access to information resources in the networked
environment, authentication and access management have emerged as
major issues which threaten to impede progress. While considerable
work has been done over the last two decades on authentication within
institutions and, more recently, in support of consumer-oriented
electronic commerce on the Internet, a series of new technical and
policy issues emerge in the cross-organizational authentication and
access management context. This white paper, which is being prepared
by the Coalition for Networked Information in conjunction with a large
group of volunteer reviewers and contributors, is intended to serve
several purposes:
- To identify and scope the new issues that emerge
in the cross-organizational setting and to provide a framework for
analyzing them.
- To map out the various best-practice approaches
to solving these problems using existing and emerging technology so
that institutions and information providers can make informed
choices among the alternatives and consider how these choices relate
to institutional authentication and access management strategies.
- To provide a common vocabulary and framework to
assist in the development of licensing and resource-sharing
agreements, and to highlight technical and policy considerations
that need to be addressed as part of these business negotiations.
- To lay the foundation for possible follow-on
formal or de facto community standards development in access
management. If large scale use of networked information resources is
to flourish, we need to move away from the specialized case-by-case
access management systems in use today and towards a small number of
general approaches which will let institutionally-based access
management infrastructures interoperate with arbitrary resources.
2.0 Defining The Cross-Organizational Access Management Problem
The basic cross-organizational access management problem is exemplified by
most licensing agreements for networked information resources today;
it also arises in situations where institutions agree to share
limited-access resources with other institutions as part of consortia
or other resource sharing collaborations. In such an agreement, an
institution -- a university, a school, a public library, a corporation
-- defines a user community which has access to some network resource.
This community is typically large, numbering perhaps in the tens of
thousands of individuals, and membership may be volatile over time,
reflecting for example the characteristics of a student body. The
operator of the network resource, which may be a web site or a resource
reached by other protocols such as Telnet terminal emulation or the
Z39.50 information retrieval protocol, needs to decide whether users
seeking access to the resource are actually members of the user
community that the licensee institution defined as part of the license
agreement.
Note that the issue here is not how the licensee defines the user community
-- for example how a university might define students, staff members
and faculty (all of the problems about alumni, part time and extension
students, adjunct faculty, affiliated medical staff and the like); it
is assumed that the institution and the resource operator have reached
some satisfactory resolution on this question. Rather, the issue is
one of testing or verifying that individuals are really members of
this community according to pre-agreed criteria, of having the
institution vouch for or credential the individuals in some way that
the resource operator can understand. Such arrangements are often
called "site" licenses, but this term is really inaccurate; while
physical presence at a specific site may be one criterion for having
access, a better term is "group" license or "community" license,
emphasizing that the key consideration is membership in some
community, and that physical location is often not the key membership
criterion.
Progress in inter-organizational access management will benefit
everyone. To the extent that resource operators and licensing
institutions can agree on common methods for performing this
authentication and access management function, it greatly facilitates
both licensing and resource sharing by making it quick, easy and
inexpensive to implement business arrangements. It benefits users by
making their navigation through a network of resources provided by
different operators more seamless and less cumbersome. The central
challenge of cross-institutional access management is not to set up
barriers to access; it is to facilitate access in a responsible
fashion, recognizing the needs of all parties involved in the access
arrangements.
While this white paper will give some particular emphasis to issues that
arise in the higher education and library communities (particularly at
the policy level), the problem under consideration here is very
general, and in fact also occurs in corporate licensing of
networked information services and in cooperation among business
partners.
As we will see in the next section, there are not only questions about how
best to accomplish this technically; there is also a series of
intertwined policy and management considerations which need to be
addressed.
The focus here is on group licenses that may be subject to some additional
constraints (for example concurrent user limits) rather than on
transactional models where individual users may take actions to incur
specific incremental costs back to the licensing institution over and
above base community licensing costs. Any incremental cost
transactional model will need to incorporate at least two additional
features: a set of user constraints that become part of the attributes
for each authenticated user and which are made available to the
resource operator, and a means by which the resource operator can
obtain permission for transactions by passing a query back to the
licensing institution. This involves a much more complex trust,
liability and business relationship between resource operator and
licensing institution, as well as consideration of financial controls
and a careful assessment of security threats. It will not be
considered further here.
Note that there are several other cross-organizational authentication,
authorization and access management issues which are beyond the scope
of this paper, including the authentication of service providers and
verifying the integrity and provenance of information retrieved from
networked resources.
2.1 Terminology and Definitions
Throughout the rest of this paper we'll use the general terms
"resource operator" to cover publishers, web site operators, and other
content providers (including libraries and universities in their roles
as providers of content), and "licensee institution" to cover
organizations such as universities or public libraries that arrange
for access to resources on behalf of their user communities.
Authentication and authorization have very specific meanings, though
in practice the two processes are often confounded and not clearly
distinguished. We will use the term "access
management" to describe broader systems that may make use of both
authentication and authorization services in order to control use of a
networked resource.
Authentication is the process where a network user establishes a right
to an identity -- in essence, the right to use a name. There are a
large number of techniques that may be used to authenticate a user --
passwords, biometric techniques, smart cards, certificates. Note that
names need not correspond to the usual names of individuals in the
physical world. A user may have the rights to use more than one name:
we view this as a central philosophical assumption in the
cross-organizational environment. There is a scope or authority
problem associated with names; in essence, when a user is authorized
to use an identity this is a statement that some organization has
accepted the user's right to that name. For authorization within
an institution this issue often isn't important, and in some schemes a
user may only have a single identity; for cross-organizational
applications such as those of interest here, this relativistic
character of identity is of critical importance. A user may have
rights to use identities established by multiple organizations (such
as universities and scholarly societies) and more than one identity
may figure in an access management decision. Users may have to decide
what identity to present to a resource: they may have access because
they are a member of a specific university's community, or a member of
a specific scholarly society, for example. Making these choices will
be a considerable burden on users, much like trying to shop for the
best discount rate on a service that offers varying discounts to
different membership and affinity groups (corporate rate, senior
citizen rate, weekly rate, government rate, etc.).
A single, network-wide (not merely institution wide) access management
authority would simplify many processes by allowing rights assigned to
an individual by different organizations to become attributes of a
master name rather than having them embodied in different names
authorized by different organizations; yet such a centralized identity
system probably represents an unacceptable concentration of power, as
well as being technically impractical at the scale we will ultimately
need. It should be noted that within the UK Athens project we can see
a model of a rather centralized authorization system which has been
scaled successfully to quite a large number of users, and which by
virtue of its centralized nature has allowed rapid progress in wide
access to networked information. The Athens experience and the factors
-- technical, social, cultural, and legal -- that have enabled it to
work in the UK call for very careful study as we consider approaches
for other nations such as the US.
A name or identity has attributes associated with it. These may be
demographic in nature -- for example, this identity signifying a
faculty member in engineering, or signifying a student enrolled in a
specific course -- or they may capture permissions to use resources.
Attributes may be bound closely to a name (for example, in a
certificate payload) or they may be stored in a directory or other
demographic database under a key corresponding to the name. Attributes
may change over time; for example, from semester to semester the set
of courses that a given identity is associated with may well change.
Just because some system on a network has knowledge of a name does not
necessarily imply that it has access to attributes associated with
that name. There is a fine line between rights to names
(authentication) and attributes; for some purposes, simply knowing
that a user has a right to a name from a given authorizing authority
may itself represent sufficient information (an implicit attribute, if
one wishes) that can support access management decisions.
Authorization is the process of determining whether an identity (plus
a set of attributes associated with that identity) is permitted to
perform some action, such as accessing a resource. Note that
permission to perform an action does not guarantee that the action can
be performed; for example, a common practice in cross-organizational
licensing is to further limit access to a maximum number of concurrent
users from among an authorized user community.
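As an illustration only, the distinction drawn above between being authorized and actually being able to act can be sketched in a few lines of Python. The class names, the `max_concurrent` parameter, and the authority names are hypothetical and are not taken from any system discussed in this paper; the sketch assumes that authentication has already taken place elsewhere.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """A name whose use has already been authenticated by some authority."""
    name: str
    authority: str                          # organization vouching for the name
    attributes: frozenset = frozenset()     # e.g. {"role:faculty"}


@dataclass
class LicensedResource:
    """Hypothetical resource covered by a community license."""
    licensed_authorities: set               # authorities named in the license
    required_attribute: Optional[str] = None
    max_concurrent: int = 2                 # assumed concurrent-user limit
    active: set = field(default_factory=set)

    def is_authorized(self, who: Identity) -> bool:
        """Authorization: is this identity (plus attributes) permitted?"""
        if who.authority not in self.licensed_authorities:
            return False
        if self.required_attribute is None:
            return True
        return self.required_attribute in who.attributes

    def open_session(self, who: Identity) -> bool:
        """Permission does not guarantee access: the concurrency cap may block it."""
        if not self.is_authorized(who):
            return False
        if who in self.active:
            return True                     # already counted against the cap
        if len(self.active) >= self.max_concurrent:
            return False                    # authorized, but no free slot
        self.active.add(who)
        return True

    def close_session(self, who: Identity) -> None:
        """Release the slot held by this identity, if any."""
        self.active.discard(who)
```

In this sketch an identity can pass `is_authorized` and still be refused by `open_session`, which mirrors the concurrent-user case described in the text.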
Note that authentication and authorization decisions can be made at
different points, by different organizations.
Some libraries are establishing consortia which involve reciprocal
borrowing and user-initiated interlibrary loan services; in a real
sense these consortia are developing what amounts to a union or
distributed shared patron file. One can view this as moving beyond
just common authentication and access management to a system of shared
access to a common directory structure for user attributes, and a
common definition of user attributes among the consortium members.
This is an example of a situation where very rich attributes are
available to each participant in the consortium as they make
authorization decisions; interlibrary loan and reciprocal borrowing
represent a much richer and more nuanced set of actions than would be
typical of a networked information resource.
A subsection on models for access management,
discussing the locus of authorization decisions and trust
relationships between the resource operator and licensing
institution, will probably be added here in the next revision.
3.0 Evaluation and Analysis Criteria
We will be examining a number of different proposed solutions to the access
management problem. Before describing and analyzing these proposed
solutions, this section considers the various requirements that a
viable solution needs to address. Obviously, there are trade-offs
which will need to be made among the conflicting goals in the context
of each specific resource access arrangement, and institutions will
have to make policy choices about the relative importance of the
various requirements.
3.1 Feasibility and Deployability
First and foremost, the authentication and access management solution needs
to work at a practical level. From the user's perspective, it should
facilitate access, minimizing redundant authentication interactions
and providing a single sign-on, user-friendly view of the array of
available networked information resources. It needs to scale; it must
be feasible for institutions to deploy and manage for large and
dynamic populations of community members. It needs to be sufficiently
robust and simple so that user support issues are tractable; for
example, a forgotten password should not be an intractable problem. It
needs to be affordable.
From the resource operator viewpoint, a viable access management system
should not require a vast amount of ongoing production and
maintenance. Configuration to add a new licensing institution should
be simple, and ongoing maintenance of that configuration should not
call for large amounts of information to be interchanged between
resource operator and licensing institution on an ongoing basis (such
as file updates). Adding an institution should require only software
parameter changes, not new software. There should be a
clean, simple, and well-defined (standard) interface between resource
operator and licensing institution. A systems or network failure at
one institution should not degrade a resource operator's service to
other licensing institutions.
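The operator-side requirements above (new institutions added by parameters alone, and failures isolated per institution) can be sketched as follows. This is a minimal illustration under stated assumptions: the registry layout, the `verifier` callable standing in for whatever interface the operator and institution agree on, and the function names are all hypothetical.

```python
from typing import Callable, Dict


def add_institution(registry: Dict[str, dict], name: str,
                    verifier: Callable[[str], bool],
                    max_concurrent: int) -> None:
    """Register a licensee using parameters only; no new code is required."""
    registry[name] = {"verifier": verifier, "max_concurrent": max_concurrent}


def verify_user(registry: Dict[str, dict], institution: str,
                credential: str) -> bool:
    """Ask the named institution's verifier whether the credential is valid."""
    entry = registry.get(institution)
    if entry is None:
        return False                        # unknown institution
    try:
        return bool(entry["verifier"](credential))
    except Exception:
        # A failure at one institution affects only that institution's users;
        # other entries in the registry are untouched.
        return False
```

Treating a verifier failure as a refusal for that institution only is one possible design choice; an operator might instead prefer to retry or to fall back to cached decisions.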
Practical solutions are inextricably linked to the installed base of
software. Ideally, all of the software needed to implement an
authentication and access management solution should be available
either commercially or as free software. Good solutions will build
on the installed technology base, and also on current investments in
upgrading that technology base; they should not be specific to
libraries or even to higher education if possible, at a mechanism
level (though libraries or higher educational institutions may use
these mechanisms in conjunction with policies that vary from those
common in the corporate or consumer markets). Most importantly, the
software support that end users require should be available in common
packages -- such as web browsers -- that are already part of the
installed base. Any solution that requires custom specialized software
to be installed on every potential user's desktop machine starts with
a severe handicap. Similarly, any solution requiring specialized
hardware, such as biometric systems or smart card readers, is
certainly not going to be feasible on a cross-institutional basis, and
while it might imaginably be workable within an institution's internal
authentication system, some other technique would be needed to convey
cross-organizational access management data. Few resource providers
will be willing to limit access to users equipped with such
specialized facilities.
Software isn't enough; there is also the question of whether the user
knows how to configure and employ it. For example, current web
browsers contain considerable support for client-side certificates and
proxies, but few users know how to use these features. Education about
an existing software base is easier than first replacing or upgrading
an installed software base and then teaching users how to
employ the new software, but it's still a substantial issue.
Kerberos is an interesting case study of the feasibility constraints.
An institution could certainly make a successful decision to deploy
Kerberos as a local authentication system by placing Kerberos
support software on each user's workstation (perhaps via a site
license to a vendor); however, inter-realm Kerberos is probably too
intimate a connection between resource operators and licensee
institutions to be viable, and most resource operators would also
reject Kerberos as an inter-organizational approach because of the
requirements it places on end user systems at institutions that were
not using Kerberos for local authentication. In the cases where
Kerberos is being used for inter-organizational resource sharing, I
believe that one could argue that the participating institutions
(typically consortium members) have made commitments to link their
administrative and other support systems at a much more sophisticated
level than one would find in the typical resource operator - licensing
institution relationship and are coming more to resemble a single
"consortium institution" with an internal (local) authentication
system.
Any solution also needs to reflect current realities; in particular, it
must be able to recognize the need for a user community member to
access a resource both independent of his or her physical location
(for example, a user must be able to connect to the internet via a
commercial ISP, a mobile IP link, or a cable television internet
connection from home), and also the need for people to access
resources by virtue of their location (for example, access may be
granted to anyone who is physically present in a library, whether or
not they are actually members of the licensee institutional
community).
3.2 Authentication Strength
The solution needs to be reasonably secure. The resource operator needs
confidence that an attacker can't forge a credential easily. All
parties need confidence that credentials cannot easily be stolen by
eavesdroppers on the net (for example, through sniffer attacks), and
that they cannot be stolen easily from a user that exercises
reasonable precautions. Also, systemic compromise is a concern: there
is a very real difference between having an individual user's
credentials compromised (in which case they can be canceled and new
ones issued) and having the system as a whole compromised, which might
call for reissuing credentials to everybody in the user community.
Authentication strength is a somewhat subjective question. For many of
the approaches that we will discuss, strength comes from the details
of cryptographic algorithms and key lengths used; but part lies also
in overall system design and implementation and in the realities of
user behavior, and this can often be the source of the largest number
of vulnerabilities. Some level of reason is called for here; most of
the resources being access controlled, while certainly valuable
assets, do not represent imminent dangers to public safety or national
security if access control is breached. An access management system
needs to be complemented by monitoring and other controls on the part
of the resource operator to limit the impact of a breach. Further,
there are after-the-fact legal remedies which can be applied to limit
the damage caused by such a breach.
The cryptographic technology underlying many access management systems is
legally sensitive on an international export and import basis, and may
also be constrained by various national laws (though within the US,
cryptographic technology can be employed freely, at least today). This
is important for several reasons: resource access may cross national
boundaries, and members of an institution's user
community may need to access networked information resources when
traveling outside of their home nation. We will see international
resource sharing consortia, and also see institutions in one nation
licensing access to resources in other nations.
It should be noted that virtually any strong access management system
that incorporates general purpose cryptographic services will be
illegal for export since all strong cryptographic implementations for
general encryption/decryption are export controlled in the United
States under current laws governing trafficking in arms. Note however
that it may be possible for members of a user community traveling
abroad to export cryptographic software for temporary personal use
under some specific limitations; depending on where they are traveling
it may or may not be legal for them to use it under the laws of the
country they are in at the time. Matters are more complex than they
may seem, however, because US export control laws are mostly concerned
with cryptography that can support encryption (for confidentiality or
concealment); export licensing of systems specifically for
authentication or digital signatures which do not serve dual use as
encryption systems has been much less of a problem. The legality of
developing, importing, exporting, and operating access management
systems outside the US needs to be analyzed on a
country-by-country basis; laws vary considerably.
3.3 Granularity and Extensibility
There is a need for fine-grained access control where institutions want to
limit resource access to only individuals registered for a specific
class; this arises in electronic reserves and distance education
contexts, especially when a class may be offered to students at
multiple institutions. Other variations are also possible: limiting
access to law students, to faculty, to graduate students and faculty
in physics. This sort of fine-grained access management is likely to
be very complex, since there will be great variation from institution
to institution in how groups of users are identified, named and
specified. There is also some overlap between fine-grained
authentication and demographic information that may be needed to
generate management information (discussed below).
Granularity of access has been one of the most controversial issues in
the discussions of the first draft of this paper and related issues.
Without arguing against the need for fine-grained access control for
some applications, I will summarize a few observations:
- At present, most access to network information
resources is not controlled on a fine-grained basis. There is
a very real danger that by building all of the needs for
fine-grained access management into the basic access management
mechanisms we will produce a system that is too complex and costly
to see widespread implementation anytime soon.
- The information needed to support fine-grained
access management probably needs to be kept within institutions for
privacy reasons, and should be treated as attributes to an identity
rather than expressed as additional identities (in other words, one
should record that a user with a given identity happens to be
enrolled as a member of course X, rather than issuing the user an
identity as member-21-of-course-X). This also has implications for
the locus of authorization decisions for fine-grained access
management.
- In many -- but certainly not all -- cases, the
resources (such as electronic course reserves) that are subject to
fine-grained access management will be within an institution, or
within one of the institutions in a consortium of institutions that
are collaborating closely through shared courses or similar
projects. The case where an external commercial networked resource
will be access controlled to members of a small group like a class
will be rare.
- In some cases, the presence of fine-grained
access management mechanisms may encourage irrational license
economics. For example: suppose there is an electronic journal that
prices based on the number of people that have access, rather than
on the number of people that actually use it. This would encourage
an institution to define a fine-grained group of authorized users to
this journal in order to save money. Such an arrangement is complex
and sets up barriers to access for the rest of the university
community. It would probably make more sense to initially price
access for the entire university community based on the approximate
number of people who will actually use the journal, and then if it
turns out a few more people are using it than were originally
expected, negotiate a slightly higher fee at license renewal rather
than defining a special access group. Revenues to the publisher will
be roughly the same in either case, but additional use would be
encouraged rather than discouraged. Note that of course this
reasoning doesn't apply in cases where there is wide demand for a
resource, and the licensing institution is making a policy decision
to deliberately and systematically limit access to the resource to a
specific closed user community; but this is, reviewers believed, the
exception rather than the common case.
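The second observation above, that fine-grained information should be held as attributes of an identity inside the institution, can be illustrated with a short sketch. Everything here is assumed for illustration: the directory contents, the attribute strings, and the function name are hypothetical. The point is only that the institution answers yes or no and does not release the attribute set.

```python
# Hypothetical attribute store held inside the licensee institution.
DIRECTORY = {
    "id-1001": {"role:student", "course:X"},
    "id-1002": {"role:faculty", "dept:physics"},
}


def institution_decides(identity: str, required: set) -> bool:
    """Return only yes/no; the attributes themselves never leave the institution."""
    attributes = DIRECTORY.get(identity)
    if attributes is None:
        return False                # unknown identity: refuse
    # Authorized only if every required attribute is held by this identity.
    return required <= attributes
```

Course enrollment is recorded once, as `course:X` on an existing identity, rather than by issuing a separate per-course identity.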
3.4 Cross-Protocol Flexibility
Some approaches work for a wide range of application protocols that might
be used for accessing information. Others are designed to work only
with specific protocols, or would require the development of special
software extensions or modifications in order to support a full range
of protocols. For our purposes, HTTP-based Web access is the critical
application protocol; we will also consider Telnet terminal emulation
and the Z39.50 information retrieval protocol, although these are far
less critical. The main locus of concern here is the user's desktop
machine, which normally uses HTTP or Telnet to connect to machines
that are part of the system of networked information resources; Z39.50
is seldom used at the desktop today and finds its main application in
linking major networked information resources together.
Reviewers of the earlier draft of the paper felt that the X Window protocol was
not an issue, as this was primarily a local access application. The
ability to sign electronic mail messages is certainly an issue for
email-enabled networked information applications, though probably not
a major one. Secure email access -- authenticated SMTP, POP, or
IMAP, for example -- is viewed as primarily an issue within an
institution rather than a cross-organizational question; while it is
certainly useful to have an authentication infrastructure which will
support these applications, as well as local administrative
applications, this is again not central to the cross-organizational
problem. Directory access protocols such as LDAP are also potentially
serious issues.
CORBA and DCOM are potential questions, though it is not clear to what
extent these will be used from desktop machines in the future. There
is also a set of issues involving authentication in conjunction with
Java applets and systems like Authenticode or PICS which are not well
understood at this point. Many of the authentication and authorization
problems in this area deal with a user's machine making decisions
about what applets it is willing to accept and to execute, and what
authorizations it is willing to assign them; these are similar to
questions about document authenticity and integrity and are out of
scope for this paper. The other set of problems centers on an
applet making decisions about a user's rights; while technology
and standards in this area are still in flux, most of the current
approaches seem to assume some kind of certificate infrastructure.
This is an area where more work is clearly needed.
3.5 Privacy Considerations
The application scenarios here involve access to information resources. In
many cases libraries will pay for these licenses to electronic
resources as a replacement for physically acquiring information in
paper form.
The licensee institution, in the print world, has a set of internal
policies about record-keeping and use reporting (both who used it and
how often it was used); generally these are very restrictive and
stress user privacy. The institution then has a separate set of
policies (which may in fact never have been explicitly codified) about
sharing this usage information with the content supplier: in general
this policy has been very simple -- the supplier got no information
about usage other than that which the institution chose to make public
for other reasons.
In the electronic environment, the situation changes. Because information is
often accessed at the publisher site, the publisher may know a great
deal about who is accessing what material and how often. Aggregators
and service bureaus may also complicate both the collection and flow
of information. To some extent the collection, use, retention, and
even potential resale of this information can be covered by license
contract, and should be. Institutions will have to develop realistic
policies about privacy of readers in the networked information
environment which are acceptable to their user communities and well
understood by readers. However, some authentication and access
management approaches offer licensee institutions much greater
flexibility than others to limit the amount of information that can
technically be collected by the resource operator. In general, it is
desirable that the amount of privacy at risk which needs to be
controlled by contractual provision be minimized.
Clearly, one strategy for ensuring user privacy is to ensure that
users remain anonymous in their use of information resources. We can
distinguish several common situations:
- Repeat users cannot be identified; each session
is completely anonymous. We will call this anonymous access.
- Repeat users can be identified, but the identity
of a user cannot be determined. The resource operator knows only
that some specific individual is accessing the resource repeatedly,
not who that individual is. The user may be identified by some
arbitrary identifier, such as USER123. We will call this
pseudonymous access.
- Demographic characteristics of users can be
determined, but not actual identities. We will call this
pseudonymous access with demographic identification.
- Actual identities can be associated with
sessions. We will call this identified access. It may be
supplemented with demographics; just because the resource operator
knows who someone is does not mean that they automatically know the
user's demographic characteristics as well as his or her name.
Note
that many users choose to identify themselves in order to obtain added
value services, such as electronic mail notification of changes to a
resource, or to preserve context from one session to the next, or to
maintain a user profile at a resource. It's important to distinguish
voluntary user self-identification from automatic identification that
is generated as a byproduct of an authentication and access management
system. It is also worth considering, at least briefly, how an
institution might provide services for its community that permit
community members to enjoy these added value services without
identifying themselves to resource operators, and whether it's worth
going to the trouble to make this possible.
Understanding the coupling between pseudonymous or identified access
as provided by an access management system and the desire to implement
such capabilities as part of an information access system is a crucial
issue. A given information resource may rely on an authentication and
access management system to provide identified or pseudonymous
access automatically, or it may offer some weak or strong higher level
functions (using a userid/password or cookie scheme, for example) that
give the already authenticated and authorized user the option of
identifying himself or herself (literally or pseudonymously) in order
to obtain personalized services from the information resource. In the
latter case, assuming that it's a real choice and the level of service
offered to the anonymous user is meaningful, this isn't an
authentication and access management system issue at all: it's a
choice that users of the information resource are free to make on an
individual basis.
Privacy
is not a purely political or moral issue. To the extent that
researchers are pursuing patents, developing grant applications in a
competitive environment, or seeking precedence for discoveries,
confidential access to information resources is a critical issue with
potentially significant economic consequences. Many higher education
institutions are bound by laws about privacy of student records; some
public libraries may face legislative constraints on patron privacy;
and medical institutions (including university hospitals) may have to
consider issues involving privacy of medical records. And, of course,
beyond the United States -- for example in Europe -- the overall legal
framework grants stronger privacy rights for all citizens.
Finally, in discussing privacy, we should recognize the overall need
for a secure environment; this goes beyond authentication and access
management. If user interactions with networked information resources
are conducted in the clear, they are subject to eavesdropping by other
machines on a local area network near the user (for example, by
sniffer-based attacks within the campus network) or by attackers
anywhere along the network path to the resource. Very few information
resources today support searching and information retrieval (as
opposed to ordering) via encrypted SSL-secured HTTP. If privacy is to
be honored in the licensing of networked information resources, then
contractual arrangements, resource sharing designs, and procurements
must recognize the importance of providing such support.
In some
situations privacy and confidentiality issues go beyond access
management and session encryption. Some users may be concerned that
even knowledge that they are using a resource (not necessarily what
they are doing with it) becomes known through traffic analysis. Link
level encryption helps with this to an extent, but is not widely
deployed and is unlikely to be widely deployed anytime soon. Very
large scale aggregating proxies and experimental systems such as
Crowds, which build on work done with anonymizing remailer systems such
as Mixmaster, also help to address these needs. Robust protection
against traffic analysis in the public internet requires very large
overheads. We will not consider this problem further here, other than
to observe that credential-based approaches seem likely to be most
flexible in these environments, and that if they are used it will be
necessary to consider traffic analysis vulnerabilities created by the
credentials verification process as well as the submission process.
Similarly, there are situations where some users are unwilling to
permit a resource operator to know what sort of information they are
searching for (even beyond contractual restrictions on the collection
and use of this information); in these cases it may be necessary for
such users to locally replicate an entire resource or large subsets of
it.
3.6
Accountability
In
negotiating a license agreement, all parties recognize that the
resource being licensed is of value and that the rights of the
licenser must be respected. Typically, a licensee institution will
agree to educate members of the user community about the license terms
and restrictions relevant to the information resource in question, and
to work with the resource operator to identify, investigate and put a
stop to improper use of the resource. Thus, both the resource operator
and the licensee institution share a common interest in having some
individual user accountability as part of an authentication and access
management system, so that if inappropriate use is detected (for
example, if a single user seems to be accessing the resource thousands
of times a day from computers on three continents) the organizations
know where to begin investigating.
Of
course, there's a tension between accountability and privacy; to the
extent that privacy is achieved through anonymity, there is no
accountability. Note that this balance may be managed by
compartmentalizing information, for example: if a specific user is
identified to the resource operator simply as USER2345, and the
licensee institution knows who USER2345 actually is (but the resource
operator does not) then the resource operator could call for an
investigation of what USER2345 is doing, and the licensee institution
might then follow its own due process in that investigation, which
might result in internal disciplinary action but might never result in
revealing the individual's actual identity to the resource operator.
In a real sense, the obligations of the members of the user community
are to the licensee institution, and the licensee institution in turn
has obligations to the resource operator to ensure that members of its
user community behave responsibly; it is not at all clear that it's
appropriate for the resource operator to be dealing with individual
members of the user community directly.
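The compartmentalization described above can be illustrated with a minimal sketch. It is not a description of any deployed system: it assumes the licensee institution holds a secret key, derives a stable pseudonym for each local user with a keyed hash (HMAC-SHA256), and keeps a private table for use in investigations. The names INSTITUTION_KEY, pseudonym_for, and resolve_pseudonym are hypothetical.

```python
import hmac
import hashlib

# Hypothetical secret held only by the licensee institution.
INSTITUTION_KEY = b"example-secret-known-only-to-the-institution"

# Private table kept by the institution: pseudonym -> local user id.
_pseudonym_table = {}


def pseudonym_for(local_user_id: str) -> str:
    """Derive a stable pseudonym that can be released to the resource operator."""
    digest = hmac.new(INSTITUTION_KEY, local_user_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    pseudonym = "USER-" + digest[:12]
    _pseudonym_table[pseudonym] = local_user_id
    return pseudonym


def resolve_pseudonym(pseudonym: str):
    """Institution-side lookup, used only when an investigation is requested."""
    return _pseudonym_table.get(pseudonym)
```

Under these assumptions the resource operator sees the same pseudonym on every visit and can ask for an investigation of it, but only the institution can map the pseudonym back to a person.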
Accountability will also have some interactions with institutional
policies about inappropriate use of network resources, particularly to
the extent that interaction with these resources may go beyond simply
retrieving information to participation in interactive communications.
For example, policies that typically govern the use of electronic mail
may come into play. But even if resources are used purely for
information retrieval purposes some accountability (coupled with
management data) may be desired in support of policies prohibiting use
of university resources for personal commercial gain, for example; a
useful analogy may be drawn to practices and policies in areas such as
telephone logs.
3.7
Ability to Collect Management Data
The
licensing institution has a legitimate need to gather management data
in order to guide future decisions; if it is spending a great deal of
money to license access to a resource, or to participate in a
consortium resource sharing arrangement it is only reasonable that it
will want to know how much various resources are being used and what
sectors of the user community are making use of them. For public
institutions, in particular, collection of management data is an
essential part of institutional accountability, and some collection of
management data may even be considered part of public records
responsibilities for these institutions.
There
are many reasons to collect management data besides guiding licensing
or resource sharing decisions. These include the allocation of costs
within a licensing organization or even the development of enhanced
services such as collaborative filtering systems.
It's
useful to define some terms. Management data can be faceted in two
ways. The first is by user: this might include faceting by source IP
address, by identity (name), or by user attributes that figure into a
contractually based authorization decision (for example, a resource is
limited to faculty and graduate students; this user had the faculty
attribute), or by demographic information that the licensee
institution knows and wants to correlate with usage patterns (for example,
this is a first year graduate student in civil engineering, or even,
in theory though likely not in practice, this is a male student). The
second way to facet management data is by the objects being accessed
or the services being used: which pages of which articles are being
read, which one of several different databases on a server is being
searched, how often searching is by author rather than by date, etc.
Collecting highly aggregated data is not particularly problematic;
there's no way to prevent the resource operator from having aggregated
data (although its use can obviously be managed by contract). The only
question is whether the licensee institution can collect its own
aggregate data or whether it must take it as a return feed from the
resource operator; in the latter case, there are a whole series of
scaling issues related to standards, since it will be a significant
burden for the licensee institution to receive use statistics feeds
from potentially hundreds of resource operators in different formats,
reflecting different conceptual models about what is being counted,
and with different delivery schedules.
The
larger problems arise when one wants demographically faceted use data,
or even individual use data. In the case of demographically faceted
data, either the licensee institution must use the authentication and
access management system to pass demographic faceting to the resource
operator so that it can become part of the usage data that the
resource operator returns, or the licensee institution must be able to
capture its own demographically faceted use data. Privacy
considerations begin to emerge when demographic data must be passed to
the resource operator.
In the
case of individual use data the problems become even more sensitive.
Clearly, if users are individually tracked by the resource operator
(whether or not their identities are known -- i.e. whether they are
pseudonymous or identified) then the resource operator can collect
individual level data and return it to the licensee institution. The
resource operator may even get supplemental demographic data about the
individuals from the licensee organization. There are also a series of
institutional policy problems having to do with individual level data
at the licensee institution: who can see this data -- for example, can
a faculty member look at the statistics for his or her students' use of
specific information resources? Under what procedures are usage
records subject to audit to detect misuse? Again, we need to consider
when these issues should be defined by policy and trust in
implementation of policy as opposed to being managed by technical
means.
While
many scenarios are possible, I suspect that the most common practical
situations today will be these:
- usage is tracked on an aggregated basis either by
the institution or the resource operator; I suspect tracking by the
resource operator will be more common since the resource operator
will be able to count events that are more meaningful in measuring
resource utilization (for example, by journal rather than just page
accesses).
- usage is tracked on an individual (pseudonymous
or identified) basis by the resource operator, who then passes use
logs back to the institution, which processes them to factor in
demographic data and obtain a demographically faceted usage report.
- institution and resource operator agree on some
very simple demographic faceting and demographic data is passed to
the resource operator by the access management system; these
demographics are then factored into the usage reports developed by
the resource operator.
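The second scenario above, in which the institution factors demographic data into pseudonymous use logs returned by the resource operator, might look something like the following. This is an illustrative sketch only; the log layout, the demographic categories, and the function name facet_usage are assumptions made for the example.

```python
from collections import Counter


def facet_usage(use_log, demographics):
    """Count accesses per (demographic group, resource).

    use_log      -- iterable of (pseudonym, resource) pairs from the operator
    demographics -- institution-held mapping of pseudonym -> group label
    """
    counts = Counter()
    for pseudonym, resource in use_log:
        # Pseudonyms the institution cannot place are reported as "unknown".
        group = demographics.get(pseudonym, "unknown")
        counts[(group, resource)] += 1
    return counts


# Hypothetical data for illustration.
use_log = [
    ("USER123", "journal-a"),
    ("USER123", "journal-a"),
    ("USER456", "journal-a"),
    ("USER789", "journal-b"),
]
demographics = {"USER123": "faculty", "USER456": "graduate student"}

report = facet_usage(use_log, demographics)
```

The resulting report holds only group-level counts, so in a design like this the demographic table never has to leave the institution.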
Management data is a major problem in the current access framework.
Part of the problem is the conflict between privacy and a desire for
demographic or individual data. Most of this is going to have to be
sorted out at the institutional policy level, and may involve making
sacrifices in order to ensure privacy. Some institutions may be
legally limited in their ability to collect certain management data.
It would be very useful to have some real-world examples of how this
trade-off has been settled.
A very
insightful comment was made at the meeting to review the first draft
of this paper. From the perspective of the licensing institution,
particularly when facing difficult collection and resource allocation
decisions, the observation was "there's never enough management
information -- the issue here is to define what you absolutely have
to have, not what you would ideally like".
4.0
Approaches to Access Management
Having
summarized the many and sometimes conflicting requirements that an
access management system must address, we now consider a number of
actual schemes currently in use or under consideration and analyze how
well they meet these requirements.
It's
important to recognize that in solving real-world problems more than
one approach may be relevant at a single institution; one might use
one scheme for one class of users and a different scheme for another
class. For example, an institution might choose to manage access for
kiosks and public workstations by IP source address, and to use a
credential scheme for other users. Indeed, virtually all of the major
institutional systems that are currently being deployed combine
multiple approaches. Also, note that approaches can be cascaded in a
hierarchy; for example, a resource might be set up to first check
whether a user could be validated by an IP source filtering approach
but if the IP source address isn't valid for access, the resource
might then apply a credential-based access management test.
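A cascade of this kind can be sketched as an ordered list of tests, where the first test that accepts a request decides the outcome. The sketch below is only an illustration under stated assumptions: the request is modelled as a dictionary, and LICENSED_NET, VALID_TOKENS, and the function names are hypothetical stand-ins for whatever the resource operator and the institution have actually agreed on.

```python
import ipaddress

# Hypothetical values for illustration only.
LICENSED_NET = ipaddress.ip_network("192.0.2.0/24")
VALID_TOKENS = {"token-issued-by-institution"}


def ip_test(request):
    """Accept requests whose source address lies in the licensed network."""
    return ipaddress.ip_address(request["source_ip"]) in LICENSED_NET


def credential_test(request):
    """Stands in for validation against a trusted institutional server."""
    return request.get("credential") in VALID_TOKENS


def cascade(*tests):
    """Try each (name, test) pair in order; return the first name that accepts."""
    def authorize(request):
        for name, test in tests:
            if test(request):
                return name
        return None  # no test accepted the request
    return authorize


authorize = cascade(("ip-filter", ip_test), ("credential", credential_test))
```

Returning the name of the test that granted access, rather than a bare yes or no, is a design choice made here so that usage could later be reported by access method.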
At the
most general level, there are three approaches -- proxies, IP source
filtering, and credential-based access management.
Basically, with IP filtering, the licensee institution guarantees to
the resource operator that all traffic coming from a given set of IP
addresses (perhaps all IP addresses on one or more networks) represents
legitimate traffic on behalf of the licensee institution's user
community. The resource operator then simply checks the source IP
address of each incoming request.
In the
case of a proxy, the licensee institution has deployed some sort of
local authentication system, and users employ specific proxy machines
to send traffic to the resource and receive responses back from that
resource; the local authentication system (which is invisible to the
resource operator, except that the resource operator knows that it is
in place in order to guarantee that traffic coming from the proxy
machines is legitimate) is used to control who can have access to the
proxy machine. As a business matter, the resource operator may want to
know something about how the local authentication system works in
order to have confidence in the proxy, but this does not enter into
the actual authentication which is performed operationally by the
resource operator. The resource operator will most commonly identify
the proxy machines by their IP addresses (or some variation such as
reverse DNS lookup), and for this reason from the resource operator's
point of view proxies are often just considered to be a special case
of IP source address filtering -- a resource operator who is set up to
do IP source address filtering can accommodate a licensing institution
employing proxies with essentially no additional work. However,
proxies can actually be identified using either IP addresses or any
credential-based cross-organizational authentication scheme (such as
certificates). Because of this, and also because many of the policy
and technical issues surrounding proxies at a higher level are quite
distinct from those involved in IP source address filtering, we will
treat proxies as a separate approach.
The
third approach is credential-based. Here the user presents some form
of credential -- a user id and password, or a cryptographic
certificate, for example -- to the resource operator as evidence that
he or she is a legitimate member of the user community. The resource
operator then validates this credential with some trusted
institutional server (or third party server operating under contract
to the institution) before deciding whether to allow access. Note that
there needs to be advance agreement (most likely as part of the
license contract or resource sharing agreement) as to how the mutually
trusted institutional servers or third parties (such as certificate
authorities) are identified and authenticated themselves.
For
completeness, it is worth noting that there is one other possibility:
the resource operator assigns credentials to individual members of the
licensee community (perhaps in cooperation with the licensee
institution). This is what was done historically when small numbers of
users needed access to a few specialized information resources. The
trouble is that it does not scale manageably to large numbers of users
or large numbers of resources, and particularly not to both. While
it's reasonable for an institution to distribute one set of
credentials to each member of its user community (for example, in
conjunction with an internal authentication system) it's not
reasonable to distribute hundreds of different credentials for
different resources to each user, or to expect the users to manage
them or to keep straight which credentials are for use with which
resource. Thus, we will not consider this model further, other than to
recognize that it may have its place for specialized resources that
serve only a handful of users.
4.1
IP source address filtering
Currently, IP source address filtering is the major mechanism used to
implement authentication and access management for cross-institutional
resource access. The way this works is that the licensee institution
provides the resource operator with a list of IP addresses that are
authorized for access; this can include some wildcarding to permit entire
subnets or networks to have access, and also occasionally incorporates
exclusion lists (all hosts on a given net or subnet EXCEPT for the
following specific hosts). There is general agreement that it is
unsatisfactory for a number of reasons, and it is instructive to
evaluate it against our seven functional requirements both to see
where it works and where it actually falls short.
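The list of authorized addresses, with wildcarded networks and an exclusion list, can be sketched as follows. This is a minimal illustration rather than any operator's actual implementation; it uses Python's standard ipaddress module, expresses the wildcards as network prefixes, and the function name build_filter and the sample addresses (taken from ranges reserved for documentation) are assumptions.

```python
import ipaddress


def build_filter(allowed, excluded=()):
    """Return a predicate implementing 'allowed networks EXCEPT excluded hosts'."""
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    excluded_nets = [ipaddress.ip_network(e) for e in excluded]

    def is_authorized(source_ip: str) -> bool:
        addr = ipaddress.ip_address(source_ip)
        # Exclusions take precedence over the broader allow list.
        if any(addr in net for net in excluded_nets):
            return False
        return any(addr in net for net in allowed_nets)

    return is_authorized


# Hypothetical configuration supplied by a licensee institution.
is_authorized = build_filter(
    allowed=["192.0.2.0/24", "198.51.100.0/25"],
    excluded=["192.0.2.200/32"],
)
```

Note that the check sees only the source address of each request; nothing in it identifies the person at the machine.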
Feasibility and Deployment: This is relatively easy to deploy and
manage from the perspective of both the institution and the resource
operator. No special software is needed at the user side, and at the
resource operator side the support is not difficult. There is some
maintenance involved in keeping the tables at the resource providers
up to date, but this is not unmanageable. It is necessary for the
licensee institution to perform some analysis on access and use
policies for the machines within the institution to make sure that
machines that aren't access-limited to the institutional community are
excluded where necessary, and to educate members of the community that
giving outsiders an account on a machine also gives them access to
institutional resources that they may not be entitled to; there are
some real dangers of access control breaches by the creation of
proxies either through ignorance of the implications or deliberately.
The
major problem, from a feasibility point of view, is that many
legitimate users are not coming through the institutional network at
all times; they may want access through commercial ISPs, at their
workplaces outside of the institution, or from home. Some other
solution is needed to handle these users.
One
should not underestimate the management complexities of IP source
address based access management, particularly from the point of view
of a resource operator. Configuration changes are frequent, and
configurations for a large licensee institution can be quite complex.
Also, the move from the older class-based network addresses owned by
institutions to classless IP network addressing with the address space
managed by the ISP has introduced new problems; not only must the
licensee institution get the network masks right, but there's no easy
way for the resource operator to independently verify this (for
example, that an institution's network is a /18 rather than a /19).
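To make the mask problem concrete, the short sketch below compares a /18 with the /19 that starts at the same base address. The address block is hypothetical (it is drawn from private address space purely for illustration); the arithmetic is what matters.

```python
import ipaddress

# Hypothetical block; 10.0.0.0/8 is private space used here only as an example.
as_slash_18 = ipaddress.ip_network("10.20.0.0/18")
as_slash_19 = ipaddress.ip_network("10.20.0.0/19")

# Addresses covered by the /18 but not by the /19.
extra = as_slash_18.num_addresses - as_slash_19.num_addresses

# A host that one mask admits and the other rejects.
probe = ipaddress.ip_address("10.20.40.1")
in_18 = probe in as_slash_18
in_19 = probe in as_slash_19
```

A /18 covers twice as many addresses as the /19, so a one-bit error in the mask either admits thousands of addresses that may belong to another ISP customer or shuts out half of the institution's hosts.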
Authentication Strength: Source IP filtering is actually relatively
strong. While it's not difficult to introduce packets to the network
with spoofed source addresses unless appropriate packet filters are in
place (and this has become a major problem in the context of network
denial of service attacks), getting responses back to a spoofed
network address is much harder, and basically involves hijacking
entire network addresses within the routing infrastructure. This is
relatively unlikely; it's a sophisticated and complex attack, and is
very likely to be noticed quickly. Resolving the threat of IP spoofing
needs to be addressed at the network routing infrastructure level, and
considerable work is going on in this area (packet filters and
authenticated BGP peering, for example).
A
specific machine with an excluded source IP address that sits on a
generally authorized network can circumvent that restriction more
easily, if the machine isn't under institutional administration (for
example, its owner can just give it a new IP address on the same
network.)
Source
IP filtering isn't subject to systemic compromise, and doesn't come
with export control restrictions.
Granularity and Extensibility: To the extent that membership in
specific groups can be linked unambiguously to specific network
addresses (for example, in an office, a dorm room, or a computer lab)
fine grained access is feasible. Such direct linkage is often not the
case, however; students in a class may share use of a computer lab, or
need to use public workstations in a library.
Cross-Protocol Flexibility: Since all protocols of interest run on top
of IP, source IP address based access control is quite universal.
Privacy
Considerations: To the extent that source IP addresses can be linked
to individuals (for example, personal workstations in offices) there
are some privacy issues. And certainly source IP addresses are
correlated to demographics, if the resource provider is willing to
invest in understanding the campus network architecture. Access in a
source IP filtering authentication environment is probably somewhere
between anonymous and pseudonymous, with some ability to move from
pseudonymous to identified access in individual cases if the
resource provider is willing to go to the trouble to do so (this is
the case of personal workstations used primarily by a single
individual).
Accountability: There is limited accountability -- at the level of
machines rather than people -- which mirrors the privacy situation.
One has relatively good accountability for individually-owned personal
workstations and relatively poor accountability for everything else;
for a large, shared machine one gets accountability to the machine
level, and then has to work with the administrator of that machine to
identify a specific user or users. If dynamic IP address assignment is
used (as is often the case for laptops in public areas, for example),
then accountability is particularly weak.
Management Data: An institution can collect some usage data at a
highly aggregated level that is not well correlated to
application-level constructs through a border router, or get
aggregated usage data from the resource operator. Demographic data can
be obtained to the extent there is correlation between IP address
blocks and demographics (for example, there might be a campus subnet
for a medical school); this demographic data will be sketchy and
imperfect at best, and some differentiations (such as students as
opposed to faculty) will be very hard to extract. Individual level
usage data will be possible only in the case where there are personal
workstations, and all work by an individual is done on that
workstation.
Summary: IP source address based access management tracks the
activities of machines rather than people. To the extent that there's
a very close correlation between the two, it works reasonably well.
Unfortunately, the correlation has never been that good and many
trends (such as the move from institutional modem banks to purchase of
commercial dial up access to the internet) continue to weaken this
correlation. IP source address access management may work particularly
well for fixed-location, institutionally managed public terminals,
such as public workstations in libraries or computer labs.
There
are several additional issues and variations on source IP filtering
which deserve some additional comment.
Many
organizations are moving to dynamic assignment of IP addresses, either
for limited situations such as laptops that may be docked in
classrooms, computer labs, or public areas such as library reading
rooms, or in some cases, campus wide in order to simplify address
management. This dynamic assignment weakens accountability,
strengthens privacy, and complicates the collection of meaningful
management data. However, since dynamic IP addresses are assigned
within an organizational network number, use of dynamic IP addresses
does not invalidate the use of IP source address based access
management.
To
mitigate the problems with access via dialup ISP connections, a few
universities have negotiated special arrangements with specific ISPs
so that members of their community are assigned addresses on a
specific (private) net or subnet when connecting via the ISP (since
the ISP does authentication on the users as part of the establishment
of the dialup connection, this is feasible if the ISP can maintain
this information as part of its user attribute database). While this
makes it possible to extend IP source authentication to dialup users
obtaining service through the ISP, it should be clear that this
approach will not scale reasonably to offer users a wide range of
choice in the ISP marketplace (including wireless and cable TV based
ISPs); it is most practical in situations with large educational
institutions who have the marketplace power to negotiate such
arrangements and where members of the institution's user community are
willing to select from at most a small number of competing ISPs.
Approaches using IP tunneling and/or Mobile IP type support can be
used to mitigate some of the limitations of traditional source IP
based access management schemes, though they may have considerable
performance and complexity drawbacks. The next revision of the paper
will include a discussion of these approaches.
Some
organizations have used reverse Domain Name System (DNS) lookups on
source IP addresses and then checked the DNS name in order to perform
access management. This changes matters very little except that it
means that access management must also rely on the security of the DNS
system itself (which can be a problem; secure DNS is not yet deployed
widely) and requires that all hosts have DNS names tabled, which is
often not the case. This approach also does not work well with DHCP
(dynamic assignment of IP addresses) which is often used to support
laptop machines.
4.2
Proxies
In some
sense, proxy based approaches simply shift the problem, since an
institution will still have to deploy an internal authentication and
access management system in order to control use of the proxy servers.
However, it may be easier to implement an internal system than to
implement a system that must be used by a wide range of resource
providers; proxies modularize and compartmentalize the authentication
problem.
Let us
assume for the time being that an institution has implemented a viable
internal authentication system and analyze various proxy schemes under
that assumption. Our comments, then, will only cover the proxy scheme
itself, not the institutional authentication system necessary to
support the proxy.
We need
to distinguish between two different kinds of services that are
sometimes referred to as proxies. The first, which we will call
mechanical proxies, are services which make use of facilities
designed directly into implementations of protocols such as HTTP. To
use a web proxy server, one configures a browser to pass all HTTP
requests not directly to the destination host, but instead to a proxy
server, which intercepts these requests and when necessary retransmits
them to the true destination host. In this case, the operation of the
proxy should be invisible to the end user.
The
second type of proxy is what we will call an application-level proxy
(historically, these have often been called "protocol translation
systems" or "gateways"). An application level proxy functionally
forwards requests where appropriate, but does not rely on protocol
mechanisms. An example might be a Telnet proxy, where in order to
reach an access-controlled Telnet based resource, one telnets to an
institutional system; this might engage the user in an authentication
and authorization dialog, and then manage a Telnet session to the
remote resource, with some editing. In the web environment, a service
such as the anonymizer (www.anonymizer.com) is a good example; here,
one accesses the web page of the service and provides the URL of the
remote resource one really wishes to access. The anonymizer service
not only forwards requests on, but also dynamically re-writes each
page coming back from the remote resource prior to presenting it to
the end user, for example, replacing each URL in the retrieved page
with a URL that accesses the anonymizer with a parameter of the actual
remote page that is being requested. As the environment becomes more
sophisticated, application-level proxies become increasingly problematic:
for example, an application-level proxy generally will not handle
pages that contain Java applets properly.
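The page re-writing step described above can be sketched as follows. This is a deliberately simplified illustration, not the way any particular service is implemented: it rewrites only double-quoted href attributes using a regular expression, and the proxy address in PROXY_PREFIX and the function name rewrite_links are hypothetical.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical entry point of an institutional application-level proxy.
PROXY_PREFIX = "http://proxy.example.edu/fetch?url="

# Matches double-quoted href attributes only; real pages need more care.
_HREF = re.compile(r'href="([^"]*)"')


def rewrite_links(html: str, page_url: str) -> str:
    """Rewrite each href so that following it goes back through the proxy."""
    def replace(match):
        # Resolve relative links against the page they came from.
        absolute = urljoin(page_url, match.group(1))
        return 'href="' + PROXY_PREFIX + quote(absolute, safe="") + '"'
    return _HREF.sub(replace, html)
```

For example, rewrite_links('<a href="/issue/2">next</a>', "http://journal.example.com/issue/1") yields a link whose target is the proxy, with the original URL carried as an encoded parameter. Anything that constructs URLs at run time, such as an applet, never passes through this rewriting, which is why such content tends to break.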
Feasibility and Deployment: This is not entirely straightforward.
Proxies introduce a considerable amount of overhead, and the
institution will need to invest in the installation and operation of
proxy servers. Some overhead may be mitigated by having the proxy
server perform caching operations as well as access management,
although this introduces a range of other responsibilities and
problems. Also, proxy servers become mission critical systems; they
need to be available and reliable, and to be sized so that they do not
represent a performance bottleneck.
Proxies
-- and in particular application level proxies -- have scaling
problems not only in terms of computational resources to support a
large user community, but also in terms of configuration management
and support as the number of resources available to the user community
multiplies. Each resource needs to be configured, and as resources
change, configuration changes will be needed in the proxy.
In the
case of mechanical proxies, user browsers have to be properly
configured to make use of the proxy rather than communicating directly
with resources on the network. This will be a particular problem when
pre-configured browsers are supplied by sources other than the
licensee institution; for example, cable-TV based internet service
providers like @home make extensive use of proxies and caching within
their own networks, and supply browsers that are configured to use the
ISP's network. In the case of applications level proxies, users will
have to be taught to go through the application in order to reach
remote information resources.
Integrating a local authentication system with a commercial (usually
mechanical) proxy server may be non-trivial. Programming for an
application level proxy can become quite complex. One useful
distinction is the locus and complexity of decision making that the
proxy must perform. At the simplest level, a proxy can just screen all
potential users without regard to the resource that they want to
access; essentially there's a single authorization to use the proxy,
and through it all of the resources that it permits access to. At a
more complex level, the proxy might consider both the user and the
resource in order to make an authorization decision; at the most
complex level, it may track in detail the user's interaction with
various resources and make very specialized decisions about what
requests it will and will not pass through to the resources.
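The three levels of decision making can be made concrete with a small Python sketch. Everything here is a hypothetical placeholder (the class name, the user and resource names, and in particular the per-resource request cap, which stands in for whatever "very specialized decisions" a real proxy might make); it is meant only to show how the amount of state the proxy must hold grows at each level.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyPolicy:
    # All data here is illustrative; a real proxy would consult the
    # institution's authentication and licensing systems.
    community: set = field(default_factory=set)         # authenticated members
    licenses: dict = field(default_factory=dict)        # user -> set of resources
    request_limits: dict = field(default_factory=dict)  # resource -> max requests per user
    counts: dict = field(default_factory=dict)          # (user, resource) -> requests so far

    def allow_simple(self, user: str) -> bool:
        """Level 1: one authorization covers every resource behind the proxy."""
        return user in self.community

    def allow_per_resource(self, user: str, resource: str) -> bool:
        """Level 2: consider both the user and the resource."""
        return self.allow_simple(user) and resource in self.licenses.get(user, set())

    def allow_tracked(self, user: str, resource: str) -> bool:
        """Level 3: also track the interaction and cap requests per resource."""
        if not self.allow_per_resource(user, resource):
            return False
        key = (user, resource)
        used = self.counts.get(key, 0)
        if used >= self.request_limits.get(resource, float("inf")):
            return False
        self.counts[key] = used + 1
        return True
```

Note that only the third level requires the proxy to keep per-user, per-resource state between requests, which is where most of the programming and maintenance burden would be expected to arise.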
Telnet
application proxies are tricky to build (consider problems like the
handling of break signals as they are propagated across the proxy),
and as far as I know, standard commercial software to support
construction of such proxies doesn't exist. For Z39.50 applications,
it's certainly possible to construct custom proxies, although I am not
aware of general purpose software to do this. The proxy strategy is a
very general one architecturally.
From
the point of view of the resource operator, proxies are easy to work
with; they usually just look like a particularly simple form of IP
source address authentication. However, they may raise some user
support problems; if an institutionally-provided proxy is out of
service or overloaded, the resource operator can expect complaints
about bad service for reasons that are outside of its control.
Authentication strength: obviously, this depends on the local
authentication system. There is the danger of systemic compromise if
the proxy server is successfully attacked (that is, the local
authentication built into the proxy server is broken) or the proxy is
misconfigured. A breach of the local authentication system is likely
to be a very high visibility event which will receive rapid response
from the licensing institution; a breach of the proxy may be more
insidious and more difficult to detect. The communication between the
proxy server and the resource can be very strongly secured and
authenticated using certificates and session level encryption.
Granularity and extensibility: in theory, anything is possible if
enough work is done on the proxy server. For fine-grained access
control, however, it's necessary for the proxy to consider who is
trying to access what, rather than just having the proxy server
authenticate members of the user community prior to any use of the
proxy. It's not clear how hospitable commercial proxy software is to
this kind of application, or how complex the institution-specific
programming will have to be; the more complex it gets, the more likely
there are going to be security vulnerabilities.
Cross-Protocol Flexibility: Because the authentication mechanism used
between proxy and user and between proxy and resource need not be the
same, there's a particularly high level of cross-protocol flexibility.
In the worst case, the proxy can use a very general authentication
approach like source IP filtering to support protocols between the
proxy and the resource, and can use specialized methods (even embedded
within application proxy code) to authenticate users to the proxy
server.
Privacy: proxies can provide real anonymity of use if they are set up
properly; the resource operator need not even get a source IP address
for the end user. On the other hand, they provide a choke point for
potential systematic institutional monitoring of what the user
community is doing, which may be some cause for concern.
Accountability: in general, proxies provide poor accountability, since
they offer anonymous access. At best, some level of accountability can
be provided by correlating local logs at the proxy (which is tied into
the local authentication system) and monitoring at the resource. In
theory it would be possible for the proxy to pass some pseudonym or
identity to the resource, but it's not clear how this would be
accomplished in a standard and interoperable fashion.
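As an illustration only, one way a proxy could derive a stable pseudonym is with a keyed hash over the userid and the resource name. This is an assumption of mine rather than an established practice discussed by reviewers; the function name, the key, and the identifiers are hypothetical, and the sketch says nothing about how the pseudonym would be conveyed to the resource in an interoperable way.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, userid: str, resource_id: str) -> str:
    """Derive a per-resource pseudonym that only the key holder can link to a user."""
    message = f"{resource_id}:{userid}".encode("utf-8")
    # Truncated to 16 hex digits purely for readability in this sketch.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


# Hypothetical institutional secret and identifiers.
key = b"institution-secret-key"
print(pseudonym(key, "u1001", "resource-a"))
print(pseudonym(key, "u1001", "resource-b"))
```

Because the resource name is part of the hashed message, the same user appears under different pseudonyms at different resources, while the institution, which holds the key and the proxy logs, can still trace a pseudonym back to an individual if accountability demands it.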
Management data: just as a proxy is a choke point for monitoring, it
is also a choke point for collecting management data, including
demographically faceted data or individual data since it authenticates
users and then sees all of their requests to resources. Of course,
correlating this to applications-level events and terminology is hard.
It is not clear how a proxy could pass demographic data along with
requests to a resource to permit faceted statistics collection at the
resource side.
Summary: it's hard to fully evaluate the proxy approach for two
reasons. First, to some extent it just moves the authentication problem
because it presupposes the existence of an institutional
authentication system, and the problems of deploying such a system
really need to be considered. Second, because a proxy -- particularly
an applications level proxy -- is a point at which custom programming
can be inserted, almost anything is possible, at least in theory, but
it's hard to evaluate the implementation and maintenance cost of such
a system, and the extent to which it demands custom interfaces to the
resources themselves, as opposed to using completely standard
interfaces.
4.3
Credential based approaches
In a
credential based approach, the user interacts directly with resources
on the net rather than working through an institutionally-provided
proxy intermediary. The key problems here are:
- What are the credentials that the user presents
to the resource?
- how are these credentials presented securely?
- how are the credentials validated with the
issuing institution?
For a
credential based approach to scale, all of these activities need to
take place in a standardized fashion. The most commonly discussed
credentials are X.509 certificates, which are attractive because
browsers and servers already have some support for them (designed to
enable electronic commerce) and because other software components
needed for an X.509 public key infrastructure are already becoming
available on the marketplace. However, many other forms of credentials
are possible, including userids and passwords, one time passwords, and
the like. Indeed, it's useful to differentiate between
application-level credentials -- where the collection of the
credential and its validation is packaged into the application itself,
such as obtaining and checking a userid and password -- and credentials
which are built into protocol mechanisms, such as the use of
certificates with HTTP and SSL. The protocol based mechanisms are more
general and often require less work to implement on the part of the
resource operator, but are less familiar to end users, calling for a
larger investment in infrastructure and user education.
Credentials can be confusing to analyze because they can potentially
carry both authentication and attribute information together, or they
can be used purely (or almost purely) for authentication.
We will
analyze two credential-based approaches: a userid/password scheme at
the application level, and a certificate based approach.
4.3.1 Password based credentials
Assume
that institutions simply maintained databases of (pseudonymous or
identified) user ids and passwords. Note carefully that the idea here
is that a member of the institutional user community has a single
userid and password for access to all licensed resources, and not a
separate userid and password for each licensed resource.
Using
SSL-encrypted forms (which eliminates the problems of transmitting
passwords in the clear), it would be fairly easy for a resource to
ask for this userid and password securely; one could then have a
special purpose protocol so that a resource could securely check
whether the userid and password were valid by querying an
institutional userid/password database server. Note that SSL can set
up an encrypted connection with a server certificate but no
client-side certificate.
The
special purpose userid/password checking protocol doesn't exist today,
but is not hard to design or implement, and since it only needs to be
implemented by the resource operator and by an institutional server or
two at each licensee institution, it might be much less problematic
than making all licensee community users go through the complications
of obtaining and installing certificates on their machines. Further,
similar protocols for userid/password checking are already in use for
validating users to terminal servers (e.g., TACACS, RADIUS); these
might be used, or at least adapted.
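Since the checking protocol does not exist, the following Python sketch can only suggest what the institutional side of it might look like. The class and method names are hypothetical, the network exchange is replaced by a direct method call, and the message formats of TACACS or RADIUS are not reproduced; in practice the query would travel over an encrypted, mutually authenticated connection between the resource operator and the institution.

```python
import hashlib
import hmac
import os
from collections import Counter


class InstitutionalPasswordServer:
    """Hypothetical verifier; stores salted hashes rather than passwords."""

    def __init__(self) -> None:
        self._records = {}             # userid -> (salt, derived key)
        self.query_counts = Counter()  # resource operator -> number of queries

    @staticmethod
    def _derive(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def enroll(self, userid: str, password: str) -> None:
        salt = os.urandom(16)
        self._records[userid] = (salt, self._derive(password, salt))

    def verify(self, resource_id: str, userid: str, password: str) -> bool:
        """Answer a resource operator's query: is this userid/password pair valid?"""
        # The server knows which resource operator is asking, so it can count queries.
        self.query_counts[resource_id] += 1
        record = self._records.get(userid)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, self._derive(password, salt))


server = InstitutionalPasswordServer()
server.enroll("u1001", "a long pass phrase")
print(server.verify("resource-a", "u1001", "a long pass phrase"))
```

The reply is deliberately a bare yes or no; the resource operator learns nothing about the user beyond the validity of the pair it already holds.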
Users
are already familiar with user ids and passwords, including the need
to keep passwords secure, to change them, and to pick them well (or at
least they are more familiar with these issues than, for example,
certificate use). Userids and passwords can be carried in the minds of
people rather than being installed on specific machines the way that
certificates are; this helps with kiosks, computer labs, libraries and
other shared machine settings -- assuming that one can teach the user
to log off when he or she is finished, rather than just leaving the
machine signed on. Probably the biggest problem with this approach --
which is not shared with certificates -- is that the resource operator
obtains a set of globally valid credentials for the user, and has to
be trusted to keep them secure. There are also some secondary problems
-- Trojan horse resources that capture user ids and passwords under
false pretenses, for example, are a much more serious threat than they
are in a certificate exchange environment.
Let's
consider passwords and user ids carried over SSL encryption from the
perspective of our requirements definition. It's clear that they are
feasible and deployable. Assuming that a protocol for verifying user
ids and passwords with an institutional server is standardized and
deployed, the amount of work faced either by a licensee institution or
a resource operator is quite manageable. Special desktop software is
not required for web access; for other protocols, such as Telnet, an
SSL-capable Telnet is needed (my understanding is that some of these
are under development). Z39.50 credentials are a particular problem
because no Z39.50 interface to a service like SSL is currently
defined. User ids and passwords are clearly linked to people rather
than network addresses of machines. One problem with userids and
passwords is that they don't encourage seamless navigation among
resources; each resource is going to explicitly annoy the user by
asking for his or her userid and password on each visit.
While
passwords represent relatively weak security, a system can be put in
place to require them to be difficult to guess (by forcing the use of
pass phrases rather than passwords, or avoiding use of words in a
dictionary), and also insisting that they be changed frequently. The
use of an SSL based transport removes the security problems of
transmitting them in the clear. The protection provided by SSL will
depend on whether US-only (long key) or international (short key)
versions of SSL are supported by the user's browser. Userids and
passwords are subject to systemic compromise from two perspectives; if
the institutional password verification server is compromised, new
passwords would have to be issued to all members of the user
community. Also, each resource operator now shares in the
responsibility for keeping userids and passwords secure; if any
resource operator's site is retaining user ids and passwords, and is
compromised, this will compromise all other resource operators as well
as the home institution (if the institution is using the same userid
and password for internal and external authentication and
authorization purposes).
Granularity and extensibility. An institutional password server will
just verify that a particular userid/password combination is valid (it
would also know what resource operator was asking). In situations
where an access management decision needs to be made that goes beyond
validity of the userid/password pair, the key question is the locus of
that decision. The resource operator will either have to maintain a
list of valid IDs (identities) or the password server will have to
keep information about what resources a userid has access to. Or the
institution would have to offer resource operators access to a user
attribute database keyed on userid.
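The three possible loci for the decision can be laid side by side in a short Python sketch. The userids, resource names, and attributes are all hypothetical placeholders, and each table stands in for a database held by the party named in the comment.

```python
# Option 1: the resource operator holds its own list of valid IDs.
VALID_IDS_AT_RESOURCE = {"u1001", "u1002"}

# Option 2: the institutional password server records which resources
# each userid may use.
ENTITLEMENTS_AT_SERVER = {"u1001": {"chem-abstracts"}}

# Option 3: the institution exposes a user attribute database keyed on
# userid, and the resource operator applies its own rule to the attributes.
ATTRIBUTES_AT_INSTITUTION = {
    "u1001": {"status": "faculty", "school": "medicine"},
    "u1002": {"status": "undergraduate", "school": "arts"},
}


def decide_at_resource(userid: str) -> bool:
    return userid in VALID_IDS_AT_RESOURCE


def decide_at_server(userid: str, resource: str) -> bool:
    return resource in ENTITLEMENTS_AT_SERVER.get(userid, set())


def decide_by_attribute(userid: str, required_school: str) -> bool:
    attributes = ATTRIBUTES_AT_INSTITUTION.get(userid)
    return attributes is not None and attributes["school"] == required_school
```

The options differ mainly in who must keep which table current: the first pushes list maintenance to every resource operator, the second keeps it at the institution, and the third exports attribute data that has privacy implications of its own.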
Cross-protocol flexibility: because passwords operate at a higher
level of abstraction than protocols, they are general. Telnet and
Z39.50 support should be straightforward, assuming that there is
encryption on the link over which the passwords are transmitted, as
discussed above.
Privacy
and accountability. The use of user ids and passwords transfers
personal information directly to the resource operator. This
information may be pseudonymous or identified; it will not be
anonymous. To this extent, it undermines privacy but offers
accountability. Management data faceted by demographic categories will
be available from the resource operator only to the extent that the
licensee institution provides demographic data as a byproduct of
userid/password validation. There is no opportunity for the licensee
institution to collect statistical information directly, other than a
count of how often userid/password pairs are validated by the various
resource operators.
Summary: to the extent that an institutional password verification
server controls the export of individual and demographic information,
passwords could work surprisingly well in an SSL-protected context. A
primary benefit is that users are familiar with the model. There are
important missing pieces here, particularly the protocol to permit
resource operators to verify userid/password pairs with institutions
that issued them. Probably the greatest weakness of this approach is
the dependency on each resource operator to protect userid/password
pairs, and the danger of systemic compromise due to a security failure
on the part of a single resource operator.
Further
comments. Clearly, by issuing different passwords and userids for
different resources, it is possible to reduce the interdependence
among resource operators and the dependence on each resource operator
in maintaining security. However, large numbers of passwords and
userids are extremely unfriendly and confusing for users, and probably
impractical. For users who only use a single machine (or who are
willing to store a cookie file in a network file system), and for
resources that don't require high security, it's certainly possible to
store userids and passwords as cookies on the user's machine (though
many users have become "cookie-phobic" due to the overly dire
publicity surrounding cookies); once stored, the user doesn't have to
enter them at all, improving seamless cross-resource navigation. This
is the approach that is taken by many low-security commercial services
in the consumer marketplace today.
4.3.2 Certificate based Credentials
X.509
certificate based credentials are substantially more complex than
passwords, but offer a number of advantages. In essence, an X.509
certificate (plus the private key that goes with the certificate)
gives a machine credentials that support its right to make use of a
name, and allows this assertion to be verified by checking with a
certificate authority (which might be operated by the licensee
institution, or operated by a third party under contract to the
licensee institution). X.509 certificates include expiration dates,
and certificate authorities can also provide revocation lists to
invalidate certificates prior to their expiration date (though
checking such lists can involve substantial overhead, and not all
systems supporting certificates currently check revocation lists).
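The two time-related checks just mentioned, expiration and revocation, can be sketched as follows. The `Certificate` class is a hypothetical stand-in holding only the fields this illustration needs; a real X.509 certificate carries many more, and verification of the certificate authority's signature is omitted entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Certificate:
    # Only the fields needed for this illustration.
    serial: int
    subject: str
    not_before: datetime
    not_after: datetime


def is_currently_valid(cert: Certificate, revoked_serials: set, now: datetime = None) -> bool:
    """Check the validity period, then the revocation list."""
    now = now or datetime.now(timezone.utc)
    if not (cert.not_before <= now <= cert.not_after):
        return False
    # A system that skips this step will keep accepting revoked certificates
    # until they expire on their own.
    return cert.serial not in revoked_serials
```

The expiration check needs nothing beyond the certificate itself, whereas the revocation check needs a current copy of the list from the certificate authority, which is where the overhead arises.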
Rather
than making a complete analysis of certificate based credentials, we
will simply highlight how they differ from the password based
credential approach already discussed.
X.509
certificates and corresponding private keys are messy to distribute
(much more so than, for example, a starter single use password for a
local authentication system), and complicated for users to install,
particularly in cases where the certificate needs to be installed in
multiple machines owned by a single user. Backup and recovery needs to
be considered carefully lest a user lose his or her certificates
permanently. Certificates are highly intractable in cases where
users share machines, such as public workstations. X.509 certificates
can contain demographic data (though there are standardization
problems here about how to encode them in the certificate payload)
which could be used for resource-operator based statistics gathering
or fine-grained authorization decisions.
In
contrast to passwords, there is already a well defined
protocol/process which can be used to validate an X.509
certificate-based credential that has been presented to a resource
operator.
Note
that an X.509 certificate based credential does not consist of simply
the certificate itself, but rather a complex object that includes the
certificate and is signed with the (secret) private key corresponding
to the certificate; since this is computed anew each time a credential
is needed, X.509 certificate based credentials do not share the password-approach
problem that security depends on each resource operator carefully
protecting the user's credentials.
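The reason a freshly computed credential is safer than a password can be seen in a toy Python sketch. It assumes, for illustration, that the resource supplies a fresh random challenge for the user to sign; the key pair is built from two tiny primes so that the arithmetic is visible, and it offers no security whatsoever. Real keys are vastly longer and real signatures use padding schemes not shown here.

```python
import hashlib
import secrets

# Toy RSA key pair from the primes 61 and 53. NOT secure; illustration only.
N = 61 * 53   # public modulus (3233)
E = 17        # public exponent (would be carried in the certificate)
D = 2753      # private exponent (never leaves the user's machine)


def digest(message: bytes) -> int:
    """Reduce a SHA-256 hash into the toy key's range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(challenge: bytes) -> int:
    """User side: sign the resource's challenge with the private key."""
    return pow(digest(challenge), D, N)


def verify(challenge: bytes, signature: int) -> bool:
    """Resource side: check the signature using only the public key."""
    return pow(signature, E, N) == digest(challenge)


challenge = secrets.token_bytes(16)  # fresh for every session
signature = sign(challenge)
print(verify(challenge, signature))
```

The resource operator ends up holding only the challenge, the signature, and the public certificate; none of these lets it, or anyone who breaks into it, sign a different challenge later, whereas a captured password can simply be replayed.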
Userids
and passwords are application level constructs; they can be designed
into an application using any protocol, assuming only that the
connection can be encrypted. The exchange of X.509 certificates is a
lower level, protocol-integrated operation and does not rely on
encryption. Thus, there is work involved in extending the use of X.509
certificates to work with protocols other than HTTP, such as Telnet.
(Z39.50 already contains facilities for certificate exchange). There
is also still a need for an SSL-type service to encrypt the connection
where confidentiality is desired; SSL can also handle many aspects of
certificate exchange without the need for upper level protocol
engineering, if it is available (though the application -- if not the
applications-level protocol -- still needs to know something about
certificates). One advantage of certificates is that they are more
flexible than most other mechanisms; they can be used for signing
electronic mail messages, for example (though generally a separate key
is used for signing). And much of the current work on new protocols
and services -- for example in the Java environment -- seems to be
based on certificate models.
The
issues involving privacy, accountability and management data change
little from the password scenario already discussed. One point worth
noting is that if the user has several certificates -- for example, an
identified one for use with an internal institutional authentication
and authorization system and a pseudonymous one for use with
external services -- he or she must select the correct certificate for
presentation in order to maintain privacy.
4.4
Proxy/Credential Hybrid Schemes
There
are several interesting and confusing schemes that, after much
discussion, the initial reviewers of the paper recognized are really
hybrids of the proxy and credential approaches. In these schemes, the
user contacts an applications proxy in order to gai