Persistent Browser Cache Poisoning
Matthias Vallentin mavam@cs.berkeley.edu
Yahel Ben-David yahel@cs.berkeley.edu
Abstract
Caching web content increases the performance on both the client side (by avoiding unnecessary requests) and the server side (by reducing bandwidth and I/O load). To this end, the HTTP protocol uses expiration and validation mechanisms to ensure that stale content is updated when necessary. It performs the corresponding integrity checks, however, only on the server side. Worse, current browsers lack proper cross-domain cache authentication. We exploit this absence of security checks, and show that an attacker who accesses the network through the same shared medium as the victim can modify cacheable content in the victim's browser without attracting the notice of the victim. This attack is particularly effective against web applications that offload a large part of stable functionality to the client's cache. To demonstrate the feasibility of the attack, we implement air-poison, a proof-of-concept tool designed to be used in wireless networks that answers HTTP requests for JavaScript documents by prepending a malicious payload of the attacker's choice. Each time the victim loads the document, regardless of the security origin context, the malicious script executes and establishes a bidirectional JavaScript communication channel to a host controlled by the attacker. The systematic study of the severity of this threat is an interesting future avenue of research.
1 Introduction
The purpose of the browser cache is to store web content for performance reasons in order to both reduce the server load and avoid unnecessary requests for an unchanged resource. The continuing trend towards faster, richer, and more sophisticated web applications makes caching an indispensable component of fulfilling the user's expectations for responsiveness. Unfortunately, the HTTP protocol providing the caching mechanisms was not designed with security considerations in mind, which has opened it up to a variety of trivial attack vectors in the presence of an adversary.
In this paper, we discuss the practical security implications of web caching, and show that current caching practices violate basic integrity assumptions. Worse, the lack of per-site cache isolation in current browser implementations enables particularly powerful attacks against web applications used by multiple sites. We present an attack to persistently implant executable content into a victim's browser cache, which merely requires that the attacker and victim access the network via the same shared medium. While the concept of this type of attack has been around for some time [12, 20, 13], we are the first to examine it in context with the HTTP cache primitives, demonstrate the ease with which it can be conducted, and implement a proof-of-concept attack tool.
The remainder of the paper is structured as follows. After recapitulating the basics of HTTP caching in §2, we illustrate how to persistently poison the browser cache in §3. We then turn to air-poison, our proof-of-concept implementation of the attack, in §4 and point out a particularly attractive target in §5. Thereafter, in §6, we describe how the attacker can obtain the position necessary to conduct the attack. We summarize related work in §7 and suggest avenues for future research in §8. Lastly, we close the paper with concluding remarks in §9.
2 HTTP Caching
The fundamental goal of caching is to improve performance. Clients benefit from caching by avoiding unnecessary requests when the resource has not yet expired, while servers benefit via a reduction of bandwidth and I/O load. Caching in HTTP [11] is based on two principal mechanisms, expiration and validation.
In order to allow clients to verify the freshness of a resource, the server specifies an expiration time in the future, using either the Expires header or the max-age directive within the Cache-Control header. A fresh resource does not need to be refetched, thereby avoiding unnecessary requests entirely. The server should choose an expiration time that is adequate for the resource. Setting an expiration time in the past signals a stale response, which the cache should always validate. Determining whether a response is stale involves comparing the freshness lifetime of the cache entry to its age.¹ A stale response must be validated, which we describe below.
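The freshness comparison described above can be sketched in a few lines of JavaScript. This is a simplified illustration rather than a browser's actual algorithm: the function names are hypothetical, the age of the entry is passed in directly instead of being derived from the Age and Date headers, and the heuristic fallback for responses without explicit expiration is omitted.

```javascript
// Hypothetical helper: extract max-age (in seconds) from a Cache-Control value.
function parseMaxAge(cacheControl) {
  if (!cacheControl) return null;
  const match = /\bmax-age\s*=\s*(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// Freshness lifetime in seconds, or null if the response carries no explicit
// expiration information (a real cache may then fall back to a heuristic).
function freshnessLifetime(headers) {
  const maxAge = parseMaxAge(headers["cache-control"]);
  if (maxAge !== null) return maxAge; // max-age overrides Expires
  if (headers["expires"] && headers["date"]) {
    const expires = Date.parse(headers["expires"]);
    const date = Date.parse(headers["date"]);
    if (!Number.isNaN(expires) && !Number.isNaN(date)) {
      return (expires - date) / 1000;
    }
  }
  return null;
}

// An entry is fresh while its age is below its freshness lifetime;
// otherwise it is stale and must be validated.
function isFresh(headers, ageSeconds) {
  const lifetime = freshnessLifetime(headers);
  return lifetime !== null && lifetime > ageSeconds;
}
```

An Expires value that lies before the Date value yields a negative lifetime, so such an entry is treated as stale immediately, matching the "expiration time in the past" case.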
After having determined that a cache entry is stale, the cache will validate the resource by asking the server whether or not the resource is still usable. For this purpose, HTTP 1.1 uses conditional requests, in which the client includes a validator that has been stored with the original cache entry. The server compares the received validator against the current validator and, if these values match, responds with a 304 (Not Modified) status code; otherwise, it returns the full response. HTTP 1.1 distinguishes between strong and weak validators. A strong validator changes by necessity whenever the entity changes, whereas a weak validator does not always change, only when significant semantic changes occur. Conditional requests differ from regular requests only in that they carry the validator in an extra header, which often carries the value of the Last-Modified header (sent back in If-Modified-Since). The Last-Modified header is implicitly weak, as it offers only per-second granularity. As an alternative, the server can include an entity tag, which is a custom validator specified in the ETag header.² This enables strong validation, if the entity tag is calculated as a checksum of the resource. Entity tags therefore allow for a more generic and reliable validation. The current HTTP standard advocates the use of a strong validator together with a Last-Modified header.³
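To make the validation step concrete, the following sketch shows how a server might answer a conditional request using an entity tag derived from a checksum of the resource. It is an illustration under stated assumptions: the function names are hypothetical, FNV-1a merely stands in for whatever checksum a real server would use, and the comparison handles only a single strong tag (no tag lists, no weak W/ tags).

```javascript
// Stand-in checksum (32-bit FNV-1a over UTF-16 code units); an assumption
// for illustration, not what any particular server uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// A strong entity tag: it changes whenever the body changes
// (up to checksum collisions).
function entityTag(body) {
  return '"' + fnv1a(body) + '"';
}

// Answer a conditional request: 304 without a body if the client's validator
// matches the current one, the full response otherwise.
function validate(requestHeaders, currentBody) {
  const current = entityTag(currentBody);
  if (requestHeaders["if-none-match"] === current) {
    return { status: 304, headers: { etag: current }, body: "" };
  }
  return { status: 200, headers: { etag: current }, body: currentBody };
}
```

Note that the check runs entirely on the server: the client never verifies that its cached body actually corresponds to the validator it sends.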
¹ If both a Cache-Control header with max-age directive and an Expires header are present, the former overrides the latter; if neither header is present, the cache may use a heuristic to compute the freshness lifetime.
² When an ETag header appears in a response, the client must use it for subsequent conditional requests in the If-Match or If-None-Match header.
³ To enable backwards compatibility with HTTP 1.0, which does not feature entity tags, clients should include both the entity tag and Last-Modified if both are present.
3 Persistent Cache Poisoning
The goal of the attacker is to replace a piece of JavaScript in the victim's browser cache with a malicious one. In particular, the attacker seeks to (i) maximize the time until the victim validates the malicious cache entry, and (ii) avoid validation failures of conditional requests. In order to do so, the attacker must be able to forge the response from the web server, which is particularly easy in a Man-in-the-Middle (MITM) scenario. Multiple ways to get into such a position exist, which we discuss in §6. In order to convey the basic idea of our attack, we first assume that the attacker has managed to become a MITM and can observe, block, and manipulate traffic. We relax this assumption later and demonstrate a variation of the attack in which the attacker is merely an eavesdropper in a wireless network, without the ability to manipulate ongoing communication, yet with the ability to inject packets.
3.1 MITM Cache Poisoning
Consider the scenario where a client requests a page from site A, which in turn includes a link to a JavaScript file hosted by server S; in other words, after receiving the page from A, the client opens a new connection to S to download the linked script, as illustrated in Figure 1. This script could be part of a widely used JavaScript library embedded by many different sites. Because such libraries change less frequently than the primary page content, S returns the script with the necessary Cache-Control headers in the HTTP response. When observing the response, the attacker prepends a malicious payload to the script and relays it back to the client, keeping the original cache control headers intact. Although the attacker could have ignored the server's response and returned merely the payload directly after observing the request, it is stealthier to prepend the payload to the original script, because the page in the client's browser can still make use of the provided functionality.
The combination of the attacker's payload and the original script now resides in the client's cache. As long as the cache entry is fresh, the client will not issue new requests for the script. Each time the client visits page A or any other page B that includes the same script, the malicious payload will be executed. For example, the payload could implement a JavaScript keystroke logger, or report detailed information about the client's plugins and browser version to deliver a 0-day exploit when the client is vulnerable (see §4). Because the attacker can modify the lifetime of the cache entry when returning the forged response, the script may reside in the cache for a long period of time: days, weeks, or even months.
However, there are several conditions that trigger a validation of the cache entry by issuing a conditional request, as shown in Figure 1. Successful validation elicits a 304 response, which indicates that the client can keep using the resource. For example, Firefox validates the cache when the user hits reload [8], and our own experiments show that Safari validates cache entries upon startup. In future work, we plan to more closely examine the conditions that trigger validation.
[Figure: message sequence diagram with the columns Client, Attacker, and Server]
... the same shared medium, as commonly occurs within a wireless network. For unencrypted and WEP-encrypted networks, the medium is truly shared, in the sense that each station can see the traffic of all other stations if in range. For WPA/2-encrypted networks, however, each station additionally negotiates a session key with the access point (AP) after joining the network with the pre-shared secret key known to all stations. Any station that observes the session key negotiation also sees the session key. Consequently, when the attacker joins the network, only ongoing sessions are protected from eavesdropping until the next re-keying procedure. From this point on, we assume the attacker can observe all frames from the client, but not block, delay, or manipulate the communication of the client with other stations.
The scenario we consider now is almost identical to the one in §3.1, with the exception that the attacker has to do more work to poison the client's cache due to the lack of the MITM position. Again, a client visits a site that includes a cacheable piece of JavaScript from another site, and the attacker's objective is to add a malicious payload to the script. This time, the attacker can neither manipulate the request of the client nor the response of the server; thus the only way to interfere is via traffic injection.
[Figure: message sequence diagram with the columns Client, Attacker, and Server, starting with GET script.js]
... to the client, indicating that the script is located at the attacker's machine. Because the attacker closes the connection after the redirect, the client does not consider the original response from the server anymore. The client then asks for the script from the attacker, who only sends back an ACK in order to keep the client waiting until the response arrives from the server. Now equipped with the response, the attacker forwards it together with the reply to the client.
⁴ The client's connection to the server is only protected by the TCP sequence numbers, which the attacker can observe and use to generate valid traffic.
4 Implementation
We implemented the above attack in 400 lines of Ruby code. Our tool, called air-poison, uses the LORCON [14] library⁵ for driver-independent 802.11 frame injection via the Linux mac80211 stack [15]. In principle, air-poison tries to parse each frame seen in the air and checks whether it is an HTTP GET request. If the GET request matches a regular expression specified on the command line, air-poison forges a cacheable HTTP response with the following headers:
• Cache-Control: max-age set to one year
• Expires: set to one year in the future
• Last-Modified: set to the current time
In addition, air-poison includes a Connection header set to close in order to signal the victim to terminate the TCP connection after processing the response. This avoids a situation where the real response from the web server would potentially override the attacker's response. After all, the HTTP standard recommends the use of the most recent answer in the case of the arrival of multiple responses, even if the previous answer is still fresh. At this point, air-poison simply injects the payload with the above headers without waiting for the original response to arrive. We plan to integrate response inclusion and 302 redirects when releasing air-poison to the public.
⁵ LORCON currently only supports injection with unencrypted networks, although there is no conceptual obstacle to supporting injection in WEP- and WPA/2-encrypted networks.
The user can use any payload with air-poison, which reads input from STDIN or, via -r, from a file. Moreover, air-poison ships with a payload that registers the victim as a "zombie" with the BeEF framework [4]. When the payload executes, it establishes a bidirectional communication channel over JavaScript to the zombie, who can then be controlled by the attacker over a web front-end. For example, BeEF offers a variety of modules to fingerprint the browser and gather available plug-in versions, log keystrokes, detect whether the zombie uses Tor [6], capture cookies, perform port scans in the local network, or exploit the browser through Metasploit XMLRPC [16]. The communication channel exists until the victim navigates away from the site that included the payload.
To facilitate the exploitation process, we also implemented a DNS cache poisoning attack. The user can specify a regular expression that matches DNS requests, and air-poison answers them with a custom IP address serving the payload. This is particularly effective because long-lived DNS cache entries are much harder to evict than JavaScript in the browser cache. It is also possible to tell air-poison to spawn a fake web server on the attacker's machine that replies to any request with the payload. This is a powerful attack vector when first poisoning the DNS cache, and then redirecting the victim to the attacker's fake web server.
5 Case Study: Google Analytics
A particularly attractive target for the persistent cache poisoning attack is Google Analytics [2], a popular service for comprehensive tracking, reporting, and understanding of users' surfing behavior. The reports include information such as the amount of time a user stayed on a page, the depth of the navigation, and the site to which the user went afterwards. Additionally, the decomposition of traffic sources further breaks the visits down into visits from search engines, referring sites, and direct traffic.
Google Analytics is used by approximately 32.2% of Alexa's list of the 10,000 most popular web sites [10]. To enable Google Analytics on a web site, the content provider includes a small piece of JavaScript that contains the site identifier and a link to an external script, hosted on google-analytics, that contains the bulk of the logic. The external script, named ga.js, has an expiration time of 7 days. However, when the browser validates the cache entry, Google returns a 304 status code indicating that the script has not changed since the end of November 2009. In other words, a malicious cache entry could reside for roughly 4 months in the browser while remaining undetected.
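The reason a poisoned entry survives revalidation can be illustrated with the plain If-Modified-Since comparison. The sketch below assumes a server that follows the standard date-based semantics and nothing more; the function name and the concrete dates are hypothetical, chosen only to match the "end of November 2009" time frame mentioned above.

```javascript
// Status a server following plain If-Modified-Since semantics would send:
// 304 if the resource has not been modified after the date the client presents.
function statusForConditionalGet(ifModifiedSince, resourceLastModified) {
  const since = Date.parse(ifModifiedSince);
  const modified = Date.parse(resourceLastModified);
  // An unparsable date means the condition is ignored and the full response is sent.
  if (Number.isNaN(since) || Number.isNaN(modified)) return 200;
  return modified <= since ? 304 : 200;
}

// Hypothetical dates: the genuine script last changed at the end of November
// 2009, while the cached entry carries a later Last-Modified value.
const genuineLastModified = "Mon, 30 Nov 2009 00:00:00 GMT";
const cachedValidator = "Mon, 01 Mar 2010 12:00:00 GMT";
const revalidationStatus = statusForConditionalGet(cachedValidator, genuineLastModified);
```

Under these assumptions the server confirms the entry with a 304 on every revalidation, because it only compares dates and never sees the cached body.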
In general, all widespread web applications that offload a cacheable API to the client are equally susceptible. We intend to conduct a detailed evaluation about the prevalence of such cacheable APIs in the second project of this class.
6 Owning the Medium
In this section we summarize several techniques to bring the attacker into a MITM position, which is required by the attack described in §3.1.
The vast majority of efforts in securing wireless networks have traditionally focused on protecting the AP, without giving equal consideration to client-side security. The lack of mutual authentication exposes clients to multiple difficult-to-defend attacks. One fundamental issue shared by all major operating systems is probing for previously used APs in the list of preferred wireless networks. Because 802.11 does not authenticate link-layer frames, anyone, not only the correct AP, can respond to these probes. Off-the-shelf tools [1] allow an attacker to reply to these probes and transparently impersonate the requested network to realize a full MITM scenario,⁶ without the client even noticing its occurrence.
This transparent form of MITM is particularly devastating when an attacker gains control over random clients, such as on the street, in a coffee shop, at the airport, or even on a plane, where a user does not expect to be connected. Mobile devices are especially vulnerable to this attack, as the infection occurs unnoticed without the device even leaving the user's pocket. In general, any location with a high turnover of mobile devices represents an attractive gateway for infecting a large number of victims quickly, and allows the malicious code to remain persistent even long after the victims have left the scene.
⁶ This attack can even be used to crack the WEP key of a network solely by interacting with the client [18] and to capture WPA/2 handshakes to launch effective brute-force attacks [17]. Furthermore, such attacks could be carried out over distances using high-gain directional antennas.
var url = "HOST/beef/hook/beefmagic.js.php";
var script = "<script language='Javascript' src=" + url + "></script>";
document.write(script);
Figure 3: The default payload that ships with air-poison. The HOST running BeEF can be specified on the command line. The loaded file beefmagic.js.php installs a bidirectional JavaScript communication channel to the victim, allowing the attacker to query the browser version, capture keystrokes, sniff cookies, or launch browser exploits. The channel persists until the victim navigates away from the site loading the payload.
Other known techniques to establish a MITM position are ARP spoofing [5] and DHCP spoofing [7]. Additionally, DNS cache poisoning [9, 3] represents a powerful off-path attack, since the attacker does not need to share the same local network.
7 Related Work
Jackson et al. [12] identify the problem of insufficiently isolated per-site caches, arguing that the browser does not apply the same-origin policy consistently when writing to and reading from the cache. In other words, when cacheable third-party content hosted by a server S is included in site A, the browser does not restrict the cached content to be used only by A, but instead allows another site B that includes the same third-party content via S to use the copy that has been previously cached in the context of A.
To prevent cache content from being used across domains, the authors propose partitioning the cache based on the embedding context. That is, the third-party content in the above example hosted by S should be cached for A and B separately, such that the pairs (A, S) and (B, S) represent two disjoint caches. The authors implement this policy as an extension for the Firefox browser called SafeCache [19]. Unfortunately, our attempts to test the extension failed due to incompatibility with the most recent version 3.6 of Firefox.
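The partitioning idea can be sketched as a cache whose key is the pair of embedding site and resource URL rather than the resource URL alone. This is a minimal illustration of the concept, not SafeCache's actual implementation; the class name, method names, and host names are hypothetical.

```javascript
// Sketch of a cache keyed by (embedding site, resource URL) instead of the
// resource URL alone. All names here are hypothetical.
class PartitionedCache {
  constructor() {
    this.entries = new Map();
  }

  // JSON-encoding the pair keeps the two components unambiguous in the key.
  static key(embeddingSite, resourceUrl) {
    return JSON.stringify([embeddingSite, resourceUrl]);
  }

  put(embeddingSite, resourceUrl, body) {
    this.entries.set(PartitionedCache.key(embeddingSite, resourceUrl), body);
  }

  // Returns the cached body, or null if this site has not cached the resource.
  get(embeddingSite, resourceUrl) {
    const key = PartitionedCache.key(embeddingSite, resourceUrl);
    return this.entries.has(key) ? this.entries.get(key) : null;
  }
}

// Content from S cached while visiting A is invisible in the context of B.
const cache = new PartitionedCache();
cache.put("a.example", "http://s.example/lib.js", "/* script fetched while visiting A */");
const seenFromA = cache.get("a.example", "http://s.example/lib.js");
const seenFromB = cache.get("b.example", "http://s.example/lib.js");
```

With this keying, an entry poisoned while the victim visits A does not execute on B, which has to fetch its own copy; the entry for (A, S) itself remains affected.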
Cache partitioning would solve the problem of cross-domain cache access, yet no browser that we are aware of currently implements it out of the box. We strongly advocate that browser vendors implement cache partitioning on a per-site basis in order to limit the impact of the attacks presented in this paper.
8 Future Work
Our attack tool presented in §4 currently does not spoof strong validators in the ETag header, but instead always returns a Cache-Control, Expires, and Last-Modified header. Strong validator spoofing requires the attacker to copy the validator from the original into the forged response. We will implement this feature in future work.
In the second part of this project, we also plan to perform an extensive trace-based study to understand the severity of this threat by analyzing the cache expiration and validation behavior observable in practice. In particular, we plan to study the conditions that trigger cache validation in the browser, which is an important aspect because the attacker can forge the cache control headers with extremely long expiration times in order to suppress validation through conditional requests. We already identified that validation occurs at the startup of Safari, and when the user hits the reload button. In order to assess the severity of this threat, it is crucial to have a clear understanding of the validation model.
Another future avenue worth investigating is the impact of this attack on mobile devices. An intriguing possibility is to target smartphones, since these devices put significant effort into joining wireless networks to achieve higher bandwidth. The question remains as to how aggressively the web browsers in these devices should cache data. If smartphones operate like the disk cache of regular web browsers, then it is straightforward to exploit this with the probe-stealing attack sketched in §6. A unique aspect of an infected population of mobile devices is the ability to bridge the gap to attack the cellular network. For instance, a trivially distributed denial-of-service attack could instruct the victims to download a large file, which would be difficult for the provider to filter.
Our initial experiments show that such mobile devices do not actually maintain a persistent cache. In particular, the current version of Safari on the iPhone and iPod touch (iPhone OS version 3) does not maintain a cache between sessions, so if another application is selected, the web cache is cleared (for any not-currently active