web application security lab

Searchable SWFs

I got forwarded this link today from businesswire about how Google and Yahoo are now going to be armed with the information necessary to look at and extract information out of SWF files. Ho-boy, here we go. The link was sent to me with the “bad juju” caveat, and I’m pretty sure I agree.

The problem is, like anything, if the search engines start pulling down rich applications that actually interact with the web application, there are untold issues that could arise. For instance, Flash applications have quite a few rich features in them, and some of those could be dangerous if they interact with back end applications. Also, if the word “test” appears in a Flash movie, does that mean it should get indexed? Or is it a frame that’s not visible, or off the side of the page, or whatever? What if it takes ten minutes to find that particular line of text, or it is buried under dozens of sub-menus? Are people really going to sit for that?

Do people really want to load a Flash movie when they query for things? I know I sure don’t! I’m already annoyed when I get linked to PDF files or .docx files. I think this just takes searching to a new level where people don’t actually want to go. Instead of crawling deeper and refining their search, the search engines are going to new mediums to stave off the people (like myself) who have argued that Flash isn’t a good medium for accessibility, usability and SEO. SEO is going to be off the table soon enough, leaving accessibility and usability.

But seriously, what’s next? Are the search engines going to decompile Java applets looking for text? As a side note, this should, at least in the short term, lead to a new round of Flash hacking, once it goes live. I’ll give a tee-shirt to the first person who writes a Google dork for internal Flash text that leads to exploitation.
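
To make "internal Flash text" a bit more concrete, here is a minimal Python sketch of a crude strings-style scan over an SWF file. It is not a decompiler and not how any search engine is known to work; it only assumes the documented SWF header layout (a three-byte signature, `FWS` for uncompressed or `CWS` for zlib-compressed, followed by a version byte and a four-byte length). The function names and the keyword list are hypothetical, chosen purely for illustration.

```python
import re
import zlib

# Hypothetical, non-exhaustive keywords that might hint at embedded secrets.
KEYWORDS = ("pass", "pwd", "admin", "secret")


def swf_body(data: bytes) -> bytes:
    """Return everything after the 8-byte SWF header, inflating CWS files."""
    if len(data) < 8:
        raise ValueError("too short to be an SWF file")
    signature = data[:3]
    if signature == b"FWS":  # uncompressed SWF
        return data[8:]
    if signature == b"CWS":  # body is zlib-compressed after the header
        return zlib.decompress(data[8:])
    raise ValueError("unsupported or non-SWF signature: %r" % signature)


def printable_strings(blob: bytes, min_len: int = 6) -> list:
    """Collect runs of printable ASCII, much like the Unix `strings` tool."""
    pattern = rb"[ -~]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]


def suspicious_strings(data: bytes) -> list:
    """Keep only the strings that contain one of the illustrative keywords."""
    return [
        s for s in printable_strings(swf_body(data))
        if any(keyword in s.lower() for keyword in KEYWORDS)
    ]
```

Calling `suspicious_strings(open("movie.swf", "rb").read())` on a file you are authorized to inspect (the filename is a placeholder) would list any candidate strings. A scan this naive will miss anything obfuscated or split across tags, which is part of why a crawler that actually executes or decompiles the movie is a different proposition.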

21 Responses to “Searchable SWFs”

  1. Ory Segal Says:

    Hey,

    What I really don’t get is how people are going to reach the information that was indexed?

    let’s assume that Google/Yahoo manage to extract information buried deep inside Flex/Flash applications - how would people reach the application state, which holds that text? will they supply a deep-link somehow?

  2. RSnake Says:

    You’ve got me, Ory. I wish I knew. That would remove one major accessibility issue though, that’s for sure. Being able to bookmark “part” of the application state would cause interesting consequences though, I’m sure.

  3. Ron Says:

    Very interesting points RSnake. One question, what did you mean when you said

    “SEO is going to be off the table soon enough, leaving accessibility and usability.”

    Do you think the SEO business is going to become obsolete ?

  4. Shawn Lauriat Says:

    Flash applications have for years had the same basic history/bookmarking issue that Ajax-based web applications have more recently discovered. This new trickery will only emphasize that, as you can provide a link to someone to paste into an email or what have you, but that won’t work for search engines.

    Just you wait. I bet we’ll find out how many Flash files store passwords in clear, searchable text. :-)

  5. Michael Says:

    Shawn — From reading various articles, it looks like only visible text that users see will be indexed.

    “Q: How does Google “see” the contents of a Flash file?
    We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed. We can’t tell you all of the proprietary details, but we can tell you that the algorithm’s effectiveness was improved by utilizing Adobe’s new Searchable SWF library.”

    Now what I don’t know is, can Google/Yahoo decompile SWF files and then index the contents? If they can, then I’d start to worry about passwords, usernames, database paths, etc.

  6. Awesome AnDrEw Says:

    What you have neglected to acknowledge and mention, or perhaps did not even consider (though knowing your style you most likely have, RSnake) is that this issue will also serve as a seminal method for malware delivery, much in the same way that Search Engine Optimization has basically been exploited in order to achieve better search result performance. Since cloaking works so well as it is, what is going to happen when someone goes ahead and pulls the old “Bait and Switch” to effectively lure victims into going ahead and viewing the Flash file? What about issues such as Mark Dowd’s Flash Bytecode attack (http://documents.iss.net/whitepapers/IBM_X-Force_WP_final.pdf)?

  7. Jon A. Longoria Says:

    I was waiting for one of the search engines to pick up on this since decompilation of .swf files has been readily feasible since about 2000/2001. It’d be interesting to see how they plan to direct traffic to a specific portion of the flash application hosted on the site.

    Querying the content isn’t really an issue, as Flash is searchable so long as the content itself is delivered from a remote include like an XML document. Directing to a specific frame in the .swf is another story, however - I’ve not personally had any success in pointing a crawler to an instance in my timelines. I wonder how they’ll produce calls to retrieve those searched instances. Additionally, how will the crawler determine the difference between a flash site and a flash advertisement?

    And what of the .swf’s that utilize intentional math to generate their motion as opposed to consecutive frames, as I do in some of my more motion-intensive projects so that even slower machines can run them? The timeline itself isn’t the same, since the AS (ActionScript) is directing to specific frames as opposed to running across the timeline itself. Does this mean that inherently Google’s search engine will be extracting the AS itself? Isn’t that moderately dangerous, especially when my application might include an authentication mechanism?

    I don’t see this as necessarily being beneficial to the consumer, and it is certainly a problem for flash developers who rely on the ability to lure the user into their interfaces for the purposes of their project. If you can pick and choose your content from the search engine, that defeats the purpose of developing an interface in the first place.

  8. Lode Says:

    Wouldn’t it be just like pdf? You can open the pdf right as pdf or go to the HTML view. In this case you could go to the flash file itself or view the HTML/text view.

  9. RSnake Says:

    @Ron - sorry, no, what I was trying to say is that SEO is no longer a reason not to use Flash if this works as advertised. Although, I have my doubts on people being able to “bookmark” the complex states capable within Flash, so I’d argue that it’s still on the table unless some miracle happens.

  10. istari Says:

    @Shawn Lauriat

    You can already find hardcoded passwords in swf files quite easily. Just search for “admin password -username filetype:swf”. As usually there’s only one admin password, it’ll be hardcoded in the actionscript (or its MD5 will). Some *sec-wise* developers think that this attack can be avoided by including a txt file with a static address in the same server and public permissions :-D

  11. Yousif Says:

    Hmm, well I spotted something similar to extracting the SWF object information from a file. Actually, I believe the Internet Archive: Wayback Machine has already implemented such a feature. I was browsing around a page that loaded an SWF, and in the source of the page it read “Loading” and “play” a bunch of times, each a little different: exactly the text that was inside the SWF file.

  12. Brent Says:

    Dangerous or not, this appears to me to be a mad dash to compete with the fact that Silverlight content can easily be searchable. Your point is extremely valid for both Flash and Silverlight!

  13. Sarwar Erfan Says:

    This opens the door to develop SEO friendly web apps.
    Some developers who embedded sensitive information in their swfs may have problems, but this also compels them to get rid of those security holes. Anyone can already copy the swf and decompile it to get at its contents.

  14. MBridge Says:

    Over 10 years ago I watched as someone was able to control another user’s computer through a windows media file. All the end user had to do was play the file off a web-site to lose control of their machine.

    This unfortunately may have the same exact result in some cases. By immediately accessing a flash file this could open the door to an extremely large amount of malware entering users’ computers.

    Google and other search engine providers should allow users to opt out of seeing Flash videos (or any videos including YouTube) at the end of their search queries.

    http://www.MBridge.com

  15. keith jones Says:

    Yahoo have a large organised criminal element. http://endmafia.com

  16. newkaiza Says:

    Only for information

    there is a XSS here :D

    http://www.businesswire.com/portal/site/google/?ndmViewId=news_view&newsId=20080630006649%20and%3CIMG%20SRC=%3E&newsLang=en

  17. keith jones Says:

    more yahoo mafia problems here. http://s216606257.websitehome.co.uk/tony.htm

  18. keith jones Says:

    Google along with Yahoo are heavily involved with Mafia who use captive women. http://endmafia.com
    Take my advice & stick with Microsoft, a lot of this open source & ‘free’ stuff is a vehicle for spyware.

  19. Gorka Says:

    Well, I just believe it’s stupid to try and index flash or ajax content: in a well formed MVC framework flash only acts as the part that prints out content and asks for it, so truly well done apps don’t have any real content within the swf (or Javascript in AJAX).
    In my opinion search engines should look for a way to access the raw data instead of the formatted output.

    Cheers,
    Gorka

  20. Kenan Says:

    Google has big technology; maybe they can do this already, and only use it for the USA government or the FBI. :)) Google collects information on every Google user, and they are starting to put cameras on every street. Is it really a problem for them to decompile swf or ajax …

  21. whatever Says:

    What’s next?

    Stop using the browser where one can use userland applications devoted to particular tasks. That is, the UNIX philosophy.

    Ultimately “it’s all just text”. This is what Google is proving.

    Perhaps, if one cannot read it as text, then it should be run outside the browser by the appropriate application.

    A browser is for reading html (which is just text).

    A video player is for playing video.

    A text editor is for editing text.

    A pager is for reading text.

    An http client (wget, etc.) is for communicating through http.

    etc.

    Why are we trying to use a browser to view video? This is why swf exists.

    I can think of many reasons why this decision was made years ago, and none of them are legitimate.

    And perhaps users have a “right” to know what code is being run on their devices. The .js, .swf and other similar paradigms aimed at “interactivity” abrogate that right.

    If a user wants to “interact with the web”, let him learn http. Let him understand the client-server paradigm.
    For that is what the web really is.

    Otherwise, he is not “interacting with the web”, he is letting someone else run their programs on his device and interacting with those applications, whatever they might be doing.