Wednesday, 2 November 2016

Digital Forensic Software Validation Methods

Over the last two decades the world has experienced major growth in IT and cyber crime. The range of devices able to store and process electronic data is rapidly increasing: iPods, digital cameras, laptops, notebooks, tablets and smart phones are all responsible for this. We are living in an era where mobile, instant access to data is crucial to our day-to-day lives. Unfortunately, these devices are not always used for sincere purposes; they are often found to be involved in a large number of the crimes that occur each year.

With the above taken into account, it is not hard to see why digital forensic analysts and experts are becoming inundated with devices. Numerous police forces' digital forensics units around the UK are suffering from a backlog of electronic evidence (EE) collection and analysis. Wiltshire Police were the focus of a BBC Radio 5 probe in 2015, which revealed that more than half of UK police forces were experiencing delays of at least three months before cases were even allocated: "Five forces had devices which had not been examined after more than a year" (Gilbert, 2015).

To aid forensic investigators, a variety of forensic software is available in both the commercial and open source markets. These tools aim to increase the efficiency of investigations and the number of devices that can be assessed compared with working without them.
Unfortunately, these tools can vary in their performance, their cost and, most importantly, their accuracy. Currently, there is no requirement for tools to be vetted or validated before being used within a court case or by an expert witness. However, software validation standards have been around for almost as long as digital forensic investigations. Some of the many standards that exist are:

  • ISO/IEC 17025 - General requirements for the competence of testing and calibration laboratories
  • IEEE 1012-2012 - IEEE Standard for System and Software Verification and Validation

Whilst these standards exist for software to be validated and calibrated, they do not provide an entire methodology for testing a piece of software from start to end; no methodology can cover all grounds when it comes to the validation of software.
Beckett and Slay (2007) expressed that bringing software into compliance with ISO 17025 is the best way to bring all digital forensic tools (DFTs) to the same level of validity and reliability.

Research was performed into various software validation and evaluation methodologies that have been in use over the last couple of decades. Carl Erik Topp of NordTest (2014) created the TR 535 report, which sets out both a methodology and a template report implementing the ISO 17025 standard alongside other recognized standards in software validation. This method was not chosen, however, as it is much more low-level than required for this assessment, going into depth on scope and low-level computing validation.

Daniel Ayers (2009) produced an article in the proceedings of the Ninth Annual DFRWS Conference on the current state of software validation of DFTs. He proposed several new requirements for the "second generation of DFTs", along with several evaluation metrics that should be considered during an assessment of a DFT.
Requirements from Ayers (2009):

  • Parallel processing
  • Data storage and I/O bandwidth
  • Accuracy and reliability
  • Auditability
  • Repeatability
  • Data abstraction
Evaluation metrics:

"Absolute speed – the elapsed (wall clock) time required to complete analysis.
Relative speed – the average rate at which the tool is able to process evidence compared with the rate at which data can be read from the original evidential media.
Accuracy – the proportion of analysis results that are correct.
Completeness – the proportion of forensic artefacts present in the evidence that are identified and reported by the tool.
Reliability – the proportion of tests where the tool executes successfully, does not crash or hang, and provides output in the documented format. (Note that this metric is not concerned with the accuracy of results.)
Auditability – the proportion of results which are fully auditable back to original evidence data, including documenting all computations performed to derive results and all assumptions and other inputs (such as configuration information set by the analyst) that are capable of influencing results.
Repeatability – the proportion of tests where it can be established, for example using detailed logs generated by the tool, that the process employed for analysis of an evidence item was exactly the process specified." (Ayers, 2009)
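
To make these proportions concrete, the sketch below shows one hypothetical way they might be computed from a set of per-run test records. The record fields are invented for this illustration; only the metric definitions come from Ayers (2009).

# Hypothetical per-run test records; the field names are invented for
# this sketch, only the metric definitions come from Ayers (2009).
runs = [
    {"correct": 41, "results": 42, "found": 40, "present": 42,
     "completed_ok": True, "auditable": 39},
    # ... one record per test run
]

accuracy = sum(r["correct"] for r in runs) / sum(r["results"] for r in runs)
completeness = sum(r["found"] for r in runs) / sum(r["present"] for r in runs)
reliability = sum(r["completed_ok"] for r in runs) / len(runs)
auditability = sum(r["auditable"] for r in runs) / sum(r["results"] for r in runs)
print("Accuracy %.2f, Completeness %.2f, Reliability %.2f, Auditability %.2f"
      % (accuracy, completeness, reliability, auditability))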

With the above taken into account, for a valid attempt at evaluating software, a process (or methodology) must be followed to ensure the thoroughness and completeness of validation.

Test Data Samples
There are multiple resources available for forensic reference data sets, such as the Computer Forensic Reference Data Sets (CFReDS) project by NIST (2015). This project provides anonymised test data with detailed expected results. These data sets generally appear to target memory forensics and data imaging / recovery tools. No browser profiles were found during research into the various data sets provided by the CFReDS project, nor via external links from it.

Whilst efforts were made to find browser profiles matching Firefox, SeaMonkey and IceWeasel, none were found.

Due to this, example user data profiles will be created with defined expected results. For example, when testing Dumpzilla's ability to list the user's cache, several pages will be loaded and the web developer tools will be used to capture all files being cached by the browser. This list can then be cross-referenced with the results of Dumpzilla.
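
As a rough sketch of that cross-referencing step (the file names and the one-URL-per-line format are assumptions made for this example, not part of Dumpzilla):

# Compare the cache listing captured from the browser's developer tools
# against the cache listing reported by Dumpzilla. Both files are assumed
# to contain one URL per line.
with open("expected_cache_urls.txt") as f:
    expected = set(line.strip() for line in f if line.strip())
with open("dumpzilla_cache_output.txt") as f:
    reported = set(line.strip() for line in f if line.strip())

print("Missed by the tool (Completeness):", sorted(expected - reported))
print("Unexpected entries (Accuracy):", sorted(reported - expected))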

A total of 5 test profiles will be created.

  • 3x Firefox profiles with various expected results per assessed function.
  • 1x IceWeasel profile with at least one expected result per assessed function.
  • 1x SeaMonkey profile with at least one expected result per assessed function.

Testing Process: 
The process below will be followed when evaluating each function of the Dumpzilla DFT (each function is defined in another post).


  1. The virtual machine is reverted to a snapshot taken before any testing commenced.
  2. The test data is created by loading the relevant browser and performing operations to input data into the analyzed datastores.
  3. All background applications are closed down and the system is rebooted to clear the system cache and volatile memory space.
  4. The Dumpzilla script is then run, pointed at the current test profile, with one function used per run (a minimal sketch of this harness is shown after this list).
    1. Time of process is captured (Absolute speed)
    2. Results are logged for comparison (Accuracy / Completeness)
    3. Any errors are logged (Reliability)
    4. The step is repeated three times in total for a larger result set.
  5. Results are collated per browser and function.
  6. Results are compared for accuracy of the expected results.
  7. Any anomalous results are further investigated for root cause.
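
A minimal sketch of the harness implied by steps 4.1 to 4.4 is shown below. The script path, profile directory and the example --Cookies option are placeholders for this illustration; substitute the function under test.

# Run one Dumpzilla function against one test profile three times,
# capturing wall-clock time (Absolute speed), stdout (for the Accuracy /
# Completeness comparison) and stderr (for Reliability). Paths and the
# example option are placeholders.
import os
import subprocess
import time

env = dict(os.environ, PYTHONIOENCODING="UTF-8")
cmd = ["python", "dumpzilla.py", r"C:\TestProfiles\firefox_profile_1", "--Cookies"]

for run in range(1, 4):
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, env=env)
    elapsed = time.perf_counter() - start
    with open("run_%d_stdout.log" % run, "wb") as f:
        f.write(result.stdout)
    with open("run_%d_stderr.log" % run, "wb") as f:
        f.write(result.stderr)
    print("Run %d: %.2f s, exit code %d" % (run, elapsed, result.returncode))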


This blog will continue to be updated over the course of the next month showing the various stages of the testing process and creation of test data. All results are expected to be public by the 27th of November 2016.


Bibliography

Ayers, D. (2009) A second generation computer forensic analysis system. Digital Investigation: proceedings of the Ninth Annual DFRWS Conference, 2009, Montreal: Elsevier Ltd.

Beckett, J. and Slay, J. (2007) Digital forensics: validation and verification in a dynamic work environment. Proceedings of the 40th Hawaii International Conference on System Sciences, 3-6 January 2007, Hawaii: IEEE Computer Society.

Gilbert, D. (2015) Police force delay in forensic examination revealed by BBC probe. heraldscotland [Online], 9 November. Available from: <http://www.heraldscotland.com/news/> [Accessed 01/11/2016].

International Organization for Standardization (2005) ISO/IEC 17025 General requirements for the competence of testing and calibration laboratories, Second Edition [Online]. Switzerland: ISO. Available from: <http://www.uobaghdad.edu.iq/uploads/pics13/q1684/iso17025_eng.pdf> [Accessed 01/11/2016].

National Institute of Standards and Technology (2015) The CFReDS Project [Online]. Available from: <http://www.cfreds.nist.gov/> [Accessed 01/11/2016].

NordTest (2014) History - Nordtest.info [Online]. Available from: <http://www.nordtest.info/index.php/nordtest/history.html> [Accessed 01/11/2016].

Monday, 31 October 2016

Testing Environment & Setup


Testing Environment 
A test lab was configured for the purposes of this review; the specifications are below:

VMware Workstation 12 virtual machine
Operating system: Windows 7 Professional
Memory: 2 GB
Hard drive: 50 GB
Virtual network adapter with host-only and bridged networking.

This virtual machine is configured with hourly snapshot backups to ensure retention of the lab environment. This will also allow for a system rollback if required. See Figure 1.

Figure 1: Virtual Machine Settings


Additionally, snapshots will also be taken before any important changes.

Software:
None, vanilla install of Windows 7 Professional.

 
The following has been installed to meet the prerequisites of Dumpzilla:
Python 3.5.2 (64-Bit)
GnuWin32 File utility (Required by Magic Module)

Python Modules:
The following additional module is required by Dumpzilla:
Magic Module (https://github.com/ahupp/python-magic)
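
A quick sanity check (not from the Dumpzilla documentation) that the module and the GnuWin32 DLLs it depends on are visible to Python:

# If python-magic or the GnuWin32 file utility's DLLs are missing, this
# raises an ImportError or a DLL load failure rather than printing a
# file-type description.
import magic
print(magic.from_file(r"C:\Windows\notepad.exe"))  # e.g. "PE32 executable ..."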

Additional Software & Configuration: 
Notepad++ - An extended text editor.
Environment Variable - PYTHONIOENCODING=UTF-8
This environment variable is suggested by the Python 3.x Wiki (2012) to prevent unprintable characters from flooding the output of the tool.
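
A quick check of the encoding the interpreter will use for its output stream, assuming the default Windows command prompt (a minimal sketch, not part of the Dumpzilla setup):

# Without PYTHONIOENCODING set, cmd.exe typically gives Python a legacy
# code page (e.g. cp437 or cp850), and non-ASCII characters in Dumpzilla's
# output can raise UnicodeEncodeError. With PYTHONIOENCODING=UTF-8 set
# before Python starts, this prints "UTF-8".
import sys
print(sys.stdout.encoding)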


Software Installation Timeline:
This forensic tool has no installer; it is run directly with the Python 3.x interpreter. As such, there is not much of a timeline of installation events or system changes to show here.
The various DLL files required by the Magic Module and the GnuWin32 File utility were placed in the Windows system32 folder so that they can be found later via the default PATH environment variable.

Some screenshots were taken to visualize the installation process of these pre-requisites.
Figure 2: The Python installation folder was added to the Windows PATH environment variable so that it can be accessed from the command prompt.

Figure 3: Magic Module being installed (python setup.py install)

Figure 4: A screenshot showing the Dumpzilla script being run for the first time, outputting the syntax / help menu.

Once these had been installed, the latest version of the Firefox web browser was installed, several searches were performed and two add-ons were installed. The Procmon tool by Microsoft (Russinovich, 2016) was used to capture all system events while Dumpzilla processed the Firefox profile. This garnered a large number of results, which have been compiled into parsable XML output, available for download and viewing here! (Easton, 2016)
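
The sketch below shows one way such an export could be parsed; the element names follow a typical Procmon XML export and are an assumption here, so verify them against the downloaded file, as the schema can vary between Procmon versions.

# Parse a Procmon XML export and print the file-system operations
# performed by the Python process. Element names (event, Process_Name,
# Operation, Path) follow a typical Procmon export and are assumptions.
import xml.etree.ElementTree as ET

root = ET.parse("Procmon_Dumpzilla.xml").getroot()
for event in root.iter("event"):
    if (event.findtext("Process_Name") or "").lower().startswith("python"):
        print(event.findtext("Operation"), event.findtext("Path"))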

The summary of events from Procmon for the python process is shown below in Figure 5.
Figure 5: Summary of events from the Python process whilst running Dumpzilla



Bibliography
Busindre (2013) Dumpzilla Manual [Online]. Available from: <http://www.dumpzilla.org/Manual_dumpzilla_en.txt> [Accessed 31/10/2016].

Easton, C. (2016) Procmon_Dumpzilla.txt [Online]. Available from: <https://www.dropbox.com/s/fo23x5zces3zpz2/Procmon_Dumpzilla.txt?dl=0> [Accessed 2/11/2016].

Python Wiki (2012) PrintFails, 2012-11-25 11:32:18 [Online]. Available from: <https://wiki.python.org/moin/PrintFails> [Accessed 31/10/2016].

Russinovich, M. (2016) Process Monitor [Online]. Available from: <https://technet.microsoft.com/en-us/sysinternals/processmonitor.aspx> [Accessed 2/11/2016].



Tool Introduction: Dumpzilla

(Busindre, 2013)



"Dumpzilla application is developed in Python 3.x and has as purpose extract all forensic interesting information of Firefox, Iceweasel and Seamonkey browsers to be analyzed. Due to its Python 3.x developement, might not work properly in old Python versions, mainly with certain characters. Works under Unix and Windows 32/64 bits systems. Works in command line interface, so information dumps could be redirected by pipes with tools such as grep, awk, cut, sed… Dumpzilla allows to visualize following sections, search customization and extract certain content." (Busindre, 2014)

The application of choice for this assignment is Dumpzilla. As mentioned above, this Python tool is designed to analyze the user data of the Firefox, Iceweasel and Seamonkey web browsers and then display the retrieved information.

The tool can be downloaded from http://www.dumpzilla.org/; the latest version was released in 2013. However, there is a GitHub repository, not updated since the beginning of 2016, which contains the same file found on the main Dumpzilla website. The author of this tool goes by the name of Busindre; it is an independently authored tool that was not published by a company.

Claims:
Busindre, the developer of Dumpzilla, claims that the application is capable of visualizing the following:
 "- Cookies + DOM Storage (HTML 5).
 - User preferences (Domain permissions, Proxy settings...).
 - Downloads.
 - Web forms (Searches, emails, comments..).
 - Historial.
 - Bookmarks.
 - Cache HTML5 Visualization / Extraction (Offline cache).
 - visited sites "thumbnails" Visualization / Extraction .
 - Addons / Extensions and used paths or urls.
 - Browser saved passwords.
 - SSL Certificates added as a exception.
 - Session data (Webs, reference URLs and text used in forms).
 - Visualize live user surfing, Url used in each tab / window and use of forms. 
Dumpzilla will show SHA256 hash of each file to extract the information and finally a summary with totals.
Sections which date filter is not possible: DOM Storage, Permissions / Preferences, Addons, Extensions, Passwords/Exceptions, Thumbnails and Session"(Busindre, 2013)

Three different browsers are supported by the tool; these are:
- Firefox (Win, Linux, Mac)
- IceWeasel (Win, Linux, Mac)
- SeaMonkey (Win, Linux, Mac)

No information is provided as to whether support for these browsers differs across versions of the tool; however, the way these browsers store information is almost identical: all of them store their data within SQLite databases.
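
As a brief illustration of that shared layout, the snippet below reads browsing history straight from a profile's places.sqlite using Python's standard sqlite3 module. The profile path is a placeholder; moz_places and its columns are Mozilla's standard history schema, but they are worth verifying against the browser version under test.

# Read the ten most recent history entries directly from a Mozilla
# profile's places.sqlite. last_visit_date is stored as microseconds
# since the Unix epoch (PRTime); the profile path is a placeholder.
import datetime
import sqlite3

conn = sqlite3.connect(r"C:\TestProfiles\firefox_profile_1\places.sqlite")
query = ("SELECT url, title, visit_count, last_visit_date FROM moz_places "
         "WHERE last_visit_date IS NOT NULL "
         "ORDER BY last_visit_date DESC LIMIT 10")
for url, title, visits, last in conn.execute(query):
    when = datetime.datetime.utcfromtimestamp(last / 1000000)
    print(when, visits, url)
conn.close()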


Licensing
The licensing of Dumpzilla is defined as GPLv3+ (GNU GPL version 3 or later) on the main website and in the manual (Busindre, 2013). According to the GPLv3 quick guide, this license is a non-restrictive, open-source license which allows anybody to modify, update or share the software. This is also known as copyleft: the software is copyrighted, but instead of using those rights to restrict users, as proprietary software does, the rights are applied to ensure every user has freedom (Smith, 2014).

As this script is free and open source, an analysis of the source code can take place to further investigate how the tool performs its many functions.


Bibliography:
Busindre (2013) Dumpzilla Logo [Online image]. Available from: <http://www.dumpzilla.org/dumpzilla.png> [Accessed 31/10/2016].

Busindre (2013) Dumpzilla Manual [Online]. Available from: <http://www.dumpzilla.org/Manual_dumpzilla_en.txt> [Accessed 31/10/2016].

Smith, B. (2014) A Quick Guide to GPLv3, 11th November [Online]. Available from: <https://www.gnu.org/licenses/quick-guide-gplv3.html> [Accessed 31/10/2016].



Saturday, 29 October 2016

Hello World...

This blog has been created to track the progress and to show the results of a critical review of a browser digital forensics analysis tool, Dumpzilla by Busindre. Available from: http://www.dumpzilla.org/

Over the course of the next two months, this blog will be maintained to show all the results and the 'behind the scenes' of this review.

This blog has been created for Leeds Beckett University by Christopher Easton (c3398352) as part of the Digital Forensics Analysis module of the BSc Hons Computer Forensics & Security final year program.