You are not logged in.
in fontforge ".._ttf is not a known format
The use of an underscore (“_”) concerns me there. In general Linux uses mime (or file) to discover what a file actually is, rather than the Window$-inspired “.tld” convention.
I do not have fontforge installed.
Possibly one way to begin to diagnose your situation is from the command-line. If you have mlocate installed (to quickly locate files) and either FireFox and/or Thunderbird installed then you will be able to locate this specific TTF font:
$ file /usr/lib/firefox-esr/fonts/TwemojiMozilla.ttf
/usr/lib/firefox-esr/fonts/TwemojiMozilla.ttf: TrueType Font data, 17 tables, 1st "COLR", 12 names, Macintosh, type 1 string
That should help to begin to help discover exactly what your system thinks any particular ttf file is.
One final other common source is mscorefonts and/or fonts-liberation:
$ apt search fonts-liberation
Sorting... Done
Full Text Search... Done
fonts-liberation/stable,now 1:1.07.4-11 all [installed]
Fonts with the same metrics as Times, Arial and Courier
fonts-liberation2/stable,now 2.1.3-1 all [installed]
Fonts with the same metrics as Times, Arial and Courier (v2)
ttf-mscorefonts-installer/stable,now 3.8 all [installed]
Installer for Microsoft TrueType core fonts
$ file /usr/share/fonts/truetype/msttcorefonts/Arial.ttf
/usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType Font data, digitally signed, 23 tables, 1st "DSIG", 70 names, Unicode, Typeface \251 The Monotype Corporation plc. Data \251 The Monotype Corporation plc/Type Solution
HTH
Hi amc252.
Every electronic component within your computer has a driver associated with it that allows that component to "play along". Monitors are no different to anything else. Therefore, your first search can involve finding a Chimaera driver for the digital TV.
The miracle of modern electronic equipment was made far easier with the introduction of PnP ("Plug 'n' Play"). That relies on a digital connection & various subsystems, and is what allows something like a monitor to be plugged in, auto-detected by the computer, recognised, the driver auto-located via the internet, auto-downloaded & auto-installed. Now, a USB connection is certainly digital but is more used for modems or HDD & little used for connecting monitors - HDMI connections are the standard for that.
Check your computer: does it have a HDMI port?
Check your digital monitor: does it have a HDMI port?
If the answer to the two questions above are both "yes" then you may be in business very quickly; just make sure that both ports are switched "on" in the setup for both machines, and that your computer is connected via an Ethernet port to the internet before you make the HDMI connection (Ethernet is 'old school' & thus provides few problems cf WLAN).
If the above is not possible & you are determined to go ahead with RCA & such-like then bring your will up-to-date so that afterwards others can realise the reasons for your suicide.
Good luck.
I've got two drives that are USB-connected HDD:
Seagate 2TB portable
(this is formatted using standard Linux utilities to (so-called) FAT64 (HPFS/NTFS/exFAT: max 2TB))
WD (Western Digital) 4TB portable ("My Passport")
(this is native M$ format ("Microsoft basic data") and I cannot find a linux utility that can format and/or repair it to it's current state)
The advantage of the former is that it is ubiquitous across many different OS. As an example, my ancient Samsung TV can read & play movies from (1), but not (2).
The advantage of (2) is that it can store above the 2TB threshold. Also, astonishing that I may need it to. I left the disk in it's supplied format since that can be read by more OS than a native Linux format.
As long as you have a 64-bit cpu then (as I understand it) either disk can be read up to the max of the cpu (which I cannot recall as I sit here, but much, much more than 4TB).
OK. Thanks to admin (although the OP has been further edited).
Well, now that you have edited it, it reads "daedalus" although when first posted it read "deadalus".
I was attempting to both be light-hearted in my response, and also warning other folks not to blindly copy your [ code ]'ed config since it contained a spelling mistake. Also, I personally only ever 'code' actual results without editing them so that others can trust that what I code is actually what I got as a result.
You need to address your remarks on the absence of non-free to fsmithred, since he states that yes, that does work whereas security & updates will not. I have no personal experience to be able to comment.
For those that realise that daedalus is *not* dead, do not copy brday's code.
The Devuan package information is here, and the sole Default configuration shown for daedalus is as follows on that page:
deb http://deb.devuan.org/merged daedalus main
@brday:
If your code was copied from the terminal then you likely have a reason (speling), otherwise non-free & contrib are not available for deadalus, though they may be for daedalus.
a new Chimaera live iso from two days ago
Possibly due to a recent kernel upgrade to 6.1.12-1 (available from backports):
$ uname -a
Linux ng3 6.1.0-0.deb11.5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.12-1~bpo11+1 (2023-03-05) x86_64 GNU/Linux
$ la /boot/initrd* /boot/vmlinuz*
-rw-r--r-- 1 root root 68062772 Mar 28 22:35 /boot/initrd.img-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 68913767 Mar 31 17:35 /boot/initrd.img-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 7730784 Dec 19 14:14 /boot/vmlinuz-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 7866720 Mar 5 18:27 /boot/vmlinuz-6.1.0-0.deb11.5-amd64
My actual update was March 31 (I update daily):
$ la -clt /boot/initrd* /boot/vmlinuz*
-rw-r--r-- 1 root root 68913767 Mar 31 17:35 /boot/initrd.img-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 7866720 Mar 31 17:35 /boot/vmlinuz-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 68062772 Mar 28 22:35 /boot/initrd.img-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 7730784 Jan 4 10:14 /boot/vmlinuz-6.0.0-0.deb11.6-amd64
Hi Ralph
If you have a suggestion(s) I'll investigate it/them. However, I'm used to GitHub now & it is free. Of course, that *is* what was said about MSIE…
It is said that the connection between Rats & Bulldogs lies in the construction of their jawbones: once they bite, neither can release their teeth until the jaws clamp together (due to a ratchet mechanism that joins the upper & lower jawbone). I sympathise with both species; my mind has a similar mechanism.
I finally spotted how to determine the precise length of the embedded URL within each cache (simple) Entry file. It is now possible to collate all urls, data-lengths, etc.. That finally opens the possibility to providing url + file listing, search, selection + individual extraction. However, that will all have to wait for later. For now, it is a simple utility that extracts all cached files (or just one file) into a single directory (listing below).
There is a commented-out print-line almost at the bottom of the script. It can produce a listing of all files for you. The following from a terminal can do that (comment the $DD lines & uncomment the PRINT line first):
~/Personal/.getCC > temp.txt; sort -n temp.txt > mime.txt;
The Cache contains all kinds of corrupted files. There are lines in the script to try to catch those; the notices go to STDERR so it will not corrupt your mime.txt.
Note that there has been a radical reset of almost all code, which creates some disjuncture between current code & earlier BugFix comments. $magic is still in the code but is unused now.
If I cannot stop myself producing a file browser then I shall place the code into GitHub, so that this thread can finally sleep.
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-21: Finally found location of URL-length
# (& thus how to find start of content for all files)
# 2023-03-16: bugfix: Account for Content-Encoding invalidating file-magic
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (FOFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $FOFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# Global CONSTANTS
my $UNBROT= "/usr/bin/brotli -d"; # change to your location
my $DD = "/bin/dd"; # - ditto -
my $GUNZIP= "/bin/gunzip"; # - ditto -
my $TOUCH = "/usr/bin/touch"; # - ditto -
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data"; # Chromium cache folder
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place to extract files to
my $FOFF = 52; # Offset of HTTP-begin from magic-eof (BEG) + LEN
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $MEOF = "\x{d8}\x{41}\x{0d}\x{97}\x{45}\x{6f}\x{fa}\x{f4}"; # Magic End bits (last 8 bytes of every simple cache Entry file data record)
my $MENT = "\x{30}\x{5c}\x{72}\x{a7}\x{1b}\x{6d}\x{fb}\x{fc}"; # Magic Start bits (1st 8 bytes of every simple cache Entry file data record)
my $MURL = "_dk_"; # Magic Start for URL (url follows within cache Entry file data record)
# save algorithm:
# 1) $URL/@URL: find $key_length from header
# 2) $BEG;$END;$LEN: obtain data start+end (from $key_length + $MEOF)
# 3) only save HTTP 200 files ($HTTP)
# 4) $HTTP;$BROTLI;$GZIP;$MIME;$MOD;$TLS: obtain http header fields (from $MEOF + $FOFF)
# 5) extract section $BEG to $END from $IN file into $OUT dir
# 6) $MOD: touch file to conform with http header date
# 7) $BROTLI;$GZIP: decompress gzip/brotli files
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# chromium cache in 2023 is a "simple cache"
# see https://www.chromium.org/developers/design-documents/network-stack/disk-cache/very-simple-backend/
# see https://chromium.googlesource.com/chromium/src/+/HEAD/net/disk_cache/simple/simple_entry_format.h
# see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/browsercache_simple.py
# start-of-record magic-marker == 30 5c 72 a7 1b 6d fb fc
# end-of-record magic-marker == d8 41 0d 97 45 6f fa f4
# (data ends immediately before eor)
# (http header starts 44 bytes after eor, and thus 44+8=52 bytes (\x34) after end-of-data)
# (eor also ends file; 16 bytes then follow to actual end-of-file)
# from FFF: (finally found url-length location)
# cache Entry-file header = struct.Struct('<QLLLL') [little-endian | 8-byte | 4-byte | 4-byte | 4-byte | 4-byte)
# (magic, version, key_length, key_hash, padding) = shformat.unpack(data)
# Parse Chrome Cache File; see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/chromagnon/cacheParse.py
opendir( my $d, "$IN") or die "Cannot open directory $IN"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "be75a13d44e548da_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $LEN = -1; # content-length
my $MAGIC = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $URL = ""; # url within cache Entry file
my @URL = ""; # same url as an array
my $UPOS = ""; # position of url start in Entry file
open my $fh, '<:raw', "$IN/$f" or die "Cannot open file $IN/$f";
# 1 Obtain url length then url
# $key_length starts from byte 24 (\x18), normally begins with an 8-byte string '1/0/_dk_', then stretches to the end of the URL sequence
# the std 8-byte string indicates that two streams (1 + 0) are included within the file
# the request-url sequence is 2 x (normally-identical) base urls then the full request url, each separated by a single space
# data supplied to request url begins immediately after the url, and ends immediately before the $MEOF magic-marker
# http response headers begin 44 bytes after the end of $MEOF, starting with HTTP Status string at $HPOS
# none of the "std" response headers can be *expected* to exist, though most do
# all sorts of stuff exists after initial response header bundle, many of which I do not understand
#+ including content-servers such as amazon, certificates, proxy-servers, others
# this second stream (for std 2-stream files) ends with another $MEOF 16 bytes (\x10) before eof
# eg1: "1/0/_dk_https://bbc.co.uk https://bbc.co.uk https://static.files.bbci.co.uk/core/bundle-service-bar.003e5ecd332a5558802c.js"
# \x18 ^ ^ $UPOS (=32 =\x20) ($key_length =123 =\x7b; note: 24+123 =147 =\x93) \x93 ^
# eg2: "d8410d97 456ffaf4 01000000 24be2bf3 8d010000000000005814000003654702 acd8b17d9a552f00b8a4b27d9a552f00 40040000 HTTP/1.1 200"
# \x220 ^ \x228 ^ \x230 ^ \x240 ^ \x250 ^ ^ $HPOS (=596 =\x254)
my $bytes_read = read $fh, my $bytes, 24;
die "Got $bytes_read but expected 24" unless $bytes_read == 24;
my ($magic, $version, $key_length, $key_hash, $padding) = unpack 'a8 a4 a4 a4 a4', $bytes;
if( unpack('Q', $magic ) ne unpack('Q', $MENT )) {
$magic = unpack('H16', $magic );
$MENT = unpack('H16', $MENT );
die "'$IN/$f' is not a cache entry file, wrong magic number\n (got '$magic' not '$MENT')";
}
seek( $fh, 0, 0 ); # return to start of file
read( $fh, my $cache_buffer, -s "$IN/$f" ); # put whole file in $cache_buffer
close( $fh ) or die "could not close $IN/$f";
# Obtain url
if( $cache_buffer =~ /$MURL/ ) {
$UPOS = $-[0] + 4; # url begins immediately *after* marker string
$key_length=unpack('L', $key_length );
$key_hash =unpack('H16', $key_hash );
$URL = substr( $cache_buffer, $UPOS, $key_length - ($UPOS - 24));
@URL = split(' ', $URL );
}
# 2 Obtain data start+end
$BEG = $key_length + 24;
$END = index( $cache_buffer, "$MEOF", $BEG);
if( $END < 1 ) {
print STDERR "'$IN/$f': error finding end of data at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
} else {
if( $BEG == $END ) { # yes, some pages have Content-Length:0
$LEN = -1;
} else {
$LEN = $END - $BEG;
}
}
# 3 Only extract from HTTP 200|203
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
if( $HPOS != $END + $FOFF) {
print STDERR "'$IN/$f': error finding start of http at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
}
$HVER = "$1"; # http version; always HTTP/1.1 for me
$HSTA = "$2"; # http status; we are only interested in 200 or 203
$HTTP = "HTTP/$HVER $HSTA";
# 4 Obtain http header fields
if( $LEN > 0 ) { # yes, some pages have Content-Length:0
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
if( $1 != $LEN ) {
print STDERR "'$IN/$f': data-length \$LEN=$LEN differs from http Content-Length=$1 at $0 line:". __LINE__ ."\n";
}
if( !$1 ) { print STDERR "'$IN/$f': len=0 at $0 line:". __LINE__ ."\n"; }
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) {# did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
} else { next; } # if( $LEN > 0 )
# easy to mixup mime/media-types & encoding (compression schemes) here
# Content-Type == mime-type refers to the type of file that is being transferred
# Content-Encoding == compression scheme refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped-textfile (eg file.xml.gz)
# gzip encoding (+ brotli) are only support; deflate no support, compress not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $MAGIC = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $MAGIC = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $MAGIC = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $MAGIC = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $MAGIC = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $MAGIC = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $MAGIC = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $MAGIC = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $MAGIC = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $MAGIC = ''; $OFF = 0; $TLS = ''; }
}
# gzip encoding overrides file magic (is earlier in file-stream)
# brotli encoding overrides file magic (there is none)
if( $GZIP ) { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; } elsif( $BROTLI ) { $MAGIC = ""; $OFF = 0; }
if( $MAGIC ) {
if( $MAGIC eq 'GIF87a') { # account for gif + jpeg multiple $MAGIC
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = 'GIF89a';
}
} elsif( $MAGIC eq 'JFIF') {
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = 'Exif';
$TLS = 'jpeg';
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
}
}
}
}
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# 5 print the files out
if( $BEG > -1 && $LEN > -1 ) {
`$DD if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
# 6 set the date to last-modified
if( $MOD ) { `$TOUCH "$OUT/$f$TLS" -d "$MOD"`; }
# 7 decompress if necessary
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { # decompressed; .gz/.br suffix removed
`$GUNZIP "$OUT/$f$TLS"`; # original file removed; date retained
} else {
`$UNBROT -j "$OUT/$f$TLS"`;
}
}
} # lots of Content-Length:0 files
# print "$MIME; $URL[0]; $f; \$key_length=$key_length; \$key_hash=$key_hash; \$BEG=$BEG; \$END=$END; \$LEN=$LEN; \$TLS=$TLS \n";
} # if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) # other pages mostly HTTP 204 No Content
}
Thursday update: small improvement to comments
This should be the last code update for now (below).
It is tested as well as I can manage in a short time. ~64% of cache are HTTP 200, with most of the rest being 204 No Content. A number of the 200 OK files are also Content-Length:0 (js files for search-results in many cases). The script is written so that no attempt is made to extract no-content files.
The final search was for Content-Encoding: (compression before delivery). My main source was latest Apache modules and that showed that only gzip & brotli are currently used. The statement was that "deflate is not supported", whilst compress was not even mentioned.
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
# easy to mixup mime/media-types & encoding (compression schemes) here
# Content-Type == mime-type refers to the type of file that is being transferred
# Content-Encoding == compression scheme refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped-textfile (eg file.xml.gz)
# gzip encoding (+ brotli) are only support; deflate no support, compress not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# fix $BEG + $LEN
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$BEG=$BEG; \$LEN=$LEN; \$END=$END; \$MOD=$MOD; \n";
} # lots of Content-Length:0 files
} # if( $END > -1 ) # other pages mostly HTTP 204 No Content
}
getCC, the ChromeCache decrypt script:
Added the ability to decode any HTTP version + status 200 or 203 files.
Testing results:
$ ~/Personal/.getCC
image/webp: 000420fedcafe6ff_0; $TLS=.webp; $HPOS=5185; $END=5185; $HVER=1.1; $HSTA=200; $HTTP=HTTP/1.1 200; $MOD=Fri, 03 Mar 2023 20:27:56 GMT;
$ cd ~/Personal/ChromeCache/Files
$ time ~/Personal/.getCC
real 1m28.431s
user 0m53.738s
sys 0m34.723s
I had noticed that all HTTP/1.1 server responses were preceded by two null bytes in the cache files:
00001430 fc 9b 54 2f 00 a4 ec 1b fc 9b 54 2f 00 57 02 00 |..T/......T/.W..|
00001440 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
00002800 02 72 ab 33 c9 16 55 2f 00 8d 9c 35 c9 16 55 2f |.r.3..U/...5..U/|
00002810 00 56 02 00 00 48 54 54 50 2f 31 2e 31 20 32 30 |.V...HTTP/1.1 20|
00003740 ea 4c 55 2f 00 48 7c d5 ea 4c 55 2f 00 55 02 00 |.LU/.H|..LU/.U..|
00003750 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
I used that fact to guarantee that the HTTP string that was being indexed for was the correct one + updated $HTTP to contain the correct strings.
Here is the latest code:
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; }
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; }
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; }
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; }
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; }
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; }
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# # trying to decode where each file begins (determine common offsets)
# if( $LEN < 1 && $BEG > -1 ) { }
# if( $BEG > -1 && $LEN > -1 ) {
# # at this point $BEG - $OFF == start of magic
# # $END == start of $HTTP
# # $LEN == length of content from header
# my $mbeg = $BEG - $OFF; my $mhex = sprintf("0x%X", $mbeg);
# my $hbeg = $END - $LEN; my $hhex = sprintf("0x%X", $hbeg);
# my $diff = $hbeg - $mbeg;
# my $dhex = sprintf("0x%X", $diff);
# print "$MIME: $f; \$END/\$LEN=$END / $LEN; \$mbeg=$mbeg / $mhex; \$hbeg=$hbeg / $hhex; \$diff=$diff / $dhex; \n";
# }
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text, xml + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # account for different compression-encodings
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$HPOS=$HPOS; \$END=$END; \$HVER=$HVER; \$HSTA=$HSTA; \$HTTP=$HTTP; \$MOD=$MOD; \n";
}
} # if( $END > -1 ) # other pages are most likely to be HTTP 204 No Content
}
getCC, the ChromeCache decrypt script:
I needed to know whether it needed to activate on other HTTP Status Codes than just 200, so did some calculations:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/* | wc -l
22525
$ strings ~/.cache/chromium/Default/Cache/Cache_Data/* | fgrep "HTTP/1.1" | sort | uniq -c
strings: Warning: '/home/alexk/.cache/chromium/Default/Cache/Cache_Data/index-dir' is a directory
14055 HTTP/1.1 200
1 HTTP/1.1 200 200
564 HTTP/1.1 200 OK
7490 HTTP/1.1 204
45 HTTP/1.1 204 No Content
5 HTTP/1.1 206
42 HTTP/1.1 301
15 HTTP/1.1 301 Moved Permanently
236 HTTP/1.1 302
1 HTTP/1.1 302 Found
1 HTTP/1.1 303 See Other
1 HTTP/1.1 307
2 HTTP/1.1 400
2 HTTP/1.1 403
84 HTTP/1.1 404
5 HTTP/1.1 404 Not Found
1 HTTP/1.1 410
11 HTTP/1.1 500
Sums:
65% 14,620 HTTP 200 OK
33% 7,535 HTTP 204 No Content
0% 5 HTTP 206 Partial Content
0% 57 HTTP 301 Moved Permanently
1% 237 HTTP 302 Found
0% 1 HTTP 303 See Other
0% 1 HTTP 307 Temporary Redirect
0% 2 HTTP 400 Bad Request
0% 2 HTTP 403 Forbidden
0% 89 HTTP 404 Not Found
0% 1 HTTP 410 Gone
0% 11 HTTP 500 Internal Server Error
Ah well, that's ok then. The script can stick with Status 200, no problem. There is a small chance that 203 Non-Authoritative Information may be involved (responses from a proxy, although never features in my accesses), but I'm happy to consider the chance of that being remote.
All of the 22 thousand files in the current cache were from servers reporting themselves to be version 1.1. HTTP/0.9 & HTTP/1.0 are now considered obsolete (I bet that some still exist). Both HTTP/2 & HTTP/3 are now supposed to be a thing, although no server reported either version in my accesses. However, I obviously need to modify the PERL regex to accept such possibilities, and that will come with the next post.
Explanation + info on setting up getCC, the ChromeCache decrypt script:
Install PERL if necessary
(makes use of switch which was installed by default in version 5.10, but also available from CPAN)
Place the script where you will
Make executable
(chmod +x; chmod 700)
Set the values of $IN & $OUT
(lines 41 + 42; be careful to check permissions, particularly for $OUT)
Run the command from a command-prompt
(there are often 10s of thousands of files decrypted, so there is zero terminal output if no errors)
Install brotli
(sudo apt install brotli)
(this is to facilitate viewing text files)
(I run Chimaera & it is available as standard)
All lines beginning with a # are comments.
Lines 137 - 148 are all commented. It was exploratory code to determine if there was a common offset to the beginning of the cached file. There *was* indeed such an offset ($diff). This was important as not all files contained magic, and the start-of-file varied in ways that I could not decrypt.
The Chrome CacheData dir contains data-files which each contain the data + http-header from a single HTTP file delivered from a server during a Chrome/Chromium browser session.
HTTP files consist of a HTTP header + data.
The CacheData files have the file-data near the top of the file, then the HTTP header & then a bunch of other stuff. Here is a *very* small gif-file to make the point (look for 'GIF89a', the gif magic-marker, at ca in the hex-dump below). Notice how the gif is just 43 bytes, yet the cache-file that contains it is 4k bytes:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
-rw------- 1 alexk alexk 4389 Feb 24 02:31 /home/alexk/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
$ la ~/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
-rw-r--r-- 1 alexk alexk 43 Feb 24 02:31 /home/alexk/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
$ hexdump ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0 -C | head -31
00000000 30 5c 72 a7 1b 6d fb fc 05 00 00 00 b2 00 00 00 |0\r..m..........|
00000010 23 84 68 3b 00 00 00 00 31 2f 30 2f 5f 64 6b 5f |#.h;....1/0/_dk_|
00000020 68 74 74 70 73 3a 2f 2f 61 6d 61 7a 6f 6e 2e 63 |https://amazon.c|
00000030 6f 2e 75 6b 20 68 74 74 70 73 3a 2f 2f 61 6d 61 |o.uk https://ama|
00000040 7a 6f 6e 2e 63 6f 2e 75 6b 20 68 74 74 70 73 3a |zon.co.uk https:|
00000050 2f 2f 61 61 78 2d 65 75 2e 61 6d 61 7a 6f 6e 2e |//aax-eu.amazon.|
00000060 63 6f 2e 75 6b 2f 65 2f 6c 6f 69 2f 69 6d 70 3f |co.uk/e/loi/imp?|
00000070 62 3d 4a 48 4f 6b 41 4c 63 55 4e 66 59 35 4f 61 |b=JHOkALcUNfY5Oa|
00000080 54 5f 5a 31 61 39 4c 32 67 41 41 41 47 47 67 55 |T_Z1a9L2gAAAGGgU|
00000090 4b 4d 77 67 4d 41 41 41 48 32 41 51 42 4f 4c 30 |KMwgMAAAH2AQBOL0|
000000a0 45 67 49 43 41 67 49 43 41 67 49 43 41 67 49 43 |EgICAgICAgICAgIC|
000000b0 42 4f 4c 30 45 67 49 43 41 67 49 43 41 67 49 43 |BOL0EgICAgICAgIC|
000000c0 41 67 49 43 41 2d 55 71 38 45 47 49 46 38 39 61 |AgICA-Uq8EGIF89a|
000000d0 01 00 01 00 f0 00 00 00 00 00 00 00 00 21 f9 04 |.............!..|
000000e0 01 00 00 00 00 2c 00 00 00 00 01 00 01 00 00 02 |.....,..........|
000000f0 02 44 01 00 3b d8 41 0d 97 45 6f fa f4 01 00 00 |.D..;.A..Eo.....|
00000100 00 ab bd 8a cb 2b 00 00 00 00 00 00 00 dc 0f 00 |.....+..........|
00000110 00 03 0d 45 02 86 fc 8d 34 ff 53 2f 00 e7 d9 8e |...E....4.S/....|
00000120 34 ff 53 2f 00 bd 00 00 00 48 54 54 50 2f 31 2e |4.S/.....HTTP/1.|
00000130 31 20 32 30 30 20 4f 4b 00 53 65 72 76 65 72 3a |1 200 OK.Server:|
00000140 20 53 65 72 76 65 72 00 44 61 74 65 3a 20 46 72 | Server.Date: Fr|
00000150 69 2c 20 32 34 20 46 65 62 20 32 30 32 33 20 30 |i, 24 Feb 2023 0|
00000160 32 3a 33 31 3a 30 38 20 47 4d 54 00 43 6f 6e 74 |2:31:08 GMT.Cont|
00000170 65 6e 74 2d 54 79 70 65 3a 20 69 6d 61 67 65 2f |ent-Type: image/|
00000180 67 69 66 00 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 |gif.Content-Leng|
00000190 74 68 3a 20 34 33 00 78 2d 61 6d 7a 2d 72 69 64 |th: 43.x-amz-rid|
000001a0 3a 20 42 37 35 4d 32 37 57 4e 38 38 32 54 59 4d |: B75M27WN882TYM|
000001b0 45 56 32 4e 46 48 00 56 61 72 79 3a 20 43 6f 6e |EV2NFH.Vary: Con|
000001c0 74 65 6e 74 2d 54 79 70 65 2c 41 63 63 65 70 74 |tent-Type,Accept|
000001d0 2d 45 6e 63 6f 64 69 6e 67 2c 55 73 65 72 2d 41 |-Encoding,User-A|
000001e0 67 65 6e 74 00 00 00 00 00 03 00 00 00 0d 07 00 |gent............|
$ hexdump fff822c2bb27d828_0.gif -C
00000000 47 49 46 38 39 61 01 00 01 00 f0 00 00 00 00 00 |GIF89a..........|
00000010 00 00 00 21 f9 04 01 00 00 00 00 2c 00 00 00 00 |...!.......,....|
00000020 01 00 01 00 00 02 02 44 01 00 3b |.......D..;|
0000002b
So, in the Cache file:
hex CA: filedata begins ('GIF89a')
hex 129: http header begins ('HTTP/1.1 200 OK')
Amongst other things, the HTTP header can give the Type of file, the length of file, delivery Date & Encoding (type of compression).
Every sensible Internet Server compresses most of the files that it delivers, and particularly text-files. atm getCC only detects gzip & brotli compression:-
gzip: shown as 'file.txt.gz'
brotli: shown as 'file.txt.br'
If viewed from a terminal with less file.txt.gz the gzip-file will be auto-decompressed & shown as plain text within the less-screen. That will NOT work the same for Brotli files unless you take the following steps:-
My version of BASH uses ~/.bashrc as a shell-script to initialise it. The following code within ~/.bashrc enables less to auto-decode a wealth of different compressions (though not Brotli) in conjunction with LESSPIPE:-
# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"
Take the following steps to add Brotli to all the other auto-decoded compressions:
Install Brotli
Save the script below as "~/.lessfilter"
Make it executable
#!/bin/sh
# ~/.lessfilter
# 2023-03-11 add brotli to all other encodings for less
case "$1" in
*.br)
brotli -dc "$1"
;;
*)
# We don't handle this format.
exit 1
esac
# No further processing by lesspipe necessary
exit 0
I'm setting this thread to "SOLVED" now.
WINE has been fixed by removing it, and the script I added in the previous post now works fully to extract all of the files within CacheData. The one thing that is missing is a description of the script + how to setup less to auto-show the compressed Brotli files, so I'll put that in the next post.
I'm simply astonished that so few people (seemingly just one) have produced a Chrome cache viewer.
There *is* another on Github. It was a little heavyweight for me, so I spent a week learning PERL whilst writing a script to extract all the Chrome-cached files into a directory. ~100 lines. Below for your elucidation:
4pm update: +20 lines to fix ~2000 bad files
5pm update: added Brotli compression encoding; still not sure if that works ok
Mar 8 update: Brotli now works; ~150 active lines (+ ~10 debug lines commented out)
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through the Chromium/Chrome 'Cache_Data/'
#+extract all http-delivered files stored within those data-files
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "0f0ce6df8548452e_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
$END = index( $cache_buffer, "$HTTP"); # Check for presence of HTTP 200 OK header
if( $END > -1 ) { #+(and therefore std header fields)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; }
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; }
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; }
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; }
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; }
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; }
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# # trying to decode where each file begins (determine common offsets)
# if( $LEN < 1 && $BEG > -1 ) { }
# if( $BEG > -1 && $LEN > -1 ) {
# # at this point $BEG - $OFF == start of magic
# # $END == start of $HTTP
# # $LEN == length of content from header
# my $mbeg = $BEG - $OFF; my $mhex = sprintf("0x%X", $mbeg);
# my $hbeg = $END - $LEN; my $hhex = sprintf("0x%X", $hbeg);
# my $diff = $hbeg - $mbeg;
# my $dhex = sprintf("0x%X", $diff);
# print "$MIME: $f; \$END/\$LEN=$END / $LEN; \$mbeg=$mbeg / $mhex; \$hbeg=$hbeg / $hhex; \$diff=$diff / $dhex; \n";
# }
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text, xml + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # account for different compression-encodings
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$END=$END; \$BEG=$BEG; \$LEN=$LEN; \$MOD=$MOD; \n";
}
} # if( $END > -1 ) # other pages are most likely to be HTTP 204 No Content
}
Mark Hindley in the bug report was able to get to the gates of success in installing wine32 on a vanilla chimaera, and has therefore fingered backports as the reason for the error on my system. That log-file reported a terrifyingly-large number of i386 packages to install as helpers to wine32.
I would like to give public thanks to Mark for his help so far, but I'm going to remove all traces of Wine & the i386 architecture from my system.
BeginnerForever at this StackOverflow page has a PHP script which, after just a couple of tweaks, will extract all JPEG + PNG files from the Chromium/Chrome dir to a dir. Fast & very impressive
There now follows my small update to that script. I've added a section for GIF files (those files get extracted, but do not work as image files):
#!/usr/bin/php
<?php
// getCC (get Chrome Cache)
// suggestion: save as ~/.getCC; chmod +x; chmod 700
$dir = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; // Chromium cache folder
$ppl = "/home/alexk/Personal/ChromeCache/Files/";               // Place for extracted files
// $END = "HTTP/1.1 200 OK"; // Search in cache-file (works, yet not in some files)
$END = "HTTP/1.1 200";       // Search in cache-file (works, and IS in all files)
$FTL = "";                   // Filetype lowercase
$FTU = "";                   // Filetype uppercase
$MOFF = 0;                   // Offset of magic from file beginning
$list = scandir( $dir );
foreach( $list as $filename ) {
    if( is_file( $dir.$filename )) {
        $content = file_get_contents( $dir.$filename );
        if( strstr( $content, 'JFIF' )) {
            $FTL = "jpg";
            $FTU = "JPEG";
            $MOFF = 6; // "JFIF" sits 6 bytes into a JPEG header
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "JFIF", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            // fixed: substr()'s 3rd argument is a LENGTH, so it must be
            // $end - $start, not $end - $MOFF (the cause of the bad lengths below)
            $content = substr( $content, $start, $end - $start );
            $length = strlen( $content );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $content );
            // echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        elseif( strstr( $content, "\211PNG" )) {
            $FTL = "png";
            $FTU = "PNG";
            $MOFF = 0; // search for the full \x89PNG magic directly
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "\211PNG", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            $content = substr( $content, $start, $end - $start ); // fixed, as above
            $length = strlen( $content );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $content );
            // echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        elseif( strstr( $content, "GIF89a" )) {
            $FTL = "gif";
            $FTU = "GIF";
            $MOFF = 0;
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "GIF89a", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            $newc = substr( $content, $start, $end - $start ); // fixed: length was $end
            $length = strlen( $newc );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $newc );
            echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        else {
            echo( $filename." UNKNOWN \n");
        }
    }
}
?>
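To use it: save, make it executable, run, and sanity-check the results with file (this assumes the PHP command-line interpreter is installed; on Debian-family systems the package is php-cli):
$ chmod 700 ~/.getCC
$ ~/.getCC
$ file ~/Personal/ChromeCache/Files/* | head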
There were a couple of strange occurrences that took some debugging, hence the extra echo lines in the script. I'm still going to rewrite the script in BASH which, hopefully, will be more reliable. If so, I will not need WINE (hooray!).
Line 8 of the original script has $END = "HTTP/1.1 200 OK"; & each section has $end = strpos( $content, $END, 0 );. I discovered that some cache files do not contain the "OK", yet the string was found by strpos (though not by grep) & the image correctly extracted; I cannot explain that, so $END is now the bare "HTTP/1.1 200", which IS in all the files.
The file content is concatenated within the Cache_Data file immediately before the $END string. In the original script none of the extracted files was the length it should have been: substr()'s third argument is a length, not an end position, so it must be $end - $start rather than $end - $MOFF. Taking the first PNG below as a worked example: start = 150 and end = 14212, so the payload should be 14212 - 150 = 14062 bytes, yet 14211 (i.e. end - MOFF) bytes were written. JPEG + PNG viewers did not seem to mind the trailing rubbish, but GIF files refused to play until the length was corrected.
Here is the very end of the text output from the original (pre-fix) version, to give some sense of the difficulty:
ffa41e3d8b4e0cf9_0 PNG
start : 150
end : 14212
length: 14211 (s/b 14062)
ffa78518232ea9f2_0 PNG
start : 170
end : 1417
length: 1416 (s/b 1247)
ffad48f3aefb6cd7_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/ffad48f3aefb6cd7_0.gif
start : 1089
end : 1183
length: 1183 (s/b 94)
ffba1f5387a04a08_0 JPEG
start : 166
end : 972
length: 966 (s/b 806)
ffbf8448256da635_0 UNKNOWN
ffc1ebd8d62551b6_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/ffc1ebd8d62551b6_0.gif
start : 193
end : 288
length: 288 (s/b 95)
ffc2019c23af2000_0 UNKNOWN
ffc239239bc4e4a9_0 JPEG
start : 195
end : 1920
length: 1914 (s/b 1725)
ffc57d9b41cebadd_0 UNKNOWN
ffcbd7258d6a0aea_0 UNKNOWN
ffda4d6b8e2937fd_0 UNKNOWN
ffdac4bf770719a1_0 UNKNOWN
ffde560cb8ad0eaf_0 UNKNOWN
fff42f6de6d58540_0 UNKNOWN
fff530252c03d813_0 UNKNOWN
fff55afc8b58e35f_0 UNKNOWN
fff822c2bb27d828_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
start : 202
end : 297
length: 297 (s/b 95)
index UNKNOWN
PHP seems to be unworkable now, so I'm going to switch to BASH.
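For anyone wanting a head start, here is a minimal sketch of what such a BASH rewrite could look like: grep -abo locates the byte offset of the image magic and of the stored "HTTP/1.1 200" header, and dd carves the bytes between them. The paths are the ones from the PHP script above; it assumes GNU grep (for -P) and only knows the three image types — a sketch, not the finished article:
#!/bin/bash
# getCC.sh -- BASH sketch of the PHP extractor above
SRC="$HOME/.cache/chromium/Default/Cache/Cache_Data"
DST="$HOME/Personal/ChromeCache/Files"
mkdir -p "$DST"
for f in "$SRC"/*_0; do
    b=$(basename "$f")
    # byte offset of the stored response headers == end of the payload (near enough)
    end=$(grep -abo 'HTTP/1.1 200' "$f" | head -n1 | cut -d: -f1)
    [ -z "$end" ] && { echo "$b NO-HEADER"; continue; }
    # probe for each image magic in turn, remembering the matching extension
    ext=''
    for probe in 'jpg:\xff\xd8\xff' 'png:\x89PNG' 'gif:GIF8[79]a'; do
        beg=$(grep -aboP "${probe#*:}" "$f" | head -n1 | cut -d: -f1)
        [ -n "$beg" ] && { ext=${probe%%:*}; break; }
    done
    [ -z "$ext" ] && { echo "$b UNKNOWN"; continue; }
    # carve bytes [beg, end) out of the cache file
    dd if="$f" of="$DST/$b.$ext" skip="$beg" count=$((end - beg)) \
       iflag=skip_bytes,count_bytes status=none
    echo "$b -> $b.$ext ($((end - beg)) bytes)"
done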
Obviously, somewhere in the Chromium code will be routines for accessing, exploring & extracting these cached files. I'm simply astonished that so few people (seemingly just one) have produced a Chrome cache viewer.
Have you looked at any of the files to determine contents?
Yes. It's not easy, since they all give the same enigmatic response to file:
~/.cache/chromium/Default/Cache/Cache_Data$ file -z 00037beb6d874770_0
00037beb6d874770_0: data
Some refer to css files, some to image files, and so on. I've tried to find html files, but that is difficult, as almost all files contain the text 'text/html' without actually containing any html. Annoyingly, at first sight none seemed to contain any actual css, png or html content, though they clearly do contain response headers. I'll try to illustrate:
$ hexdump 47c5717d1de790a5_0 -C
00000000 30 5c 72 a7 1b 6d fb fc 05 00 00 00 78 00 00 00 |0\r..m......x...|
00000010 31 0b 69 2c 00 00 00 00 31 2f 30 2f 5f 64 6b 5f |1.i,....1/0/_dk_|
00000020 68 74 74 70 73 3a 2f 2f 79 6f 75 74 75 62 65 2e |https://youtube.|
00000030 63 6f 6d 20 68 74 74 70 73 3a 2f 2f 79 6f 75 74 |com https://yout|
00000040 75 62 65 2e 63 6f 6d 20 68 74 74 70 73 3a 2f 2f |ube.com https://|
00000050 77 77 77 2e 67 73 74 61 74 69 63 2e 63 6f 6d 2f |www.gstatic.com/|
00000060 79 6f 75 74 75 62 65 2f 69 6d 67 2f 62 72 61 6e |youtube/img/bran|
00000070 64 69 6e 67 2f 66 61 76 69 63 6f 6e 2f 66 61 76 |ding/favicon/fav|
00000080 69 63 6f 6e 5f 31 34 34 78 31 34 34 2e 70 6e 67 |icon_144x144.png|
00000090 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
000000a0 00 00 00 90 00 00 00 90 08 03 00 00 00 d0 98 12 |................|
000000b0 8a 00 00 00 63 50 4c 54 45 00 00 00 ff 00 00 ff |....cPLTE.......|
000000c0 00 00 ff 00 00 ff 00 00 ff 00 00 ff 00 00 ff 00 |................|
# (snip)
00000390 00 fc 86 92 50 21 39 2f 00 e8 02 00 00 48 54 54 |....P!9/.....HTT|
000003a0 50 2f 31 2e 31 20 32 30 30 00 61 63 63 65 70 74 |P/1.1 200.accept|
000003b0 2d 72 61 6e 67 65 73 3a 62 79 74 65 73 00 63 72 |-ranges:bytes.cr|
000003c0 6f 73 73 2d 6f 72 69 67 69 6e 2d 72 65 73 6f 75 |oss-origin-resou|
000003d0 72 63 65 2d 70 6f 6c 69 63 79 3a 63 72 6f 73 73 |rce-policy:cross|
000003e0 2d 6f 72 69 67 69 6e 00 63 72 6f 73 73 2d 6f 72 |-origin.cross-or|
000003f0 69 67 69 6e 2d 6f 70 65 6e 65 72 2d 70 6f 6c 69 |igin-opener-poli|
00000400 63 79 2d 72 65 70 6f 72 74 2d 6f 6e 6c 79 3a 73 |cy-report-only:s|
00000410 61 6d 65 2d 6f 72 69 67 69 6e 3b 20 72 65 70 6f |ame-origin; repo|
00000420 72 74 2d 74 6f 3d 22 73 74 61 74 69 63 2d 6f 6e |rt-to="static-on|
00000430 2d 62 69 67 74 61 62 6c 65 22 00 72 65 70 6f 72 |-bigtable".repor|
00000440 74 2d 74 6f 3a 7b 22 67 72 6f 75 70 22 3a 22 73 |t-to:{"group":"s|
00000450 74 61 74 69 63 2d 6f 6e 2d 62 69 67 74 61 62 6c |tatic-on-bigtabl|
00000460 65 22 2c 22 6d 61 78 5f 61 67 65 22 3a 32 35 39 |e","max_age":259|
00000470 32 30 30 30 2c 22 65 6e 64 70 6f 69 6e 74 73 22 |2000,"endpoints"|
00000480 3a 5b 7b 22 75 72 6c 22 3a 22 68 74 74 70 73 3a |:[{"url":"https:|
00000490 2f 2f 63 73 70 2e 77 69 74 68 67 6f 6f 67 6c 65 |//csp.withgoogle|
000004a0 2e 63 6f 6d 2f 63 73 70 2f 72 65 70 6f 72 74 2d |.com/csp/report-|
000004b0 74 6f 2f 73 74 61 74 69 63 2d 6f 6e 2d 62 69 67 |to/static-on-big|
000004c0 74 61 62 6c 65 22 7d 5d 7d 00 63 6f 6e 74 65 6e |table"}]}.conten|
000004d0 74 2d 6c 65 6e 67 74 68 3a 37 32 39 00 78 2d 63 |t-length:729.x-c|
000004e0 6f 6e 74 65 6e 74 2d 74 79 70 65 2d 6f 70 74 69 |ontent-type-opti|
000004f0 6f 6e 73 3a 6e 6f 73 6e 69 66 66 00 73 65 72 76 |ons:nosniff.serv|
00000500 65 72 3a 73 66 66 65 00 78 2d 78 73 73 2d 70 72 |er:sffe.x-xss-pr|
00000510 6f 74 65 63 74 69 6f 6e 3a 30 00 64 61 74 65 3a |otection:0.date:|
00000520 53 75 6e 2c 20 31 33 20 4d 61 72 20 32 30 32 32 |Sun, 13 Mar 2022|
00000530 20 31 36 3a 34 32 3a 33 39 20 47 4d 54 00 65 78 | 16:42:39 GMT.ex|
00000540 70 69 72 65 73 3a 4d 6f 6e 2c 20 31 33 20 4d 61 |pires:Mon, 13 Ma|
00000550 72 20 32 30 32 33 20 31 36 3a 34 32 3a 33 39 20 |r 2023 16:42:39 |
00000560 47 4d 54 00 63 61 63 68 65 2d 63 6f 6e 74 72 6f |GMT.cache-contro|
00000570 6c 3a 70 75 62 6c 69 63 2c 20 6d 61 78 2d 61 67 |l:public, max-ag|
00000580 65 3d 33 31 35 33 36 30 30 30 00 61 67 65 3a 34 |e=31536000.age:4|
00000590 37 35 37 39 34 00 6c 61 73 74 2d 6d 6f 64 69 66 |75794.last-modif|
000005a0 69 65 64 3a 54 68 75 2c 20 30 33 20 4f 63 74 20 |ied:Thu, 03 Oct |
000005b0 32 30 31 39 20 31 30 3a 31 35 3a 30 30 20 47 4d |2019 10:15:00 GM|
000005c0 54 00 63 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 69 |T.content-type:i|
000005d0 6d 61 67 65 2f 70 6e 67 00 61 6c 74 2d 73 76 63 |mage/png.alt-svc|
000005e0 3a 68 33 3d 22 3a 34 34 33 22 3b 20 6d 61 3d 32 |:h3=":443"; ma=2|
000005f0 35 39 32 30 30 30 2c 68 33 2d 32 39 3d 22 3a 34 |592000,h3-29=":4|
00000600 34 33 22 3b 20 6d 61 3d 32 35 39 32 30 30 30 2c |43"; ma=2592000,|
00000610 68 33 2d 51 30 35 30 3d 22 3a 34 34 33 22 3b 20 |h3-Q050=":443"; |
00000620 6d 61 3d 32 35 39 32 30 30 30 2c 68 33 2d 51 30 |ma=2592000,h3-Q0|
00000630 34 36 3d 22 3a 34 34 33 22 3b 20 6d 61 3d 32 35 |46=":443"; ma=25|
00000640 39 32 30 30 30 2c 68 33 2d 51 30 34 33 3d 22 3a |92000,h3-Q043=":|
00000650 34 34 33 22 3b 20 6d 61 3d 32 35 39 32 30 30 30 |443"; ma=2592000|
00000660 2c 71 75 69 63 3d 22 3a 34 34 33 22 3b 20 6d 61 |,quic=":443"; ma|
00000670 3d 32 35 39 32 30 30 30 3b 20 76 3d 22 34 36 2c |=2592000; v="46,|
00000680 34 33 22 00 00 03 00 00 00 c0 04 00 00 30 82 04 |43"..........0..|
00000690 bc 30 82 03 a4 a0 03 02 01 02 02 11 00 89 50 eb |.0............P.|
On closer inspection, though, the hexdump gives the game away: the PNG payload begins at offset 0x90 (the \x89PNG magic, immediately after the cached URL), and the stored HTTP response headers follow it at 0x39d. So the actual content is inside the cache file after all, sandwiched between the cache key and the headers (larger bodies may well live elsewhere in the labyrinth of dirs, but this favicon is stored inline).
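Those two offsets can be confirmed without a hex editor: grep's -a (treat binary as text), -b (print byte offset) and -o (print only the match) options do the job, with -P allowing the \x89 byte to be written as a hex escape:
$ grep -abo 'HTTP/1.1 200' 47c5717d1de790a5_0 | cut -d: -f1   # should give 925 (= 0x39d)
$ grep -aboP '\x89PNG' 47c5717d1de790a5_0 | cut -d: -f1       # should give 144 (= 0x90)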
I *did* originally grep the files for '3040s' (not 1100) & found it in a number of files, with 4 different dates if I remember correctly. My bash history only records the 'Feb 18' checks (3 different files), due to the specific checks made at the time. However, today *none* of the files contain '3040s', which is why in the abbreviated results below (there are *far* more records after the last one shown) I used 'amazon' as the search term.
I'd suggest it's very unlikely that Amazon caches search results for a week
$ cd ~/.cache/chromium/Default/Cache/Cache_Data
$ fgrep amazon * -l > amazon.txt
grep: index-dir: Is a directory
$ wc -l amazon.txt
588 amazon.txt
$ fgrep amazon * -l | xargs ls -ltr
grep: index-dir: Is a directory
-rw------- 1 alexk alexk 6640 Jan 14 10:40 a25a4684dc578add_0
-rw------- 1 alexk alexk 6283 Feb 16 11:25 53cb04645ec61dbe_0
-rw------- 1 alexk alexk 15666 Feb 17 23:58 e8f89e2a5b7a01f1_0
-rw------- 1 alexk alexk 13867 Feb 17 23:58 886a5cd11ba0631f_0
-rw------- 1 alexk alexk 5874 Feb 17 23:58 316a7542b7befa08_0
-rw------- 1 alexk alexk 8924 Feb 17 23:58 0178e5420f91ea0d_0
-rw------- 1 alexk alexk 7659 Feb 17 23:58 ad11df88c0edb21b_0
-rw------- 1 alexk alexk 10943 Feb 17 23:58 548924d727f4b76c_0
-rw------- 1 alexk alexk 6091 Feb 17 23:58 3f2cf4a8f4ed3da1_0
-rw------- 1 alexk alexk 11344 Feb 17 23:58 5bee7426f918804c_0
-rw------- 1 alexk alexk 7267 Feb 17 23:58 5602f4c5219938bd_0
-rw------- 1 alexk alexk 14755 Feb 17 23:58 847822115a578d5c_0
-rw------- 1 alexk alexk 5492 Feb 17 23:58 421a319a89da0977_0
-rw------- 1 alexk alexk 7717 Feb 17 23:58 21287a7168c435cf_0
-rw------- 1 alexk alexk 6705 Feb 17 23:58 467f6b3bbdaba59b_0
-rw------- 1 alexk alexk 11348 Feb 18 00:09 fe2bc889ad53c7e8_0
-rw------- 1 alexk alexk 12090 Feb 18 00:09 b9868882a8c66b57_0
-rw------- 1 alexk alexk 5113 Feb 18 00:21 705dafad26790491_0
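Incidentally, the "Is a directory" complaint about index-dir can be silenced with grep's -d skip option:
$ fgrep -d skip -l amazon * > amazon.txt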
I'd probably try the python one first.
python package to retrieve (almost) any browser's history on (almost) any platform
I'm interested in neither the History (which I can obtain with a simple Ctrl-H) nor the Bookmarks, so I fail to understand the point of installing that. I want to be able to view the historic pages in a browser, not the History.
Same with the askubuntu.com question:
Is it possible to view Google Chrome bookmarks and history from the terminal
The History is a binary file in SQLite format 3
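(For anyone who *does* want the History: since it really is SQLite, the sqlite3 CLI can read it. The database is locked while Chromium is running, so copy it first; the urls table and its columns below are from Chromium's current schema, which may change between versions.)
$ cp ~/.config/chromium/Default/History /tmp/History
$ sqlite3 /tmp/History 'SELECT url, title FROM urls ORDER BY last_visit_time DESC LIMIT 5;'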
Thanks for trying, Andy, but you appear to have misunderstood my query.
Have you thought about using a different browser to look at the cache?
The Chromium cache is an undocumented binary mess of interconnected directories (specialised JSON, for the ones I've looked at), with not a single html or css file within them. Can you suggest a browser that *can* view them?