I'm setting this thread to "SOLVED" now.
WINE has been fixed by removing it, and the script I added in the previous post now works fully to extract all of the files within Cache_Data. The one thing still missing is a description of the script + how to set up less to auto-show the Brotli-compressed files, so I'll put that in the next post.
Explanation + info on setting up getCC, the ChromeCache decrypt script:
Install Perl if necessary
(the script makes use of switch, which is included by default since version 5.10 but is also available from CPAN)
Place the script where you will
Make it executable
(chmod 700 — this also sets +x, for the owner only)
Set the values of $IN & $OUT
(lines 41 + 42; be careful to check permissions, particularly for $OUT)
Run the script from a command-prompt
(there are often tens of thousands of files to decrypt, so there is zero terminal output if there are no errors)
Install brotli
(sudo apt install brotli)
(this is to facilitate viewing compressed text files)
(I run Chimaera & it is available as standard; the whole sequence is sketched below)
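Consolidated, the setup looks something like this (a sketch only; the file locations are examples, not requirements):
# one-time setup for getCC
sudo apt install brotli    # for viewing Brotli-compressed text files
cp getCC ~/.getCC          # place the script where you will
chmod 700 ~/.getCC         # make it executable (owner-only)
editor ~/.getCC            # set $IN & $OUT (lines 41 + 42)
~/.getCC                   # run it; silence means no errors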
All lines beginning with a # are comments.
Lines 137 - 148 are all commented. It was exploratory code to determine if there was a common offset to the beginning of the cached file. There *was* indeed such an offset ($diff). This was important as not all files contained magic, and the start-of-file varied in ways that I could not decrypt.
The Chrome Cache_Data dir contains data-files, each of which holds the data + HTTP header from a single HTTP file delivered by a server during a Chrome/Chromium browser session.
HTTP files consist of an HTTP header + data.
The Cache_Data files have the file-data near the top of the file, then the HTTP header & then a bunch of other stuff. Here is a *very* small gif-file to make the point (look for 'GIF89a', the gif magic-marker, at 0xCA in the hex-dump below). Notice how the gif is just 43 bytes, yet the cache-file that contains it is over 4 KB:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
-rw------- 1 alexk alexk 4389 Feb 24 02:31 /home/alexk/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
$ la ~/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
-rw-r--r-- 1 alexk alexk 43 Feb 24 02:31 /home/alexk/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
$ hexdump ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0 -C | head -31
00000000 30 5c 72 a7 1b 6d fb fc 05 00 00 00 b2 00 00 00 |0\r..m..........|
00000010 23 84 68 3b 00 00 00 00 31 2f 30 2f 5f 64 6b 5f |#.h;....1/0/_dk_|
00000020 68 74 74 70 73 3a 2f 2f 61 6d 61 7a 6f 6e 2e 63 |https://amazon.c|
00000030 6f 2e 75 6b 20 68 74 74 70 73 3a 2f 2f 61 6d 61 |o.uk https://ama|
00000040 7a 6f 6e 2e 63 6f 2e 75 6b 20 68 74 74 70 73 3a |zon.co.uk https:|
00000050 2f 2f 61 61 78 2d 65 75 2e 61 6d 61 7a 6f 6e 2e |//aax-eu.amazon.|
00000060 63 6f 2e 75 6b 2f 65 2f 6c 6f 69 2f 69 6d 70 3f |co.uk/e/loi/imp?|
00000070 62 3d 4a 48 4f 6b 41 4c 63 55 4e 66 59 35 4f 61 |b=JHOkALcUNfY5Oa|
00000080 54 5f 5a 31 61 39 4c 32 67 41 41 41 47 47 67 55 |T_Z1a9L2gAAAGGgU|
00000090 4b 4d 77 67 4d 41 41 41 48 32 41 51 42 4f 4c 30 |KMwgMAAAH2AQBOL0|
000000a0 45 67 49 43 41 67 49 43 41 67 49 43 41 67 49 43 |EgICAgICAgICAgIC|
000000b0 42 4f 4c 30 45 67 49 43 41 67 49 43 41 67 49 43 |BOL0EgICAgICAgIC|
000000c0 41 67 49 43 41 2d 55 71 38 45 47 49 46 38 39 61 |AgICA-Uq8EGIF89a|
000000d0 01 00 01 00 f0 00 00 00 00 00 00 00 00 21 f9 04 |.............!..|
000000e0 01 00 00 00 00 2c 00 00 00 00 01 00 01 00 00 02 |.....,..........|
000000f0 02 44 01 00 3b d8 41 0d 97 45 6f fa f4 01 00 00 |.D..;.A..Eo.....|
00000100 00 ab bd 8a cb 2b 00 00 00 00 00 00 00 dc 0f 00 |.....+..........|
00000110 00 03 0d 45 02 86 fc 8d 34 ff 53 2f 00 e7 d9 8e |...E....4.S/....|
00000120 34 ff 53 2f 00 bd 00 00 00 48 54 54 50 2f 31 2e |4.S/.....HTTP/1.|
00000130 31 20 32 30 30 20 4f 4b 00 53 65 72 76 65 72 3a |1 200 OK.Server:|
00000140 20 53 65 72 76 65 72 00 44 61 74 65 3a 20 46 72 | Server.Date: Fr|
00000150 69 2c 20 32 34 20 46 65 62 20 32 30 32 33 20 30 |i, 24 Feb 2023 0|
00000160 32 3a 33 31 3a 30 38 20 47 4d 54 00 43 6f 6e 74 |2:31:08 GMT.Cont|
00000170 65 6e 74 2d 54 79 70 65 3a 20 69 6d 61 67 65 2f |ent-Type: image/|
00000180 67 69 66 00 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 |gif.Content-Leng|
00000190 74 68 3a 20 34 33 00 78 2d 61 6d 7a 2d 72 69 64 |th: 43.x-amz-rid|
000001a0 3a 20 42 37 35 4d 32 37 57 4e 38 38 32 54 59 4d |: B75M27WN882TYM|
000001b0 45 56 32 4e 46 48 00 56 61 72 79 3a 20 43 6f 6e |EV2NFH.Vary: Con|
000001c0 74 65 6e 74 2d 54 79 70 65 2c 41 63 63 65 70 74 |tent-Type,Accept|
000001d0 2d 45 6e 63 6f 64 69 6e 67 2c 55 73 65 72 2d 41 |-Encoding,User-A|
000001e0 67 65 6e 74 00 00 00 00 00 03 00 00 00 0d 07 00 |gent............|
$ hexdump fff822c2bb27d828_0.gif -C
00000000 47 49 46 38 39 61 01 00 01 00 f0 00 00 00 00 00 |GIF89a..........|
00000010 00 00 00 21 f9 04 01 00 00 00 00 2c 00 00 00 00 |...!.......,....|
00000020 01 00 01 00 00 02 02 44 01 00 3b |.......D..;|
0000002b
So, in the Cache file:
hex 0xCA: file-data begins ('GIF89a')
hex 0x129: HTTP header begins ('HTTP/1.1 200 OK')
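To make that concrete, the 43-byte gif can be carved out by hand with dd, run from within the Cache_Data dir — this is exactly the call that the script below automates:
# 202 == 0xCA (start of 'GIF89a'); 43 == the Content-Length given in the header
dd if=fff822c2bb27d828_0 of=out.gif skip=202 count=43 iflag=skip_bytes,count_bytes status=none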
Amongst other things, the HTTP header can give the type of the file (Content-Type), the length of the file (Content-Length), the delivery Date & the Encoding (the type of compression).
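All of those fields are visible in the hex-dump above; rendered with the null separators as line-breaks, the embedded header reads:
HTTP/1.1 200 OK
Server: Server
Date: Fri, 24 Feb 2023 02:31:08 GMT
Content-Type: image/gif
Content-Length: 43
x-amz-rid: B75M27WN882TYMEV2NFH
Vary: Content-Type,Accept-Encoding,User-Agent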
Every sensible Internet server compresses most of the files that it delivers, particularly text-files. At the moment getCC only detects gzip & brotli compression:
gzip: shown as 'file.txt.gz'
brotli: shown as 'file.txt.br'
If viewed from a terminal with less file.txt.gz, the gzip-file will be auto-decompressed & shown as plain text within the less-screen. That will NOT work for Brotli files unless you take the following steps:
My version of Bash uses ~/.bashrc as its initialisation script. The following code within ~/.bashrc enables less to auto-decode a wealth of different compressions (though not Brotli) in conjunction with lesspipe:
# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"
Take the following steps to add Brotli to all the other auto-decoded compressions:
Install Brotli
Save the script below as "~/.lessfilter"
Make it executable
#!/bin/sh
# ~/.lessfilter
# 2023-03-11 add brotli to all other encodings for less
case "$1" in
*.br)
brotli -dc "$1"
;;
*)
# We don't handle this format.
exit 1
esac
# No further processing by lesspipe necessary
exit 0
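Once ~/.lessfilter is saved & made executable, lesspipe finds & uses it automatically, and a Brotli file extracted by getCC can then be viewed directly (the file name here is just an example):
chmod +x ~/.lessfilter
less ~/Personal/ChromeCache/Files/somefile.html.br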
Last edited by alexkemp (2023-03-11 17:49:13)
getCC, the ChromeCache decrypt script:
I needed to know whether the script needed to activate on HTTP Status Codes other than just 200, so I did some calculations:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/* | wc -l
22525
$ strings ~/.cache/chromium/Default/Cache/Cache_Data/* | fgrep "HTTP/1.1" | sort | uniq -c
strings: Warning: '/home/alexk/.cache/chromium/Default/Cache/Cache_Data/index-dir' is a directory
14055 HTTP/1.1 200
1 HTTP/1.1 200 200
564 HTTP/1.1 200 OK
7490 HTTP/1.1 204
45 HTTP/1.1 204 No Content
5 HTTP/1.1 206
42 HTTP/1.1 301
15 HTTP/1.1 301 Moved Permanently
236 HTTP/1.1 302
1 HTTP/1.1 302 Found
1 HTTP/1.1 303 See Other
1 HTTP/1.1 307
2 HTTP/1.1 400
2 HTTP/1.1 403
84 HTTP/1.1 404
5 HTTP/1.1 404 Not Found
1 HTTP/1.1 410
11 HTTP/1.1 500
Sums:
65% 14,620 HTTP 200 OK
33% 7,535 HTTP 204 No Content
0% 5 HTTP 206 Partial Content
0% 57 HTTP 301 Moved Permanently
1% 237 HTTP 302 Found
0% 1 HTTP 303 See Other
0% 1 HTTP 307 Temporary Redirect
0% 2 HTTP 400 Bad Request
0% 2 HTTP 403 Forbidden
0% 89 HTTP 404 Not Found
0% 1 HTTP 410 Gone
0% 11 HTTP 500 Internal Server Error
Ah well, that's OK then. The script can stick with Status 200, no problem. There is a small chance that 203 Non-Authoritative Information may be involved (responses via a proxy, although it never features in my accesses), but I'm happy to consider that chance remote.
All of the 22 thousand files in the current cache were from servers reporting themselves to be version 1.1. HTTP/0.9 & HTTP/1.0 are now considered obsolete (I bet that some still exist). Both HTTP/2 & HTTP/3 are now supposed to be a thing, although no server reported either version in my accesses. However, I obviously need to modify the Perl regex to accept such possibilities, and that will come with the next post.
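In sketch form, the broadened match looks like this — it is the same pattern that the code in the next post uses, anchored on the two null bytes that precede the response header in every cache file:
# any HTTP version + status 200 or 203
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d\.\d*)\s(200|203)/i ) {
my $HVER = "$1"; # http version, eg '1.1'
my $HSTA = "$2"; # http status, '200' or '203'
my $HTTP = "HTTP/$HVER $HSTA";
}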
getCC, the ChromeCache decrypt script:
Added the ability to decode any HTTP version + status 200 or 203 files.
Testing results:
$ ~/Personal/.getCC
image/webp: 000420fedcafe6ff_0; $TLS=.webp; $HPOS=5185; $END=5185; $HVER=1.1; $HSTA=200; $HTTP=HTTP/1.1 200; $MOD=Fri, 03 Mar 2023 20:27:56 GMT;
$ cd ~/Personal/ChromeCache/Files
$ time ~/Personal/.getCC
real 1m28.431s
user 0m53.738s
sys 0m34.723s
I had noticed that all HTTP/1.1 server responses were preceded by two null bytes in the cache files:
00001430 fc 9b 54 2f 00 a4 ec 1b fc 9b 54 2f 00 57 02 00 |..T/......T/.W..|
00001440 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
00002800 02 72 ab 33 c9 16 55 2f 00 8d 9c 35 c9 16 55 2f |.r.3..U/...5..U/|
00002810 00 56 02 00 00 48 54 54 50 2f 31 2e 31 20 32 30 |.V...HTTP/1.1 20|
00003740 ea 4c 55 2f 00 48 7c d5 ea 4c 55 2f 00 55 02 00 |.LU/.H|..LU/.U..|
00003750 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
I used that fact to guarantee that the HTTP string being indexed was the correct one, + updated $HTTP to contain the correct strings.
Here is the latest code:
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d\.\d*)\s(200|203)/i ) { # escaped dot: match a literal '.' in the version
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did the page not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; }
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; }
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; }
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; }
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; }
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; }
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; }
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# # trying to decode where each file begins (determine common offsets)
# if( $LEN < 1 && $BEG > -1 ) { }
# if( $BEG > -1 && $LEN > -1 ) {
# # at this point $BEG - $OFF == start of magic
# # $END == start of $HTTP
# # $LEN == length of content from header
# my $mbeg = $BEG - $OFF; my $mhex = sprintf("0x%X", $mbeg);
# my $hbeg = $END - $LEN; my $hhex = sprintf("0x%X", $hbeg);
# my $diff = $hbeg - $mbeg;
# my $dhex = sprintf("0x%X", $diff);
# print "$MIME: $f; \$END/\$LEN=$END / $LEN; \$mbeg=$mbeg / $mhex; \$hbeg=$hbeg / $hhex; \$diff=$diff / $dhex; \n";
# }
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text, xml + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # account for different compression-encodings
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$HPOS=$HPOS; \$END=$END; \$HVER=$HVER; \$HSTA=$HSTA; \$HTTP=$HTTP; \$MOD=$MOD; \n";
}
} # if( $END > -1 ) # other pages are most likely to be HTTP 204 No Content
}
This should be the last code update for now (below).
It is tested as well as I can manage in a short time. ~64% of the cache files are HTTP 200, with most of the rest being 204 No Content. A number of the 200 OK files also have Content-Length:0 (js files for search-results in many cases). The script is written so that no attempt is made to extract no-content files.
The final search was for Content-Encoding: (compression before delivery). My main source was the documentation for the latest Apache modules, which showed that only gzip & brotli are currently used. The statement was that "deflate is not supported", whilst compress was not even mentioned.
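The script below marks such files with a .gz/.br suffix; until a later version decompresses them automatically, they can be unpacked by hand (the file names are examples):
gunzip file.txt.gz       # replaces file.txt.gz with file.txt (mtime is kept)
brotli -d file.txt.br    # writes file.txt; add -j to also remove the .br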
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d\.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did the page not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
# easy to mix up mime/media-types & encodings (compression schemes) here
# Content-Type == mime-type: refers to the type of file that is being transferred
# Content-Encoding == compression scheme: refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped text-file (eg file.xml.gz)
# only gzip (+ brotli) encodings are supported; deflate is not supported, compress is not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# fix $BEG + $LEN
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$BEG=$BEG; \$LEN=$LEN; \$END=$END; \$MOD=$MOD; \n";
} # lots of Content-Length:0 files
} # if( $END > -1 ) # other pages mostly HTTP 204 No Content
}
It is said that the connection between Rats & Bulldogs lies in the construction of their jawbones: once they bite, neither can release their teeth until the jaws clamp together (due to a ratchet mechanism that joins the upper & lower jawbone). I sympathise with both species; my mind has a similar mechanism.
I finally spotted how to determine the precise length of the embedded URL within each cache (simple) Entry file. It is now possible to collate all urls, data-lengths, etc. That finally opens the possibility of providing url + file listing, search, selection + individual extraction. However, that will all have to wait for later. For now, it is a simple utility that extracts all cached files (or just one file) into a single directory (listing below).
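In outline, this is a condensed sketch of what the full script below does with unpack ($entry_file stands for any Cache_Data file):
# each simple-cache Entry file starts with a 24-byte little-endian header:
# magic(8) version(4) key_length(4) key_hash(4) padding(4)
open my $fh, '<:raw', $entry_file or die $!;
read $fh, my $hdr, 24;
my ($magic, $ver, $key_length, $key_hash, $pad) = unpack 'Q< L< L< L< L<', $hdr;
# the URL (the cache key) occupies $key_length bytes from byte 24;
# the cached content begins immediately after, at byte 24 + $key_length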
There is a commented-out print-line almost at the bottom of the script. It can produce a listing of all files for you. The following from a terminal will do that (comment out the $DD lines & uncomment the print line first):
~/Personal/.getCC > temp.txt; sort -n temp.txt > mime.txt;
The Cache contains all kinds of corrupted files. There are lines in the script to try to catch those; the notices go to STDERR so they will not corrupt your mime.txt.
Note that there has been a radical reset of almost all code, which creates some disjuncture between the current code & the earlier BugFix comments. $MAGIC is still in the code but is unused now.
If I cannot stop myself producing a file browser then I shall place the code into GitHub, so that this thread can finally sleep.
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-21: Finally found location of URL-length
# (& thus how to find start of content for all files)
# 2023-03-16: bugfix: Account for Content-Encoding invalidating file-magic
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (FOFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $FOFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# Global CONSTANTS
my $UNBROT= "/usr/bin/brotli -d"; # change to your location
my $DD = "/bin/dd"; # - ditto -
my $GUNZIP= "/bin/gunzip"; # - ditto -
my $TOUCH = "/usr/bin/touch"; # - ditto -
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data"; # Chromium cache folder
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place to extract files to
my $FOFF = 52; # Offset of HTTP-begin from magic-eof (BEG) + LEN
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $MEOF = "\x{d8}\x{41}\x{0d}\x{97}\x{45}\x{6f}\x{fa}\x{f4}"; # Magic End bits (last 8 bytes of every simple cache Entry file data record)
my $MENT = "\x{30}\x{5c}\x{72}\x{a7}\x{1b}\x{6d}\x{fb}\x{fc}"; # Magic Start bits (1st 8 bytes of every simple cache Entry file data record)
my $MURL = "_dk_"; # Magic Start for URL (url follows within cache Entry file data record)
# save algorithm:
# 1) $URL/@URL: find $key_length from header
# 2) $BEG;$END;$LEN: obtain data start+end (from $key_length + $MEOF)
# 3) only save HTTP 200 files ($HTTP)
# 4) $HTTP;$BROTLI;$GZIP;$MIME;$MOD;$TLS: obtain http header fields (from $MEOF + $FOFF)
# 5) extract section $BEG to $END from $IN file into $OUT dir
# 6) $MOD: touch file to conform with http header date
# 7) $BROTLI;$GZIP: decompress gzip/brotli files
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# chromium cache in 2023 is a "simple cache"
# see https://www.chromium.org/developers/design-documents/network-stack/disk-cache/very-simple-backend/
# see https://chromium.googlesource.com/chromium/src/+/HEAD/net/disk_cache/simple/simple_entry_format.h
# see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/browsercache_simple.py
# start-of-record magic-marker == 30 5c 72 a7 1b 6d fb fc
# end-of-record magic-marker == d8 41 0d 97 45 6f fa f4
# (data ends immediately before eor)
# (http header starts 44 bytes after eor, and thus 44+8=52 bytes (\x34) after end-of-data)
# (eor also ends file; 16 bytes then follow to actual end-of-file)
# from FFF: (finally found url-length location)
# cache Entry-file header = struct.Struct('<QLLLL') [little-endian | 8-byte | 4-byte | 4-byte | 4-byte | 4-byte)
# (magic, version, key_length, key_hash, padding) = shformat.unpack(data)
# Parse Chrome Cache File; see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/chromagnon/cacheParse.py
opendir( my $d, "$IN") or die "Cannot open directory $IN"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "be75a13d44e548da_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $LEN = -1; # content-length
my $MAGIC = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $URL = ""; # url within cache Entry file
my @URL = (); # same url as an array
my $UPOS = -1; # position of url start in Entry file
open my $fh, '<:raw', "$IN/$f" or die "Cannot open file $IN/$f";
# 1 Obtain url length then url
# $key_length starts from byte 24 (\x18), normally begins with an 8-byte string '1/0/_dk_', then stretches to the end of the URL sequence
# the std 8-byte string indicates that two streams (1 + 0) are included within the file
# the request-url sequence is 2 x (normally-identical) base urls then the full request url, each separated by a single space
# data supplied to request url begins immediately after the url, and ends immediately before the $MEOF magic-marker
# http response headers begin 44 bytes after the end of $MEOF, starting with HTTP Status string at $HPOS
# none of the "std" response headers can be *expected* to exist, though most do
# all sorts of stuff exists after initial response header bundle, many of which I do not understand
#+ including content-servers such as amazon, certificates, proxy-servers, others
# this second stream (for std 2-stream files) ends with another $MEOF 16 bytes (\x10) before eof
# eg1: "1/0/_dk_https://bbc.co.uk https://bbc.co.uk https://static.files.bbci.co.uk/core/bundle-service-bar.003e5ecd332a5558802c.js"
# \x18 ^ ^ $UPOS (=32 =\x20) ($key_length =123 =\x7b; note: 24+123 =147 =\x93) \x93 ^
# eg2: "d8410d97 456ffaf4 01000000 24be2bf3 8d010000000000005814000003654702 acd8b17d9a552f00b8a4b27d9a552f00 40040000 HTTP/1.1 200"
# \x220 ^ \x228 ^ \x230 ^ \x240 ^ \x250 ^ ^ $HPOS (=596 =\x254)
my $bytes_read = read $fh, my $bytes, 24;
die "Got $bytes_read but expected 24" unless $bytes_read == 24;
my ($magic, $version, $key_length, $key_hash, $padding) = unpack 'a8 a4 a4 a4 a4', $bytes;
if( unpack('Q', $magic ) ne unpack('Q', $MENT )) {
$magic = unpack('H16', $magic );
$MENT = unpack('H16', $MENT );
die "'$IN/$f' is not a cache entry file, wrong magic number\n (got '$magic' not '$MENT')";
}
seek( $fh, 0, 0 ); # return to start of file
read( $fh, my $cache_buffer, -s "$IN/$f" ); # put whole file in $cache_buffer
close( $fh ) or die "could not close $IN/$f";
# Obtain url
if( $cache_buffer =~ /$MURL/ ) {
$UPOS = $-[0] + 4; # url begins immediately *after* marker string
$key_length=unpack('L', $key_length );
$key_hash =unpack('H16', $key_hash );
$URL = substr( $cache_buffer, $UPOS, $key_length - ($UPOS - 24));
@URL = split(' ', $URL );
}
# 2 Obtain data start+end
$BEG = $key_length + 24;
$END = index( $cache_buffer, "$MEOF", $BEG);
if( $END < 1 ) {
print STDERR "'$IN/$f': error finding end of data at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
} else {
if( $BEG == $END ) { # yes, some pages have Content-Length:0
$LEN = -1;
} else {
$LEN = $END - $BEG;
}
}
# 3 Only extract from HTTP 200|203
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d\.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
if( $HPOS != $END + $FOFF) {
print STDERR "'$IN/$f': error finding start of http at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
}
$HVER = "$1"; # http version; always HTTP/1.1 for me
$HSTA = "$2"; # http status; we are only interested in 200 or 203
$HTTP = "HTTP/$HVER $HSTA";
# 4 Obtain http header fields
if( $LEN > 0 ) { # yes, some pages have Content-Length:0
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
if( $1 != $LEN ) {
print STDERR "'$IN/$f': data-length \$LEN=$LEN differs from http Content-Length=$1 at $0 line:". __LINE__ ."\n";
}
if( !$1 ) { print STDERR "'$IN/$f': len=0 at $0 line:". __LINE__ ."\n"; }
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did the page not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
} else { next; } # if( $LEN > 0 )
# easy to mix up mime/media-types & encodings (compression schemes) here
# Content-Type == mime-type: refers to the type of file that is being transferred
# Content-Encoding == compression scheme: refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped text-file (eg file.xml.gz)
# only gzip (+ brotli) encodings are supported; deflate is not supported, compress is not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $MAGIC = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $MAGIC = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $MAGIC = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $MAGIC = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $MAGIC = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $MAGIC = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $MAGIC = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $MAGIC = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $MAGIC = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $MAGIC = ''; $OFF = 0; $TLS = ''; }
}
# gzip encoding overrides file magic (is earlier in file-stream)
# brotli encoding overrides file magic (there is none)
if( $GZIP ) { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; } elsif( $BROTLI ) { $MAGIC = ""; $OFF = 0; }
if( $MAGIC ) {
if( $MAGIC eq 'GIF87a') { # account for gif + jpeg multiple $MAGIC
if( index( $cache_buffer, "$MAGIC" ) < 0 ) { # index() returns -1 when not found
$MAGIC = 'GIF89a';
}
} elsif( $MAGIC eq 'JFIF') {
if( index( $cache_buffer, "$MAGIC" ) < 0 ) {
$MAGIC = 'Exif';
$TLS = 'jpeg';
if( index( $cache_buffer, "$MAGIC" ) < 0 ) {
$MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
}
}
}
}
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# 5 print the files out
if( $BEG > -1 && $LEN > -1 ) {
`$DD if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
# 6 set the date to last-modified
if( $MOD ) { `$TOUCH "$OUT/$f$TLS" -d "$MOD"`; }
# 7 decompress if necessary
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { # decompressed; .gz/.br suffix removed
`$GUNZIP "$OUT/$f$TLS"`; # original file removed; date retained
} else {
`$UNBROT -j "$OUT/$f$TLS"`;
}
}
} # lots of Content-Length:0 files
# print "$MIME; $URL[0]; $f; \$key_length=$key_length; \$key_hash=$key_hash; \$BEG=$BEG; \$END=$END; \$LEN=$LEN; \$TLS=$TLS \n";
} # if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d\.\d*)\s(200|203)/i ) # other pages mostly HTTP 204 No Content
}
Thursday update: small improvement to comments
Last edited by alexkemp (2023-03-23 13:27:27)
or perhaps place it somewhere that is not owned by microsoft.
Hi Ralph
If you have any suggestions I'll investigate them. However, I'm used to GitHub now & it is free. Of course, that *is* what was said about MSIE…
or perhaps place it somewhere that is not owned by microsoft.
I get it if people want to play very old games, etc. using microsoft, but who does this by either A: exposing anything to the internet, or B: running it as their main system?
Also, it's interesting that windows is still an industry standard.
I wonder why bosses still want it done that way. That sounds like a good way to do security… if you like the security of being protected by an OS that seems like it's made by someone on the zombie drug, bath salts. The same goes for having windows as an industry standard.
Yeah, I'm guessing that when Microsoft bought GitHub, they thought that people would show a "convenient mix of lazy and stupid" and pretend to themselves that Microsoft, by keeping GitHub services much the same (at least initially), supports FOSS.
I agree on both points. BTW, I have more patience with someone who just doesn't know of other alternatives than with someone who, while not needing windows, still chooses to use it as their main operating system when they aren't ignorant.
That is a special kind of awful, the kind that continues a perpetual, unnecessary cycle, for nothing.
Last edited by zapper (2023-03-29 19:45:25)
I've uploaded 2 scripts to GitHub:
In that repository, getCC is a Perl script that extracts all accessible files from the Chromium cache. Once they are all extracted, browseCC is a Bash script that accesses text-files dropped into the extract dir & uses YAD to display summaries & specifics on those files, including thumbnails for image files.