You are not logged in.
in fontforge ".._ttf is not a known format
The use of an underscore (“_”) concerns me there. In general Linux uses mime (or file) to discover what a file actually is, rather than the Window$-inspired “.tld” convention.
I do not have fontforge installed.
Possibly one way to begin to diagnose your situation is from the command-line. If you have mlocate installed (to quickly locate files) and either FireFox and/or Thunderbird installed then you will be able to locate this specific TTF font:
$ file /usr/lib/firefox-esr/fonts/TwemojiMozilla.ttf
/usr/lib/firefox-esr/fonts/TwemojiMozilla.ttf: TrueType Font data, 17 tables, 1st "COLR", 12 names, Macintosh, type 1 string
That should help to begin to help discover exactly what your system thinks any particular ttf file is.
One final other common source is mscorefonts and/or fonts-liberation:
$ apt search fonts-liberation
Sorting... Done
Full Text Search... Done
fonts-liberation/stable,now 1:1.07.4-11 all [installed]
Fonts with the same metrics as Times, Arial and Courier
fonts-liberation2/stable,now 2.1.3-1 all [installed]
Fonts with the same metrics as Times, Arial and Courier (v2)
ttf-mscorefonts-installer/stable,now 3.8 all [installed]
Installer for Microsoft TrueType core fonts
$ file /usr/share/fonts/truetype/msttcorefonts/Arial.ttf
/usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType Font data, digitally signed, 23 tables, 1st "DSIG", 70 names, Unicode, Typeface \251 The Monotype Corporation plc. Data \251 The Monotype Corporation plc/Type Solution
HTH
Hi amc252.
Every electronic component within your computer has a driver associated with it that allows that component to "play along". Monitors are no different to anything else. Therefore, your first search can involve finding a Chimaera driver for the digital TV.
The miracle of modern electronic equipment was made far easier with the introduction of PnP ("Plug 'n' Play"). That relies on a digital connection & various subsystems, and is what allows something like a monitor to be plugged in, auto-detected by the computer, recognised, the driver auto-located via the internet, auto-downloaded & auto-installed. Now, a USB connection is certainly digital but is more used for modems or HDD & little used for connecting monitors - HDMI connections are the standard for that.
Check your computer: does it have a HDMI port?
Check your digital monitor: does it have a HDMI port?
If the answer to the two questions above are both "yes" then you may be in business very quickly; just make sure that both ports are switched "on" in the setup for both machines, and that your computer is connected via an Ethernet port to the internet before you make the HDMI connection (Ethernet is 'old school' & thus provides few problems cf WLAN).
If the above is not possible & you are determined to go ahead with RCA & such-like then bring your will up-to-date so that afterwards others can realise the reasons for your suicide.
Good luck.
I've got two drives that are USB-connected HDD:
Seagate 2TB portable
(this is formatted using standard Linux utilities to (so-called) FAT64 (HPFS/NTFS/exFAT: max 2TB))
WD (Western Digital) 4TB portable ("My Passport")
(this is native M$ format ("Microsoft basic data") and I cannot find a linux utility that can format and/or repair it to it's current state)
The advantage of the former is that it is ubiquitous across many different OS. As an example, my ancient Samsung TV can read & play movies from (1), but not (2).
The advantage of (2) is that it can store above the 2TB threshold. Also, astonishing that I may need it to. I left the disk in it's supplied format since that can be read by more OS than a native Linux format.
As long as you have a 64-bit cpu then (as I understand it) either disk can be read up to the max of the cpu (which I cannot recall as I sit here, but much, much more than 4TB).
OK. Thanks to admin (although the OP has been further edited).
Well, now that you have edited it, it reads "daedalus" although when first posted it read "deadalus".
I was attempting to both be light-hearted in my response, and also warning other folks not to blindly copy your [ code ]'ed config since it contained a spelling mistake. Also, I personally only ever 'code' actual results without editing them so that others can trust that what I code is actually what I got as a result.
You need to address your remarks on the absence of non-free to fsmithred, since he states that yes, that does work whereas security & updates will not. I have no personal experience to be able to comment.
For those that realise that daedalus is *not* dead, do not copy brday's code.
The Devuan package information is here, and the sole Default configuration shown for daedalus is as follows on that page:
deb http://deb.devuan.org/merged daedalus main
@brday:
If your code was copied from the terminal then you likely have a reason (speling), otherwise non-free & contrib are not available for deadalus, though they may be for daedalus.
a new Chimaera live iso from two days ago
Possibly due to a recent kernel upgrade to 6.1.12-1 (available from backports):
$ uname -a
Linux ng3 6.1.0-0.deb11.5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.12-1~bpo11+1 (2023-03-05) x86_64 GNU/Linux
$ la /boot/initrd* /boot/vmlinuz*
-rw-r--r-- 1 root root 68062772 Mar 28 22:35 /boot/initrd.img-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 68913767 Mar 31 17:35 /boot/initrd.img-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 7730784 Dec 19 14:14 /boot/vmlinuz-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 7866720 Mar 5 18:27 /boot/vmlinuz-6.1.0-0.deb11.5-amd64
My actual update was March 31 (I update daily):
$ la -clt /boot/initrd* /boot/vmlinuz*
-rw-r--r-- 1 root root 68913767 Mar 31 17:35 /boot/initrd.img-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 7866720 Mar 31 17:35 /boot/vmlinuz-6.1.0-0.deb11.5-amd64
-rw-r--r-- 1 root root 68062772 Mar 28 22:35 /boot/initrd.img-6.0.0-0.deb11.6-amd64
-rw-r--r-- 1 root root 7730784 Jan 4 10:14 /boot/vmlinuz-6.0.0-0.deb11.6-amd64
Hi Ralph
If you have a suggestion(s) I'll investigate it/them. However, I'm used to GitHub now & it is free. Of course, that *is* what was said about MSIE…
It is said that the connection between Rats & Bulldogs lies in the construction of their jawbones: once they bite, neither can release their teeth until the jaws clamp together (due to a ratchet mechanism that joins the upper & lower jawbone). I sympathise with both species; my mind has a similar mechanism.
I finally spotted how to determine the precise length of the embedded URL within each cache (simple) Entry file. It is now possible to collate all urls, data-lengths, etc.. That finally opens the possibility to providing url + file listing, search, selection + individual extraction. However, that will all have to wait for later. For now, it is a simple utility that extracts all cached files (or just one file) into a single directory (listing below).
There is a commented-out print-line almost at the bottom of the script. It can produce a listing of all files for you. The following from a terminal can do that (comment the $DD lines & uncomment the PRINT line first):
~/Personal/.getCC > temp.txt; sort -n temp.txt > mime.txt;
The Cache contains all kinds of corrupted files. There are lines in the script to try to catch those; the notices go to STDERR so it will not corrupt your mime.txt.
Note that there has been a radical reset of almost all code, which creates some disjuncture between current code & earlier BugFix comments. $magic is still in the code but is unused now.
If I cannot stop myself producing a file browser then I shall place the code into GitHub, so that this thread can finally sleep.
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-21: Finally found location of URL-length
# (& thus how to find start of content for all files)
# 2023-03-16: bugfix: Account for Content-Encoding invalidating file-magic
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (FOFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $FOFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# Global CONSTANTS
my $UNBROT= "/usr/bin/brotli -d"; # change to your location
my $DD = "/bin/dd"; # - ditto -
my $GUNZIP= "/bin/gunzip"; # - ditto -
my $TOUCH = "/usr/bin/touch"; # - ditto -
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data"; # Chromium cache folder
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place to extract files to
my $FOFF = 52; # Offset of HTTP-begin from magic-eof (BEG) + LEN
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $MEOF = "\x{d8}\x{41}\x{0d}\x{97}\x{45}\x{6f}\x{fa}\x{f4}"; # Magic End bits (last 8 bytes of every simple cache Entry file data record)
my $MENT = "\x{30}\x{5c}\x{72}\x{a7}\x{1b}\x{6d}\x{fb}\x{fc}"; # Magic Start bits (1st 8 bytes of every simple cache Entry file data record)
my $MURL = "_dk_"; # Magic Start for URL (url follows within cache Entry file data record)
# save algorithm:
# 1) $URL/@URL: find $key_length from header
# 2) $BEG;$END;$LEN: obtain data start+end (from $key_length + $MEOF)
# 3) only save HTTP 200 files ($HTTP)
# 4) $HTTP;$BROTLI;$GZIP;$MIME;$MOD;$TLS: obtain http header fields (from $MEOF + $FOFF)
# 5) extract section $BEG to $END from $IN file into $OUT dir
# 6) $MOD: touch file to conform with http header date
# 7) $BROTLI;$GZIP: decompress gzip/brotli files
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# chromium cache in 2023 is a "simple cache"
# see https://www.chromium.org/developers/design-documents/network-stack/disk-cache/very-simple-backend/
# see https://chromium.googlesource.com/chromium/src/+/HEAD/net/disk_cache/simple/simple_entry_format.h
# see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/browsercache_simple.py
# start-of-record magic-marker == 30 5c 72 a7 1b 6d fb fc
# end-of-record magic-marker == d8 41 0d 97 45 6f fa f4
# (data ends immediately before eor)
# (http header starts 44 bytes after eor, and thus 44+8=52 bytes (\x34) after end-of-data)
# (eor also ends file; 16 bytes then follow to actual end-of-file)
# from FFF: (finally found url-length location)
# cache Entry-file header = struct.Struct('<QLLLL') [little-endian | 8-byte | 4-byte | 4-byte | 4-byte | 4-byte)
# (magic, version, key_length, key_hash, padding) = shformat.unpack(data)
# Parse Chrome Cache File; see https://github.com/JimmXinu/FanFicFare/blob/main/fanficfare/browsercache/chromagnon/cacheParse.py
opendir( my $d, "$IN") or die "Cannot open directory $IN"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "be75a13d44e548da_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $LEN = -1; # content-length
my $MAGIC = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $URL = ""; # url within cache Entry file
my @URL = ""; # same url as an array
my $UPOS = ""; # position of url start in Entry file
open my $fh, '<:raw', "$IN/$f" or die "Cannot open file $IN/$f";
# 1 Obtain url length then url
# $key_length starts from byte 24 (\x18), normally begins with an 8-byte string '1/0/_dk_', then stretches to the end of the URL sequence
# the std 8-byte string indicates that two streams (1 + 0) are included within the file
# the request-url sequence is 2 x (normally-identical) base urls then the full request url, each separated by a single space
# data supplied to request url begins immediately after the url, and ends immediately before the $MEOF magic-marker
# http response headers begin 44 bytes after the end of $MEOF, starting with HTTP Status string at $HPOS
# none of the "std" response headers can be *expected* to exist, though most do
# all sorts of stuff exists after initial response header bundle, many of which I do not understand
#+ including content-servers such as amazon, certificates, proxy-servers, others
# this second stream (for std 2-stream files) ends with another $MEOF 16 bytes (\x10) before eof
# eg1: "1/0/_dk_https://bbc.co.uk https://bbc.co.uk https://static.files.bbci.co.uk/core/bundle-service-bar.003e5ecd332a5558802c.js"
# \x18 ^ ^ $UPOS (=32 =\x20) ($key_length =123 =\x7b; note: 24+123 =147 =\x93) \x93 ^
# eg2: "d8410d97 456ffaf4 01000000 24be2bf3 8d010000000000005814000003654702 acd8b17d9a552f00b8a4b27d9a552f00 40040000 HTTP/1.1 200"
# \x220 ^ \x228 ^ \x230 ^ \x240 ^ \x250 ^ ^ $HPOS (=596 =\x254)
my $bytes_read = read $fh, my $bytes, 24;
die "Got $bytes_read but expected 24" unless $bytes_read == 24;
my ($magic, $version, $key_length, $key_hash, $padding) = unpack 'a8 a4 a4 a4 a4', $bytes;
if( unpack('Q', $magic ) ne unpack('Q', $MENT )) {
$magic = unpack('H16', $magic );
$MENT = unpack('H16', $MENT );
die "'$IN/$f' is not a cache entry file, wrong magic number\n (got '$magic' not '$MENT')";
}
seek( $fh, 0, 0 ); # return to start of file
read( $fh, my $cache_buffer, -s "$IN/$f" ); # put whole file in $cache_buffer
close( $fh ) or die "could not close $IN/$f";
# Obtain url
if( $cache_buffer =~ /$MURL/ ) {
$UPOS = $-[0] + 4; # url begins immediately *after* marker string
$key_length=unpack('L', $key_length );
$key_hash =unpack('H16', $key_hash );
$URL = substr( $cache_buffer, $UPOS, $key_length - ($UPOS - 24));
@URL = split(' ', $URL );
}
# 2 Obtain data start+end
$BEG = $key_length + 24;
$END = index( $cache_buffer, "$MEOF", $BEG);
if( $END < 1 ) {
print STDERR "'$IN/$f': error finding end of data at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
} else {
if( $BEG == $END ) { # yes, some pages have Content-Length:0
$LEN = -1;
} else {
$LEN = $END - $BEG;
}
}
# 3 Only extract from HTTP 200|203
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
if( $HPOS != $END + $FOFF) {
print STDERR "'$IN/$f': error finding start of http at $0 line:". __LINE__ ."\n";
next; # immediately skips up to foreach() + increments $f
}
$HVER = "$1"; # http version; always HTTP/1.1 for me
$HSTA = "$2"; # http status; we are only interested in 200 or 203
$HTTP = "HTTP/$HVER $HSTA";
# 4 Obtain http header fields
if( $LEN > 0 ) { # yes, some pages have Content-Length:0
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
if( $1 != $LEN ) {
print STDERR "'$IN/$f': data-length \$LEN=$LEN differs from http Content-Length=$1 at $0 line:". __LINE__ ."\n";
}
if( !$1 ) { print STDERR "'$IN/$f': len=0 at $0 line:". __LINE__ ."\n"; }
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) {# did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
} else { next; } # if( $LEN > 0 )
# easy to mixup mime/media-types & encoding (compression schemes) here
# Content-Type == mime-type refers to the type of file that is being transferred
# Content-Encoding == compression scheme refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped-textfile (eg file.xml.gz)
# gzip encoding (+ brotli) are only support; deflate no support, compress not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $MAGIC = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $MAGIC = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $MAGIC = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $MAGIC = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $MAGIC = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $MAGIC = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $MAGIC = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $MAGIC = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $MAGIC = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $MAGIC = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $MAGIC = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $MAGIC = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $MAGIC = ''; $OFF = 0; $TLS = ''; }
}
# gzip encoding overrides file magic (is earlier in file-stream)
# brotli encoding overrides file magic (there is none)
if( $GZIP ) { $MAGIC = "\x{1f}\x{8b}\x{08}"; $OFF = 0; } elsif( $BROTLI ) { $MAGIC = ""; $OFF = 0; }
if( $MAGIC ) {
if( $MAGIC eq 'GIF87a') { # account for gif + jpeg multiple $MAGIC
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = 'GIF89a';
}
} elsif( $MAGIC eq 'JFIF') {
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = 'Exif';
$TLS = 'jpeg';
if( ! index( $cache_buffer, "$MAGIC" )) {
$MAGIC = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
}
}
}
}
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# 5 print the files out
if( $BEG > -1 && $LEN > -1 ) {
`$DD if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
# 6 set the date to last-modified
if( $MOD ) { `$TOUCH "$OUT/$f$TLS" -d "$MOD"`; }
# 7 decompress if necessary
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { # decompressed; .gz/.br suffix removed
`$GUNZIP "$OUT/$f$TLS"`; # original file removed; date retained
} else {
`$UNBROT -j "$OUT/$f$TLS"`;
}
}
} # lots of Content-Length:0 files
# print "$MIME; $URL[0]; $f; \$key_length=$key_length; \$key_hash=$key_hash; \$BEG=$BEG; \$END=$END; \$LEN=$LEN; \$TLS=$TLS \n";
} # if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) # other pages mostly HTTP 204 No Content
}
Thursday update: small improvement to comments
This should be the last code update for now (below).
It is tested as well as I can manage in a short time. ~64% of cache are HTTP 200, with most of the rest being 204 No Content. A number of the 200 OK files are also Content-Length:0 (js files for search-results in many cases). The script is written so that no attempt is made to extract no-content files.
The final search was for Content-Encoding: (compression before delivery). My main source was latest Apache modules and that showed that only gzip & brotli are currently used. The statement was that "deflate is not supported", whilst compress was not even mentioned.
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
# easy to mixup mime/media-types & encoding (compression schemes) here
# Content-Type == mime-type refers to the type of file that is being transferred
# Content-Encoding == compression scheme refers to the type of compression used during transfer
# so, a text file (js txt xml, etc) with gzip magic will be a gzipped-textfile (eg file.xml.gz)
# gzip encoding (+ brotli) are only support; deflate no support, compress not even mentioned
# see https://httpd.apache.org/docs/current/mod/mod_deflate.html
# see https://www.iana.org/assignments/media-types/media-types.xhtml
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/manifest+json'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; } # magic for gzip encoding
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; } # magic for gzip encoding
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('image/x-icon') { $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; } # magic for gzip encoding
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; } # magic for gzip encoding
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; } # magic for gzip encoding
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; } # magic for gzip encoding
when ('text/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'xml'; } # magic for gzip encoding
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# fix $BEG + $LEN
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # compression-encoding
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$BEG=$BEG; \$LEN=$LEN; \$END=$END; \$MOD=$MOD; \n";
} # lots of Content-Length:0 files
} # if( $END > -1 ) # other pages mostly HTTP 204 No Content
}
getCC, the ChromeCache decrypt script:
Added the ability to decode any HTTP version + status 200 or 203 files.
Testing results:
$ ~/Personal/.getCC
image/webp: 000420fedcafe6ff_0; $TLS=.webp; $HPOS=5185; $END=5185; $HVER=1.1; $HSTA=200; $HTTP=HTTP/1.1 200; $MOD=Fri, 03 Mar 2023 20:27:56 GMT;
$ cd ~/Personal/ChromeCache/Files
$ time ~/Personal/.getCC
real 1m28.431s
user 0m53.738s
sys 0m34.723s
I had noticed that all HTTP/1.1 server responses were preceded by two null bytes in the cache files:
00001430 fc 9b 54 2f 00 a4 ec 1b fc 9b 54 2f 00 57 02 00 |..T/......T/.W..|
00001440 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
00002800 02 72 ab 33 c9 16 55 2f 00 8d 9c 35 c9 16 55 2f |.r.3..U/...5..U/|
00002810 00 56 02 00 00 48 54 54 50 2f 31 2e 31 20 32 30 |.V...HTTP/1.1 20|
00003740 ea 4c 55 2f 00 48 7c d5 ea 4c 55 2f 00 55 02 00 |.LU/.H|..LU/.U..|
00003750 00 48 54 54 50 2f 31 2e 31 20 32 30 30 00 61 63 |.HTTP/1.1 200.ac|
I used that fact to guarantee that the HTTP string that was being indexed for was the correct one + updated $HTTP to contain the correct strings.
Here is the latest code:
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through Chromium/Chrome 'Cache_Data/' dir
#+ & extract all http-delivered files stored within those data-files
# 2023-03-12: Account for multiple http version + 200|203 status
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "000420fedcafe6ff_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $HPOS = -1; # 'HTTP' string begins (bytes)
my $HSTA = -1; # 'HTTP' status string (only interested in '200' or '203')
my $HVER = ''; # 'HTTP' version string (eg '1.1')
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
if( $cache_buffer =~ /\x{00}\x{00}HTTP\/(\d.\d*)\s(200|203)/i ) {
$HPOS = $-[0] + 2;
$HVER = "$1";
$HSTA = "$2";
$HTTP = "HTTP/$HVER $HSTA";
}
$END = index( $cache_buffer, "$HTTP", $HPOS); # Check for presence of HTTP 200|203 header (paranoia coding)
if( $END > -1 ) { #+(and therefore std header fields for successful access)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; }
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; }
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; }
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; }
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; }
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; }
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# # trying to decode where each file begins (determine common offsets)
# if( $LEN < 1 && $BEG > -1 ) { }
# if( $BEG > -1 && $LEN > -1 ) {
# # at this point $BEG - $OFF == start of magic
# # $END == start of $HTTP
# # $LEN == length of content from header
# my $mbeg = $BEG - $OFF; my $mhex = sprintf("0x%X", $mbeg);
# my $hbeg = $END - $LEN; my $hhex = sprintf("0x%X", $hbeg);
# my $diff = $hbeg - $mbeg;
# my $dhex = sprintf("0x%X", $diff);
# print "$MIME: $f; \$END/\$LEN=$END / $LEN; \$mbeg=$mbeg / $mhex; \$hbeg=$hbeg / $hhex; \$diff=$diff / $dhex; \n";
# }
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text, xml + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # account for different compression-encodings
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$HPOS=$HPOS; \$END=$END; \$HVER=$HVER; \$HSTA=$HSTA; \$HTTP=$HTTP; \$MOD=$MOD; \n";
}
} # if( $END > -1 ) # other pages are most likely to be HTTP 204 No Content
}
getCC, the ChromeCache decrypt script:
I needed to know whether it needed to activate on other HTTP Status Codes than just 200, so did some calculations:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/* | wc -l
22525
$ strings ~/.cache/chromium/Default/Cache/Cache_Data/* | fgrep "HTTP/1.1" | sort | uniq -c
strings: Warning: '/home/alexk/.cache/chromium/Default/Cache/Cache_Data/index-dir' is a directory
14055 HTTP/1.1 200
1 HTTP/1.1 200 200
564 HTTP/1.1 200 OK
7490 HTTP/1.1 204
45 HTTP/1.1 204 No Content
5 HTTP/1.1 206
42 HTTP/1.1 301
15 HTTP/1.1 301 Moved Permanently
236 HTTP/1.1 302
1 HTTP/1.1 302 Found
1 HTTP/1.1 303 See Other
1 HTTP/1.1 307
2 HTTP/1.1 400
2 HTTP/1.1 403
84 HTTP/1.1 404
5 HTTP/1.1 404 Not Found
1 HTTP/1.1 410
11 HTTP/1.1 500
Sums:
65% 14,620 HTTP 200 OK
33% 7,535 HTTP 204 No Content
0% 5 HTTP 206 Partial Content
0% 57 HTTP 301 Moved Permanently
1% 237 HTTP 302 Found
0% 1 HTTP 303 See Other
0% 1 HTTP 307 Temporary Redirect
0% 2 HTTP 400 Bad Request
0% 2 HTTP 403 Forbidden
0% 89 HTTP 404 Not Found
0% 1 HTTP 410 Gone
0% 11 HTTP 500 Internal Server Error
Ah well, that's ok then. The script can stick with Status 200, no problem. There is a small chance that 203 Non-Authoritative Information may be involved (responses from a proxy, although never features in my accesses), but I'm happy to consider the chance of that being remote.
All of the 22 thousand files in the current cache were from servers reporting themselves to be version 1.1. HTTP/0.9 & HTTP/1.0 are now considered obsolete (I bet that some still exist). Both HTTP/2 & HTTP/3 are now supposed to be a thing, although no server reported either version in my accesses. However, I obviously need to modify the PERL regex to accept such possibilities, and that will come with the next post.
Explanation + info on setting up getCC, the ChromeCache decrypt script:
Install PERL if necessary
(makes use of switch which was installed by default in version 5.10, but also available from CPAN)
Place the script where you will
Make executable
(chmod +x; chmod 700)
Set the values of $IN & $OUT
(lines 41 + 42; be careful to check permissions, particularly for $OUT)
Run the command from a command-prompt
(there are often 10s of thousands of files decrypted, so there is zero terminal output if no errors)
Install brotli
(sudo apt install brotli)
(this is to facilitate viewing text files)
(I run Chimaera & it is available as standard)
All lines beginning with a # are comments.
Lines 137 - 148 are all commented. It was exploratory code to determine if there was a common offset to the beginning of the cached file. There *was* indeed such an offset ($diff). This was important as not all files contained magic, and the start-of-file varied in ways that I could not decrypt.
The Chrome CacheData dir contains data-files which each contain the data + http-header from a single HTTP file delivered from a server during a Chrome/Chromium browser session.
HTTP files consist of a HTTP header + data.
The CacheData files have the file-data near the top of the file, then the HTTP header & then a bunch of other stuff. Here is a *very* small gif-file to make the point (look for 'GIF89a', the gif magic-marker, at ca in the hex-dump below). Notice how the gif is just 43 bytes, yet the cache-file that contains it is 4k bytes:
$ la ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
-rw------- 1 alexk alexk 4389 Feb 24 02:31 /home/alexk/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0
$ la ~/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
-rw-r--r-- 1 alexk alexk 43 Feb 24 02:31 /home/alexk/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
$ hexdump ~/.cache/chromium/Default/Cache/Cache_Data/fff822c2bb27d828_0 -C | head -31
00000000 30 5c 72 a7 1b 6d fb fc 05 00 00 00 b2 00 00 00 |0\r..m..........|
00000010 23 84 68 3b 00 00 00 00 31 2f 30 2f 5f 64 6b 5f |#.h;....1/0/_dk_|
00000020 68 74 74 70 73 3a 2f 2f 61 6d 61 7a 6f 6e 2e 63 |https://amazon.c|
00000030 6f 2e 75 6b 20 68 74 74 70 73 3a 2f 2f 61 6d 61 |o.uk https://ama|
00000040 7a 6f 6e 2e 63 6f 2e 75 6b 20 68 74 74 70 73 3a |zon.co.uk https:|
00000050 2f 2f 61 61 78 2d 65 75 2e 61 6d 61 7a 6f 6e 2e |//aax-eu.amazon.|
00000060 63 6f 2e 75 6b 2f 65 2f 6c 6f 69 2f 69 6d 70 3f |co.uk/e/loi/imp?|
00000070 62 3d 4a 48 4f 6b 41 4c 63 55 4e 66 59 35 4f 61 |b=JHOkALcUNfY5Oa|
00000080 54 5f 5a 31 61 39 4c 32 67 41 41 41 47 47 67 55 |T_Z1a9L2gAAAGGgU|
00000090 4b 4d 77 67 4d 41 41 41 48 32 41 51 42 4f 4c 30 |KMwgMAAAH2AQBOL0|
000000a0 45 67 49 43 41 67 49 43 41 67 49 43 41 67 49 43 |EgICAgICAgICAgIC|
000000b0 42 4f 4c 30 45 67 49 43 41 67 49 43 41 67 49 43 |BOL0EgICAgICAgIC|
000000c0 41 67 49 43 41 2d 55 71 38 45 47 49 46 38 39 61 |AgICA-Uq8EGIF89a|
000000d0 01 00 01 00 f0 00 00 00 00 00 00 00 00 21 f9 04 |.............!..|
000000e0 01 00 00 00 00 2c 00 00 00 00 01 00 01 00 00 02 |.....,..........|
000000f0 02 44 01 00 3b d8 41 0d 97 45 6f fa f4 01 00 00 |.D..;.A..Eo.....|
00000100 00 ab bd 8a cb 2b 00 00 00 00 00 00 00 dc 0f 00 |.....+..........|
00000110 00 03 0d 45 02 86 fc 8d 34 ff 53 2f 00 e7 d9 8e |...E....4.S/....|
00000120 34 ff 53 2f 00 bd 00 00 00 48 54 54 50 2f 31 2e |4.S/.....HTTP/1.|
00000130 31 20 32 30 30 20 4f 4b 00 53 65 72 76 65 72 3a |1 200 OK.Server:|
00000140 20 53 65 72 76 65 72 00 44 61 74 65 3a 20 46 72 | Server.Date: Fr|
00000150 69 2c 20 32 34 20 46 65 62 20 32 30 32 33 20 30 |i, 24 Feb 2023 0|
00000160 32 3a 33 31 3a 30 38 20 47 4d 54 00 43 6f 6e 74 |2:31:08 GMT.Cont|
00000170 65 6e 74 2d 54 79 70 65 3a 20 69 6d 61 67 65 2f |ent-Type: image/|
00000180 67 69 66 00 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 |gif.Content-Leng|
00000190 74 68 3a 20 34 33 00 78 2d 61 6d 7a 2d 72 69 64 |th: 43.x-amz-rid|
000001a0 3a 20 42 37 35 4d 32 37 57 4e 38 38 32 54 59 4d |: B75M27WN882TYM|
000001b0 45 56 32 4e 46 48 00 56 61 72 79 3a 20 43 6f 6e |EV2NFH.Vary: Con|
000001c0 74 65 6e 74 2d 54 79 70 65 2c 41 63 63 65 70 74 |tent-Type,Accept|
000001d0 2d 45 6e 63 6f 64 69 6e 67 2c 55 73 65 72 2d 41 |-Encoding,User-A|
000001e0 67 65 6e 74 00 00 00 00 00 03 00 00 00 0d 07 00 |gent............|
$ hexdump fff822c2bb27d828_0.gif -C
00000000 47 49 46 38 39 61 01 00 01 00 f0 00 00 00 00 00 |GIF89a..........|
00000010 00 00 00 21 f9 04 01 00 00 00 00 2c 00 00 00 00 |...!.......,....|
00000020 01 00 01 00 00 02 02 44 01 00 3b |.......D..;|
0000002b
So, in the Cache file:
hex CA: filedata begins ('GIF89a')
hex 129: http header begins ('HTTP/1.1 200 OK')
Amongst other things, the HTTP header can give the Type of file, the length of file, delivery Date & Encoding (type of compression).
Every sensible Internet Server compresses most of the files that it delivers, and particularly text-files. atm getCC only detects gzip & brotli compression:-
gzip: shown as 'file.txt.gz'
brotli: shown as 'file.txt.br'
If viewed from a terminal with less file.txt.gz the gzip-file will be auto-decompressed & shown as plain text within the less-screen. That will NOT work the same for Brotli files unless you take the following steps:-
My version of BASH uses ~/.bashrc as a shell-script to initialise it. The following code within ~/.bashrc enables less to auto-decode a wealth of different compressions (though not Brotli) in conjunction with LESSPIPE:-
# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"
Take the following steps to add Brotli to all the other auto-decoded compressions:
Install Brotli
Save the script below as "~/.lessfilter"
Make it executable
#!/bin/sh
# ~/.lessfilter
# 2023-03-11 add brotli to all other encodings for less
case "$1" in
*.br)
brotli -dc "$1"
;;
*)
# We don't handle this format.
exit 1
esac
# No further processing by lesspipe necessary
exit 0
I'm setting this thread to "SOLVED" now.
WINE has been fixed by removing it, and the script I added in the previous post now works fully to extract all of the files within CacheData. The one thing that is missing is a description of the script + how to setup less to auto-show the compressed Brotli files, so I'll put that in the next post.
I'm simply astonished that so few people (seemingly just one) have produced a Chrome cache viewer.
There *is* another on Github. It was a little heavyweight for me, so I spent a week learning PERL whilst writing a script to extract all the Chrome-cached files into a directory. ~100 lines. Below for your elucidation:
4pm update: +20 lines to fix ~2000 bad files
5pm update: added Brotli compression encoding; still not sure if that works ok
Mar 8 update: Brotli now works; ~150 active lines (+ ~10 debug lines commented out)
#!/usr/bin/perl
# get Chrome Cache
# suggestion: save as ~/.getCC; chmod +x; chmod 700
# A PERL script to iterate through the Chromium/Chrome 'Cache_Data/'
#+extract all http-delivered files stored within those data-files
# 2023-03-08: bugfix: COUNT removed; LEN used instead
# + (F_OFF used for BEG, not COUNT)
# + brotli now works
# + (no magic for brotli (a mistake imo))
# 2023-03-07: bugfix: corrected miss on most magic files (my bad)
# + excluded compound header fields to eliminate wrong values
# added $F_OFF (diff between HTTP-begin ($END - $LEN) & magic-begin ($BEG))
# + (*every* file with both $BEG & $LEN has diff == x34) (h-begin is bigger)
# + thus if no magic but LEN then BEG = END - LEN - 52
# + if magic but no LEN then LEN = END - BEG - 52 (yes, this *does* happen)
# 2023-03-05: bugfix: coded to exclude 711 zero-length files
# + account for multiple-same-value $mime (fixes ~1000 gif + jpg files)
# + added 'Content-Encoding:br' Brotli compression
# + (you may need 'sudo apt install brotli' to view those files)
use strict;
use warnings;
use autodie;
use experimental qw( switch );
# save algorithm:
# 1) only save HTTP 200 files ($END)
# 2) try first to set file beginning ($BEG) from magic bytes
# 3) if (2) fails, set $BEG from $LEN; if no length, then ignore file
# 4) extract section $BEG to $END from $IN file into $OUT dir
# 5) touch file to conform with http header date
# Stats 2023-03-06:
# 10978 HTTP 200 from 23594 files in Cache_Data
# 6 do NOT contain a MIME field
# 10979 files saved to disk (real 1m23.219s)
# Global CONSTANTS
my $IN = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; # Chromium cache folder.
my $OUT = "/home/alexk/Personal/ChromeCache/Files/"; # Place for extracted files
my $HTTP = "HTTP/1.1 200"; # '200 OK' not in all files
my $F_OFF= 52; # Offset of HTTP-begin from magic-begin (BEG) + LEN
opendir( my $d, "$IN") or die "Cannot open directory $IN: $!\n"; # Open cache dir
my @list
= grep {
!/^\.\.?$/ # miss /. + /.. files
&& -f "$IN/$_" # is a file (not dir, etc)
} readdir( $d );
closedir( $d );
foreach my $f (@list) { # Iterate through each cached data-file
# my $f = "0f0ce6df8548452e_0";
# section variables
my $BEG = -1; # Extract begins (bytes)
my $BROTLI = 0; # brotli encoding (0/1)
my $END = -1; # Extract ends (bytes)
my $GZIP = 0; # gzip encoding (0/1)
my $magic = '';
my $MIME = ""; # content-type
my $MOD = ""; # last-modified
my $OFF = -1; # Offset of magic from file beginning
my $TLS = ""; # TLS==Three Letter Suffix
my $LEN = -1; # content-length
open my $fhi, '<:raw', "$IN/$f" or die $!;
read( $fhi, my $cache_buffer, -s "$IN/$f" );
close( $fhi ) or die "could not close $IN/$f: $!";
$END = index( $cache_buffer, "$HTTP"); # Check for presence of HTTP 200 OK header
if( $END > -1 ) { #+(and therefore std header fields)
if( $cache_buffer =~ /\x00Content-Encoding:\s*br/i ) { $BROTLI = 1; }
if( $cache_buffer =~ /\x00Content-Encoding:\s*gzip/i ) { $GZIP = 1; }
if( $cache_buffer =~ /\x00Content-Length:\s*(\d+)/i ) {
$LEN = $1;
if( !$LEN ) { $LEN = -1; } # yes, some pages have Content-Length:0
}
if( $cache_buffer =~ /\x00Last-Modified:\s*([ A-Za-z0-9,:]+)/i ) {
$MOD = $1; # some web servers ignore case + introduce spaces!
} else {
if( $cache_buffer =~ /\x00Date:\s*([ A-Za-z0-9,:]+)/i ) { # did page did not want to be cached? (Chromium did it anyway!)
$MOD = $1; # (all pages should have a date (or a Date))
}
}
if( $cache_buffer =~ /\x00Content-Type:\s*([a-z-]+\/[a-z0-9.+-]+)/i ) {
$MIME = $1;
} # variable $1 NOT reset on failed match (v stupid)
given( $MIME ) {
when ('application/font-woff' ) { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('application/font-woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('application/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; } # magic for gzip encoding
when ('application/json') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'json'; }
when ('application/x-javascript'){ $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('application/xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('binary/octet-stream') { $magic = "GIF89a"; $OFF = 0; $TLS = 'gif'; }
when ('font/ttf') { $magic = "\x{00}\x{01}\x{00}\x{00}\x{00}"; $OFF = 0; $TLS = 'ttf'; }
when ('font/woff') { $magic = 'wOFF'; $OFF = 0; $TLS = 'woff'; }
when ('font/woff2') { $magic = 'wOF2'; $OFF = 0; $TLS = 'woff2'; }
when ('image/gif') { $magic = 'GIF87a'; $OFF = 0; $TLS = 'gif'; }
# when ('image/gif') { $magic = 'GIF89a'; $OFF = 0; $TLS = 'gif'; }
when ('image/jpeg') { $magic = 'JFIF'; $OFF = 6; $TLS = 'jpg'; }
# when ('image/jpeg') { $magic = 'Exif'; $OFF = 6; $TLS = 'jpeg'; }
# when ('image/jpeg') { $magic = "\x{ff}\x{d8}\x{ff}\x{e0}"; $OFF = 6; $TLS = 'jpg'; }
when ('image/png') { $magic = "\x{89}PNG"; $OFF = 0; $TLS = 'png'; }
when ('image/svg+xml') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'svg'; }
when ('image/vnd.microsoft.icon'){ $magic = "\x{00}\x{00}\x{01}\x{00}"; $OFF = 0; $TLS = 'ico'; }
when ('image/webp') { $magic = 'RIFF'; $OFF = 0; $TLS = 'webp'; }
when ('text/css') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'css'; }
when ('text/fragment+html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'htm'; }
when ('text/html') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'html'; }
when ('text/javascript') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'js'; }
when ('text/plain') { $magic = "\x{1f}\x{8b}\x{08}"; $OFF = 0; $TLS = 'txt'; }
when ('video/mp4') { $magic = 'ftypisom'; $OFF = 4; $TLS = 'mp4'; } # most unlikely
default { $magic = ''; $OFF = 0; $TLS = ''; }
}
if( $magic ) {
if( $magic eq 'GIF87a') { # account for gif + jpeg multiple $magic
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'GIF89a';
$BEG = index( $cache_buffer, "$magic" );
}
} elsif( $magic eq 'JFIF') {
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = 'Exif';
$TLS = 'jpeg';
$BEG = index( $cache_buffer, "$magic" );
if( $BEG < 0 ) {
$magic = "\x{ff}\x{d8}\x{ff}\x{e0}";
$TLS = 'jpg';
$BEG = index( $cache_buffer, "$magic" );
}
}
}
$BEG = index( $cache_buffer, "$magic" );
}
# # trying to decode where each file begins (determine common offsets)
# if( $LEN < 1 && $BEG > -1 ) { }
# if( $BEG > -1 && $LEN > -1 ) {
# # at this point $BEG - $OFF == start of magic
# # $END == start of $HTTP
# # $LEN == length of content from header
# my $mbeg = $BEG - $OFF; my $mhex = sprintf("0x%X", $mbeg);
# my $hbeg = $END - $LEN; my $hhex = sprintf("0x%X", $hbeg);
# my $diff = $hbeg - $mbeg;
# my $dhex = sprintf("0x%X", $diff);
# print "$MIME: $f; \$END/\$LEN=$END / $LEN; \$mbeg=$mbeg / $mhex; \$hbeg=$hbeg / $hhex; \$diff=$diff / $dhex; \n";
# }
if( $BEG > -1 ) {
$BEG -= $OFF;
if( $LEN < 1 ) { $LEN = $END - $BEG - $F_OFF; } # v rare, but happens
} elsif( $LEN > -1 ) { $BEG = $END - $LEN - $F_OFF; } # no magic (text, xml + brotli files)
# suffixes (holy m$)
if( $TLS ) {
$TLS = ".$TLS";
if( $GZIP || $BROTLI ) { # account for different compression-encodings
if( $GZIP ) { $TLS = "$TLS.gz"; } else { $TLS = "$TLS.br"; }
}
}
# print the files out
if( $BEG > -1 && $LEN > -1 ) {
`dd if="$IN/$f" of="$OUT/$f$TLS" skip=$BEG count=$LEN iflag=skip_bytes,count_bytes status=none`;
if( $MOD ) { `touch "$OUT/$f$TLS" -d "$MOD"`; }
# print "$MIME: $f; \$TLS=$TLS; \$END=$END; \$BEG=$BEG; \$LEN=$LEN; \$MOD=$MOD; \n";
}
} # if( $END > -1 ) # other pages are most likely to be HTTP 204 No Content
}
Mark Hindley in the bug report was able to get to the gates of success in installing wine32 on a vanilla chimaera, and has therefore fingered backports as the reason for the error on my system. That log-file reported a terrifyingly-large number of i386 packages to install as helpers to wine32.
I would like to give public thanks to Mark for his help so far, but I'm going to remove all traces of Wine & the i386 architecture from my system.
BeginnerForever at this StackOverflow page has a PHP script which, after just a couple of tweaks, will extract all JPEG + PNG files from the Chromium/Chrome dir to a dir. Fast & very impressive
There now follows my small update to that script. I've added a section for GIF files (those files get extracted, but do not work as image files):
#!/usr/bin/php
<?php
// getCC (get Chrome Cache)
// suggestion: save as ~/.getCC; chmod +x; chmod 700
$dir = "/home/alexk/.cache/chromium/Default/Cache/Cache_Data/"; // Chromium cache folder
$ppl = "/home/alexk/Personal/ChromeCache/Files/";               // Place for extracted files
// $END = "HTTP/1.1 200 OK"; // Search in cache-file (works, yet not in some files)
$END = "HTTP/1.1 200";       // Search in cache-file (works, and IS in all files)
$FTL = "";                   // Filetype lowercase
$FTU = "";                   // Filetype uppercase
$MOFF = 0;                   // Offset of magic from file beginning
$list = scandir( $dir );
foreach( $list as $filename ) {
    if( is_file( $dir.$filename )) {
        $content = file_get_contents( $dir.$filename );
        if( strstr( $content, 'JFIF' )) {
            $FTL = "jpg";
            $FTU = "JPEG";
            $MOFF = 6; // "JFIF" sits 6 bytes into a JPEG header
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "JFIF", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            // fixed: substr()'s 3rd argument is a LENGTH, so it must be
            // $end - $start, not $end - $MOFF (the cause of the bad lengths below)
            $content = substr( $content, $start, $end - $start );
            $length = strlen( $content );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $content );
            // echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        elseif( strstr( $content, "\211PNG" )) {
            $FTL = "png";
            $FTU = "PNG";
            $MOFF = 0; // search for the full \x89PNG magic directly
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "\211PNG", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            $content = substr( $content, $start, $end - $start ); // fixed, as above
            $length = strlen( $content );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $content );
            // echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        elseif( strstr( $content, "GIF89a" )) {
            $FTL = "gif";
            $FTU = "GIF";
            $MOFF = 0;
            echo( $filename." $FTU \n" );
            $start = strpos( $content, "GIF89a", 0 ) - $MOFF;
            $end = strpos( $content, $END, 0 );
            $newc = substr( $content, $start, $end - $start ); // fixed: length was $end
            $length = strlen( $newc );
            $wholenm = $ppl.$filename.".$FTL";
            file_put_contents( $wholenm, $newc );
            echo( "Saving :".$wholenm." \n");
            echo( "start : $start \n");
            echo( "end : $end \n");
            $diff = $end - $start;
            echo( "length: $length (s/b $diff)\n");
        }
        else {
            echo( $filename." UNKNOWN \n");
        }
    }
}
?>
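To use it: save, make it executable, run, and sanity-check the results with file (this assumes the PHP command-line interpreter is installed; on Debian-family systems the package is php-cli):
$ chmod 700 ~/.getCC
$ ~/.getCC
$ file ~/Personal/ChromeCache/Files/* | head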
There were a couple of strange occurrences that took some debugging, hence the extra echo lines in the script. I'm still going to rewrite the script in BASH which, hopefully, will be more reliable. If so, I will not need WINE (hooray!).
Line 8 of the original script has $END = "HTTP/1.1 200 OK"; & each section has $end = strpos( $content, $END, 0 );. I discovered that some cache files do not contain the "OK", yet the string was found by strpos (though not by grep) & the image correctly extracted; I cannot explain that, so $END is now the bare "HTTP/1.1 200", which IS in all the files.
The file content is concatenated within the Cache_Data file immediately before the $END string. In the original script none of the extracted files was the length it should have been: substr()'s third argument is a length, not an end position, so it must be $end - $start rather than $end - $MOFF. Taking the first PNG below as a worked example: start = 150 and end = 14212, so the payload should be 14212 - 150 = 14062 bytes, yet 14211 (i.e. end - MOFF) bytes were written. JPEG + PNG viewers did not seem to mind the trailing rubbish, but GIF files refused to play until the length was corrected.
Here is the very end of the text output from the original (pre-fix) version, to give some sense of the difficulty:
ffa41e3d8b4e0cf9_0 PNG
start : 150
end : 14212
length: 14211 (s/b 14062)
ffa78518232ea9f2_0 PNG
start : 170
end : 1417
length: 1416 (s/b 1247)
ffad48f3aefb6cd7_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/ffad48f3aefb6cd7_0.gif
start : 1089
end : 1183
length: 1183 (s/b 94)
ffba1f5387a04a08_0 JPEG
start : 166
end : 972
length: 966 (s/b 806)
ffbf8448256da635_0 UNKNOWN
ffc1ebd8d62551b6_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/ffc1ebd8d62551b6_0.gif
start : 193
end : 288
length: 288 (s/b 95)
ffc2019c23af2000_0 UNKNOWN
ffc239239bc4e4a9_0 JPEG
start : 195
end : 1920
length: 1914 (s/b 1725)
ffc57d9b41cebadd_0 UNKNOWN
ffcbd7258d6a0aea_0 UNKNOWN
ffda4d6b8e2937fd_0 UNKNOWN
ffdac4bf770719a1_0 UNKNOWN
ffde560cb8ad0eaf_0 UNKNOWN
fff42f6de6d58540_0 UNKNOWN
fff530252c03d813_0 UNKNOWN
fff55afc8b58e35f_0 UNKNOWN
fff822c2bb27d828_0 GIF
Saving :/home/alexk/Personal/ChromeCache/Files/fff822c2bb27d828_0.gif
start : 202
end : 297
length: 297 (s/b 95)
index UNKNOWN
PHP seems to be unworkable now, so I'm going to switch to BASH.
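For anyone wanting a head start, here is a minimal sketch of what such a BASH rewrite could look like: grep -abo locates the byte offset of the image magic and of the stored "HTTP/1.1 200" header, and dd carves the bytes between them. The paths are the ones from the PHP script above; it assumes GNU grep (for -P) and only knows the three image types — a sketch, not the finished article:
#!/bin/bash
# getCC.sh -- BASH sketch of the PHP extractor above
SRC="$HOME/.cache/chromium/Default/Cache/Cache_Data"
DST="$HOME/Personal/ChromeCache/Files"
mkdir -p "$DST"
for f in "$SRC"/*_0; do
    b=$(basename "$f")
    # byte offset of the stored response headers == end of the payload (near enough)
    end=$(grep -abo 'HTTP/1.1 200' "$f" | head -n1 | cut -d: -f1)
    [ -z "$end" ] && { echo "$b NO-HEADER"; continue; }
    # probe for each image magic in turn, remembering the matching extension
    ext=''
    for probe in 'jpg:\xff\xd8\xff' 'png:\x89PNG' 'gif:GIF8[79]a'; do
        beg=$(grep -aboP "${probe#*:}" "$f" | head -n1 | cut -d: -f1)
        [ -n "$beg" ] && { ext=${probe%%:*}; break; }
    done
    [ -z "$ext" ] && { echo "$b UNKNOWN"; continue; }
    # carve bytes [beg, end) out of the cache file
    dd if="$f" of="$DST/$b.$ext" skip="$beg" count=$((end - beg)) \
       iflag=skip_bytes,count_bytes status=none
    echo "$b -> $b.$ext ($((end - beg)) bytes)"
done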
Obviously, somewhere in the Chromium code will be routines for accessing, exploring & extracting these cached files. I'm simply astonished that so few people (seemingly just one) have produced a Chrome cache viewer.
Have you looked at any of the files to determine contents?
Yes. It's not easy, since they all give the same enigmatic response to file:
~/.cache/chromium/Default/Cache/Cache_Data$ file -z 00037beb6d874770_0
00037beb6d874770_0: data
Some refer to css files, some to image files, and so on. I've tried to find html files, but that is difficult, as almost all files contain the text 'text/html' without actually containing any html. Annoyingly, at first sight none seemed to contain any actual css, png or html content, though they clearly do contain response headers. I'll try to illustrate:
$ hexdump 47c5717d1de790a5_0 -C
00000000 30 5c 72 a7 1b 6d fb fc 05 00 00 00 78 00 00 00 |0\r..m......x...|
00000010 31 0b 69 2c 00 00 00 00 31 2f 30 2f 5f 64 6b 5f |1.i,....1/0/_dk_|
00000020 68 74 74 70 73 3a 2f 2f 79 6f 75 74 75 62 65 2e |https://youtube.|
00000030 63 6f 6d 20 68 74 74 70 73 3a 2f 2f 79 6f 75 74 |com https://yout|
00000040 75 62 65 2e 63 6f 6d 20 68 74 74 70 73 3a 2f 2f |ube.com https://|
00000050 77 77 77 2e 67 73 74 61 74 69 63 2e 63 6f 6d 2f |www.gstatic.com/|
00000060 79 6f 75 74 75 62 65 2f 69 6d 67 2f 62 72 61 6e |youtube/img/bran|
00000070 64 69 6e 67 2f 66 61 76 69 63 6f 6e 2f 66 61 76 |ding/favicon/fav|
00000080 69 63 6f 6e 5f 31 34 34 78 31 34 34 2e 70 6e 67 |icon_144x144.png|
00000090 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
000000a0 00 00 00 90 00 00 00 90 08 03 00 00 00 d0 98 12 |................|
000000b0 8a 00 00 00 63 50 4c 54 45 00 00 00 ff 00 00 ff |....cPLTE.......|
000000c0 00 00 ff 00 00 ff 00 00 ff 00 00 ff 00 00 ff 00 |................|
# (snip)
00000390 00 fc 86 92 50 21 39 2f 00 e8 02 00 00 48 54 54 |....P!9/.....HTT|
000003a0 50 2f 31 2e 31 20 32 30 30 00 61 63 63 65 70 74 |P/1.1 200.accept|
000003b0 2d 72 61 6e 67 65 73 3a 62 79 74 65 73 00 63 72 |-ranges:bytes.cr|
000003c0 6f 73 73 2d 6f 72 69 67 69 6e 2d 72 65 73 6f 75 |oss-origin-resou|
000003d0 72 63 65 2d 70 6f 6c 69 63 79 3a 63 72 6f 73 73 |rce-policy:cross|
000003e0 2d 6f 72 69 67 69 6e 00 63 72 6f 73 73 2d 6f 72 |-origin.cross-or|
000003f0 69 67 69 6e 2d 6f 70 65 6e 65 72 2d 70 6f 6c 69 |igin-opener-poli|
00000400 63 79 2d 72 65 70 6f 72 74 2d 6f 6e 6c 79 3a 73 |cy-report-only:s|
00000410 61 6d 65 2d 6f 72 69 67 69 6e 3b 20 72 65 70 6f |ame-origin; repo|
00000420 72 74 2d 74 6f 3d 22 73 74 61 74 69 63 2d 6f 6e |rt-to="static-on|
00000430 2d 62 69 67 74 61 62 6c 65 22 00 72 65 70 6f 72 |-bigtable".repor|
00000440 74 2d 74 6f 3a 7b 22 67 72 6f 75 70 22 3a 22 73 |t-to:{"group":"s|
00000450 74 61 74 69 63 2d 6f 6e 2d 62 69 67 74 61 62 6c |tatic-on-bigtabl|
00000460 65 22 2c 22 6d 61 78 5f 61 67 65 22 3a 32 35 39 |e","max_age":259|
00000470 32 30 30 30 2c 22 65 6e 64 70 6f 69 6e 74 73 22 |2000,"endpoints"|
00000480 3a 5b 7b 22 75 72 6c 22 3a 22 68 74 74 70 73 3a |:[{"url":"https:|
00000490 2f 2f 63 73 70 2e 77 69 74 68 67 6f 6f 67 6c 65 |//csp.withgoogle|
000004a0 2e 63 6f 6d 2f 63 73 70 2f 72 65 70 6f 72 74 2d |.com/csp/report-|
000004b0 74 6f 2f 73 74 61 74 69 63 2d 6f 6e 2d 62 69 67 |to/static-on-big|
000004c0 74 61 62 6c 65 22 7d 5d 7d 00 63 6f 6e 74 65 6e |table"}]}.conten|
000004d0 74 2d 6c 65 6e 67 74 68 3a 37 32 39 00 78 2d 63 |t-length:729.x-c|
000004e0 6f 6e 74 65 6e 74 2d 74 79 70 65 2d 6f 70 74 69 |ontent-type-opti|
000004f0 6f 6e 73 3a 6e 6f 73 6e 69 66 66 00 73 65 72 76 |ons:nosniff.serv|
00000500 65 72 3a 73 66 66 65 00 78 2d 78 73 73 2d 70 72 |er:sffe.x-xss-pr|
00000510 6f 74 65 63 74 69 6f 6e 3a 30 00 64 61 74 65 3a |otection:0.date:|
00000520 53 75 6e 2c 20 31 33 20 4d 61 72 20 32 30 32 32 |Sun, 13 Mar 2022|
00000530 20 31 36 3a 34 32 3a 33 39 20 47 4d 54 00 65 78 | 16:42:39 GMT.ex|
00000540 70 69 72 65 73 3a 4d 6f 6e 2c 20 31 33 20 4d 61 |pires:Mon, 13 Ma|
00000550 72 20 32 30 32 33 20 31 36 3a 34 32 3a 33 39 20 |r 2023 16:42:39 |
00000560 47 4d 54 00 63 61 63 68 65 2d 63 6f 6e 74 72 6f |GMT.cache-contro|
00000570 6c 3a 70 75 62 6c 69 63 2c 20 6d 61 78 2d 61 67 |l:public, max-ag|
00000580 65 3d 33 31 35 33 36 30 30 30 00 61 67 65 3a 34 |e=31536000.age:4|
00000590 37 35 37 39 34 00 6c 61 73 74 2d 6d 6f 64 69 66 |75794.last-modif|
000005a0 69 65 64 3a 54 68 75 2c 20 30 33 20 4f 63 74 20 |ied:Thu, 03 Oct |
000005b0 32 30 31 39 20 31 30 3a 31 35 3a 30 30 20 47 4d |2019 10:15:00 GM|
000005c0 54 00 63 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 69 |T.content-type:i|
000005d0 6d 61 67 65 2f 70 6e 67 00 61 6c 74 2d 73 76 63 |mage/png.alt-svc|
000005e0 3a 68 33 3d 22 3a 34 34 33 22 3b 20 6d 61 3d 32 |:h3=":443"; ma=2|
000005f0 35 39 32 30 30 30 2c 68 33 2d 32 39 3d 22 3a 34 |592000,h3-29=":4|
00000600 34 33 22 3b 20 6d 61 3d 32 35 39 32 30 30 30 2c |43"; ma=2592000,|
00000610 68 33 2d 51 30 35 30 3d 22 3a 34 34 33 22 3b 20 |h3-Q050=":443"; |
00000620 6d 61 3d 32 35 39 32 30 30 30 2c 68 33 2d 51 30 |ma=2592000,h3-Q0|
00000630 34 36 3d 22 3a 34 34 33 22 3b 20 6d 61 3d 32 35 |46=":443"; ma=25|
00000640 39 32 30 30 30 2c 68 33 2d 51 30 34 33 3d 22 3a |92000,h3-Q043=":|
00000650 34 34 33 22 3b 20 6d 61 3d 32 35 39 32 30 30 30 |443"; ma=2592000|
00000660 2c 71 75 69 63 3d 22 3a 34 34 33 22 3b 20 6d 61 |,quic=":443"; ma|
00000670 3d 32 35 39 32 30 30 30 3b 20 76 3d 22 34 36 2c |=2592000; v="46,|
00000680 34 33 22 00 00 03 00 00 00 c0 04 00 00 30 82 04 |43"..........0..|
00000690 bc 30 82 03 a4 a0 03 02 01 02 02 11 00 89 50 eb |.0............P.|
On closer inspection, though, the hexdump gives the game away: the PNG payload begins at offset 0x90 (the \x89PNG magic, immediately after the cached URL), and the stored HTTP response headers follow it at 0x39d. So the actual content is inside the cache file after all, sandwiched between the cache key and the headers (larger bodies may well live elsewhere in the labyrinth of dirs, but this favicon is stored inline).
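Those two offsets can be confirmed without a hex editor: grep's -a (treat binary as text), -b (print byte offset) and -o (print only the match) options do the job, with -P allowing the \x89 byte to be written as a hex escape:
$ grep -abo 'HTTP/1.1 200' 47c5717d1de790a5_0 | cut -d: -f1   # should give 925 (= 0x39d)
$ grep -aboP '\x89PNG' 47c5717d1de790a5_0 | cut -d: -f1       # should give 144 (= 0x90)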
I *did* originally grep the files for '3040s' (not 1100) & found it in a number of files, with 4 different dates if I remember correctly. My bash history only records the 'Feb 18' checks (3 different files), due to the specific checks made at the time. However, today *none* of the files contain '3040s', which is why in the abbreviated results below (there are *far* more records after the last one shown) I used 'amazon' as the search term.
I'd suggest it's very unlikely that Amazon caches search results for a week
$ cd ~/.cache/chromium/Default/Cache/Cache_Data
$ fgrep amazon * -l > amazon.txt
grep: index-dir: Is a directory
$ wc -l amazon.txt
588 amazon.txt
$ fgrep amazon * -l | xargs ls -ltr
grep: index-dir: Is a directory
-rw------- 1 alexk alexk 6640 Jan 14 10:40 a25a4684dc578add_0
-rw------- 1 alexk alexk 6283 Feb 16 11:25 53cb04645ec61dbe_0
-rw------- 1 alexk alexk 15666 Feb 17 23:58 e8f89e2a5b7a01f1_0
-rw------- 1 alexk alexk 13867 Feb 17 23:58 886a5cd11ba0631f_0
-rw------- 1 alexk alexk 5874 Feb 17 23:58 316a7542b7befa08_0
-rw------- 1 alexk alexk 8924 Feb 17 23:58 0178e5420f91ea0d_0
-rw------- 1 alexk alexk 7659 Feb 17 23:58 ad11df88c0edb21b_0
-rw------- 1 alexk alexk 10943 Feb 17 23:58 548924d727f4b76c_0
-rw------- 1 alexk alexk 6091 Feb 17 23:58 3f2cf4a8f4ed3da1_0
-rw------- 1 alexk alexk 11344 Feb 17 23:58 5bee7426f918804c_0
-rw------- 1 alexk alexk 7267 Feb 17 23:58 5602f4c5219938bd_0
-rw------- 1 alexk alexk 14755 Feb 17 23:58 847822115a578d5c_0
-rw------- 1 alexk alexk 5492 Feb 17 23:58 421a319a89da0977_0
-rw------- 1 alexk alexk 7717 Feb 17 23:58 21287a7168c435cf_0
-rw------- 1 alexk alexk 6705 Feb 17 23:58 467f6b3bbdaba59b_0
-rw------- 1 alexk alexk 11348 Feb 18 00:09 fe2bc889ad53c7e8_0
-rw------- 1 alexk alexk 12090 Feb 18 00:09 b9868882a8c66b57_0
-rw------- 1 alexk alexk 5113 Feb 18 00:21 705dafad26790491_0
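Incidentally, the "Is a directory" complaint about index-dir can be silenced with grep's -d skip option:
$ fgrep -d skip -l amazon * > amazon.txt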
I'd probably try the python one first.
python package to retrieve (almost) any browser's history on (almost) any platform
I'm interested in neither the History (which I can obtain with a simple Ctrl-H) nor the Bookmarks, so I fail to understand the point of installing that. I want to be able to view the historic pages in a browser, not the History.
Same with the askubuntu.com question:
Is it possible to view Google Chrome bookmarks and history from the terminal
The History is a binary file in SQLite format 3
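(For anyone who *does* want the History: since it really is SQLite, the sqlite3 CLI can read it. The database is locked while Chromium is running, so copy it first; the urls table and its columns below are from Chromium's current schema, which may change between versions.)
$ cp ~/.config/chromium/Default/History /tmp/History
$ sqlite3 /tmp/History 'SELECT url, title FROM urls ORDER BY last_visit_time DESC LIMIT 5;'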
Thanks for trying, Andy, but you appear to have misunderstood my query.
Have you thought about using a different browser to look at the cache?
The Chromium cache is an undocumented binary mess of interconnected directories (specialised JSON, for the ones I've looked at), with not a single html or css file within them. Can you suggest a browser that *can* view them?