misc. unsorted notes
see html comments for disclaimer. some of these may be regarded as the contents of an as-yet-unclassified, unprocessed temporary mental buffer.
jump to: yeah i need to autogen this
name & link | category | tags | prio |
---|---|---|---|
todo-blog | pending shortish term writeups | TODO BLOG WRITE NOTES | med |
todo-misc | unsorted TODO | TODO MISC VAR-ENTROPY NOTES | low |
notes-misc | misc quick notes | MISC UNPROCESSED VAR-ENTROPY NOTES | low |
important-reminders | high priority reminders duplicated from Cal | TODO CALENDAR-SYNC IMPORTANT REMINDER NOTES | high |
todo-blog - category: pending shortish term writeups
tags: TODO BLOG WRITE NOTES
modified: 2025-04-15
created: 2025-04-13
prio: high
- niftier new thing to review before i forget: not that difficult: use embeddings lib to construct and represent whole site as latent space, and then in each article and note have auto-gen notes and eventually small map to explore similar concepts -> leading to other posts. i'm sure there's some halfway-there lib for it etc., but would be nice to do this cleanly and properly; and eventually with enough content actually useful, and enables very nice discovery and relating node for others and myself. ok good i was about to forget this one for a year.
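a minimal sketch of the similarity idea above, assuming some embeddings lib has already produced a vector per post (`embed()` itself is out of scope here; everything below is a stand-in, not the real site code): cosine similarity over the vectors, then top-k neighbours per article as "related posts".

```python
import math

def cosine(a, b):
    # plain cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def related(doc_id, vectors, k=3):
    # vectors: {doc_id: embedding}; returns the k nearest other docs
    v = vectors[doc_id]
    scored = [(cosine(v, u), other) for other, u in vectors.items()
              if other != doc_id]
    return [d for _, d in sorted(scored, reverse=True)[:k]]
```

with enough posts this is the "small map to explore similar concepts" in its crudest form; the latent-space visualisation would sit on top of the same vectors.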
- TODO expand on each of the below, have notes; but it's a start at digitalising all this stuff
- quick temporary list to be filled in from paper notes:
NOTE: for more (and perhaps newer) non-physical-medium-originating ideas / stuff which i may be noting down meanwhile as well, see notes-misc section below; and maybe todo-misc for blog-related work some of which is also ideation related. re: the former, some of that will be random neural spikes. either way though, i would be interested to hear of any efforts to validate some of this stuff.
- blog filesystem / kfs
- mipdb - multithreaded masscan scan importer, parser, whole ipv4 0.0.0.0/0 space analysis from TCP SYN scans
- i've already done (re-done actually, did that once in 2016 or 2017) /0 TCP SYN scan on top 25 ports, and wrote nifty analysis tool, uses bunch of memory (but each portmap is bitfield, for each ipv4) but is verrrry fast; i want to do proper prefix trie, include AS ranges, and after cleaning make repo public; it does have some promise i think
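the per-ipv4 bitfield portmap mentioned above could look roughly like this (hedged sketch, not the actual mipdb code; `space_bits` is added here only so a smaller demo space fits in memory — the real thing is the full 32 bits, i.e. 2^32 bits = 512 MiB per port):

```python
import ipaddress

class PortMap:
    """One open/closed bit per IPv4 address for a single port.

    At space_bits=32 this is 512 MiB of zeroed memory but gives O(1)
    set/test per address -- the memory-hungry-but-very-fast tradeoff
    described in the note.
    """
    def __init__(self, space_bits=32):
        self.bits = bytearray((1 << space_bits) // 8)

    def mark(self, ip: str):
        n = int(ipaddress.IPv4Address(ip))
        self.bits[n >> 3] |= 1 << (n & 7)

    def is_open(self, ip: str) -> bool:
        n = int(ipaddress.IPv4Address(ip))
        return bool(self.bits[n >> 3] & (1 << (n & 7)))
```

the planned prefix trie would replace the flat bitfield where the space is sparse, and makes AS-range aggregation natural.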
- masscan-orchestrator (in this case not done except some bash scripts to mass-deploy via ssh, then rsync and collect results) (ideally multistage masscan for discovery (autoprovision & deploy to scaleway vps cluster) -> nmap for service ident and banners -> nmap scripts for vuln enumeration, CVE checks)
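the multistage pipeline above (masscan discovery -> nmap service ident) could be sketched like this; `masscan`/`nmap` flags are the real tools' common ones, but the exact invocation (incl. `-oL -` to stdout) and file names are assumptions, not the orchestrator's actual code:

```python
import subprocess

def parse_masscan_list(text):
    # masscan -oL lines look like: "open tcp 443 203.0.113.7 1700000000"
    return sorted({ln.split()[3] for ln in text.splitlines()
                   if ln.startswith("open ")})

def discover(cidr, ports="80,443", rate="10000"):
    # stage 1: fast TCP SYN discovery over the target range
    out = subprocess.run(
        ["masscan", cidr, "-p", ports, "--rate", rate, "-oL", "-"],
        capture_output=True, text=True, check=True).stdout
    return parse_masscan_list(out)

def identify(hosts, xml_out="scan.xml"):
    # stage 2: nmap service/version identification on live hosts only
    if hosts:
        subprocess.run(["nmap", "-sV", "-oX", xml_out, *hosts], check=True)
```

stage 3 (nmap scripts / CVE checks) would consume the same host list; the mass-deploy + rsync-collect part stays in the existing bash.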
- plans for mipdb future, enhancements low and high level
- ip/net/sec/bgp info lookup as web service (with some notes on bgp, and questions)
- dwarf fortress psychology mechanics - paste and comment research output from gpt4.5 - with nice diagrams it generated
- general compressed input/output a.i. chatbot approach comments, some pending side notes on entropy from infotheory PoV
- ontology tool
- homelab review, maybe live stats
- ip cam streaming RPi / odroid for funsies
- ssh-chat is running on this chip, do ssh in here but also - put in ssh webclient here and autoconnect
- ferrofluids (review output from quick linux compile)
- ferrofluids on webgl / html5 canvas?
- actually yes
- very nice web demo! (nice shader mmmmm) (microphone input very nice) (NOTE: need to whistle specifically)
- author's post about it; nice articles, by the looks of it
- source
- RTLSDR, software defined radio list of experiment ideas, comment on satellite feed reading concept / PoC; also if interested btw it really is cool stuff
if interested re: satellite feeds btw: How to Pull Images from Satellites in Orbit
- list of novel compression approaches / ideas; i have a queue of arxiv articles and actual novel feedback from gpt4.5; i want to actually do something with these. ok, i will paste in headings for each subset of compression ideas; i do want to nicely format and include full details; if not myself, maybe someone will look into some of these where there's still novel ground to explore / quickly prototype to check viability:
- Generalized Quantization via Statistical Bucketing (Entropy-based Binning)
- Hierarchical Entropy Reduction (Tree-Structured Compression)
- Frequency-Aware Randomized Hashing (Bloom-Entropy Hashing)
- Lossy Embedding via Generalized Autoencoders (Entropy Autoencoders)
- Adaptive Fourier/Wavelet Entropy Filtering
- Stochastic Data Thinning (Entropy-based Sampling)
- Entropy-guided Dictionary Encoding (doesn't sound novel at all but context was lossy compression actually; and idea here is to "keep dictionary entries selectively by evaluating entropy reduction rather than solely frequency.")
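a minimal sketch of the first heading, "Generalized Quantization via Statistical Bucketing": pick bucket boundaries at empirical quantiles so each bucket carries roughly equal probability mass (which maximises entropy per emitted symbol). this is my reading of the heading, not the original's full detail:

```python
from bisect import bisect_right

def quantile_buckets(samples, n_buckets):
    # boundaries at the k/n quantiles (interior cut points only),
    # so each bucket ends up with ~equal mass under the sample distribution
    s = sorted(samples)
    return [s[len(s) * k // n_buckets] for k in range(1, n_buckets)]

def quantize(value, boundaries):
    # bucket index in [0, n_buckets): the lossy "compressed" symbol
    return bisect_right(boundaries, value)
```

lossiness is the bucket width; entropy coding the resulting symbol stream is then near-optimal because the symbols are ~uniform.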
- some other assorted stuff
- more FUSE FS ideas
- VOXLISP; online voxel sandbox / game-like thing with lisp repl and "everything is code" lisp-ish approach, think programmable tower defence / wannabe-mini-DF
- actually there's a bunch of stuff and pending-refinement thoughts and musings re: above; i have a very primitive crawler-with-resource-gathering in Go, for prototyping; writing things down will be good
- gentle self-scanning botnet, inspired by Carna / Internet Census 2012, but more modular and with incrementally more added stuff for gentle knocking
- on that topic, some compiled executable obfs thoughts; landscape itself is insane tho, so much complex stuff
- wide range network / internet visualisation ideas - a bunch - incl AS hierarchy; have collected a bunch of stuff incl from BGP sessions (!) - it's good - i want to show one quick prototype; enumerate bunch of vis ideas; and invite ppl to explore CAIDA among other places
- find OLD whiteboard pic, before current state; had some nice ideas there, need to put them in digital form
- short term writeup queue: (not committed, liable to change/disappear)
- item1-test
- digitalize from that stebuklųspa paper, soon, so i can start staring at those things, feel a kind of prickling need to push them out, interesting
- now started; expand on initial quick enumeration above - merge with paper notes etc.
- reminder; i jotted down those ~10 frustrating writeup items (maybe ~14?), will need to sit down and re-recall if can't find source; was good quick list with clear mind aiding
- other blog-related sidequest-ish stuff:
- actually one of initial intentions: i have >100 tagged .txt notes with titles, what do?? - want text search, tag structure, realtime semantic search and structure / similarity exploration, persistence, prettification and selective sharing
likely don't want to wait until post-PoC prod-readiness - (1) want to engage soon and (2) it will guide me into useful functionality and scope, praxis will show the light - check which laptop/machine/dropbox-old-share has archive, need newest one
- start playing with export and iterating
- define TODO struct for kfs fusefs PoC
- re: writeups (oldest TODO/NOTE before nesting above) what do: enumerate the list here, wrote on paper in Nida, should be in backpack and half-legible
- what do: nesting: decide how to add them here (one by one? and - see next)
- should we have TODO/any-note hierarchy / tree (e.g.: parent pending-blogposts -> enumerate pending ideas above as child nodes)
- and while at it - any very-short-term solution for static content gen, or do i just keep writing this manually?
- consider writing quick PoC parser of this weirdo index.html (into initial indices if possible; but at least nice deserialisation of some kind, and initial structure validation)
- quick ad hoc offsite backup solution before this thing melts with its chip on my table
- quick PoC rsync backup script, with at least basic sanity checks (enum file tree, check size, check size of each html, grep for some canary values, idk at least 10min worth of effort before i lose my stuff)
- model definitions (and general concepts actually as well...):
- define model: category (what to do; keep it?; does each item have 1 cat, or many-to-many?; needed?; constraints?; etc.)
- define model: tag
- nesting / hierarchy / recursive trees yes/no/depth/constraints/cycles/DAG re: (1) note (2) post (3) article (4) page (5) category (6) tag; prio on (1) note (5) category (6) tag but mostly on (1) note
- accept interim idea? - yes to note nesting with max depth of 4 FOR NOW (initial constraint for WIP status for now)
- enumerate rest of concepts, think of relationships and whether they are infl/constrained by fusefs plans (e.g.: i want multiple hardlinks able to point to same fs inode; hence multiple subtrees enumerating and traversing same edges (e.g. notes) differently)
- re: above, need general fungibility in general (but this comes with minuses of unconstrained crazy malleable graphs)
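the interim constraint above (note nesting yes, hard-capped at depth 4 for now) could be sketched like this; class/field names are made up, the real kfs model is still TBD:

```python
class Note:
    MAX_DEPTH = 4  # interim WIP constraint from the notes above

    def __init__(self, title, parent=None):
        self.title, self.parent, self.children = title, parent, []
        if self.depth > Note.MAX_DEPTH:
            raise ValueError(f"note nesting exceeds max depth {Note.MAX_DEPTH}")
        if parent is not None:
            parent.children.append(self)

    @property
    def depth(self):
        # root is depth 1; each nesting level adds 1
        return 1 if self.parent is None else self.parent.depth + 1
```

the hardlink-style "multiple subtrees traversing the same nodes" idea would turn `parent` into a list of parents (i.e. a DAG), which is exactly where the cycle/constraint questions above kick in.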
- big one: eventual definition (+/-, actionable) of overall SCOPE of functionality of later non-PoC deliverable.
good example: really need both (1) static sitegen (output == static artefact, like static site style) and (2) intermixable dynamic upstream? i mean the whole idea was to have at least (2) which is the crazier but for me the main attractive part; (1) would be "proper" to have and may come to be more useful to wider audience; so q is whether to drop (1); and i think (1) is actually good candidate for initial PoC-v2 deliverable, before our cool amazing (2); so it makes sense planning-wise
but this is a good example how up in the air even a very abstract notion of scope of eventual functionality for me is; so important point and WIP
- sitegen crappy PoC to validate idea
- sitegen simple multi-view html gen: single homepage vs split files (e.g.: pinned "articles" - only headings) - have both?
- later: general definition for modular plugin-able sitegen arch, will block proper decent implementation; how generic? reuse that JS fusefs proxy sitegen tool ideas? (does not appear that 'abstract')
- need to define initial PoC AC & expectations as well actually (e.g.: simple but general sh pipe (exec -> capture stdout) impl? already for PoC? maybe yes? need care because of amazing attack surface)
todo-misc - category: unsorted TODO
tags: TODO MISC VAR-ENTROPY NOTES
modified: 2025-04-14
created: 2025-04-13
prio: low
- investigate maple trees - data structure for vmem / searchable kfs filesystem (maple or r-b / prefix tr* plus hashmap + ? => 2-3 DS for substring search on whole FS tree - me want/need for kfs project; but also for THIS project prototyping > overengineering here (not ">>");
because i need to validate if viable before worrying about cacheline locality vis a vis lookup/sort; push up internal assigned value to "practicality" regarding PoC - yes useless abstract shit is cool and makes it worthwhile living but i do need a sense of finality re: this project - need to keep reminding this to myself) - yanked from my own rambling para somewhere around here - but kfs-wise kinda important:
review MAP_SHARED mmap type, mlock, shm_open and esp. memfd_create()
- what to do with google keep at keep.google.com - bunch of good notes (some must stay private)
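the "hashmap + prefix/suffix structure" combo for substring search over the whole FS tree could be prototyped as a trigram index (hedged sketch for the PoC-over-overengineering stage, long before worrying about cacheline locality): the hashmap gives candidate paths, a verification scan confirms:

```python
from collections import defaultdict

def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.idx = defaultdict(set)  # trigram -> set of paths
        self.docs = {}               # path -> text

    def add(self, path, text):
        self.docs[path] = text
        for g in trigrams(text):
            self.idx[g].add(path)

    def search(self, needle):
        if len(needle) < 3:
            # too short for trigrams: fall back to a linear scan
            return {p for p, t in self.docs.items() if needle in t}
        cands = set.intersection(*(self.idx[g] for g in trigrams(needle)))
        return {p for p in cands if needle in self.docs[p]}
```

a maple/r-b/prefix tree would replace the dict where ordered traversal or prefix queries matter; the trigram trick only covers the substring-search part.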
- what to do with 1000s of bookmarks (as of couple years ago i do rudimentary but sufficient tagging), mostly chrome, some firefox, some pinboard.in
- i really want my arbitrary dyn fs, i could start by throwing them in at something like (html may be escaping / cutting off some chars)
grep "<bookmark>" all_links_export.html | cut -d'>' -f3 [1] | grep -v some-exclusion-filter > raw_links.txt; for link in $(cat raw_links.txt | likely-some-further-transform); do id=$(uuidgen); path="/mnt/kfs/bookmarks/$id"; mkdir -p "$path"; echo "$link" > "$path/_fetch-url.kfstask"; done
(but we could also store raw full html with old tags etc, actually serialise as kfs tags, get all the fun semantics...)
[1]: i mean likely need another `cut` to remove suffix etc, just for illustration (plus implicit confession that i tend to overuse `cut` vs say some sometimes very simple regex...)
notes-misc - category: misc quick notes
tags: MISC UNPROCESSED VAR-ENTROPY NOTES
modified: 2025-04-14
created: 2025-04-13
prio: low
- one of the best audiovisual things i have seen and something i would love to aspire towards - Norman McLaren - Dots (1940) (less than 2min)
- compression and fulltext index stuff: check Enhanced suffix arrays vs regular older suffix arrays (as seen in golang, e.g.);
- in particular Li, Li & Huo (2016);
- also https://arxiv.org/abs/1710.01896 Dismantling DivSufSort - entropy-bound indexable dictionaries (EIDs): good paper; also: all papers linked from wiki on Wavelet Tree
- related good walkthrough TOREAD
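for baseline intuition against the enhanced-suffix-array papers above: the naive construction (O(n^2 log n) worst case, vs DivSufSort's O(n log n)) plus binary-search substring lookup, just to have something runnable to compare against:

```python
def suffix_array(s):
    # naive: sort suffix start indices by the suffixes themselves
    return sorted(range(len(s)), key=lambda i: s[i:])

def contains(s, sa, pat):
    # binary search for the first suffix >= pat, then prefix-check it
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(pat)] < pat:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s[sa[lo]:].startswith(pat)
```

the "enhanced" variants add LCP and child tables on top of exactly this array to recover suffix-tree-style traversal.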
- audiovisual sandbox with optional constrained / directed / parametric LLM or internal NN assistance: this i do want to iterate (by using LLM itself...) and write about a bit, for now noting for self:
a long bulletpoint list of what i have in mind; it keeps expanding; but i think i should first explore vvvv, which truly is an amazing piece of magic and engineering, imho.
explanation, notes to self, open questions etc. etc.; i use highlighting for, in my opinion, the key interesting points:
- p.s. later added note: with all this LLM crap i keep forgetting we can all run good old NNs including deep recurrent NNs, etc. - and there, we can set up proper author-NN feedback loops, like in Ultrachunk (2018) - i will now need to go thru everything Memo Akten has done, but no regrets, i am officially inspired
- there are some e.g. web based sandboxes and toolkits for quickly iterating on (with immediate feedback - the way it should be) audiovisual experiments
- i would very much like to either find, adapt and expand, or implement mvp / PoC version of a simple clean one which allows for:
- shader code for visual input,
- optional: actual code (e.g. Processing) for visuals; or a Lua repl, or a Lisp/Scheme repl; tie things up with the shader; a repl on the side would be great either way. (if open code, then maybe Lua, unless there's some decent embeddable Lisp dialect with the kinds of funcs / libs that would be sufficient; would be good to have ready-to-use efficient audio data parsing and chaotic systems (i guess just basic math with some complex numbers; worst case can have a generated auto-importable implementation...))
- audio input thru file and mic at first
- transformation into visual using above, parsing etc.
- built-in tools (invocable etc.) like beat / bpm detection (exists; for WebGL as well; nifty); what other tools?
- general prepared functions, building blocks, with simple clear interfaces
- ideally other audio input as well: simple procedural gen would be nice - simple sin() etc. waves and simple transformations at first
- recording of all steps (btw: side window REPL which autofills with actions from buttons / etc. would be nice), deterministic (random seed recorded as well), all ops auto serialised, saved, can cut them paste them as well etc.
- this is all nice but hopefully the kicker: optional LLM func as well - but by default the way it is integrated is - it is told that its output is highly constrained - in formal (auto exportable easily, schema etc.) language / subset of actions
- possibly multiple LLM output options, most open and complex one is - shader language; but still, it's just shader lang + it being told "also these helper methods exist:
[a..(int inarg ..) -> out type etc.., b..(..) -> .., ..]" ..
- optional later feature ideally - chaining of LLM models, not just LLM, e.g. one paid model, elaborate, e.g. GPT4.5 or the likes, possibly in API call temperature set such that creativity increased, idea is initial broad search in solution space, dreaming vibes encouraged
- THEN, simulated annealing / structured pipeline style output is refined and structured, and model (perhaps different one) is used with lower creativity more precision - to do this and that - and finally ideally
- (much later, after many functional iterations and fun, if this sandbox is actually properly developed) eventually, a specifically dedicated trained model is used with optional feedback / backpropagation - can be user based but also automatic - tool in some cases is able to quantify how well model did depending on what it had to do - these metrics are sent back for iterating better
- dunno re: auto quantification, maybe too difficult, i can think of convoluted auto feedback measure ideas - e.g. if task is marked (or auto discovered) as bpm / rhythm focused visual feedback - then rate of change of visual canvas (at first, number of pixels changed; later, combine with distance in colour; then maybe change in entropy / noisiness; then, parse contrast change rate, "radicality" etc.; detect out-of-rhythm change, etc.; but likely mostly depend on user feedback - which can be quantified of course, in more than one (how much liked) dimension - e.g. as above, "beat responsiveness"; but also ability to explore the whole newly defined (by toggle(s)) phase space by itself; and LLM (and perhaps part of same prompt, or right thereafter, but overlapping context window) is asked to give ways to calculate metrics for x/y/z (user is used for some sanity checks? but system can optimistically explore with it anyway)) - would be amazing, auto feedback how to measure certain metrics -> tool auto explores phase space with metric -> can now automate back and forth feedback process, ideally? not easy, buuut this way with some user's help (esp. for some sanity checks and validation - what is nice is user just needs to look at screen and confirm if "this part of mesh does indeed wiggle more than before" - nice) - can kinda keep iterating on many specific and general requests; LLM and user can both be consulted to help system catalogue new metrics in terms of how "general" / reusable they are - sometimes they will not be generalisable at all (that's ok, i will want system to record journals of everything anyway; eventually btw all of that can be fed into some big monster LLM / ML system as well...)
- what toggles is LLM given (besides open berth shader lang with some preset funcs to help out)?
- would be nice to auto detect procedurally "toggles in current shader or general code" (hmm this is a cool idea, Bret Victor inspired likely, his responsive interfaces, so good) - and then (1) offer sliders etc. to user based on data type, (2) on button click "explore / fuzz", auto toggle them and move around (perhaps record rate of change / distance metric, gather user feedback as well), use THESE auto discovered toggles in code as constrained levers / output options to LLM (of course as an option; and gather feedback from LLM as well whether current state is even useful to iterate on)
- record and remember output state, user feedback and satisfaction, but also above metrics incl. entropy, rate of change, colour intensity change e.g., etc. - depending on LLM generated output (which acts as input and changes state / puts system into altered phase space). use THIS data to later train a small model ourselves (later automatically), teaching our model while using larger model (kind of inspired by all the recent research work). and in general, aim to find various ways to compress and understand the very multidimensional space of "how to do audiovisuals" - so that the tool can eventually start to be able to process (tagged, quantifiable / easily vectorable etc.) prompts (which the higher-creativity large LLM is to handle at first - and that's what i want at first - this LLM through API processing open requests like below, us seeing immediate feedback, again Bret Victor style, and iterating on it:
- generate 3d interesting mesh visualisation responding to high bpm song [song profile e.g. compressed/small FFT / freq distribution can be provided as input, what other such audio summary inputs?], with focus on intense colour change, and a way to parametrize colour pallette so i can iterate on it (LLM is auto-told / constrained to give output in shader language v2.., with this short list of optional helper methods and vars (func sigs given), plus x/y/z toggle if any
- could this work from LLM perspective?
- are there more optimal way to structure LLM output format? ask LLM how better to compress it, and how much is good to constrain it (e.g. in cases when we want only change in some number of scalars / levers; given it operates in tokens... and does the high dimensionality even work here - i guess yes, still very much used in the middle of processing, but is it good to constrain output state space this much?
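the "auto detect toggles in current shader code" idea a few bullets up could be as dumb as scanning GLSL source for `uniform` declarations and mapping each type to a widget (hypothetical sketch; widget names and the exact uniform grammar subset are my assumptions):

```python
import re

# matches e.g. "uniform float wiggle;" / "uniform vec3 palette;"
UNIFORM_RE = re.compile(r"uniform\s+(float|int|bool|vec[234])\s+(\w+)")

WIDGETS = {"float": "slider", "int": "slider", "bool": "checkbox",
           "vec2": "xy-pad", "vec3": "rgb-sliders", "vec4": "4-slider"}

def find_toggles(shader_src):
    # each detected uniform becomes a UI toggle spec -- and, per the note,
    # a constrained lever offered to the LLM / the auto-fuzzer
    return [{"name": name, "type": t, "widget": WIDGETS[t]}
            for t, name in UNIFORM_RE.findall(shader_src)]
```

the same list doubles as the "explore / fuzz" input: iterate each toggle through its range and record the change metrics per step.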
- one more ramble on dimensionality reduction pipeline and also just going with open thought how to use this + various feedback loops:
so a kind of annealing of: open-solution-space-high-creativity "generate beat-responsive 3d mesh with high focus on colour contrast changing, expose palette as var/toggle [+ tool adds constraint on output in shader lang + helper funcs]" -> parametrizable vars in shader code auto turned into toggles for user and auto fuzzing / exploration -> explore -> then (with feedback incl. quantifiable (-> error backprop)) "smoother mesh wiggle with big wiggle if low frequency beat" [optional compressed freq distribution plot can be added -> also ask (auto?) for metric "how much wiggle" and "how low the freq right now" -> (auto)plot, and also ideally get params to change this -> auto explore new phase space of how-strong-wiggle and how-low-freq-over-time; this auto-parametrization flow, with back and forth iteration, actually sounds lovely in general, for other kinds of work; but i think this kind of medium is still good to start exploring this kind of approach, because of its focus on (and intensity regarding) immediacy of feedback on input change, and how important this is. and we optionally have highly-dimensional spaces - multiple high-dimensionality ones (audio; visual; can come up with structured shape-spaces etc.) as a good medium to explore and validate various (in very abstract general terms) compression approaches (dim. reduction, tho i know it's not the same). since i'm tripping on novel compression ideas a bit anyway (i really want to explore some entropy-change-responsive algo ideas - some work exists but also seems like there could be some niches), thinking in those terms here a bit, but really so many things in life and nature turn out to be some or other kind of compression related process. and heh, audiovisual art as dimensionality reduction: a case study is a preposterous tagline / subtitle for this tool (i would love it). :)
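the crude auto-feedback metrics floating through the ramble above (pixels changed, change in entropy / noisiness) can be pinned down in a few lines; a hedged sketch over flat per-frame pixel lists, not any real canvas API:

```python
import math
from collections import Counter

def frame_entropy(pixels):
    # Shannon entropy (bits) of the pixel-value distribution of one frame:
    # a crude "noisiness" signal. pixels: list of 0..255 ints.
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def changed_fraction(prev, cur, threshold=0):
    # fraction of pixels that changed between frames: crude "rate of change",
    # usable for e.g. out-of-rhythm-change detection when plotted over time
    diff = sum(1 for a, b in zip(prev, cur) if abs(a - b) > threshold)
    return diff / len(cur)
```

tracking these two per frame against the audio's beat grid is roughly the "rate of change of visual canvas" metric described above, before any of the fancier contrast/"radicality" parsing.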
p.s. TODO reminder to self that vvvv exists, never got to properly explore it, need to check it out, play with it!!
p.s. ok i'm now starting to play with it and the tool is AMAZING. just look at all the videos, from node interfaces to all showreels. i love how self-contained it is, and how it is (1) very powerful in its generality and language expressiveness BUT (2) offers immediate, very concrete and many options for immediate output (and hot reload of course!). such power but also immediate actionability. i mean it is a work of deep intense love since at least (looks like) 2002? community should be great; and the thing is going strong still[1]. amazing. i now remember how deeply curious and attracted it made me feel back in like 2008-2009, still in high school; then i forgot about it; i want to play with it now, finally. funny how this works.
[1] re: "active maintenance": (not that this annoying "is it still actively maintained" criterion doesn't piss me off; implicit in it is error proneness and endless security vulns, and the need for new features; there should be more finished-but-usable-and-used tools (for noncommercial and nonprod use; i tend not to need SLAs for non-work stuff, cmon).)
important-reminders - category: high priority reminders duplicated from Cal
tags: TODO CALENDAR-SYNC IMPORTANT REMINDER NOTES
modified: 2025-04-13
created: 2025-04-13
prio: high
- dentist (dantų gyd.) (2x tasks)