misc. unsorted notes
see html comments for disclaimer. some of these may be regarded as the contents of an as-yet-unclassified, unprocessed temporary mental buffer.
jump to: yeah i need to autogen this
name & link | category | tags | prio |
---|---|---|---|
todo-blog | pending shortish term writeups | TODO BLOG WRITE NOTES | med |
todo-misc | unsorted TODO | TODO MISC VAR-ENTROPY NOTES | low |
notes-misc | misc quick notes | MISC UNPROCESSED VAR-ENTROPY NOTES | low |
important-reminders | high priority reminders duplicated from Cal | TODO CALENDAR-SYNC IMPORTANT REMINDER NOTES | high |
todo-blog - category: pending shortish term writeups
tags: TODO BLOG WRITE NOTES
modified: 2025-04-15
created: 2025-04-13
prio: high
- niftier new thing to review before i forget: not that difficult: use embeddings lib to construct and represent whole site as latent space, and then in each article and note have auto-gen notes and eventually small map to explore similar concepts -> leading to other posts. i'm sure there's some halfway-there lib for it etc., but would be nice to do this cleanly and properly; and eventually with enough content actually useful, and enables very nice discovery and relating node for others and myself. ok good i was about to forget this one for a year.
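a minimal sketch of the similarity idea above, assuming some embeddings lib has already produced a vector per post (`embed()` itself is out of scope here; everything below is a stand-in, not the real site code): cosine similarity over the vectors, then top-k neighbours per article as "related posts".

```python
import math

def cosine(a, b):
    # plain cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def related(doc_id, vectors, k=3):
    # vectors: {doc_id: embedding}; returns the k nearest other docs
    v = vectors[doc_id]
    scored = [(cosine(v, u), other) for other, u in vectors.items()
              if other != doc_id]
    return [d for _, d in sorted(scored, reverse=True)[:k]]
```

with enough posts this is the "small map to explore similar concepts" in its crudest form; the latent-space visualisation would sit on top of the same vectors.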
- TODO expand on each of the below, have notes; but it's a start at digitalising all this stuff
- quick temporary list to be filled in from paper notes:
NOTE: for more (and perhaps newer) non-physical-medium-originating ideas / stuff which i may be noting down meanwhile as well, see notes-misc section below; and maybe todo-misc for blog-related work some of which is also ideation related. re: the former, some of that will be random neural spikes. either way though, i would be interested to hear of any efforts to validate some of this stuff.
- blog filesystem / kfs
- mipdb - multithreaded masscan scan importer, parser, whole ipv4 0.0.0.0/0 space analysis from TCP SYN scans
- i've already done (re-done actually, did that once in 2016 or 2017) /0 TCP SYN scan on top 25 ports, and wrote nifty analysis tool, uses bunch of memory (but each portmap is bitfield, for each ipv4) but is verrrry fast; i want to do proper prefix trie, include AS ranges, and after cleaning make repo public; it does have some promise i think
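the per-ipv4 bitfield portmap mentioned above could look roughly like this (hedged sketch, not the actual mipdb code; `space_bits` is added here only so a smaller demo space fits in memory — the real thing is the full 32 bits, i.e. 2^32 bits = 512 MiB per port):

```python
import ipaddress

class PortMap:
    """One open/closed bit per IPv4 address for a single port.

    At space_bits=32 this is 512 MiB of zeroed memory but gives O(1)
    set/test per address -- the memory-hungry-but-very-fast tradeoff
    described in the note.
    """
    def __init__(self, space_bits=32):
        self.bits = bytearray((1 << space_bits) // 8)

    def mark(self, ip: str):
        n = int(ipaddress.IPv4Address(ip))
        self.bits[n >> 3] |= 1 << (n & 7)

    def is_open(self, ip: str) -> bool:
        n = int(ipaddress.IPv4Address(ip))
        return bool(self.bits[n >> 3] & (1 << (n & 7)))
```

the planned prefix trie would replace the flat bitfield where the space is sparse, and makes AS-range aggregation natural.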
- masscan-orchestrator (in this case not done except some bash scripts to mass-deploy via ssh, then rsync and collect results) (ideally multistage masscan for discovery (autoprovision & deploy to scaleway vps cluster) -> nmap for service ident and banners -> nmap scripts for vuln enumeration, CVE checks)
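the multistage pipeline above (masscan discovery -> nmap service ident) could be sketched like this; `masscan`/`nmap` flags are the real tools' common ones, but the exact invocation (incl. `-oL -` to stdout) and file names are assumptions, not the orchestrator's actual code:

```python
import subprocess

def parse_masscan_list(text):
    # masscan -oL lines look like: "open tcp 443 203.0.113.7 1700000000"
    return sorted({ln.split()[3] for ln in text.splitlines()
                   if ln.startswith("open ")})

def discover(cidr, ports="80,443", rate="10000"):
    # stage 1: fast TCP SYN discovery over the target range
    out = subprocess.run(
        ["masscan", cidr, "-p", ports, "--rate", rate, "-oL", "-"],
        capture_output=True, text=True, check=True).stdout
    return parse_masscan_list(out)

def identify(hosts, xml_out="scan.xml"):
    # stage 2: nmap service/version identification on live hosts only
    if hosts:
        subprocess.run(["nmap", "-sV", "-oX", xml_out, *hosts], check=True)
```

stage 3 (nmap scripts / CVE checks) would consume the same host list; the mass-deploy + rsync-collect part stays in the existing bash.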
- plans for mipdb future, enhancements low and high level
- ip/net/sec/bgp info lookup as web service (with some notes on bgp, and questions)
- dwarf fortress psychology mechanics - paste and comment research output from gpt4.5 - with nice diagrams it generated
- general compressed input/output a.i. chatbot approach comments, some pending side notes on entropy from infotheory PoV
- ontology tool
- homelab review, maybe live stats
- ip cam streaming RPi / odroid for funsies
- ssh-chat is running on this chip, do ssh in here but also - put in ssh webclient here and autoconnect
- ferrofluids (review output from quick linux compile)
- ferrofluids on webgl / html5 canvas?
- actually yes
- very nice web demo! (nice shader mmmmm) (microphone input very nice) (NOTE: need to whistle specifically)
- author's post about it; nice articles, by the looks of it
- source
- RTLSDR, software defined radio list of experiment ideas, comment on satellite feed reading concept / PoC; also if interested btw it really is cool stuff
if interested re: satellite feeds btw: How to Pull Images from Satellites in Orbit
- list of novel compression approaches / ideas; i have a queue of arxiv articles and actual novel feedback from gpt4.5; i want to actually do something with these. ok, i will paste in headings for each subset of compression ideas; i do want to nicely format and include full details; if not myself, maybe someone will look into some of these where there's still novel ground to explore / quickly prototype to check viability:
- Generalized Quantization via Statistical Bucketing (Entropy-based Binning)
- Hierarchical Entropy Reduction (Tree-Structured Compression)
- Frequency-Aware Randomized Hashing (Bloom-Entropy Hashing)
- Lossy Embedding via Generalized Autoencoders (Entropy Autoencoders)
- Adaptive Fourier/Wavelet Entropy Filtering
- Stochastic Data Thinning (Entropy-based Sampling)
- Entropy-guided Dictionary Encoding (doesn't sound novel at all but context was lossy compression actually; and idea here is to "keep dictionary entries selectively by evaluating entropy reduction rather than solely frequency.")
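a minimal sketch of the first heading, "Generalized Quantization via Statistical Bucketing": pick bucket boundaries at empirical quantiles so each bucket carries roughly equal probability mass (which maximises entropy per emitted symbol). this is my reading of the heading, not the original's full detail:

```python
from bisect import bisect_right

def quantile_buckets(samples, n_buckets):
    # boundaries at the k/n quantiles (interior cut points only),
    # so each bucket ends up with ~equal mass under the sample distribution
    s = sorted(samples)
    return [s[len(s) * k // n_buckets] for k in range(1, n_buckets)]

def quantize(value, boundaries):
    # bucket index in [0, n_buckets): the lossy "compressed" symbol
    return bisect_right(boundaries, value)
```

lossiness is the bucket width; entropy coding the resulting symbol stream is then near-optimal because the symbols are ~uniform.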
- some other assorted stuff
- more FUSE FS ideas
- VOXLISP; online voxel sandbox / game-like thing with lisp repl and "everything is code" lisp-ish approach, think programmable tower defence / wannabe-mini-DF
- actually there's a bunch of stuff and pending-refinement thoughts and musings re: above; i have a very primitive crawler-with-resource-gathering in Go, for prototyping; writing things down will be good
- gentle self-scanning botnet, inspired by Carna / Internet Census 2012, but more modular and with incrementally more added stuff for gentle knocking
- on that topic, some compiled executable obfs thoughts; landscape itself is insane tho, so much complex stuff
- wide range network / internet visualisation ideas - a bunch - incl AS hierarchy; have collected a bunch of stuff incl from BGP sessions (!) - it's good - i want to show one quick prototype; enumerate bunch of vis ideas; and invite ppl to explore CAIDA among other places
- find OLD whiteboard pic, before current state; had some nice ideas there, need to put them in digital form
- short term writeup queue: (not committed, liable to change/disappear)
- item1-test
- digitalize from that stebuklųspa paper, soon, so i can start staring at those things, feel a kind of prickling need to push them out, interesting
- now started; expand on initial quick enumeration above - merge with paper notes etc.
- reminder; i jotted down those ~10 frustrating writeup items (maybe ~14?), will need to sit down and re-recall if can't find source; was good quick list with clear mind aiding
- other blog-related sidequest-ish stuff:
- actually one of initial intentions: i have >100 tagged .txt notes with titles, what do?? - want text search, tag structure, realtime semantic search and structure / similarity exploration, persistence, prettification and selective sharing
likely don't want to wait until post-PoC prod-readiness - (1) want to engage soon and (2) it will guide me into useful functionality and scope, praxis will show the light - check which laptop/machine/dropbox-old-share has archive, need newest one
- start playing with export and iterating
- define TODO struct for kfs fusefs PoC
- re: writeups (oldest TODO/NOTE before nesting above) what do: enumerate the list here, wrote on paper in Nida, should be in backpack and half-legible
- what do: nesting: decide how to add them here (one by one? and - see next)
- should we have TODO/any-note hierarchy / tree (e.g.: parent pending-blogposts -> enumerate pending ideas above as child nodes)
- and while at it - any very-short-term solution for static content gen, or do i just keep writing this manually?
- consider writing quick PoC parser of this weirdo index.html (into initial indices if possible; but at least nice deserialisation of some kind, and initial structure validation)
- quick ad hoc offsite backup solution before this thing melts with its chip on my table
- quick PoC rsync backup script, with at least basic sanity checks (enum file tree, check size, check size of each html, grep for some canary values, idk at least 10min worth of effort before i lose my stuff)
- model definitions (and general concepts actually as well...):
- define model: category (what to do; keep it?; does each item have 1 cat, or many-to-many?; needed?; constraints?; etc.)
- define model: tag
- nesting / hierarchy / recursive trees yes/no/depth/constraints/cycles/DAG re: (1) note (2) post (3) article (4) page (5) category (6) tag; prio on (1) note (5) category (6) tag but mostly on (1) note
- accept interim idea? - yes to note nesting with max depth of 4 FOR NOW (initial constraint for WIP status for now)
- enumerate rest of concepts, think of relationships and whether they are infl/constrained by fusefs plans (e.g.: i want multiple hardlinks able to point to same fs inode; hence multiple subtrees enumerating and traversing same edges (e.g. notes) differently)
- re: above, need general fungibility in general (but this comes with minuses of unconstrained crazy malleable graphs)
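the interim constraint above (note nesting yes, hard-capped at depth 4 for now) could be sketched like this; class/field names are made up, the real kfs model is still TBD:

```python
class Note:
    MAX_DEPTH = 4  # interim WIP constraint from the notes above

    def __init__(self, title, parent=None):
        self.title, self.parent, self.children = title, parent, []
        if self.depth > Note.MAX_DEPTH:
            raise ValueError(f"note nesting exceeds max depth {Note.MAX_DEPTH}")
        if parent is not None:
            parent.children.append(self)

    @property
    def depth(self):
        # root is depth 1; each nesting level adds 1
        return 1 if self.parent is None else self.parent.depth + 1
```

the hardlink-style "multiple subtrees traversing the same nodes" idea would turn `parent` into a list of parents (i.e. a DAG), which is exactly where the cycle/constraint questions above kick in.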
- big one: eventual definition (+/-, actionable) of overall SCOPE of functionality of later non-PoC deliverable.
good example: really need both (1) static sitegen (output == static artefact, like static site style) and (2) intermixable dynamic upstream? i mean the whole idea was to have at least (2) which is the crazier but for me the main attractive part; (1) would be "proper" to have and may come to be more useful to wider audience; so q is whether to drop (1); and i think (1) is actually good candidate for initial PoC-v2 deliverable, before our cool amazing (2); so it makes sense planning-wise
but this is a good example how up in the air even a very abstract notion of scope of eventual functionality for me is; so important point and WIP
- sitegen crappy PoC to validate idea
- sitegen simple multi-view html gen: single homepage vs split files (e.g.: pinned "articles" - only headings) - have both?
- later: general definition for modular plugin-able sitegen arch, will block proper decent implementation; how generic? reuse that JS fusefs proxy sitegen tool ideas? (does not appear that 'abstract')
- need to define initial PoC AC & expectations as well actually (e.g.: simple but general sh pipe (exec -> capture stdout) impl? already for PoC? maybe yes? need care because of amazing attack surface)
todo-misc - category: unsorted TODO
tags: TODO MISC VAR-ENTROPY NOTES
modified: 2025-04-14
created: 2025-04-13
prio: low
- investigate maple trees - data structure for vmem / searchable kfs filesystem (maple or r-b / prefix tr* plus hashmap + ? => 2-3 DS for substring search on whole FS tree - me want/need for kfs project; but also for THIS project prototyping > overengineering here (not ">>");
because i need to validate if viable before worrying about cacheline locality vis a vis lookup/sort; push up internal assigned value to "practicality" regarding PoC - yes useless abstract shit is cool and makes it worthwhile living but i do need a sense of finality re: this project - need to keep reminding this to myself) - yanked from my own rambling para somewhere around here - but kfs-wise kinda important:
review MAP_SHARED mmap type, mlock, shm_open and esp. memfd_create()
- what to do with google keep at keep.google.com - bunch of good notes (some must stay private)
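the "hashmap + prefix/suffix structure" combo for substring search over the whole FS tree could be prototyped as a trigram index (hedged sketch for the PoC-over-overengineering stage, long before worrying about cacheline locality): the hashmap gives candidate paths, a verification scan confirms:

```python
from collections import defaultdict

def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.idx = defaultdict(set)  # trigram -> set of paths
        self.docs = {}               # path -> text

    def add(self, path, text):
        self.docs[path] = text
        for g in trigrams(text):
            self.idx[g].add(path)

    def search(self, needle):
        if len(needle) < 3:
            # too short for trigrams: fall back to a linear scan
            return {p for p, t in self.docs.items() if needle in t}
        cands = set.intersection(*(self.idx[g] for g in trigrams(needle)))
        return {p for p in cands if needle in self.docs[p]}
```

a maple/r-b/prefix tree would replace the dict where ordered traversal or prefix queries matter; the trigram trick only covers the substring-search part.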
- what to do with 1000s of bookmarks (as of couple years ago i do rudimentary but sufficient tagging), mostly chrome, some firefox, some pinboard.in
- i really want my arbitrary dyn fs, i could start by throwing them in at something like (html may be escaping / cutting off some chars)
grep "<bookmark>" all_links_export.html | cut -d'>' -f3 [1] | grep -v some-exclusion-filter > raw_links.txt; for link in $(cat raw_links.txt | likely-some-further-transform); do id=$(uuidgen); path="/mnt/kfs/bookmarks/$id"; mkdir -p "$path"; echo "$link" > "$path/_fetch-url.kfstask"; done
(but we could also store raw full html with old tags etc, actually serialise as kfs tags, get all the fun semantics...)
[1]: i mean likely need another `cut` to remove suffix etc, just for illustration (plus implicit confession that i tend to overuse `cut` vs say some sometimes very simple regex...)
notes-misc - category: misc quick notes
tags: MISC UNPROCESSED VAR-ENTROPY NOTES
modified: 2025-04-14
created: 2025-04-13
prio: low
- one of the best audiovisual things i have seen and something i would love to aspire towards - Norman McLaren - Dots (1940) (less than 2min)
- compression and fulltext index stuff: check Enhanced suffix arrays vs regular older suffix arrays (as seen in golang, e.g.);
- in particular Li, Li & Huo (2016);
- also https://arxiv.org/abs/1710.01896 Dismantling DivSufSort - entropy-bound indexable dictionaries (EIDs): good paper; also: all papers linked from wiki on Wavelet Tree
- related good walkthrough TOREAD
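for baseline intuition against the enhanced-suffix-array papers above: the naive construction (O(n^2 log n) worst case, vs DivSufSort's O(n log n)) plus binary-search substring lookup, just to have something runnable to compare against:

```python
def suffix_array(s):
    # naive: sort suffix start indices by the suffixes themselves
    return sorted(range(len(s)), key=lambda i: s[i:])

def contains(s, sa, pat):
    # binary search for the first suffix >= pat, then prefix-check it
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(pat)] < pat:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s[sa[lo]:].startswith(pat)
```

the "enhanced" variants add LCP and child tables on top of exactly this array to recover suffix-tree-style traversal.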
- audiovisual sandbox with optional constrained / directed / parametric LLM or internal NN assistance: this i do want to iterate (by using LLM itself...) and write about a bit, for now noting for self:
a long bulletpoint list of what i have in mind; it keeps expanding; but i think i should first explore vvvv, which truly is an amazing piece of magic and engineering, imho.
explanation, notes to self, open questions etc. etc.; i use highlighting for, in my opinion, the key interesting points:
- p.s. later added note: with all this LLM crap i keep forgetting we can all run good old NNs including deep recurrent NNs, etc. - and there, we can set up proper author-NN feedback loops, like in Ultrachunk (2018) - i will now need to go thru everything Memo Akten has done, but no regrets, i am officially inspired
- there are some e.g. web based sandboxes and toolkits for quickly iterating on (with immediate feedback - the way it should be) audiovisual experiments
- i would very much like to either find, adapt and expand, or implement mvp / PoC version of a simple clean one which allows for:
- shader code for visual input,
- optional: actual code (e.g. Processing) for visuals; or a Lua repl, or a Lisp/Scheme repl; tie things up with the shader; a repl on the side would be great either way. (if open code, then maybe Lua, unless there's some decent embeddable Lisp dialect with the kinds of funcs / libs that would be sufficient; would be good to have ready-to-use efficient audio data parsing and chaotic systems (i guess just basic math with some complex numbers; worst case can have a generated auto-importable implementation...))
- audio input thru file and mic at first
- transformation into visual using above, parsing etc.
- built-in tools (invocable etc.) like beat / bpm detection (exists; for WebGL as well; nifty); what other tools?
- general prepared functions, building blocks, with simple clear interfaces
- ideally other audio input as well: simple procedural gen would be nice - simple sin() etc. waves and simple transformations at first
- recording of all steps (btw: side window REPL which autofills with actions from buttons / etc. would be nice), deterministic (random seed recorded as well), all ops auto serialised, saved, can cut them paste them as well etc.
- this is all nice but hopefully the kicker: optional LLM func as well - but by default the way it is integrated is - it is told that its output is highly constrained - in formal (auto exportable easily, schema etc.) language / subset of actions
- possibly multiple LLM output options, most open and complex one is - shader language; but still, it's just shader lang + it being told "also these helper methods exist:
[a..(int inarg ..) -> out type etc.., b..(..) -> .., ..]" ..
- optional later feature ideally - chaining of LLM models, not just LLM, e.g. one paid model, elaborate, e.g. GPT4.5 or the likes, possibly in API call temperature set such that creativity increased, idea is initial broad search in solution space, dreaming vibes encouraged
- THEN, simulated annealing / structured pipeline style output is refined and structured, and model (perhaps different one) is used with lower creativity more precision - to do this and that - and finally ideally
- (much later, after many functional iterations and fun, if this sandbox is actually properly developed) eventually, a specifically dedicated trained model is used with optional feedback / backpropagation - can be user based but also automatic - tool in some cases is able to quantify how well model did depending on what it had to do - these metrics are sent back for iterating better
- dunno re: auto quantification, maybe too difficult, i can think of convoluted auto feedback measure ideas - e.g. if task is marked (or auto discovered) as bpm / rhythm focused visual feedback - then rate of change of visual canvas (at first, number of pixels changed; later, combine with distance in colour; then maybe change in entropy / noisiness; then, parse contrast change rate, "radicality" etc.; detect out-of-rhythm change, etc.; but likely mostly depend on user feedback - which can be quantified of course, in more than one (how much liked) dimension - e.g. as above, "beat responsiveness"; but also ability to explore the whole newly defined (by toggle(s)) phase space by itself; and LLM (and perhaps part of same prompt, or right thereafter, but overlapping context window) is asked to give ways to calculate metrics for x/y/z (user is used for some sanity checks? but system can optimistically explore with it anyway)) - would be amazing, auto feedback how to measure certain metrics -> tool auto explores phase space with metric -> can now automate back and forth feedback process, ideally? not easy, buuut this way with some user's help (esp. for some sanity checks and validation - what is nice is user just needs to look at screen and confirm if "this part of mesh does indeed wiggle more than before" - nice) - can kinda keep iterating on many specific and general requests; LLM and user can both be consulted to help system catalogue new metrics in terms of how "general" / reusable they are - sometimes they will not be generalisable at all (that's ok, i will want system to record journals of everything anyway; eventually btw all of that can be fed into some big monster LLM / ML system as well...)
- what toggles is LLM given (besides open berth shader lang with some preset funcs to help out)?
- would be nice to auto detect procedurally "toggles in current shader or general code" (hmm this is a cool idea, Bret Victor inspired likely, his responsive interfaces, so good) - and then (1) offer sliders etc. to user based on data type, (2) on button click "explore / fuzz", auto toggle them and move around (perhaps record rate of change / distance metric, gather user feedback as well), use THESE auto discovered toggles in code as constrained levers / output options to LLM (of course as an option; and gather feedback from LLM as well whether current state is even useful to iterate on)
- record and remember output state, user feedback and satisfaction, but also above metrics incl. entropy, rate of change, colour intensity change e.g., etc. - depending on LLM generated output (which acts as input and changes state / puts system into altered phase space). use THIS data to later train a small model ourselves (later automatically), teaching our model while using larger model (kind of inspired by all the recent research work). and in general, aim to find various ways to compress and understand the very multidimensional space of "how to do audiovisuals" - so that the tool can eventually start to be able to process (tagged, quantifiable / easily vectorable etc.) prompts (which the higher-creativity large LLM is to handle at first - and that's what i want at first - this LLM through API processing open requests like below, us seeing immediate feedback, again Bret Victor style, and iterating on it:
- generate 3d interesting mesh visualisation responding to high bpm song [song profile e.g. compressed/small FFT / freq distribution can be provided as input, what other such audio summary inputs?], with focus on intense colour change, and a way to parametrize colour pallette so i can iterate on it (LLM is auto-told / constrained to give output in shader language v2.., with this short list of optional helper methods and vars (func sigs given), plus x/y/z toggle if any
- could this work from LLM perspective?
- are there more optimal way to structure LLM output format? ask LLM how better to compress it, and how much is good to constrain it (e.g. in cases when we want only change in some number of scalars / levers; given it operates in tokens... and does the high dimensionality even work here - i guess yes, still very much used in the middle of processing, but is it good to constrain output state space this much?
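the "auto detect toggles in current shader code" idea a few bullets up could be as dumb as scanning GLSL source for `uniform` declarations and mapping each type to a widget (hypothetical sketch; widget names and the exact uniform grammar subset are my assumptions):

```python
import re

# matches e.g. "uniform float wiggle;" / "uniform vec3 palette;"
UNIFORM_RE = re.compile(r"uniform\s+(float|int|bool|vec[234])\s+(\w+)")

WIDGETS = {"float": "slider", "int": "slider", "bool": "checkbox",
           "vec2": "xy-pad", "vec3": "rgb-sliders", "vec4": "4-slider"}

def find_toggles(shader_src):
    # each detected uniform becomes a UI toggle spec -- and, per the note,
    # a constrained lever offered to the LLM / the auto-fuzzer
    return [{"name": name, "type": t, "widget": WIDGETS[t]}
            for t, name in UNIFORM_RE.findall(shader_src)]
```

the same list doubles as the "explore / fuzz" input: iterate each toggle through its range and record the change metrics per step.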
- one more ramble on dimensionality reduction pipeline and also just going with open thought how to use this + various feedback loops:
so a kind of annealing of: open-solution-space-high-creativity "generate beat-responsive 3d mesh with high focus on colour contrast changing, expose palette as var/toggle [+ tool adds constraint on output in shader lang + helper funcs]" -> parametrizable vars in shader code auto turned into toggles for user and auto fuzzing / exploration -> explore -> then (with feedback incl. quantifiable (-> error backprop)) "smoother mesh wiggle with big wiggle if low frequency beat" [optional compressed freq distribution plot can be added -> also ask (auto?) for metric "how much wiggle" and "how low the freq right now" -> (auto)plot, and also ideally get params to change this -> auto explore new phase space of how-strong-wiggle and how-low-freq-over-time; this auto-parametrization flow, with back and forth iteration, actually sounds lovely in general, for other kinds of work; but i think this kind of medium is still good to start exploring this kind of approach, because of its focus on (and intensity regarding) immediacy of feedback on input change, and how important this is. and we optionally have highly-dimensional spaces - multiple high-dimensionality ones (audio; visual; can come up with structured shape-spaces etc.) as a good medium to explore and validate various (in very abstract general terms) compression approaches (dim. reduction, tho i know it's not the same). since i'm tripping on novel compression ideas a bit anyway (i really want to explore some entropy-change-responsive algo ideas - some work exists but also seems like there could be some niches), thinking in those terms here a bit, but really so many things in life and nature turn out to be some or other kind of compression related process. and heh, audiovisual art as dimensionality reduction: a case study is a preposterous tagline / subtitle for this tool (i would love it). :)
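the crude auto-feedback metrics floating through the ramble above (pixels changed, change in entropy / noisiness) can be pinned down in a few lines; a hedged sketch over flat per-frame pixel lists, not any real canvas API:

```python
import math
from collections import Counter

def frame_entropy(pixels):
    # Shannon entropy (bits) of the pixel-value distribution of one frame:
    # a crude "noisiness" signal. pixels: list of 0..255 ints.
    counts = Counter(pixels)
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def changed_fraction(prev, cur, threshold=0):
    # fraction of pixels that changed between frames: crude "rate of change",
    # usable for e.g. out-of-rhythm-change detection when plotted over time
    diff = sum(1 for a, b in zip(prev, cur) if abs(a - b) > threshold)
    return diff / len(cur)
```

tracking these two per frame against the audio's beat grid is roughly the "rate of change of visual canvas" metric described above, before any of the fancier contrast/"radicality" parsing.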
p.s. TODO reminder to self that vvvv exists, never got to properly explore it, need to check it out, play with it!!
p.s. ok i'm now starting to play with it and the tool is AMAZING. just look at all the videos, from node interfaces to all showreels. i love how self-contained it is, and how it is (1) very powerful in its generality and language expressiveness BUT (2) offers immediate, very concrete and many options for immediate output (and hot reload of course!). such power but also immediate actionability. i mean it is a work of deep intense love since at least (looks like) 2002? community should be great; and the thing is going strong still[1]. amazing. i now remember how deeply curious and attracted it made me feel back in like 2008-2009, still in high school; then i forgot about it; i want to play with it now, finally. funny how this works.
[1] re: "active maintenance": (not that this annoying "is it still actively maintained" criterion doesn't piss me off; implicit in it is error proneness and endless security vulns, and the need for new features; there should be more finished-but-usable-and-used tools (for noncommercial and nonprod use; i tend not to need SLAs for non-work stuff, cmon).)
important-reminders - category: high priority reminders duplicated from Cal
tags: TODO CALENDAR-SYNC IMPORTANT REMINDER NOTES
modified: 2025-04-13
created: 2025-04-13
prio: high
- dentist (dantų gyd.) (2x tasks)