GitHub - infiniteregrets/kv-psi: Use Linux Pressure Stall Information to trim an LLM KV cache
PSI KV Governor
PSI KV Governor is a small reference implementation for using Linux Pressure Stall Information to trim an LLM KV cache when the system is under memory pressure.
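As a rough illustration of the idea, the sketch below decides which KV cache positions to drop once memory pressure crosses a threshold. It is not taken from the repository: `TrimPolicy`, `plan_trim`, and `avg10_threshold` are hypothetical names, and the meanings given to `keep`, `tail`, and `min_prune` are guesses based only on the benchmark flag names. The real governor may pick a different range or trim more gradually.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrimPolicy:
    # All semantics below are assumptions made for this sketch.
    keep: int = 64                # leading positions that are never dropped
    tail: int = 256               # most recent positions that are never dropped
    min_prune: int = 64           # skip trims that would free fewer positions
    avg10_threshold: float = 1.0  # hypothetical trigger, in percent stall time


def plan_trim(kv_len: int, avg10: float, policy: TrimPolicy) -> tuple[int, int] | None:
    """Return the half-open range [start, end) of positions to drop, or None."""
    if avg10 < policy.avg10_threshold:
        return None  # no meaningful pressure, leave the cache alone
    start = policy.keep
    end = kv_len - policy.tail
    if end - start < policy.min_prune:
        return None  # too little to gain from trimming
    return (start, end)
```

Under these assumptions, a 1000-position cache at `avg10=5.0` would drop positions 64 through 743, while a 350-position cache would be left untouched because only 30 positions sit between the protected head and tail.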
Requirements
- Linux with PSI enabled: cgroup `memory.pressure` or `/proc/pressure/memory`
- Python 3.10+
- llama.cpp build dependencies for the runner
- a GGUF model, for example `models/SmolLM2-135M-Instruct-Q2_K.gguf`
Check PSI:
```shell
cat /proc/pressure/memory
PYTHONPATH=src python benchmarks/pressure_bench.py --preflight-only
```
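For reference, the kernel reports memory pressure as two lines, `some` and `full`, each with `avg10`, `avg60`, and `avg300` percentages and a cumulative `total` in microseconds. A minimal parser for that format might look like the sketch below. `parse_psi` and `read_memory_psi` are hypothetical helpers written for this example, not functions from `psi_kv_governor`, and the sample values are made up.

```python
from __future__ import annotations

from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        # Each field has the form key=value, e.g. avg10=0.50
        result[kind] = {key: float(value) for key, value in (f.split("=", 1) for f in fields)}
    return result


def read_memory_psi(path: str = "/proc/pressure/memory") -> dict[str, dict[str, float]]:
    """Read and parse a PSI file; pass a cgroup memory.pressure path if preferred."""
    return parse_psi(Path(path).read_text())


# Illustrative sample in the kernel's format (values are invented).
SAMPLE = (
    "some avg10=0.50 avg60=0.20 avg300=0.05 total=123456\n"
    "full avg10=0.25 avg60=0.10 avg300=0.02 total=65432\n"
)
```

Calling `parse_psi(SAMPLE)["some"]["avg10"]` returns `0.5`. On a system without PSI enabled, `read_memory_psi()` raises `FileNotFoundError`.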
Basic Usage
Run the reference simulator:
```shell
PYTHONPATH=src python -m psi_kv_governor.cli simulate
```
Build the llama.cpp runner:
```shell
scripts/build_llama_runner.sh
```
Download the small benchmark model if needed:
```shell
python scripts/download_demo_model.py
```
PSI Benchmark
Run both variant orders. This matters because PSI `avg10`, cache, and zram/swap state can carry over from the first pressure run into the second.
```shell
PYTHONPATH=src python benchmarks/pressure_bench.py \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/fixed-first

PYTHONPATH=src python benchmarks/pressure_bench.py \
  --variant-order psi-first \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/psi-first
```
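Because `avg10` can still be elevated when the second variant starts, one way to reduce carry-over is to poll the pressure reading until it settles instead of relying only on a fixed pause. The sketch below shows that idea. It is an illustrative alternative, not what `--variant-cooldown-s` is known to do, and `wait_for_quiet`, its threshold, and its defaults are hypothetical.

```python
import time
from typing import Callable


def wait_for_quiet(
    read_avg10: Callable[[], float],
    threshold: float = 0.5,      # hypothetical "quiet" level, percent stall time
    timeout_s: float = 45.0,     # give up after this long
    poll_s: float = 1.0,         # delay between readings
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until avg10 is at or below threshold; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if read_avg10() <= threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

`read_avg10` can be any callable that returns the current `some` or `full` `avg10` value. The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays. Note that this only addresses PSI averages; page cache and zram/swap state can still differ between runs.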
Recent Jetson result:
| run | variant | decoded | tok/s | prunes | final KV | external PSI some/full |
| --- | --- | ---: | ---: | ---: | ---: | --- |
| fixed-first | fixed | 1536 | 94.00 | 0 | 1547 | 1.61/1.61 |
| fixed-first | PSI | 1536 | 88.80 | 4 | 1291 | 4.14/3.94 |
| psi-first | PSI | 1536 | 96.16 | 2 | 1004 | 2.46/2.33 |
| psi-first | fixed | 1536 | 89.76 | 0 | 1547 | 5.56/5.56 |
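The table shows why a single order can mislead: in both runs, whichever variant goes second is slower and sees higher external PSI. The short script below makes the order effect explicit using only the tok/s figures above. The helper names are made up for this example, and with one run per order the numbers are indicative rather than statistically meaningful.

```python
# (run, variant, tok/s) copied from the results table above.
ROWS = [
    ("fixed-first", "fixed", 94.00),
    ("fixed-first", "PSI", 88.80),
    ("psi-first", "PSI", 96.16),
    ("psi-first", "fixed", 89.76),
]


def within_run_delta_pct(run: str) -> float:
    """PSI throughput relative to fixed within one run, in percent."""
    by_variant = {variant: tok_s for r, variant, tok_s in ROWS if r == run}
    return (by_variant["PSI"] - by_variant["fixed"]) / by_variant["fixed"] * 100.0


def mean_tok_s(variant: str) -> float:
    """Throughput for one variant averaged over both run orders."""
    values = [tok_s for _, v, tok_s in ROWS if v == variant]
    return sum(values) / len(values)


if __name__ == "__main__":
    for run in ("fixed-first", "psi-first"):
        print(f"{run}: PSI vs fixed {within_run_delta_pct(run):+.2f}%")
    print(f"order-averaged: PSI {mean_tok_s('PSI'):.2f}, fixed {mean_tok_s('fixed'):.2f} tok/s")
```

Taken alone, the fixed-first run makes PSI look about 5.5% slower and the psi-first run makes it look about 7.1% faster. Averaged over both orders, the two variants are within roughly 1% of each other (92.48 vs 91.88 tok/s), while the PSI variant finishes with a smaller KV cache in both runs.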
Result directories:
- `data/bench-pressure/real-psi-6000m-1536tok-cooldown`
- `data/bench-pressure/real-psi-6000m-1536tok-cooldown-psi-first`
Contributors
- **infiniteregrets** (Mehul Arora)