Why is this faster than running llama.cpp's `main` directly? I'm getting 7 tokens/sec with this, but only 2 with llama.cpp by itself.
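For an apples-to-apples comparison, llama.cpp ships a `llama-bench` tool that reports prompt-processing and text-generation tokens/sec for a given model and settings. A minimal sketch (the model path is a placeholder, and thread count / GPU offload should be set to match both setups being compared):

```shell
# Placeholder model path -- point this at the same GGUF file used in both runs.
MODEL=./models/model.gguf

if command -v llama-bench >/dev/null 2>&1; then
  # -m: model file, -t: CPU threads, -ngl: layers offloaded to GPU.
  # Keep these identical across the two setups so the tokens/sec
  # numbers are comparable.
  llama-bench -m "$MODEL" -t 8 -ngl 0
else
  echo "llama-bench not found; build llama.cpp first"
fi
```

Differences like 7 vs. 2 tokens/sec usually come down to mismatched settings (thread count, GPU offload, quantization, or batch size) rather than the wrapper itself, so pinning those down first narrows the question.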

