Draft:Llama.cpp

llama.cpp
Original author(s)	Georgi Gerganov
Developer(s)	Georgi Gerganov and community
Initial release	Alpha (b1083) / August 26, 2023
Written in	C++
License	MIT License
Website	github.com/ggerganov/llama.cpp

Llama.cpp is an open-source software library that performs inference on various large language models such as LLaMA.^[1] It is written in C++ and is generally smaller in size and complexity than most existing inference frameworks, such as TensorFlow. As of May 2024, it has about 55 thousand stars on GitHub.^[2]

History

Georgi Gerganov began developing llama.cpp to implement LLaMA in pure C++ with no dependencies. This approach allowed it to run on a wider range of hardware than inference libraries that rely on hardware-dependent, closed-source libraries such as CUDA. Before llama.cpp, Gerganov worked on a similar library called whisper.cpp,^[3] which implements OpenAI's Whisper speech-to-text model. Llama.cpp gained traction among users without specialized hardware, as it could run on a CPU alone, including on Android devices.^[4] In March 2023, Gerganov started a company around llama.cpp called ggml.ai.^[5]

Architecture

Llama.cpp could initially run only on CPUs, but it can now also run on GPUs through several back-ends, including Vulkan and SYCL. These back-ends make up the GGML tensor library, which is used by the model-specific front-end code in llama.cpp and by other projects such as whisper.cpp.^[6] Llama.cpp has its own model file format called GGUF (previously referred to as the GGML format).^[7] Models in other formats must be converted to GGUF, and in some cases not all tensor operations required by a given model are supported by GGML/GGUF. Llama.cpp generally follows the KISS principle so that it remains as small and easy to use as a dependency as possible. Llama.cpp supports ahead-of-time model quantization, as opposed to on-the-fly quantization.^[8]
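
The general idea of ahead-of-time quantization can be illustrated with a minimal Python sketch. It assumes a simple symmetric 8-bit block scheme. The block size, function names and sample weights are hypothetical, and the sketch does not reproduce llama.cpp's actual quantization types or the GGUF file layout. The weights are quantized once, before inference, and only the compact integers and per-block scales need to be stored:

```python
from typing import List, Tuple

# Hypothetical block length, kept small so the example is easy to follow.
BLOCK_SIZE = 4

Block = Tuple[float, List[int]]  # (scale, signed 8-bit values)


def quantize_block(block: List[float]) -> Block:
    """Map one block of floats to a shared scale and integers in [-127, 127]."""
    max_abs = max(abs(x) for x in block)
    # An all-zero block gets an arbitrary scale of 1.0 to avoid division by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return scale, [round(x / scale) for x in block]


def quantize(weights: List[float]) -> List[Block]:
    """Quantize a flat list of weights block by block (done once, offline)."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Recover approximate float weights from the stored blocks."""
    return [scale * q for scale, values in blocks for q in values]


# Illustrative data, not taken from any real model.
weights = [0.5, -1.0, 0.25, 0.0, 2.0, -0.5, 1.5, 1.0]
blocks = quantize(weights)      # performed ahead of time
restored = dequantize(blocks)   # performed when the model is used
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this sketch the rounding error of each weight is bounded by about half of its block's scale, which is the usual trade-off between storage size and accuracy in block-wise schemes.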

References

[1] Connatser, Matthew. "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". theregister.com. Retrieved 15 April 2024.
[2] "ggerganov/llama.cpp". GitHub.
[3] "ggerganov/whisper.cpp". GitHub.
[4] Edwards, Benj (13 March 2023). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". arstechnica.com. Retrieved 15 April 2024.
[5] "GGML - AI at the edge".
[6] "GGML - AI at the edge". ggml.ai. Retrieved 16 April 2024.
[7] Pounder, Les (25 March 2023). "How To Create Your Own AI Chatbot Server With Raspberry Pi 4". tomshardware.com. Retrieved 16 April 2024.
[8] Walkowiak, Bartosz; Walkowiak, Tomasz (2024). "Implementation of language models within an infrastructure designed for Natural Language Processing" (PDF). International Journal of Electronics and Telecommunications. 70 (1): 153–159. doi:10.24425/ijet.2024.149525. Retrieved 8 May 2024.
