The phi-3-mini model is a transformer decoder architecture with a default context length of 4K, extended to 128K in its long-context variant, phi-3-mini-128K. For compatibility with open-source projects, it is built upon a block structure similar to that of Llama-2.
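As a rough illustration of this kind of block structure, the sketch below shows a generic pre-norm decoder block in PyTorch. The layer sizes, the use of nn.MultiheadAttention, and the plain GELU MLP are simplifications for readability, not phi-3-mini's actual configuration (Llama-2 blocks, for instance, use rotary position embeddings and a gated MLP, both omitted here for brevity).

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, as used in Llama-style blocks."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: self-attention and MLP, each wrapped in a
    residual connection. Sizes below are illustrative only."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn_norm = RMSNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = RMSNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x, attn_mask=None):
        # x: (batch, seq, d_model)
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + h
        return x + self.mlp(self.mlp_norm(x))
```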
The phi-3-small model, with 7 billion parameters, uses the tiktoken tokenizer for better multilingual tokenization and follows a standard decoder architecture with 32 heads and a hidden size of 4096.
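For reference, the snippet below loads a standard tiktoken encoding and tokenizes mixed-language text; the cl100k_base encoding is only a stand-in for illustration, not necessarily the vocabulary phi-3-small ships with.

```python
import tiktoken

# Illustrative only: cl100k_base is one of tiktoken's standard encodings,
# used here as a stand-in for phi-3-small's actual vocabulary.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("多语言 multilingual tokenization テスト")
print(len(tokens), enc.decode(tokens))
```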
In phi-3-small, we switched from GELU activation to GEGLU for improved performance and training stability, and tuned hyperparameters using Maximal Update Parametrization (muP).
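A GEGLU feed-forward block replaces the single GELU non-linearity with a gated product, GELU(x W_g) ⊙ (x W_v), followed by a down-projection. A minimal PyTorch sketch, with illustrative dimensions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Gated-GELU feed-forward: GELU(x W_g) * (x W_v), then a down-projection.
    d_model and d_ff are illustrative, not phi-3-small's actual sizes."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))
```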
A novel blocksparse attention module improves training and inference speed by applying a different sparsity pattern over the KV cache for each attention head.
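The exact sparsity patterns are not specified in this passage, so the sketch below only emulates the idea with dense masking: each head keeps a few recent KV blocks plus a head-dependent strided selection of older blocks. The block size, stride schedule, and function name are assumptions for illustration, not phi-3-small's implementation, which uses an optimized kernel rather than a dense mask.

```python
import torch
import torch.nn.functional as F

def blocksparse_attention(q, k, v, block_size=64, local_blocks=2):
    """Dense emulation of per-head blocksparse attention.

    q, k, v: (n_heads, seq_len, head_dim). Each head keeps the most recent
    `local_blocks` KV blocks plus a head-dependent stride of older blocks;
    the pattern is illustrative, not the one used in phi-3-small.
    """
    n_heads, seq_len, head_dim = q.shape
    pos = torch.arange(seq_len)
    blk = pos // block_size                              # block index of each position
    causal = pos.unsqueeze(1) >= pos.unsqueeze(0)        # token-level causal mask
    mask = torch.zeros(n_heads, seq_len, seq_len, dtype=torch.bool)
    for h in range(n_heads):
        stride = 2 + h % 4                               # sparsity pattern varies by head
        diff = blk.unsqueeze(1) - blk.unsqueeze(0)       # query block minus key block
        keep = (diff < local_blocks) | (diff % stride == 0)
        mask[h] = keep & causal
    scores = (q @ k.transpose(-1, -2)) / head_dim ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 8 heads over a 256-token sequence.
q = k = v = torch.randn(8, 256, 64)
out = blocksparse_attention(q, k, v)                     # (8, 256, 64)
```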