Introduction
Large Language Models (LLMs) are revolutionizing various fields, and vLLM has emerged as a powerful library for LLM inference and serving. Great news for users with ppc64le hardware! A recent change (https://github.com/vllm-project/vllm/pull/5652) added support for this architecture to vLLM. This blog outlines the steps to get started with vLLM on ppc64le.
Build the container image:
$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm/
$ podman build --security-opt label=disable --format docker -t vllm:ppc64le -f Dockerfile.ppc64le .
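Once the build finishes, it is worth a quick sanity check that the image really targets the right architecture. A minimal sketch using podman's Go-template inspect output (the expected result is shown on the second line):
$ podman image inspect --format '{{.Architecture}}' localhost/vllm:ppc64le
ppc64le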
Use the built image:
# list the images that were built
$ podman images
REPOSITORY                     TAG      IMAGE ID      CREATED         SIZE
localhost/vllm                 ppc64le  9a44a2021b41  38 minutes ago  4.32 GB
<none>                         <none>   a80bc1de136c  43 minutes ago  2.51 GB
docker.io/mambaorg/micromamba  latest   358d7e727885  9 days ago      137 MB
$
# create a directory for caching models downloaded from Hugging Face
$ mkdir -p ~/.cache/huggingface
$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host localhost/vllm:ppc64le
The above command starts the server with the default model (facebook/opt-125m) pulled from Hugging Face. The server can then be queried using the same request format as the OpenAI API, as the examples below show.
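To serve a different model, extra vLLM server flags can be appended to the run command. This is a sketch that assumes the image keeps vLLM's OpenAI-compatible API server as its entrypoint (as the upstream Dockerfiles do); bigscience/bloom-560m is only an illustrative choice:
$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 --ipc=host localhost/vllm:ppc64le \
    --model bigscience/bloom-560m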
List the models:
$ curl http://localhost:8000/v1/models | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   484  100   484    0     0   236k      0 --:--:-- --:--:-- --:--:--  236k
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1719484379,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null,
      "max_model_len": 2048,
      "permission": [
        {
          "id": "modelperm-8ba9ddf949764d359f2db7eb1fa92090",
          "object": "model_permission",
          "created": 1719484379,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
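Besides listing models, the server also exposes a lightweight /health route (present in recent vLLM releases) that returns an empty HTTP 200 response once the engine is ready, which is handy for liveness probes:
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/health
200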
Completion:
$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   443  100   308  100   135    306    134  0:00:01  0:00:01 --:--:--   440
{
  "id": "cmpl-f77c7f6e64df4221836d85d64d28ae04",
  "object": "text_completion",
  "created": 1719484486,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " great place to live. I",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 12,
    "completion_tokens": 7
  }
}
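The same endpoint also supports the standard OpenAI streaming protocol: adding "stream": true to the request body returns the completion incrementally as server-sent events instead of a single JSON response (streamed chunks omitted here):
$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0,
        "stream": true
    }'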
For more information on usage, please refer to the vLLM documentation: https://docs.vllm.ai/en/stable/index.html