First International Conference on Modern Sciences in Engineering

DeepSeek Coder: Fine-Tuned LLM for Accurate Firmware Decompilation

Paper code: 1071-NAEC

Authors

Saeed Parsa *

Full Professor, Department of Computer Engineering

Abstract

In the competitive digital control industries, reverse engineering of firmware is crucial for keeping pace and accelerating development of the intelligence-driven embedded systems now used across many industrial sectors. With the growing reliance on cyber-physical systems (CPS), effective decompilation of embedded binary code has become critical for software maintainability, security, and competitive analysis. However, a persistent challenge in decompilation is the loss of meaningful, contextually relevant identifier names, which significantly hampers the comprehensibility of the recovered source code. Existing tools, including Hex-Rays IDA Pro, Ghidra, and RetDec, struggle to reconstruct semantically accurate names for functions and their parameters, especially in the presence of compiler optimizations, obfuscation, and missing debugging symbols. Beyond naming, these and other decompilation tools also have difficulty analyzing and conveying the underlying algorithms, an understanding that is essential for enabling competitive firmware improvements and optimizations. In this article, I present a fine-tuned DeepSeek Coder large language model (LLM) that significantly improves identifier reconstruction in code decompiled from C and C++ binaries. Unlike general-purpose LLMs such as OpenAI's ChatGPT, Google's Gemini, and Meta's Code Llama, the model is explicitly optimized for clean code principles: it generates function and variable names that enhance readability, maintainability, and semantic recovery. By combining prompt engineering with domain-specific fine-tuning, the approach reconstructs functionally meaningful identifiers and high-level abstractions with greater accuracy than existing machine-learning-driven decompilation methods.
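The abstract does not specify the prompt format, the fine-tuning data, or how predicted names are written back into the decompiled code. As a minimal sketch of the identifier-recovery step it describes, assuming the model is asked to return one `old -> new` pair per line (a hypothetical format), the Python below collects decompiler-generated placeholder names (Hex-Rays style `sub_…`, `aN`, `vN`; Ghidra style `FUN_…`, `param_N`, `local_…`), builds a prompt, parses a reply, and applies the renames on whole-word boundaries. The model call itself is omitted and the reply is hard-coded. `find_placeholders`, `build_prompt`, `parse_mapping`, and `apply_mapping` are hypothetical helpers, not part of the author's system.

```python
import re

# Illustrative (not exhaustive) patterns for default names emitted by
# Hex-Rays (sub_XXXX, aN, vN) and Ghidra (FUN_XXXX, param_N, local_XX).
PLACEHOLDER = re.compile(
    r"\b(?:sub_[0-9A-Fa-f]+|FUN_[0-9A-Fa-f]+|a\d+|v\d+|param_\d+|local_[0-9A-Fa-f]+)\b"
)


def find_placeholders(code: str) -> list[str]:
    """Return the distinct decompiler-generated identifiers in first-seen order."""
    seen: list[str] = []
    for match in PLACEHOLDER.finditer(code):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def build_prompt(code: str) -> str:
    """Hypothetical prompt asking for one 'old -> new' line per identifier."""
    names = ", ".join(find_placeholders(code))
    return (
        "Suggest descriptive names that follow clean code conventions "
        f"for these identifiers: {names}\n"
        "Answer with one line per identifier in the form old -> new.\n\n"
        f"{code}"
    )


def parse_mapping(reply: str) -> dict[str, str]:
    """Keep only well-formed 'old -> new' lines whose new name is a valid C identifier."""
    mapping: dict[str, str] = {}
    for line in reply.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*->\s*([A-Za-z_]\w*)\s*", line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def apply_mapping(code: str, mapping: dict[str, str]) -> str:
    """Rename whole-word occurrences only, so renaming 'a1' leaves 'a10' untouched."""
    if not mapping:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


if __name__ == "__main__":
    decompiled = "int sub_401000(int a1, int a2)\n{\n  int v3 = a1 * a2;\n  return v3 + a1;\n}"
    print(build_prompt(decompiled))
    # Hard-coded stand-in for a model reply.
    reply = "sub_401000 -> scale_and_add\na1 -> value\na2 -> factor\nv3 -> scaled"
    print(apply_mapping(decompiled, parse_mapping(reply)))
```

Performing all substitutions in a single `re.sub` pass means one rename cannot cascade into another, even if a suggested name happens to match a different placeholder.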

Keywords

LLM, reverse engineering, method name, prediction, algorithm, execution order

Status: Accepted