Automatic radiology report generation from chest X-rays has attracted increasing attention as a way to assist clinicians and reduce the workload of medical image interpretation. Despite recent advances in multimodal learning, generating clinically coherent and factually consistent reports remains challenging. This paper presents a multimodal Transformer-based framework for chest X-ray report generation that integrates radiographic images with patient clinical history. Visual representations are extracted from frontal and lateral chest X-ray images using a ResNet-50 backbone with progressive fine-tuning. Clinical context is encoded with Bio_ClinicalBERT, allowing the model to incorporate domain-specific medical knowledge. The visual and textual representations are fused and processed by a Transformer encoder–decoder that generates radiology reports autoregressively. Experiments on the MIMIC-CXR dataset show that the proposed model produces structured radiology reports that capture clinically relevant findings. The model achieves a BLEU-4 score of 0.089, a ROUGE-L score of 0.263, and a METEOR score of 0.282, indicating lexical and semantic overlap between generated and reference reports. These results suggest that integrating clinical context with visual features is a promising direction for improving automated radiology report generation.
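
To make the reported BLEU-4 figure easier to interpret, the following is a minimal sketch of an unsmoothed, sentence-level BLEU-4 against a single reference. It is an illustration only: the function names `ngrams` and `sentence_bleu4`, the lowercase whitespace tokenization, and the example sentences are assumptions, and the evaluation toolkit, tokenization, and corpus-level aggregation behind the reported scores are not specified here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(reference, candidate):
    """Unsmoothed sentence-level BLEU-4 against one reference (illustrative)."""
    # Assumption: lowercase whitespace tokenization, not a clinical tokenizer.
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            # Any zero precision drives the geometric mean to zero.
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical report sentences, not taken from MIMIC-CXR.
ref_a = "no acute cardiopulmonary process is seen"
cand_a = "no acute cardiopulmonary abnormality is seen"
ref_b = "the lungs are clear without focal consolidation pleural effusion or pneumothorax"
cand_b = "the lungs are clear without pleural effusion or pneumothorax"

score_identical = sentence_bleu4(ref_a, ref_a)
score_one_word_changed = sentence_bleu4(ref_a, cand_a)
score_partial = sentence_bleu4(ref_b, cand_b)
```

In this sketch, changing a single word in a six-word sentence removes every matching 4-gram, so the unsmoothed score falls to zero even though the two sentences are clinically close. This sensitivity to exact phrasing is one reason BLEU-4 values for free-text report generation tend to be numerically small and are usually read alongside ROUGE-L and METEOR.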