Practical problems in artificial intelligence often involve large state and action spaces in which only partial information is available to the agent. In high-dimensional settings, function approximation methods such as neural networks are often used to overcome the limitations of traditional tabular schemes. In reinforcement learning, the actor-critic architecture has received much attention in recent years: an actor network maps states to actions, while a critic network approximates the value function for a given state-action pair. This framework trains two separate networks, and the critic must converge sufficiently before the actor can produce a suitable policy, resulting in duplicated effort in modeling the environment. This paper presents a novel approach that consolidates the actor and critic into a single network providing the functionality of both. We demonstrate the proposed architecture on a partially observable maze learning problem.
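The abstract does not specify the consolidated architecture, but the general idea can be sketched as a single network with a shared trunk feeding two heads: an actor head producing a policy over actions and a critic head producing Q-value estimates for each state-action pair. The sketch below is a minimal NumPy illustration under that assumption; all layer sizes and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, HIDDEN, N_ACTIONS = 8, 32, 4  # illustrative sizes, not from the paper

# Shared trunk: one representation serves both actor and critic,
# avoiding the duplicated environment modeling of two separate networks.
W_trunk = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
b_trunk = np.zeros(HIDDEN)

# Actor head: trunk features -> action logits (policy)
W_actor = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))
b_actor = np.zeros(N_ACTIONS)

# Critic head: trunk features -> one Q-value per action,
# so Q(s, a) is obtained by indexing with the chosen action.
W_critic = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))
b_critic = np.zeros(N_ACTIONS)

def forward(state):
    """One pass through the consolidated network.

    Returns (action_probs, q_values): the actor's policy over actions and
    the critic's Q-value estimates, both computed from shared features.
    """
    h = np.tanh(state @ W_trunk + b_trunk)   # shared representation
    logits = h @ W_actor + b_actor           # actor head
    probs = np.exp(logits - logits.max())    # softmax for a valid policy
    probs /= probs.sum()
    q_values = h @ W_critic + b_critic       # critic head
    return probs, q_values

probs, q_values = forward(rng.normal(size=STATE_DIM))
action = rng.choice(N_ACTIONS, p=probs)
print(probs, q_values[action])
```

Because both heads share the trunk, a single backward pass can update the common representation from both the policy gradient and the value-estimation loss; how the two losses are combined in the paper's method is not stated in the abstract.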