We present a new version of the forward-backward splitting expectation-maximisation network (FBSEM-Net). FBSEM-Net is a deep-learned method that unrolls a maximum a posteriori EM algorithm: the regularisation step is replaced by a residual convolutional neural network (CNN) modelling the gradient of the prior, and the regularisation strength is learned, avoiding any hyperparameter tuning. An additional channel containing a magnetic resonance (MR) image can be supplied to the CNN to guide the PET reconstruction. However, when mismatches exist between the two images, MR-unique structures can appear in the PET image and PET-unique features risk being suppressed. We propose an enhanced version of FBSEM-Net by introducing an anato-functional (AF) step. Based on the MR image and the current PET estimate, Gaussian similarity kernels are computed between every voxel and its first-order neighbours. The two resulting feature vectors are then compared in a similar Gaussian fashion to detect where the structures of the two images differ. These similarity maps are computed at every iteration and multiplied with the global regularisation strength to modulate its effect spatially. The hyperparameters introduced by this step can be learned alongside the parameters of the neural network. We also introduce a new custom loss function that focuses on regions of interest where PET-unique lesions risk being removed. We investigated the benefits of this loss by comparing the performance of networks trained with the mean squared error (MSE) loss, the custom loss, or the sum of both. Results on 2D simulated test data show that the AF step and the custom loss function reduce the impact of MR misguidance in the presence of mismatches between the two modalities. In future work, the performance of the network will be assessed on 3D real data.
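To make the AF step concrete, the following is a minimal 2D NumPy sketch of the similarity-map computation described above. It assumes a 4-voxel first-order neighbourhood, a Gaussian comparison of the PET and MR feature vectors, periodic boundary handling via np.roll, and illustrative sigma values; the function names and the exact comparison rule are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def neighbor_features(img, sigma):
    """Gaussian similarity of each voxel to its four first-order neighbours (2D).

    Returns an array of shape (H, W, 4): one feature vector per voxel.
    """
    feats = []
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        # np.roll wraps at the image border; a simplification for this sketch.
        diff = img - np.roll(img, shift, axis=axis)
        feats.append(np.exp(-diff**2 / (2.0 * sigma**2)))
    return np.stack(feats, axis=-1)

def af_similarity_map(pet, mr, sigma_pet=0.1, sigma_mr=0.1, sigma_cmp=0.5):
    """Voxel-wise agreement between PET and MR neighbourhood structure.

    Values close to 1 where the two modalities agree and close to 0 where
    they differ, so multiplying the global regularisation strength by this
    map relaxes MR guidance around mismatched structures.
    """
    f_pet = neighbor_features(pet, sigma_pet)
    f_mr = neighbor_features(mr, sigma_mr)
    dist2 = np.sum((f_pet - f_mr)**2, axis=-1)    # compare the two feature vectors
    return np.exp(-dist2 / (2.0 * sigma_cmp**2))  # Gaussian comparison (assumed form)
```

In the unrolled network, such a map would be recomputed from the current PET estimate at every iteration, with the sigma parameters treated as learnable quantities alongside the CNN weights.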