  • 15-01-2020

    Paper accepted at ECAI 2020

    Our paper with M.Bahri (Telecom ParisTech), A.Bifet (Telecom ParisTech), S.Maniu (ParisSud), and R.Mello (University of São Paulo), "Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams" has been accepted for publication at ECAI 2020.
  • 14-01-2020

    PC Member at KDD and IJCAI-PRICAI 2020

    I will be on the program committee of KDD 2020 and IJCAI-PRICAI 2020!
  • 22-04-2019

    Tutorial accepted at KDD, IJCAI, and ECML-PKDD

    Our tutorial with B. Cautis and S. Maniu, entitled "Adaptive Influence Maximization", has been accepted for presentation at KDD'19 (Alaska, USA, August 2019), IJCAI'19 (Macau, China, August 2019), and ECML-PKDD'19 (Würzburg, Germany, September 2019).
  • 13-01-2019

    PC Member at KDD and ECML-PKDD 2019

    I will be on the program committee of KDD 2019 and ECML-PKDD 2019!
  • 13-12-2018

    Alliance Joint Research Grant

    Our joint research proposal with Columbia University, entitled "e‐discovering (Astro) Physics Laws Using Machine Learning", will be funded by Alliance.
  • 30-07-2018

    Two papers accepted at European Workshop of Reinforcement Learning

    Two papers have been accepted for presentation at EWRL'18 (Lille, France, October 2018): i) "Randomised Bayesian Least-Squares Policy Iteration", with C. Dimitrakakis and M. Vazirgiannis, and ii) "Reinforcement learning for supply chain optimisation", with L. Kemmer, H. Kleist, D. Rochebouët, and J. Read.
  • 19-07-2018

    Paper accepted at Pattern Analysis & Applications Journal

    Our paper with Jesse Read and Michalis Vazirgiannis, entitled "Error-space Representations for Multi-dimensional Data-Streams with Temporal Dependence", has been accepted for publication in the Pattern Analysis and Applications journal!
  • 21-06-2018

    PC Member at AAAI 2019

    I will be on the program committee of AAAI 2019!
  • 18-06-2018

    Paper accepted at ASONAM 2018

    Our paper with G. Salha and M. Vazirgiannis, "Adaptive Submodular Influence Maximization with Myopic Feedback" has been accepted for publication at ASONAM 2018.
  • 20-02-2018

    PC Member at ECML-PKDD 2018

    I will be on the program committee of ECML-PKDD 2018!
  • 22-06-2017

    Paper accepted at ECML-PKDD 2017

    Our paper with C. Dimitrakakis, "Bayesian Inference for Least Squares Temporal Difference Regularization", has been accepted for oral and poster presentation at ECML-PKDD 2017.
  • 05-05-2017

    NVIDIA GPU Grant

    My NVIDIA GPU Grant request has been approved. NVIDIA Corporation is going to support my research with the donation of a Titan X Pascal GPU.
  • 30-01-2017

    PC Member at ECML-PKDD, IJCAI, and AAAI 2017

    I will be on the program committee of ECML-PKDD, IJCAI and AAAI 2017!

Education & Training

  • PhD 2010-2015

    PhD in Computer Science & Engineering
    PhD Thesis: Machine Learning for Intelligent Agents

    Department of Computer Science & Engineering, University of Ioannina

  • MSc 2008-2010

    Master in Computer Science
    MSc Thesis: Autonomous Mobile Robot Navigation using Reinforcement Learning

    Department of Computer Science, University of Ioannina

  • BSc 2002-2007

    Bachelor in Computer Science

    Department of Computer Science, University of Ioannina

Predicting conversions in display advertising based on URL embeddings

Y. Qiu, N. Tziortziotis, M. Hue, and M. Vazirgiannis
Conference Paper AdKDD 2020 workshop at KDD'20 conference, San Diego, USA

Online display advertising has grown rapidly in recent years thanks to the automation of the ad buying process. Real-time bidding (RTB) allows the automated trading of ad impressions between advertisers and publishers through real-time auctions. In order to increase the effectiveness of their campaigns, advertisers should deliver ads to the users who are highly likely to convert (e.g., purchase, registration, website visit) in the near future. In this study, we introduce and examine different models for estimating the probability of a user converting, given their history of visited URLs. Inspired by natural language processing, we introduce three URL embedding models to compute semantically meaningful URL representations. To demonstrate the effectiveness of the proposed representation and conversion prediction models, we have conducted experiments on real logged events collected from an advertising platform.

Compressed k-Nearest Neighbors Ensembles for Evolving Data Streams

M. Bahri, A. Bifet, S. Maniu, R. Mello, and N. Tziortziotis
Conference Paper 24th European Conference on Artificial Intelligence (ECAI 2020), Santiago de Compostela, Spain.

The unbounded and multidimensional nature of streams, the evolution of data distributions over time, and the requirement of single-pass algorithms are the main challenges of data stream classification, making it impossible to infer learning models in the same manner as for batch scenarios. Data dimensionality reduction arises as a key factor to transform and select only the most relevant features from those streams in order to reduce algorithm space and time demands. In that context, Compressed Sensing (CS) encodes an input signal into a lower-dimensional space, guaranteeing its reconstruction up to some distortion factor ε. This paper employs CS on data streams as a pre-processing step to support a k-Nearest Neighbors (kNN) classification algorithm, one of the most often used algorithms in the data stream mining area, while ensuring that the key properties of CS hold. Based on topological properties, we show that our classification algorithm also preserves the neighborhood (within an ε factor) of kNN after reducing the stream dimensionality with CS. As a consequence, end-users can set an acceptable error margin while performing such projections for kNN. For further improvements, we incorporate this method into an ensemble classifier, Leveraging Bagging, by combining a set of different CS matrices, which increases the diversity inside the ensemble. An extensive set of experiments is performed on various datasets, and the results are compared against those yielded by current state-of-the-art approaches, confirming the good performance of our approaches.
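As a rough illustration of the distance-preservation idea mentioned in the abstract above, the sketch below projects high-dimensional points with a random Gaussian matrix and compares one pairwise distance before and after. It is not the paper's algorithm or its CS matrices; the function names, dimensions, and seed are assumptions chosen only for this example.

```python
import math
import random


def random_projection_matrix(d_in, d_out, rng):
    """Gaussian matrix scaled so squared norms are preserved in expectation."""
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, x):
    """Map a d_in-dimensional point to d_out dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


rng = random.Random(0)          # fixed seed, illustrative only
d_in, d_out = 200, 50           # hypothetical dimensions
points = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(20)]

matrix = random_projection_matrix(d_in, d_out, rng)
compressed = [project(matrix, p) for p in points]

# Ratio of compressed to original distance; values near 1 mean low distortion.
ratio = euclidean(compressed[0], compressed[1]) / euclidean(points[0], points[1])
```

A kNN search run on `compressed` instead of `points` touches 50 coordinates per distance rather than 200, which is where the space and time savings would come from in a setting like this.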

Boosting Tricks for Word Mover’s Distance

K. Skianis, F. Malliaros, N. Tziortziotis, and M. Vazirgiannis
Conference Paper 29th International Conference on Artificial Neural Networks (ICANN 2020).

Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover’s Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors, and convex metric learning constitute powerful tools that can boost WMD.

Adaptive Influence Maximization

B. Cautis, S. Maniu, N. Tziortziotis
Tutorial KDD'19 (Alaska, USA, August 2019), IJCAI'19 (Macau, China, August 2019), and ECML-PKDD'19 (Würzburg, Germany, September 2019).
∗ Author names appear in alphabetical order.

Information diffusion and social influence are more and more present in today's Web ecosystem. Having algorithms that optimize the presence and message diffusion on social media is indeed crucial to all actors (media companies, political parties, corporations, etc.) who advertise on the Web. Motivated by the need for effective viral marketing strategies, influence estimation and influence maximization have therefore become important research problems, leading to a plethora of methods. However, the majority of these methods are non-adaptive, and therefore not appropriate for scenarios in which influence campaigns may be run and observed over multiple rounds, nor for scenarios which cannot assume full knowledge of the diffusion networks and the ways information spreads in them.

In this tutorial we intend to present the recent research on adaptive influence maximization, which aims to address these limitations. This can be seen as a particular case of the influence maximization problem (seeds in a social graph are selected to maximize information spread), one in which the decisions are taken as the influence campaign unfolds, over multiple rounds, and where knowledge about the graph topology and the influence process may be partial or even entirely missing. This setting, depending on the underlying assumptions, leads to varied and original approaches and algorithmic techniques, as we have witnessed in recent literature. We will review the most relevant research in this area, by organizing it along several key dimensions, and by discussing the methods' advantages and shortcomings, along with open research questions and the practical aspects of their implementation. Tutorial slides will become publicly available on

Orthogonal Matching Pursuit for Text Classification

K. Skianis, N. Tziortziotis, and M. Vazirgiannis
Conference Paper 4th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2018, Brussels, Belgium, November 2018.

In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard Group OMP by introducing overlapping group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models.

Randomised Bayesian Least-Squares Policy Iteration

N. Tziortziotis, C. Dimitrakakis, and M. Vazirgiannis
Conference Paper 14th European Workshop on Reinforcement Learning (EWRL 2018), Lille, France, October 2018.

We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies. An online variant of BLSPI, called randomised BLSPI (RBLSPI), is also proposed; it improves its policy based on an incomplete policy evaluation step. In the online setting, the exploration-exploitation dilemma must be addressed, as we try to discover the optimal policy by using samples collected by ourselves. RBLSPI exploits the ability of BLSTD to quantify our uncertainty about the value function. Inspired by Thompson sampling, RBLSPI first samples a value function from a posterior distribution over value functions, and then selects actions based on the sampled value function. The effectiveness and the exploration abilities of RBLSPI are demonstrated experimentally in several environments.

Reinforcement learning for supply chain optimization

Lukas Kemmer, Henrik von Kleist, Diego María De Grimaudet De Rochebouët, N. Tziortziotis and Jesse Read
Conference Paper 14th European Workshop on Reinforcement Learning (EWRL 2018), Lille, France, October 2018.

In this paper we investigate the performance of two reinforcement learning (RL) agents within a supply chain optimization environment. We model the environment as a Markov decision process (MDP) where, during each step, it needs to be decided how many products should be produced in a factory and how many products should be shipped to different warehouses. We then design three different agents based on a static (ς, Q)-policy, the approximate SARSA and the REINFORCE algorithm. Here we pay special attention to different feature mapping functions that are used to model the value of state and state-action pairs respectively. By testing the agents in different environment initializations, we find that both the approximate SARSA and the REINFORCE algorithms can outperform the static (ς, Q) agent in simple scenarios and that the REINFORCE agent performs best even in more complex settings.

MATI: An Efficient Algorithm for Influence Maximization in Social Networks

M. G. Rossi, B. Shi, N. Tziortziotis, F.D. Malliaros, C. Giatsidis, and M. Vazirgiannis
Journal Paper PLOS One, 13(11), 2018

Influence maximization has attracted a lot of attention due to its numerous applications, including diffusion of social movements, the spread of news, viral marketing and outbreak of diseases. The objective is to discover a group of users that are able to maximize the spread of influence across a network. The greedy algorithm gives a solution to the Influence Maximization problem while having a good approximation ratio. Nevertheless, it does not scale well to large-scale datasets. In this paper, we propose Matrix Influence, MATI, an efficient algorithm that can be used under both the Linear Threshold and Independent Cascade diffusion models. MATI is based on the precalculation of the influence by taking advantage of the simple paths in the node’s neighborhood. An extensive empirical analysis has been performed on multiple real-world datasets showing that MATI has competitive performance when compared to other well-known algorithms with regard to running time and expected influence spread.

Error-space representations for multi-dimensional data streams with temporal dependence

J. Read, N. Tziortziotis, M. Vazirgiannis
Journal Paper Pattern Analysis and Applications (PAA), 2018.

In many application scenarios data points are not only temporally dependent, but also expected in the form of a fast-moving stream. A broad selection of efficient learning algorithms exists which may be applied to data streams, but they typically do not take into account the temporal nature of the data. We motivate and design a method which creates an efficient representation of a data stream, where temporal information is embedded into each instance via the error space of forecasting models. Unlike many other methods in the literature, our approach can be rapidly initialized and does not require iterations over the full data sequence; thus it is suitable for a streaming scenario. This allows the application of off-the-shelf data-stream methods, depending on the application domain. In this paper we investigate classification. We compare to a large variety of methods (auto-encoders, HMMs, basis functions, clustering methodologies, and PCA) and find that our proposed method performs very competitively and offers much promise for future work.

Adaptive Submodular Influence Maximization with Myopic Feedback

G. Salha*, N. Tziortziotis*, M. Vazirgiannis
Conference Paper 10th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018), Barcelona, Spain, August 2018.
∗ These authors contributed equally to this work.

This paper examines the problem of adaptive influence maximization in social networks. As adaptive decision making is a time-critical task, a realistic feedback model, called myopic, is considered. In this direction, we propose the myopic adaptive greedy policy, which is guaranteed to provide a (1 - 1/e)-approximation of the optimal policy under a variant of the independent cascade diffusion model. This strategy maximizes an alternative utility function that has been proven to be adaptive monotone and adaptive submodular. The proposed utility function considers the cumulative number of active nodes over time, instead of the total number of active nodes at the end of the diffusion. Our empirical analysis on real-world social networks reveals the benefits of the proposed myopic strategy, validating our theoretical results.

Bayesian Inference for Least Squares Temporal Difference Regularization

N. Tziortziotis, C. Dimitrakakis
Conference Paper 27th European Conference on Machine Learning (ECML 2017), Skopje, September 2017.

This paper proposes a fully Bayesian approach for Least-Squares Temporal Differences (LSTD), resulting in fully probabilistic inference of value functions that avoids the overfitting commonly experienced with classical LSTD when the number of features is larger than the number of samples. Sparse Bayesian learning provides an elegant solution through the introduction of a prior over value function parameters. This gives us the advantages of probabilistic predictions, a sparse model, and good generalisation capabilities, as irrelevant parameters are marginalised out. The algorithm efficiently approximates the posterior distribution through variational inference. We demonstrate experimentally the algorithm's ability to avoid overfitting.

MATI: An Efficient Algorithm for Influence Maximization in Social Networks

M. G. Rossi, B. Shi, N. Tziortziotis, F.D. Malliaros, C. Giatsidis, and M. Vazirgiannis
Conference Paper 6th International Conference on Complex Networks and their Applications (Complex Networks 2017), Lyon, France, November 2017.

In this study, we propose MATI, an efficient IM algorithm under both the LT and IC models. By taking advantage of the possible paths that are created in each node’s neighborhood, we have designed an algorithm that succeeds in locating the users that can maximize the influence in a social network while also being scalable for large datasets. In order to limit the computation of the possible paths and the respective probabilities of them being “active”, we use a pruning threshold θ that reduces the running time but also the accuracy of the influence computation. Extensive experiments show that MATI has competitive performance when compared with the baseline methods both in terms of influence and computation time.

A k-core Decomposition Framework for Graph Clustering

C. Giatsidis, F. D. Malliaros, N. Tziortziotis, C. Dhanjal, E. Kiagias, D. M. Thilikos, M. Vazirgiannis
Journal Paper CoRR, 2016.


Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional techniques for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided by its core expansion sequence, an ordered partition of the graph into different levels according to the k-core decomposition. Such a partition provides an efficient way to process the graph in an incremental manner that preserves its clustering structure, while making the execution of the chosen clustering algorithm much faster due to the smaller size of the graph's partitions on which the algorithm operates. An experimental analysis on a multitude of real and synthetic data demonstrates that our approach can be applied to any clustering algorithm, accelerating the clustering process, while the quality of the clustering structure is preserved or even improved.

A Bayesian Ensemble Regression Framework on the Angry Birds Game

N. Tziortziotis, G. Papagiannis, K. Blekas
Journal Paper IEEE Transactions on Computational Intelligence and AI in Games (TCIAIG), 8(2):104-115, 2016


In this article we introduce AngryBER, an intelligent agent architecture on the Angry Birds domain that employs a Bayesian ensemble inference mechanism to promote decision making abilities. It is based on an efficient tree-like structure for encoding and representing game screenshots, where it exploits its enhanced modeling capabilities. This has the advantage of establishing an informative feature space and translating the task of game playing into a regression analysis problem. A Bayesian ensemble regression framework is presented by considering that every combination of object material and bird type has its own regression model. We address the problem of action selection as a multi-armed bandit problem, where the Upper Confidence Bound (UCB) strategy is used. An efficient online learning procedure has also been developed for training the regression models. We have evaluated the proposed methodology on several game levels, and compared its performance with published results of all agents that participated in the 2013 and 2014 Angry Birds AI competitions. The superiority of the new method is readily deduced by inspecting the reported results.
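For readers unfamiliar with the UCB strategy named in the abstract above, here is a minimal sketch of the textbook UCB1 rule for a multi-armed bandit. It is a generic illustration, not AngryBER's actual action-selection code; the function names and the exploration constant `c` are assumptions made for this example.

```python
import math


def ucb1_select(counts, means, c=2.0):
    """Return the index of the arm with the highest UCB1 score.

    counts[i] is how often arm i was played, means[i] its average reward.
    Arms that were never played are tried first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + math.sqrt(c * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, means, arm, reward):
    """Incrementally update the running mean reward of the played arm."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]
```

The bonus term shrinks as an arm is played more often, so rarely tried arms keep getting revisited until their estimated reward is clearly worse than the best one.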

Machine Learning for Intelligent Agents

N. Tziortziotis
PhD Thesis Department of Computer Science & Engineering, University of Ioannina, Greece, March 2015


This dissertation studies the problem of developing intelligent agents, which are able to acquire skills in an autonomous way, simulating human behaviour. An autonomous intelligent agent acts effectively in an unknown environment, directing its activity towards achieving a specific goal based on some performance measure. Through this interaction, a rich amount of information is received, which allows the agent to perceive the consequences of its actions, identify important behavioural components, and adapt its behaviour through learning. In this direction, the present dissertation concerns the development, implementation and evaluation of machine learning techniques for building intelligent agents. Three important and very challenging tasks are considered: i) approximate reinforcement learning, where the agent’s policy is evaluated and improved through the approximation of the value function, ii) Bayesian reinforcement learning, where the reinforcement learning problem is modeled as a decision-theoretic problem, by placing a prior distribution over Markov Decision Processes (MDPs) that encodes the agent’s belief about the true environment, and iii) development of intelligent agents on games, which constitute a really challenging platform for developing machine learning methodologies, involving a number of issues that should be resolved, such as the appropriate choice of state representation, continuous action spaces, etc.

Quality Optimization of H.264/AVC Video Transmission over Noisy Environments Using a Sparse Regression Framework

K. Pandremmenou, N. Tziortziotis, S. Paluri, W. Zhang, K. Blekas, L. P. Kondi, S. Kumar
Conference Paper Visual Information Processing and Communication VI, Proceedings of SPIE-IS&T Electronic Imaging, San Francisco, CA, February 2015.


We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method in order to predict the Cumulative Mean Squared Error (CMSE) incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method has the benefit of not only keeping a subset of the features that have the strongest effects on video quality, but also producing accurate CMSE predictions. In particular, we study the LASSO regression through two different architectures: the Global LASSO (G.LASSO) and the Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values of some features are closely dependent on the considered slice type, each slice type has its own regression model, in an effort to improve LASSO’s prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The provided results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features.

Cover Tree Bayesian Reinforcement Learning

N. Tziortziotis, C. Dimitrakakis, K. Blekas
Journal Paper Journal of Machine Learning Research (JMLR), 15:2313-2335, 2014


This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.

The Reinforcement Learning Competition

C. Dimitrakakis, G. Li, N. Tziortziotis
Magazine Paper Artificial Intelligence (AI) Magazine, 2014


Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

Usable ABC Reinforcement Learning

C. Dimitrakakis, N. Tziortziotis
Conference Paper Advances in Neural Information Processing Systems 27 (NIPS 2014), ABC in Montreal workshop, Montreal, Canada, December 2014

The issues with the use of Approximate Bayesian Computation in Reinforcement Learning are the following. Firstly, the model set may comprise simulators which are purely deterministic. Secondly, there is a dependence between the policy used and the data collected, which necessitates maintaining a representation of the policy used as well as the data history. Thirdly, there is the question of the statistics used. Finally, there is the problem of selecting a policy given the data observed so far. In this paper, we report some progress on using more sophisticated statistics and policy search algorithms and show that they have a significant impact.

A Bayesian Ensemble Regression Framework on the Angry Birds Game

N. Tziortziotis, G. Papagiannis, K. Blekas
Conference Paper ECAI Symposium on Artificial Intelligence in Angry Birds, Prague, Czech Republic, August 2014.
Second place in the Angry Birds AI Competition 2014.

An ensemble inference mechanism is proposed on the Angry Birds domain. It is based on an efficient tree structure for encoding and representing game screenshots, where it exploits its enhanced modeling capability. This has the advantage of establishing an informative feature space and turning the task of game playing into a regression analysis problem. To this end, we assume that each pair of object material and bird type has its own Bayesian linear regression model. In this way, a multi-model regression framework is designed that simultaneously calculates the conditional expectations of several objects and makes a target decision through an ensemble of regression models. The learning procedure is performed according to an online estimation strategy for the model parameters. We provide comparative experimental results on several game levels that empirically illustrate the efficiency of the proposed methodology.

Play Ms. Pac-Man using an Advanced Reinforcement Learning Agent

N. Tziortziotis, K. Tziortziotis and K. Blekas
Conference Paper 8th Hellenic Conference on Artificial Intelligence (SETN 2014), Ioannina, Greece, May 2014.

Reinforcement Learning (RL) algorithms have been promising methods for designing intelligent agents in games. Although their capability of learning in real time has already been proven, the high dimensionality of state spaces in most game domains can be seen as a significant barrier. This paper studies the popular arcade video game Ms. Pac-Man and outlines an approach to deal with its large dynamical environment. Our motivation is to demonstrate that an abstract but informative state space description plays a key role in the design of efficient RL agents. Thus, we can speed up the learning process without the necessity of Q-function approximation. Several experiments were conducted using the multiagent MASON platform, where we measured the ability of the approach to reach optimum generic policies, which enhances its generalization abilities.

ABC Reinforcement Learning

C. Dimitrakakis, N. Tziortziotis
Conference Paper 30th International Conference on Machine Learning (ICML 2013), Atlanta, USA, June 2013, JMLR W&CP 28(3):684-692.

We introduce a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The advantage is that we only require a prior distribution on a class of simulators. This is useful when a probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique in this case. It can be seen as an extension of simulation methods to both planning and inference. We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is sound.

Linear Bayesian Reinforcement Learning

N. Tziortziotis, C. Dimitrakakis, K. Blekas
Conference Paper 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), Beijing, China, August 2013.

This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.

Resource Allocation in Visual Sensor Networks Using a Reinforcement Learning Framework

K. Pandremmenou, N. Tziortziotis, L. P. Kondi, K. Blekas
Conference Paper 18th IEEE International Conference on Digital Signal Processing (DSP), Santorini, Greece, July 2013.

In recent years, video delivery over wireless visual sensor networks (VSNs) has gained increasing attention. The lossy compression and channel errors that occur during wireless multimedia transmissions can degrade the quality of the transmitted video sequences. This paper addresses the problem of cross-layer resource allocation among the nodes of a wireless direct-sequence code division multiple access (DS-CDMA) VSN. The optimal group of pictures (GoP) length during the encoding process is also considered, based on the motion level of each video sequence. Three optimization criteria are used, each optimizing a different objective function of the video qualities of the nodes. The nodes' transmission parameters, i.e., the source coding rates, channel coding rates and power levels, can only take discrete values. In order to tackle the resulting optimization problem, a reinforcement learning (RL) strategy that promises efficient exploration and exploitation of the parameter space is employed. This makes the proposed methodology usable in large or continuous state spaces as well as in an online mode. Experimental results highlight the efficiency of the proposed method.

Model-based Reinforcement learning using online clustering

N. Tziortziotis, K. Blekas
Conference Paper 24th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2012), Piraeus, Greece, November 2012.

A significant issue in representing reinforcement learning agents in Markov decision processes is how to design efficient feature spaces in order to estimate an optimal policy. This study addresses this challenge by proposing a compact framework that employs an on-line clustering approach for constructing appropriate basis functions. It also performs a state-action trajectory analysis to gain valuable affinity information among clusters and estimate their transition dynamics. Value function approximation is used for policy evaluation in a least-squares temporal difference framework. The proposed method is evaluated in several simulated and real environments, where it yielded promising results.

An online kernel-based clustering approach for value function approximation

N. Tziortziotis, K. Blekas
Conference Paper 7th Hellenic Conference on Artificial Intelligence (SETN 2012), Lamia, Greece, May 2012.

Value function approximation is a critical task in solving Markov decision processes and accurately modeling reinforcement learning agents. A significant issue is how to construct efficient feature spaces from samples collected from the environment in order to obtain an optimal policy. This study addresses this challenge by proposing an on-line kernel-based clustering approach for building appropriate basis functions during the learning process. The method uses a kernel function capable of handling state-action pairs as they are sequentially generated by the agent. At each time step, the procedure either adds a new cluster or adjusts the winning cluster’s parameters. By considering the value function as a linear combination of the constructed basis functions, the weights are optimized in a temporal-difference framework in order to minimize the Bellman approximation error. The proposed method is evaluated in numerous known simulated environments.
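
The "add a new cluster or adjust the winning cluster" rule can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the Gaussian kernel, the similarity `threshold`, the running-mean update, and the encoding of a state-action pair as one flat vector are all assumptions chosen for simplicity.

```python
import math


def gaussian_kernel(x, centre, width=1.0):
    """Similarity between a sample x and a cluster centre (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def update_clusters(clusters, x, threshold=0.5, width=1.0):
    """Add a new cluster for x, or move the winning cluster towards x.

    Each cluster is a dict with a 'centre' (list of floats) and a 'count'.
    """
    if clusters:
        # The winner is the cluster most similar to the new sample.
        winner = max(clusters, key=lambda k: gaussian_kernel(x, k["centre"], width))
        if gaussian_kernel(x, winner["centre"], width) >= threshold:
            winner["count"] += 1
            rate = 1.0 / winner["count"]  # running-mean step size
            winner["centre"] = [c + rate * (a - c) for a, c in zip(x, winner["centre"])]
            return clusters
    # No cluster is similar enough: start a new one at the sample.
    clusters.append({"centre": list(x), "count": 1})
    return clusters


clusters = []
for sample in ([0.0, 0.0], [0.2, 0.0], [5.0, 5.0]):
    update_clusters(clusters, sample)
```

In this sketch each resulting centre would define one basis function, so the size of the feature space grows only when the agent visits a region that no existing cluster covers.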

Value Function Approximation through Sparse Bayesian Modeling

N. Tziortziotis, K. Blekas
Conference Paper 9th European Workshop on Reinforcement Learning (EWRL-9), Athens, Greece, September 2011.

In this study we present a sparse Bayesian framework for value function approximation. The proposed method is based on the on-line construction of a dictionary of states which are collected during the exploration of the environment by the agent. A linear regression model is established for the observed partial discounted return of such dictionary states, where we employ the Relevance Vector Machine (RVM) and exploit its enhanced modeling capability due to the embedded sparsity properties. In order to speed up the optimization procedure and handle large-scale problems, an incremental strategy is adopted. A number of experiments have been conducted on both simulated and real environments, where the method achieved promising results in comparison with another Bayesian approach that uses Gaussian processes.

A Bayesian Reinforcement Learning framework using Relevant Vector Machines

N. Tziortziotis, K. Blekas
Conference Paper 25th International Conference on Artificial Intelligence (AAAI-2011), San Francisco, USA, August 2011.

In this work we present an advanced Bayesian formulation of the task of control learning that employs the Relevance Vector Machine (RVM) generative model for value function evaluation. The key aspect of the proposed method is the design of the discounted return as a generative linear model that constitutes a well-known probabilistic approach. This allows us to augment the model with the advantageous sparse priors provided by the RVM's regression framework. We have also taken into account the significant issue of selecting the proper parameters of the kernel design matrix. Experiments have shown that our method produces improved performance in both simulated and real test environments.