DataSet management

Figure (datasets.png): Solvers Hierarchy.

DataSet Class

class DataSet : public SolutionContainer

This class represents an optimization problem (a dataset), which consists of a set of points.

enum OptimizationType
enumerator minimization
enumerator maximization

Defines the optimization goal. Defaults to maximization.

Public Members

std::vector<Point*> points

A set of points in the dataset.

std::string filename

Full path to the source file.

OptimizationType typeOfOptimization

Type of this problem.

Constructors

DataSet(int nObjectives = 2)

Default constructor. Creates an empty dataset with the given number of objectives (2 if omitted).

DataSet(const DataSet &dataset)

Copy constructor.

DataSet(const std::string filename, bool normalizedName)

Reads data from a file. If normalizedName is true, meta properties are inferred from the filename (e.g., name_of_experiment_dXXX_nXXX_ZZ).
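The filename convention can be illustrated with a small standalone parser. This is only a sketch and not the library's parsing code: the `ParsedName` struct and `parseNormalizedName` function are hypothetical, and the pattern assumes that `d` is followed by the number of objectives, `n` by the number of points, and the trailing number is the sample index. The underscore between the `d` and `n` fields is treated as optional, so names such as `linear_d4n100_1` are accepted as well.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical holder for the metadata encoded in a normalized filename.
struct ParsedName {
    std::string name;
    int dimensions = 0;
    int nPoints = 0;
    int sample = 0;
};

// Parses names such as "linear_d4_n100_1" or "linear_d4n100_1".
// Returns false (leaving `out` untouched) when the name does not match.
bool parseNormalizedName(const std::string& path, ParsedName& out) {
    // Strip any directory prefix, accepting both '/' and '\' separators.
    const std::size_t slash = path.find_last_of("/\\");
    const std::string base =
        (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Assumed layout: <name>_d<dimensions>[_]n<points>_<sample>
    static const std::regex pattern(R"(^(.+)_d(\d+)_?n(\d+)_(\d+)$)");
    std::smatch m;
    if (!std::regex_match(base, m, pattern)) return false;

    out.name = m[1].str();
    out.dimensions = std::stoi(m[2].str());
    out.nPoints = std::stoi(m[3].str());
    out.sample = std::stoi(m[4].str());
    return true;
}
```

A name with a file extension or without the `d`/`n` fields is rejected by this sketch; in that situation the constructors that take explicit metadata are the safer choice.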

DataSet(const std::string filename, DataSetParameters settings)

Reads data from a file using provided dataset properties.

DataSet(const std::string filename, std::string name, int dimensions, int sample, int nPoints)

Reads data from a file with explicitly provided experiment metadata.

Accessors & Modifiers

Point *getIdeal()

Returns the ideal point of the dataset.

Point *getNadir()

Returns the nadir point of the dataset.

void setIdeal(Point*)

Arbitrarily sets the ideal point.

void setNadir(Point*)

Arbitrarily sets the nadir point.
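For readers unfamiliar with the terms, the sketch below shows one common way to derive ideal and nadir points from a set of points. It does not use the library: `Objectives`, `Bounds` and `computeBounds` are hypothetical names, and the bounds are taken component-wise over whatever points are passed in, which may differ from how the library computes them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

struct Bounds {
    Objectives ideal;
    Objectives nadir;
};

// Computes component-wise bounds of a non-empty set of equally sized points.
// With maximization the ideal point takes the largest value per objective and
// the nadir point the smallest; with minimization the roles are swapped.
Bounds computeBounds(const std::vector<Objectives>& points, bool maximization) {
    Objectives lo = points.front();
    Objectives hi = points.front();
    for (const Objectives& p : points) {
        for (std::size_t i = 0; i < p.size(); ++i) {
            lo[i] = std::min(lo[i], p[i]);
            hi[i] = std::max(hi[i], p[i]);
        }
    }
    return maximization ? Bounds{hi, lo} : Bounds{lo, hi};
}
```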

void setParameters(DataSetParameters settings)
void setDimensionality(int dim)
void setName(std::string name)
void setNumberOfPoints(int npts)
void setSampleNumber(int sampleN)

Setters for dataset metadata.

Stream & File Operations

std::istream &Load(std::istream &stream)

Reads the dataset from an input stream.

void Save(const std::string filename)

Saves the dataset to a file.

static DataSet *LoadFromFilename(const std::string filename)

Factory method to load a dataset from a file.

static std::vector<DataSet*> LoadBulk(const std::string directory)

Loads multiple datasets from a directory.

NDTree<Point> toNDTree()

Converts the dataset to an NDTree.

Functions & Normalization

void normalize()

Normalizes the dataset.
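This page does not state which normalization scheme is applied. A frequent choice for objective vectors is min-max scaling of each objective to the range [0, 1], sketched below under that assumption. `Objectives` and `normalizeMinMax` are hypothetical names and the code is independent of the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// Rescales every objective to [0, 1] in place using min-max scaling.
// An objective whose values are all equal is mapped to 0 to avoid
// dividing by zero.
void normalizeMinMax(std::vector<Objectives>& points) {
    if (points.empty()) return;
    const std::size_t dims = points.front().size();
    for (std::size_t i = 0; i < dims; ++i) {
        double lo = points.front()[i];
        double hi = lo;
        for (const Objectives& p : points) {
            lo = std::min(lo, p[i]);
            hi = std::max(hi, p[i]);
        }
        const double range = hi - lo;
        for (Objectives& p : points) {
            p[i] = (range > 0.0) ? (p[i] - lo) / range : 0.0;
        }
    }
}
```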

void reverseObjectives()

Reverses the objectives for the points.

bool add(Point *point)

Adds a single point to the dataset.

bool remove(Point *point)

Removes the given point from the dataset. Points are matched by value.

void clear()

Clears the dataset.

void RemoveDominated()

Removes all dominated points from the dataset.
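To make the notion of a dominated point concrete, here is a naive quadratic-time filter. It is a sketch only and says nothing about how `RemoveDominated` is implemented; `Objectives`, `dominates` and `removeDominated` are hypothetical names. A point dominates another when it is at least as good in every objective and strictly better in at least one.

```cpp
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// True when a is at least as good as b everywhere and strictly better somewhere.
bool dominates(const Objectives& a, const Objectives& b, bool maximization) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Negating both values turns minimization into maximization.
        const double x = maximization ? a[i] : -a[i];
        const double y = maximization ? b[i] : -b[i];
        if (x < y) return false;
        if (x > y) strictlyBetter = true;
    }
    return strictlyBetter;
}

// Keeps only the points that no other point dominates.
// Exact duplicates do not dominate each other, so all copies are kept.
std::vector<Objectives> removeDominated(const std::vector<Objectives>& points,
                                        bool maximization) {
    std::vector<Objectives> kept;
    for (std::size_t i = 0; i < points.size(); ++i) {
        bool dominated = false;
        for (std::size_t j = 0; j < points.size() && !dominated; ++j) {
            dominated = (j != i) && dominates(points[j], points[i], maximization);
        }
        if (!dominated) kept.push_back(points[i]);
    }
    return kept;
}
```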

DataSetParameters getParameters()

Returns the metadata of the dataset (such as the filename and the number of objectives) as a DataSetParameters object.

Grouping

static std::vector<std::vector<DataSet>> BulkGroup(std::vector<DataSet> problems, ProblemGrouping grouping)

Groups a set of datasets according to the given criterion. The criterion is one of the values of enum ProblemGrouping { Name, Dimensionality, NameDimensionality }.
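The grouping idea can be sketched without the library by bucketing items under a key derived from the chosen criterion. Everything here is hypothetical: `Meta` is a reduced stand-in for dataset metadata, `Grouping` mirrors the three criteria listed above, and the order of the returned groups (sorted by key) is a property of this sketch, not a documented property of `BulkGroup`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, reduced view of dataset metadata used only for this sketch.
struct Meta {
    std::string name;
    int dimensions;
};

enum class Grouping { Name, Dimensionality, NameDimensionality };

// Builds the string key under which an item is grouped.
std::string groupKey(const Meta& m, Grouping g) {
    switch (g) {
        case Grouping::Name:
            return m.name;
        case Grouping::Dimensionality:
            return std::to_string(m.dimensions);
        case Grouping::NameDimensionality:
            return m.name + "|" + std::to_string(m.dimensions);
    }
    return std::string();
}

// Partitions the input into groups that share a key.
// std::map keeps the buckets sorted by key, so the output order is stable.
std::vector<std::vector<Meta>> bulkGroup(const std::vector<Meta>& items, Grouping g) {
    std::map<std::string, std::vector<Meta>> buckets;
    for (const Meta& m : items) buckets[groupKey(m, g)].push_back(m);

    std::vector<std::vector<Meta>> groups;
    for (const auto& entry : buckets) groups.push_back(entry.second);
    return groups;
}
```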

DataSetParameters

class DataSetParameters

Configuration parameters for defining and identifying datasets within the experiments.

OptimizationType optType

Deprecated since version 1.0: This member is currently unused.

std::string filename

The path or filename associated with the dataset.

std::string name

The descriptive name of the experiment.

int NumberOfObjectives

The dimensionality (number of objectives) of the dataset.

int nPoints

The total number of points contained in the dataset.

int sampleNumber

The specific sample index for the experiment.

DataSetParameters()

Non-parameterized constructor.

DataSetParameters(std::string name, int dimensions, int nPoints, int sampleNumber)

Parameterized constructor to initialize dataset properties.

DataSetParameters(std::string filename)

Constructs the object by parsing metadata directly from the provided filename.

~DataSetParameters()

Destructor for the class.

enum OptimizationType

Defines the optimization goal.

enumerator max

Maximization problem.

enumerator min

Minimization problem.

NDTree class

template<class Solution>
class NDTree

A tree-based data structure used for maintaining a non-dominated set of solutions.

std::vector<Solution*> listSet

Internal storage for solutions contained within the archive.

Point NadirPoint

The nadir point of the current non-dominated set.

Point IdealPoint

The ideal point of the current non-dominated set.

int NumberOfObjectives

The dimensionality of the objective space.

bool maximization

Determines if the tree is configured for maximization (true) or minimization (false).

TreeNode<Solution> *root

Pointer to the root node of the TreeNode structure.

void saveToList()

Synchronizes the tree structure to the listSet.

std::string listToString()

Serializes the contents of the non-dominated set to a string.

long numberOfSolutions()

Returns the total count of non-dominated solutions stored.

virtual bool isDominated(Point &ComparedSolution)

Checks if the provided solution is dominated by any point currently in the archive.

virtual bool update(Point &NewSolution, bool checkDominance)

Attempts to insert a new solution into the archive, optionally performing a dominance check.

virtual bool update(Point &NewSolution)

Standard update method to insert or reject a solution based on current archive state.
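The insert-or-reject behaviour of a non-dominated archive can be shown with a flat list in place of the tree. This is an illustration of the general archive update rule only: `Archive` and `weaklyDominates` are hypothetical names, maximization is assumed, and the sketch has none of the indexing that makes a tree-based structure efficient.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// True when a is no worse than b in every objective (maximization assumed).
bool weaklyDominates(const Objectives& a, const Objectives& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] < b[i]) return false;
    }
    return true;
}

// Flat-list stand-in for a non-dominated archive (no tree indexing).
struct Archive {
    std::vector<Objectives> members;

    // Returns true if the candidate was inserted, false if it was rejected.
    bool update(const Objectives& candidate) {
        // Reject candidates that an existing member covers (including duplicates).
        for (const Objectives& m : members) {
            if (weaklyDominates(m, candidate)) return false;
        }
        // Drop every member that the candidate now covers.
        members.erase(
            std::remove_if(members.begin(), members.end(),
                           [&](const Objectives& m) {
                               return weaklyDominates(candidate, m);
                           }),
            members.end());
        members.push_back(candidate);
        return true;
    }
};
```

After any sequence of `update` calls, `members` contains only mutually non-dominated points, which is the invariant such an archive is meant to maintain.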

void Save(char *FileName)

Persists the non-dominated set to a file.

void DeleteAll()

Clears all stored solutions and resets the tree.

DataSet *toDataSet()

Converts the archive into a DataSet object.

NDTree(bool maximization = true)

Constructs an empty NDTree.

NDTree(DataSet dataset, bool maximization = true)

Constructs an NDTree initialized with an existing DataSet.

~NDTree()

Destructor for resource cleanup.

Point class

class Point

Represents a point in the objective space for multi-objective optimization problems.

int NumberOfObjectives

The number of objectives defining the space.

DType ObjectiveValues[MAXOBJECTIVES]

The array containing objective values for the point.

Point()

Default constructor.

Point(int NumberOfObjectives)

Parameterized constructor.

Point(const Point &Point)

Copy constructor.

ComparisonResult Compare(Point &point, bool maximization)

Compares this point with another based on the provided maximization flag.
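A pairwise Pareto comparison typically has four possible outcomes. The sketch below shows that classification under stated assumptions: `Relation` and its enumerators are hypothetical labels (the enumerators of the library's `ComparisonResult` are not listed on this page), and `compare` is a free function written for illustration rather than the member function documented above.

```cpp
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// Hypothetical result labels; the library's ComparisonResult may differ.
enum class Relation { Dominating, Dominated, NonDominated, Equal };

// Classifies how point a relates to point b under Pareto dominance.
Relation compare(const Objectives& a, const Objectives& b, bool maximization) {
    bool aBetter = false;  // a wins in at least one objective
    bool bBetter = false;  // b wins in at least one objective
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i]) continue;
        const bool aWins = maximization ? (a[i] > b[i]) : (a[i] < b[i]);
        if (aWins) {
            aBetter = true;
        } else {
            bBetter = true;
        }
    }
    if (aBetter && bBetter) return Relation::NonDominated;
    if (aBetter) return Relation::Dominating;
    if (bBetter) return Relation::Dominated;
    return Relation::Equal;
}
```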

DType operator[](int n) const

Getter operator for objective values.

DType get(int n) const

Getter for objective values.

DType &operator[](int n)

Setter operator for objective values.

std::istream &Load(std::istream &Stream)

Reads the point data from an input stream.

std::ostream &Save(std::ostream &Stream)

Saves objective values to an open output stream, separated by tabs.

DType Distance(Point &ComparedPoint, Point &IdealPoint, Point &NadirPoint)

Calculates the distance between points, normalized by Ideal and Nadir points.
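One plausible reading of a distance "normalized by Ideal and Nadir points" is a Euclidean distance in which each coordinate difference is divided by that objective's ideal-to-nadir range. The sketch below implements that reading only; the exact formula used by `Distance` is not given on this page, and `normalizedDistance` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// Euclidean distance after dividing each coordinate difference by the
// ideal-to-nadir range of that objective.
// Objectives with zero range are skipped to avoid dividing by zero.
double normalizedDistance(const Objectives& a, const Objectives& b,
                          const Objectives& ideal, const Objectives& nadir) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double range = std::fabs(ideal[i] - nadir[i]);
        if (range == 0.0) continue;
        const double d = (a[i] - b[i]) / range;
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

Scaling by the range keeps objectives with large numeric magnitudes from dominating the distance.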

DType CleanChebycheffScalarizingFunctionInverse(std::vector<DType> &weightVector, Point &referencePoint)

Calculates the inverse Clean Chebycheff scalarizing function.

DType CleanChebycheffScalarizingFunctionOriginal(std::vector<DType> &weightVector, Point &referencePoint)

Calculates the original Clean Chebycheff scalarizing function.
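For orientation, the textbook weighted Chebycheff scalarizing function takes the largest weighted absolute deviation from a reference point, max over i of w_i * |z_i - f_i|. The sketch below implements that textbook form; how the library's "clean", "inverse" and "original" variants differ from it is not specified on this page, so `weightedChebycheff` should be read as a hypothetical illustration and not as either member function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Objectives = std::vector<double>;  // stand-in for a point, not moda::Point

// Textbook weighted Chebycheff value: the largest weighted absolute
// deviation of the point from the reference point.
double weightedChebycheff(const Objectives& point,
                          const std::vector<double>& weights,
                          const Objectives& reference) {
    double worst = 0.0;
    for (std::size_t i = 0; i < point.size(); ++i) {
        worst = std::max(worst, weights[i] * std::fabs(reference[i] - point[i]));
    }
    return worst;
}
```

Smaller values mean the point is closer to the reference point along its worst weighted objective, which is why this function is commonly minimized.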

Code Snippets

Creating Dataset Ad-hoc

#include <moda/DataSet.h>
#include <iostream>
int main()
{
   // Create an empty dataset with two objectives.
   moda::DataSet* dataSet = new moda::DataSet(2);
   for (int i = 0; i < 10; i++)
   {
      // Each point lies on the diagonal: (0.0, 0.0), (0.1, 0.1), ..., (0.9, 0.9).
      moda::Point* newPoint = new moda::Point(2);
      newPoint->ObjectiveValues[0] = i * 0.1;
      newPoint->ObjectiveValues[1] = i * 0.1;
      dataSet->add(newPoint);
   }
   return 0;
}

Loading dataset from file

#include <moda/DataSet.h>
#include <iostream>
int main()
{
   moda::DataSet* dataSet = moda::DataSet::LoadFromFilename("C://Users//kubad//moda//moda//sample-file//linear_d4n100_1");
   // getParameters() is documented above as returning DataSetParameters by value.
   std::cout << "DataSet loaded. Number of points: " << dataSet->getParameters().nPoints;
   std::cout << ", Number of objectives: " << dataSet->getParameters().NumberOfObjectives << std::endl;
}

Loading multiple datasets from a single directory

#include <moda/DataSet.h>
#include <iostream>
int main()
{
   std::vector<moda::DataSet*> datasets = moda::DataSet::LoadBulk("C://Users//kubad//Downloads//HVE//HVE//source-code//data//data");
   for (moda::DataSet* dataset : datasets)
   {
      // getParameters() is documented above as returning DataSetParameters by value.
      std::cout << "Dataset: " << dataset->getParameters().filename << std::endl;
   }
}

Pruning dominated points from a dataset with ND-Tree

#include <moda/DataSet.h>
#include <iostream>
int main()
{
   moda::DataSet* dataSet = moda::DataSet::LoadFromFilename("dominated_points");
   std::cout << "Loaded dataset with: " << dataSet->points.size() << " points." << std::endl;
   moda::NDTree<moda::Point> ndtree = dataSet->toNDTree();
   moda::DataSet* nonDominatedDataSet = ndtree.toDataSet();
   std::cout << "Dataset size after ND-Tree pruning: " << nonDominatedDataSet->points.size() << " points." << std::endl;
   moda::IQHVParameters* parameters = new moda::IQHVParameters(moda::SolverParameters::ReferencePointCalculationStyle::zeroone, moda::SolverParameters::ReferencePointCalculationStyle::zeroone);
   moda::IQHVSolver solver;
   auto result = solver.Solve(dataSet, *parameters);
   auto resultPruned = solver.Solve(nonDominatedDataSet, *parameters);
   std::cout << "Solution 1: " << result->HyperVolume << " time: " << result->ElapsedTime << "ms" << std::endl;
   std::cout << "Solution 2: " << resultPruned->HyperVolume << " time: " << resultPruned->ElapsedTime << "ms" << std::endl;
   return 0;
}
// Result:
//   Loaded dataset with: 15 points.
//   Dataset size after ND-Tree pruning: 10 points.
//   Solution 1: 0.3825 time: 1ms
//   Solution 2: 0.3825 time: 0ms