TODO list for `StructureAnalysis`, to be updated and completed. We are targeting the next official release of the package, which must integrate with the openalea CI/CD workflow and work on all platforms.

[ ] Repair `stat_tool` and `sequence_analysis` tests
    + create a new branch for each new development (do not systematically use new_python_api)
    + ~~update tests to be compliant with python 3~~
    + Gather all tutorials and data in consistent repositories
    + ~~Switch to pytest instead of nosetests~~
    + How to generate documentation for both C++ and Python (does Sphinx handle everything?). autodoc / autosummary not working? Probably replace all .rst tutorials with Jupyter notebooks.
    + ~~Fix SConscript for multiple C++ tests~~
    + Default EM initialization in switching LM estimation
    + Memcheck semi-Markov switching LM Simulation (C++ test in mode DEBUG)
    + Try tests in sequence_analysis
    + ~~export set_seed in stat_tool.h, fix C++ functioning~~
    + use valgrind to find memory errors (possibly due to probability computations in NB distributions, Distribution::copy)
    + find the cause of "hidden_semi_markov::likelihood_computation" displaying different results on the same sequence in DEBUG mode (check this behaviour)
    + either make HEADER_OS statements work in sequence_analysis wrappers or remove them.
    + merge implementations of Mixture and MultivariateMixture (same semantics)
    + Add matplotlib output to Survival (remove gnuplot)
    + Translate C++ comments
    + let enumerated types be systematically passed from Python through the wrappers to C++ via an intermediate conversion to int
    + check consistency of naming and methods vs. global functions (develop methods, reduce global functions). Maybe hide some low-level methods defined in low-level wrappers.
    + think about a more automated manner to handle and wrap enumerated types, possibly by replacing them by classes with dictionary and inverse dictionary: {0: "VALUE0", 1: "VALUE1"} and {"VALUE0": 0, "VALUE1": 1}
    + currently, distributions are picked from a closed list and reimplemented from scratch. Consider finding existing C++ libraries with a Python interface (scipy, scikit-learn, boost?) and extending base classes to add estimation methods.
    + Reread the scikit-learn user guide and take inspiration from its API.
    + nosetests raised an error in test_mixture_functional.py (should be solved by robust_path?)
        File "StructureAnalysis/stat_tool/src/openalea/stat_tool/__init__.py", line 24, in get_shared_data
            return pj(shared_data_path, file)
        File "openalea/lib/python3.10/posixpath.py", line 76, in join
            a = os.fspath(a)
        TypeError: expected str, bytes or os.PathLike object, not NoneType
    + python
    + Update C++ version
    + Once downloadable, add a link to INCA-HSMM on the INRAE GitLab
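For the enumerated-type items (passing enums to C++ as int, and keeping a dictionary plus an inverse dictionary), Python's standard `enum.IntEnum` already provides both lookups: `ExampleType(0)` looks a member up by code, `ExampleType["VALUE0"]` by name, and `int(member)` gives the value to hand to a wrapper. A minimal sketch, assuming a hypothetical `ExampleType` enum and a hypothetical helper `to_cpp_code`; it assumes nothing about how the existing wrappers receive enums.

```python
from enum import IntEnum
from typing import Union


class ExampleType(IntEnum):
    """Hypothetical enumerated type; the integer values must match the C++ enum."""

    VALUE0 = 0
    VALUE1 = 1


# Forward and inverse dictionaries, derived from the enum instead of maintained by hand.
CODE_TO_NAME = {member.value: member.name for member in ExampleType}
NAME_TO_CODE = {member.name: member.value for member in ExampleType}


def to_cpp_code(value: Union[ExampleType, str, int]) -> int:
    """Normalize a member, its name or its integer code to the int passed to a wrapper.

    Raises KeyError for an unknown name and ValueError for an unknown code.
    """
    if isinstance(value, str):
        return int(ExampleType[value])
    return int(ExampleType(value))
```

Because the dictionaries are generated from the enum, adding a member on the Python side updates both directions at once; only the integer values still have to be kept in sync with C++.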
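The nosetests traceback shows that `shared_data_path` is `None` by the time `os.path.join` runs, which surfaces as an opaque `TypeError`. A minimal sketch of a guard, assuming `shared_data_path` is passed as a parameter (in the package it is a module-level value) so the snippet stands alone; it only turns the failure into an explicit error naming the file, and does not address how the data directory is discovered.

```python
import os
from pathlib import Path
from typing import Optional, Union


def get_shared_data(file: str,
                    shared_data_path: Optional[Union[str, os.PathLike]] = None) -> Path:
    """Hypothetical variant of get_shared_data that fails with a clear message."""
    if shared_data_path is None:
        # This is the case behind "expected str, bytes or os.PathLike object, not NoneType".
        raise FileNotFoundError(
            f"cannot locate shared data file {file!r}: shared_data_path is None"
        )
    path = Path(shared_data_path) / file
    if not path.exists():
        raise FileNotFoundError(f"shared data file not found: {path}")
    return path
```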
[ ] Add `stat_tool` and `sequence_analysis` documentation, including file formats
[ ] Fix auto-build: missing .conda file?
[ ] Specify and document class design patterns (e.g. should all wrappers be adaptable to a potential `Rcpp` implementation?)
    + Most models are composed of elementary distributions chained or aggregated in some manner. Data sets are linked through pointers to the models estimated from them. When displaying (text) or plotting (matplotlib) distributions, this allows the elementary distributions to be compared with histograms or curves.
    + Currently, multiple dispatch is handled manually through tedious nested tests on types and parameter values. Consider using https://pypi.org/project/multipledispatch/ instead to clarify the code and minimize effort. Multiple dispatch is also currently extended to global functions: for example, if Class1.Estimate(int, str), Class1.Estimate(int, int), Class2.Estimate(int, str) and Class2.Estimate(int, int) exist, Estimate is overloaded both within each class and globally, through a global function Estimate(obj, *args, **kwargs) that calls obj.Estimate(*args, **kwargs). This principle should not be kept.
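The link between a data set and the model estimated from it can be sketched as follows, assuming a hypothetical `HistogramData` class that keeps a reference to a fitted model. The Poisson distribution is a swapped-in example, not one of the stat_tool classes; the point is that the text display can put observed frequencies next to those expected under the attached model.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoissonModel:
    """Hypothetical elementary distribution standing in for a stat_tool distribution."""

    mean: float

    def pmf(self, k: int) -> float:
        return math.exp(-self.mean) * self.mean ** k / math.factorial(k)


@dataclass
class HistogramData:
    """Hypothetical data set holding a reference to the model estimated from it."""

    values: List[int]
    model: Optional[PoissonModel] = field(default=None, repr=False)

    def estimate(self) -> PoissonModel:
        # The maximum-likelihood estimate of the Poisson mean is the sample mean.
        self.model = PoissonModel(sum(self.values) / len(self.values))
        return self.model

    def display(self) -> str:
        """Text display comparing observed and expected frequencies (nan if no model)."""
        counts = Counter(self.values)
        n = len(self.values)
        lines = ["k\tobserved\texpected"]
        for k in range(max(self.values) + 1):
            expected = n * self.model.pmf(k) if self.model is not None else float("nan")
            lines.append(f"{k}\t{counts.get(k, 0)}\t{expected:.2f}")
        return "\n".join(lines)
```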
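To illustrate method-level multiple dispatch without a global `Estimate(obj, ...)`, here is a minimal stdlib-only sketch. `TypeDispatcher` and `HypotheticalModel` are hypothetical names; this is not the multipledispatch API, which additionally resolves subclasses and warns about ambiguous signatures. Here the implementation is selected from the exact types of all positional arguments, so `Estimate(int, str)` and `Estimate(int, int)` no longer need nested type tests.

```python
from typing import Callable, Dict, Tuple


class TypeDispatcher:
    """Minimal multiple-dispatch registry keyed on the exact types of all arguments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: Dict[Tuple[type, ...], Callable] = {}

    def register(self, *types: type) -> Callable[[Callable], Callable]:
        """Decorator registering one implementation for a tuple of argument types."""
        def decorator(func: Callable) -> Callable:
            self.registry[types] = func
            return func
        return decorator

    def __get__(self, instance, owner):
        # Accessed on the class itself: return the dispatcher unchanged.
        if instance is None:
            return self

        def bound(*args):
            # Select the implementation from the types of all positional arguments.
            key = tuple(type(arg) for arg in args)
            func = self.registry.get(key)
            if func is None:
                names = ", ".join(t.__name__ for t in key)
                raise TypeError(f"{owner.__name__}.{self.name}({names}) is not implemented")
            return func(instance, *args)

        return bound


class HypotheticalModel:
    """Hypothetical model class: Estimate is overloaded as a method only."""

    Estimate = TypeDispatcher("Estimate")

    @Estimate.register(int, str)
    def _estimate_with_criterion(self, nb_component, criterion):
        return f"{nb_component} components, criterion={criterion}"

    @Estimate.register(int, int)
    def _estimate_with_iterations(self, nb_component, nb_iteration):
        return f"{nb_component} components, {nb_iteration} iterations"
```

Callers write `HypotheticalModel().Estimate(2, "BIC")`; an unsupported signature such as `Estimate(2.0, 1)` raises `TypeError` instead of falling through nested tests.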

[ ] List priority functionalities to be implemented
    + Several functionalities in stat_tool may already be implemented in scikit-learn (hierarchical clustering, ...). The feasibility of wrapping them has been investigated, but we could instead spend more time on hidden Markov models (semi-Markov, variable-order), which are more specific to the module.