CS019 from Brown University

modified 20/04/2024 22:53

Intro
Big-oh notation
- TODO Big-O lab
Sortacle
Tables
- Virtual Art Store
  - 1
  - 2
  - 3
  - 4
  - 5
- Titanic
  - Program plan
Sets
- How to represent sets as lists:

Intro

I’ve started doing this like 8 months ago, but then got a job and didn’t have the time/motivation to work on this. Today ([2023-10-19 Thu]), I’ve decided to continue working on this, since a lot of the things it teaches are things that I’ve used and have benefited from, as well as prerequisites to more interesting courses that I plan to continue doing (simple data structures, big-oh analysis, well-known cs conepts & so on). I’ll pick up from where I left off – big-oh analysis, specifically the big-oh lab, and the data scripting assignment. I’ll also be writing here as I work through the assignments and readings. Let’s go!

Big-oh notation

essentially allows us to predict stuff about the performance of a program looking only at its source code; not all sciences have something like This
big-oh is a function: input is the size of the initial input, output is how long the function took; big-oh is really just the upper bound (what’s the most time it can take)
O(3k) is really the same as O(k), because the constant (3) doens’t matter, you can just have a bigger c constant. then there’s a question of “what is big enough”? is something like 100000*O(k) big enough to be bigger than O(k^2)? well not really, you just take O(100000*k) <= 100000*O(k^2); but why can’t you then take k as the constant? well, it’s not really a constant anymore: O(k * k) <= k * O(k^2), but since k is a variable number – not a constant, you can’t do that.

TODO Big-O lab

Sortacle

not sure if what i did passes all tests, but it passes all tests i could’ve came up with. however, i think i’m cheating by making is-valid just sort and check? oh, so reading OP’s solution, it seems that he just assumes that the functions will be sorted, and he just checks instead that everything matches.. ok, after reading the paper, it seems that my solution is actually incorect, as it will only work for testing one kind of sorting – ascending, based only on age. instead, the point was to test that ANY sorter function works. i should probably go back and change it…

problem we’re trying to solve (something i’ve also have had to deal with at work) is that we write concrete tests for code that isn’t concrete (in this case, when ages of 2 people are equal, the ordering does not matter, but a test-case where it does would fail, because it would check the wrog thing);
assignment is writing a function that check the relationships, instead of concrete results
basically, we
1. write the random generator
2. then a function that checks pariwise that the elements from the first is <= the element from the other, for every element in the lists,
  - cases here:
    1. [3, 1, 2] and [1, 2, 3]: is 3 > 1: yes -> pop 3 and 1, set 3 as curent max; is 1 > 2: no
3. then a function that takes a function and calls is-valid on a list of 50 randomly-generated results, and the result of applying said function to that list (not sure if 50 is the best number)

Tables

Virtual Art Store

First assignment requires making a program that will find and calculate the price of a artwork using two tables. The code itself doesn’t differ a lot accross them, though, since the logic is essentially the same for all of the tasks.

All of the assignments require a “program plan” (they don’t really mention what that means, so I just did some thinking and came up with a pseudo-code implementation of what I’m supposed to do an went from there, researching the table functions that will allow me to filter data, how to access the values of the rows etc.)

https://dcic-world.org/2023-02-21/processing-tables.html#%28part._task-plans%29 Seems like good advice.

1

For this, we don’t need to handle exceptions in any way. This means we’re not supossed to specifically treat cases where the artwork or the currency isn’t in the table isn’t in the table.

Find artwork by id in artwork table
If artwork has same currency as the one we need, return artwork["cost"]
Otherwise, find the currency in the currency table where currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency. Return artwork["cost"] * currency.

Here’s my implementation:

fun get-first(predicate :: (Row -> Boolean), table :: Table) -> Row:
      table.filter(predicate).row-n(0)
where:
      get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 end) is [T.raw-row: {"foo"; 1}]
      # duplicate
      get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 row: 1 end) is [T.raw-row: {"foo"; 1}]
      # not found
      get-first((lam(x): x["foo"] == 2 end), table: foo row: 1 end) raises "row-n-too-large"
end

fun get-art-in-1(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
      artwork = get-first((lam(a): a["id"] == art_id end), art-table)
      art-curr = artwork["currency"]
      art-cost = artwork["cost"]
      if art-curr == curr:
              art-cost
      else:
              conv-curr = get-first((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end), currency-table)
              conv-curr["conv-rate"] * art-cost
end
where:
      # we don't have to worry about these 2 cases:
      # 1. the article isn't in the table
      # get-art-in-1(artt(1, 10, "eur"), curt("usd", "mdl", 18), 2, "eur") is ???
      # 2. the currency isn't in the table
      # get-art-in-1(artt(1, 10, "eur"), curt("usd", "mdl", 18), 1, "usd") is ???
      get-art-in-1(one-art-table, no-curr-table, 1, "usd") is 10
      get-art-in-1(one-art-table, one-curr-table, 1, "mdl") is 10 * 18
end

I defined a generic get-first that will not handle duplicates/missing rows in any way.

2

This is esentially the same as the previous task but we’re supposed to raise an exception if some rows are duplicated or no row is found. The only thing that changes here is the function we use to do the actual lookup – get-first-or-raise.

Find artwork by id in artwork table
- If not found or duplicated, raise an error
If artwork has same currency as the one we need, return artwork["cost"]
Otherwise, find the currency in the currency table where currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency.

Return artwork["cost"] * currency

If not found or duplicated, raise an error

fun get-first-or-raise(predicate :: (Row -> Boolean), table :: Table) -> Row:
      matching-rows = table.filter(predicate)
      if matching-rows.length() == 0:
              raise("Row matching predicate not found")
      else if matching-rows.length() > 1:
              raise("Multiple rows matched the predicate")
      else:
              matching-rows.row-n(0)
      end
where:
      get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo row: 1 end) is get-first((lam(x): x["foo"] == 1 end), table: foo row: 1 end)
      get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo end) raises "Row matching predicate not found"
      get-first-or-raise((lam(x): x["foo"] == 1 end), table: foo row: 1 row: 1 end) raises "Multiple rows matched the predicate"
end

fun get-art-in-2(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
      artwork = get-first-or-raise((lam(a): a["id"] == art_id end), art-table)
  art-curr = artwork["currency"]
  art-cost = artwork["cost"]
  if art-curr == curr:
    art-cost
  else:
    conv-curr = get-first-or-raise((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end), currency-table)
    conv-curr["conv-rate"] * art-cost
  end
where:
  get-art-in-2(one-art-table, no-curr-table, 1, "usd") is get-art-in-1(one-art-table, no-curr-table, 1, "usd")
  get-art-in-2(one-art-table, one-curr-table, 1, "mdl") is get-art-in-1(one-art-table, one-curr-table, 1, "mdl")
  get-art-in-2(one-art-table, no-curr-table, 1, "mdl") raises "Row matching predicate not found"
end

I’m yet to find a way to generify this for all 3 cases.

3

This is a bit tricky because we can’t use our previously-defined functions, and can’t generify the structure of the function, because there’s additional logic in the case that the currency is not found the first time.

Find artwork by id in artwork table
- If not found or duplicated, raise an error
If artwork has same currency as the one we need, return artwork["cost"]
Otherwise, find the currency in the currency table where currency["from-c"] \== artwork["currency"] && currency["to-c"] \== required-currency.
If found: Return artwork["cost"] * currency. If duplicated, raise an error.
If not found: find a currency where (c["to-c"] \== art-curr) and (c["from-c"] \== curr) (reverse of the currency). If not found or duplicated, raise an error. Return (1 / rev-curr["conv-rate"]) * art-cost

fun get-art-in-3(art-table :: Table, currency-table :: Table, art_id :: Number, curr :: String) -> Number:
  artwork = get-first-or-raise((lam(a): a["id"] == art_id end), art-table)
  art-curr = artwork["currency"]
  art-cost = artwork["cost"]
  if art-curr == curr:
    art-cost
  else:
    maybe-curr = currency-table.filter((lam(c): (c["from-c"] == art-curr) and (c["to-c"] == curr) end))
    if maybe-curr.length() > 1:
      raise("Multiple rows matched the predicate")
    else if maybe-curr.length() == 1:
      maybe-curr.row-n(0)["conv-rate"] * art-cost
    else:
      (1 / (get-first-or-raise((lam(c): (c["to-c"] == art-curr) and (c["from-c"] == curr) end), currency-table)["conv-rate"])) * art-cost
  end
end
where:
  get-art-in-3(one-art-table, no-curr-table, 1, "usd") is get-art-in-2(one-art-table, no-curr-table, 1, "usd")
  get-art-in-3(one-art-table, one-curr-table, 1, "mdl") is get-art-in-2(one-art-table, one-curr-table, 1, "mdl")
  get-art-in-3(one-art-table, no-curr-table, 1, "mdl") raises "Row matching predicate not found"
  get-art-in-3(duplicate-art-table, no-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
  get-art-in-3(one-art-table, duplicate-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
  get-art-in-3(one-art-table, inverse-of-one-curr-table, 1, "mdl") is (1 / (1 / 18)) * 10
  get-art-in-3(one-art-table, duplicate-inverse-of-one-curr-table, 1, "mdl") raises "Multiple rows matched the predicate"
end

4

We’re not supposed to write any code for this one, only a Testing Plan. This eans we need to think of a list of cases to test, which will guarantee that our solution is correct. Or, as they put in the Sortacle assignment: suppose you were asked to grade a solution while giving partial credit; (…) decompose the problem into a set of sub-problems, so that the combination of these solves the problem as a whole

No row in either column should appear more than once. If they do, an exception should be raised
If a direct conversion is available, it should be used (if the artwork’s currency is usd and we require usd, no conversion is needed)
If there is a conversion from the currency of the artwork to the required currency, it should be used (if the artwork’s currency is usd and we require eur, but a usd -> eur conversion is found, it should be used)
If there is a conversion from the required currency to the currency of the artwork, it should be used in reverse (if the artwork’s currency is usd and we require eur but a eur -> usd conversion is found, 1/conversion-rate shuold be used)
If there is one or more intermediary conversions that would end with our required currency, it should be used (if the artwork’s currency is usd and we require eur, but a usd -> gbp and gbp -> eur conversion exists, it should be used). It can also include reverses (if a usd->gbp and eur->gbp conversionts are found, it still works as we can do eur->gbp and 1/gbp->eur)
If no such conversions are found, an exception should get raised

5

Again, no code required. We need to compare the strengths and weaknesses of representing the currency as a datatype.

Strengths:
- clearer purpose: we’re specific about what represents a currency and what currencies are allowed
- validation: if we use a currency that isn’t in the datatype, we’ll get an error about it, making sure all input of that column is valid
Weaknesses:
- every time we want to use a new currency, we have to add it to the datatype

Titanic

We’re also required to provide a Program Plan for the assignments.

Program plan

We’ll need a function that extracts the name and a function that extracts the title out of raw-name
We’ll need a function that counts the frequencies of all words. It should produce a list of WordFreq objects.
We’ll need to then go through the freqeuencies and sort them. I’d use a dict if I were in Python, but here we can use a custom datetype, something like

type NameFreq : name :: String, occurences :: Number end. Then sorting and displaying it is easy. We may not even need to know the fact that we’re dealing with a name, and rename NameFreq to something generic, like WordFreq. This way, we can also use it for titles and their frequencies, not only names.

Joining everything together, we’ll have 3 functions: most-popular-male-names which will do something like

map freq.name for every freq in rsort-freq(find-freq(set(filter(get-name if sex is male))))

We’ll have the same thing for females.

map freq.name for every freq in rsort-freq(find-freq(set(filter(get-name if sex is female))))

And for titles we’ll just do find-freq(set(get-title for every person)))

Sets

How to represent sets as lists:

Problems
- sets != lists: set.size() != list.size() if list contains duplicates
- order doesn’t matter for sets; it does for lists
Main question: what do we do about duplicates?
Choice 1: don’t store duplicates
- insert will check if element is already in set (but only once, so it’s not O(k^2), but only O(k)) before inserting
  - this means insert is more expensive, but we get size and is-in (member) for free & they’re faster too
    - insert does a is-in (O(k)) for each element, and then calls link (O(c), c is constant); still O(k)
  - for k elements, length and is-in are O(k) (linear, meaning it grows directly proportional to the number of elements)
- better if we expect a lot of duplicates and we be calling insert and is-in or size often
Choice 2: store duplicates
- insert is link
  - size is constant time (just link)
  - size and is-in will have to check for duplicates
    - size and is-in are going to be linear, but since we’re saving duplicates, k is going to be bigger, so it’ll take more time
    - size will have to check if each element is in the list (O(k)) for each element (O(k)) so we have O(k^2)/quadratic (and since the set contains duplicates, k is going to be way bigger)
- better if we know we won’t have a lot of duplicates and we’ll be calling insert often and size rarely
When the performance depends on a sequence of operatinos big-Oh is too lose to allow measuring each operation, and we should instead measure the entire sequence.
I won’t be doing the excersises, but let’s think about them
- if we use the representation without duplicates remove will be linear (at worst), since we only need to traverse the list once and remove the element (a constant time operation), but we only traverse the entire list if the element we need to remove is the last element of the list.
- if we use the representation with duplicates remove will also be linear, but the constant operation will need to be performed for all of the elements, so it’s linear at best (we need to traverse the entire list no matter what, to make sure there isn’t any other duplicate left)
- it’s impossible for one(s) to produce the same thing as one(others(s)) because one returns a random value, and others may return the same value, but we have only one occurence of each element, thus it doesn’t make sense for one and others to return the same element.
- Set<Any>) -> set<Any>=